Document Structure
Mark up document structure

A markup language like HTML wraps standard document elements in identifying “tags” that describe the meaning of each element, such as titles, headings, paragraphs, lists, tables, links, addresses, citations, and quotes. The resulting document is “machine-friendly”—it can be read and interpreted by software. When a document has structural markup, software can make use of it by displaying the document title in the browser title bar, by providing a list of links or headings, and so on. Screen reader software can use structure to modulate tone by reading headings more slowly than main text, or by reading links using a different voice. Search engines can index a structured document more accurately than a plain text document because phrases marked as headings help the software determine the document’s subject and primary focus.
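As a rough illustration of how software can use structural markup, the sketch below pulls the document title and a heading outline out of a small page using Python's standard html.parser module. The OutlineParser class and the SAMPLE page are hypothetical names invented for this example. It is a minimal sketch, not a description of how any particular browser, screen reader, or search engine works.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects the document title and headings from structural markup."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []     # list of (level, text) tuples
        self._current = None   # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when a title or heading element opens.
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        else:
            self.headings.append((int(tag[1]), text))
        self._current = None


# Hypothetical sample page, used only for this illustration.
SAMPLE = """
<html>
<head><title>Access by Design</title></head>
<body>
<h1>Document Structure</h1>
<p>Structure adds a layer of meaning.</p>
<h2>Mark up document structure</h2>
<p>Tag each element for what it is.</p>
</body>
</html>
"""

parser = OutlineParser()
parser.feed(SAMPLE)
parser.close()

print(parser.title)
for level, text in parser.headings:
    # Indent each heading according to its level to show the outline.
    print("  " * (level - 1) + text)
```

Without the TITLE and heading tags, the same parser would find nothing to report, even though the page might look identical on screen.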

Many of today’s Web documents do not contain structural markup, or they make use of only the most basic tags: TITLE, BODY, maybe a P or two. Many documents do contain structural markup, but for visual purposes, such as BLOCKQUOTE for margins and TABLE for page layout. Most other markup is presentation markup: tags that describe the visual attributes of page elements. These include such tags as BR for line breaks, FONT for setting type size and typeface, B for bold, and I for italic.

On the surface, a nonstructured document may look no different from a structured one. Whether a designer marks paragraphs with a P or two BRs (line breaks) is not visually apparent. However, the logical structure underlying a well-structured document adds a layer of meaning that gives power and utility to the Web. Software can read any text document, but only from a structured document can it also derive meaning. A truly interconnected Web requires documents that can be cataloged and connected by software. To do this well, software needs structure.

Take the title of this book. <i>Access by Design</i> is visually identifiable as a book title because it is italicized and uses title case—two conventions that denote book titles. However, software cannot recognize the phrase as a title because I means italics—nothing more. On the other hand, <cite>Access by Design</cite> is universally identifiable as a book title because the HTML tag CITE is used to denote citations. When a book title is tagged for structure, software can do useful things, such as scan the Web for all instances where the book is cited in other Web documents. In this case, instances marked with I would fall through the cracks.
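The difference can be sketched with a small citation scanner. The CitationFinder class, the find_citations function, and the two sample strings are hypothetical, written only for this example with Python's standard html.parser module. A real indexing tool would be far more involved.

```python
from html.parser import HTMLParser


class CitationFinder(HTMLParser):
    """Collects the text of every <cite> element; all other tags are ignored."""

    def __init__(self):
        super().__init__()
        self.citations = []
        self._in_cite = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self._in_cite = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_cite:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "cite" and self._in_cite:
            self.citations.append("".join(self._buffer).strip())
            self._in_cite = False


def find_citations(html):
    """Return the list of titles marked up with <cite> in the given HTML."""
    finder = CitationFinder()
    finder.feed(html)
    finder.close()
    return finder.citations


# Two hypothetical snippets that would look the same in most browsers.
STRUCTURED = "<p>See <cite>Access by Design</cite> for details.</p>"
PRESENTATIONAL = "<p>See <i>Access by Design</i> for details.</p>"

print(find_citations(STRUCTURED))      # ['Access by Design']
print(find_citations(PRESENTATIONAL))  # []
```

The scanner finds the title only in the snippet that uses CITE. The italicized version carries no information the software can act on.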

To build structured documents, encode content using structural markup. Identify page sections—header, navigation, content, footer—and the elements contained within the sections—headings, paragraphs, lists, and tables. Instead of thinking about what each element should look like, think about what each element is, and tag it using the appropriate HTML structural tag (Figure 2.4). Avoid meaningless tags, such as FONT, BR, B, and I, and do not misuse structural tags for presentation purposes, such as tagging paragraphs with the BLOCKQUOTE tag to create margins. When it comes time to think about visual design, turn to CSS to define the appearance of structural elements.

Figure 2.4: Table of common structural tags.
Element                     Usage
h1, h2, h3, h4, h5, h6      Headings
p                           Paragraphs
blockquote                  Quoted text
ul, ol                      Unordered and ordered lists
table, th, tr, td           Tabular information
em, strong                  Emphasized words and phrases
cite                        Citations (e.g., book titles)
abbr, acronym               Abbreviations and acronyms
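One way to check a page against the advice above is to count the presentation tags it contains. The sketch below is a minimal, hypothetical audit built on Python's standard html.parser module. The PRESENTATIONAL_TAGS set lists only the four tags named in the text and is an assumption of this example, not a complete inventory. The two sample pages are invented for illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: only the four tags named in the text are treated as presentational.
PRESENTATIONAL_TAGS = {"font", "br", "b", "i"}


class PresentationAudit(HTMLParser):
    """Counts how often each presentational tag is opened in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag names, so <BR> and <br> count alike.
        if tag in PRESENTATIONAL_TAGS:
            self.counts[tag] += 1


def audit(html):
    """Return a dict mapping each presentational tag found to its count."""
    checker = PresentationAudit()
    checker.feed(html)
    checker.close()
    return dict(checker.counts)


# Hypothetical page built from presentation markup only.
UNSTRUCTURED = (
    '<font size="5"><b>Welcome</b></font><br><br>'
    "First paragraph of text.<br><br>"
    "<i>Access by Design</i>"
)

# The same content tagged for what each element is.
STRUCTURED = (
    "<h1>Welcome</h1>"
    "<p>First paragraph of text.</p>"
    "<p><cite>Access by Design</cite></p>"
)

print(audit(UNSTRUCTURED))
print(audit(STRUCTURED))
```

A count of zero does not prove that a page is well structured, since structural tags can still be misused for layout, but a high count is a quick signal that markup is being used for appearance rather than meaning.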