Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Running A Perfect Web Site with Apache, ISBN: 0-7897-0745-4. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

Chapter 08 - Basic HTML: Understanding Hypertext

The whole point of setting up a Web site is so that users can access the information you place on the site. Publishing documents on the Web requires them to be prepared in HyperText Markup Language (HTML), a page description language with provisions for linking related documents together. It is a simple, text-based language that you can view in a variety of fonts on any platform. You can use it with text-only clients, such as Lynx, on a VT220 terminal or with fully graphical clients, such as Mosaic, on advanced graphical workstations.

The present version of HTML, otherwise known as HTML 2.0, is the most commonly used version. Most clients support HTML 2.0, but a few, such as Netscape Navigator and Microsoft Internet Explorer, support additional features, such as blinking text or background sounds, that are not in any HTML specification. Most of this chapter covers standard HTML. Future standards for HTML will include features such as style sheets, tables, support for embedding objects, and possibly a framework for implementing experimental features. The future of HTML promises to make strides toward a universal document format that is both compact and rich in formatting.

In this chapter, you will learn the basics of HTML, including:

  • The basic structure of an HTML document

  • How to format text into headings, paragraphs, and lists

  • The difference between physical and logical styles and how to apply these styles in your documents

  • The GIF and JPEG graphical formats

  • How to place inline images in a document

  • How to set up hypertext and hypergraphic links to other documents

HTML Fundamentals

Before charging right into the HTML tutorial, it is helpful to review some introductory remarks on HTML to give you a sense of what it is, where it came from, and where it is heading.

History of HTML

HTML is an application of the Standard Generalized Markup Language (SGML). SGML arose out of the international standards community to meet the need for "structured" content, which could be validated algorithmically. In other words, SGML is an open, standards-based (ISO 8879) language for describing document languages: it defines their structure (which "tags" can go inside other "tags"). This definition occurs in a Document Type Definition, otherwise known as a DTD. When Tim Berners-Lee first started using HTML, it was not defined using a DTD, but thanks to the work of Dan Connolly and many others, its syntax and format were regularized and a DTD was created. Oftentimes, keeping HTML pure to its SGML background can be a challenge, but the benefits of this to the publishing community are too large to ignore.


Refer to Que's Special Edition Using SGML for more information.

The first version of HTML, HTML 0, was developed at CERN in 1990 and is largely out of use today. HTML 1.0 incorporated inline images and text styles (highlighting) and was the version of HTML used by most of the initial Web browsers. HTML 2.0 is the current standard. The future of HTML is being decided by vendor-sponsored groups like the World Wide Web Consortium (http://www.w3.org/), or in volunteer standards groups like the IETF (http://www.ietf.org/).

HTML Tags

An HTML document is simply the informational text of the document with structural tags embedded in the text. These tags are character sequences that begin with a less-than sign (<) and end with a greater-than sign (>). Tags can be used to, among other things, apply a style to text, insert a line break, or place an image in the document. To the "purist," a tag signifies a structure - you're not just saying "make this phrase really big by putting an <H1> around it," you're saying "this is a first-level heading in my document." The idea is similar to older word processors and page layout systems that require insertion of formatting tags to specify bold, underlined, or italicized type. Newer word processors use the same premise, but usually hide these tags from the user. Some word processors, however, allow you to display the formatting tags - WordPerfect, for example, provides you with the Reveal Codes menu option.

For a look at some HTML, first consult figure 8.1, which shows the World Wide Web (W3) Consortium's home page (http://www.w3.org/). Choose the Document Source option from Netscape Navigator's View menu to activate a window with the HTML source loaded. The HTML source corresponding to figure 8.1 is shown in figure 8.2.

Fig. 8.1 - The W3C home page as displayed by the Netscape Navigator.

Fig. 8.2 - Netscape allows you to view the HTML source of the document in the browser window.

Viewing the source code of a document is a great way to learn HTML, but you should be aware that not all browsers have this feature. In addition to differences in features, you should also know that different browsers often display the same page in different ways. Figure 8.3 shows the W3C home page in Lynx, a text-only browser. Notice how the elements pointed out in figures 8.1 and 8.2 are rendered differently in Lynx.

You should also be extremely cautious about simply learning by example; while sometimes someone is able to get an interesting effect with a particular tag combination, sometimes this combination is illegal by the specifications. Even though it might look all right in the browser you are using, other browsers may not be able to handle it at all, even if they completely conform to the spec. When in doubt, consult the specs.

Fig. 8.3 - The W3C home page as displayed by Lynx.

The differences in browser rendering are not a significant problem with the basic HTML formatting tags, but they can be an issue when your documents contain more advanced HTML, particularly those tags that are extensions to HTML supported by only a few browsers. This points to an important challenge in creating Web documents: how to incorporate the advanced features while not breaking browsers that can't render those features. As you read this chapter and the next, note the suggestions for writing browser-friendly HTML. Following these suggestions will make your documents accessible to the largest audience possible.

Platform Independent

Most of HTML's formatting features specify logical rather than physical styles. For example, the heading tags, which normally indicate larger font sizes, do not specify which size to use. Instead, a browser chooses a size for the heading that is larger than its default text size. This allows Macs to view files written on PCs and served by UNIX boxes. It also allows clients like Lynx to render the important text in all caps if they can't change the font size or color. Even though you can't control the exact font and size with logical structures, it's best to leave the logical-to-presentational formatting to the client, since the client best understands its own rendering limitations.

Three Basic Rules

In spite of the differences between them, Web browsers do consistently follow three rules when parsing HTML. These are:

  • White space is ignored

  • Tags are not case-sensitive

  • Most tags occur in pairs

White Space Ignored

The fact that browsers ignore white space is often a source of frustration for the beginning HTML author. Consider the following HTML:

<TITLE>Our Mailing Address</TITLE>
Que Corporation
201 West 103rd Street
Indianapolis, IN  46290-1097

The address looks fine in the source file, but notice how NCSA Mosaic renders it in figure 8.4.

Fig. 8.4 - Carriage returns in the HTML source code don't translate to carriage returns on the browser screen.

Mosaic tries to display the address all on one line! The carriage returns in the file, which make the address look fine in an editor or on a printout, are ignored by the browser. The same is true of other white space characters like tabs and extra spaces. In the HTML above, there are two spaces between IN and 46290-1097, but only one space appears between them in the browser window. The second space character is ignored.
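
As a minimal sketch of this rule, the two fragments below differ only in their white space, so a browser that follows the rule should display them identically. The text comes from the address example above; the layout of the second fragment is purely illustrative.

```html
<!-- Fragment 1: one line, single spaces -->
Que Corporation 201 West 103rd Street Indianapolis, IN 46290-1097

<!-- Fragment 2: carriage returns, indentation, and extra spaces -->
Que Corporation
    201 West 103rd Street
Indianapolis,     IN     46290-1097
```

In both cases the browser should show a single run of text with one space between words.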

Formatting Tags Are Not Case-Sensitive

You can write all HTML formatting tags in upper-, lower-, or mixed case. For example, browsers interpret <TITLE>, <title>, and <Title> the same way.

Most Formatting Tags Occur in Pairs

With only a few exceptions, HTML formatting tags occur in pairs in which the beginning tag activates an effect and the ending tag turns off the effect. Tag pairs are often called container tags, since the effects they turn on and off are applied to the text they contain. For example, to specify that a line of text appears in bold, you write:

<B>This text will appear in bold.</B>

The ending tag always has a slash before the tag name. Among the basic HTML tags, those that do not have a companion ending tag include: <BASE> (base information), <BR> (line break), <HR> (horizontal rule), and <IMG> (image).
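
The fragment below is a small illustrative sketch that mixes container tags with tags that have no ending tag. The text and the image file name logo.gif are made up for this example.

```html
<!-- Container tags: the effect applies to the enclosed text -->
<B>Office hours:</B> <I>weekdays only</I>

<!-- Tags without an ending tag: each stands alone -->
<BR>
<HR>
<IMG SRC="logo.gif" ALT="Company logo">
```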

Uniform Resource Locators (URLs)

While not directly related to HTML, Uniform Resource Locators (URLs) are an important part of HTML documents used in many different tags. For this reason, a quick primer on URLs is in order.

A URL is basically the address of a document on the World Wide Web. The URL is a way of compactly identifying any document on any type of Web-compatible server anywhere in the world. The URL consists of four parts: an "access scheme," Internet address, port, and object. With the exception of the "news" and "mailto" access schemes, the general format for a URL is as follows:

access-scheme://internet_address:port/object

In addition, you can optionally specify search or query information after the object when sending data to a search or script. This is covered in Chapter 12, "HTML Forms."
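
To make the parts concrete, here is a sketch that uses URLs as link targets. The host www.example.com, the port 8080, and the file and script names are hypothetical; the anchor tag used to hold each URL belongs to the hypertext links topic.

```html
<!-- Scheme http, address www.example.com, default port, object /docs/index.html -->
<A HREF="http://www.example.com/docs/index.html">Documentation</A>

<!-- The same object requested through an explicit, non-default port -->
<A HREF="http://www.example.com:8080/docs/index.html">Test server</A>

<!-- Query information follows the object after a question mark -->
<A HREF="http://www.example.com/cgi-bin/search?apache">Search for apache</A>
```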

Access Scheme

The access scheme indicates what type of Internet application is requested. Usually an "access scheme" maps directly to an Internet protocol, as is the case with "http," but not always. NNTP for example has both "nntp" and "news" schemes. In order to use a given protocol, both the client (browser) and Internet server must be able to speak that protocol. The most common protocol in Web documents is "http" (HyperText Transfer Protocol), which is spoken by all Web servers and clients. In addition, almost all browsers support FTP, Gopher, Telnet, and News. Some also support WAIS. Some fictional examples of URLs using these protocols follow:

http://webwise.walcoff.com/frontier/pick.html

ftp://ftp.fedworld.gov/pub/irs-pdf/form1040.pdf

gopher://gopher.government.gov/reports/census.txt

telnet://loc.gov

news:sci.psychology.clinical

mailto:info@netscape.com


The news URL is substantially different from the others because it does not specify an Internet address or file name. Instead, it simply names a newsgroup. The name of the news server must be made known to the browser when you initially configure the browser.


Where To Get News

To read Internet news through a Web browser, you have to be able to connect to a news server, which continually receives messages over the Internet and stores them locally for a short time (usually about two weeks). Newsfeeds cost money, and for this reason, no news servers are publicly available on the Internet. If your site wishes to take full advantage of Internet news, you must obtain a newsfeed from your Internet service provider or obtain authorization to connect to your provider's news server.

The mailto: URL allows you to send electronic mail to the specified address directly from your browser. The mailto: URL is supported by Netscape, Lynx, and others, but it isn't supported by all browsers.

Address

The address portion of a URL is simply the hostname or IP (Internet Protocol) number of an Internet server. This address can be either the familiar named dot notation (like ftp.ncsa.uiuc.edu) or a number sequence (like 127.0.0.1).

Port

The port is an optional URL element. If the port is omitted, the default port for the specified protocol is assumed. In the case of HTTP, this is 80.

File Name

The document path, or file name, uses the same notation as DOS and UNIX paths, except that the slash is always forward (/), not backward (\) as DOS users might expect. Each slash goes down to the next subdirectory having the specified name, and the path ends in a file name with an extension (such as TXT or HTML). It is also possible to specify a path to an entire directory simply by ending with the directory name and a trailing slash (/). For example, to see the contents of the fruits directory on an FTP server, you can use:

ftp://ftp.healthy.com/fruits/

A URL that specifies a protocol, Internet address, and file name is said to be an absolute URL. In some cases, it is also possible to specify one URL relative to another, resulting in a relative URL. For example, suppose your base URL is http://www.healthy.com/fruits/citrus/tarty_fruits.html and you need to specify the URL of the file intro.html located in the fruits directory (one directory level up from citrus). You can do this with the absolute URL http://www.healthy.com/fruits/intro.html, but it can also be appropriate to give the URL relative to the base URL. In this case, the relative URL would be "../intro.html." The two dots followed by a forward slash (../) are an indicator to move up one directory level. If you need to specify the URL of the file "lemonade.html" in the lemons directory (a subdirectory of the citrus directory), you can use the relative URL "lemons/lemonade.html."


The base URL for a document is specified in the <BASE HREF="base_url"> tag. If this tag is not present, then the base is determined by the browser by whatever URL it used to access the document. <BASE> tags are not mandatory. This tag is discussed in the Document Structure portion of the next section.
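
The sketch below shows how these pieces can fit together, reusing the www.healthy.com example above. The title and link text are made up. With the base URL declared in the document head, the two relative links should resolve to the absolute URLs noted in the comments.

```html
<HTML>
<HEAD>
<TITLE>Tart Fruits</TITLE>
<BASE HREF="http://www.healthy.com/fruits/citrus/tarty_fruits.html">
</HEAD>
<BODY>
<!-- Resolves to http://www.healthy.com/fruits/intro.html -->
<A HREF="../intro.html">Introduction to fruits</A>
<BR>
<!-- Resolves to http://www.healthy.com/fruits/citrus/lemons/lemonade.html -->
<A HREF="lemons/lemonade.html">Making lemonade</A>
</BODY>
</HTML>
```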

General HTML Style

While you are generally free to write HTML any way you want, there are a few issues of style to keep in mind. If you're just starting out, take these style issues to heart and develop good authoring habits from the onset. If you've been writing HTML for a while and have perhaps "forgotten" about some of the aspects of good style, this is a great time to remind yourself of them and work them back into your documents.

Uppercase Tags

While it is true that HTML tags are not case-sensitive, it is a good idea to always make them all uppercase. Remember that tags are embedded in other text and this can make them difficult to read when writing or editing HTML. Tags that are all uppercase stand out much better in a sea of text.

Remember, though, that URLs can be case-sensitive, so type them exactly as given.

Document Structure

It used to be that a discussion of HTML document structure would be right at the beginning of an HTML tutorial. However, since most browsers can still parse an HTML file without the structure-defining tags, many authors have fallen out of the habit of including these tags in their documents and their inclusion becomes an issue of style. Good HTML style suggests that you always include tags to define the major parts of your documents. The three major parts are:

  • The HTML declaration

  • The document head

  • The document body

The HTML Declaration

The HTML declaration is simply accomplished by making the <HTML> tag the first thing in your file and making the </HTML> tag the last thing in your file. These container tags say "Everything between us is HTML code."

The Document Head

The document head should immediately follow the <HTML> tag and is contained in the <HEAD> ... </HEAD> tag pair. The document head contains information about the document that is typically transparent to the user. While many informational items can be specified in the document head, the two that you should always include are the title and the base URL of the document.

The document's title is designated with the <TITLE> ... </TITLE> tag pair. You should make your titles descriptive, while still keeping them fairly short. A forty character title is a good rule of thumb. Document titles typically appear at the top of the browser window (refer to fig. 8.1). They are also used in bookmark files.


In the absence of a specified title, the URL of the document is displayed at the top of the browser window and in bookmark files. URLs aren't as descriptive to users as titles are, so always be courteous to your users and include a title.

The base URL of the document is given in the <BASE HREF="base_url"> tag. You really only need to set this if you anticipate someone arriving at your page through a URL other than the one on which you wish to base relative URL links.

The Document Body

The document body immediately follows the head and is enclosed in the <BODY> and </BODY> tags. The body contains all of the information that will be presented to the user and the tags used to format that information.

Putting these three parts of the document together, you have a basic template for an HTML document (see listing 8.1).

Listing 8.1 HTML Document Template

<HTML>
<HEAD>
<TITLE>Document Title</TITLE>
</HEAD>
<BODY>
Information and formatting commands
</BODY>
</HTML>

Many HTML editing programs make this basic template available to you when you create a new document. If you're using a word processor or a simple text editor to write HTML, you can probably create and store this template easily. In either case, there's no reason not to include the structure-defining tags.

Getting Started

To start writing HTML, all you really need is an editor that allows you to save files in ASCII format and a browser to test your documents. If you plan to include images in your documents, you'll need a graphics program as well.

Editor

On UNIX, many people will claim that the best editors are the same editors people on UNIX have been using for a long time, namely, Emacs and vi. vi is a very simple text editor. Crafted for an era of low memory requirements and small feature sets, vi is relatively easy to use but not incredibly full-featured. Emacs, on the other hand, is a very full-featured application. It has a built-in LISP interpreter; one particularly relevant Emacs-LISP module that has been created is the "HTML-Mode" module. Not only will it automatically give you all the default elements of an HTML document when you edit a new file with a name ending in ".html," it also colors different tags and structural elements, making it very easy to see the difference between an <H1> tag section and an <A> section. More information on these will be provided later.

Browser

You only really need one browser to test your documents, but it's a good idea to look at your HTML files in two or three browsers to make sure your code is as browser-friendly as possible. It's easy to get a copy of the popular browsers. NCSA Mosaic 2.0 and Netscape Navigator 2.0 are available for public download from the NCSA (ftp://ftp.ncsa.uiuc.edu/Mosaic/) and Netscape (ftp://ftp.netscape.com/) FTP sites. A browser that actually implements more of the future HTML features is Arena (http://www.w3.org/pub/WWW/Arena/), an experimental browser developed and maintained as a reference software piece by the W3C. It should be noted that UNIX only accounts for about 15 percent of the browser market as of this writing, so to really test your pages, it would be wise to check them out on Windows and Mac browsers as well.

HTML Tutorial

With the preliminaries covered, you're now ready to learn the basic HTML tags. All of the tags discussed in this section are found in the document body (between the <BODY> and </BODY> tags) and fall into several categories:

  • Paragraphs and line breaks

  • Heading styles

  • Physical styles

  • Logical styles

  • Lists

  • Special characters

  • Horizontal lines

  • Images

  • Hypertext and hypergraphics

Paragraphs and Line Breaks

The <P> tag is used to indicate the start of a new paragraph. Browsers separate paragraphs with a blank line. To start a new paragraph without the extra line of separation or to just move to the next line, use the <BR> tag (line break). Line breaks were needed back in figure 8.4 to render an address properly. Figure 8.5 shows the difference between paragraphs and line breaks. Listing 8.2 shows the corresponding HTML.

Listing 8.2 HTML for Figure 8.5

<P>Que is the premier publisher of Internet-related books.
Be sure to visit our Web site at http://www.mcp.com/que/
for more information.
<P>Our mailing address is:
<P>Que Corporation<BR>
201 West 103rd Street<BR>
Indianapolis, IN  46290-1097

Fig. 8.5 - Paragraphs and line breaks help to offset sections of a document.

Heading Styles

HTML supports six heading styles, which are used to make text stand out by varying degrees. These are numbered one through six, with one being the largest. To format text in a heading style, enclose it in the <Hn> and </Hn> tags, where n is the number of the heading style you want to apply. Figure 8.6 shows how the six heading styles are rendered in Microsoft Internet Explorer by default. The corresponding HTML is shown in listing 8.3.

Listing 8.3 HTML for Figure 8.6

<H6>Heading Style 6</H6>
<H5>Heading Style 5</H5>
<H4>Heading Style 4</H4>
<H3>Heading Style 3</H3>
<H2>Heading Style 2</H2>
<H1>Heading Style 1</H1>

Fig. 8.6 - Headings are used to name and separate sections of a document.


In addition to changing the size of the text and making it boldface, applying a heading style adds some white space above and below the line containing the heading.
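
Since heading levels describe structure rather than an exact size, one reasonable sketch is to use them as an outline, stepping down one level at a time. The section names below are made up for illustration.

```html
<H1>Company Handbook</H1>
<H2>Benefits</H2>
<H3>Vacation</H3>
<P>Vacation time accrues monthly.
<H3>Insurance</H3>
<P>Coverage begins on the first day of employment.
<H2>Policies</H2>
<P>All employees are expected to read this section.
```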

Physical Styles

Physical styles are actual attributes of a font, such as bold or italic. HTML supports the four physical styles shown in table 8.1. To apply a physical style, simply place the text to be formatted between the appropriate tag pair shown in the table.

Table 8.1 Physical Styles in HTML
Name                        Tag
Bold                        <B>...</B>
Italics                     <I>...</I>
Underline                   <U>...</U>
Typewriter (fixed-width)    <TT>...</TT>


According to the HTML specification, browsers are not required to support any text styles. Do not assume that any given style is available in all browsers. In many browsers, for example, the underline style is reserved for displaying hyperlinks. These browsers will ignore the <U> and </U> tags, as shown in figure 8.7.

Fig. 8.7 - Physical styles are used to render text in boldface, italics, or a fixed width. The underline style is frequently not supported.


The HTML specification allows nesting of physical text styles, though not all browsers support this. For example, "<B>Hello, <I>brown</I> cow</B>" makes sense, but be careful, because something like "<B>Hello, <I>brown</B> cow</I>" does not, and may cause some browsers to crash.
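
One way to check nesting is to make sure tags close in the reverse of the order in which they opened. The sketch below reuses the example text above; the second fragment is only one possible way to get the effect the overlapping version was presumably after.

```html
<!-- Properly nested: the italic pair sits entirely inside the bold pair -->
<B>Hello, <I>brown</I> cow</B>

<!-- Bold "Hello, brown" and italic "brown cow" without overlapping tags -->
<B>Hello, <I>brown</I></B><I> cow</I>
```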

Logical Styles

Logical styles indicate the meaning of the text they mark in the context of the document. Since they are not related to font attributes, logical styles can be rendered differently on different browsers. Table 8.2 lists the common logical styles and their meanings and typical renderings. Closing tags are required for all logical styles, but have been omitted in the table to save space. To create a closing tag, just add a slash before the tag name, like </ADDRESS>.

Table 8.2 Logical Styles in HTML
Style Name     Tag             Typical Rendering
Address        <ADDRESS>       Italics
Block quote    <BLOCKQUOTE>    Left and right indent
Citation       <CITE>          Italics
Code           <CODE>          Fixed-width font
Definition     <DFN>           Bold or bold italics
Emphasis       <EM>            Italics
Keyboard       <KBD>           Fixed-width font
Sample         <SAMP>          Fixed-width font
Strong         <STRONG>        Bold
Variable       <VAR>           Italics

Figure 8.8 shows how Netscape renders many of the logical styles. Listing 8.4 shows the corresponding HTML.

Listing 8.4 HTML for Figure 8.8

<H1>Logical Styles</H1>
According to <CITE>Corporate Manual of Style</CITE>,
you <EM>must</EM> include your
<VAR>e-mail address</VAR> below the signature block
of your business letters. Specifically:
<BLOCKQUOTE>Employees with electronic mail addresses
<STRONG>must</STRONG> include them in the signature block.
For example:<BR>
Mary Simpson<BR>
Account Representative<BR>
<ADDRESS>msimpson@abc_corp.com</ADDRESS>
</BLOCKQUOTE>

Fig. 8.8 - The logical styles, shown here in Netscape, describe the meaning of marked up text as it relates to the document.


While some browsers allow it, nesting logical styles often does not make sense. For example, why would you ever put a block quote inside keyboard input?


Physical versus Logical Styles

As you look at the typical renderings in table 8.2, you probably noticed that you can accomplish almost all of them by using the physical styles. If you did notice, you're likely asking "Why should I use the logical styles?" An "official" answer is: to give a contextual meaning to the text that you're marking up. Formatting doesn't really matter with the logical styles; it's the meaning they impart that is important. Such an official answer would come from a person who subscribes to the school of thought that HTML is a page-description language only.

Authors who use HTML as a design tool are likely to cast aside such official responses and just use the physical styles to get the same effect. After all, it is easier to type <I>info@abc_corp.com</I> than it is to type <ADDRESS>info@abc_corp.com</ADDRESS>.

The decision to use physical styles, logical styles, or both ultimately rests with each author, based on his or her take on whether HTML is for page description or page design.
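
As a side-by-side sketch of the two approaches, the fragments below mark up the same sentence, which is adapted from the logical styles listing. Both will typically appear in italics, but only the second tells the browser what the text is.

```html
<!-- Physical style: says only "use italics" -->
For details, see <I>Corporate Manual of Style</I>.

<!-- Logical style: says "this is a citation" and leaves the rendering to the browser -->
For details, see <CITE>Corporate Manual of Style</CITE>.
```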

Preformatted Text

Text tagged with the <PRE> and </PRE> tags is treated as preformatted text and rendered in a fixed-width font. Since each character in a fixed-width font has the same width, it is easy to line up text into columns and produce a table. Listing 8.5 produces the table you see in figure 8.9.

Listing 8.5 HTML for Figure 8.9

<H1>Preformatted Text</H1>
<PRE>
User Name              Login ID         Disk Space
---------------        --------         ----------
Terri Johnson          tjohnson           15 MB
Fred Hansen            fredh              15 MB
Pat Norton             pnorton            20 MB
</PRE>

Fig. 8.9 - Preformatted text is rendered in a fixed-width font and includes extra white space characters, making it easy to create tables.


Extra spaces, tabs, and carriage returns inside the <PRE> and </PRE> tags are not ignored.


Before you make all of your tables with preformatted text, you should look into the table tags proposed for future versions of HTML. Many browsers, such as Netscape and Mosaic, already support these tags.
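
For comparison, here is a sketch of the same user data marked up with the proposed table tags (<TABLE>, <TR>, <TH>, and <TD>). These tags are not part of HTML 2.0, so a browser without table support may run the cells together. Treat this as an illustration rather than a portable replacement for preformatted text.

```html
<TABLE BORDER>
<TR><TH>User Name</TH><TH>Login ID</TH><TH>Disk Space</TH></TR>
<TR><TD>Terri Johnson</TD><TD>tjohnson</TD><TD>15 MB</TD></TR>
<TR><TD>Fred Hansen</TD><TD>fredh</TD><TD>15 MB</TD></TR>
<TR><TD>Pat Norton</TD><TD>pnorton</TD><TD>20 MB</TD></TR>
</TABLE>
```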

Lists

HTML lists provide an easy and attractive way to present information in your documents. All lists require a pair of tags for the type of list and for each list item. Table 8.3 lists three types of formatted lists.

Table 8.3 Formatted Lists in HTML
Type           List Tag        Item Tag(s)
Ordered        <OL>...</OL>    <LI>...</LI>
Unordered      <UL>...</UL>    <LI>...</LI>
Description    <DL>...</DL>    <DT>...</DT>, <DD>...</DD>

Items in an ordered list are automatically numbered by the browser, starting with the number one. The automatic numbering is convenient, because it spares you from having to do it if you rearrange list items. Unordered list items are bulleted rather than numbered. Description lists allow you to present a term, followed by a description below and indented under the term.


Description lists are sometimes called definition lists since they are useful in presenting the term/definition structure of a glossary.

List items in all three list types are indented from the left margin, making it easy to distinguish them from the rest of the body text.

Figure 8.10 shows examples of unordered, ordered, and description lists as produced by listing 8.6.

Listing 8.6 HTML for Figure 8.10

<H2>Unordered Lists</H2>
<UL>
<LI>Bulleted list items</LI>
<LI>List items are indented</LI>
</UL>
<H2>Ordered Lists</H2>
<OL>
<LI>Numbered list items</LI>
<LI>List items are indented</LI>
</OL>
<H2>Description Lists</H2>
<DL>
<DT>First term</DT>
<DD>Description of first term</DD>
<DT>Second term</DT>
<DD>Description of second term</DD>
</DL>

Fig. 8.10 - Unordered, ordered, and description lists provide an easy way to break out information.


Many browsers will "forgive" you if you leave off the </LI> tag at the end of a list item. The next <LI> tag is enough to tell the browser to end the current list item and start a new one. Although the HTML 2.0 DTD also lets you omit </DT> and </DD>, some browsers may not be as forgiving about them, so don't forget them.
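
A sketch of the shorthand described above, with each </LI> left off. The item text is made up.

```html
<UL>
<LI>First item
<LI>Second item
<LI>Third item
</UL>
```

Each new <LI> ends the previous item, and </UL> ends the last one.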

You can nest lists inside of other lists, as shown in figure 8.11. Listing 8.7 shows the HTML to produce this figure. Note that each nested list is placed inside the list item it belongs to, before that item's </LI> tag.

Listing 8.7 HTML for Figure 8.11

<H1>Nested Lists</H1>
<UL>
<LI>Basic HTML
<OL>
<LI>Text formatting</LI>
<LI>Graphics</LI>
<LI>Hyperlinks</LI>
</OL>
</LI>
<LI>Advanced HTML
<OL>
<LI>HTML 2.0</LI>
<LI>HTML 3.0</LI>
<LI>Netscape Extensions</LI>
</OL>
</LI>
</UL>

Fig. 8.11 - Nesting ordered lists inside an unordered list lets you create an outline structure.

Special Characters

Because many characters have special meanings in HTML, it is necessary to use special character sequences when you want those characters to show up as themselves. You can also use special character sequences to produce foreign language characters and symbols. These sequences are referred to as SGML entities.

Reserved Characters

Because the less than (<), greater than (>), and quotation mark (") characters are used in HTML formatting tags, the characters themselves must be represented by special character sequences. The ampersand (&) is used in these special sequences, so it also must be represented differently. Table 8.4 lists the special character sequences for these reserved characters. The semicolon (;) is necessary to indicate where the character description ends and normal text resumes.

Table 8.4 Special Character Sequences for HTML Reserved Characters
Sequence    Appearance    Meaning
&lt;        <             Less than
&gt;        >             Greater than
&amp;       &             Ampersand
&quot;      "             Quotation mark

If you're writing HTML code to produce HTML code on a browser screen, you will use the sequences in table 8.4 frequently. For example, to produce a list of the physical style tags, you would need to use the HTML shown in listing 8.8.

Listing 8.8 HTML for Producing a List of Physical Style Tags

<H2>HTML Physical Style Tags</H2>
<UL>
<LI>&lt;B&gt; ... &lt;/B&gt;</LI>
<LI>&lt;I&gt; ... &lt;/I&gt;</LI>
<LI>&lt;U&gt; ... &lt;/U&gt;</LI>
<LI>&lt;TT&gt; ... &lt;/TT&gt;</LI>
</UL>

The resulting screen is shown in figure 8.12.

Fig. 8.12 - Writing HTML to produce on-screen HTML requires the use of special character sequences.
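
The same sequences are needed in ordinary text, not just when displaying tags. A brief sketch with made-up sentences:

```html
<P>If x &lt; y and y &lt; z, then x &lt; z.
<P>Johnson &amp; Sons replied &quot;no comment&quot; to every question.
```

A browser should display the first line with less-than signs and the second with an ampersand and quotation marks.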

Foreign Language Characters

htmL uses the ISO-8859-Latin1 character set, which includes foreign language characters for all Latin-based languages. Since these characters are not on most keyboards, you need to use special character sequences to place them in your documents. Like the other special character sequences in htmL, these sequences begin with an ampersand (&) followed by a written-out description of the character and a semicolon (;). Table 8.5 lists all the foreign-language sequences available.

Table 8.5 Foreign Language Characters in HTML
Character    Sequence
Æ, æ         &AElig;, &aelig;
Á, á         &Aacute;, &aacute;
Â, â         &Acirc;, &acirc;
À, à         &Agrave;, &agrave;
Å, å         &Aring;, &aring;
Ã, ã         &Atilde;, &atilde;
Ä, ä         &Auml;, &auml;
Ç, ç         &Ccedil;, &ccedil;
Ð, ð         &ETH;, &eth;
É, é         &Eacute;, &eacute;
Ê, ê         &Ecirc;, &ecirc;
È, è         &Egrave;, &egrave;
Ë, ë         &Euml;, &euml;
Í, í         &Iacute;, &iacute;
Î, î         &Icirc;, &icirc;
Ì, ì         &Igrave;, &igrave;
Ï, ï         &Iuml;, &iuml;
Ñ, ñ         &Ntilde;, &ntilde;
Ó, ó         &Oacute;, &oacute;
Ô, ô         &Ocirc;, &ocirc;
Ò, ò         &Ograve;, &ograve;
Ø, ø         &Oslash;, &oslash;
Õ, õ         &Otilde;, &otilde;
Ö, ö         &Ouml;, &ouml;
ß            &szlig;
Þ, þ         &THORN;, &thorn;
Ú, ú         &Uacute;, &uacute;
Û, û         &Ucirc;, &ucirc;
Ù, ù         &Ugrave;, &ugrave;
Ü, ü         &Uuml;, &uuml;
Ý, ý         &Yacute;, &yacute;
ÿ            &yuml;
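
As a brief illustration of the sequences in table 8.5, the following sketch writes a sentence containing accented characters. The sentence itself is invented for this example.

```html
<!-- Displays as: François ordered a café au lait in Zürich. -->
<P>Fran&ccedil;ois ordered a caf&eacute; au lait in Z&uuml;rich.</P>
```

Because the names are case-sensitive, &Uuml; and &uuml; produce different characters (Ü and ü).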

Characters by ASCII Number

You can reference any character in an HTML document by number by including the ampersand (&) and pound sign (#) followed by the character number in decimal and a semicolon (;). Numbers 0 through 127 are the US-ASCII characters; higher numbers, such as 169, come from Latin-1. For example, to include the copyright symbol (©) in an HTML document, you write:

Copyright &#169;, 1996
However, this is dangerous because you cannot guarantee that the character set mapping will always be US-ASCII or Latin1. For example, a friend of mine was putting chemical information on the Web, and one synthesis involved heating a compound to "270° C." Unfortunately, one particularly incompetent browser decided to render "°" as a "0" instead of a "degree" sign, so his formula ended up asking to heat the compound to 2700 degrees C!

You can find more information on SGML entities, character sets, and more at http://www.bbsinc.com/iso8859.html.
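
One way to guard against the rendering problem described above is to spell out the meaning next to the numbered character. The following is only a sketch of that idea; the wording in parentheses is an assumption about what suits your document.

```html
<!-- &#176; is the Latin-1 degree sign. The words in parentheses
     protect the reader if a browser maps the number incorrectly. -->
<P>Heat the compound to 270&#176;C (270 degrees Celsius).</P>
```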

Comments

It is possible to include comment lines in HTML that do not show up in browsers. You should consider placing comments in documents that you and others will be working on together. Many stand-alone HTML editors provide templates that include a comment area for information like the author's name and the date the document was last changed. The format for a comment is as follows:

<!-- Everything in here is part of the comment. -->
This is going to sound extremely bizarre, but for the purposes of compliance with SGML parsing rules, the number of "--" segments in the comment must be an even number, while "-" by itself can appear as often as it likes.
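
A comment block of the kind an editor template might generate could look like the following sketch. The author name, date, and heading are hypothetical placeholders.

```html
<!-- Author: J. Smith (hypothetical name)
     Last changed: 12 March 1996
     Single hyphens such as well-formed are fine here. -->
<H1>Product Catalog</H1>
<!-- Avoid a double hyphen inside the comment text itself. -->
```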


Server-side include commands embedded in HTML use the same character sequence as comments. This is so that the server-side include commands do not show up even when a server does not support server-side includes. More information about server-side includes is available in later chapters.
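
To show how such commands sit inside comment delimiters, here is a minimal sketch assuming an Apache server with server-side includes enabled. The path /includes/footer.html is a hypothetical file, not one from this book.

```html
<!-- The path /includes/footer.html is a hypothetical example. -->
<!--#include virtual="/includes/footer.html" -->
<P>Last modified: <!--#echo var="LAST_MODIFIED" --></P>
```

On a server that does not process these commands, a browser treats both lines as ordinary comments and displays nothing in their place.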

Non-breaking Space

You can prevent a browser from breaking a line between two words by inserting a non-breaking space between the words. Non-breaking spaces are represented by the special character sequence &nbsp;.


Non-breaking space characters can also be used to put in extra white space where you need it. A browser collapses a sequence of three ordinary space characters into a single space, but it does print three spaces if you use &nbsp;&nbsp;&nbsp;.
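
Both uses of the non-breaking space can be sketched as follows. The name in the second paragraph is an invented placeholder.

```html
<!-- Keeps "Que" and "Corporation" together on one line. -->
<P>This book is published by Que&nbsp;Corporation.</P>

<!-- Three non-breaking spaces appear as three spaces on screen. -->
<P>Name:&nbsp;&nbsp;&nbsp;Jane Doe</P>
```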

Horizontal Lines

Horizontal lines are a great way to break up sections of text-intensive documents. Placing a horizontal line is easy: just put an <HR> ("horizontal rule") tag in where you want the line to go. No closing tag is required.
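
For instance, a rule between two sections might be placed as in this short sketch, where the headings and text are placeholders.

```html
<H2>Section One</H2>
<P>Text of the first section.</P>
<HR>
<H2>Section Two</H2>
<P>Text of the second section.</P>
```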

Images

Without the visual appeal of inline images, it is doubtful that the World Wide Web would have become as popular as it has so rapidly. Graphical Web browsers such as Netscape Navigator, Mosaic, and Microsoft Internet Explorer can automatically display images in both the GIF and JPEG formats inside documents.

Graphics Formats: GIF and JPEG

GIF (Graphics Interchange Format) was originally developed for users of CompuServe as a standard for storing image files. Graphics stored in the GIF format are limited to 256 colors.

GIF supports two desirable Web page effects. The first is interlacing, in which non-adjacent parts of the image are stored together. As a browser reads in an interlaced GIF, the image appears to "fade in" over several passes. The other effect supported by the GIF format is transparency. In a transparent GIF, one of the colors is designated as transparent, allowing the background of the document to show through.


Transparent GIFs

A frequently asked question on the World Wide Web newsgroups is: "How can I create transparent GIFs?" Both UNIX and Windows users can use a program called giftrans to create transparent GIFs from existing images. Another useful tool for this purpose is "giftool." Pointers to both are available from http://melmac.harris-atd.com/transparent_images.html.

JPEG (Joint Photographic Experts Group) refers to a set of formats that supports full color images and stores them in a compressed form. Most popular graphical browsers currently display JPEG images, though previously these images had to be viewed in a separate program. The progressive JPEG format, which has recently emerged, gives the effect of an image fading in just as an interlaced GIF would. Transparency is not possible with JPEG images because the compression tends to make small changes to the image data. If a pixel originally colored with the transparent color is given another color, or if a non-transparent pixel is assigned the transparency color, the on-screen results would be dreadful.


As a general rule, you should use JPEG for color photos so you can harness its full color capabilities. Other graphics and illustrations should be stored in the GIF format.

The <IMG ...> Tag

You must save images as separate files even though they are referenced and displayed inside an HTML document. To place an inline image on a page, you use the <IMG ...> tag.

Syntax: <IMG SRC="URL">

Inline images are always aligned flush left, although future versions of HTML may allow centering and flush right alignment. For example, to place the World Wide Web Consortium's logo next to its name on its home page (refer to fig. 8.2), the HTML looks like this:

<H1><IMG ALIGN=MIDDLE ALT="W3C"
SRC="Icons/WWW/w3c_96x67.gif">
The World Wide Web Consortium</H1>
The SRC attribute, which is mandatory, specifies the URL of the image file. Because URLs can point anywhere, you can reference images on remote servers as well as your local server. Browsers can load images from a server running any protocol supported by the browser, including FTP and Gopher. You can modify the <IMG ...> tag with several other attributes as well (see table 8.6).


Because browsers can load images from any server on the Internet, browsers establish separate server connections for each image in a document, even if all images are on the same server. For small images, it takes more time to establish the connection than to transfer the image data. Therefore, avoid numerous small images. Persistent connections in HTTP/1.1 largely remove this overhead.

Table 8.6 IMG Tag Attributes
Attribute                    Description
ALIGN={TOP|MIDDLE|BOTTOM}    Location of text next to image
ALT="text"                   Text to show instead of image
ISMAP                        Used to make imagemaps

The ALIGN attribute controls the location of text that follows the image. By default, text appears at the bottom of an inline image. Figure 8.13 shows how you can use the ALIGN attribute to change the text to be aligned with the middle or top of the image. Specifically, ALIGN=MIDDLE aligns the baseline of the text with the middle of the image and ALIGN=TOP aligns the top of the text with the top of the image. Listing 8.9 shows the HTML for this figure.

Listing 8.9 HTML for Figure 8.13

<IMG SRC="/images/w3c.gif" ALIGN="MIDDLE">
The World Wide Web (W3) Consortium
<HR>
<IMG SRC="/images/w3c.gif" ALIGN="TOP">
The World Wide Web (W3) Consortium
Fig. 8.13 - The ALIGN attribute lets you align text with the middle and top of an image.

The ALT attribute specifies alternate text to be shown in place of an image in text-only browsers. Including the ALT attribute is a courtesy to dial-up and dumb terminal users; don't overlook this courtesy. Also, graphical browsers sometimes fail to load an image, in which case they use the text specified by ALT instead. For example, to include text-only support in the previous example, the line would look like this:

<IMG SRC="/images/w3c.gif" ALIGN="TOP" ALT="W3C Logo">
The World Wide Web (W3) Consortium
In Lynx, this line would appear as:

[W3C Logo]The World Wide Web (W3) Consortium
ISMAP is a stand-alone attribute that signifies that the image is to be used as an imagemap. Imagemaps are discussed in chapter 11, "Graphics and Imagemaps."


Two Netscape extensions to the <IMG ...> tag that bear an early introduction are WIDTH and HEIGHT. These attributes are set equal to the width and height of the image in pixels. The advantage of doing this is that it allows the browser to leave an appropriately-sized space for the image as it lays out the page. Thus, page layout is finished quickly, without having to wait for the image to load completely so that the browser can determine its size. Use of these attributes is strongly recommended.
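
The following sketch adds WIDTH and HEIGHT to the image line used earlier. The values 96 and 67 are assumed dimensions chosen for illustration; substitute the real pixel dimensions of your own image.

```html
<!-- WIDTH and HEIGHT are in pixels; 96 and 67 are assumed values
     for this image, so substitute your image's real dimensions. -->
<IMG SRC="/images/w3c.gif" ALT="W3C Logo" WIDTH=96 HEIGHT=67>
The World Wide Web (W3) Consortium
```

Browsers that do not recognize these attributes simply ignore them, so including them should not harm other readers.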

Hypertext and Hypergraphics

Now to the other half of the HyperText Markup Language - the hypertext part. A hypertext reference is very simple. It consists of only two parts: an anchor and a URL. The anchor is the text or graphic that the user clicks to go somewhere. The URL points to the document that the browser will load when the user clicks on the anchor.

In HTML, an anchor can be either text or a graphic. Text anchors usually appear underlined and in a different color than normal text on graphical browsers and in bold on text-only browsers such as Lynx. Graphic anchors (hypergraphics) usually have a colored border around them to distinguish them from plain graphics.

Creating Hypertext Anchors

Any text can be a hypertext anchor in HTML, regardless of size or formatting. An anchor can consist of a few letters, words, or even lines of text. The format for an anchor-address pair is simple:

<A HREF="URL">text of the anchor</A>
The letter A in the <A HREF> tag stands for "anchor." HREF stands for "hypertext reference." Everything between the <A HREF="URL"> and </A> tags is the text of the anchor, which appears underlined or bold, depending on the browser.


Other formatting codes can be used in conjunction with hypertext anchors. For example, to create a text anchor that appears in the level 3 heading style, you write:

<A HREF="URL"><H3>text of the anchor</H3></A>

Both nesting orders work in most browsers, but the HTML 2.0 specification recommends placing the anchor inside the heading:

<H3><A HREF="URL">text of the anchor</A></H3>

Creating Hypergraphics

You can use hypergraphics to create button-like effects and provide a nice alternative to clicking plain text. The format for a graphic anchor is the same as a text anchor. However, instead of putting text between the <A HREF> and </A> tags, you reference an inline image. Figure 8.14 shows a hypergraphic.

<A HREF="http://www.w3.org/"><IMG SRC="images/w3c.gif">
</A>Visit the World Wide Web (W3) Consortium's Home Page
Fig. 8.14 - Hypergraphics create button-like objects.

In this example, when the user clicks the W3C logo, the browser jumps to the W3C home page.
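
Because text-only browsers cannot display the image, a hypergraphic benefits from the ALT attribute described earlier. The sketch below is one possible variation on the example above; the ALT wording is an assumption.

```html
<!-- The ALT text becomes the link text in text-only browsers. -->
<A HREF="http://www.w3.org/"><IMG SRC="images/w3c.gif" ALT="W3C Home Page"></A>
Visit the World Wide Web (W3) Consortium's Home Page
```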


If text or images used in hypertext anchors don't seem to be working right, check to see that the URL in the <A ...> tag is completely enclosed in quotes. Omitting the final quotation mark is a common and easy mistake.

Linking to a Named Anchor

When you link to another document, the browser shows information starting from the top of the linked document. This is fine, unless the document is long and the information you really want displayed isn't near the top. In this case, users have to scroll through the document to find the information you want them to see. An alternative to inflicting this on your users is to set up named anchors in longer documents and then have your hyperlink references point directly to the named anchors.

As an example, suppose you have a ten part document stored in a single file longdoc.html and that each section has its own heading. You can set up named anchors on each of the headings using the <A NAME="anchor_name"> and </A> tags as follows:

<A NAME="one"><H1>Part One</H1></A>
With all of the anchors established, you can instruct a browser to link to a specific anchor by including a pound sign (#) and the anchor's name at the end of the long document's URL:

View <A HREF="longdoc.html#seven">Part Seven</A>.
When users click on the hypertext "Part Seven," they are taken directly to part seven in the document, rather than to the top of the document from which they would have to scroll all the way down to part seven (see fig. 8.15).

Fig. 8.15 - Linking to named anchors takes users right to the information you want them to see.


Named anchors let you set up a miniature "Table of Contents" at the top of long documents with links pointing to the different sections of the manuscript. Users appreciate this courtesy because it spares them from excessive scrolling and searching through the document.
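
A miniature table of contents of this kind might be sketched as follows. The anchor names and section text are placeholders. A reference that begins with the pound sign and omits the file name points to an anchor in the current document.

```html
<H1>Long Document</H1>
<UL>
<LI><A HREF="#one">Part One</A></LI>
<LI><A HREF="#two">Part Two</A></LI>
</UL>

<H2><A NAME="one">Part One</A></H2>
<P>Text of part one.</P>

<H2><A NAME="two">Part Two</A></H2>
<P>Text of part two.</P>
```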

