Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Running A Perfect Web Site with Apache, ISBN: 0-7897-0745-4. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

Chapter 02 - Introduction to Web Servers

The World Wide Web is an evolving paradigm. The Web sports a different look today than it did at its inception only a few short years ago. This chapter introduces some Web nomenclature and discusses some of the types of data you can convey via the Web. There is almost no limit to the type of data you can provide to your Web users.

You'll want to provide Web services that are both innovative and useful. You can accomplish this by first understanding some of the terminology associated with the Web; furthermore, you will develop an appreciation for the type of material available through the Web by visiting some popular sites. This chapter provides:

  • Definitions of terms associated with the World Wide Web
  • An introduction to the Hypertext Transfer Protocol
  • A discussion of some of the Internet protocols that predate the World Wide Web
  • The types of data that you can serve via the World Wide Web
  • Methods used to secure Web servers

Web Tech Terminology

Before covering the types of services you can offer through the Web, this section covers some of the terminology that is used in this book. In addition, it describes some of the underlying protocols that make data transfer using the World Wide Web possible.

Definitions

The World Wide Web describes a cross-platform, interactive network of Internet sites that offer hypertext document access. Also known as the WWW or simply the Web, the World Wide Web supports a variety of data formats.

Hypertext Transfer Protocol, more commonly known as HTTP, is the Internet protocol that allows data transfer through the World Wide Web. It's a stateless protocol similar to Gopher; connections are opened and closed as data is transferred between hosts. FTP connections differ because they are held open at the user's discretion.

A Web browser is an application that allows users to view documents within a hypertext context. Web browsers allow text and graphics to be viewed and formatted beside each other. The Web supports transfer of many different data types; when a Web browser encounters a data format that it cannot natively display, it launches relevant applications to display those files.

A Web server is a program that responds to requests from Web browsers via HTTP. Servers transfer HTML files, corresponding graphics, and other content via HTTP to remote computers that are running Web browsers.

HyperText Markup Language, or HTML, is the de facto document format of the World Wide Web. Text and graphics are formatted in WWW documents using HTML; Web browsers process these documents, translating the HTML commands into the desired formatting in the Web browser display window.

Helper applications are those applications defined within Web browsers that display file formats that the browser itself cannot "inline." Browsers such as Netscape and Mosaic can display inline text and graphics. However, file formats such as MPEG video, audio files, and PostScript are not supported within most Web browsers. Therefore, the browser hands the file off to the requisite helper application so that the user can view the file.

The Client-Server Model

As with most other enterprise systems, the World Wide Web works within a client-server paradigm. The Web operates through exchange of data between Web clients, or browsers, and Web servers. These servers field requests from Web browsers for files that can contain an almost unlimited number of data types.

Figure 2.1 details a schematic of how Web browsers interact with a single Web server. Several browsers can simultaneously request files from a single server. This server, depending on its processing and networking resources, processes these requests and returns requisite files to the browsers.

Fig. 2.1 - World Wide Web servers interact with requests from various Web browsers. Servers can transmit any arbitrary number of content types to the client.

Web Protocols

The Hypertext Transfer Protocol (HTTP) is the most common method of transporting data between WWW servers and clients. The protocol was developed in 1989 for the purpose of transporting documents across the Internet via a hypertext interface. In contrast to FTP, an HTTP connection between computers requires few resources. The protocol was designed to retrieve text and other data from HTTP servers nimbly, with very little overhead required from the browser or server computers.

The HTTP specifications undergo periodic review by a committee of Internet specialists. The current standard is HTTP/1.0 which supersedes the original HTTP/0.9. Further versions of HTTP are under review; they will provide greater capabilities to Web browsers in the areas of performance and security. For more information, see http://www.ics.uci.edu/pub/ietf/http/.

An HTTP connection between a Web client and server can be separated into four separate actions:

  • Connection Launch: The HTTP server constantly listens on a certain IP port for a request from a Web browser. This port is usually port 80, but a nonstandard port can be specified in the URL.
  • Client Request: After a connection is established, the browser sends a request to the server. In addition to asking the server for a CGI script or a certain image, sound, or HTML file, the browser sends a little information about itself, such as what content types it understands, the name of the browser, and more.
  • Server Response: The server, having digested the request from the browser, sends an HTTP message to the browser. The server tells the browser what level of HTTP is being supported, some meta-information about the object requested (such as its last-modified date), and the response itself.
  • Connection Close: After the message is sent, either the client or the server terminates the connection. In HTTP/1.1, connections may stay open at this point to wait for another request, and thus repeat the cycle.
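The four actions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete client: the function names, the browser name ExampleBrowser/0.1, and the list of accepted content types are all made up for the example, and the request follows the HTTP/1.0 format described in this chapter.

```python
import socket

def build_request(path, host, user_agent="ExampleBrowser/0.1"):
    """Client Request: a request line plus headers describing the browser."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"User-Agent: {user_agent}",                 # hypothetical browser name
        "Accept: text/html, image/gif, image/jpeg",  # content types we claim to understand
        "",
        "",                                          # blank line ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw):
    """Server Response: split the status line, the meta-information, and the body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, status, reason, headers, body

def fetch(host, path="/", port=80):
    """Run one complete request/response cycle against a server."""
    # Connection Launch: connect to the port the server is listening on.
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(path, host))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:   # Connection Close: the server has hung up.
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))
```

Calling fetch with the name of a Web server you are allowed to query returns the HTTP version, the status code, the response headers, and the body of the requested document.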

As you can see, as opposed to FTP or Telnet connections, the HTTP connection does not stay open. As a result, a server can maintain many more HTTP connections for a given length of time than it can support remote logins.


For more specific information on HTTP, visit the World Wide Web Consortium HTTP specification at http://www.w3.org/pub/WWW/Protocols/.

Understanding MIME

The Multipurpose Internet Mail Extensions (MIME) message representation protocol is a means of conveying information about a file that is being sent through the Internet. This protocol conveys information about the message through MIME headers but leaves the message content or body in the form of plain text. For this reason, MIME is an excellent means of transferring files between different platforms. For example, you can use the e-mail program Eudora to send a graphics file from your Macintosh to a PC user. If the PC user is also running Eudora, or any other MIME-capable mail reader, the program will read the MIME header and attach the relevant tag to the file to make it readable by the correct application.

Much like HTTP, MIME content headers are under a standards process. The key information in the header is the MIME type and subtype that identify the type of message content. The MIME type will usually consist of one of the types listed in table 4.1.

Table 4.1 Common MIME Types

Type           Function
application    Defines client applications
audio          Defines audio formats
image          Defines image formats
message        Used for electronic mail messages
multipart      Used for transmission with multiple parts
text           Defines text formats
video          Defines video formats
x-"string"     Denotes an experimental MIME type not recognized as a standard

The content header consists of a type and a subtype. The subtype specifically defines the message content within the context of the MIME type. For example, an HTTP server will send the following MIME type/subtype in response to a Web client query:

text/html

This header information tells the browser to expect some text, and specifically some HTML text. Web browsers, as opposed to other applications, understand that this MIME type needs to be interpreted as HTML and displayed accordingly. Similarly, a MIME header containing the information

image/gif

would tell the browser that the data that follows is actually a GIF image. The browser then displays the GIF within the window or launches a GIF-viewing application.
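As a rough sketch of how a browser might act on this header, the following Python splits a content type into its type and subtype and then decides whether to display the data inline or hand it to a helper application. The function names and the set of inline types are assumptions chosen for illustration; real browsers let users configure this mapping.

```python
# Content types this hypothetical browser can display without help.
INLINE_TYPES = {
    ("text", "html"),
    ("text", "plain"),
    ("image", "gif"),
    ("image", "jpeg"),
}

def split_content_type(header_value):
    """Return (type, subtype) from a value such as 'text/html'."""
    media_type = header_value.split(";")[0]      # drop any trailing parameters
    main_type, _, subtype = media_type.strip().lower().partition("/")
    return main_type, subtype

def choose_action(header_value):
    """Decide between inline display and a helper application."""
    if split_content_type(header_value) in INLINE_TYPES:
        return "inline"
    return "helper"
```

With this table, choose_action("image/gif") selects inline display, while choose_action("video/mpeg") selects a helper application.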

There are a variety of MIME subtypes defined for each type. The HTTP server needs to correlate the type of information it's serving to a certain MIME type. For example, if it's serving a JPEG file as part of a Web page, the server needs to somehow know that

  • the file is a JPEG-formatted file
  • image/jpeg is the standard MIME classification for that file

The Web server needs to have some means of identifying files and the relevant MIME types in order to tell the browsers what to expect.
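One common approach is to key the MIME type on the file name extension. The table below is a small, hand-written sketch of that idea; the function name, the particular extensions, and the fallback type are assumptions for illustration, and a real server ships with a much longer table.

```python
import os

# A deliberately short extension-to-MIME-type table.
EXTENSION_TO_MIME = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".au": "audio/basic",
    ".mpg": "video/mpeg",
    ".mpeg": "video/mpeg",
    ".mov": "video/quicktime",
    ".ps": "application/postscript",
}

# Sent when the server cannot classify a file.
DEFAULT_MIME = "application/octet-stream"

def mime_type_for(filename):
    """Look up the MIME type for a file based on its extension."""
    _, extension = os.path.splitext(filename)
    return EXTENSION_TO_MIME.get(extension.lower(), DEFAULT_MIME)
```

Python's standard mimetypes module offers a fuller version of the same lookup if you would rather not maintain the table yourself.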

Pre-WWW Protocols

One reason for the success of the WWW is the ability of Web browsers to transfer data using protocols other than HTTP. Hence, Web clients such as Mosaic and Netscape Navigator can serve as FTP and Gopher clients in addition to interpreting HTTP. Modern Web clients have positioned themselves as all-in-one Internet tools. There are more uses for an Internet server than just serving Web pages.

There are many Internet protocols built on top of TCP/IP, and while HTTP is the 800-lb gorilla of the bunch, there are other useful capabilities that you may want to offer on your server. The File Transfer Protocol is useful for quickly transferring large amounts of data. Many shareware and freeware applications are available on Internet servers through FTP connections. Gopher offers an even more intuitive and flexible means of transferring files. Furthermore, you may want to set up your Internet site as an e-mail server for your organization. In this manner, users will be able to send and exchange mail with one another as well as with other users on the Internet. Finally, you may want to offer UseNet newsgroup access to your organization. In addition to offering UseNet groups, many large organizations, such as corporations and universities, often establish newsgroups of local interest to the organization.

Content Delivery

As mentioned earlier in this chapter, a variety of file formats can be served via the World Wide Web. This section covers some of the content types that modern Web browsers support.

Text

Tim Berners-Lee conceived of the World Wide Web as a means of displaying documents with hypertext links to other documents, including documents of other content types. These hypertext links allow users to refer to documents that are located throughout the Internet. This "Web" of documents extends throughout the Net. While many formats are either displayed within Web browsers or viewed with helper applications, most of the information on Web pages is displayed as text.

Web servers store text as HTML files so that text and graphics can be formatted within a Web browser. Figure 2.2 shows how text and graphics can be formatted to appear within a Web browser window. The use of HTML allows Web designers to apply a variety of styles and formatting to the text within a browser window.

Fig. 2.2 - Web files are often constructed using HTML to allow formatting of text and graphics within a Web browser.

Graphics

Figure 2.2 shows how text and graphics can be displayed in the same browser window. Displaying graphics is one of the most appealing features of using the Web. Photographs, clip art, and cartoons can be easily downloaded by Web users. Using various features of HTML allows you to display graphics and text in a manner not unlike printed media. This is one reason why the Web is competing with more conventional media for the public's attention.

In the early days of the Web, the Mosaic browser could display inline images only in the Graphics Interchange Format (GIF). This format allows accurate display of simple images such as clip art, cartoons, or text. With the advent of Netscape, an additional graphic format was supported for inline imaging. The Joint Photographic Experts Group (JPEG) format is better suited to complicated images such as photographs, which it displays more accurately and in smaller files than GIF can.


For more information on the pros and cons of using JPEG and GIF images, consult the JPEG FAQ at http://www.cis.ohio-state.edu/hypertext/faq/usenet/jpeg-faq/top.html

Audio

The capability to download audio files using the World Wide Web adds an exciting new dimension to the Internet. Any computer with a sound card and the appropriate software can download sound files from sites that publish them. High-fidelity sound files, such as those sampled from an audio CD, can be quite large even for a few seconds of recording. Some home pages publish greetings from the Webmaster or even the head of the sponsoring organization.

Not all browsers support sounds; helper applications are needed to play the sound files. Sun Microsystems' AU format is a popular format, as are MPEG audio and the proprietary RealAudio format. For more information, see http://www.iis.fhg.de/departs/amm/layer3/ and http://www.realaudio.com/.

Video

Much like sound, downloading video files through the Web is an exciting means of transferring information. However, like sound files, video files take up an enormous amount of space and require a long time to transmit over even high-speed network connections. Downloading movies over a modem connection is nothing short of a torturous exercise in patience.

Most browsers cannot display movies within the browser; an appropriate helper application is required. Two common formats are the Moving Picture Experts Group (MPEG) format and QuickTime. For more information, see http://www.cis.ohio-state.edu/hypertext/faq/usenet/mpeg-faq/ and http://quicktime.apple.com/. More often than not, movie files served via the Web occupy only a small portion of the desktop, which keeps file sizes down. Figure 2.3 shows an example of how movie files can be served via the Web. In this example, movies of the same sequence are stored in a variety of formats.

Fig. 2.3 - Sites containing weather-related movie files are popular with many Web users. Note the large size of some movie files.

Forms

Soon after the introduction of Mosaic, the HTML standard was extended by the NCSA developers to include several new capabilities. The capability to develop interactive forms on a Web page is one of those. Figure 2.4 shows the types of objects that you can use on a Web page to retrieve information from Web users. Users can enter text in text fields and can select options using radio buttons and check boxes. These devices are familiar graphical user interface features in many Windows applications.

Fig. 2.4 - There are several types of HTML forms you can use to retrieve information from Web users.

These forms do not process any of the information they contain. They merely act as a conduit for conveying information to a separate program, or script, on the server. These scripts adhere to the Common Gateway Interface (CGI) standard and represent a means of processing information retrieved from HTML forms. CGI scripts can be written in almost any language, with Perl, C/C++, and UNIX shell being prevalent on UNIX-based WWW servers and C/C++ and Visual Basic being heavily used on Windows-based Web servers.

Using information gleaned from a page containing HTML forms, CGI scripts can send e-mail, search databases, or even create HTML pages to present back to the user. These functions are prevalent throughout the Web in a variety of implementations.
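To make the "conduit" idea concrete, here is a small sketch of how a browser might package form fields before sending them to a script. The field names and the /cgi-bin/guestbook path are hypothetical; the encoding itself is the standard URL encoding that browsers apply to form submissions.

```python
from urllib.parse import urlencode

def submission_url(action, fields):
    """Build the URL a browser would request for a form submitted with GET."""
    # urlencode escapes spaces and reserved characters in each name and value.
    return action + "?" + urlencode(fields)

# Hypothetical form: a text field and a check box.
example = submission_url(
    "/cgi-bin/guestbook",
    {"name": "Ada Lovelace", "newsletter": "on"},
)
# example is "/cgi-bin/guestbook?name=Ada+Lovelace&newsletter=on"
```

The form itself does nothing with these values; the script named in the URL is responsible for decoding and acting on them.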

Virtual Reality

One alternative to HTML is the Virtual Reality Modeling Language (VRML - rhymes with thermal). VRML is a standard language for describing three-dimensional data. HTML allows you to construct a two-dimensional publishing metaphor for graphics and text, but VRML is a totally separate language designed to extend the metaphor to a third dimension. With Netscape and HTML, you meander across a page and click on links and graphics as you see fit. A VRML browser allows you to traverse a third dimension as well. Instead of two-dimensional imagemaps, VRML "worlds" can have hallways that you can traverse. You can display information from a variety of three-dimensional perspectives rather than from the rigid display defined by a two-dimensional Web browser.

Whereas you jump from page to page using HTML, VRML users jump between "worlds." These worlds can be created with various VRML editors, or in very simple cases, by hand. Loading a world over the Internet does not require much more time than a large graphic does using HTML.

Programming in VRML is analogous to programming in HTML. The three-dimensional interface leads to new possibilities in information publishing. If you have ever played Doom or one of the 3-D action games, you have seen some of the applications of three-dimensional graphics under Windows. As you explore other VRML worlds, you can think of ways that you can use the three-dimensional metaphor to present information to your users. Examples of this metaphor can include a VR implementation of a library where users can navigate through virtual stacks to browse some of the library's selections.

Custom Web Scripts

As mentioned above, CGI scripts stand together with HTML and HTTP as the three major components of the World Wide Web. Using CGI scripts, you can customize the type of information you serve to users. The Web server passes data from the browser to the script residing on your server. The script receives the data, parses it into a comprehensible format, and then returns the results in the form of a Web page. Many powerful search engines and other popular devices found on the Web are constructed using CGI scripts.
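The receive, parse, and respond cycle might look like the following Python sketch. It assumes the usual CGI conventions, in which the server places request details in environment variables such as REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH and sends POST data on standard input; the function names and the "name" form field are hypothetical, and a production script would need more careful input handling.

```python
import html
import os
import sys
from urllib.parse import parse_qs

def read_form(environ, stdin):
    """Receive and parse the form data handed over by the Web server."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)                  # POST data arrives on standard input
    else:
        raw = environ.get("QUERY_STRING", "")     # GET data arrives in the URL
    # Keep only the first value submitted for each field.
    return {name: values[0] for name, values in parse_qs(raw).items()}

def render_response(fields):
    """Return a MIME header, a blank line, and a small HTML page."""
    # Escape user input so it cannot inject markup into the page.
    name = html.escape(fields.get("name", "stranger"))
    page = f"<html><body><p>Hello, {name}!</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + page

if __name__ == "__main__":
    sys.stdout.write(render_response(read_form(os.environ, sys.stdin)))
```

Note that the script's output begins with the same text/html content header discussed earlier in this chapter, so the browser knows how to display the result.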

Security

Possibly no other aspect of your WWW server requires more attention than security. Depending on whether you wish to provide secure communication through your server or whether you wish to protect certain areas of your server from individuals within your organization, securing your server requires a great deal of planning and forethought. There are hardware options for ensuring secure access to your server, but the measures discussed here are implemented in software.

Securing your server allows you to offer a variety of transactions. For example, you can conduct online business by allowing transmission of financial data such as credit card numbers. You can also restrict the viewing of various documents to authorized personnel within your organization.

The security schemes described in this section are new and not yet widely implemented throughout the Web. For this reason, financial transactions over the Web are not occurring in a widespread fashion. Implementation of these schemes will enable a burgeoning world of commerce to develop.


For a look at how some online transactions are conducted, visit First Virtual at http://www.fv.com.

The WWW Security Model

You have several options with which to restrict access to your server. You may wish to restrict access to certain documents to certain users. You may also want to enable access to groups of users. The following sections discuss these options.

Domain Restrictions

By restricting access to your server by domain, you can enable or deny access to large groups of people. For example, if your organization has the Internet address "anywhere.com," you can enable access to your server only to those computers within the "anywhere.com" domain. Users with computers outside this domain could be restricted from accessing your server.
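A domain check of this kind can be sketched as a simple comparison of host names. The function names below are assumptions for illustration, and the sketch presumes the server has already resolved the visitor's address to a host name; how a particular Web server expresses the same rule in its configuration is not shown here.

```python
def host_in_domain(hostname, domain):
    """True if hostname is the domain itself or a host within it."""
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower().strip(".")
    # Require a dot before the domain so "notanywhere.com" does not match.
    return hostname == domain or hostname.endswith("." + domain)

def is_allowed(hostname, allowed_domains):
    """True if the host falls inside any of the permitted domains."""
    return any(host_in_domain(hostname, domain) for domain in allowed_domains)
```

With allowed_domains set to ["anywhere.com"], a request from www.anywhere.com is accepted while one from www.elsewhere.org is refused.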

User Authentication

If you desire to further restrict access to your server to smaller groups of users, you can employ some means of authentication. Much like a remote Telnet session, you can require users to enter an account name and password upon accessing certain documents on the server. You can store and access sensitive documents in this manner.
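One widely used way to carry an account name and password over HTTP is the Basic authentication scheme, in which the browser sends an Authorization header containing the base64 encoding of "name:password". The sketch below assumes that scheme; the function names and the sample account are hypothetical, and the passwords are kept in plain text only to keep the example short.

```python
import base64
import hmac

def parse_basic_auth(header_value):
    """Return (user, password) from an Authorization header, or None."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None                      # not valid base64 or not valid text
    user, separator, password = decoded.partition(":")
    return (user, password) if separator else None

def is_authorized(header_value, accounts):
    """Check the supplied credentials against a dict of user -> password."""
    credentials = parse_basic_auth(header_value)
    if credentials is None:
        return False
    user, password = credentials
    expected = accounts.get(user)
    if expected is None:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))
```

Because base64 is an encoding and not encryption, credentials sent this way can be read by anyone who can observe the connection, which is one motivation for the encryption methods discussed next.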

Data Encryption

One way of securing the data on your server is not to alter the data but to encrypt the communication between your server and various Web browsers. The algorithms discussed in this section are used to encrypt HTTP transactions using a variety of methods. Not only does your server need to support these methods but Web browsers must adhere to these standards.

The Secure Sockets Layer Protocol

The Secure Sockets Layer (SSL) Protocol, proposed by Netscape, is designed to provide accurate and secure communication between two applications such as your Web server and a Web browser. Implemented by several servers such as WebSite and the Netscape Commerce Server, SSL allows secure communication of financial transactions or a variety of other connection types. SSL is an open protocol and has been recently proposed to the Internet Engineering Task Force.

The SSL protocol is composed of two layers: the SSL Record Protocol and the SSL Handshake Protocol. The SSL Record Protocol is used for encapsulation of various higher level protocols. One such encapsulated protocol, the SSL Handshake Protocol, allows the WWW server and client to authenticate each other and to negotiate an encryption algorithm and cryptographic keys before the application protocol transmits or receives its first byte of secure data. The advantage of SSL is that it is application-protocol independent. A higher level protocol can layer on top of the SSL Protocol transparently.

The SSL protocol provides connection security with three basic properties:

  • The transmission is private as encryption is used after an initial handshake to define a secret key
  • The connection can be authenticated using popular cryptographic schemes such as RSA or DSS
  • The connection is reliable - a message integrity check is included with the transmission

The advantage of using SSL is that it's a layered protocol. For your Web site, you may want to use SSL to secure your Web connections. However, you could also use SSL to secure UseNet transactions via NNTP (Network News Transfer Protocol). Furthermore, you could use SSL to secure e-mail traffic via SMTP (Simple Mail Transfer Protocol), or file transfer via FTP. However, both client and server must implement SSL to do this.
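The layering can be seen in the following sketch, which wraps an ordinary socket in an encrypted one and leaves the application protocol on top unchanged. It uses Python's standard ssl module, which implements TLS, the successor to the SSL protocol described here; the function names and the default port are assumptions for the example.

```python
import socket
import ssl

def make_client_context():
    """Build a context that verifies the server's certificate and host name."""
    return ssl.create_default_context()

def open_secure_connection(host, port=443):
    """Open a TCP connection and perform the handshake on top of it."""
    context = make_client_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        # The handshake negotiates the cipher and keys before any
        # application data is sent.
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

The object returned by open_secure_connection is used like any other socket, so the same wrapper could sit beneath HTTP, NNTP, or SMTP traffic as long as the server at the other end supports it.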

Secure-HTTP

Secure-HTTP, or S-HTTP, is an encryption standard designed solely for the purpose of securing HTTP transactions. S-HTTP acts to secure Web connections in three chief ways: signature, encryption, and authentication. You can attach digital signatures to documents using a CGI script. You can encrypt messages using a variety of encryption algorithms, including the very popular PGP. However, one advantage of S-HTTP is that encryption can occur between a server and a client without necessarily requiring a predetermined encryption key. In contrast to a less comprehensive password authentication scheme, S-HTTP authentication requires the unique identifier upon request. Such an authentication scheme might be employed to complete financial transactions.

Both S-HTTP and SSL are based on public-key cryptography, where the parties involved have a pair of keys, one that is private and never revealed, and one that is public. The infrastructure for supporting the distribution of keys is known as "X.509," and will be deployed over the coming year. However, SSL and S-HTTP transactions can be used today.
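The idea of a key pair can be illustrated with a toy version of RSA, one of the public-key schemes mentioned earlier in this section. The numbers below are far too small to be secure and the function names are made up for the example; the sketch only shows that what one key locks, the other key unlocks.

```python
# Toy key pair built from two tiny primes. Real keys use primes that are
# hundreds of digits long; these values are for illustration only.
P, Q = 61, 53
N = P * Q                    # modulus, part of both keys
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent: anyone may know (E, N)
D = pow(E, -1, PHI)          # private exponent: never revealed (needs Python 3.8+)

def encrypt(message, public_key=(E, N)):
    """Anyone can encrypt a number smaller than N with the public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, private_key=(D, N)):
    """Only the holder of the private key can recover the message."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

def sign(message, private_key=(D, N)):
    """A signature is the message transformed with the private key."""
    exponent, modulus = private_key
    return pow(message, exponent, modulus)

def verify(message, signature, public_key=(E, N)):
    """Anyone can check a signature using the public key."""
    exponent, modulus = public_key
    return pow(signature, exponent, modulus) == message
```

Encrypting a number with encrypt and passing the result to decrypt returns the original number, and a value produced by sign passes verify only for the message that was signed.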

