Chapter 02 - Introduction to Web Servers

The World Wide Web is an evolving paradigm. The Web looks different today than it did at its inception only a few short years ago. This chapter describes some Web nomenclature and discusses some of the types of data you can convey via the Web. There is almost no limit to the type of data you can provide to your Web users, and you'll want to provide Web services that are both innovative and useful. You can accomplish this by first understanding some of the terminology associated with the Web. You will then develop an appreciation for the type of material available through the Web by visiting some popular sites. This chapter covers Web terminology and protocols, the types of content you can deliver, and Web server security.
Web Tech Terminology

Before covering the types of services you can offer through the Web, this section covers some of the terminology that is used in this book. In addition, it describes some of the underlying protocols that make data transfer using the World Wide Web possible.
Definitions

The World Wide Web describes a cross-platform, interactive network of Internet sites that offer hypertext document access. Also known as the WWW or simply the Web, the World Wide Web supports a variety of data formats.

HyperText Transfer Protocol, more commonly known as HTTP, is the Internet protocol that allows data transfer through the World Wide Web. It's a stateless protocol similar to Gopher: a connection is opened for each transfer and closed as soon as the data has been sent between hosts. FTP connections differ because they are held open at the user's discretion.

A Web browser is an application that allows users to view documents within a hypertext context. Web browsers allow text and graphics to be viewed and formatted beside each other. The Web supports transfer of many different data types. When a Web browser encounters a data format that it cannot display natively, it launches an appropriate application to display the file.

A Web server is a program that responds to requests from Web browsers via HTTP. Servers transfer HTML files, their accompanying graphics, and other content via HTTP to remote computers that are running Web browsers.

HyperText Markup Language, or HTML, is the de facto document format of the World Wide Web. Text and graphics are formatted in WWW documents using HTML. Web browsers process these documents and translate the HTML commands into formatted text and graphics in the browser's display window.

Helper applications are applications, configured within a Web browser, that display file formats the browser itself cannot display "inline." Browsers such as Netscape and Mosaic can display text and graphics inline. However, file formats such as MPEG video, audio files, and PostScript are not supported within most Web browsers. Therefore, the browser hands the file off to the requisite helper application so that the user can view the file.
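The split between inline display and helper applications can be sketched as a lookup on the MIME type of each downloaded file (MIME types are described later in this chapter). This minimal Python sketch assumes a browser that displays HTML, plain text, GIF, and JPEG inline. The helper program names and the "save to disk" fallback are hypothetical placeholders, not the behavior of any particular browser.

```python
# Content types this hypothetical browser can render inline in its window.
INLINE_TYPES = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical helper-application table, keyed by MIME type.
HELPER_APPS = {
    "video/mpeg": "mpeg-player",
    "audio/basic": "audio-player",
    "application/postscript": "postscript-viewer",
}


def handle_content(content_type: str) -> str:
    """Decide how to present a downloaded file, given its MIME type."""
    # Drop parameters such as "; charset=iso-8859-1" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in INLINE_TYPES:
        return "display inline"
    helper = HELPER_APPS.get(mime)
    if helper is not None:
        return f"launch helper: {helper}"
    # Assumed fallback when no viewer is configured: let the user save the file.
    return "save to disk"
```

Keeping the helper table as data, rather than code, mirrors how browsers let users configure new helpers without changing the browser itself.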
The Client-Server Model

As with most other enterprise systems, the World Wide Web works within a client-server paradigm. The Web operates through the exchange of data between Web clients, or browsers, and Web servers. These servers field requests from Web browsers for files of almost any data type. Figure 2.1 is a schematic of how Web browsers interact with a single Web server. Several browsers can simultaneously request files from a single server. This server, depending on its processing and networking resources, processes these requests and returns the requested files to the browsers.

Fig. 2.1 - World Wide Web servers interact with requests from various Web browsers. Servers can transmit any number of content types to the client.
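A server that answers several browsers at once can be sketched with Python's standard http.server module. In this minimal illustration, ThreadingHTTPServer handles each request on its own thread, so simultaneous requests do not wait for one another. The handler class and page contents are hypothetical. A real server would also read files from disk and label each one with its MIME type.

```python
import html
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PageHandler(BaseHTTPRequestHandler):
    """Answers every GET with a tiny HTML page naming the requested path."""

    def do_GET(self):
        page = f"<html><body>You asked for {html.escape(self.path)}</body></html>"
        body = page.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines for this example


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `server.shutdown()` stops the loop when the example is finished.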
Web Protocols

The HyperText Transfer Protocol (HTTP) is the most common method of transporting data between WWW browsers and servers. The protocol was developed in 1989 for the purpose of transporting documents across the Internet via a hypertext interface. In contrast to FTP, an HTTP connection between computers requires few resources. The protocol was designed to retrieve text and other data from HTTP servers nimbly, with very little overhead on the browser or server computers. The HTTP specifications undergo periodic review by a committee of Internet specialists. The current standard is HTTP/1.0, which supersedes the original HTTP/0.9. Further versions of HTTP are under review; they will provide greater capabilities to Web browsers in the areas of performance and security. For more information, see http://www.ics.uci.edu/pub/ietf/http/.

An HTTP connection between a Web client and server can be separated into four distinct actions:

- Connection: the browser opens a TCP connection to the server.
- Request: the browser sends a request for a document.
- Response: the server returns the document, preceded by status and header information.
- Close: the connection is closed.
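The four actions (connection, request, response, and close) can be traced in a short Python sketch that speaks HTTP/1.0 over a raw TCP socket. This illustration assumes a plain, unencrypted server. The function names are our own. The Host header is optional in HTTP/1.0 but is included because most current servers expect it.

```python
import socket


def build_request(path: str, host: str) -> bytes:
    """Request: a GET line and headers, terminated by a blank line (CRLF CRLF)."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Response: split the status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split()[1])  # e.g. "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def http_get(host: str, path: str = "/", port: int = 80) -> tuple[int, dict[str, str], bytes]:
    # Connection: open a TCP connection to the server.
    with socket.create_connection((host, port), timeout=10) as sock:
        # Request: send the GET request.
        sock.sendall(build_request(path, host))
        # Response: an HTTP/1.0 server closes the connection when it is done,
        # so keep reading until recv() returns no more data.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Close: leaving the with-block has already closed the socket.
    return parse_response(b"".join(chunks))
```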
As you can see, unlike an FTP or Telnet connection, the HTTP connection does not stay open. As a result, a server can handle many more HTTP connections over a given length of time than it can support remote logins.

Understanding MIME

The Multipurpose Internet Mail Extensions (MIME) standard is a means of conveying information about a file that is being sent through the Internet. MIME describes the message in MIME headers, while the message content, or body, travels as plain text. Binary data such as images is encoded into ASCII characters. For this reason, MIME is an excellent means of transferring files between different platforms. For example, you can use the e-mail program Eudora to send a graphics file from your Macintosh to a PC user. If the PC user is also running Eudora, or any other MIME-capable mail reader, the program will read the MIME header and attach the relevant tag to the file so that the correct application can open it. Much like HTTP, MIME content headers are subject to a standards process. The key information in the header is the MIME type and subtype, which identify the type of message content. The MIME type will usually be one of the types listed in table 4.1.
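Python's standard email package can show what a MIME-capable mail program does with an attached graphic. In this sketch the addresses and file name are hypothetical. The attachment receives a Content-Type of image/gif, and its binary bytes are base64-encoded so that the whole message travels as plain ASCII text.

```python
from email.message import EmailMessage


def build_message_with_gif(gif_bytes: bytes) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "mac-user@example.com"  # hypothetical addresses
    msg["To"] = "pc-user@example.com"
    msg["Subject"] = "Logo"
    msg.set_content("The logo is attached.")
    # Adding an attachment turns the message into multipart/mixed. The new
    # part is labeled image/gif, and its binary data is base64-encoded.
    msg.add_attachment(gif_bytes, maintype="image", subtype="gif", filename="logo.gif")
    return msg
```

On the receiving side, a mail reader uses the part's type/subtype header to pick the application that opens the file.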
The content header is composed of a type and a subtype. The subtype specifically defines the message content within the context of the MIME type. For example, an HTTP server will send the following MIME type/subtype in response to a Web client's request for an HTML document:
text/html

This header information tells the browser to expect text, and specifically HTML text. Web browsers understand that content of this MIME type needs to be interpreted as HTML and displayed accordingly. Similarly, a MIME header containing the information
image/gif

would tell the browser that the data that follows is actually a GIF image. The browser then displays the GIF within the window or launches a GIF-viewing application. A variety of MIME subtypes is defined for each type. The HTTP server needs to match each kind of file it serves to a MIME type. For example, if it's serving a JPEG file as part of a Web page, the server needs to know that the file should be labeled with the MIME type image/jpeg.
The Web server needs to have some means of identifying files and the relevant MIME types in order to tell the browsers what to expect.
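One common way for a server to identify files is by filename extension. Servers typically read such a table from a configuration file. This Python sketch hard-codes a small, hypothetical table and falls back to the generic application/octet-stream type for anything it does not recognize.

```python
import os

# Hypothetical extension-to-type table, similar in spirit to a server's
# MIME-type configuration file.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".au": "audio/basic",
    ".mpg": "video/mpeg",
    ".mov": "video/quicktime",
    ".ps": "application/postscript",
}


def content_type_for(filename: str) -> str:
    """Map a file name to the MIME type the server should announce."""
    _, ext = os.path.splitext(filename)
    return EXTENSION_TYPES.get(ext.lower(), "application/octet-stream")


def split_mime(content_type: str) -> tuple[str, str]:
    """Separate "type/subtype" into its two halves."""
    mime_type, _, subtype = content_type.partition("/")
    return mime_type, subtype


def content_type_header(filename: str) -> str:
    """The header line sent ahead of the file's data."""
    return f"Content-Type: {content_type_for(filename)}"
```

Python's own mimetypes.guess_type() performs a similar lookup from a much larger built-in table and returns a (type, encoding) tuple.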
Pre-WWW Protocols

One reason for the success of the WWW is the ability of Web browsers to transfer data using protocols other than HTTP. Web clients such as Mosaic and Netscape Navigator can serve as FTP and Gopher clients in addition to interpreting HTTP. Modern Web clients have positioned themselves as all-in-one Internet tools.

There are more uses for an Internet server than just serving Web pages. Many Internet protocols are built on top of TCP/IP. HTTP is the 800-lb gorilla of the bunch, but you may want to offer other useful capabilities on your server as well.

- The File Transfer Protocol (FTP) is useful for quickly transferring large amounts of data. Many shareware and freeware applications are available on Internet servers through FTP connections.
- Gopher offers an even more intuitive and flexible means of transferring files.
- You may want to set up your Internet site as an e-mail server for your organization. Users will then be able to exchange mail with one another as well as with other users on the Internet.
- You may want to offer Usenet newsgroup access to your organization. In addition to carrying Usenet groups, many large organizations, such as corporations and universities, establish newsgroups of local interest to the organization.
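An all-in-one client decides which protocol to speak from the scheme at the front of each URL. This minimal Python sketch shows only that dispatch step. It uses the standard urllib.parse module and the well-known default ports for HTTP (80), FTP (21), and Gopher (70). The route function is our own, and actually speaking each protocol is left out.

```python
from urllib.parse import urlsplit

# Well-known default TCP ports for the schemes this sketch understands.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def route(url: str) -> tuple[str, str, int]:
    """Return (protocol, host, port) for a URL, or raise ValueError."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # An explicit port in the URL (host:port) overrides the default.
    port = parts.port or DEFAULT_PORTS[scheme]
    return scheme, parts.hostname, port
```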
Content Delivery

As mentioned earlier in this chapter, a variety of file formats can be served via the World Wide Web. This section covers some of the content types that modern Web browsers support.
Text

Tim Berners-Lee conceived of the World Wide Web as a means of displaying documents with hypertext links to other documents, particularly documents of other content types. These hypertext links allow users to refer to documents that are located throughout the Internet. This "Web" of documents extends throughout the Net. Many formats are either displayed within Web browsers or viewed with helper applications, but most of the information on Web pages is displayed as text. Web pages are stored on servers as HTML files, which format text and graphics for display within a Web browser. Figure 2.2 shows how text and graphics can be formatted to appear within a Web browser window. HTML allows Web designers to apply a variety of styles and formatting to the text within a browser window.

Fig. 2.2 - Web files are often constructed using HTML to allow formatting of text and graphics within a Web browser.
Graphics

Figure 2.2 shows how text and graphics can be displayed in the same browser window. Displaying graphics is one of the most appealing features of the Web. Web users can easily download photographs, clip art, and cartoons. Various features of HTML allow you to display graphics and text in a manner not unlike printed media. This is one reason why the Web is competing with more conventional media for the public's attention.

In the early days of the Web, the Mosaic browser could display inline graphics only in the Graphics Interchange Format (GIF). This format allows accurate display of simple images such as clip art, cartoons, or text. With the advent of Netscape, browsers began to support an additional format for inline images. The Joint Photographic Experts Group (JPEG) format displays complicated images such as photographs more accurately, and in smaller files, than GIF can.

Audio

The capability to download audio files using the World Wide Web adds an exciting new dimension to the Internet. Any computer with a sound card and the appropriate software can download sound files from sites that publish them. High-fidelity sound files, such as those sampled from an audio CD, can be quite large even for a few seconds of recording. Some home pages publish greetings from the Webmaster or even the head of the sponsoring organization. Not all browsers can play sounds themselves, so helper applications are often needed to play sound files. Sun Microsystems' AU format is popular, as are MPEG audio and the proprietary RealAudio format. For more information, see http://www.iis.fhg.de/departs/amm/layer3/ and http://www.realaudio.com/.
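Uncompressed CD audio explains why a few seconds of sound adds up quickly. CD audio uses 44,100 samples per second, 16 bits (2 bytes) per sample, and two stereo channels. This Python sketch does the arithmetic and estimates the download time over a 28.8 kbps modem. The modem speed is an assumption chosen for illustration, and protocol overhead and compression are ignored.

```python
def pcm_bytes(seconds: int, rate: int = 44_100, bytes_per_sample: int = 2, channels: int = 2) -> int:
    """Size of uncompressed PCM audio at CD quality by default."""
    return seconds * rate * bytes_per_sample * channels


def download_seconds(num_bytes: int, bits_per_second: int = 28_800) -> float:
    """Ideal transfer time, ignoring protocol overhead."""
    return num_bytes * 8 / bits_per_second
```

By this arithmetic, one second of CD-quality stereo is 176,400 bytes. A five-second greeting is 882,000 bytes, or about four minutes over a 28.8 kbps modem.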
Video

Much like sound, downloading video files through the Web is an exciting means of transferring information. However, like sound files, video files take an enormous amount of space and take a long time to transmit, even over high-speed network connections. Downloading movies over a modem connection is nothing short of a torturous exercise in patience. Most browsers cannot display movies within the browser, so an appropriate helper application is required. Two common formats are MPEG, from the Moving Picture Experts Group, and Apple's QuickTime. For more information, see http://www.cis.ohio-state.edu/hypertext/faq/usenet/mpeg-faq/ and http://quicktime.apple.com/. More often than not, movies served via the Web use a small frame size, occupying only a small portion of the desktop, to keep file sizes down. Figure 2.3 shows an example of how movie files can be served via the Web. In this example, movies of the same sequence are stored in a variety of formats.

Fig. 2.3 - Sites containing weather-related movie files are popular with many Web users. Note the large size of some movie files.
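The effect of frame size on file size is simple arithmetic: the amount of image data grows with the number of pixels per frame. This Python sketch computes uncompressed sizes, assuming 24-bit color (3 bytes per pixel). Real MPEG and QuickTime files are compressed and therefore much smaller. The point is only that halving each dimension cuts the raw data to a quarter.

```python
def raw_video_bytes(width: int, height: int, fps: int, seconds: int, bytes_per_pixel: int = 3) -> int:
    """Uncompressed size: every pixel of every frame stored at 24-bit color."""
    return width * height * bytes_per_pixel * fps * seconds
```

For instance, one second of 160x120 video at 15 frames per second is 864,000 bytes before compression, and the same second at 320x240 is four times that.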
Forms

Soon after the introduction of Mosaic, the NCSA developers extended the HTML standard to include several new capabilities. One of them is the capability to build interactive forms on a Web page. Figure 2.4 shows the types of objects that you can use on a Web page to retrieve information from Web users. Users can enter text in text fields and can select options using radio buttons and check boxes. These devices are familiar graphical user interface features in many Windows applications.

Fig. 2.4 - There are several types of HTML form elements you can use to retrieve information from Web users.

These forms do not process any of the information they contain. They merely act as a conduit for conveying information to a separate program, or script, on the server. These scripts adhere to the Common Gateway Interface (CGI) standard, which defines how information retrieved from HTML forms is passed to them for processing. CGI scripts can be written in almost any language. Perl, C/C++, and UNIX shell are prevalent on UNIX-based WWW servers, while C/C++ and Visual Basic are heavily used on Windows-based Web servers. Using information gleaned from a page containing HTML forms, CGI scripts can send e-mail, search databases, or even create HTML pages to present back to the user. These functions are prevalent throughout the Web in a variety of implementations.
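When a user submits a form, the browser encodes each field name and value in the application/x-www-form-urlencoded format before sending it to the server. Pairs are joined with &, and spaces become +. Python's urllib.parse.urlencode produces the same encoding, which makes it a convenient way to see what a script will receive. The form and its field names below are hypothetical.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form: a text field, a radio-button group, and a set of check boxes.
form_fields = {
    "name": "Ada Lovelace",           # text field
    "contact": "email",               # the selected radio button
    "interests": ["audio", "video"],  # each checked box; unchecked boxes are not sent
}

# doseq=True emits one name=value pair per list element, as browsers do
# for several checked boxes that share a name.
encoded = urlencode(form_fields, doseq=True)

# A script on the server can decode the string back into lists of values.
decoded = parse_qs(encoded)
```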
Virtual Reality

One alternative to HTML is the Virtual Reality Modeling Language (VRML, which rhymes with thermal). VRML is a standard language for describing three-dimensional scenes. HTML allows you to construct a two-dimensional publishing metaphor for graphics and text, but VRML is a totally separate language designed to extend the metaphor to a third dimension. With Netscape and HTML, you meander across a page and click on links and graphics as you see fit. A VRML browser allows you to traverse a third dimension as well. Instead of two-dimensional imagemaps, VRML "worlds" can have hallways that you can traverse. You can display information from a variety of three-dimensional perspectives rather than from the rigid display defined by a two-dimensional Web browser. Whereas you jump from page to page using HTML, VRML users jump between "worlds." These worlds can be created with various VRML editors or, in very simple cases, by hand. Loading a world over the Internet does not take much longer than loading a large graphic in an HTML page. Writing VRML is analogous to writing HTML.

The three-dimensional interface leads to new possibilities in information publishing. If you have ever played Doom or one of the other 3-D action games, you have seen some of the applications of three-dimensional graphics under Windows. As you explore other VRML worlds, think about ways to use the three-dimensional metaphor to present information to your users. One example is a VR implementation of a library in which users navigate through virtual stacks to browse the library's selections.
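Because simple worlds can be written by hand, a VRML 1.0 file is just structured text. This Python sketch generates a tiny "virtual stacks" world of box-shaped shelves, echoing the library example. The layout, dimensions, and colors are arbitrary assumptions, and viewing the result requires a VRML browser.

```python
def shelf(x: float, z: float) -> str:
    """One bookshelf: a brown box 1 unit wide, 2 tall, 0.5 deep.

    A VRML 1.0 Cube is centered on the origin, so raising it by 1 unit
    stands the 2-unit-tall box on the floor (y = 0).
    """
    return (
        "Separator {\n"
        f"  Translation {{ translation {x} 1 {z} }}\n"
        "  Material { diffuseColor 0.5 0.3 0.1 }\n"
        "  Cube { width 1 height 2 depth 0.5 }\n"
        "}\n"
    )


def library_world(rows: int) -> str:
    """Two rows of shelves flanking an aisle that runs away from the viewer."""
    parts = ["#VRML V1.0 ascii\n"]
    for i in range(rows):
        z = -2 * i  # each pair of shelves sits 2 units farther down the aisle
        parts.append(shelf(-1.5, z))
        parts.append(shelf(1.5, z))
    return "".join(parts)
```

Each shelf is wrapped in its own Separator so that its Translation and Material do not carry over to the next shelf.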
Custom Web Scripts

As mentioned above, CGI scripts process the information submitted through HTML forms. Together with HTML and HTTP, they are one of the three major components of the World Wide Web. Using CGI scripts, you can customize the type of information you serve to users. The Common Gateway Interface defines how the server passes data from the browser to the script residing on your server. The script receives the data, parses it into a usable format, and then returns the results in the form of a Web page. Many powerful search engines and other popular features found on the Web are built with CGI scripts.
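The receive-parse-respond cycle can be sketched as a small CGI script in Python. Under CGI, the server places GET data in the QUERY_STRING environment variable and supplies POST data on standard input, with its length in CONTENT_LENGTH. The script writes a Content-Type header, a blank line, and then the page. The "name" field and the greeting page are hypothetical.

```python
import html
import os
import sys
from typing import BinaryIO, Mapping
from urllib.parse import parse_qs


def read_form(environ: Mapping[str, str], stdin: BinaryIO) -> dict[str, list[str]]:
    """Collect form data the way a CGI script receives it."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("latin-1")
    else:
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


def render_page(form: dict[str, list[str]]) -> str:
    """Build the CGI output: header, blank line, then the HTML document."""
    name = form.get("name", ["stranger"])[0]
    # Escape user input so it cannot inject markup into the page.
    return (
        "Content-Type: text/html\n\n"
        f"<html><body><h1>Hello, {html.escape(name)}</h1></body></html>\n"
    )


if __name__ == "__main__":
    # When run by the Web server, read the request and print the response.
    sys.stdout.write(render_page(read_form(os.environ, sys.stdin.buffer)))
```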
Security

Possibly no other aspect of your WWW server requires more attention than security. Whether you wish to provide secure communication through your server or to protect certain areas of your server from individuals within your organization, securing your server requires a great deal of planning and forethought. There are hardware options for ensuring secure access to your server, but the measures discussed here are implemented in software.

Securing your server's transactions allows you to offer a variety of services. For example, you can conduct online business by allowing transmission of financial data such as credit card numbers. You can also restrict various documents so that only authorized personnel within your organization can view them. The security schemes described in this section are new and not yet widely implemented throughout the Web. For this reason, financial transactions over the Web are not yet widespread. Implementation of these schemes will enable a burgeoning world of commerce to develop.

The WWW Security Model

You have several options for restricting access to your server. You may wish to restrict certain documents to certain users. You may also want to enable access for groups of users. The following sections discuss these options.
Domain Restrictions

By restricting access to your server by domain, you can enable or deny access to large groups of people. For example, if your organization has the Internet address "anywhere.com," you can allow access to your server only from computers within the "anywhere.com" domain. Users with computers outside this domain would then be denied access to your server.
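A domain check must compare whole name components, not just a trailing string. This Python sketch assumes the server has already resolved the connecting client's address to a host name. The "anywhere.com" domain comes from the example above, and the function name is our own.

```python
def host_allowed(client_host: str, domain: str = "anywhere.com") -> bool:
    """True if client_host is the domain itself or any host inside it."""
    host = client_host.lower().rstrip(".")  # DNS names are case-insensitive
    domain = domain.lower().rstrip(".")
    # Require a dot before the domain: a bare endswith() test would wrongly
    # admit a name such as "notanywhere.com".
    return host == domain or host.endswith("." + domain)
```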
User Authentication

If you want to restrict access further, to smaller groups of users, you can employ some means of authentication. Much as in a remote Telnet session, you can require users to enter an account name and password before accessing certain documents on the server. You can store and protect sensitive documents in this manner.
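HTTP's Basic authentication scheme is one common way to implement account-and-password checks. The server refuses an unauthenticated request with status 401 and a WWW-Authenticate header. The browser then resends the request with an Authorization header carrying "user:password" in base64. Base64 is an encoding, not encryption, which is one reason the encryption schemes in the next section matter. The password table in this Python sketch is hypothetical. A production server should store passwords with a slow, salted hash.

```python
import base64
import hashlib
import hmac
from typing import Optional

# Hypothetical password file: user name -> SHA-256 hash of the password.
# (A plain SHA-256 keeps the sketch short; real systems use a slow, salted hash.)
USERS = {"alice": hashlib.sha256(b"wonderland").hexdigest()}


def check_basic_auth(authorization_header: Optional[str]) -> bool:
    """Validate an 'Authorization: Basic ...' header value against USERS."""
    if not authorization_header:
        return False
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and undecodable bytes
        return False
    user, sep, password = decoded.partition(":")
    if not sep or user not in USERS:
        return False
    supplied = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied, USERS[user])
```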
Data Encryption

Another way to secure the data on your server is to leave the data itself unaltered and instead encrypt the communication between your server and Web browsers. The protocols discussed in this section encrypt HTTP transactions using a variety of methods. Your server needs to support these methods, and Web browsers must also adhere to these standards.
The Secure Sockets Layer Protocol

The Secure Sockets Layer (SSL) protocol, proposed by Netscape, is designed to provide private and reliable communication between two applications, such as your Web server and a Web browser. Several servers, such as WebSite and the Netscape Commerce Server, implement SSL, which allows secure communication of financial transactions and a variety of other connection types. SSL is an open protocol and has recently been proposed to the Internet Engineering Task Force.

The SSL protocol is composed of two layers: the SSL Record Protocol and the SSL Handshake Protocol. The SSL Record Protocol is used to encapsulate various higher level protocols. One such encapsulated protocol, the SSL Handshake Protocol, allows the WWW server and client to authenticate each other. It also lets them negotiate an encryption algorithm and cryptographic keys before the application protocol transmits or receives its first byte of secure data. The advantage of SSL is that it is independent of the application protocol: a higher level protocol can layer on top of SSL transparently. The SSL protocol provides connection security with three basic properties:

- The connection is private, because data is encrypted with a secret key negotiated during the handshake.
- The peer's identity can be authenticated using public-key cryptography.
- The connection is reliable, because each message includes an integrity check based on a keyed message authentication code.
Because SSL is a layered protocol, it is not limited to the Web. For your Web site, you may want to use SSL to secure your Web connections. You could also use SSL to secure Usenet transactions via NNTP (Network News Transfer Protocol), e-mail traffic via SMTP (Simple Mail Transfer Protocol), or file transfers via FTP. In each case, both client and server must implement SSL.
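The layering can be seen in Python's standard ssl module. The application opens an ordinary TCP connection, wraps it, and then reads and writes through the wrapped socket exactly as before. The handshake happens inside the wrap step. This is a sketch, not a complete client: the function name is our own, and current Python releases negotiate TLS, the IETF-standardized successor to SSL, with the original SSL 2.0 and 3.0 versions disabled.

```python
import socket
import ssl


def secure_connect(host: str, port: int) -> ssl.SSLSocket:
    """Open a TCP connection to host:port and layer SSL/TLS on top of it.

    The handshake (server authentication and key negotiation) runs inside
    wrap_socket(); afterward the application protocol reads and writes
    through the returned socket exactly as it would through a plain one.
    """
    # Default context: requires a valid certificate that matches the host name.
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the TCP connection if the handshake fails
        raise
```

The same function serves any application protocol. An HTTP client would send its usual request bytes with `sendall()` on the returned socket, and an NNTP or SMTP client could do likewise against a server that speaks SSL on its port.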
Secure-HTTP

Secure-HTTP, or S-HTTP, is an encryption standard designed solely to secure HTTP transactions. S-HTTP secures Web connections in three chief ways: signature, encryption, and authentication. You can attach digital signatures to documents using a CGI script. You can encrypt messages using a variety of encryption algorithms, including the very popular PGP. One advantage of S-HTTP is that a server and a client can encrypt their communication without a predetermined encryption key. In contrast to a less comprehensive password authentication scheme, S-HTTP authentication requires a unique identifier upon request. Such an authentication scheme might be employed to complete financial transactions.

Both S-HTTP and SSL are based on public-key cryptography, in which each party has a pair of keys: a private key that is never revealed and a public key that is published. The infrastructure for distributing public keys is built on the X.509 certificate standard and will be deployed over the coming year. However, SSL and S-HTTP transactions can be used today.
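The key-pair idea can be illustrated with textbook RSA, one widely used public-key algorithm, using deliberately tiny numbers. This is a toy for intuition only. The primes are far too small to be secure, real systems pad and hash messages before encrypting or signing, and nothing here reflects the specific algorithms SSL or S-HTTP negotiate.

```python
# Toy textbook RSA key pair built from two tiny primes (p = 61, q = 53).
P, Q = 61, 53
N = P * Q                # modulus 3233, part of both keys
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent: published with N
D = pow(E, -1, PHI)      # private exponent (2753): never revealed


def encrypt(message: int) -> int:
    """Anyone holding the public key (E, N) can encrypt..."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """...but only the holder of the private exponent D can decrypt."""
    return pow(ciphertext, D, N)


def sign(message: int) -> int:
    """Only the private-key holder can produce this value."""
    return pow(message, D, N)


def verify(message: int, signature: int) -> bool:
    """Anyone with the public key can check the signature."""
    return pow(signature, E, N) == message
```

Messages here must be integers smaller than N. Encrypting 65 yields 2790, and only the private exponent turns 2790 back into 65.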
For technical support for our books and software, contact support@mcp.com.
Copyright © 1996, Que Corporation