Chapter 13 - CGI Scripts and Server APIs

By now, you know enough to get your server up and running. You can create and publish HTML documents and display images. You can sit and watch people from around the world log onto your server, but you still have a nagging feeling that something is missing. Maybe your server should be doing...something. You know, something like searching documents or even displaying one of those cute little access counters that tell how many people have linked to your page. Your server just seems reactive rather than interactive.

It's time to talk about creating dynamic documents on your server. One of the earliest means of creating documents on-the-fly is the Common Gateway Interface (CGI). Using CGI scripts, your server can interact with third-party applications that can execute document searches, query databases, or return a dynamically-created HTML page. You can give your server a customized and professional feel.

There are other means of creating interactive services on your Web servers. Server Side Includes are capabilities offered by Web server applications that allow you to produce dynamic documents without the resource overhead of CGI script processing. In addition, some servers let you extend the capability of the server program itself through an Application Programming Interface (API). This chapter covers the following topics:
Introduction to CGI

Normally, Web servers respond to requests from Web browsers with HTML documents and images. The browser sends a URL to the server and the server sends the file, whether it's an HTML document, GIF or JPEG graphic, sound file, or movie, to the browser via an HTTP connection. Sometimes, the browser sends a URL that does not point to a document but instead points to an application. The server activates this application, which then responds to the browser with the requisite information. This application is a CGI script. This section covers how this script interacts with the Web server and browser.

One important feature of HTML 2.0 is the capability for Web designers to use the language to create interactive forms. These forms collect data entered by the user; the Web browser processes this data and sends it via an HTTP request to a Web server. Usually, the Web server receives requests for HTML documents or graphic images. However, the HTML form implies that a specific action is requested of the server. With this type of request, the server knows not to interpret the form data itself but to pass it to the CGI script specified in the HTML form page.

The CGI script is actually a third-party application developed in a language such as C, C++, PERL, Visual Basic, or really any language supported by the operating system on which the server is running. However, some languages lend themselves to CGI scripting more than others, and we'll discuss those later in this chapter.
How the CGI Works

The process through which the Common Gateway Interface works is quite simple.

This process is outlined in figure 13.1. Note that the server merely passes information to the script. The script receives the data from the Web server through some mechanism unique to the language in which the script was developed. As long as this mechanism is in place, any programming language can be used to implement a CGI script.

Fig. 13.1 - The CGI script works with the Web server to respond to certain Web browser requests.
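As a rough illustration of the exchange described above, the following Python sketch shows what a very small CGI script might look like on a UNIX server. It assumes the server passes request details in environment variables and reads the script's standard output; the function name build_response is made up for this example.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: answer every request with a tiny generated page."""
import html
import os
import sys


def build_response(environ):
    """Return the complete CGI response: header, blank line, then the body."""
    # Assumption: the server sets REQUEST_METHOD in the script's environment.
    method = html.escape(environ.get("REQUEST_METHOD", "GET"))
    body = (
        "<HTML><HEAD><TITLE>Hello</TITLE></HEAD>\n"
        f"<BODY>This page was generated by a {method} request.</BODY></HTML>\n"
    )
    # The blank line after Content-type separates the header from the body.
    return "Content-type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

The server funnels whatever the script prints back to the browser, so the script is responsible for emitting the header line before any content.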
Client-Server HTTP Header Formats

Web browsers communicate with Web servers via the HTTP protocol. This protocol specifies both the format of the messages and the manner in which the server and browser exchange information. For example, a Netscape Navigator client might send the following text to a Web server for a simple file request:
GET /article1.html HTTP/1.0
Accept: text/html
Accept: image/gif
Accept: image/jpeg
User-Agent: Mozilla/2.0b5 (Windows; I; 32bit)
...a blank line...

This message header informs the server that the browser is looking for the file article1.html and intends to use version 1.0 of the HTTP specification. The browser then informs the server as to which file formats it can interpret. In the above message, this list is shorter than what browsers usually send, but the server is informed that the client can interpret several text and graphics MIME types. The browser then identifies its brand of client; in this example, the User-Agent value Mozilla identifies Netscape Navigator. Finally, the browser sends a blank line to complete the request. The server will respond with a message generally like the following:
HTTP/1.0 200 OK
Date: Thursday, 01-Feb-96 19:15:32 GMT
Server: Apache 1.0.3
MIME-version: 1.0
Last-modified: Friday, 15-Dec-95 17:54:01 GMT
Content-type: text/html
Content-length: 7562
...a blank line...
<HTML><HEAD><TITLE>Article....

In this response, the server provides enough information to allow the browser to process the requested data. The server denotes that it too is providing data using the HTTP v.1.0 protocol. Furthermore, it returns an HTTP code of 200 OK, which tells the browser to relax: the requested file was not only found but is being returned in this message. The date and server type are described in the header. The server type is included so that the browser can account for features specific to that server. The server tells the Web client which version of MIME encoding is being used so that the browser can reprocess the data. The browser is also informed as to the MIME type of the data and the size of the file; this last datum is important as it allows the browser to inform the user as to the progress of the data transfer. Finally, a blank line (actually, two carriage return/line feed pairs, i.e. CRLFCRLF) separates the HTTP headers from the body of text.

The server needs to be flexible enough to provide the file in a format that is accessible to the client. For example, the server would need to provide a GIF file if a browser that can only process GIF files requests a file that is offered in JPEG.

As mentioned previously, the HTTP server doesn't usually process output from a CGI application; the response is merely funneled through the server back to the browser. The message, however, must be configured so as to conform to the HTTP message header specifications. We will discuss later in this chapter ways that you can program your CGI script to insert an HTTP header at the beginning of your response to ensure correct processing by a Web browser.
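To make the header layout concrete, here is a small Python sketch that splits a raw response like the one above into its status line, headers, and body. The function name parse_response and the SAMPLE constant are illustrative only; the sample text is an abbreviated copy of the response shown earlier.

```python
def parse_response(raw):
    """Split a raw HTTP/1.0 response into (status_line, headers, body)."""
    # The first blank line (CRLFCRLF) ends the header section.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


SAMPLE = (
    "HTTP/1.0 200 OK\r\n"
    "Date: Thursday, 01-Feb-96 19:15:32 GMT\r\n"
    "Server: Apache 1.0.3\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 7562\r\n"
    "\r\n"
    "<HTML><HEAD><TITLE>Article...."
)

if __name__ == "__main__":
    status, headers, body = parse_response(SAMPLE)
    print(status)
    print(headers["content-type"], headers["content-length"])
```

Header names are lowercased here only to make lookups easier; that is a choice of this sketch, not something the response format requires.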
HTML Forms and CGI

By using an HTML form page, you can allow users to enter data that is processed by a CGI script. As discussed earlier in this book, users can enter text and specify options using forms developed with HTML. The types of data input options are as follows:
Figure 13.2 shows an example of an HTML form that can be used to transfer data to a CGI application. Note that this sample page contains a text field, check boxes, and radio buttons. The HTML code for this page is shown in listing 13.1.

Listing 13.1 Transfer Data to a CGI Application with This Form
<HTML>
<HEAD>
<TITLE> Forms Test </TITLE>
</HEAD>
<BODY>
<FORM ACTION="http://hoohoo.ncsa.uiuc.edu/cgi-bin/post-query" METHOD=POST>
A normal text field: <TEXTAREA NAME="comments1"></TEXTAREA><p>
<HR>
<DL>Please indicate your favorite holiday:
<DD> <INPUT TYPE="radio" NAME="holiday" VALUE="Christmas">Christmas
<DD> <INPUT TYPE="radio" NAME="holiday" VALUE="Thanksgiving">Thanksgiving
<DD> <INPUT TYPE="radio" NAME="holiday" VALUE="Easter">Easter
<DD> <INPUT TYPE="radio" NAME="holiday" VALUE="NYDay">New Year's Day
</DL>
<DL>Please put a check next to the applications you own:
<DD> <INPUT TYPE="checkbox" NAME="msword" VALUE="No" CHECKED>Microsoft Word
<DD> <INPUT TYPE="checkbox" NAME="photoshop" VALUE="No">Adobe Photoshop
<DD> <INPUT TYPE="checkbox" NAME="netscape" VALUE="No">Netscape
<DD> <INPUT TYPE="checkbox" NAME="excel" VALUE="No">Microsoft Excel
</DL>
<INPUT TYPE="submit" VALUE="Submit This Form">
</FORM>
</BODY>
</HTML>

Fig. 13.2 - You can use several types of HTML forms to retrieve information from Web users.

Note that all of the form elements in the above code use the NAME attribute. The idea is that the user enters text in a field or checks a radio button; this data is assigned to a variable named by the value of the NAME attribute. The CGI script uses the data by referencing the corresponding variable name. For example, the response from a post-query script to the above example is shown in figure 13.3.

Fig. 13.3 - A post-query script is useful for displaying the values of an HTML form.

There are two alternative methods of transferring form data to a CGI script: POST and GET. These are the possible values of the METHOD attribute in the opening <FORM> tag. With GET, you're limited to passing no more than 24KB of data back to the server. POST, however, allows the transfer of much more data.
This results from the fact that a request made through the GET method concatenates all the HTML form variables into a single string; this string is appended to the URL in the HTTP message that identifies the CGI script. Requests made through the POST method instead send the form parameters in the body of the HTTP request, and the server passes that data to the script separately from the URL.
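The difference between the two methods can be sketched in Python as follows. This is a minimal example, assuming a UNIX-style server that places GET data in the QUERY_STRING environment variable and delivers POST data on standard input along with a CONTENT_LENGTH variable; the function name read_form is hypothetical.

```python
import os
import sys
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return the submitted form fields as a dict mapping name -> list of values."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the encoded fields arrive in the request body. Read only as
        # much as the server says is there (URL-encoded data is plain ASCII,
        # so characters and bytes coincide here).
        length = int(environ.get("CONTENT_LENGTH") or 0)
        encoded = stdin.read(length)
    else:
        # GET: the encoded fields follow the ? in the URL.
        encoded = environ.get("QUERY_STRING", "")
    # parse_qs decodes %XX escapes and turns + back into spaces.
    return parse_qs(encoded, keep_blank_values=True)


if __name__ == "__main__":
    fields = read_form(os.environ, sys.stdin)
    sys.stdout.write("Content-type: text/plain\r\n\r\n")
    for name, values in fields.items():
        sys.stdout.write(f"{name} = {', '.join(values)}\n")
```

Passing the environment and input stream in as arguments keeps the decoding logic easy to exercise outside a Web server.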
The CGI Environment

In order to get the CGI application to run on any operating system, there needs to be some mechanism to convey the form data from the HTTP server to the CGI application. With UNIX, this is done through the use of environment variables, standard input, and standard output. With Web servers running under the MacOS, AppleEvents are used to convey data between the CGI script and the Web server. With Windows 3.1, Windows 95, and Windows NT, CGI variables are exchanged using a Windows private profile file in key-value format.
CGI Variables

The variables described in this section are passed from the server to the CGI application; they carry information about the server, the request, and the browser. Your CGI application can use these variables to display information about the server, the user, the user's browser, or the user's connection to the server. The CGI environment variable is included in parentheses where applicable.
(SERVER_SOFTWARE) The name and software version of the Web server answering the request and launching the CGI application. Example: Apache 1.0.3

(SERVER_NAME) The server's host name or IP address. Example: www.mcp.com

(SERVER_PORT) The port number that received the request. Example: 80

(SERVER_ADMIN) The administrative contact for the Web server, as obtained from the Web server config files. Useful for giving a feedback e-mail address in case of unrecoverable problems.

(GATEWAY_INTERFACE) The version of the CGI standard to which the server complies. Example: CGI/1.1

(SERVER_PROTOCOL) The name and version of the protocol used by the client for this request. Example: HTTP/1.0

(REQUEST_METHOD) The HTTP method specified in the request. Examples: GET, HEAD, POST

(HTTP_REFERER) The URL of the document from which the CGI script was referred, if the browser sends it. Example: http://www.anywhere.com/cgi-test.html

(HTTP_FROM) The e-mail address of the Web browser user.

(HTTP_USER_AGENT) The description of the browser software. Although not sent by all browsers, it is useful for tailoring CGI output to particular browsers. Example: Mozilla/2.0b6 (Windows; I; 32bit)

(PATH_INFO) The part of the URL after the name of the CGI script, starting at the /, but before any ?. For example, http://host/script.cgi/path?foo would set PATH_INFO to /path, if the script was really script.cgi.

(PATH_TRANSLATED) If a logical path is specified in the client message, the server can try to map it as a path onto the document tree of the running Web server. In the above example, the server would see if a request to /path would have mapped to an actual resource on the server, and return the full pathname to that (i.e., /web/htdocs/path/).

(SCRIPT_NAME) The name of the CGI script specified by the request. Example: http://host/cgi-bin/foo?argument would result in a SCRIPT_NAME of /cgi-bin/foo.

(SCRIPT_FILENAME) The actual filename of the CGI program on the file system. Example: http://host/cgi-bin/foo?argument could result in a SCRIPT_FILENAME of /usr/local/etc/httpd/cgi-bin/foo.

(QUERY_STRING) The encoded version of the query data. This data follows the ? in the URL and is usually the result of a query from an HTML form. Example: http://host/phonebook.cgi?Joe%20Smith+5551321 would result in Joe%20Smith+5551321.

(REMOTE_HOST) The host name of the machine running the Web browser making the request, if available. Example: s115.slipper.net

(REMOTE_ADDR) The IP address of the Web browser making the request. Example: 167.142.100.115

(AUTH_TYPE) The protocol-specific method of authentication used to validate the user if the document is protected and the server supports authentication. This corresponds with the AuthType directive in Apache.

(REMOTE_USER) The name of the authenticated user if the document is protected and the server supports authentication.

(CONTENT_TYPE) The MIME type/subtype of the HTML form data contained in a PUT or POST request. Example: text/plain

(CONTENT_LENGTH) The number of bytes of data contained in a PUT or POST request. This tells the CGI script how much data it should read. Example: 42

(HTTP_ACCEPT) The list of MIME types accepted by the client. You can pass parameters for some of the MIME type/subtype combinations.
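One quick way to get a feel for these variables is a script that simply echoes them back. The Python sketch below assumes a UNIX-style server that exposes the standard CGI 1.1 names as environment variables; the CGI_VARIABLES list is a selection chosen for this example, and the report function is hypothetical.

```python
import os
import sys

# A selection of standard CGI 1.1 variable names (not an exhaustive list).
CGI_VARIABLES = [
    "SERVER_SOFTWARE", "SERVER_NAME", "SERVER_PORT", "GATEWAY_INTERFACE",
    "SERVER_PROTOCOL", "REQUEST_METHOD", "PATH_INFO", "SCRIPT_NAME",
    "QUERY_STRING", "REMOTE_HOST", "REMOTE_ADDR", "REMOTE_USER",
    "CONTENT_TYPE", "CONTENT_LENGTH",
]


def report(environ):
    """Return a plain-text CGI response listing each variable and its value."""
    lines = []
    for name in CGI_VARIABLES:
        # Many variables are optional, so report missing ones explicitly.
        lines.append(f"{name} = {environ.get(name, '(not set)')}")
    return "Content-type: text/plain\r\n\r\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    sys.stdout.write(report(os.environ))
```

Requesting such a script through a browser shows which variables your particular server actually sets.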
Apache Extensions to the CGI Environment

There are a few environment variables beyond those specified in the CGI 1.1 specification which Apache supports.

DOCUMENT_ROOT

As one would guess, this is the document root for the server, as specified in the server configuration files. This is useful, for example, for allowing PERL scripts to find a common definitions file without having to hardcode the path location in each PERL script. You might have a lib or data directory off of your document root where you store extra PERL libraries or dynamic data, so to reference it in a PERL script you'd say, for example,
require "$ENV{'DOCUMENT_ROOT'}/lib/common.pl";
This way you can move your document tree around without having to worry about having hardcoded paths - everything would be based around the DocumentRoot as specified in the server configuration files.
REDIRECT_...

Apache supports custom error responses. If that custom error response is a CGI script, it is usually useful to be able to get some information about the original request which caused the error. So, Apache takes each CGI variable from the old request and prefixes it with REDIRECT_ in the new environment. So for example:
QUERY_STRING -> REDIRECT_QUERY_STRING
PATH_INFO -> REDIRECT_PATH_INFO

The old variable names then take whatever new values are appropriate for the error-handling request. There are two more special environment variables defined in this instance: REDIRECT_URL and REDIRECT_STATUS. REDIRECT_URL is the URL of the original request that caused the error, while REDIRECT_STATUS is the error code which triggered this response. So for example, you might have

ErrorDocument 500 /error-handler.cgi

In this case, when a request fails with a server error, /error-handler.cgi runs with REDIRECT_URL set to the URL that was originally requested and REDIRECT_STATUS set to 500.
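A custom error handler along these lines might look like the following Python sketch. It simply reports whatever the server placed in the REDIRECT_ variables; the function name error_page is made up for this example, and the page wording is arbitrary.

```python
import html
import os
import sys


def error_page(environ):
    """Build an HTML error page from the REDIRECT_ variables set by the server."""
    # Escape the values before echoing them, since they come from the request.
    status = html.escape(environ.get("REDIRECT_STATUS", "unknown"))
    url = html.escape(environ.get("REDIRECT_URL", "unknown"))
    query = html.escape(environ.get("REDIRECT_QUERY_STRING", ""))
    body = (
        "<HTML><HEAD><TITLE>Something went wrong</TITLE></HEAD><BODY>\n"
        f"<P>Status code: {status}</P>\n"
        f"<P>REDIRECT_URL: {url}</P>\n"
        f"<P>Original query string: {query}</P>\n"
        "</BODY></HTML>\n"
    )
    return "Content-type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(error_page(os.environ))
```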
HTTP_COOKIE

If you are running with the mod_cookie module, you will see another environment variable you wouldn't normally see, HTTP_COOKIE. This is the legendary Netscape cookie functionality, where the server gives the Web browser a token (cookie) when they first talk to each other, and then the browser sends it with every request. This token is unique and random, but usually guaranteed to be persistent for at least a user's "session", so it is possible to map this token to a "user". Usually it'll look something like this:
HTTP_COOKIE = s=myhost20434482411973732

What the key actually is isn't important - it's basically just a random number that mod_cookie makes up when it needs one. The important thing is that it can be used as a key in a database, or logged for tracking purposes. For example, if you see a CGI script hit 10 times, 3 times with one cookie, 3 with another, and 4 with another, you can be pretty sure you only had 3 people using that script, instead of 10 people once or 1 person ten times.
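The counting argument above is easy to reproduce. This Python sketch tallies hits per cookie value; the cookie strings are invented placeholders, and in practice they would come from wherever your script logs HTTP_COOKIE.

```python
from collections import Counter


def hits_per_cookie(logged_cookies):
    """Count how many hits were logged for each distinct cookie value."""
    return Counter(logged_cookies)


# Hypothetical log of HTTP_COOKIE values: 10 hits from 3 distinct cookies.
LOGGED = ["s=aaa"] * 3 + ["s=bbb"] * 3 + ["s=ccc"] * 4

if __name__ == "__main__":
    counts = hits_per_cookie(LOGGED)
    print(f"{sum(counts.values())} hits from {len(counts)} distinct cookies")
    for cookie, hits in counts.items():
        print(f"  {cookie}: {hits}")
```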
Notes on Certain CGI Variables

There are three variables that may or may not be set depending on other factors in the server configuration. Those are:

REMOTE_HOST

This is the DNS-resolved hostname matching the IP number of the client making the request from the server. If the server was compiled with -DMINIMAL_DNS, or if the directive LookupHostname is set to Off, that variable will be set to the same value as REMOTE_ADDR, which is just the IP number. If your CGI program requires the hostname, and you have DNS resolution turned off for performance reasons, you can get the hostname by performing a gethostbyaddr call in C or PERL, or its equivalent in other languages. Also, not every IP address is set up to respond to reverse-DNS lookups, so even if the server is normally resolving every IP number, you might not be able to get a hostname for that number.

REMOTE_USER

This variable is only set if the script was placed under password-based authentication - this is the username that was used to get access. A very common bug report is "my CGI scripts aren't getting REMOTE_USER set!" - when in actuality what happened was that the CGI script was in a different directory (say, /cgi-bin/) from the other password-protected pages, and cgi-bin wasn't protected in the same way as those other pages. Since browsers cache passwords, and also since there's very rarely any user-interface level distinction between viewing a protected page and viewing an unprotected page, it's easy to understand why this may seem confusing. But don't worry, that's why you own this book.

REMOTE_IDENT

This is the string returned by a lookup to the client's machine using the ident protocol, as defined in RFC 931 (since superseded by RFC 1413). This will only return something if the IdentityCheck directive is set to On in the server config files, and of course if the remote site is actually running an ident daemon.
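For the REMOTE_HOST case described above, a script can fall back to its own reverse lookup when the server hands it only an IP number. The Python sketch below uses socket.gethostbyaddr, Python's counterpart to the gethostbyaddr call mentioned earlier; the function name client_hostname and its resolver parameter are assumptions of this example.

```python
import socket


def client_hostname(environ, resolver=socket.gethostbyaddr):
    """Return the client's hostname, resolving REMOTE_ADDR ourselves if needed."""
    addr = environ.get("REMOTE_ADDR", "")
    host = environ.get("REMOTE_HOST", "")
    # If the server already resolved the name, REMOTE_HOST differs from REMOTE_ADDR.
    if host and host != addr:
        return host
    if not addr:
        return ""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        return resolver(addr)[0]
    except OSError:
        # Not every address has a reverse-DNS entry; fall back to the number.
        return addr
```

The resolver is passed in as a parameter only so the lookup can be replaced during testing; a real script would normally rely on the default.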
Setting Extra Variables

If you compiled in support for the module mod_env, there are two directives available to you for adding further information into the CGI environment.

PassEnv

PassEnv will let you pass through any environment variable from the shell environment from which the server is launched. For example,

PassEnv USER

will pass along the contents of the USER variable from the shell of the user to the CGI environment.
SetEnv

This will let you explicitly set a particular variable in the environment. For example,

SetEnv LIBDIR /www/lib

might be the best way to pass on to your CGI scripts where their libraries are, just as

SetEnv DEBUG 3

might be the best way to set the debugging level for your scripts.
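On the script side, values set this way are read like any other environment variable. The Python sketch below picks up the LIBDIR and DEBUG variables from the examples above; the function name load_settings and the default values are assumptions of this example.

```python
import os


def load_settings(environ):
    """Return (libdir, debug_level) from variables set with SetEnv, with defaults."""
    # Assumed defaults: no library directory and debugging switched off.
    libdir = environ.get("LIBDIR", "")
    try:
        debug = int(environ.get("DEBUG", "0"))
    except ValueError:
        # A non-numeric DEBUG value is treated as "off" rather than crashing.
        debug = 0
    return libdir, debug


if __name__ == "__main__":
    libdir, debug = load_settings(os.environ)
    print(f"library directory: {libdir or '(none)'}, debug level: {debug}")
```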
Server APIs

Besides the CGI and SSI specifications, there is one other major means by which you can add functionality to your server. Both Apache and Netscape support the notion of an API to the server - this is so code can be written to a published interface, compiled, and linked into the server such that the server and extra code become one actual program. This can give a tremendous boost to performance, and it can also allow the content creator to control certain deep aspects of the server that CGI does not allow. Both the Apache API and the Netscape API are written in C, so modules for those servers must be written in C as well.

The Apache API provides a very generalized interface to the functionality of the server. Almost all the functionality beyond the core of the server is implemented through this API - all user authentication functions, the CGI interface, all access control functions, and all "URL-munging" functions like Alias and Redirect. Apache modules can be written to implement "handlers" for certain data types - this is how functionality such as internal imagemap handling and Server Side Includes is implemented. Even logfile functionality is implemented as a module.

This modular approach makes it very easy to "drop in" new functionality on top of or in place of older functionality. It allows server owners to tune their servers for optimum performance - for example, if you don't want to use the internal imagemap functionality, you can compile Apache without that module and save yourself a couple dozen kilobytes per running child process, which may be significant at higher loads.

Examples of functionality that people have implemented using an API which normal CGI could not provide, either at all or in an efficient manner:
The NSAPI specification is available at http://www.netscape.com/newsref/std/server_api.html
Using HTTP Cookies

Cookies (as briefly mentioned above, under the "HTTP_COOKIE" CGI environment variable description) are a new HTTP mechanism proposed by Netscape Communications; as of this writing, cookies are also supported by the Microsoft Internet Explorer and a couple of other browsers. Cookies are designed to communicate state information to the browser from the server. This is in contrast to the standard HTTP process, where server information outside of the HTTP response is not communicated to the browser.

When a browser first visits a "cookie-enabled" Web site, the site can send back a Set-Cookie: header in the HTTP response. On subsequent visits to that site, or to particular other sites within that domain, the browser sends one or more Cookie: headers in the request, and the server can change the cookies by sending yet more Set-Cookie: headers back to the client. Possible applications include storing client preferences, user account details, and personal information for services such as online shopping. The syntax for a cookie header is as follows:
Set-Cookie: name=Value; expires=Date; path=Path; domain=Domain_Name; secure

The name=Value pair is the only required attribute and identifies the cookie. You can set an expiration date with the expires attribute; after that date, the cookie becomes invalid. The domain attribute is used to validate the cookie: when the browser searches its cookie list for entries to send, it matches the domain attribute against the domain name of the host being requested. This allows one cookie to be returned to more than one host within the same domain. Similarly, the path attribute is used to validate the cookie request. The cookie is transferred only if the path being requested by the browser matches the cookie's path attribute. The secure keyword tells the browser to transfer the cookie only if the connection is made using the Secure Sockets Layer protocol.
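Both halves of the cookie exchange can be sketched in Python. The first function assembles a Set-Cookie header following the syntax shown above; the second decodes the value a script would find in HTTP_COOKIE, using the standard http.cookies module. The function names and the sample values are made up for this example.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value, expires=None, path=None, domain=None, secure=False):
    """Assemble a Set-Cookie header line; only name=value is required."""
    parts = [f"{name}={value}"]
    if expires:
        parts.append(f"expires={expires}")
    if path:
        parts.append(f"path={path}")
    if domain:
        parts.append(f"domain={domain}")
    if secure:
        parts.append("secure")
    return "Set-Cookie: " + "; ".join(parts)


def parse_cookie_header(http_cookie):
    """Turn the contents of HTTP_COOKIE into a dict of cookie name -> value."""
    jar = SimpleCookie()
    jar.load(http_cookie)
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == "__main__":
    print(set_cookie_header("s", "abc123", path="/", secure=True))
    print(parse_cookie_header("s=myhost20434482411973732"))
```

A CGI script would print the Set-Cookie line along with its other headers, before the blank line that ends the header section.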
Copyright © 1996, Que Corporation