Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Running A Perfect Web Site with Apache, ISBN: 0-7897-0745-4. The electronic version of this material has not been through the final proof reading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

Chapter 04 - Getting Started with Your Web Server

While this chapter, like many others, is specific to the Apache server, the vocabulary is certainly applicable to other Web servers. In particular, the NCSA family of servers has much in common with Apache with respect to configuration files, since Apache was originally derived from the NCSA 1.3 server, and maintaining backwards compatibility with existing NCSA servers was a mandate for the development team.

This chapter deals with all the essential steps between the software on the CD-ROM (or downloaded from the Net) and a running, breathing, living server. If you have installed an Apache or NCSA server before, you can probably safely skip this chapter - perhaps skim it to look for essential differences. This chapter covers compiling Apache, establishing the file hierarchy, basic configuration, starting up the server, debugging the start-up process, and setting up Apache-SSL.

Compiling Apache

Apache is known to compile on just about every UNIX variant out there: Solaris 2.X, SunOS 4.1.X, Irix 5.X and 6.X, Linux, FreeBSD/NetBSD/BSDI, HPUX, AIX, Ultrix, OSF1, NeXT, Sequent, AUX, SCO, UTS, Apollo Domain/OS, QNX, and probably a few you've never even tried yet. A port to OS/2 has been done, and a Windows NT port is rumored to be in the works. Portability has been a high priority for the development team.

Before you go about compiling Apache, make sure a binary suitable for your platform is not already available on the CD-ROM included with this book. There is a README file on the CD-ROM that explains which system binaries are provided. If there is a suitable binary, you can skip the compilation process and move on to the next section, although if you ever want to add new modules or tweak the functionality provided by Apache, you'll need to know how to compile it.

Copy the source code package to a part of your file system. Depending on which OS you are on, you'll need up to 10 spare megabytes of disk to compile the server. Unpack it and go to the src subdirectory. A sequence of commands to do this might look like this:

cd /CDROM
cp apache_1.0.3.tar /usr/local/etc/
cd /usr/local/etc
tar -xvf apache_1.0.3.tar
cd apache_1.0.3/src

Step 1: Edit the "Configuration" File

This file is used by the "Configure" program to create a "Makefile" specifically targeted to your platform, with any runtime defines set if necessary, and with the modules you have chosen compiled together. It also creates a "modules.c," which contains information about which modules to link together at compilation time.

You must declare which C compiler you are using, and you must uncomment the appropriate setting for "AUX_CFLAGS" and "AUX_LIBS" for the platform on which you are compiling. For example, the following is appropriate if you are using the GNU C compiler:

CC=gcc

And if, say, you want to set it to use Solaris instead of the default, which is SunOS, you want to change the section which reads:

# For SunOS 4
AUX_CFLAGS= -DSUNOS4
# For Solaris 2.
#AUX_CFLAGS= -DSOLARIS2
#AUX_LIBS= -lsocket -lnsl

to the following:

# For SunOS 4
#AUX_CFLAGS= -DSUNOS4
# For Solaris 2.
AUX_CFLAGS= -DSOLARIS2
AUX_LIBS= -lsocket -lnsl
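
If you would rather script this edit than make it by hand, the following is a minimal sketch. It assumes the Configuration file contains exactly the lines shown above; the function name switch_to_solaris is hypothetical and is not part of the Apache distribution. It writes the result to a new file so the original stays untouched.

```shell
# Hypothetical helper (not part of Apache): comment out the SunOS 4 setting
# and uncomment the Solaris 2 settings in a copy of the Configuration file.
# $1 = path to the original Configuration, $2 = path for the edited copy
switch_to_solaris() {
    sed -e 's/^AUX_CFLAGS= -DSUNOS4/#&/' \
        -e 's/^#\(AUX_CFLAGS= -DSOLARIS2\)/\1/' \
        -e 's/^#\(AUX_LIBS= -lsocket -lnsl\)/\1/' \
        "$1" > "$2"
}
```

You might call it as "switch_to_solaris Configuration Configuration.new" and compare the two files with "diff" before replacing the original.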


For the CFLAGS definition: if you want every file with the execute bit set to be parsed for server side includes, set "-DXBITHACK." If you wish to eliminate the overhead of performing the reverse-DNS lookup when an entry is written to the logfile, set "-DMINIMAL_DNS."

If, on the other hand, you want to have an even greater sense of confidence in the hostname, you can set "-DMAXIMAL_DNS." You would set this if you were protecting parts of your site based on hostname. Doing this is optional, and is mostly provided for backwards compatibility with NCSA 1.3.

At the bottom of the file is a list of packaged modules that come with the Apache distribution. Notice that not all of them are compiled in by default. To include a module in the build, uncomment the entry for it. Notice that some modules are mutually exclusive - for example, it would not be wise to compile both the configurable logging module and the common logging module at the same time.

Also, some modules, like mod_auth_dbm, may require linking to an external library, and need an entry added to the EXTRA_LIBS line. You'll learn more about modules in a little bit; for the purposes of getting up and running, I'd recommend simply using the defaults as provided.

Step 2: Run the "Configure" Script

This is a simple Bourne shell script that takes the "Configuration" file and creates a "Makefile" out of it, as well as "modules.c." If you are feeling ambitious, you can look at and edit "httpd.h," which sets a lot of defaults for some low-level functionality, most of which is set anyway in the configuration files.

Step 3: Run "make"

This compiles the server. You might see some warnings about data types, particularly if you compiled with -Wall set, but there should be no fatal errors.

If all went well, you should now have an executable program called httpd in your src directory.

Establishing the File Hierarchy

The next step in the process of setting up a server is to make some fundamental decisions regarding where on the file system different parts of the server will reside. Write down your decisions for each of these; they will be needed in the section on "Basic Configuration."

First, there is the server root. This is the subdirectory in which you unpacked the server, and from which the conf/ directory, the logs/ subdirectory, the cgi-bin/ subdirectory, and other server-related directories lead. The default suggestion is to have this as /usr/local/etc/httpd. You will be able to have your configuration files and log files in other locations - the server root was designed to be a convenient place to keep everything server-related together. Also, if the server crashes and leaves a core file, it will be found in the server root directory.

Second, there is the document root. This is the directory in which all your HTML and other media reside. A file in here called myfile.html would be referenced as http://host.com/myfile.html. It is recommended that this be outside the server root and in its own directory. Since you'll be referring to it frequently, its path should be pretty short - for example, /home/www or /www/htdocs. If you are implementing a Web server on top of an FTP server, for example, you might want to point the document root at /home/ftp/pub.

Finally, you need to decide where on your server you will keep your logfiles. This should be a space with a fairly large working area, depending on how busy you estimate your server will be. For a point of reference, a site with 100K hits per day (which would fall under moderate traffic, relatively speaking) can expect to generate 15 MB per day of logfile information.

Later in this book, you'll deal with automated logfile rotation and logfile analysis tools, but for now just be aware of the disk space issue. Furthermore, for performance reasons, it's usually best to have the log directory on a separate disk partition or even a separate disk altogether, since on even a moderately busy server the access log can be written to several times per second.
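
To get a feel for the disk space involved, here is a rough back-of-the-envelope sketch. The 150 bytes per hit is simply what the figures above work out to (15 MB spread over 100K hits); your own average will vary with URL lengths and log format. The function name estimate_log_mb is made up for this illustration.

```shell
# Rough estimate only: assumes about 150 bytes per logged hit, which is what
# 15 MB for 100K hits works out to (15,000,000 / 100,000).
# $1 = hits per day, $2 = number of days of logs to keep
# Prints the estimated size in megabytes (integer arithmetic, rounded down).
estimate_log_mb() {
    echo $(( $1 * 150 * $2 / 1000000 ))
}
```

For example, "estimate_log_mb 100000 30" prints 450, so a month of logs at that traffic level needs roughly 450 MB before rotation or compression.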

Basic Configuration

This section covers the minimal set of changes you need to make to the configuration files in order to launch a basic Web site.

See Chapter 6, "Managing an Internet Web Server," for advanced configuration.

There are three separate configuration files in Apache. This model goes back to NCSA, and the reasoning is sound: there are largely three main areas of administrative configuration, so setting them up as separate files allows the Web master to give different write permissions to each if he or she so desires.

You will find the configuration files for Apache in the conf/ subdirectory of the server root directory. Each has been provided with a "-dist" appendage; it is recommended that you make a copy without the "-dist" and edit those new files, keeping the "-dist" versions as backups and reference.
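
One way to make those copies is sketched below. The function name copy_dist_configs is hypothetical, and the sketch assumes only that the distributed files end in "-dist" as described above. It never overwrites a file you have already edited.

```shell
# $1 = path to the conf/ directory (for example /usr/local/etc/httpd/conf)
# Copies each "*-dist" file to the same name without the suffix, and skips
# any target that already exists so that edited files are never overwritten.
copy_dist_configs() {
    for dist in "$1"/*-dist; do
        [ -e "$dist" ] || continue      # no -dist files: the glob did not expand
        target=${dist%-dist}
        [ -e "$target" ] || cp "$dist" "$target"
    done
}
```

Running "copy_dist_configs /usr/local/etc/httpd/conf" leaves the "-dist" files in place as backups and reference.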

The basic format of the configuration files is a combination of a shell-like interface and pseudo-HTML. The elemental unit is the directive, which can take a number of arguments. Essentially:

Directive argument argument....

for example,

Port 80

or

AddIcon /icons/back.gif ..

Directives can also be grouped together inside certain pseudo-HTML "tags." Unlike HTML, these tags should be on their own line. For example:

<Virtualhost www.myhost.com>
DocumentRoot /www/htdocs/myhost.com
ServerName www.myhost.com
</Virtualhost>

httpd.conf

The first configuration file to look at is "httpd.conf." This is the file that sets the basic system-level information about the server - what port it binds to, which user it runs as, and so on. If you are not the systems administrator of the site at which you are installing the server, you might want to ask the administrator to help you with these questions.

The essential items in this file to cover are:

Port <number>

for example,

Port 80

This is the TCP/IP port number to which the Web server binds. Port 80 is the default port in "http:" URLs; in other words, "http://www.myhost.com/" is equivalent to "http://www.myhost.com:80/."

For a number of reasons, however, you might want to run your server on a different port; for example, there may already be a server running on port 80, or this may be a server you want to keep "secret." (Though if there is sensitive information on this server, you should at least use host-based access control, if not password protection.)

User <name or #number>
Group <name or #number>

as in

User nobody
Group nogroup

These directives set the UNIX user and group that the Web server runs as. Apache needs to be launched as root in order to bind to a port lower than 1024 - this is a basic security feature of most UNIX implementations. Immediately after "grabbing" the port, Apache changes its effective user ID to something else, typically user "nobody." This is for security reasons - running your Web server as root means that any hole in the server (be it through the server itself, or through a CGI script, which is much more likely) could be exploited by an outside user to run a command as root on your machine. Thus, setting the user to "nobody," "www," or some other reasonably innocuous user ID is the safest bet. This user ID needs to be able to read files in the document root, as well as have read permission on the configuration files. The argument should be the actual user name - if you want to give the numeric user ID, prefix the number with a pound sign (#). The Group directive works the same way; decide which group ID you want the server to run with.

ServerAdmin <email address>

This should be set to the e-mail address of a user who can receive mail related to the actions of the server. In the case of a server error, the message given to the browser visiting your site will include a message to the effect of "please report this problem to user@myhost.com." In the future, Apache may send warning e-mail to this user if it encounters a major systems-related problem.

ServerRoot <directory>

for example

ServerRoot /usr/local/etc/httpd

This is the server root decided upon earlier. Give the full path, and don't end it with a slash.

ErrorLog <directory/filename>
TransferLog <directory/filename>

These two directives specify exactly where to log errors and Web accesses. If the filename given doesn't start with a slash, it is presumed to be relative to the server root directory. It was suggested earlier that the logfiles be sent to a separate directory outside of the server root; this is where you specify that logging directory and the name of the log files within that directory.
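
The relative-versus-absolute rule is easy to get wrong, so here is a small sketch of it. The function name resolve_log_path is hypothetical; it only mimics the rule described above and is not how Apache itself is implemented.

```shell
# $1 = ServerRoot, $2 = filename given to ErrorLog or TransferLog
# Prints the location the server would end up writing to.
resolve_log_path() {
    case "$2" in
        /*) echo "$2" ;;          # starts with a slash: used as given
        *)  echo "$1/$2" ;;       # otherwise: relative to the server root
    esac
}
```

With a server root of /usr/local/etc/httpd, "logs/error_log" resolves to /usr/local/etc/httpd/logs/error_log, while "/var/log/httpd/error_log" is left alone.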

ServerName <DNS hostname>

At times, the Web server will have to "know" the hostname it is being referred to as, which can be different from its real hostname. For example, the name "www.myhost.com" might actually be a DNS alias for "gateway.myhost.com." In this case, you don't want the URLs generated by the server to be "http://gateway.myhost.com/." ServerName allows you to set that precisely.
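
Putting the directives above together, a starter httpd.conf might look like the one this sketch writes out. The e-mail address and the two log file locations are made-up examples, and write_starter_httpd_conf is a hypothetical name; substitute the values you wrote down earlier.

```shell
# $1 = file to write (for example conf/httpd.conf)
# Overwrites the file with a minimal set of the directives covered above.
write_starter_httpd_conf() {
    cat > "$1" <<'EOF'
# Starter httpd.conf (sketch). The ServerAdmin address and the log
# locations below are hypothetical; use your own.
Port 80
User nobody
Group nogroup
ServerAdmin webmaster@myhost.com
ServerRoot /usr/local/etc/httpd
ErrorLog /var/log/httpd/error_log
TransferLog /var/log/httpd/access_log
ServerName www.myhost.com
EOF
}
```

Writing to a scratch file first, and comparing it against the distributed httpd.conf, is a safe way to see which defaults you are changing.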

srm.conf

The second configuration file to cover before launch is srm.conf. The important things to set in that file are:

DocumentRoot <directory>

As described before, this is the root level of your tree of documents - be that "/usr/local/etc/httpd/htdocs" or "/www/htdocs." Based on my experience, it's a very good idea to keep it short and concise. This directory must exist and be readable by the user the Web server runs as.

ScriptAlias <request path alias> <directory>

ScriptAlias lets you specify that a particular directory outside of the document root can be aliased to a path in the request, and that objects in that directory are executed instead of simply read from the file system. For example, the default offering

ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

means that a request for http://www.myhost.com/cgi-bin/fortune will execute the program /usr/local/etc/httpd/cgi-bin/fortune. Apache comes bundled with a number of useful beginner CGI scripts, simple shell scripts that illustrate CGI programming. However, you probably don't want to turn them on by default. I recommend commenting this line out until you're sure you want to use it as your CGI invocation mechanism.

Finally, the directory containing the CGI scripts should not be under the document root - bizarre interactions between the code that handles "ScriptAlias" and the code that handles request/pathname resolution could cause problems.
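
To make the ScriptAlias mapping concrete, the following sketch mimics it for the default line shown above. It is only an illustration of the path translation, not the server's actual code, and map_script_alias is a hypothetical name.

```shell
# Illustrates the mapping performed by
#   ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/
# $1 = request path; prints the program that would be executed, or returns
# non-zero if the path does not fall under the alias.
map_script_alias() {
    case "$1" in
        /cgi-bin/*) echo "/usr/local/etc/httpd/cgi-bin/${1#/cgi-bin/}" ;;
        *)          return 1 ;;
    esac
}
```

Here "map_script_alias /cgi-bin/fortune" prints /usr/local/etc/httpd/cgi-bin/fortune, while a path such as /index.html is not matched and would be served from the document root as usual.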

Just as with httpd.conf, there are many extra features that are discussed in upcoming sections.

access.conf

Access.conf is structured more rigidly than the other configuration files; all the content is contained within <Directory></Directory> pseudo-HTML tags that define the scope of the directives listed within. So for example, the directives in between

<Directory /www/htdocs>

and

</Directory>

affect everything located under the /www/htdocs directory. Furthermore, wildcards may be used, for example

<Directory /www/htdocs/*/archives/>
....
</Directory>

applies to /www/htdocs/list1/archives/, /www/htdocs/list2/archives/, and so on. The most important directive to set at this point is "Options." "Options" takes a list of keywords that enable or disable particular functionality.

It's important to establish a conservative set of functionality when the site is first launched. I would recommend using just "Indexes" at the very beginning. For example:

Options Indexes
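
In context, that directive sits inside a <Directory> section. The sketch below writes a minimal access.conf along those lines, assuming a document root of /www/htdocs as in the earlier examples; write_starter_access_conf is a hypothetical name.

```shell
# $1 = file to write (for example conf/access.conf)
# Assumes the document root is /www/htdocs; change it to match your own.
write_starter_access_conf() {
    cat > "$1" <<'EOF'
<Directory /www/htdocs>
Options Indexes
</Directory>
EOF
}
```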

Starting Up Apache

To start Apache, simply run the binary you compiled earlier (or your precompiled binary) with the "-f" flag pointing to the httpd.conf file also created earlier. For example:

/usr/local/etc/httpd/src/httpd -f /usr/local/etc/httpd/conf/httpd.conf

It's probably a good idea at this point to use the UNIX command "ps" to see if "httpd" is running. Typically, something like

ps -augwx | grep httpd    (BSD-based systems)
ps -ef | grep httpd       (SVR4-based systems)

will suffice. To your surprise, you will probably see a number of simultaneous "httpd" processes running. What's going on?

The first Web servers, like the CERN and NCSA servers, used the model of one main Web server "cloning" itself with every single request that came in. The "clone" would respond to the request, while the original server returned to listening to the port for another request. While this was certainly a simple and robust design, the act of "cloning" (in UNIX terms, a "fork") was an expensive operation under UNIX, so loads above a couple of hits per second were quite punishing even on the nicest hardware. It was also difficult to implement any sort of "throttling" (reducing the amount of "cloning" that took place when the number of "clones" was very high), since it was hard for the original server to know how many "clones" were still around. Thus servers had no easy way to refuse or delay connections based on a lack of resources.

Apache, like NCSA 1.4+, Netscape's Web servers, and a couple of other UNIX-based Web servers, instead uses the model of a group of persistent "children" running in parallel. The children are coordinated by a "parent" process, which can tell how many children are alive, spawn new children if necessary, and even terminate old children if there are many idle ones, depending on the situation. Parent and child are the actual UNIX terms.
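
You can see this parent/child arrangement in the "ps" output. The sketch below is a rough illustration: it assumes the first column of the listing is the user name (as in "ps -ef"), that the server was launched as root, and that the children run as an unprivileged user such as "nobody." The function name summarize_httpd is hypothetical, and any other process with "httpd" in its command line would be counted too.

```shell
# Reads "ps -ef"-style output on stdin (first column = user name) and counts
# httpd processes owned by root (the parent) versus everyone else (children).
summarize_httpd() {
    awk '/[h]ttpd/ { if ($1 == "root") parent++; else children++ }
         END { printf "parent=%d children=%d\n", parent, children }'
}
```

On an SVR4-based system you might run "ps -ef | summarize_httpd" shortly after start-up to check that one parent and several children are present.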

Back to the server. Fire up your Web browser and go to http://www.myhost.com/. Did it work? If all went well, you should be able to see a directory index listing of everything in the document root directory, or if there's an "index.html" in that directory, you would see the contents of that file.

Other command line options are shown in the following table:

Option          Result
-d serverroot   Sets the initial value for ServerRoot.
-X              Runs the server in single-process mode; useful for debugging purposes, but don't run the server in this mode for serving content to the outside world.
-v              Prints the version of the server, and then exits.
-?              Prints the list of available command-line arguments to Apache.

Debugging the Server Start-Up Process

Apache is usually pretty good about giving meaningful error messages, but some are explained in more detail here.

httpd: could not open document config file .....
fopen: No such file or directory

This is usually the result of giving just a relative path to the -f argument, so Apache looks for it relative to the compiled-in server root (what's set in src/httpd.h) instead of relative to the directory you are in. You must give it either the full path or the path relative to the compiled-in server root.

httpd: could not bind to port [X]
bind: Operation not permitted

This was most likely caused by attempting to run the server on a port below 1024 without launching it as "root." Most UNIX operating systems prevent people without root access from trying to launch a server - any type of server - on a port less than 1024. If you launch the server as root, the error message should disappear.
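
The rule can be checked before you even launch the server. This is a minimal sketch of the check, assuming the usual convention that ports below 1024 are reserved for root (user ID 0); can_bind_port is a hypothetical name, and your system may have its own exceptions.

```shell
# $1 = port number, $2 = numeric user ID of whoever launches the server
# Returns 0 if that user may bind the port under the usual UNIX rule,
# non-zero otherwise.
can_bind_port() {
    [ "$1" -ge 1024 ] || [ "$2" -eq 0 ]
}
```

For example, 'can_bind_port 80 "$(id -u)" || echo "Port 80 needs root"' warns you when you are about to start a port-80 server from an ordinary account.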

httpd: could not bind to port
bind: Address already in use

This means that there is already something running on your machine at the port you have specified. Do you have another Web server running? There is no standard UNIX mechanism for determining what's running on what ports; on most systems, the file "/etc/services" can tell you what the most common daemons are, but it's not a complete list. You could also try using the "netstat" command, with various options such as "-a".
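
As a rough sketch of the "netstat" approach, the function below scans "netstat -an" output for a listening socket on a given port. Output formats differ between systems; this assumes local addresses are printed as either "address:port" or "address.port" followed by whitespace, and that listening sockets are marked LISTEN. The function name listening_on is hypothetical.

```shell
# Reads "netstat -an" output on stdin.
# $1 = port number; returns 0 if some socket is listening on that port.
listening_on() {
    grep LISTEN | grep -q "[.:]$1[[:space:]]"
}
```

You might use it as 'netstat -an | listening_on 80 && echo "port 80 is taken"'. It tells you that something holds the port, not which program it is.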

httpd: bad user name ....
httpd: bad group name ....

The "User" or the "Group" you set in httpd.conf doesn't actually exist on your system.

You might also see errors telling you that particular files or directories don't exist. If it looks like the files are there, make sure they are readable by the user IDs that the server runs as (that is, both root and nobody).

Suppose Apache has started up, and according to "ps" it's actually running. But when you go to the site, you get:

  • No connection at all. Make sure that there are no firewalls between you and the server that would filter out packets to the server. Next, try using "telnet" to connect to the port you launched the Web server on; for example, "telnet myhost.com 80". If you don't get a "Connected to myhost.com" message back, your connection is not even making it to the server in the first place.
  • 403 Access Forbidden. Your document root directory may be unreadable, or you may have something in your access.conf file that prevents access to your site from the machine where your Web browser is.
  • 500 Server Error. Is your front page a CGI script? The script may be failing.

These are the most common errors made in initial server start-ups. If you can establish that contact with the server is actually being made, the next best place to look for error information is in the ErrorLog. Future sections describe each new piece of functionality and also discuss the errors that misconfiguration can bring up.

    Apache-SSL

    At this point, we will take a slight detour and discuss setting up a variant of the Apache Web server, Apache-SSL, which can conduct secure transactions over the Secure Sockets Layer protocol. SSL is an RSA public-key based encryption protocol developed by Netscape Communications for use in the Netscape Navigator browser and Netscape Web servers.

    Until recently, the only option for doing SSL transactions on the World Wide Web has been to use a proprietary server, such as the Netscape Commerce server or the OpenMarket Secure server. Strongly encrypting versions of these servers have not been available outside the United States due to U.S. export restrictions.

    Eric Young, author of the widely used libdes package, along with Tim Hudson, wrote a library that implements SSL, eponymously named SSLeay. The SSLeay package has since expanded to become an all purpose cryptography and certificate handling library, while retaining the same name, "SSLeay."

    Ben Laurie, a member of the Apache Group, then took the SSLeay library and interfaced it with the Apache server, making his patches available to people on the Net. Sameer Parekh, of Community ConneXion, Inc., (hereafter referred to as C2) then took Ben Laurie's patches and built a package legal for use within the United States.

    Because the RSA technology used by SSL is covered in the United States by patents owned by RSA Data Security, Inc. (RSADSI) (http://www.rsa.com), it is not legal to use the SSLeay package "out-of-the-box" within the United States. C2 licensed the RSA technology to make use of the package legal within the United States, using the "RSAREF" package, produced by RSADSI and Consensus Development Corporation (http://www.consensus.com).

    Due to export restrictions, it is not legal for someone outside the United States to download and install the C2 Apache-SSL package. In fact, we couldn't even put the SSL patches on the CD-ROM included with this book because the book would suddenly have earned the label "munition" and clearance from the U.S. Government would have been required!

    Therefore, the installation process for Apache-SSL differs for those within the United States versus those who live outside the U.S. People within the U.S. can simply download the package from C2, at http://apachessl.c2.org/, and install it. Outside the United States, people must separately install SSLeay, and then patch Apache with Ben Laurie's patches from his site.

    Within the United States, it is legal to use the version of Apache-SSL available for download from C2 for non-commercial purposes only. In order to use Apache-SSL commercially, people must purchase an Apache-SSL Commerce license from C2. After downloading the package from C2, the installation of the server is rather straightforward.

    As with the standard Apache, you must first edit the "Configuration" file to reflect the system and any custom modules you may want installed. The lines regarding SSL in the Configuration file should be ignored. The installation process automatically deals with them.

    Next, you need to configure RSAREF for your system. The non-commercial distribution of Apache-SSL comes with full RSAREF source code, so you must edit the rsaref/install/makefile to reflect your system. There is usually not much that needs to be edited in this file, except for the C compiler that you need to use. The commercial version, however, does not come with RSAREF source code, but with RSAREF object code for a number of platforms. If you have the commercial release of Apache-SSL, you need to copy the proper version of the RSAREF library from rsaref/install/objs and place it in rsaref/install/rsaref.a.

    Finally, to finish the configuration, run "Configure" in the ssl/ directory. Running "Configure" will give you a list of supported platforms for SSLeay. Choose one and run "Configure" again in order to configure SSLeay properly for the platform of your choice.

    To build, type "make" at the top level.

    Once the server is successfully compiled, you run "make install," in order to install SSLeay and Apache-SSL into /usr/local/ssl and /usr/local/etc/httpd. The installation installs both an SSL version and a plain non-encrypting version of the Apache server in those locations.

    Before you can begin using SSL, however, you need to generate your key/certificate pair for use with SSL. The C2 Apache-SSL distribution comes with the program "genkey" you can use to generate your key and certificate. Run "/usr/local/ssl/bin/genkey httpd." This generates a public/private RSA keypair for use with the server, and puts it in /usr/local/ssl/private/httpd.key. It also generates a PKCS #10 Certificate Signing Request, which you send to the Certificate Authority of your choice (for example, Verisign) along with the proper documentation. The script also generates a "test certificate," so that you can start using the server immediately, without waiting for your Certificate Authority to reply with a signed certificate.

    After the key/certificate pair is generated and installed in the proper location, you are ready to start using the server! First, however, familiarize yourself with the SSL-specific configuration directives to Apache-SSL.

    SSLCertificateFile filename

    The filename is the location where your server's certificate is stored. It is either relative to /usr/local/ssl/certs, or, if you provide a full pathname, it's the full path to the certificate file. The "SSLCertificateFile" directive is required.

    SSLCertificateKeyFile filename

    The SSLCertificateKeyFile is required, unless the file listed in SSLCertificateFile also contains the key. This filename must be relative to /usr/local/ssl/private. It can't be a full pathname.

    SSLLogFile filename

    The SSLLogFile is where Apache-SSL logs specific information regarding SSL for each connection, such as the cipher used, and client-authentication information.

    SSLVerifyClient   0, 1, or 2

    SSLVerifyClient determines whether or not the server should use X509 client authentication. 0 means no, 1 means it is optional, and 2 means that a client certificate is required.

    SSLVerifyDepth depth

    The SSLVerifyDepth is how far along a certificate chain the server should look for a root Certificate Authority when verifying a given client certificate. If you're not using X509 client certificates, a good default value is probably 1.

    SSLFakeBasicAuth

    The SSLFakeBasicAuth directive allows you to use X509 client certificates to provide for Basic HTTP/1.0 authentication for accessing various realms of your Web server's document tree.


    SSLFakeBasicAuth must be used with "SSLVerifyClient 2." If used with any other SSLVerifyClient setting, it is subject to subversion.
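
Taken together, a basic set of SSL directives with no client authentication might look like the fragment this sketch prints. The certificate and key names follow the locations described in this section; the SSLLogFile location is a made-up example, and ssl_directives is a hypothetical name. Check the configuration files shipped with your Apache-SSL package for where these lines belong.

```shell
# Prints a sample set of SSL directives on standard output.
ssl_directives() {
    cat <<'EOF'
# Certificate, relative to /usr/local/ssl/certs
SSLCertificateFile httpd.cert
# Key, relative to /usr/local/ssl/private
SSLCertificateKeyFile httpd.key
# Hypothetical log location; pick your own
SSLLogFile /usr/local/etc/httpd/logs/ssl_log
# 0 = do not ask for X509 client certificates
SSLVerifyClient 0
SSLVerifyDepth 1
EOF
}
```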

    After having installed the certificate and key in /usr/local/ssl/certs/httpd.cert and /usr/local/ssl/private/httpd.key, you can start the server by merely running /usr/local/etc/httpd/start, which starts up the server with some default configuration files, located in conf/httpd.conf and ssl_conf/httpd.conf.

    For people outside the United States, the installation process is more involved. First, you must obtain the SSLeay package (ftp://ftp.psy.uq.oz.au/pub/Crypto/SSL/) and then install it according to the directions in the package.

    Second, you must obtain Ben's patches to Apache from his site, at http://www.algroup.co.uk/ApacheSSL/. The patches must then be installed to your Apache source tree, according to the directions included with the package.

    It should be noted that, at publication time, Verisign (http://www.verisign.com), a spin-off from RSA and the first Certificate Authority for SSL, had just started signing keys generated for Apache-SSL. Other CAs are expected to crop up - Netscape 2.0 comes with half a dozen others defined and waiting to be recognized. In fact, Netscape 2.0 (and hopefully, by the time you read this, other browsers) allows for arbitrary CAs to be used, warning the user that a new CA is being used, but still allowing the encrypted conversation to take place.

