Copyright ©1996, Que Corporation. All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system without prior written permission of the publisher except in the case of brief quotations embodied in critical articles and reviews. Making copies of any part of this book for any purpose other than your own personal use is a violation of United States copyright laws. For information, address Que Corporation, 201 West 103rd Street, Indianapolis, IN 46290 or at support@mcp.com.

Notice: This material is excerpted from Running A Perfect Web Site with Apache, ISBN: 0-7897-0745-4. The electronic version of this material has not been through the final proofreading stage that the book goes through before being published in printed form. Some errors may exist here that are corrected before the book is published. This material is provided "as is" without any warranty of any kind.

Chapter 07 - Creating and Managing an Intranet

No matter what a company's business, sharing information among employees, and deciding who may see which information, will always be top priorities. From company-newsletter gossip to high-level proprietary engineering data, HTML and Apache can make finding such material less expensive, less time-consuming, and simply easier all around.

Nevertheless, setting up an enterprise-wide Web site can be confusing and, at first, complicated. This chapter demonstrates how to set up Apache in your company and how to secure it from unauthorized outsiders. It then discusses maintaining your new Web space and adding some useful features.

In this chapter, you learn:

  • Why your company should have an internal Web server
  • What kind of hardware and software will be needed
  • Which browser(s) are best to standardize on
  • How to relocate your existing documents and format new ones
  • How to add documents to your Web space without losing organization
  • How to add some useful features to your internal Web server
  • How to analyze and address security concerns
  • How to safely share your information with other companies

Benefits of an Intranet

Every company already has some method for distributing information among its employees. These range from bulletin boards in the cafeteria, to memoranda sent so often that they usually end up in the so-called "circular file," to old-fashioned weekly (or even several-times-a-day) meetings, which can never be held at a time convenient for everyone. Some companies have tried to use their networking software to ease this burden, but that software is either too simplistic for the purpose or not compatible across platforms.

Software companies have made great strides in the past few years; many vendors claim to have incorporated "Workgroup Technology," "Groupware," or "Document Sharing." On closer examination, however, these systems turn out to be proprietary. They need an expensive server plus a client for each machine, and not every platform is supported by every client. Some packages also require fundamental network changes in addition to new software and hardware.

Fortunately, along came the World Wide Web, originally designed as an easy way for scientists to share ideas. It is based on a simple, open data format (HTML) and common network protocols (HTTP, running over TCP/IP). This makes it easy to set up and operate. Because the data format is open, there is also a wide choice of client software. An Intranet is basically an internal Web site, though the term is also used to cover supporting services such as e-mail and Usenet news.

Using Apache (and any WWW browser), you can easily set up many effective ways to share information, communicate ideas, and exchange tips. The following sections cover a few of the ways an internal Web server can help.

Bulletin Boards

One of the more popular uses for an internal WWW server is as a bulletin board system. Much like the traditional corkboard, it allows people to post notices and information for everyone's use. Unlike the paper version, though, an HTML bulletin board can be made searchable. This puts far more information at one's nearly instantaneous disposal.

Using CGI scripts or writable documents, it is easy to create an attractive, friendly bulletin board system for the whole company. Another advantage of the electronic bulletin board is that it is shared over the network. Users can be in the same building or in a different country; as long as they have network access, they can see your notices.
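As a rough illustration of the CGI approach, the sketch below shows only the storage side of a bulletin board. A CGI script would call post_notice with the submitted form fields, and each notice would be appended to an HTML file that the board page displays. The function names and the board file path are hypothetical. The important detail is that every user-supplied field is HTML-escaped before it reaches the shared page.

```python
import html
from datetime import date


def format_notice(author: str, subject: str, body: str, posted: date) -> str:
    """Render one notice as an HTML list item.

    Every user-supplied field is escaped, so a notice cannot inject
    markup or scripts into the shared board page.
    """
    return "<li><strong>{}</strong> ({}, {})<br>{}</li>\n".format(
        html.escape(subject),
        posted.isoformat(),
        html.escape(author),
        html.escape(body),
    )


def post_notice(board_path: str, author: str, subject: str, body: str) -> None:
    """Append a notice to the board file; board_path is hypothetical.

    A production script should also lock the file, so that two
    simultaneous posts cannot interleave.
    """
    with open(board_path, "a", encoding="utf-8") as board:
        board.write(format_notice(author, subject, body, date.today()))
```

For example, a notice whose body is "Found near <lobby>" is stored as "Found near &lt;lobby&gt;". The browser therefore shows the angle brackets as text instead of treating them as a tag.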

Information Center

An internal Web server also works well as an information clearinghouse. Most organizations have many documents that need to be available to employees, such as employee handbooks, phone listings, and company policies. These are usually printed and delivered to each employee in the company or group. That process is expensive and time-consuming, and it is easily replaced by a set of HTML documents. Because the documents no longer need to be distributed individually, the company saves time. Because they remain in electronic form, it saves money on printing. HTML documents can also be searched, cross-referenced with hypertext links, and updated with a simple text editor.

In addition to policies and handbooks, HTML can be used to distribute company information or industry news. A company newsletter can be as simple as a single HTML page, or it can span multiple pages and link to information stored across the Internet. Back issues can be archived and made completely searchable. Industry news can also be as simple or as complex as necessary.

You can also publish an electronic phone or e-mail list containing everyone's name, address, and extension. Users can search it with the browser's built-in find feature, or the page can include a form that queries a search engine.
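The search behind such a form can be very simple. The sketch below assumes the directory has already been loaded into memory as a list of entries. It returns every entry whose name or e-mail address contains the query, ignoring case. The Entry fields, the sample people, and the example.com addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    extension: str
    email: str


def search_directory(entries, query):
    """Return entries whose name or e-mail contains query (case-insensitive)."""
    q = query.strip().lower()
    if not q:
        return []  # an empty query matches nothing rather than everyone
    return [e for e in entries
            if q in e.name.lower() or q in e.email.lower()]


# Hypothetical sample data.
directory = [
    Entry("Ann Lee", "4123", "alee@example.com"),
    Entry("Bob Stone", "4410", "bstone@example.com"),
    Entry("Carla Annis", "4502", "cannis@example.com"),
]
```

Searching for "ann" matches both Ann Lee and Carla Annis. A CGI script would format the matching entries as an HTML table.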

Documents To Add to the Information Center
Employee handbooks
Policies and procedures
Phone or e-mail lists
Company newsletter
Industry news

Common Forms

Common business forms can also be added to the Web server. This allows the owner of the document to modify it and make changes instantly available to form users. It also makes it easier for employees to find the form for which they are looking.

In some cases, a CGI script can process the submitted form automatically, or forward it from the user's browser directly to the right people.

For more information on CGI scripts and CGI security see Chapter 13, "CGI Scripts and Server APIs."


Automating form handling with scripts can save time, but it can also create a security problem. Think carefully before automating a form request. Scripts that don't carefully check their input can cause damage, for example by passing user-supplied text to a command that then executes something unintended.
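One common defense is to accept only input that matches a strict pattern and never to hand form data to a shell. The sketch below applies this to a hypothetical new-user-account form. The user-name policy (2 to 8 lowercase letters or digits, starting with a letter) and the /usr/local/bin/queue-account-request program are assumptions for illustration only.

```python
import re
import subprocess

# Hypothetical policy: 2-8 lowercase letters or digits, starting with a letter.
USERNAME_RE = re.compile(r"[a-z][a-z0-9]{1,7}")


def valid_username(name: str) -> bool:
    """Accept only names that match the whitelist pattern exactly."""
    return USERNAME_RE.fullmatch(name) is not None


def request_account(name: str) -> None:
    """Queue an account request without ever invoking a shell."""
    if not valid_username(name):
        raise ValueError("invalid user name: %r" % name)
    # Passing a list (not a string) means no shell parses the value,
    # so characters such as ';' or '|' cannot start a second command.
    subprocess.run(["/usr/local/bin/queue-account-request", name], check=True)
```

A whitelist ("allow only these characters") is safer than a blacklist ("reject these dangerous characters"), because it does not depend on anticipating every dangerous character.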

Forms To Add to the Web Server
Equipment request form
Vacation request form
Network change requests
New user account forms
Support desk trouble ticket forms
Software or hardware bug reports

Workgroup Server

As previously mentioned, the WWW was originally designed to help share information among different research groups located within various organizations. It can also be used to share information between or among workgroups.

Workgroups can use the power of the Web to list current projects, past activities, and planned proposals. This allows different groups to contact people who may have already discovered how to avoid a certain problem, or found a better way of doing something.

Team members can set up a central area for all the documentation a project needs, and then make it easy to add individual comments. This can be very helpful when designing a new project; for example, each designer could add his or her input to the design as it progresses. Discussions about the document would then take place in the body of the document itself and be recorded along with it for others to see.

This documentation can also include any notes, memoranda, reports, and studies related to a particular area. If the HTML is searchable, engineers have a wealth of information at their fingertips when they encounter a new situation. This helps the novice employee as well as the veteran engineer.

Workgroups can also use a Web server to track project status. The lead developer could create a page showing when things are expected to be done, other developers could add comments to such a file, and everyone would always be up to date with scheduling changes.

Workgroup servers can also be used to introduce new team members and help them become comfortable in their new positions. They can also help establish friendly working relationships between different groups by serving as an introduction center.

Workgroup Uses
List past, present and future projects
Store related documentation for each project
Store discussions on documents or specifications
Track project status
Create a friendly working environment
Ease the introduction of new group members

Discussion Forum

You can also use the World Wide Web in conjunction with a news server. News servers fit in nicely with Web servers and allow easy discussion between users. Some browsers allow reading and posting news articles without the need for a separate program. Other browsers may require a separate program for posting news. Browsers are covered in more detail later in this chapter.

Internal newsgroups allow open discussion on many topics. Human Resources can answer questions about insurance benefits or vacation time, engineers can discuss current problems or new products, etc. A local newsgroup dedicated to specific software can eliminate hundreds of hours of needless labor, since users can help each other out and avoid the need to call the support desk. The newly hired can get the feel of the company and its tools by reading and posting questions.

Newsgroups on the Internet are organized hierarchically, and the same can be done for internal use. For example, you might start with a hierarchy like this:

Sample News Hierarchy
local.eng       Discussion of engineering topics
local.windows   Discussion of Microsoft Windows
local.outages   Notification of outages
local.specs     Discussion of various company specifications
local.policy    Discussion of corporate policy
local.misc      General discussion

You might want to start with a simple news hierarchy and expand it as groups get busier. For example, if "local.eng" were getting a lot of messages about both hardware and software, you could split it into "local.eng.hw" and "local.eng.sw". Later, a further division into "local.eng.hw.proj1" and "local.eng.hw.proj2" might be to everyone's advantage.

News-server software can be downloaded and used without charge from many Internet sites, and it can run on the same machine as Apache. Larger sites may need to run the news server on a separate machine for better performance.

Uses for an Internal News Server
Forum to discuss software packages
Forum to ask policy questions
Notification of upgrades or downtime
Hardware and software discussion
Workgroup discussions
Support line


There are several free news servers available on the UNIX platform. Some of the more common ones are B News, C News, and the more popular INN (InterNetNews). If you are setting up a new news server, you will probably want to use INN. You can download it by FTP from various sites, such as ftp.uu.net in the /networking/news/nntp/inn directory.

Monitoring Tool

The World Wide Web can also be used to allow users to see how things are running. Using Apache and Server Side Includes (SSI), or CGI scripts, you can create dynamic pages. These pages can be used in a variety of ways, such as a print queue or network status monitor.

You can create a page that lets users check the status of their print jobs from their WWW browsers. Most operating systems let a user check a print queue, but an HTML page is easy for everyone to use. It also removes the need to learn different commands on different machines.
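A print-queue page can be a small CGI program that runs the system's queue command and wraps its output in HTML. The sketch below makes three assumptions: a BSD-style "lpq -P printer" command exists on the server, "lp" is a valid printer name, and Apache is configured to run the script as CGI. The queue text is escaped because job names are chosen by users.

```python
import html
import os
import subprocess


def queue_page(printer: str, lpq_output: str) -> str:
    """Wrap raw print-queue text in a minimal CGI response (header + HTML)."""
    title = "Print queue for " + html.escape(printer)
    return (
        "Content-type: text/html\n\n"
        "<html><head><title>{t}</title></head>\n"
        "<body><h1>{t}</h1>\n<pre>{q}</pre>\n</body></html>\n"
    ).format(t=title, q=html.escape(lpq_output))


def main(printer: str = "lp") -> None:
    # Assumes a BSD-style "lpq -P printer" command exists on the server
    # and that "lp" is a valid printer name; adjust both for your site.
    result = subprocess.run(["lpq", "-P", printer],
                            capture_output=True, text=True)
    print(queue_page(printer, result.stdout or result.stderr), end="")


# Apache sets GATEWAY_INTERFACE for CGI programs, so main() runs only
# when the script is invoked as a CGI program.
if os.environ.get("GATEWAY_INTERFACE"):
    main()
```

The page is wrapped in <pre> so that the command's column layout survives. A fancier version could parse each line into an HTML table.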

In a large network environment, parts of the network might be having problems which can cause abnormal behavior. Rather than having each and every user call the support desk, you can set up a page for checking various parts of the network and assessing status.
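One simple way to build such a status page is to try a TCP connection to each important service and report whether it answers. The sketch below does only that check. The host names and port numbers in SERVICES are hypothetical placeholders, and a successful connection shows that the service is listening, not that it is working correctly.

```python
import socket


def service_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not found
        return False


# Hypothetical host names and ports; substitute your own servers.
SERVICES = [
    ("Mail server", "mail.example.com", 25),
    ("News server", "news.example.com", 119),
    ("Web server", "www.example.com", 80),
]


def status_rows(services):
    """Yield (label, 'up' or 'DOWN') pairs suitable for an HTML table."""
    for label, host, port in services:
        yield label, "up" if service_up(host, port) else "DOWN"
```

Because every check can wait up to the timeout, a page covering many services may want a shorter timeout, or results cached by a periodic job instead of being computed on every request.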

Using the WWW for Monitoring
Print queues
Network status
Who is logged in
Which machines are in use and busy
Available file systems

HTML as a User Environment

There are cases in which one would want to create a hypertext front end to system software. These might include word processing, database querying, data entry, and order processing.

Many companies are starting to give their software HTML-compatible interfaces; with Java and CGI, almost any application can have such an interface. Users then need to learn only one interface, regardless of the underlying system.


SATAN is a network security-testing package that received a lot of attention in 1995. It allows users to test for known weaknesses in the software used by Internet machines. Its developers gave it an HTML interface, so from any browser you could configure and run the tests or check their status.

Work is also under way on a low-cost (under $500) machine consisting only of a CPU, a network interface, a keyboard, and a graphics display. This machine would download HTML pages or Java applets and display them on the screen. Network managers could then standardize on a common interface (HTML) for all of the company's computing needs. It would also allow companies to replace overpriced PCs with the newer, more affordable "Net computer."

Choosing the Network

Apache is designed to run over a TCP/IP network. Whether that network is the Internet or a private network makes no difference to the server. However, some clients do not support this networking protocol and will need additional software to work with Apache.


It is possible to view simple HTML files without a network or a server. Most browsers let you open a local file and view it. You can follow links and view the page this way, but more advanced features, such as CGI scripts and SSI, will not work, because they require the server.

About TCP/IP

TCP/IP is actually two layered networking protocols: TCP and IP. IP stands for Internet Protocol and, as the name implies, is the protocol used by the Internet; it moves individual packets from one machine to another. TCP, the Transmission Control Protocol, runs on top of IP and provides reliable connections between programs on the two machines. IP can also run over PPP for dial-in access. The combination is then TCP/IP over PPP, but it is still commonly referred to as TCP/IP.

You have probably used TCP/IP before without being aware of it. Applications such as FTP (the File Transfer Protocol) and Telnet use TCP/IP transparently. You might also be familiar with the Domain Name System (DNS) or the Network File System (NFS). These also work over IP, but typically use UDP rather than TCP.


TCP is a reliable stream protocol. This means data arrives intact and in the order it was sent, because TCP checks the packets it receives and retransmits any that are lost or damaged. UDP, on the other hand, simply sends individual packets (datagrams) with no such guarantees. The software using UDP must make sure that the data gets to its destination intact.
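The difference shows up directly in the sockets interface that most TCP/IP programs use. The sketch below uses Python's standard socket module over the loopback address (127.0.0.1). It sends data once over a TCP connection (SOCK_STREAM) and once as a single UDP datagram (SOCK_DGRAM). The function names are hypothetical. On loopback the datagram will almost always arrive, but nothing in UDP guarantees it, which is why the receiver sets a timeout.

```python
import socket


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection and return what arrives.

    Keep payload small: this single-threaded demo writes everything
    before it starts reading.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)          # TCP carries a byte stream
                client.shutdown(socket.SHUT_WR)  # signal end of the stream
                chunks = []
                while True:
                    data = conn.recv(4096)
                    if not data:                 # empty read: sender is done
                        break
                    chunks.append(data)
                return b"".join(chunks)


def udp_send(payload: bytes) -> bytes:
    """Send one datagram over loopback UDP and return the datagram received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)  # UDP gives no delivery guarantee
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
        return data
```

With TCP, the receiver reads a stream until the sender closes its side. With UDP, each recvfrom returns exactly one datagram, and any acknowledgment or retransmission would be up to the application.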

One of the main advantages TCP/IP has over other networking protocols is its interconnectivity. It was designed to work over almost any type of cabling, such as fiber optic, twisted pair, or coaxial cable, and even over wireless networks. It also runs over all the common network technologies, including Ethernet, Token Ring, and FDDI. This makes it possible to use TCP/IP in practically any networking environment.

Fig. 7.1 - TCP/IP is a layered protocol.

TCP/IP is also machine-independent. UNIX machines can talk to other computers, printers, or network devices, as long as they all understand TCP/IP. The other machine doesn't need to run the same operating system or be the same type of hardware. This has helped make TCP/IP one of the most popular networking choices around.

TCP/IP is an open networking protocol, which has allowed many vendors to develop applications and systems that incorporate it. UNIX vendors have included TCP/IP for years, and it is hard to find a UNIX machine without it. It is not, however, as common on the Macintosh or PC platform, and additional software may be required to set it up. Windows 95, Windows NT, and Macintosh System 7.5 all include TCP/IP networking without third-party applications.


TCP/IP Terminology

IP address is the machine's address. It consists of four numbers, each ranging from 0 to 255 (one byte each). These numbers are called octets, and the entire address is referred to as a dotted quad. As an example, let's use 10.32.21.199. This number actually consists of two parts: the network number and the host number. Depending on the netmask, the network number is usually the first 1, 2, or 3 octets of the IP address, and the host number is the remainder.

Netmask determines how much of the IP address is the network number and how much is the host number. For example, a netmask of 255.255.0.0 tells us that the first 2 octets are the network number and the last 2 octets are the host number. The netmask is sometimes called the "subnet mask."
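Internally, the split is a bitwise operation. The address and the netmask are each treated as a 32-bit number, the network number is the address ANDed with the netmask, and the host number is the address ANDed with the inverse of the netmask. The helper names in this sketch (to_int, to_dotted, split_address) are hypothetical.

```python
def to_int(dotted: str) -> int:
    """Convert a dotted quad such as '10.32.21.199' to a 32-bit integer."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted quad: " + dotted)
    value = 0
    for o in octets:
        value = (value << 8) | o  # shift in one octet (8 bits) at a time
    return value


def to_dotted(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def split_address(ip: str, netmask: str):
    """Return (network part, host part) using the mask and its inverse."""
    addr, mask = to_int(ip), to_int(netmask)
    network = addr & mask
    host = addr & ~mask & 0xFFFFFFFF  # keep the inverse within 32 bits
    return to_dotted(network), to_dotted(host)
```

With the example address 10.32.21.199 and netmask 255.255.0.0, split_address returns the network part 10.32.0.0 and the host part 0.0.21.199.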

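Networks are conventionally split into three classes, A, B, and C, according to the first octet of the address. A class A network (first octet 1 to 126) has one octet for the network portion and three octets for the host portion. Class B networks (128 to 191) have two octets each for the network and host portions, and class C networks (192 to 223) have three octets for the network and one for the host. The previous example uses a class B-style netmask of 255.255.0.0. Strictly speaking, though, an address beginning with 10 belongs to a class A network, which here has been divided into smaller subnets.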
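The classful rules depend only on the first octet. The sketch below encodes the standard boundaries. It describes the traditional class of an address, not the netmask a site actually uses, since a network can be subnetted with a longer mask than its class default. The function name and the DEFAULT_MASKS table are hypothetical helpers.

```python
def address_class(ip: str) -> str:
    """Classify an IPv4 address by its first octet (traditional classful rules)."""
    first = int(ip.split(".")[0])
    if 1 <= first <= 126:
        return "A"    # 1 network octet, 3 host octets
    if 128 <= first <= 191:
        return "B"    # 2 network octets, 2 host octets
    if 192 <= first <= 223:
        return "C"    # 3 network octets, 1 host octet
    return "other"    # 0, 127 (loopback), and 224 and up (multicast/reserved)


# Default netmask for each class when no subnetting is done.
DEFAULT_MASKS = {"A": "255.0.0.0", "B": "255.255.0.0", "C": "255.255.255.0"}
```

For instance, 172.16.5.4 is class B with a default netmask of 255.255.0.0, and 192.168.1.10 is class C with 255.255.255.0.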

Broadcast address goes along with the netmask. It tells the machine how to talk to everyone on the local network. It is the network number with every host bit set to 1; in other words, the network number combined with the inverse of the netmask. For example, if our IP address is 10.32.21.199 and our netmask is 255.255.0.0, then our broadcast address is 10.32.255.255.
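You can check this calculation with Python's standard ipaddress module, which accepts an address together with a dotted-quad netmask. The sketch uses the chapter's example address. It also computes the broadcast address by hand, ORing the network number with the inverse of the netmask (the module calls this inverse the hostmask).

```python
import ipaddress

# The example from the text: address 10.32.21.199 with netmask 255.255.0.0.
iface = ipaddress.IPv4Interface("10.32.21.199/255.255.0.0")
net = iface.network

print(net.network_address)    # 10.32.0.0
print(net.broadcast_address)  # 10.32.255.255
print(net.num_addresses - 2)  # 65534 usable host addresses

# The same result by hand: network number OR inverse of the netmask.
manual = ipaddress.IPv4Address(int(net.network_address) | int(net.hostmask))
print(manual)                 # 10.32.255.255
```

The two excluded addresses are the network number itself (all host bits 0) and the broadcast address (all host bits 1).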

Gateways or routers connect your network to other networks. They are required to reach any network other than the one in your IP address. Complex networks may have many gateways, but most networks have a default gateway, which receives all traffic not destined for the local network.
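The decision a host makes for each outgoing packet can be sketched in a few lines. If the destination is on the local network, the packet is delivered directly; otherwise it goes to the default gateway. This is a simplification, since real hosts consult a routing table that may list several gateways. The gateway address 10.32.0.1 is hypothetical.

```python
import ipaddress


def next_hop(destination: str, local_net: str, default_gateway: str) -> str:
    """Return where a packet for destination is sent: directly, or via the gateway.

    Simplified model: only the local network and a single default route.
    """
    if ipaddress.IPv4Address(destination) in ipaddress.IPv4Network(local_net):
        return destination       # same network: deliver directly
    return default_gateway       # anywhere else: hand it to the router


LOCAL_NET = "10.32.0.0/255.255.0.0"
GATEWAY = "10.32.0.1"  # hypothetical default gateway address
```

With these settings, a packet for 10.32.40.7 is delivered directly, while one for 192.168.5.9 is handed to 10.32.0.1.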

Using TCP/IP Software

Configuring TCP/IP is no harder than configuring any other network protocol, such as Novell NetWare's IPX or Microsoft's NetBIOS. In fact, once a few essentials are understood, TCP/IP may be easier to configure. If you do have difficulty, many consultants and administrators are familiar with TCP/IP.

If you need TCP/IP software, you may want to look at one of the commercial Internet suites described in the following sections. Most of these suites contain all the software needed to start using the network, including e-mail, FTP, a Web browser, and a Telnet package.

There are also shareware and freeware versions available on the Internet and various bulletin boards. Gathering all the networking software and clients for each protocol (Web, mail, FTP) can be time-consuming; sometimes it turns out to be less expensive to go with a commercial package.

Another advantage of the commercial packages is support. If you have a problem with a public domain or shareware product, you can usually get an answer from someone, but it takes much more dedication and patience than with supported software. With commercial software, of course, it is much easier to get answers.

There are two ways to add TCP/IP to a machine: use TCP/IP as the only protocol, or use multiple protocols on the same machine. It is generally much easier to use just one protocol, but if you are already using other networking software, that may not be an option.


If your current networking software allows it, you may be able to encapsulate your existing network protocol inside IP packets. This can make it easier to configure your networking software. Check the documentation to see if this is possible.

Setting up TCP/IP on a Windows machine will require a WINSOCK.DLL file to be installed and configured. If this is the only networking protocol, then that is all that is needed. If running different networks from this machine (such as Microsoft networking or NetWare), some experimenting will be necessary in order to get both protocols to work side-by-side. Your networking software documentation should "walk you" through setting up multiple protocols.

If you purchase a commercial suite of products, it will include a WINSOCK.DLL. If you obtain a browser on its own, you will need a Winsock package such as Trumpet Winsock or a commercial TCP/IP stack.


The WINSOCK.DLL file is the TCP/IP stack on Windows.

Trumpet Winsock

The most widely used shareware Winsock package is Trumpet Winsock. It is available from many FTP sites but only for the Windows environment. Although lacking the TCP/IP clients, it does incorporate support for TCP/IP over a network using a packet driver, or over a modem via PPP or SLIP. Instructions for downloading a packet driver are distributed with Trumpet. If using a separate browser such as Mosaic or Netscape, this ought to come in handy.


A packet driver is a software driver for a network board which allows networking software to work "as-is" with any network board or topology. It usually requires a line added to your AUTOEXEC.BAT file.

If you already use NDIS or ODI network drivers, you can still use packet-driver software by downloading the ODIPKT or NDIS_PKT packages. These packages are called "shims," and they make your ODI or NDIS driver look like a packet driver.

NetManage

NetManage makes a series of TCP/IP applications, ranging from Newt, a TCP/IP stack, to the Chameleon Desktop, a full-fledged Internet suite which includes NFS client and server, X Windows emulation, and almost any Internet application conceivable.

NetManage's TCP/IP stack has support for LANs, and modems using SLIP, CSLIP or PPP. The software runs under Windows 3.1, Windows 95, and Windows NT.

NetManage also has many different packages, so one is only required to pay for what one needs.

FTP Software

FTP Software also has many different application suites. They range from PC/TCP, a TCP/IP stack with some basic applications, to OnNet, a full Internet package including an NFS client and server, e-mail, Usenet news, and WWW software.

PC/TCP runs on Windows 95, NT, 3.1, and Windows for Workgroups. It also allows DOS applications to have access to TCP/IP. The TCP/IP stack can be used over LANs as well as modems.

MacTCP

If you are using a Macintosh and need a TCP/IP stack, you will probably use MacTCP, Apple's commercial TCP/IP software. MacTCP provides the stack itself; clients such as Telnet and FTP are separate programs. The later versions are very stable, though early versions were known to have problems.

Other TCP/IP packages

There are many other companies making TCP/IP software for PC and Macintosh clients. Look around and find one that has just the utilities you need. When purchasing a TCP/IP stack for Windows, make sure it is Winsock-compliant.

Configuring TCP/IP

Configuring TCP/IP on either a Macintosh or a Windows machine requires entering a few parameters: the IP address, netmask (subnet mask), broadcast address, and gateway (router). Some installations also ask for a DNS server, which lets you use machine names instead of having to remember their IP addresses.


If you aren't sure of the answers in the configuration, check with your network manager. Incorrectly setting these not only causes your machine to communicate improperly, but can also prevent other machines on the network from properly working.
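Before typing the settings into a machine, you can check that they are at least consistent with one another. The sketch below is a hypothetical helper built on Python's ipaddress module. It checks that the address is a usable host address, that the broadcast address matches the netmask, and that the gateway is on the local network. It cannot tell you whether the values are the ones your network manager assigned. An impossible netmask such as 255.0.255.0 makes ipaddress raise ValueError.

```python
import ipaddress


def check_settings(ip, netmask, broadcast, gateway):
    """Return a list of problems in a host's TCP/IP settings (empty if consistent)."""
    problems = []
    iface = ipaddress.IPv4Interface(f"{ip}/{netmask}")  # ValueError if malformed
    net = iface.network
    if iface.ip in (net.network_address, net.broadcast_address):
        problems.append("IP address is the network or broadcast address")
    if ipaddress.IPv4Address(broadcast) != net.broadcast_address:
        problems.append(f"broadcast should be {net.broadcast_address}")
    gw = ipaddress.IPv4Address(gateway)
    if gw not in net:
        problems.append("gateway is not on the local network")
    elif gw == iface.ip:
        problems.append("gateway is this machine's own address")
    return problems
```

For the address 10.32.21.199 with netmask 255.255.0.0, the settings broadcast 10.32.255.255 and gateway 10.32.0.1 produce no warnings. A gateway of 192.168.1.1 would be flagged as off the local network.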

Choosing a Browser

One benefit of an Intranet over a public Internet site is that the company can standardize on a single browser. This lets the Web designer take advantage of advanced features such as Java, frames, or special data formats.

Since the browser is visible to all users, it is important to choose one that has the necessary features and, at the same time, will run on all the available platforms. In the following section, we will cover the more popular browsers such as Mosaic, Netscape, and Lynx, as well as browsers from some of the TCP/IP bundles.


If you don't get one of the bundled packages and instead get just the browser, you will need to get a separate TCP/IP stack.

Current Web browsers support many features, including forms, various image formats, frames, Java, server-side and client-side imagemaps, various HTML tags, and many protocols besides HTTP. Fortunately, not every company needs all these features. Some of the more helpful ones are:

  1. Forms. Forms are required to submit data. Most, if not all, current browsers support forms, though some may display them differently than others.
  2. Image formats. Common formats include GIF, JPEG, PCX, and XBM. GIF is supported by almost every browser, with JPEG a close second. There are also interlaced GIF and progressive JPEG formats, which start out blurry but get clearer as the image comes in. Interlaced images are nice, but are by no means the most important feature. XBM and PCX files are used mostly on UNIX machines and PCs, respectively.
  3. HTML tags. Some companies have added their own extensions to their browsers to make up for features missing from the HTML specification. Some of the new tags allow centering or blinking text, or changing the colors of fonts or backgrounds.
  4. Frames are a feature recently added to Netscape browsers. They allow the Web designer to split the browser display into smaller sections, called frames. Used properly, frames can make navigating the Web much easier; used improperly, they clutter the screen and make it unusable.
  5. Java is a new language designed by Sun Microsystems Inc. It allows the Web designer to add "executable content" to a Web page. Because Java programs run on the client, they can take some of the processing load off the main server, and they can do much more sophisticated things than CGI scripts.
  6. Usenet news and e-mail access allow a user to send and receive e-mail and news articles from within the browser. Some browsers allow reading news and sending e-mail, but not posting news or reading e-mail; others allow both.
  7. Imagemaps and client-side imagemaps allow the Web designer to have a graphical navigation tool. They can be used to display a floor plan and allow the user to click an area, such as an office, and obtain information about the group or person who uses it. Client-side imagemaps are imagemaps that do the processing on the client side instead of the server, thus reducing the load on the server.
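Whether the processing happens in the browser (client-side) or in an imagemap program on the server, the core job is the same: find which region of the image contains the clicked point. The sketch below shows that hit test for rectangles and circles, with the first matching region winning. The floor-plan coordinates and region names are hypothetical, measured in pixels from the image's top-left corner.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: int, y: int) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class Circle:
    name: str
    cx: int
    cy: int
    radius: int

    def contains(self, x: int, y: int) -> bool:
        # Compare squared distances to avoid a square root.
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.radius ** 2


def lookup(regions, x, y, default=None):
    """Return the name of the first region containing (x, y), else default."""
    for region in regions:
        if region.contains(x, y):
            return region.name
    return default


# Hypothetical floor plan: coordinates are pixels from the top-left corner.
FLOOR_PLAN = [
    Rect("room-101", 0, 0, 99, 79),
    Rect("room-102", 100, 0, 199, 79),
    Circle("conference", 250, 40, 35),
]
```

A click at (150, 30) falls in room-102. A click outside every region returns the default, which a real map would typically send to a general page.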


The browser market changes very quickly. Before deciding on a browser, recheck the market for new browsers or enhancements to existing browsers.

The Mosaic Browser

Mosaic was the first widely used graphical WWW browser and runs on the common platforms (Windows, Macintosh, and UNIX), so it is discussed first.

Mosaic was developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois in order to handle the recently created WWW. Mosaic was the first browser to include images, sound, and text in a browser environment (see fig. 7.2).

Fig. 7.2 - Mosaic captured most of the early browser market.

The latest version has a built-in e-mail sender, hotlist manager, and news reader. It also supports forms, imagemaps, and most HTML tags. It does not, however, allow posting news or reading e-mail. It can display GIF, JPEG, and XBM images inline. Mosaic must download the entire page before displaying it and, therefore, has no support for interlaced images. The newer Windows versions require a 32-bit version of Windows, such as Windows 95 or NT. For earlier versions of Windows, you can download Microsoft's 32-bit extension, Win32s.

Netscape's Navigator

Netscape Navigator is considered the leading browser and now claims over 70 percent of the browser market. Like Mosaic, it has versions for Windows, Macintosh, and UNIX, but the current versions are no longer free. The last free version was 0.9, and it could be used only for noncommercial purposes. Navigator 2.0 has many new features, such as Java, frames, extended HTML tags, client-side imagemaps, and interlaced images. See figure 7.3.


Netscape has added some extra tags to allow for more formatting options. These include tags to center text, change text size or color, cause sections of text to blink and also allow background images. These are called Netscape extensions.

Fig. 7.3 - Macmillan Computer Publishing's site viewed with Netscape.

Netscape has support for a complete e-mail system and can filter mail by subject, date, sender, or size. It lets the user do almost anything that can be done in a dedicated e-mail package, and hypertext links in messages can be followed with a click of the mouse (see fig. 7.4).

Fig. 7.4 - Netscape's e-mail system.

In addition to the e-mail system, there is also a built-in news system. It allows both reading and posting and can be sorted by subject, date and sender. The news system also allows hypertext linking. This hypertext linking helps make Navigator one of the easiest-to-use Internet systems around.

Another nice feature of Navigator is incremental loading, which means that as the page is downloaded, it is displayed. This allows the user to start reading the text before the entire page is read in. On a fast LAN connection, this may not be as important, but on an internal network made up of slow WAN links, this will most certainly be advantageous.

Lynx Text Browser

Lynx is a fast, text-only browser developed at the University of Kansas. It is very useful in environments that use dumb terminals, which cannot display graphics (see fig. 7.5). Lynx is also useful over slow links, since it does not have to download graphics.

Fig. 7.5 - Accessing the Macmillan site with Lynx.

Lynx has support for forms but cannot display inline graphics or imagemaps. Lynx can download the graphic and view it with an external viewer; however, the terminal being used must be able to display graphics, and the viewer must be configured separately.

Lynx allows the reading of news and the sending of e-mail, but is not a full-fledged news or e-mail system like Navigator. For sites with dumb terminals, Lynx is one of the few existing alternatives.

Microsoft's Internet Explorer

Microsoft's Internet Explorer is an impressive WWW browser. It supports forms and imagemaps, as well as almost all the HTML tags that Netscape does, including centering, font color, backgrounds, and font sizing. See figure 7.6.

Explorer also supports GIF and JPEG images and audio. Microsoft has decided to license the Java language from Sun Microsystems, and future versions are expected to support it.

Fig. 7.6 - This is what Microsoft Internet Explorer looks like.

Explorer supports tables, but it has problems with some tables that Netscape can display. If your company standardizes on Explorer, though, it is easy to design pages around this.

Microsoft has also announced a version of Explorer for the Macintosh platform.

Chameleon's WebSurfer

The browser that ships with the Chameleon software has all the basic features, such as GIF images and most HTML tags. It does not, however, support Java.

Fig. 7.7 - Macmillan's site viewed using WebSurfer.

WebSurfer also incorporates a news reader and supports sending of e-mail from the browser.

Network Topology and Speed

Most LANs (Local Area Networks) can transfer at least 4 Mbps (megabits per second), which is plenty fast enough for transferring even the largest text files. If the network includes WAN (Wide Area Network) links, transferring pages over them may create bottlenecks.

If using large graphic or audio files, you will experience delays on all but the fastest LAN technologies. Large files will also take more space on the server and place a strain on the client and server machines.

WAN links are generally much slower than LAN connections, usually no more than 1.5 Mbps (a T1 line) and commonly only 56 kbps. Some dial-up links run at only 14.4 kbps or 28.8 kbps. If there are many such connections, it is important to keep pages small.
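The effect of link speed is simple arithmetic: transfer time is roughly the page size in bits divided by the link speed in bits per second. The sketch below uses a hypothetical helper, xfer_seconds, and ignores protocol overhead and other traffic on the link, so real transfers will be somewhat slower than these figures.

```shell
#!/bin/sh
# xfer_seconds BYTES BITS_PER_SECOND
# Rough transfer time in seconds: (bytes * 8) / link speed.
# Protocol overhead and competing traffic are ignored.
xfer_seconds() {
    awk -v bytes="$1" -v bps="$2" 'BEGIN { printf "%.1f\n", bytes * 8 / bps }'
}

# A 100 KB page (102,400 bytes) over a 4 Mbps LAN, a T1,
# and 56, 28.8, and 14.4 kbps lines.
for bps in 4000000 1544000 56000 28800 14400
do
    echo "$bps bps: $(xfer_seconds 102400 $bps) seconds"
done
```

For a 100 KB page this gives about 0.2 seconds on the LAN, 0.5 seconds on the T1, 14.6 seconds at 56 kbps, 28.4 seconds at 28.8 kbps, and 56.9 seconds at 14.4 kbps.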

Even if you have a fast network that isn't separated by WAN links, it makes sense to place the Apache server as close to the center of the network (the backbone) as possible. This keeps the path between the server and its users short, which speeds up page delivery and improves overall network performance (see fig. 7.8).

Fig. 7.8 - The server should be in the center of the network.

If many groups are using parts of the Web server, you can split the Web space across multiple servers to increase performance (see fig. 7.9). This allows each server to be placed as close as possible to its main users.

Fig. 7.9 - You might want to split your Web server.

Disk Space

If you are using only HTML files, very little disk space is needed. HTML files are simply ASCII files and don't take up much room.

Once graphic or audio files are added, the disk space requirement grows accordingly. An average graphic file can take several megabytes, and some file formats can take over 30 MB per file. Audio files take several megabytes for each minute of sound. These types of formats require not only more space but also more network bandwidth, as discussed in the previous section.

When indexing your Web space, don't forget the space required for the index database, which can be as much as half the combined size of the documents it indexes. Log files can also take up a lot of space; log files of up to 10 MB a month are not uncommon.
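To see how much of your disk space goes to each kind of file, you can total file sizes by extension. This is a minimal sketch: bytes_by_ext is a hypothetical helper name, and it assumes file types can be told apart by their extensions (.html, .gif, and so on).

```shell
#!/bin/sh
# bytes_by_ext DIR EXT
# Total size, in bytes, of every file under DIR whose name ends in .EXT.
bytes_by_ext() {
    find "$1" -type f -name "*.$2" |
    while read -r file
    do
        wc -c < "$file"          # size of one file in bytes
    done |
    awk '{ total += $1 } END { print total + 0 }'
}
```

For example, bytes_by_ext /usr/local/etc/httpd/htdocs gif totals the GIF files under that DocumentRoot; substitute your own DocumentRoot path.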

Document Formats

Most companies are made up of different departments and different machine types. Marketing groups may have Macintosh machines, Engineering might have UNIX machines, and there is almost sure to be a wide array of PCs in use.

There are also different formats on each machine. PC users are likely to use WordPerfect or Microsoft Word to create files. UNIX users may use text editors or FrameMaker to create memoranda, while Macintosh users may use still other formats.

Deciding on a common data format is not easy and, in some cases, not possible. Next, we shall take a look at the various formats and their associated benefits.

HTML

HTML was designed to overcome the cross-platform problem, but it offers only limited control over formatting. Netscape and others have added their own tags to help, but these tags display properly only in browsers that support them.

Many word-processing companies have developed (or are developing) ways to save their documents as HTML. Microsoft, FrameMaker, Corel, and others offer filters or templates that you can use to create HTML files with their software.

These converters and tools are covered in Chapter 10, "HTML Editors and Tools."

There are also utilities that convert many formats to HTML. Not all documents convert easily, though, and sometimes it is easier to re-create a document than to try to convert it.

Microsoft Word

Microsoft Word is a very popular word processor, having the advantage of being available on both the Mac and PC platforms. It allows graphics and various fonts and is very user friendly.

Microsoft Word files, however, differ between versions, and early versions can't read files saved by newer versions. Microsoft offers a Word viewer, but it is not available for UNIX machines. Still, Word remains a good choice for companies that use Windows or Macintosh machines exclusively.

PostScript

PostScript is a very powerful document language. It is also very popular, and PostScript viewers are available for the major platforms. Almost all word-processing and drawing packages can save files as PostScript, making PostScript seem like a good choice.

PostScript files tend to be quite large, especially if they contain many images. Although PostScript viewers are quick on high-end machines, they can be slow on less powerful ones. In spite of this, PostScript is another very good alternative for documents that need more formatting than HTML allows.

Adobe PDF

Adobe Acrobat (PDF) files allow extensive formatting and can be viewed on all the popular platforms, such as PC, Mac, and UNIX. PDF viewers are available free of charge over the Internet.

Like PostScript, however, PDF files tend to be large and load slowly.

Images

It is also possible to save all documents in an image format such as GIF or JPEG. This allows easy viewing on all the various platforms except dumb terminals.

Image files are, unfortunately, very large and would use much more space and network bandwidth than other formats. Also, since image files don't contain ASCII text, they cannot be indexed or searched.

A Combination of Formats

The best alternative is to use a combination of formats to achieve the required result. Any text that does not need special formatting (such as memoranda or notes) can be written in HTML or converted to HTML. Documents that require specific formatting can be stored in PostScript or image formats to keep their look intact.

Whenever possible, keep the number of formats to a minimum. If your company uses only PCs, stick to HTML and Word files. If everyone has a fast UNIX machine, use PostScript and HTML.

Setting Up the Web Documents

This section will cover setting up your access-control mechanism and also go over some guidelines to help you set up some of the applications that will make your Web server even more useful.

Managing Content

When setting up documents, consider how to structure them so that they are easy to manage. It often makes sense to group documents by department. Even though hypertext links, rather than directories, determine how users move between documents, you should still try to keep the links and the directory layout closely matched.

It is also crucial to decide who will be able to change files and create new topics. Will anyone be able to add a new topic? Who should be allowed to change pages? Do all files need to be approved? These questions must be answered in advance of starting to add files to the server.

Organizing Hierarchy

Chapter 5, "Apache Configuration," gives more information about the DirectoryIndex directive.

Apache has no document-control feature, but the DirectoryIndex directive can help. Because each directory gets its own index page, the directory layout mirrors the Web pages, which makes documents easier to find when you are working outside a Web browser, for instance when editing files (see fig. 7.10).

Fig. 7.10 - Setting up a directory hierarchy.

One of the best ways to organize a Web site is to set up a directory for each major topic under the DocumentRoot. The DirectoryIndex directive then gives each topic an "index page," which references the other files stored in that directory or points to a subdirectory containing another "index page." This file hierarchy makes it easy to organize and maintain the Web server's content.
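Setting up a new topic directory can be scripted. The sketch below is a hypothetical helper, make_topic, which assumes DirectoryIndex names index.html (the usual setting); it creates the directory and a starter index page, and leaves any existing index page alone.

```shell
#!/bin/sh
# make_topic DOCROOT TOPIC TITLE
# Create DOCROOT/TOPIC with a starter index.html, assuming DirectoryIndex
# names index.html so that a URL ending in /TOPIC/ returns this page.
make_topic() {
    dir="$1/$2"
    mkdir -p "$dir" || return 1
    # Never overwrite an index page someone has already written.
    [ -f "$dir/index.html" ] && return 0
    cat > "$dir/index.html" <<EOF
<HTML><HEAD><TITLE>$3</TITLE></HEAD>
<BODY>
<H1>$3</H1>
<UL>
<!-- Add links to this topic's pages and subdirectories here. -->
</UL>
</BODY></HTML>
EOF
}
```

Running make_topic "$DOCROOT" hr "Human Resources" and then make_topic "$DOCROOT" hr/benefits "Benefits" builds a two-level hierarchy; the links from each index page to its subtopics are still added by hand.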

Getting Files on the Server

If you already use NFS, there is an easy way to get your documents onto your Web server: set up the export list on the Web server so that developers can mount the DocumentRoot directory on their machines.

NFS may not be desirable in a high-security environment because of its lack of authentication: PC and Macintosh NFS clients can pretend to be any user. Most UNIX NFS servers do not let a remote client act as root, but care must still be taken when exporting via NFS.

See Chapter 5, "Apache Configuration."


Exporting a file system must be done with care. If an unauthorized user can access DocumentRoot via NFS or FTP, then any server-access control is circumvented.

If you can't use NFS, FTP is another logical choice. FTP lets the user cd to the DocumentRoot directory and put the files in place. FTP is more secure than NFS, since it asks for a login and password. It does, however, send the login name and password unencrypted across the network, which may be unacceptable in some companies. Security is covered in more detail later in this chapter.


Both of these methods require UNIX permissions that give the user write access to the directory.

Access Control Methods

The first thing you need to do is decide who will be able to change the various documents. There are three types of access control you can use for your Web server: Open, Distributed, and Centralized. Before looking at each one in detail, we briefly discuss UNIX file permissions.

UNIX File Permissions

UNIX file permissions control who can do what to a file or directory. They are the main way to protect the document files on your Web server, so it is important to discuss them in some detail.

UNIX permissions have three different levels of access to define who can do what to a file or directory:

  • The owner of the file (User)
  • A group of users who have access (Group)
  • Everyone else (Other)

There are also three different things a user can do to a file:

  • View the contents of a file (Read)
  • Change the contents of a file (Write)
  • Run the file or program (eXecute)

When the permissions are applied to a directory, they have slightly different meanings:

  • List the contents of a directory (Read)
  • Create, delete, or rename files in the directory (Write)
  • Access files in the directory (eXecute)

Each kind of access has a number: Read is 4, Write is 2, and eXecute is 1. To determine the permissions, add up the numbers. For example, Read and Write access is 6 (4+2).
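The arithmetic can be done mechanically from the rwx strings that ls -l displays. This sketch uses two hypothetical helper functions, perm_digit and octal_mode; it is for illustration only, since chmod itself accepts the numbers directly.

```shell
#!/bin/sh
# perm_digit RWX: turn one three-character group such as "rw-"
# into its number (Read = 4, Write = 2, eXecute = 1).
perm_digit() {
    n=0
    case $1 in r??) n=$((n + 4)) ;; esac
    case $1 in ?w?) n=$((n + 2)) ;; esac
    case $1 in ??x) n=$((n + 1)) ;; esac
    echo "$n"
}

# octal_mode RWXRWXRWX: convert a nine-character permission string,
# as shown by ls -l, into the three digits chmod expects.
octal_mode() {
    u=$(printf '%s\n' "$1" | cut -c1-3)   # User
    g=$(printf '%s\n' "$1" | cut -c4-6)   # Group
    o=$(printf '%s\n' "$1" | cut -c7-9)   # Other
    echo "$(perm_digit "$u")$(perm_digit "$g")$(perm_digit "$o")"
}
```

For example, octal_mode rwxrwx--x prints 771 and octal_mode rw-r--r-- prints 644.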

The UNIX command to change permissions on a file or directory is chmod ugo <file>, where u is the digit for the User, g for the Group, and o for everyone else (Other). For example, to let the user and group list and add files in a directory called "hr", while everyone else can only get files from it, use "chmod 771 hr". To let the user change a file called "policy1" while the group and everyone else can only read it, use "chmod 644 policy1". "chmod 755 policy1" would also work: for HTML files the execute bit doesn't matter, unless you are using the XBitHack directive.

See Chapter 5, "Apache Configuration," for more details about this directive.


The XBitHack directive tells Apache to parse an HTML file for Server Side Includes when the file's user-execute bit is set; with "XBitHack full", setting the group-execute bit also makes Apache send a Last-Modified header.


Different versions of UNIX behave slightly differently. If you are having problems, consult your UNIX documentation.
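You can try the "hr" and "policy1" examples safely in a scratch directory before changing anything under the real DocumentRoot. This sketch assumes a mktemp that supports -d (as on most current UNIX systems) and places policy1 inside hr for convenience.

```shell
#!/bin/sh
# Reproduce the chmod examples in a scratch directory so that
# nothing under the real DocumentRoot is touched.
demo=$(mktemp -d) || exit 1
mkdir "$demo/hr"
chmod 771 "$demo/hr"          # user and group: rwx; everyone else: x only
: > "$demo/hr/policy1"        # create an empty file
chmod 644 "$demo/hr/policy1"  # user: rw; group and everyone else: r
ls -ld "$demo/hr" "$demo/hr/policy1"
```

The listing should show drwxrwx--x for hr and -rw-r--r-- for policy1.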

Now let's apply UNIX permissions to our DocumentRoot to set up different access policies.

Open Access

This type of access control is the easiest to set up and maintain. It allows anyone to change any file in your DocumentRoot. To set it up, change the permissions on all the files and directories in the DocumentRoot directory tree. In SunOS 4.1, you can cd to DocumentRoot (DocumentRoot is defined in the srm.conf file) and run "chmod -R 777 ." (the final period stands for the current directory). This is the chmod command we have seen before; the -R option tells it to apply the change recursively.


Setting permissions so that anyone can read, write or delete any file makes adding documents very easy. However, it can lead to problems if unauthorized users can get access to your file system. With the open access model, these users could change or remove any file on your Web server.

The open-access model also tends to become very confusing, since there is no central authority organizing where information is found.
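Whatever access model you use, it helps to check periodically which files and directories anyone on the system can change. A minimal sketch, using a hypothetical helper named world_writable built on the standard find command:

```shell
#!/bin/sh
# world_writable DIR: list every file and directory under DIR that
# anyone on the system can change (the "other" write bit is set).
world_writable() {
    find "$1" -perm -002 -print
}
```

Running world_writable on your DocumentRoot should list only the areas you have deliberately opened up.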

Distributed Access

Under distributed management, several developers jointly manage the content of their area, or a single developer manages a particular area. The server administrator delegates permissions to the lead developers of a project while keeping control of the home page and other areas of the server. A lead developer can then delegate responsibilities to other developers.

Using distributed access allows the server to maintain structure and also allows users to add documents through responsible developers.

To have a directory administered by a single developer, the server administrator changes the ownership of the directory to that developer with the chown command, for example "chown richc hr-docs". The directory permissions should be set to 711 so that the server user (defined by the User directive in httpd.conf) can get files from the directory.

If the directory is to be managed by more than one developer, create a UNIX group containing the users who control this area, give the directory to that group (with the chgrp command), and set the permissions to 771.
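Assuming the group already exists (creating one is system-specific, usually by editing /etc/group), handing a directory to it takes two commands. share_dir is a hypothetical helper name. Unless you are root, you can only chgrp a directory you own, and only to a group you belong to.

```shell
#!/bin/sh
# share_dir DIR GROUP: give DIR to an existing group of developers.
share_dir() {
    chgrp "$2" "$1" || return 1
    chmod 771 "$1"   # owner and group: rwx; everyone else (the server): x
}
```

For example, share_dir hr-docs hrweb, where hrweb is a hypothetical group containing the HR developers.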

Any files created can be set so that only the owner can change them (chmod 644) or so that anyone in the same group can change them (chmod 664), depending on how your policy is defined. The important thing is to make sure that the server user has at least read permission on the files; otherwise, they will not show up in your Web tree.
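To catch files that the server cannot read, you can look for files without the "other" read bit. This sketch assumes the server user (set by the User directive, often nobody) is neither the files' owner nor a member of their group, so the "other" bits are the ones that apply; unreadable_by_others is a hypothetical helper name.

```shell
#!/bin/sh
# unreadable_by_others DIR: list regular files under DIR that lack
# the "other" read bit and so may be invisible to the server user.
unreadable_by_others() {
    find "$1" -type f ! -perm -004 -print
}
```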

Centralized Access

Centralized access requires any changes to documents to go through the server administrator, or a central authority.

Using this access policy keeps the structure of the Web consistent at all times. However, since only one person can make changes, adding documents can be slow, and the workload can be hard to administer.

Directory permissions for centralized control are simple. All the files and directories under DocumentRoot are owned by one person and set so that only that person has write permissions. All files should be 644 and all directories need to be 711.
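Applying these permissions to an existing tree is a job for find. A minimal sketch, to be run as the single owner of the tree; lock_down is a hypothetical helper name, and the {} + form of -exec assumes a reasonably modern find.

```shell
#!/bin/sh
# lock_down DIR: centralized-access permissions for the whole tree.
# Directories become 711 (owner rwx, others may only pass through);
# files become 644 (owner rw, everyone else read-only).
lock_down() {
    find "$1" -type d -exec chmod 711 {} + &&
    find "$1" -type f -exec chmod 644 {} +
}
```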

Using Multiple Access Methods

Often it is desirable to put different sections under different access policies. It is common to have the top-level pages under centralized control and let each department maintain its own Web documents. It is also a good idea to set up an area where anyone can create files.

To let users change specific files without letting them add new ones, set the directory to 711 (owned by the administrator) and have the files in it owned by specific users, with permissions set to 644. The administrator can create each file and then use the chown command to give it away.


Some UNIX systems don't allow the transfer of ownership to other users, unless logged in as root.

Adding Useful Features

Earlier in this chapter, we discussed some features to add to an internal Web server to make it more useful. In this section, we will discuss strategies for their implementation.

Bulletin Boards

A bulletin board can be as simple as a writable HTML page. Users can edit the file and add notices, and view the page in their browser to read them. Most browsers allow searching the current page for keywords.

A better way is to create a fill-out form and use a CGI script that creates a separate page for each notice and adds a link to it on the main bulletin board page. The main page should list the notices by subject and link to a search page, which should let users search by subject, date, author, or full text. See figure 7.11.

Fig. 7.11 - A Bulletin board page.
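The back end of such a CGI script might look like the sketch below. It is hypothetical throughout: it assumes the CGI script has already decoded the form into a subject and a body (not shown), stores each notice as notice-N.html, and appends a link to a file named list.html, which the main page could pull in with an SSI such as <!--#include file="list.html" --> if that page is parsed for SSIs. It does no locking, so two simultaneous posts could collide.

```shell
#!/bin/sh
# Escape the characters that are special in HTML (& first).
html_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# post_notice BOARD_DIR SUBJECT BODY (hypothetical helper)
post_notice() {
    board=$1
    subject=$(html_escape "$2")
    body=$(html_escape "$3")
    # Number notices sequentially: notice-1.html, notice-2.html, ...
    n=1
    while [ -e "$board/notice-$n.html" ]
    do
        n=$((n + 1))
    done
    # One page per notice.
    {
        printf '<HTML><HEAD><TITLE>%s</TITLE></HEAD><BODY>\n' "$subject"
        printf '<H1>%s</H1>\n<P>%s</P>\n' "$subject" "$body"
        echo '</BODY></HTML>'
    } > "$board/notice-$n.html"
    # Add a link to the list shown on the main bulletin board page.
    printf '<LI><A HREF="notice-%s.html">%s</A>\n' "$n" "$subject" >> "$board/list.html"
}
```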

Handbooks and Newsletters

Creating or converting an employee handbook can be a good way to save money. It allows changes to be made easily and whenever needed, instead of waiting until the next reprint. Using search capabilities can help save employees time by allowing the computer to search through the text.

First, create the Table of Contents page with links to the other pages. Each chapter should be in a separate directory and each section should be a separate page. Use the DirectoryIndex directive to make things easier to understand. See figure 7.12.

Fig. 7.12 - A sample table of contents.
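The table of contents can also be regenerated from the directory layout. This hypothetical make_toc helper assumes one directory per chapter and one .html file per section, uses file names as link text, and links each chapter to its directory so that DirectoryIndex serves the chapter's own index page.

```shell
#!/bin/sh
# make_toc HANDBOOK_DIR: write HANDBOOK_DIR/index.html with one entry
# per chapter directory and one link per section page inside it.
make_toc() {
    book=$1
    {
        echo '<HTML><HEAD><TITLE>Table of Contents</TITLE></HEAD><BODY>'
        echo '<H1>Table of Contents</H1>'
        echo '<UL>'
        for ch in "$book"/*/
        do
            [ -d "$ch" ] || continue        # no chapter directories yet
            name=$(basename "$ch")
            # Chapter link: DirectoryIndex serves the chapter's index page.
            printf '<LI><A HREF="%s/">%s</A>\n<UL>\n' "$name" "$name"
            for sec in "$ch"*.html
            do
                [ -f "$sec" ] || continue
                s=$(basename "$sec")
                [ "$s" = index.html ] && continue
                printf '<LI><A HREF="%s/%s">%s</A>\n' "$name" "$s" "$s"
            done
            echo '</UL>'
        done
        echo '</UL>'
        echo '</BODY></HTML>'
    } > "$book/index.html.new" && mv "$book/index.html.new" "$book/index.html"
}
```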

If your company's browsers support frames, you can create a special frame for the table of contents so users can easily navigate through the manual.

For more information about search capabilities, see Chapter 15, "Search Engines and Annotation Systems."

Create a search page to make it easy to find references to specific topics.

Adding Business Forms

Forms are covered in detail in Chapter 12, "HTML Forms."

Adding business forms is just like adding any other type of form. Try to make the electronic version look like the paper version so people won't get confused.

See Chapter 17, "Database Access and Applications Integration."

It is also possible in some cases to create a CGI script to automate the handling of the form. This can be as simple as e-mailing someone the information on the form or as complex as adding a record to a database.


Take care when having your CGI script make changes; an incorrect script can remove records or make unintended changes. It can also be a security problem if users can make changes they are not entitled to make.

Workgroup Pages

Setting up an area for a specific department can make the WWW a very powerful tool. Different departments might have different ideas on what they will use their area for, but a few common uses are: to track project status, to store documentation (knowledge base) and to introduce the members.

Tracking project status can be done by creating a page with deadlines on it. There should be a form for developers to submit changes such as missing a deadline or finishing part of the project early.

Having all the documentation, notes, meeting minutes and memoranda for a project in a central area makes communication easier.

Setting a directory aside for each project and creating a search capability for it will allow employees to spend more time working on a project and less time searching for relevant information.

A Web page is also a good place to put pages about each team member and what that person's specialty is. Contact information such as phone numbers or e-mail addresses is also helpful.

Discussion Forums

There are several ways to get a discussion area created. The first and easiest way is to create a writable page and have users add comments to it. This, however, is not very interactive or easy to use.

A better way is to use Usenet newsgroups for discussion. Setting up a news server is easy, and it allows many different discussions to go on at once.


Most browsers allow you to read news by using a special URL. news:local.eng.sw would generate a list of news articles in the group local.eng.sw.

Some browsers also allow posting news from inside the browser; others only allow reading. If your browser can send e-mail but not post news, you can install a mail-to-news gateway and still use newsgroups for discussion.


Mailing list software often has a news gateway. One of these is listproc. Listproc can be downloaded from ftp://cs-ftp.bu.edu/pub/listserv.

There is also a Perl script called mail2news, which can be found at ftp://relay.cs.toronto.edu/pub/moraes.

Monitoring Tools

It is often desirable to be able to check on the status of a print job from a Web browser. Using CGI scripts or SSI (and a little creativity), this can be easily done. See figure 7.13.

Create a network status page that contains a list of printers along with how many print jobs they have in them. You could also list who owns the print jobs. Under SunOS, use the lpc and lpq commands to get this information. Other versions of UNIX may use other commands.

Fig. 7.13 - Pages can show the status of all the print queues.

For more information about SSIs, see Chapter 13, "CGI Scripts and Server APIs."

Here is a simple script to list who is using the printers. It is to be used as a Server Side Include.

Listing 7.1 Showing the Status of Print Queues

#!/bin/sh
# This script is an example. It is meant to be used as an SSI.
#
# Cycle through the printers and show each printer's
# descriptive name and its queue.
# "lpc status all" prints one "name:" line per printer, so
# the list always matches the current /etc/printcap.
for name in `lpc status all | grep : | sed 's/://g'`
do
    # In our printcap each printer has a short name, a longer
    # name, and a descriptive (third) name, for example:
    #lp|line| The Line Printer:\
    #ls| lascii| QMS 860:\
    #lw|lpost| NEC Silent Writer:\
    #lf|lproff| QMS 860:\
    # Match "name|" so one printer's name cannot match the start
    # of another's, then keep the third field up to the colon.
    NiceName=`grep "^$name|" /etc/printcap |
        awk -F'|' '{ print $3 }' |
        awk -F: '{ print $1 }'`
    echo "$NiceName<p>"
    echo '<PRE>'
    lpq -P"$name"
    echo '</PRE>'
done

In your network status page, it is good to include a list of machines on the network - placing a green dot if the machine is accessible, for example, or a red dot if it isn't. Using the ping command, one can check to see if the machine is alive or not.

Here is a simple SSI that tests whether each machine is running. It needs two graphics, greendot.gif and reddot.gif, which show whether a machine can be reached on the network.

Listing 7.2 Testing Machine Response

#!/bin/sh
# Ping each machine and see if it answers.
#
# ping might be in a different spot on other UNIX versions.
# This is correct for SunOS.
PING=/usr/etc/ping
# Time to wait, in seconds, before deciding a machine is
# really down. If TIMEOUT is too long, your SSI will take
# a long time to load.
TIMEOUT=2
for name in machine1 machine2 machine3
do
    # SunOS ping prints "<name> is alive" when the machine
    # answers. Quoting the result keeps the test valid even
    # when ping prints nothing (for example, an unknown host).
    status=`$PING $name $TIMEOUT 2>/dev/null | awk '{ print $3 }'`
    if [ "$status" = "alive" ]
    then
        echo '<img src="/images/greendot.gif" alt="Alive"> '"$name"'<BR>'
    else
        echo '<img src="/images/reddot.gif" alt="Down"> '"$name"'<BR>'
    fi
done


Ping checks the network connection to a machine. It does not check to see if the machine is really working properly. Most versions of ping allow a time-out value. This is important since ping normally waits many seconds before realizing a machine is down. This would cause your page to load very slowly.

It is possible, using SSI and CGI programs, to make almost any UNIX command into a useful page. These examples are just possibilities, and not a definitive list of what is possible.

Protecting Your Data from Outside Access

Once you decide to develop an internal Web site, you need to make sure outsiders have no access to it. This should be one of your top concerns. This section covers the various ways to protect your data.

Security Through Obscurity

One of the easiest ways to reduce the risk of someone getting your internal data is simply not to tell anyone it is there. This is called "security through obscurity," and it is the least effective method. If you need to keep out anyone more determined than a casual browser, use stronger security.

The first way to try to hide your server is to give it an unusual name. Most companies call their Web server www.company.com, and that is the first place any determined cracker will look.


If you are trying to hide the server name, do not post news, send e-mail, or run a Web browser from the server. Doing so may cause the machine name to show up in other systems' logs, and those logs may be indexed automatically by a search robot.

For more information about the inetd.conf file, see Chapter 5, "Apache Configuration."

The second way to make your server less likely to be found is to run it on a nonstandard port. Ports can range from 0 to 65,535, so there is a wide range to choose from. Generally, the first 1024 are considered reserved ports. Make sure the server isn't running on a port that is already in use. The Port directive (in your httpd.conf file) defines what port you are running on if you are not running from inetd. If you are running your server from inetd, this is defined in your inetd.conf file.


The term reserved port means that only root is allowed to run a server on it. It does not mean that the port is in use. Reserved ports have no meaning on a PC or Macintosh, since anyone can run on any port with those platforms.


Hiding the server will not stop anyone determined to get your data. There is software that can find any server, however obscure its name, by probing every IP address on your network. Running on a port other than 80 will slow people down, but there is also software that will scan all 65,535 ports in a few minutes.

Using the Software To Restrict Access

Chapter 5 covered different security features, including the <Limit> directive. Inside a <Limit> section, the allow and deny directives can restrict the server to answering requests only from specific IP address ranges.

To use <Limit> this way, allow GET and POST requests only from your own IP address range. For example, if your network is 10.32.21.0, the <Limit GET POST> section should contain "order deny,allow", "deny from all", and "allow from 10.32.21". Apache matches a partial address such as 10.32.21 against the leading bytes of the client's address, so no wildcard is used.
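In an allow from line, a partial address such as 10.32.21 matches client addresses that begin with those whole bytes. The sketch below mimics that rule for illustration only; it is not Apache's own code, and in_net is a hypothetical name. Note that 10.32.2 does not match 10.32.21.7, because the match must end on a byte boundary.

```shell
#!/bin/sh
# in_net PARTIAL ADDRESS
# Succeed if ADDRESS begins with the whole bytes in PARTIAL,
# e.g. in_net 10.32.21 10.32.21.7 succeeds.
in_net() {
    prefix=${1%.}                   # accept "10.32.21." as well as "10.32.21"
    case $2 in
        "$prefix") return 0 ;;      # exact address
        "$prefix".*) return 0 ;;    # prefix followed by a byte boundary
    esac
    return 1
}
```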


Using the Apache server to protect access only stops attacks against your Web server. It does not prevent people from getting your data through other methods, such as FTP or NFS.

Intruders can also use IP spoofing to trick your Web server into thinking they are part of your network. IP spoofing makes a remote machine appear to be on the trusted local network.

Another software product that helps protect a server is TCP wrappers, which can be downloaded from the Internet free of charge. TCP wrappers let you define who can connect to various services, offer some protection against IP spoofing, provide logging, and can send different messages to different IP addresses. They can be used only if you run the server from inetd.

Firewalling the Internet

Firewalls are the best defense from the Internet. There are different types of firewalls and each has its advantages and disadvantages. We will cover the basic design philosophies and how to use them.


The Internet can be a source of unauthorized access, but it is not the only way intruders can get in. Modems on people's machines can unintentionally be set up to allow crackers to get into the network.

Screening Routers

Using routers to block access to all machines and ports is one of the more common ways to protect your internal network. You can use a deny-all policy or an allow-all policy.

The deny-all policy says, in effect, to let nothing in except the services that are necessary. This is the most secure method, since you know exactly what can get into your network. For example, to allow e-mail into the network, allow port 25 (SMTP). See figure 7.14.

Fig. 7.14 - Screening routers can be used to allow some protocols to pass while refusing others. HTTP can pass, Telnet cannot.

The allow-all policy defines certain services that won't be allowed. For example, to prevent people from logging in, one would deny Telnet and rlogin services. This is not as safe as deny-all, because some services may be overlooked, leaving a backdoor open.
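The difference between the two policies shows up with a service nobody thought about. This sketch models each policy as a function of the TCP port number; the port lists are only examples, and deny_all and allow_all are hypothetical names, not router commands.

```shell
#!/bin/sh
# Deny-all: pass only the services listed (here SMTP and HTTP).
deny_all() {
    case $1 in
        25|80) echo pass ;;
        *)     echo block ;;
    esac
}

# Allow-all: block only the services listed (here Telnet and rlogin).
allow_all() {
    case $1 in
        23|513) echo block ;;
        *)      echo pass ;;
    esac
}

# Port 6000 (the X Window System) was never considered.
echo "deny-all: $(deny_all 6000)   allow-all: $(allow_all 6000)"
```

Deny-all blocks the forgotten service, while allow-all lets it through; that is the backdoor the allow-all policy can leave open.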

Screening routers can also restrict certain types of traffic to specific machines. For example, if e-mail does not need to reach every machine, limit e-mail access to the one machine that handles it.

It may also be desirable to limit who can access the Internet from inside the company. It would be possible for employees to, knowingly or not, transfer information outside of the company. Blocking access to the Internet is discussed in a later section.


Always limit as much as possible, to reduce your risk. Only allow the minimum services through your router that are absolutely necessary and, then, only to the machines that must have them.

Application Gateways

Instead of using a router to pass or deny traffic, it is possible to have programs that decide whether or not to allow specific commands. These are commonly called application gateways, since they are specific to the application they are running (see fig. 7.15).

Fig. 7.15 - Traffic is kept separate using application gateways.

An example of an application gateway is a proxy server. A proxy server lets machines on each side of the firewall communicate with each other, while the traffic is analyzed and limited. For example, a proxy Web gateway might allow GET requests but not POST requests.

Application gateways often have performance problems, since each network transaction must be checked to be sure it is not doing anything inappropriate. Some application gateways may require slightly different commands than the ones to which users have become accustomed.

Application gateways are, however, very good at auditing and logging accesses, and often are used in conjunction with a screening router.

One of the main problems with an application gateway is the fact that a single compromise makes the entire network vulnerable. Once an account on the gateway machine is broken, it can be used to attack the internal network as a whole.

Using a Combination of Security

Most sites that are connected to the Internet use a combination of these security tools. A common firewall setup would involve one or two screening routers surrounding an application gateway. This offers some protection to the gateway host, as well as reducing risks to the internal network, if a gateway-host compromise occurs. See figure 7.16.

Fig. 7.16 - Many sites use multiple protections.

Allowing Users To Safely Access the Internet

To really take advantage of the Internet, users must be able to retrieve information from it. Many people think of the Internet as all fun and games, but there is a wealth of useful business information available, as well. This section will discuss ways to allow users to access such information from the Internet safely.

Internet Threats

There are two main threats when allowing users to access the Internet: viruses and information leaks. The risks from both can be reduced with a few simple technical measures and by educating users.

Viruses

Viruses are programs that copy themselves to other files on the same machine and to other machines. They may or may not be destructive, but they are always a nuisance.

Viruses can only be controlled by having users scan files on their local machines before using them. Even though all traffic may go through a firewall, it is still not possible to check all the information for viruses.


Even if it were possible to have your firewall scan for viruses, if people transfer files via other means (such as floppies or tapes) they are still vulnerable.

There are many different antivirus packages available, both free and commercial. Choose the one most suitable for your site and make a firm practice of running it.


Virus software must be current to be effective. New viruses are found every day; some can't be found with older virus scanners. Always get the latest version available and keep upgrades current.

Information Leaks

There are many ways for outsiders to break into your network, and it is important to limit the damage if they do. By restricting access to information within the company, you protect confidential data in the event your site is compromised.

One of the most popular ways to limit information leakage is by using a firewall to limit who can send out information. Information can be sent many different ways, however, and can't be eliminated without disconnecting from the Internet entirely.

The easiest way to limit exposure is to limit what information can get out to the Internet. This is done by blocking direct access out of the company at the router. To allow people to get out to the Internet, install either a proxy server or SOCKS.

Proxy servers are transparent to the user and are available for many different protocols. However, each protocol must have a separate proxy server.
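On the client side, using a proxy is usually a configuration setting rather than a change to the program. As a sketch, Python's standard urllib module can route requests through proxies with ProxyHandler. The proxy host names and ports below are hypothetical, and each protocol gets its own entry, just as each protocol needs its own proxy server.

```python
import urllib.request

# Hypothetical proxy servers, one per protocol.
HTTP_PROXY = "http://www-proxy.example.com:8080"
FTP_PROXY = "http://ftp-proxy.example.com:8021"


def make_proxied_opener():
    """Return an opener that sends HTTP and FTP requests to the proxies
    instead of connecting directly (direct access is blocked at the router)."""
    handler = urllib.request.ProxyHandler({"http": HTTP_PROXY, "ftp": FTP_PROXY})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = make_proxied_opener()
    with opener.open("http://www.example.com/") as response:
        print(response.status)
```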

SOCKS, on the other hand, is a more generic tool that can be used with any protocol; however, SOCKS requires some changes to the client software. Software modified this way is said to be SOCKSified.
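To see what "SOCKSified" means in practice, here is a minimal Python sketch of the client side of a SOCKS version 4 CONNECT request. Instead of connecting to the destination, the client connects to the SOCKS server and sends a short request naming the destination address and port. If the server replies with code 90 (request granted), the same connection then carries the real traffic. The proxy and destination addresses are placeholders, and the sketch handles only IPv4 and leaves out SOCKS5 features such as authentication.

```python
import socket
import struct

SOCKS4_VERSION = 4
SOCKS4_CONNECT = 1
SOCKS4_GRANTED = 90


def build_socks4_connect(dest_ip, dest_port, user_id=b""):
    """Encode a SOCKS4 CONNECT request: VN, CD, DSTPORT, DSTIP, USERID, NUL."""
    header = struct.pack("!BBH", SOCKS4_VERSION, SOCKS4_CONNECT, dest_port)
    return header + socket.inet_aton(dest_ip) + user_id + b"\x00"


def socks4_connect(proxy_host, proxy_port, dest_ip, dest_port, user_id=b""):
    """Reach dest_ip:dest_port by way of a SOCKS4 server.

    The client never connects to the destination itself; it connects to the
    SOCKS server, asks it to make the connection, and then uses the same socket.
    """
    sock = socket.create_connection((proxy_host, proxy_port))
    sock.sendall(build_socks4_connect(dest_ip, dest_port, user_id))
    reply = b""
    while len(reply) < 8:  # the SOCKS4 reply is always 8 bytes
        chunk = sock.recv(8 - len(reply))
        if not chunk:
            break
        reply += chunk
    if len(reply) < 8 or reply[1] != SOCKS4_GRANTED:
        sock.close()
        raise ConnectionError("SOCKS server did not grant the connection")
    return sock
```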

Sharing Your Private Server

There may be times when your private server needs to be accessed from other companies. You may do this via dial-up connections, dedicated network links, or over the Internet.

Using Dial-up Connections

If users need to access the Web server only infrequently, it might make sense to install a dial-in modem. This connection can run PPP software to allow full network access, or it can simply provide dumb-terminal access for browsing with lynx.

Dial-up connections can be password-protected to prevent unauthorized use. Caller ID (CID) reduces the risk even further by letting you answer only calls from authorized numbers.

Dial-up lines are the most secure way of allowing access into your network, but are usually too slow for simultaneous multiple-party use.

Using a Dedicated Link

If you share data among sites, you may want to consider a dedicated high-speed link. These links can be ISDN, Frame Relay, or T1 lines.

Dedicated links are only as secure as the network you are connecting to. If the other network can be broken into, so can yours. Therefore, it makes sense to protect yourself from this link just as you would from the Internet in general (see fig. 7.17).

Fig. 7.17 - Treat any connection outside your company as hostile.

Using Router Access Lists

If you are using the Internet to allow other users to access your company server, or a dedicated line to another company, it might be a good idea to set up an access list on your router.

Setting up an access list that restricts access to your Web server to a specific machine, or list of machines, is a good way to reduce the risk incurred when connecting. You can reduce the risk further by letting those remote sites connect only to the port on which your server is running (check the Port directive or the inetd.conf file).

This helps protect against access by other means, such as NFS or FTP.
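Router access lists are written in each router's own configuration language, so the exact syntax depends on your vendor. The Python sketch below only models how a typical access list is evaluated: rules are checked in order, the first match decides, and anything that matches no rule is denied (like the implicit deny at the end of a Cisco-style list). The partner addresses are hypothetical documentation addresses, and port 80 assumes your server's Port directive is left at the standard value.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                # "permit" or "deny"
    source: str                # source network, e.g. "198.51.100.0/24"
    dest_port: Optional[int]   # None matches any destination port


def is_permitted(rules, src_ip, dest_port):
    """Evaluate rules in order; the first matching rule decides."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        in_network = addr in ipaddress.ip_network(rule.source)
        port_matches = rule.dest_port is None or rule.dest_port == dest_port
        if in_network and port_matches:
            return rule.action == "permit"
    return False  # no rule matched: implicit deny


WEB_PORT = 80  # assumes the server's Port directive is left at 80

# Hypothetical partner host and subnet, allowed to reach the Web port only.
ACL = [
    Rule("permit", "198.51.100.7/32", WEB_PORT),
    Rule("permit", "203.0.113.0/28", WEB_PORT),
]
```

With this list, the partner host can reach the Web server, but not, for example, NFS on port 2049.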

Password Protection

If there is no router to help with protection, you should at least use Apache's password protection, which requires a valid user name and password for access through the Web server.

Passwords are set up using the Auth and Limit directives in the access.conf file. This is covered in detail in Chapter 5, "Apache Configuration".


Using the Web server password protection will protect you from attacks through the Web server software. It will not protect you from access by other means such as NFS or FTP.

Encryption

Passwords and access lists provide limited protection, but all of your information is still sent "in the clear." This means that anyone on a network through which your traffic passes can read its contents.
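Apache password protection normally uses Basic authentication (AuthType Basic), and Basic authentication does not hide the password at all: the browser sends the user name and password joined by a colon and Base64-encoded, which anyone can decode. The Python sketch below builds such a header and then decodes it, as someone watching the network could. The user name and password are example values.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header a browser sends for Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header):
    """Recover the credentials from a captured header; no secret is needed."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    header = basic_auth_header("Aladdin", "open sesame")
    print(header)                     # what travels over the network
    print(decode_basic_auth(header))  # what an eavesdropper can recover
```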

The only way to keep your information private is to use encryption, which converts your data into a coded form. For example, a simple way to hide your data is to UUEncode it. UUEncode is a program that converts files from 8-bit (binary) data to 7-bit data, which makes them unreadable to anyone who does not have the UUDecode program.


UUEncoding a file is not really a secure means of encryption, since all that is needed to reverse it is the UUDecode program. This program is freely available from most FTP sites and is also distributed with virtually every UNIX system.
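The following Python sketch, using the standard binascii module, encodes binary data the way uuencode lays out its lines (45 input bytes per line) and then decodes it again. The point is that decoding needs no key or password. The sample data is made up, and the sketch omits the begin and end lines that the uuencode program wraps around a file.

```python
import binascii


def uuencode(data):
    """Encode data 45 bytes per line, the line length the uuencode program uses."""
    return b"".join(binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45))


def uudecode(encoded):
    """Reverse the encoding; note that no key or password is involved."""
    return b"".join(binascii.a2b_uu(line) for line in encoded.splitlines() if line)


if __name__ == "__main__":
    secret = b"Q3 salary figures: CONFIDENTIAL\x00\xff"
    encoded = uuencode(secret)
    print(encoded)            # printable 7-bit text that looks scrambled
    print(uudecode(encoded))  # anyone can get the original bytes back
```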

Using UUEncode will keep casual snoopers from seeing your information, but a dedicated cracker can capture your traffic and run it through UUDecode to read it. A better alternative is to use an encryption product such as PGP (Pretty Good Privacy) or RSA.

These packages require you to encrypt the data separately before sending it. Because this is not transparent to the user, you will need hardware encryption if transparency is important.

Hardware encryption requires a separate device, attached either between you and your network or between the two networks. An example is an encrypting router, which automatically scrambles your network data so that it is unreadable in transit. By placing encrypting devices at both ends of the connection, you get transparent encryption (see fig. 7.18).

Fig. 7.18 - Using hardware encryption makes it transparent.

Running Internal and External Servers

Most companies will want to run two Web servers, one for public viewing and one for private use. There are different strategies available to keep these separate.

One Server, Two Directory Trees

For more information about the directives covered in this section, see Chapter 5, "Apache Configuration."

The simplest way to have separate internal and external areas is to set up different directory trees on a single Web server. You can then use the Limit directive in the access.conf file to restrict access to the private tree to internal IP addresses only.
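In Apache's access control, a Limit section for the private tree typically contains order deny,allow, deny from all, and an allow from line naming your internal network. The Python sketch below roughly models how that combination is evaluated for a client address. It supports only the all keyword, full IP addresses, and partial addresses giving the first one to three octets (such as 192.168.); host names are not handled. The 192.168. network is an assumption; substitute your own internal addresses.

```python
def host_matches(client_ip, spec):
    """Match one allow/deny argument against a client's dotted-quad address.

    Handles only "all", a full IP address, or a partial address giving the
    first one to three octets (e.g. "192.168." or "10.1"); host names are
    not supported in this sketch.
    """
    if spec == "all":
        return True
    octets = spec.rstrip(".").split(".")
    return client_ip.split(".")[: len(octets)] == octets


def order_deny_allow(client_ip, deny, allow):
    """Model 'order deny,allow': Deny is checked first, then Allow.

    A client matching an allow line gets in even if a deny line also
    matched; a client matching neither is let in by default.
    """
    if any(host_matches(client_ip, spec) for spec in allow):
        return True
    return not any(host_matches(client_ip, spec) for spec in deny)


# Hypothetical settings for the private tree.
PRIVATE_DENY = ["all"]
PRIVATE_ALLOW = ["192.168."]
```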


You must define restrictions for all directories, not just DocumentRoot. This includes directories mapped with Alias and ScriptAlias. Use the Limit and AllowOverride directives to do this.

To set up the directory trees do the following:

  1. Create two directories, one for internal and one for external access. Neither may be a subdirectory of the other.
  2. Use the Limit directive to allow only internal IP addresses to reach the internal directory.
  3. Disable per-directory overrides so that your Limit directive stays enforced. Do this by setting the AllowOverride directive to None.
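Step 1 is easy to get wrong when symbolic links are involved. This Python sketch resolves both paths and checks that neither directory contains the other. The /www paths are hypothetical examples.

```python
import os


def trees_are_separate(internal_dir, external_dir):
    """True if neither directory is the other or lies inside the other.

    realpath() resolves symbolic links first, so a link that points one
    tree into the other is caught too.
    """
    a = os.path.realpath(internal_dir)
    b = os.path.realpath(external_dir)
    common = os.path.commonpath([a, b])
    return common != a and common != b


if __name__ == "__main__":
    # Hypothetical locations for the two trees.
    print(trees_are_separate("/www/internal", "/www/external"))
```

Comparing whole path components with commonpath, rather than string prefixes, keeps /www/int and /www/internal from being mistaken for nested directories.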

Using the same server is the cheapest solution and can be used if there is no alternative or your internal data is not very sensitive. However, if the server is compromised, your internal data is exposed along with your external data.

One Machine, Two Servers

This strategy also uses only one machine, which suits the cost-conscious. It offers slightly more protection than the previous approach but is not the most secure option available.

Using this technique, you create two separate httpd configurations, each with its own configuration files and DocumentRoot. This lets you run separate server processes for internal and external access. You will probably want to run your external server on port 80, since that is where most people will look for it. Your internal server can then run on any unused port.
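Before you pick a port for the internal server, it helps to confirm that nothing else is already listening there. This Python sketch simply tries to bind the port; the candidate ports are arbitrary examples. On UNIX, binding a port below 1024 requires root, so run it as root or check only higher ports, and keep in mind that another program could take the port between the check and the moment httpd starts.

```python
import socket


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to the port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:  # in use, or a privileged port without root access
            return False
    return True


if __name__ == "__main__":
    for candidate in (8080, 8000, 8888):  # arbitrary example ports
        print(candidate, "free" if port_is_free(candidate) else "in use")
```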

Running separate servers allows you to configure your internal server to be more or less restrictive than your external server.

To set up multiple servers on one machine, do the following:

For detailed instructions on setting up Apache see Chapter 4, "Getting Started with Your Web Server."

  1. Create two separate server directories, including configuration and document directories.
  2. Configure your external server as you normally would, but give your internal one the same restrictions used in the previous procedure (the Limit and AllowOverride directives).
  3. Configure your internal server to use a non-standard port. You can do this either in the httpd.conf file, using the Port directive, or in the inetd.conf file.
  4. When you start your httpd server, you may need to use the -d or -f flags to point to the right configuration files.

Two Machines, One Network

A better alternative to using one machine is to use two machines: one that serves the external pages, and one for internal access.

Using two machines protects your data: as long as an intruder gains access to only one machine, your internal and external Web pages are never both placed in jeopardy.


If any machine on your network has been compromised, it is possible that all of them have been. Crackers can install sniffer programs to watch the network for passwords and store them in a file or e-mail them to the cracker. The only way to get around this is to always encrypt your traffic.

UNIX machines also can be set up to trust one another, either by creating a "/etc/hosts.equiv" file or by putting a ".rhosts" file in your home directory. Trusting a machine that has been compromised is a sure way to get broken into. Never trust your external Web server.
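A quick audit for these trust relationships can be scripted. This Python sketch reports a non-empty /etc/hosts.equiv and any .rhosts file in users' home directories. It assumes home directories live under /home, which varies between systems (root's home is usually elsewhere), and it only finds the files; you still need to decide which entries to remove, especially any that name your external Web server.

```python
from pathlib import Path


def _is_file(path):
    """Path.is_file(), treating unreadable locations as absent."""
    try:
        return path.is_file()
    except OSError:
        return False


def find_trust_files(home_root="/home", hosts_equiv="/etc/hosts.equiv"):
    """Return host-trust files: hosts.equiv if it has entries, plus any .rhosts."""
    findings = []
    equiv = Path(hosts_equiv)
    if _is_file(equiv) and equiv.read_text(errors="replace").strip():
        findings.append(equiv)
    root = Path(home_root)
    if root.is_dir():
        for home in sorted(root.iterdir()):
            rhosts = home / ".rhosts"
            if _is_file(rhosts):
                findings.append(rhosts)
    return findings


if __name__ == "__main__":
    for path in find_trust_files():
        print("trust file:", path)
```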

If one of your servers is compromised, that machine can be used to break into other machines on your network, either by installing a sniffer or taking advantage of host trust.

Two Machines, Two Networks

Having your two machines on the same network, as in the previous scenario, can be a problem if one of them is compromised.

An even better alternative is to separate the two machines with a firewall. This firewall can be as simple as a screening router or as complex as a series of routers and application gateways (see fig. 7.19).

Fig. 7.19 - Set up your external Web server outside your firewall.

