The "OpenPSP" Web Server

The overriding design criterion for the OpenPSP web server was that it should be portable and as easy to use as possible. One solution to portability is to use a scripting language such as Perl. Perl also happens to be well suited to Web programming thanks to features such as regular expressions and string interpolation, which no doubt explains its popularity for Web development.

Until now two main methods have been used for scripting active Web pages in Perl: "cgi" scripts, or their faster cousins FastCGI and mod_perl. Unfortunately these are not straightforward to configure and test, and typically involve root access and an afternoon with a hefty book on the Apache server's configuration files httpd.conf and srm.conf.

I was looking to develop self-contained, portable Web "applications" that could be distributed easily, and decided it was time the Web server was just a part of the application rather than the other way around. Applications using the OpenPSP Web server can be included as module files in the "pgi-lib" directory of a distribution of the OpenPSP server and run as one would run a script. This requires no root access and frees the developer from portability concerns, the application being as portable as Perl itself. The entire Web server distribution is 70 KB.

Non-forked CGI

CGI scripts have unfortunately given Perl-scripted Web programming rather a bad name, as traditionally they have required a new process to be created for each request using the fork() system call, which is ridiculously wasteful of machine resources. Under UNIX it is possible to avoid forking while maintaining the standard CGI interface: parse the incoming request, set the required environment variables, and simply re-attach the STDIN and STDOUT of the main process to the incoming client connection using a "dup()". In Perl terms this means you then "do $cgi_file;" to execute the CGI file rather than forking and relaying data to and from a child process.

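Keep in mind that HTTP is in fact an extremely simple protocol. In the basic scenario the client connects and sends a request line such as "GET /test.cgi HTTP/1.0", followed by any header lines and then a blank line. The server replies with a status line, its own header lines, a blank line and the body of the document, and then closes the connection.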

This means a basic Web server to process CGI scripts can be coded as follows (excluding the query string and environment):

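#!/usr/bin/perl -w

use strict;
use IO::Socket::INET;

# Listening socket (ReuseAddr allows a quick restart on the same port)
my $socket = IO::Socket::INET->new( LocalPort=>8080, Listen=>5, ReuseAddr=>1 )
	or die "Cannot listen on port 8080: $!";

while ( my $connection = $socket->accept() ) {
	# Request line, for example "GET /test.cgi HTTP/1.0"
	my ($method, $script, $proto) =
		( $connection->getline() || "" ) =~ /(\S+)\s+(\S+)\s+(\S*)/;

	# Skip the request headers up to the blank line that ends them
	while ( ( $connection->getline() || "" ) =~ /\S/ ) {};

	# Re-attach STDIN and STDOUT of this process to the client connection
	open STDIN,  "<&".$connection->fileno();
	open STDOUT, ">&".$connection->fileno();

	# Status line; the script prints the remaining headers and the body.
	# The script path is taken relative to the current directory.
	print "HTTP/1.0 200 OK\r\n";
	do ".$script" if defined $script;

	# Flush the output and tell the client the response is complete
	STDOUT->flush();
	$connection->shutdown(2);
}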
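
The query string and environment left out of the listing above could be filled in along the following lines. This is only a sketch: the helper name cgi_environment is hypothetical and is not taken from the OpenPSP source, although the variable names themselves are the standard CGI ones.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the OpenPSP source): build the basic CGI
# variables from the three fields of the HTTP request line.
sub cgi_environment {
	my ($method, $uri, $proto) = @_;

	# Everything after the first "?" is the query string
	my ($script, $query) = split /\?/, $uri, 2;

	return (
		GATEWAY_INTERFACE => "CGI/1.1",
		REQUEST_METHOD    => $method,
		SCRIPT_NAME       => $script,
		QUERY_STRING      => defined $query ? $query : "",
		SERVER_PROTOCOL   => $proto || "HTTP/0.9",
	);
}

# Usage: install the variables before running the script with "do"
my %cgi = cgi_environment( "GET", "/test.cgi?name=spot", "HTTP/1.0" );
@ENV{ keys %cgi } = values %cgi;
print "$ENV{SCRIPT_NAME} called with '$ENV{QUERY_STRING}'\n";
```

Because the script runs in the server's own process, the variables must be reset for every request so that values from an earlier request do not leak into a later one.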

This is the role of the Perl module PSP::Socket.pm in the OpenPSP source. It is subclassed by PSP::Docs.pm to process documents and scripts, which is in turn subclassed by PSP::Httpd.pm to process "PGI" modules. Together these classes implement an Apache-compatible Web server, used by the script httpd.pl, which should be run for basic Web server functionality.

Features of OpenPSP server

The server to this point has some interesting features. The "document root" for the server is normally the directory "docs". However, if the server is running on a machine that answers to the names "spot" and "rover", and you create separate directories with these names, these will be used as the document roots instead. This implements "virtual hosting", where one Web server can appear to be more than one Web site.
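
The choice of document root might be sketched as follows. The function name document_root and the use of the HTTP "Host" header to discover the requested name are assumptions made for illustration; the OpenPSP source may well do this differently.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the document root for a request from the
# host name the client asked for (assumed to come from the "Host" header).
sub document_root {
	my ($base, $host) = @_;

	if ( defined $host ) {
		$host = lc $host;
		$host =~ s/:\d+$//;          # drop any ":port" suffix
		# Accept plain host names only, so "../" cannot escape $base
		return "$base/$host"
			if $host =~ /^[a-z0-9][a-z0-9.-]*$/ && -d "$base/$host";
	}
	return "$base/docs";             # the normal document root
}

# Usage: "./spot" is used only if that directory exists
print document_root( ".", "spot:8080" ), "\n";
```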

The HTTP "If-Modified-Since" request header is supported, returning a "304 Not Modified" if a document has not changed since it was last served. In addition, if a document has a modification time in the future, an "Expires" header is placed in the response. This can be used to encourage caching at the client. Transfer of "zipped" documents is supported by looking for a version of the requested document with ".gz" appended to the file name. If present, this version of the file is served with the response header "Content-Encoding: gzip" set. This results in a considerable increase in speed when serving HTML documents over modem lines or a Wide Area Network.
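
As an illustration of the two checks just described, here is a minimal sketch. The function names are hypothetical, only the common "GMT" date format is parsed, and checking the client's "Accept-Encoding" header before choosing the ".gz" file is an assumption that the description above does not mention.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %MONTH = ( Jan=>0, Feb=>1, Mar=>2, Apr=>3, May=>4,  Jun=>5,
              Jul=>6, Aug=>7, Sep=>8, Oct=>9, Nov=>10, Dec=>11 );

# Parse a date such as "Tue, 24 Jul 2001 10:00:00 GMT" into epoch seconds
sub parse_http_date {
	my ($date) = @_;
	my ($d, $mon, $y, $h, $m, $s) =
		$date =~ /(\d+) (\w{3}) (\d{4}) (\d+):(\d+):(\d+) GMT/ or return;
	return unless exists $MONTH{$mon};
	return eval { timegm( $s, $m, $h, $d, $MONTH{$mon}, $y ) };
}

# Decide how to answer a GET for $file given two request headers.
# Returns the status, then the file to send and its encoding (if any).
sub choose_response {
	my ($file, $if_modified_since, $accept_encoding) = @_;

	my $mtime = (stat $file)[9];
	return ("404 Not Found") unless defined $mtime;

	my $since = defined $if_modified_since
		? parse_http_date($if_modified_since) : undef;
	return ("304 Not Modified") if defined $since && $mtime <= $since;

	# Prefer a pre-compressed copy when the client accepts gzip
	return ("200 OK", "$file.gz", "gzip")
		if ($accept_encoding || "") =~ /\bgzip\b/ && -f "$file.gz";
	return ("200 OK", $file);
}

# Usage: ask about this script itself
my ($status) = choose_response( $0, "Tue, 24 Jul 2001 10:00:00 GMT" );
print "$status\n";
```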

Virtual directories for a particular site are created by putting a Perl module source file into the "pgi-lib" directory of the site's document root. For example, putting the file "ls.pm" into the docs/pgi-lib directory results in all URIs that start with "/ls" being redirected to that module. A PGI module can be coded exactly as you would a CGI script, except that the code to be run must be enclosed in a function "main()" in a package with the same name as the file. In this way the module can be compiled once but run repeatedly. The PGI module pointed to by the symbolic link "default.pm" in the pgi-lib directory will be used to implement the root for a particular host.
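
A PGI module along these lines might look like the sketch below. The body of the module is invented for illustration, and how the server actually invokes main() (no arguments, request details in %ENV as for CGI, output on STDOUT) is assumed here rather than taken from the OpenPSP source.

```perl
# Hypothetical contents of docs/pgi-lib/ls.pm (a sketch only)
package ls;

use strict;
use warnings;

# Written as a CGI script would be: read the request from %ENV and
# print the header and body to STDOUT.
sub main {
	my $dir = $ENV{PATH_INFO} || "/";
	print "Content-type: text/plain\r\n\r\n";
	print "Listing requested for $dir\n";
}

1;	# a module file returns a true value so it can be loaded with require

# Back in the server (assumed calling convention): the package is already
# compiled, so each request only costs a function call.
package main;

$ENV{PATH_INFO} = "/tmp";
ls::main();
```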

Documents with the file extension "psp" are "Perl server pages" and contain HTML with Perl code embedded inside the special tags <% %>. These documents are compiled into the server by the module pgi-lib/psp.pm.
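
To give a feel for what compiling such a page involves, the sketch below turns a page into Perl source and runs it. It is not the code in pgi-lib/psp.pm: the function name psp_to_perl and the translation scheme (HTML becomes print statements, code inside the tags is copied through) are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical translator: turn a page with <% %> tags into Perl source.
# HTML outside the tags becomes print statements, code inside is kept.
sub psp_to_perl {
	my ($page) = @_;
	my $perl = "";

	# split with a capturing group alternates HTML, code, HTML, code...
	my @parts = split /<%(.*?)%>/s, $page;
	while ( @parts ) {
		my $html = shift @parts;
		if ( length $html ) {
			$html =~ s/([\\'])/\\$1/g;      # escape for a '...' string
			$perl .= "print '$html';\n";
		}
		my $code = shift @parts;
		$perl .= "$code\n" if defined $code;
	}
	return $perl;
}

# Usage: compile the page once into a sub, then call it per request
my $source = psp_to_perl(
	"<ul><% for my \$i (1..3) { %><li>item</li><% } %></ul>\n" );
my $page = eval "sub { $source }" or die $@;
$page->();
```

Compiling to a sub mirrors the PGI idea of compiling once and running repeatedly, since the translation and eval are paid for only on the first request.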

Multi-Process server

Unfortunately, if a process does not fork() only one request can be processed at a time, which reduces the usability of the server if any given request takes more than a trivial amount of time. Also, if a user presses cancel in the client browser, processing of the current request should stop at the server.

To resolve these problems, a multi-process version of the server (server.pl) receives all connections in a single master process but delegates "PGI" requests, which may take an indeterminate amount of time, to a pool of "sub-servers". These sub-servers are child HTTP server processes, forked once on demand but then reused in the same way as the non-forking single-process Web server discussed above. Separate processes were used, even though threads in Perl can now be considered mature, both because the underlying C libraries that many modules use are not necessarily thread safe and for the sake of guaranteed reliability. These processes are supervised by the parent process and "killed" if the browser closes the connection before the child process has completed.
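
The life cycle of one sub-server can be sketched as follows, assuming a UNIX system. The line-based exchange over a socketpair is an invented stand-in for the real control protocol, which is not described here; the point is only that the child is forked once, serves several requests, and can be killed by the master.

```perl
use strict;
use warnings;
use Socket;
use IO::Handle;

# Control channel between the master and one sub-server
socketpair( my $to_child, my $to_parent, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
	or die "socketpair: $!";
$_->autoflush(1) for $to_child, $to_parent;

my $pid = fork();
die "fork: $!" unless defined $pid;

if ( $pid == 0 ) {
	# Sub-server: forked once, then serves request after request
	close $to_child;
	while ( my $request = <$to_parent> ) {
		chomp $request;
		print $to_parent "pid $$ handled $request\n";
	}
	exit 0;
}

# Master: hand two requests to the same child process
close $to_parent;
my @replies;
for my $request ( "/ls/one", "/ls/two" ) {
	print $to_child "$request\n";
	push @replies, scalar <$to_child>;
}
print @replies;

# If the browser closed its connection mid-request, the master would
# stop the sub-server like this rather than wait for it to finish.
kill "TERM", $pid;
waitpid $pid, 0;
```

Both replies carry the same process id, showing that the cost of fork() is paid once per sub-server rather than once per request.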

The multi-processing server is unfortunately rather complex, using non-blocking I/O and a generic "select()" loop in Server::Main.pm to relay data between clients and "sub-servers". Server::Httpd.pm is a subclass of PSP::Httpd.pm which interacts with the master server process through local control socket connections, recycling the process for the next request when it is available to accept and process a new connection.

Consult the README files in the various source directories for more information.

John Holdsworth
24th July 2001