Shuimu Tsinghua BBS (水木清华站): Digest Area
From: minix (Pirate Captain), Board: Linux
Subject: Apache for Developers
Posted at: Shuimu Tsinghua BBS (Sun Mar 14 17:20:23 1999)
October 1998
Apache for Developers
The latest Apache Web server features a modern architecture and a
choice of solid development environments.
by Bjorn Borud
The Apache Web server is probably the most popular Web server among Web
professionals today. Some would say that this is despite the fact that
Apache is a free product developed mainly by what they refer to as
"enthusiasts" and despite the fact that little, if any, money has been
spent promoting it. My personal opinion is that Apache is exactly what
people want because it is made by the people who use it.
Works With
Apache for Unix, NT
The Apache project grew out of an effort to improve the NCSA httpd, which,
in early 1995, was the most popular Web server. The first incarnations of
Apache were based on NCSA httpd 1.3 and the name "Apache" reflects the
state of the project early on: It was "a patchy server"—a server that
consisted of NCSA httpd 1.3 and a series of patches.
Today Apache is best viewed as an application framework on which you can
build your solutions, rather than a shrink-wrapped product with a fixed
set of features. Sure, Apache is a good Web server in itself, but the real
advantage comes from its extensibility and the fact that many people
publish their extensions so others can use them directly or learn how to
create their own extensions.
In this article, you will learn about the general development features of
Apache, as well as two particular environments that lend themselves to
serious applications. The first, PHP, is a strong language for
database-related functions. The second, mod_perl, moves Perl scripting
into the high-performance arena and extends what you can do with it.
Face it: If you just want to serve up files, you can use almost any
Web server. If you are looking for an advanced development platform for
solving more involved problems, you have to consider which Web server
provides the most cost-effective solution. The API and the module
framework make Apache an excellent choice as a platform on which you can
build your Web applications. The fact that it is free and distributed in
source code form also helps ensure that there are plenty of developers who
have intimate knowledge of the server and will be able to assist you in
development.
Another important aspect to consider if you are looking for a Web platform
is that Apache won't go away any time soon. Companies go out of business,
get taken over, or discontinue products regularly. Since Apache does not
really belong to any one organization it will stay around as long as there
are people who want to use it. The fact that Apache is the most popular
Web server in use on the Internet today, with more than one million users,
suggests that people aren't going to lose interest any time soon.
That community has brought Apache a long way from its patchy beginnings.
The current architecture no longer suffers from NCSA's request handling,
which terminated every process upon completion of each request. Now,
Apache uses a pool of processes that it establishes at startup, a much
more efficient use of server resources.
Get Modular
Since version 1.0, Apache has also been divided into modules (see Table
1). An API was also published to enable third-party developers to provide
their own modules. For the developers of Apache, this meant that they
could move much of the code out of the core of Apache and into modules.
While the core of Apache takes care of all the tasks having to do with
process management etc., the modules provide the more application-oriented
functions of the Web server. Things like authentication, access checks,
URL-to-filename translation, sending content back to the user, and logging
the request were now put into modules so they could easily be maintained
independently of the Apache core. In addition to the standard modules that
come with Apache, there are quite a few modules that have been contributed
by third parties to provide alternative ways to authenticate users, means
to limit the bandwidth usage of different areas on your Web server, etc.
For developers, modules mean it's easier to add functionality to the Web
server or even alter its existing functionality. Modules are usually
compiled into the server. When building Apache you specify which modules
you'd like to include, and they are compiled into the Web server
binary.
Some people found it cumbersome to recompile the entire server just to
add one module, so in more recent versions of Apache (1.3.x) you can
compile a module into a "shared object" (what Windows people know as a
DLL) and load the module at runtime. Adding or removing a module can now
be done by compiling the module and editing the configuration file. This
functionality did exist previously, but in a more experimental form.
In principle, the Apache Web server is not a complicated piece of
software. Simply put, it consists of a core that takes care of all the
low-level functions; a set of modules to provide whatever functionality
you would want from your Web server; and handlers to call those modules.
But how do the modules work? How does the Web server decide what to do and
when?
Phases: When Apache receives a request it will go through a number of
"phases" in order to serve the request, as shown in Figure 1. That
simplifies the task of the developer who wants to extend some aspect of
the server. For instance, if the developer wants to provide her own
authentication module, she only needs to write the code that does
the actual authentication. She does not have to bother with the
other tasks that need to be performed, like determining the MIME type of
the requested object or sending the object back to
the user.
Handlers: A module can define "handlers" for one or more phases. For each
phase the server keeps a list of handlers from various modules that should
be called during that phase. Each module defines a hard-coded data
structure that identifies which phases it can handle.
When the server calls a handler, the handler performs its task and returns
a status code indicating how things went. An OK code will be returned if
the handler performed its task successfully.
The handler can also decline to handle the request and return the DECLINED
code, in which case the Web server simply ignores the handler and calls
the next handler in the list for that phase. Should an error occur, the
handler can indicate this by returning one of the HTTP error codes. The
server will then abort further request processing, write a message to the
error log and send an error message to the browser.
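The return-code protocol described above can be modeled in a few lines. The following is a minimal sketch in Python, not Apache's C API: the constants only mirror the status names used in the text, and the two handlers (deny_private, allow_all) are hypothetical examples invented for illustration.

```python
# Illustrative model only: the names mirror the status codes described in
# the text, but this is not Apache's real C API.
OK = "OK"
DECLINED = "DECLINED"
HTTP_FORBIDDEN = 403  # an example HTTP error code


def run_phase(handlers, request):
    """Call each handler for one phase until one handles the request or fails."""
    for handler in handlers:
        status = handler(request)
        if status == DECLINED:
            continue      # ignore this handler and try the next one in the list
        return status     # OK, or an HTTP error code that aborts processing
    return DECLINED       # no handler in the list took the request


def deny_private(request):
    # Hypothetical access-check handler: reject anything under /private/.
    if request["uri"].startswith("/private/"):
        return HTTP_FORBIDDEN
    return DECLINED


def allow_all(request):
    # Hypothetical fallback handler that always accepts the request.
    return OK
```

Calling run_phase([deny_private, allow_all], {"uri": "/index.html"}) falls through the first handler and is accepted by the second, while a URI under /private/ stops at the first handler with the error code.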
To sum up, a request goes through a number of phases. For each phase the
server maintains a list of handlers. The server will call each handler in
the list until a handler signals that it has handled the request or until
an error is reported. A module can contain one or more handlers.
Writing Your Own Modules
If you plan on writing your own modules you should start out by reading
the "Apache API notes" section of the Apache documentation that comes with
the server. This will give you a basic idea of how to write a module.
After that, look at some of the modules that come with Apache. If you look
under the src/modules directory in the Apache 1.3 source distribution you
will find both standard and experimental modules plus a sample module
called mod_example that is heavily commented to help you understand what
it does.
Server-Side Programming
Modules, like the Web server itself, are generally implemented in the C
programming language. While it makes perfect sense to write a Web server
in C, it may not be very practical for the average Web developer to use
it. C is rather hard to use, even harder to debug, and judging by some of
the code I have seen, it can be terribly hard to read and understand.
The Web industry moves at a fast pace. Customers want their Web sites
online quickly and many of the Web developers have little or no prior
experience in software development. Needless to say, the unforgiving
nature of C makes it hard for inexperienced developers or those stressed
for time to produce reasonably stable code fast enough.
Writing modules for Apache in C may be an option for some, but for others
the time constraints or their own skills may rule it out for a given
project. Fortunately, Apache gives you other ways to add functionality,
the most general being the Common Gateway Interface (CGI) and
FastCGI.
Perhaps the most common solution in the past, and presumably to this day,
is to use CGI in conjunction with some scripting language like Perl. While
widely used, CGI is actually rather crude. It relies on the Web server to
spawn a new process, send the pertinent data for the request to the
process, and then read the response from the program and send it back to
the client. As mentioned earlier, starting and stopping processes is the
nemesis of performance, so the standard CGI mechanism is likely to
introduce bottlenecks into your system.
Not only will the Web server have to spawn a new process, but if you use a
scripting language like Perl, the Perl interpreter will have to read the
script, load the appropriate Perl modules and compile the script into byte
code that can then be executed. Even if you use a language that lets you
produce pre-compiled binaries, there is still the significant overhead of
spawning a new process.
But CGI isn't all bad. The fact that it is so simple to use is probably
what made it so popular in the first place, and an added bonus is that you
are not limited to any one language when creating CGI scripts. If you
like, you can use any language that is able to read environment variables
and communicate using the standard I/O mechanisms.
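To make the "environment variables plus standard I/O" contract concrete, here is a minimal sketch of a CGI-style program in Python. REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH are standard CGI variables; the function name cgi_response and the echo-style body are assumptions made for this example.

```python
import io


def cgi_response(environ, stdin):
    """Build a CGI response from environment variables and standard input."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # POST data arrives on standard input; its size is in CONTENT_LENGTH.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = stdin.read(length)
    else:
        # GET data arrives in the environment.
        payload = environ.get("QUERY_STRING", "")
    body = "method=%s payload=%s\n" % (method, payload)
    # A CGI program writes headers, a blank line, then the body to stdout.
    return "Content-Type: text/plain\r\n\r\n" + body


# Simulate the server's side of the interface for a GET request.
print(cgi_response({"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1"},
                   io.StringIO("")))
```

Run as a real CGI script, the same function would be called with os.environ and sys.stdin, and its return value written to sys.stdout.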
But before you dismiss CGI as old hat, consider the FastCGI option. When
you use FastCGI, your CGI scripts will not terminate between requests, but
keep running, waiting for the next request to arrive, thus eliminating the
overhead of starting and stopping the script for each request.
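The difference can be illustrated with a toy model that counts how often the expensive startup work runs. This sketch does not speak the FastCGI protocol; the Worker class and both serve_* functions are hypothetical names used only to contrast "start per request" with "start once, then loop".

```python
class Worker:
    """Stands in for a script with costly startup (not the FastCGI protocol)."""

    def __init__(self):
        self.startups = 0
        self.config = None

    def start(self):
        # Expensive one-time work: load the interpreter, modules, configuration.
        self.startups += 1
        self.config = {"greeting": "hello"}

    def handle(self, name):
        return "%s, %s" % (self.config["greeting"], name)


def serve_cgi_style(names):
    """Plain CGI: a fresh process, and therefore a fresh startup, per request."""
    startups, replies = 0, []
    for name in names:
        worker = Worker()
        worker.start()
        startups += worker.startups
        replies.append(worker.handle(name))
    return startups, replies


def serve_fastcgi_style(names):
    """FastCGI style: start once, then keep handling incoming requests."""
    worker = Worker()
    worker.start()
    return worker.startups, [worker.handle(name) for name in names]
```

Both functions produce the same replies; only the number of startups differs, which is the overhead the persistent script avoids.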
Also, your CGI scripts don't have to run on the same machine as the Web
server when you use FastCGI. You can run your CGI scripts on a different
host to take the load off your Web server and thereby distribute the load
across several machines.
The advantages of using FastCGI are first and foremost speed, but also the
fact that you can make use of it without having to throw away your
existing CGI code. With some simple modifications to your CGI scripts you
can migrate them into a FastCGI environment. For more information on
FastCGI, point your browser to www.fastcgi.org.
PHP: Easy DB Access
When building Web applications, it would be convenient if you could make
the Web server look up data in a SQL server and insert the data into HTML
documents or templates without having to create and maintain CGI scripts.
To get around the performance bottlenecks of a CGI-style interface and the
tedium of putting HTML code inside print statements, you could embed a
parser in the server which allows you to put code into your HTML markup.
This is exactly what PHP does.
PHP can also interpret input fields from forms and make the values
available as variables in the programming language. Also, the interpreter
can be compiled into the Apache Web server as a module. Now you only need
to create an HTML document with code embedded and point your browser at
the page. The Web server will automatically run the embedded code on
loading the HTML file.
Here's how it works. Imagine you have an HTML file containing a form:
<FORM ACTION="sho[...]
[...]y database that OpenLink (www.openlink.co.uk) supports through the
OpenLink broker. This means that you can use the features OpenLink
provides to get persistent connections to databases and a single interface
to several databases of different kinds. It also means that there's an
easy way to use databases that aren't supported on your Web server
platform. For instance, there are no client libraries for Oracle available
under Linux (yet). If your Web server runs Linux and your database server
runs Oracle under Solaris, OpenLink provides a way to use the database
server from the Linux machine.
To make database access more efficient, PHP offers persistent connections
to databases in order to eliminate the need for connecting to the database
every time. PHP will stay logged into the database between requests and
re-use connections where possible. For some databases this makes database
access considerably faster.
Note that the connections are specific to the Web server process. There is
currently no mechanism to pool connections within the main process of
Apache and have the Web server processes share connections when needed.
This means that if you have a large number of processes with persistent
connections to your database, you will have an equally large number of
connections to the database.
As mentioned above, using OpenLink to pool connections in a middle tier
might be a good option if this is a problem. Usually it is not a problem.
Extending PHP
Adding native support for a database or some custom functions could be
awkward to implement in the PHP language itself; fortunately PHP is rather
easy to extend. Besides the API documentation that comes with the PHP
source distribution, there are many built-in functions you can look at for
reference.
Perhaps the most common problems when programming in C are memory and file
descriptor leaks. You allocate memory, but when you are done using it you
forget to give it back to the system. PHP provides a set of wrapper
functions for allocating memory. When using these wrapper functions PHP
will keep track of the memory you allocate and will then be able to clean
up after your code when the memory is not needed anymore.
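The idea of a tracking wrapper can be sketched as follows. This is an analogy in Python, not PHP's actual allocation API: it tracks file handles rather than raw memory, and the RequestPool class and its method names are invented for the example.

```python
import os


class RequestPool:
    """Tracks resources handed out during one request so none can leak."""

    def __init__(self):
        self._open = []

    def open_file(self, path, mode="r"):
        # The wrapper records every handle it gives out.
        handle = open(path, mode)
        self._open.append(handle)
        return handle

    def cleanup(self):
        """Release whatever the calling code forgot; return how many were leaked."""
        released = 0
        while self._open:
            handle = self._open.pop()
            if not handle.closed:
                handle.close()
                released += 1
        return released


pool = RequestPool()
leaked = pool.open_file(os.devnull)   # never closed by the "extension" code
tidy = pool.open_file(os.devnull)
tidy.close()                          # closed properly by the caller
print(pool.cleanup())                 # number of handles the pool had to close
```

Because every acquisition goes through the wrapper, the cleanup step at the end of a request does not depend on the extension author remembering to release anything.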
PHP also provides a framework to handle persistent resources like database
connections etc. This can boost your performance considerably when using
PHP to communicate with systems that have a considerable startup cost
associated with initiating a connection. As we mentioned earlier, database
accesses can often be made persistent.
Built-in Goodies
PHP has a lot of built-in goodies that will come in handy when you want to
develop Web applications. Since the cookie mechanism is a popular way of
having the browser preserve state information across HTTP requests, PHP
provides functions to manipulate cookies. Once a cookie is set its value
can be accessed like a normal variable.
For instance, suppose your PHP code contains the following:
setcookie("session_id", "1234");
A cookie will be sent back to the browser and the next time the user
accesses the site the cookie will be available through a variable with the
same name as the cookie:
echo "The session id is $session_id\n";
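What happens on the wire can be shown with Python's standard http.cookies module. This is only a sketch of the underlying HTTP exchange, not of PHP's internals; the helper names set_cookie_header and read_cookies are assumptions made for this example.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value):
    """Return the response header line a server sends to set a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()


def read_cookies(cookie_header):
    """Parse the Cookie request header the browser sends back into a dict."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return {key: morsel.value for key, morsel in cookie.items()}


print(set_cookie_header("session_id", "1234"))
print(read_cookies("session_id=1234")["session_id"])
```

The server sends the first line as a response header; on the next request the browser returns the name/value pair in its Cookie header, which is what lets the value reappear as a variable.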
For more details on the other parameters you can pass to
setcookie(), check the PHP documentation (see
"Availability").
Among the other neat features offered in PHP are
on-the-fly image generation using the GD library and support for TrueType
fonts in images using the FreeType library. PHP also provides an easy
interface for file uploads from browsers and arbitrary-precision math. You
can talk to LDAP and IMAP servers, query SNMP agents, and
even open files on other Web or FTP servers simply by using a URL instead
of a regular file name.
Development Using PHP
Developing Web applications with PHP 3.0 is very straightforward. Once you
have installed PHP 3.0 and set up the configuration properly you are ready
to go. Since trying to parse all HTML pages isn't too much of a
performance hit, I usually set up the server to interpret anything with
the .html suffix as PHP.
Now you can just create a file containing some HTML markup and some PHP
code and save it to a file with the suffix that Apache will identify as a
(potential) PHP file:
<TITLE>Test page</TITLE>
Hello there, this is my test page
and today is
<?PHP
echo date("l F d Y");
?>
If you make a mistake, the PHP parser will output an error message to your
browser and tell you in what file and on what line it detected an error.
You can also configure PHP to issue warnings when you're about to do silly
things, like using the values of variables before they have been
initialized.
Once your projects start to grow in size it is a good idea to put pieces
of code that you use often into separate files and then use the include()
command to load the code you need. Not only does this encourage code
reuse, but it will also make your HTML files considerably smaller and more
readable. An added bonus is that this will allow you to write less code in
the long run and make it considerably easier to correct your bugs.
To automate inclusion of often-used libraries and code you can use the
configuration directives php3_include_path, php3_auto_prepend_file, and
php3_auto_append_file to make PHP load the appropriate libraries for you.
As duly pointed out again and again by the critics of PHP, it is not a
general-purpose language like Perl and thus lacks the immense number of
features and libraries available for Perl. Nor does it have the ability to
com[...]
[...] of DBI is to
provide a consistent set of methods and properties that the developer can
use to access databases with different native APIs.
The actual database communication is done by dispatching calls from the
DBI layer to database-specific driver modules, called DBDs. In order to
support a new database you need only install the appropriate DBD module or,
if none exists for your database, perhaps develop your own.
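The dispatch from a common front end to database-specific drivers can be sketched as below. This is a Python analogy, not the Perl DBI API: the Registry and ToyDriver classes and the fetch_all method are hypothetical, and only the "dbi:driver:database" shape of the data source name follows DBI's convention.

```python
class Registry:
    """DBI-like front end that dispatches to per-database drivers."""

    def __init__(self):
        self._drivers = {}

    def install_driver(self, name, driver):
        # Supporting a new database means registering one more driver.
        self._drivers[name] = driver

    def connect(self, dsn):
        # A data source name such as "dbi:toy:inventory" names the driver.
        scheme, driver_name, database = dsn.split(":", 2)
        if scheme != "dbi":
            raise ValueError("not a DBI-style data source: %r" % dsn)
        if driver_name not in self._drivers:
            raise LookupError("no driver installed for %r" % driver_name)
        return self._drivers[driver_name](database)


class ToyDriver:
    """Hypothetical driver backed by an in-memory dict, not a real database."""

    TABLES = {"inventory": [("apples", 3), ("pears", 5)]}

    def __init__(self, database):
        self.rows = self.TABLES[database]

    def fetch_all(self):
        return list(self.rows)


registry = Registry()
registry.install_driver("toy", ToyDriver)
print(registry.connect("dbi:toy:inventory").fetch_all())
```

Application code only ever talks to the front end, so swapping databases comes down to installing a different driver and changing the data source name.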
In any case, to users of mod_perl this means that whatever database the
DBI interface supports, you can use it in Apache as well. At startup you
can have mod_perl load the DBI interface and on most systems the processes
in the Apache server pool will be able to share the code so you won't have
to waste memory loading DBI in each child.
Persistent Database Connections
Another advantage of using mod_perl in conjunction with DBI is that you
can create persistent connections to the database thereby avoiding the
need to connect and disconnect from the database for every request that
you serve.
To use persistent database connections you should install the Apache::DBI
module. After this is done you simply add PerlModule Apache::DBI to your
configuration file before any of the modules that actually use DBI. You
have to load this module first
because the DBI package checks whether Apache::DBI has been loaded. The
Apache::DBI module stores database handles in a global hash and ignores
any attempts to close the connection to the database.
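The mechanism just described, a global hash of handles plus a disconnect that does nothing, can be sketched as follows. This is a Python illustration of the idea rather than Apache::DBI's code: FakeConnection, PersistentHandle, and the connect function are hypothetical stand-ins.

```python
class FakeConnection:
    """Stand-in for a real database handle; counts how often one is opened."""

    opened = 0

    def __init__(self, dsn, user):
        FakeConnection.opened += 1
        self.dsn, self.user, self.closed = dsn, user, False

    def close(self):
        self.closed = True


class PersistentHandle:
    """Wraps a cached connection and turns disconnect into a no-op."""

    def __init__(self, connection):
        self.connection = connection

    def disconnect(self):
        return True  # deliberately ignored so the connection stays open


_handles = {}  # per-process cache keyed by the connect arguments


def connect(dsn, user):
    """Return the cached handle for these arguments, connecting only once."""
    key = (dsn, user)
    if key not in _handles:
        _handles[key] = PersistentHandle(FakeConnection(dsn, user))
    return _handles[key]
```

Two requests served by the same process that call connect("dbi:toy:shop", "web") get the same handle back, even if the first request called disconnect() on it. A different user name produces a separate cache entry.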
The connections are on a per-server basis, that is, the persistent
connection is established the first time each Web server process makes a
connection to the database. If you try to initialize a database connection
in the parent process of the Web server you may run into a lot of trouble
when several child processes try to access the same database connection at
the same time. Therefore, make sure you do not load any code that opens up
database connections using PerlRequire or PerlModule in the Apache
configuration files.
The Apache::DBI package also comes with some convenient modules for using
databases in authenticating users. This is a neat alternative to the
primitive password files or the somewhat awkward DBM files.
PHP and mod_perl are equally well suited to database connectivity; your
choice should be based more on the degree of support that the package has
for your database and which language you are more comfortable with. The
advantage Perl has over PHP is that Perl has a more uniform interface
(remember DBI) to databases, so it is probably easier to change database
brands without having to rewrite the code. (Then again, databases are
diverse critters, so it may be worse: You may have to redesign
your application because the new database does things
differently, regardless of whether you use PHP or Perl.)
Embedding Perl in HTML
You can also embed Perl code in HTML documents as you could in PHP. This
is done using the HTML::Embperl package. This package not only offers
embedded Perl in HTML documents for serving pages on-the-fly, but also the
ability to generate static HTML files that can later be served without the
need to run Perl scripts. The latter option may be an alternative if your
content changes at regular intervals, say once per day.
Using HTML::Embperl to embed Perl code in HTML documents is a bit
different from using PHP. With PHP you only need to put <?PHP
and ?> around your code. With HTML::Embperl you have the following ways of
embedding code:
[+ Perl code +] replaces the code inside the [+ and +] marks with the
result of evaluating the Perl code. You can use variables, expressions
and even arrays and hashes.
[- Perl code -] will execute the command inside the brackets, but no
output will be generated.
[! Perl code !] is basically the same as [- Perl code -] but the code
will only be executed on the first request. This way of calling Perl
code can be used for defining subroutines or doing initializations.
[$ Cmd Arg $] HTML::Embperl has a set of meta-commands that allow you to
do things like:
[$ if $ENV{REQUEST_METHOD} eq 'GET' $]
Method was GET
[$ else $]
Method other than GET used
[$ endif $]
that cannot be done just by inserting plain Perl code, because statements
like if (...) {...} else {...} have to be contained within a single pair of
brackets.
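The first two bracket forms above can be imitated with a toy expander. This sketch uses Python expressions in place of Perl and is in no way HTML::Embperl's implementation; the render function is a hypothetical name, the [! !] and [$ $] forms are not handled, and because it calls eval and exec it should only ever see trusted templates.

```python
import re

# Matches [+ ... +] and [- ... -]; the closing mark must equal the opening one.
_TAG = re.compile(r"\[([+\-])(.*?)\1\]", re.S)


def render(template, variables=None):
    """Expand [+ expr +] to its value; run [- code -] and emit nothing."""
    namespace = dict(variables or {})

    def expand(match):
        kind, code = match.group(1), match.group(2).strip()
        if kind == "+":
            return str(eval(code, namespace))  # substitute the result
        exec(code, namespace)                  # run for side effects only
        return ""

    return _TAG.sub(expand, template)


print(render("[- total = price * 2 -]Total: [+ total +]", {"price": 21}))
```

Blocks are processed left to right in one shared namespace, so a value assigned in a [- -] block is visible to any [+ +] block that follows it.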
When you use HTML::Embperl, the pages are cached much like normal Perl
code is cached; the page and its Perl code are compiled and stored in memory
the first time the page is requested and again each time the file changes. The
page is not cached as a static page, so the code is executed (but not
reloaded and recompiled unless the file changes) on each request.
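This "compile once, execute every time" behavior can be sketched with a cache keyed on the file's modification time. The code below is a Python analogy and an assumption about how such a cache could work, not HTML::Embperl's implementation; load_page, serve, stats, and the convention that a page sets a variable named reply are all invented for the example.

```python
import os

_cache = {}              # path -> (mtime, compiled code object)
stats = {"compiles": 0}  # lets callers observe how often compilation happens


def load_page(path):
    """Return compiled code for path, recompiling only when the file changes."""
    mtime = os.stat(path).st_mtime_ns
    entry = _cache.get(path)
    if entry is None or entry[0] != mtime:
        with open(path) as source:
            code = compile(source.read(), path, "exec")
        stats["compiles"] += 1
        entry = (mtime, code)
        _cache[path] = entry
    return entry[1]


def serve(path, request):
    """Execute the cached code for every request; the output is never cached."""
    scope = {"request": request}
    exec(load_page(path), scope)
    return scope["reply"]
```

Serving the same unchanged file twice compiles it once but runs it twice, so each request can still produce different output. Note that a change is only detected if it alters the file's modification time.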
CPAN
The Comprehensive Perl Archive Network (CPAN) is a globally mirrored
archive that contains a huge number of Perl modules. If the modules that
come with Perl (or even mod_perl) lack something, this is the place to
look. For a list of CPAN sites you should check the "perlmodlib" manual
page that comes with Perl 5 or just visit www.perl.com.
To make it easier to find, manage, and install Perl modules there is also
something called the "CPAN shell" that will present you with a command
line interface through which you can search for and install modules. Be
warned, though: The CPAN shell won't always work as well as you'd want and
sometimes manual intervention is needed in order to make things work
properly.
Nevertheless, the CPAN shell is very practical in day-to-day use and it
can simplify installation if you end up needing to install more than one
package due to dependencies between packages.
The mod_perl home page is located at http://perl.apache.org/ and you will
find everything you need in terms of software, documentation, and links to
interesting information from that site. Start by downloading just the
mod_perl package (and of course Apache) and build an Apache Web server
with mod_perl first. Then, as you familiarize yourself with mod_perl, you
can start downloading and trying out the various packages that extend
mod_perl to fit your needs.
PHP or mod_perl?
Throughout this article I've tried to give a balanced introduction to what
PHP and mod_perl can offer when used in conjunction with Apache. The
packages cater to slightly different audiences perhaps and I would
hesitate to say that one is inherently better than the other.
If you are considering which Web server infrastructure to offer your
users: install both and let users use whatever they like more. If you are
planning to undertake a large project, give both packages a spin and
listen to your developers after they've tried out both and familiarized
themselves with the software.
To the new user, I would recommend starting with PHP, though. It is easy
to understand and easy to use. Also PHP 3.0 seems to be more common these
days than mod_perl. I have completed several large projects using PHP and
never regretted the choice.
Bjorn Borud is a partner of Guardian Networks (www.guardian.no) in Norway
where he also works as a developer and consultant. Guardian Networks
specializes in Internet security, Unix, and creating the magic behind the
scenes for Web sites. Bjorn can be reached at borud@guardian.no.
--
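If you want freedom, love, and happiness,
you can only earn them with your faith, determination, and love.
There is absolutely no other way, is there?
※ Modified: minix edited this post on Mar 18 18:24:32 [FROM: 159.226.41.165]
※ Source: Shuimu Tsinghua BBS bbs.net.tsinghua.edu.cn [FROM: 159.226.41.165]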