The Web involves three really important things:
Web browsers, which generate requests for information,
Web servers, which respond to those requests, and
HTTP, the protocol by which this exchange of information takes place.
HTTP is not the same thing as the HTML language used to represent information; HTTP primarily involves passing the names of "objects" and their values back and forth. While there may be disagreement as to how HTML ought to be defined, people are pretty clear on the HTTP definition...
There is, however, a proposal for HTTP-NG - Next Generation Hypertext Transfer Protocol.
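To make the "names and values" idea concrete, here is a minimal sketch in Python that builds an HTTP/1.0 request by hand and splits a response into its status line, header names and values, and body. The host name, client name, and sample response are illustrative assumptions, not taken from any particular server.

```python
def build_request(host, path="/"):
    """Return the text of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "User-Agent: example-client/0.1",  # hypothetical client name
        "",                                # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into (status line, header dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# A made-up response, so the example runs without a network connection.
sample = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)
status, headers, body = parse_response(sample)
```

Note that the HTML only shows up as the body; everything HTTP itself cares about is in the request line, the status line, and the header pairs.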
Apache is the most popular web server in use on the Internet today, and is the software used by approximately 41% of all web servers. (Based on the Netcraft survey, February 1997.) Its growth has probably been helped by the fact that common Linux distributions such as Red Hat, Debian, and Slackware install Apache by default. It is commonly run on Intel "boxes" running Linux or one of the BSDs, and provides fine performance for most purposes. Several SSL (Secure Sockets Layer) implementations are available from commercial vendors.
There are a number of related projects that include "Apache on OS/2," integration of scripting languages, a news weekly called Apache Week, and the Apache Module Registry listing developments using the Apache API. This last indicates a cause for Apache's popularity. "Standard" modules provide support for such things as enhanced authentication, server side includes, and some support for external relational databases. Contributed modules provide support for integrated execution of code in languages such as Java, Perl, and Python, enhanced logging, and many ways of managing authentication that allow security to be handled using cookies and database queries.
A tool for collaborative production of (apparently) documentation across the web, with WebDAV support.
sws is a very simple web server implemented as a small shell script. No CGI support, but it nicely feeds static content of various types.
This is a "highly featureful" web server notable for providing fine-grained control over security. It provides, as "built-ins", functionality for document rewriting/inclusion and text searching, which removes the need to use CGI for these purposes. Documents can present different "views" to clients based on such things as the client's IP address, domain name, browser type, browser "Accept" capabilities, and more. The reduced dependence on CGI improves performance while cutting down on possible security holes.
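The idea of presenting different "views" can be sketched as a small decision function. This is an illustration in Python, not WN's actual configuration syntax; the file names, the internal network range, and the order of the rules are all assumptions made up for the example.

```python
import ipaddress

# Assumed address range for "internal" clients.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")


def choose_view(client_ip, accept, user_agent):
    """Pick a document variant from request attributes (hypothetical rules)."""
    # Clients on the internal network get the staff version.
    if ipaddress.ip_address(client_ip) in INTERNAL_NET:
        return "index.staff.html"
    # Text-mode browsers such as Lynx get a version without images.
    if user_agent.lower().startswith("lynx"):
        return "index.text.html"
    # Browsers that do not advertise HTML support get plain text.
    accepted = [part.split(";")[0].strip() for part in accept.split(",")]
    if "text/html" not in accepted and "*/*" not in accepted:
        return "index.txt"
    return "index.html"
```

Because the decision is made inside the server, no CGI process has to be started just to pick which file to send.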
Patches are available to allow use of SSL in conjunction with WN.
This is a web server intended to provide high performance with limited hardware resources. It does this by having a small RAM footprint and minimizing the number of processes it has to fork(); it will only fork when running CGI code. Boa does not support SSL at this time. I've tended to use Boa rather than Apache because it is tiny, and since it does not have the same security
The Language Agnostic Web Server
Xitami is a combination web server/FTP server that runs on various platforms including Win32.
The main purpose of AllegroServe is to serve dynamic pages using an HTML generator. It can dynamically generate web pages and "web-enable" existing applications with a browser front-end.
The code is licensed under the LGPL, and presumably could be fitted to work with one of the free Common Lisp implementations.
LSP is a Common Lisp-based dynamic content generation facility. It features a syntax that simultaneously offers the semantics of SGML and Common Lisp in the source LSP page, and the ability to compile the page.
HTTP server inside Linux kernel
This web server may provide the fastest conceivable service for static web pages.
A multithreaded web server with integrated Tcl.
A networking engine implemented in Python
The djb HTTP server... Minimalist, difficult to exploit... difficult to figure out licensing issues...
A small HTTPD implementation that can, when compiled with dietlibc, be a tiny statically compiled binary that can support thousands of concurrent accesses.
Squid caches web pages in memory, spilling to disk as needed, so that if a page is requested multiple times, this can be satisfied by one access to the original page. This cuts down on overall traffic. ISPs should run something like Squid to improve performance for users and cut down on use of scarce communications resources.
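The core of the caching idea fits in a few lines. The following is a minimal Python sketch, not a description of how Squid is implemented: a real cache also has to deal with expiry, page size limits, and spilling to disk. The `PageCache` class and the `fake_fetch` stand-in are hypothetical names invented for the example.

```python
class PageCache:
    """Keep fetched pages in memory so repeat requests skip the origin server."""

    def __init__(self, fetch):
        self.fetch = fetch      # function that retrieves a URL from the origin
        self.pages = {}         # URL -> page body
        self.origin_hits = 0    # how many times we had to go to the origin

    def get(self, url):
        if url not in self.pages:
            # First request for this URL: fetch it once and remember it.
            self.origin_hits += 1
            self.pages[url] = self.fetch(url)
        return self.pages[url]


def fake_fetch(url):
    # Stand-in for a real network fetch, so the example runs offline.
    return f"<html>contents of {url}</html>"


cache = PageCache(fake_fetch)
for _ in range(3):
    page = cache.get("http://example.org/")
```

After the loop, `cache.origin_hits` is 1: three requests were answered with a single access to the original page.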
Its purpose in life is to filter out gratuitous advertising and cookies from web pages. Configuration files describe sets of "blocking" rules for both URLs and cookies. By blocking sites that are used purely for advertising, you eliminate the images that take so long to load. (Web search engines such as AltaVista are becoming quite annoying in their infliction of advertisements...) Junkbuster can be used in sequence with other proxy servers; I have configured it as the proxy "nearest" to the user.
A more sophisticated approach would be to rewrite the HTML, rather than block URLs, so as to:
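A set of blocking rules might look something like the following Python sketch. The pattern syntax here is simply shell-style wildcards matched against "host/path"; it is an assumption for illustration and not Junkbuster's own blockfile format. The host names are made up.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical URL rules, matched against "host/path".
BLOCK_PATTERNS = [
    "ads.example.com/*",    # everything from an advertising-only host
    "*/banners/*",          # any "banners" directory on any host
    "*/adverts/*.gif",      # GIFs under any "adverts" directory
]

# Hypothetical cookie rule: only these hosts may set or receive cookies.
COOKIE_HOSTS_ALLOWED = {"shop.example.com"}


def is_blocked(url):
    """Return True if the URL matches any blocking pattern."""
    parts = urlsplit(url)
    target = parts.netloc + parts.path
    return any(fnmatchcase(target, pattern) for pattern in BLOCK_PATTERNS)


def cookies_allowed(url):
    """Return True if cookies may pass for the URL's host."""
    return urlsplit(url).netloc in COOKIE_HOSTS_ALLOWED
```

A proxy using rules like these would answer a blocked request itself (say, with an empty image) instead of contacting the advertising site at all.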
Remove the blank space resulting from the blocked image,
Substitute another "more desirable" image in place of the blocked image, or even
Reformat the web page in some other manner to make the contents more usable.
Parsers such as WebFilter have been created to do so. Unfortunately, this is a much more complex, CPU-intensive, and error-prone approach which also has the potential to bring up legal questions as it involves modifying the contents of copyrighted works.
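The first of these options, removing the image tag altogether so no blank space is left, can be sketched with Python's standard `html.parser` module. This is not how WebFilter works internally; it is only an illustration of the rewriting approach, and it shows why it costs more than URL blocking, since every page has to be parsed and re-emitted. The `ImageStripper` and `strip_images` names are invented for the example, and comments and doctype declarations are simply dropped to keep it short.

```python
from html.parser import HTMLParser


class ImageStripper(HTMLParser):
    """Re-emit a page, dropping <img> tags whose src is judged unwanted."""

    def __init__(self, is_blocked):
        # Keep entities such as &amp; untouched so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.is_blocked = is_blocked
        self.out = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src") or ""
        if tag == "img" and self.is_blocked(src):
            return  # drop the tag entirely, leaving no placeholder
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags: the raw text already includes the "/>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_images(html, is_blocked):
    """Return html with blocked images removed."""
    parser = ImageStripper(is_blocked)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = ('<p>News <img src="http://ads.example.com/b.gif"> and '
        '<img src="/logo.gif"> &amp; more</p>')
result = strip_images(page, lambda src: "ads.example.com" in src)
```

Substituting a different image would be a small change to `handle_starttag`; reformatting the page more generally is where the complexity and the error-proneness come in.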
A text search engine freely available in source code form.
An interface that uses Glimpse to search documents, using web FORMs to provide a "graphical" front end.
A web server that eliminates identifying information about you when accessing other web sites.
This is an HTTP Proxy that uses ipfwadm to intercept HTTP requests and pass them to the proxy. As a result, there is no need to set any environment (e.g. http_proxy) or internal browser variables in order to make Linux-based programs make use of the proxy server.
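For contrast, this is roughly what the ordinary, non-transparent arrangement looks like from a program's point of view: each program has to find the proxy through something like the http_proxy variable. The sketch below uses Python's standard `urllib.request`; the proxy address is an assumption (port 3128 is Squid's default), and nothing is actually sent over the network.

```python
import os
import urllib.request

# Assumed proxy address; without a transparent proxy, every user or
# program needs a setting like this.
os.environ["http_proxy"] = "http://localhost:3128"

# urllib reads the <scheme>_proxy environment variables.
proxies = urllib.request.getproxies()

# An opener built this way would route HTTP requests via that proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

With interception done at the packet-filter level, none of this per-program configuration is needed.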
CL-HTTP is a web server written in Common LISP.
Winning points as about the most bizarre implementation...
Also in the running is a web server implemented in Awk.
If there is anything more bizarre than a web server in Postscript, it would be one in sed.
Sometimes you want to post a web link in a news article. Unfortunately, if it is a reference to (say) a Google-based news article, the URL may get exceedingly long. The above web site basically generates 9-digit lookup codes, so that if your URL is terribly long, you can shorten it to easily fit onto a line.
Which produces somewhat shorter links than makeashorterlink's...
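The lookup-code scheme is simple enough to sketch in Python. How the real services assign their codes is not stated here, so the sequential numbering below is an assumption, as are the `Shortener` class and the `short.example` host name.

```python
import itertools


class Shortener:
    """Map long URLs to 9-digit lookup codes and back (illustrative only)."""

    def __init__(self, base="http://short.example/"):
        self.base = base
        self.codes = {}                   # long URL -> code
        self.urls = {}                    # code -> long URL
        self.counter = itertools.count(1)

    def shorten(self, long_url):
        if long_url not in self.codes:
            # Assumed scheme: hand out codes in sequence, zero-padded to 9 digits.
            code = f"{next(self.counter):09d}"
            self.codes[long_url] = code
            self.urls[code] = long_url
        return self.base + self.codes[long_url]

    def expand(self, short_url):
        # The code is whatever follows the last slash.
        return self.urls[short_url.rsplit("/", 1)[-1]]


shortener = Shortener()
long_url = "http://groups.example.com/groups?q=linux+web+servers&hl=en&selm=abc123"
short_url = shortener.shorten(long_url)
```

The short link only works for as long as the service keeps its table, which is worth bearing in mind before posting one somewhere permanent.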
A web server implemented in APL...
Things that a proxy server can buy you include:
Not blocking for DNS hostname resolution (probably provided by all proxies...)
Anonymizing your connection - JunkBusters
Changing the claimed client name - JunkBusters
Controlling the transfer of cookies - JunkBusters
Blocking images - JunkBusters
Allowing end-to-end compression
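Several of the items in this list come down to editing request headers on their way through the proxy. Here is a minimal Python sketch of that step; it is not taken from any of the proxies mentioned, and the function name, the replacement client name, and the choice of which headers to drop are assumptions for illustration.

```python
def filter_request_headers(headers, claimed_agent="Mozilla/4.0",
                           allow_cookies=False):
    """Return a copy of the request headers, cleaned up for forwarding."""
    filtered = {}
    for name, value in headers.items():
        lname = name.lower()
        if lname in ("referer", "from"):
            continue                # anonymize: hide where we came from
        if lname == "cookie" and not allow_cookies:
            continue                # control the transfer of cookies
        if lname == "user-agent":
            value = claimed_agent   # change the claimed client name
        filtered[name] = value
    return filtered


original = {
    "Host": "www.example.org",
    "User-Agent": "Lynx/2.8",
    "Referer": "http://private.example.net/bookmarks.html",
    "Cookie": "id=12345",
}
cleaned = filter_request_headers(original)
```

Here `cleaned` keeps the `Host` header, reports the substitute client name, and carries neither the `Referer` nor the `Cookie` header.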
It looks as though it would be fairly capable of being used to run a full service library, albeit with a couple of things making it unattractive:
The code is pretty verbose, with huge amounts of HTML tagging hard-coded in
When I installed it, the user authentication code didn't seem to work as planned, to the point that I disabled it.
There are no facilities for pulling bibliographic information from external sources.
If you have an ISBN identifier, it is possible to pull a lot of data from public sources such as book vendors. This should considerably streamline the data entry process, which as currently constituted involves a great deal of manual data entry.
Mind you, if one were to do some scrubbing on it, it might not be difficult to clean up...
I use Lynx in conjunction with Software Agents written in scripting languages such as Perl, Python, and Guile to search the web for information on such things as stock and mutual fund prices and weather forecasts.
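An agent of this kind can be quite small. The sketch below, in Python, runs `lynx -dump` to get a plain-text rendering of a page and then picks a number out of the text. The page layout it expects (a ticker symbol followed somewhere by a price) and the symbol "XYZ" are assumptions; any real site would need its own pattern.

```python
import re
import subprocess


def dump_page(url):
    """Return the plain-text rendering of a page, as produced by lynx -dump."""
    result = subprocess.run(["lynx", "-dump", url],
                            capture_output=True, text=True, check=True)
    return result.stdout


def find_price(text, symbol):
    """Return the first number that follows the ticker symbol, or None."""
    pattern = rf"\b{re.escape(symbol)}\b\D*?(\d+(?:\.\d+)?)"
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None


# A made-up fragment of dumped text, so the parsing step runs offline.
sample_text = "Quotes\n   XYZ   Last: 104.25   Change: +1.50\n"
price = find_price(sample_text, "XYZ")
```

In use, the two steps are chained, as in `find_price(dump_page(url), "XYZ")`, and the script is run from cron or by hand.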
In order to have a single, common, web cache that is persistent and provides service for any web browser software that I might use, I have installed the Squid web cache package (aka "proxy server"). I also use JunkBuster.
For blog fans, there are plenty of options out there...
Fewer are database agnostic, allowing the use of PostgreSQL and the like; examples include Serendipity Weblog System and Movable Type: Publishing Platform for Business Blogs and Professionals.
It is not uncommon for blog systems to eschew databases in favor of flat files, as blosxom :: the zen of blogging :: does.