The first major approach to "document management" came in the area of systems constructed to store "document images."
This application area first became important in places such as legal departments, where there are enormous numbers of legal contracts and the like.
I first encountered it on a visit to NOVA Corporation in Calgary in the late '80s. They manage oil pipelines, and have huge quantities of associated paperwork. In order to reduce the time spent looking up data, they decided to put the data online. They would:
Scan documents using a fast page scanner, putting images on disk.
Data entry personnel would add indexing information to this, including identifiers for the location where the originals would be filed.
If possible, they would run the images thru OCR (Optical Character Recognition) software so as to have the full text of the document. This normally involves considerably less disk space than the graphical images, and is from that perspective a "low cost" addition.
Having the text in electronic form allows them to do full-text searches, making it that much easier to find information later on.
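The scan, index, OCR, and search steps above can be sketched as a small in-memory index. This is only an illustration, not a description of the system NOVA built: the class names, field names, and sample records are all hypothetical, and a real system would keep the index in a database rather than in memory.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ScannedDocument:
    doc_id: int
    image_path: str        # where the scanned page image lives on disk
    filing_location: str   # where the paper original is filed
    ocr_text: str = ""     # left empty when OCR was not possible


class DocumentIndex:
    """Tiny inverted index: word -> set of document ids."""

    def __init__(self):
        self.documents = {}
        self.postings = defaultdict(set)

    def add(self, doc):
        self.documents[doc.doc_id] = doc
        for word in re.findall(r"[a-z0-9]+", doc.ocr_text.lower()):
            self.postings[word].add(doc.doc_id)

    def search(self, query):
        """Return the documents whose OCR text contains every query word."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.documents[i] for i in sorted(ids)]


# Made-up sample records, for illustration only.
index = DocumentIndex()
index.add(ScannedDocument(1, "images/0001.tif", "Cabinet 12, Drawer 3",
                          "Pipeline right-of-way agreement"))
index.add(ScannedDocument(2, "images/0002.tif", "Cabinet 4, Drawer 1",
                          "Compressor station maintenance agreement"))

for doc in index.search("pipeline agreement"):
    print(doc.doc_id, doc.filing_location)   # prints: 1 Cabinet 12, Drawer 3
```

The search result carries the filing location, so someone who needs the paper original knows where to look.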
As each image, even in compressed form, consumes 100K or so of disk space, the thousands of documents quickly eat up disk space. This necessitates creation of "disk farms" with enormous amounts of disk space.
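To put rough numbers on this, here is a back-of-the-envelope estimate. The 100K per image comes from the figure above; the page counts are hypothetical, and the function name is just for illustration.

```python
IMAGE_SIZE_KB = 100  # "100K or so" per compressed page image


def storage_needed_gb(page_count, image_size_kb=IMAGE_SIZE_KB):
    """Estimate disk space for page images in GB (1 GB = 1024 * 1024 KB)."""
    return page_count * image_size_kb / (1024 * 1024)


# Hypothetical archive sizes, for illustration only.
for pages in (10_000, 100_000, 1_000_000):
    print(f"{pages:>9,} pages -> {storage_needed_gb(pages):6.1f} GB")
```

At this rate 10,000 pages take roughly 1GB, and a million pages take roughly 95GB.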
Archival policies are set up so that rarely accessed material migrates to slower, cheaper media and may ultimately be discarded. Media options include:
Magneto-optical disk arrays
Magnetic tape arrays
Record CD-ROMs and dump them into a large CD-ROM array
Pioneer sells a 500-CD "jukebox;" that's 330GB of storage at the fairly-reasonable media cost of about $10/GB.
If DVD standards (defining the "new" CD-ROM format) ever solidify, this technology should provide significantly higher storage density.
Once development work has turned into product, the upcoming HD-ROM technology seems pretty interesting.
HD-ROM is a high-density write-once format that makes use of ion etching equipment to write data onto steel or iridium plates. With suitable error correction conventions, it should be more robust than the stone tablets of antiquity. Steel "tablets" remain readable through "intense" treatment such as attempted erasure using mechanical devices such as jackhammers; iridium is likely to remain readable even through more extreme events such as nearby nuclear bursts.
The cost of the readers need not be much higher than that of a CD-ROM drive, as they use similar laser reader technology (at a much higher density); the cost of HD-ROM writers (once made in quantity) is expected to start at about $10,000 for a "steel" version.
Cost per MB should be lower than anything else currently available.
The research work is apparently pretty stable; development work is proceeding via joint ventures with companies such as IBM.
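The archival policy described above, where rarely accessed material migrates to slower, cheaper media and may eventually be discarded, might be sketched like this. The idle-time thresholds and the ordering of the media tiers are assumptions made for illustration; a real site would set both to suit its own costs and legal retention rules.

```python
from datetime import date, timedelta

# Hypothetical thresholds: (minimum days since last access, tier).
# Listed from coldest to warmest; the first match wins.
TIERS = [
    (3650, "discard"),
    (730, "magnetic tape"),
    (180, "CD-ROM jukebox"),
    (30, "magneto-optical"),
]


def choose_tier(last_access, today):
    """Pick a storage tier from how long a document has sat untouched."""
    idle_days = (today - last_access).days
    for min_idle_days, tier in TIERS:
        if idle_days >= min_idle_days:
            return tier
    return "disk"  # recently used material stays on fast disk


today = date.today()
for idle in (5, 90, 400, 1000, 4000):
    print(idle, "days idle ->", choose_tier(today - timedelta(days=idle), today))
```

A nightly job could run a check like this over the index and queue up whatever needs to move.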
Maintenance of scanned images is important for organizations that need to archive documents for legal purposes. This commonly includes:
Land Registry offices
Once a system has been constructed that can link together scanned images, it is relatively easy to add support for inserting other sorts of documents that come in "naturally electronic form." Viewer software can then invoke the appropriate application as needed to view different kinds of document formats. This makes such systems useful to organizations such as:
Repair and Maintenance groups (engineering diagrams, maintenance documentation)
Engineering and architectural departments
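The idea of viewer software invoking the appropriate application for each document format can be sketched as a lookup keyed on file type. The viewer program names below are placeholders rather than real commands, and the mapping itself is an assumption; only the use of Python's standard mimetypes module to guess a type from the file name is real.

```python
import mimetypes

# Hypothetical mapping from MIME type to viewer program (placeholder names).
VIEWERS = {
    "image/tiff": "image-viewer",
    "application/pdf": "pdf-viewer",
    "text/plain": "text-viewer",
}
DEFAULT_VIEWER = "generic-viewer"


def viewer_command(path):
    """Build the command line that would open this document."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return [VIEWERS.get(mime_type, DEFAULT_VIEWER), path]


for name in ("scan0001.tif", "manual.pdf", "notes.txt"):
    print(viewer_command(name))
```

Passing the resulting list to subprocess.run would launch the viewer; supporting a new document format is then a matter of adding one entry to the table.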
Integrated document management systems are sold by such companies as:
There are also some freely available packages that are somewhat less integrated.
FileArchiveSearch (FAS) is a catalog system for CD-ROMs or other media.
It is written in PHP as a web application, and is intended for collectors of MP3s, pictures, and videos.
If you have "way too much stuff" lying around, for sentimental reasons, it may be possible to cut down by taking pictures of things, and perhaps storing these digitally...