Hades Archive System

Introduction

Hades is a near-line archive system, made to fill a niche between a fileserver (online) and a shelf full of CDs or DVDs (offline).

An archive system lets you keep only the files needed for current work on your fileserver; all the old filesets are archived out of sight.

Hades doesn't present an online directory; all manipulation is done through a web interface. Because of that, there's no need for expensive and complex hard disk arrays. Each storage module is just a small mainboard with as many disks as it can hold, to keep storage density high. This is what keeps the cost per Gigabyte so low.

Typically, a network holds lots of data 'in work': on users' hard disks, on the fileserver(s), and perhaps on removable media. When a unit of work is no longer 'current', it should be archived. Ideally, the archived copy should be unmodifiable and unerasable.

Without Hades, the user would probably just burn a few CDs or DVDs. This gives a low cost per Gigabyte, but retrieval is very cumbersome. It's hard to keep a strict discipline for labeling and storing the discs. There's indexing software that helps track which disc contains a given file, but it typically doesn't scale to thousands of discs, even though that amounts to just a few Terabytes.

With Hades, the user just puts the data he wants to archive in a directory previously set up to be accessible from the network. He then logs in to the Hades web page, selects that directory, and assigns it a label. The system indexes the data in a SQL database under this label, along with the exact time and each file's MD5 checksum. At the same time, the data files are copied to the hard disks of the storage modules, and the checksum of every copied file is verified.
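The archiving step can be sketched roughly as follows. This is a minimal illustration, not Hades' actual code: it assumes SQLite stands in for "a SQL database", a single local directory stands in for the storage modules' disks, and the table `archived_files` and the functions `md5_of` and `archive_directory` are hypothetical names chosen for the example.

```python
import hashlib
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_directory(db: sqlite3.Connection, source: Path, storage: Path, label: str) -> int:
    """Copy every file under `source` into `storage/label`, verify each copy's MD5,
    and index it under `label` with a timestamp. Returns the number of files archived.

    Hypothetical schema: archived_files(label, archived_at, rel_path, md5).
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS archived_files ("
        " label TEXT NOT NULL, archived_at TEXT NOT NULL,"
        " rel_path TEXT NOT NULL, md5 TEXT NOT NULL)"
    )
    # One timestamp for the whole archive operation.
    archived_at = datetime.now(timezone.utc).isoformat()
    count = 0
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        checksum = md5_of(src)
        dest = storage / label / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        # Re-read the copy from the storage side and compare before indexing it.
        if md5_of(dest) != checksum:
            raise OSError(f"checksum mismatch after copying {rel}")
        db.execute(
            "INSERT INTO archived_files VALUES (?, ?, ?, ?)",
            (label, archived_at, rel.as_posix(), checksum),
        )
        count += 1
    # Commit only after every file has been copied and verified.
    db.commit()
    return count
```

Committing once at the end means a failed verification leaves no partial index entries for that label in this sketch; the already-copied files would still need cleaning up.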

To retrieve data, the user just enters the label (or part of it) and chooses from the resulting list. Files can then either be downloaded directly from the web page or be placed into a specified directory on a fileserver.
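A lookup by partial label might look like the sketch below. It assumes a hypothetical SQLite table `archived_files(label, archived_at, rel_path, md5)` with one row per archived file; the function name `find_archives` is also illustrative. The user's fragment is escaped so that `%` and `_` are matched literally rather than as `LIKE` wildcards.

```python
import sqlite3


def find_archives(db: sqlite3.Connection, fragment: str) -> list[tuple[str, str, int]]:
    """Return (label, archived_at, file_count) for every archive whose label
    contains `fragment`, oldest first. SQLite's LIKE is case-insensitive for ASCII."""
    # Escape the escape character first, then the LIKE wildcards.
    escaped = fragment.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    cur = db.execute(
        "SELECT label, archived_at, COUNT(*) FROM archived_files"
        " WHERE label LIKE ? ESCAPE '\\'"
        " GROUP BY label, archived_at"
        " ORDER BY archived_at",
        (f"%{escaped}%",),
    )
    return cur.fetchall()
```

Grouping by both label and timestamp keeps separate archive runs that happen to share a label apart, so the user can pick the exact one to download or restore.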