
 I am providing here two additional modules and two patches for the
standard library.

 These two modules are ZODBhash and MKhash. They provide a dbm-like
interface based on ZODB and MetaKit. They are intended to be used by
anydbm, so I am also providing corresponding patches for anydbm.py and
whichdb.py.

 Download mzhash.zip - it contains the modules, patches and simple tests.
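 To give an idea of the dbm-style interface these modules implement, here
is a minimal sketch. This is illustrative only: in Python 3, anydbm and
whichdb were merged into the dbm package, and dbm.dumb is used below just
because it is available everywhere.

```python
# Sketch of the dbm-style mapping interface the modules implement.
# In modern Python, anydbm/whichdb live in the `dbm` package; the
# dbm.dumb backend is chosen here purely for portability.
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "words")

db = dbm.dumb.open(path, "c")   # "c": create the file if missing
db[b"passing"] = b"pass"        # dbm stores bytes keys and values
db.close()

db = dbm.dumb.open(path, "r")   # reopen read-only
print(db[b"passing"])           # b'pass'
db.close()
```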

 I also made a patch for the shelve.py module. I created two additional
shelves - CompressedShelf and CompressedKeysShelf. These shelves use zlib
to compress/decompress data. CompressedShelf compresses only the data,
while CompressedKeysShelf compresses both data and keys.

 Download mshelve.zip.
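 The idea behind CompressedShelf can be sketched like this (an
illustration only, not the actual code from mshelve.zip): subclass the
standard shelve.Shelf and run the pickled values through zlib on the way
in and out.

```python
# Sketch of the CompressedShelf idea: pickle the value as a normal
# Shelf would, then zlib-compress the pickled bytes before storing.
# This is an illustration, not the code from mshelve.zip.
import dbm.dumb
import os
import pickle
import shelve
import tempfile
import zlib

class CompressedShelf(shelve.Shelf):
    """Shelf that zlib-compresses values; keys are left alone."""

    def __getitem__(self, key):
        raw = self.dict[key.encode(self.keyencoding)]
        return pickle.loads(zlib.decompress(raw))

    def __setitem__(self, key, value):
        data = zlib.compress(pickle.dumps(value))
        self.dict[key.encode(self.keyencoding)] = data

path = os.path.join(tempfile.mkdtemp(), "forms")
shelf = CompressedShelf(dbm.dumb.open(path, "c"))
shelf["pass"] = ["pass", "passed", "passing", "passes"]
print(shelf["pass"])   # ['pass', 'passed', 'passing', 'passes']
shelf.close()
```

A CompressedKeysShelf would additionally compress the encoded key before
using it as the dbm key.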

 Below is the long story of why I created all this and how I compared
the alternatives.

 I started with the need to create an ispell-like hash with all forms of
every word. I needed this for full-text search. (BTW, I think it'd be nice
to include this kind of search in ZCatalog; I'll think about it later.) I
looked into the ispell and htdig sources and manuals, and found that I'd
better write my own programs and libraries instead of trying to wrap those
complex ones.

 I found (in the ispell manual) that I can generate a simple text file
with all the necessary information: ispell -e <russian.dict | sort
>russian.words. So the task is to construct a hash for fast access to this
information.

 Very easy, thanks to Python! Just read every line, split it and put it
into a disk-based hash (anydbm!).

 I wrote the program in a minute. The program generates two hashes. One
hash, words2root, maps every word to its normal form ("passing" =>
"pass"). The other, root2words, maps a normal form to the list of all its
forms ("pass" => ["pass", "passed", "passing", "passes", "passable",
"impassable"]). The hashes are named after htdig, of course.

 The first run was a surprise. It ran for 5 hours, swapping a lot, and
finally generated two 85-megabyte files (Berkeley DB hashes). 170 megs
from a 10-meg text file! Wow!!!
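 The core loop of that program can be sketched as follows. Assumptions:
each input line holds a root followed by all of its forms (roughly the
shape of the sorted ispell -e output), the sample lines are made up, and
shelve stands in for the anydbm-based code, since shelve can store the
lists directly.

```python
# Sketch of the two-hash build loop described above. The sample
# lines below are invented, not real ispell output.
import os
import shelve
import tempfile

lines = [
    "pass pass passed passes passing",
    "go go goes going gone went",
]

tmp = tempfile.mkdtemp()
words2root = shelve.open(os.path.join(tmp, "words2root"))
root2words = shelve.open(os.path.join(tmp, "root2words"))

for line in lines:
    forms = line.split()
    root, words = forms[0], forms[1:]
    for word in words:
        words2root[word] = root   # "passing" => "pass"
    root2words[root] = words      # "pass" => ["pass", "passed", ...]

print(words2root["passing"])      # pass
print(root2words["go"])           # ['go', 'goes', 'going', 'gone', 'went']
words2root.close()
root2words.close()
```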

 So I decided to experiment with other disk-based hashes, looking for a
way to speed things up and lower the disk space requirements.

 The next thing I tried was ZODB. ZODB is itself a hash (of sorts), so I
easily wrote the ZODBhash wrapper. I reran my program. It failed. ZODB ate
/tmp very fast - 700 megabytes in one hour. I tried to commit
subtransactions or even full transactions during writes (__setitem__), but
this was not much help, and my program died with an IOError, "no space
left on device" :(

 Then I tried to write compressed data to the hash. I created the two
shelves - CompressedShelf and CompressedKeysShelf - and tried them with
bsddb. I cleared my computer of all jobs, stopped X Windows, etc. - and
ran the program twice - with Shelf and CompressedKeysShelf. Shelf created
two 85-meg files in 3 hours, and CompressedShelf created two files - one
85 megs and the other 21 megs - in 3.5 hours. A win in disk space (not
much) and a loss in time.
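 The direction of this trade-off is easy to see in miniature: zlib does
well on the pickled lists of word forms, but poorly on very short strings
such as keys, where its header overhead can even make them grow. A quick
made-up measurement to illustrate:

```python
# Why compressing values pays off more than compressing keys:
# zlib's fixed overhead dominates on very short inputs.
import pickle
import zlib

# A pickled list of word forms, repeated to mimic a large value.
value = pickle.dumps(
    ["pass", "passed", "passes", "passing", "passable", "impassable"] * 50
)
key = b"passing"

print(len(value), len(zlib.compress(value)))   # the value shrinks a lot
print(len(key), len(zlib.compress(key)))       # the key actually grows
```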

 I then tried gdbm instead of bsddb. Again, I ran the program twice.
Result: Shelf - 120 and 50 megs in 5 hours; CompressedKeysShelf - 120 and
13 megs in 4 hours. Some wins and some losses. During the runs my computer
swapped a bit less than with Berkeley DB, so it seems gdbm uses less
memory.
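 A closing note on backends: the whichdb mechanism (merged into the dbm
package in Python 3) is what detects which library created a given file,
and it is presumably the hook my whichdb.py patch extends for ZODBhash and
MKhash. A small sketch, using dbm.dumb only because it is always
available:

```python
# whichdb-style detection: dbm.whichdb inspects an existing file
# and names the module that created it.
import dbm
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test")
db = dbm.dumb.open(path, "c")
db[b"key"] = b"value"
db.close()

print(dbm.whichdb(path))   # dbm.dumb
```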