+------------------------------------------------+ | Suffix tree implementation library version 1.3 | +------------------------------------------------+ Fabien Menemenlis 2001-10-22 nihilist@dead-inside.org http://sfxdisk.dead-inside.org/ NOTE: please note that this version is still in testing. I release it because the tests I've made were satisfying, but I couldn't test it on a long period yet. And I need feedback on eventual problems... This archive contains an implementation of a fast indexing library based on suffix trees. It features the basic functions you can expect from this kind of library: storing, retrieving, deleting and dumping/loading the database. All work is done on disk except for a custom cache used to save a few read/writes on nodes and thus requires fast disks but little memory. A performance of about 17000 inserts per second has been achieved on a P3 933 with a 7200 rpm IDE disk. Indexing time is also linear and lookup is extremely fast (requires only n + 2 disk access at most when looking up a word of n letters). It has been developped on FreeBSD/gcc but should be fairly portable. The source code "testsfx.c" shows an example of how to use the library for inserting, retrieving, and deleting data. There aren't many functions and comments should be enough to give you an idea of how to use the library. (read the header of the source file) You should edit sfxdisk.h to suit your needs: you can change the alphabet size and the offset type. It should be OK to use "long long" 64 bits ints instead of long, in fact I tested it succesfully but haven't gone to the point of filling more than 2 GB of data (needless to say you need a 64 bits filesystem). Two "tools" come with the library (new with version 1.2): dumpsfx and loadsfx. dumpsfx is used to dump the database: dumpsfx [-s separator] if you want to output the result as readable text or dumpsfx -h to output it for reloading with loadsfx. dumpsfx outputs on stdout and loadsfx reads from stdin. loadsfx < dumped_file An history list of deleted data blocks is kept so that the space is reused when possible. There's still room for improvement... like partial key lookup. Contribute if you can! Any feedback will be greatly appreciated, mail me at nihilist@dead-inside.org The library is provided under the LGPL license. See COPYING file for more info.