I've been putting a little bit of time into a file indexing engine in python, which you can find here: http://dcs.nac.uci.edu/~strombrg/pyindex.html
It'll handle 40,000 mail messages of varying lengths pretty well now, but I want more. :)

So far, I've been taking the approach of using single-table databases like gdbm or dbhash - actually a small number of them, to map filenames to numbers, numbers to filenames, words to numbers, numbers to words, and numbered words to numbered filenames. In the main database, each entry is keyed by a word, and under that word is a null-terminated list of file numbers (in http://dcs.nac.uci.edu/~strombrg/base255.html representation).

However, despite using my http://dcs.nac.uci.edu/~strombrg/cachedb.html module to speed up the database accesses, bringing in psyco, and studying various python optimization pages, the program just isn't performing the way I'd like it to. I think the reason is that, even with the caching and the minimal representation conversion, it's still just too slow converting those linear lists to arrays and back, over and over (the first sketch below shows the cycle I mean).

So this leads me to wonder: is there a python database interface that would allow me to define a -lot- of tables? Each word would become a table, and the rows in that table would just be the filenames that contained that word. That way, adding a filename under a word shouldn't bog down much at all. -But- are there any database interfaces for python that aren't going to get upset if you try to give them hundreds of thousands of tables? (The second sketch below shows what I'm imagining.)

Thanks!
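To make that bottleneck concrete, here's a simplified sketch of the per-word update cycle I'm describing. It's not the actual pyindex code - the real thing uses base255-encoded numbers rather than the null-separated decimal strings here - but the shape of the work is the same:

    import gdbm   # any dbm-style single-table database would do here

    def add_file_to_word(db, word, file_number):
        # 1. Pull the whole packed list for this word out of the database.
        if db.has_key(word):
            numbers = db[word].split('\0')
        else:
            numbers = []
        # 2. Append the one new entry.
        numbers.append(str(file_number))
        # 3. Re-encode the entire list and write it all back.
        db[word] = '\0'.join(numbers)
        # Steps 1 and 3 cost time proportional to the length of the list
        # on every single insertion, so a common word's list gets decoded
        # and re-written thousands of times over a 40,000 message run.

    db = gdbm.open('words.db', 'c')
    add_file_to_word(db, 'hello', 42)
    db.close()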
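And here's roughly what I mean by the table-per-word idea, sketched against the sqlite3/pysqlite interface just to make it concrete. This is untested, and table_for() is a made-up helper for munging arbitrary words into legal table names:

    import sqlite3   # pysqlite's dbapi2 module on older installs

    def table_for(word):
        # Words aren't always legal SQL identifiers, so munge them;
        # hex-encoding the word is the crudest safe option.
        return 'w_' + word.encode('hex')

    def add_file_to_word(conn, word, file_number):
        table = table_for(word)
        conn.execute('CREATE TABLE IF NOT EXISTS %s (file_number INTEGER)'
                     % table)
        conn.execute('INSERT INTO %s VALUES (?)' % table, (file_number,))

    conn = sqlite3.connect('index.db')
    add_file_to_word(conn, 'hello', 42)
    conn.commit()
    conn.close()

With something like that, adding a file under a word is a single INSERT, with no repacking of an ever-growing list. The question is whether SQLite (or anything else with a python binding) stays happy once there are hundreds of thousands of those tables.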