Dan Stromberg wrote: > Bryan Olson wrote: [...] >> Well, you could use simple files instead of fancy database tables. > > That's an interesting thought. Perhaps especially if australopithecine > were saved in a filename like: > > ~/indices/au/st/ra/lo/pi/th/ec/in/e
Right, though the better filesystems can efficiently support large numbers of files per directory. >>Below is a demo of an alternate technique that uses bsddb B-Trees, >>and puts both the word and the file-id in the key. I don't know >>how efficient it is for real data, but at least the time won't grow >>as Theta(n**2). > > > Perhaps I'm missing something, but is it not roughly O(1) for > individual insertions, but O(n*m) (n == number of files, m == number of > words) for lookups? Insertion into a B-Tree with n elements is Theta(lg(n)). Finding the files associated with a word, call the number of them 'k', should be Theta(lg(n) + k). That assumes words and files are roughly constant size. -- --Bryan -- http://mail.python.org/mailman/listinfo/python-list