Raja Raman Sundararajan wrote:
> Hello guys,
> I was investigating how one can use "text indexers" in Python
> and I stumbled across several, e.g. PyLucene.
>
> I wanted to know what the algorithms behind indexers look like. I have heard
> people talking about B-Trees, but that information is simply not enough. I
> would like to know exactly how each part of the indexing flow, and the
> algorithm behind it, works.
>
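To make the question concrete, here is a toy version of a typical indexing flow. Text is analyzed into terms by lowercasing, tokenizing, dropping stopwords, and stemming. Each term is mapped to the set of documents that contain it (an inverted index of posting lists). A query is answered by intersecting the posting lists of its terms. This is a minimal pure-Python sketch, not how Lucene or Lupy actually implement it. The stopword list, the crude suffix-stripping `crude_stem` (a stand-in for a real stemmer such as Porter's), and the `InvertedIndex` class are hypothetical names made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def crude_stem(word):
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ers", "er", "es", "s"):
        # Only strip if a reasonably long stem remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Analysis step: lowercase, split into alphanumeric tokens, drop stopwords, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


class InvertedIndex:
    """Hypothetical toy inverted index: term -> set of document ids (posting list)."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        """Indexing step: record doc_id in the posting list of every term in the text."""
        self.docs[doc_id] = text
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Query step: boolean AND, i.e. intersect the posting lists of all query terms."""
        terms = analyze(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "Indexing text in Python")
    idx.add(2, "The indexer stores posting lists")
    idx.add(3, "Searching an index with PyLucene")
    print(idx.search("indexing python"))  # [1]
    print(idx.search("index"))            # [1, 2, 3]
```

Because the same analyzer runs on documents and queries, "indexing", "indexer" and "index" all reduce to the term "index" and match each other. Real engines differ in many ways. For example, they typically keep the term dictionary in a sorted, searchable on-disk structure, store posting lists as sorted, compressed arrays, and rank hits (e.g. by tf-idf) instead of returning a plain set. The sets here just keep the core idea visible.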
More info, please: are you asking about how Analyzers do stopword removal and stemming/lemmatization, how queries are handled, or how indexes (bitfields) are built and queried?

I think Hatcher's book is the best place to start, though it doesn't focus intensively on Lucene's implementation:
http://lucenebook.com/blog/

http://divmod.org/trac/wiki/WhitherLupy
Lupy *was* a pure-Python port of Lucene, and I remember it being pretty easy to follow. Maybe you can get them to send it to you.

--
http://mail.python.org/mailman/listinfo/python-list