On 2010-03-10 12:58 PM, mk wrote:
> Hello everyone,
>
> I need to do the following:
>
> (0. transform words in a document into word roots)
> 1. analyze a set of documents to see which words are highly frequent
> 2. detect clusters of those highly frequent words
> 3. map the clusters to some "special" keywords
> 4. rank the documents on clusters and "top n" most frequent words
> 5. provide search that would rank documents according to whether search
>    words were "special" cluster keywords or frequent words
>
> Is there some good open source engine out there that would be suitable
> for the task at hand? Does anybody have experience with them?

You can probably do most of this with Whoosh:
http://whoosh.ca/
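
To make steps 1 and 5 concrete, here is a rough sketch in plain Python using only the standard library. It does not use Whoosh's API. The function names, the sample documents, and the boost values are all made up for illustration, and it skips stemming (step 0) and clustering (steps 2 and 3) entirely; the "special" keywords are simply passed in by hand.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(docs, n):
    """Step 1: return the n most frequent words across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def rank(docs, query, special, frequent,
         special_boost=3.0, frequent_boost=1.5):
    """Step 5: order document names by a boosted term-count score.

    A query word that is a "special" keyword counts special_boost times,
    a frequent word counts frequent_boost times, anything else counts once.
    The boost values are arbitrary assumptions, not tuned numbers.
    """
    scores = {}
    for name, text in docs.items():
        counts = Counter(tokenize(text))
        score = 0.0
        for word in tokenize(query):
            if word in special:
                weight = special_boost
            elif word in frequent:
                weight = frequent_boost
            else:
                weight = 1.0
            score += weight * counts[word]
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample data, for illustration only.
docs = {
    "a.txt": "python search engine for python documents",
    "b.txt": "search the index and rank documents",
    "c.txt": "cooking recipes",
}
frequent = set(top_words(docs, 3))
ranking = rank(docs, "python search", special={"python"}, frequent=frequent)
```

A real search library would replace the per-query counting loop with an inverted index and a proper scoring function; the sketch only shows where the "special" and "frequent" weights would enter the ranking.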
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco