Re: Text mining in Python

Robert Kern Wed, 10 Mar 2010 11:09:36 -0800

On 2010-03-10 12:58 PM, mk wrote:

Hello everyone,


I need to do the following:

(0. transform words in a document into word roots)

1. analyze a set of documents to see which words are highly frequent

2. detect clusters of those highly frequent words

3. map the clusters to some "special" keywords

4. rank the documents on clusters and "top n" most frequent words

5. provide search that would rank documents according to whether search
words were "special" cluster keywords or frequent words

Is there a good open source engine out there that would be suitable
for the task at hand? Does anybody have experience with one?


You can probably do most of this with Whoosh:

  http://whoosh.ca/
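
For orientation, steps 0, 1 and 4 from the question can be roughed out with
nothing but the standard library. This is only a minimal sketch and does not
use Whoosh: the suffix-stripping stem() is a crude stand-in for a real
stemmer, and every name in it (stem, tokenize, top_words, rank_documents, the
sample docs) is hypothetical, made up for illustration.

```python
import re
from collections import Counter

# Assumption: a naive suffix stripper, NOT a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Step 0: reduce a word to a rough root by dropping a common suffix."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and stem each."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def top_words(docs, n):
    """Step 1: the n most frequent stemmed words across all documents."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def rank_documents(docs, keywords):
    """Step 4: order document names by how often they contain the keywords."""
    keywords = set(keywords)
    scores = {
        name: sum(1 for token in tokenize(text) if token in keywords)
        for name, text in docs.items()
    }
    # Highest score first; ties are broken by document name.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical sample corpus.
docs = {
    "a.txt": "Cats chase mice. The cat chased a mouse.",
    "b.txt": "Dogs bark. A dog barked at the cat.",
    "c.txt": "Stocks fell sharply.",
}
```

Calling rank_documents(docs, top_words(docs, 1)) ranks the sample documents
by the single most frequent stem. Clustering (steps 2 and 3) and a real
search index (step 5) are left out; a real corpus would also want stop-word
filtering before counting.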

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list
