Re: Text categorization / classification

2010-10-27 Thread mvazq...@ova.st
Thanks a lot! I was reading about Mahout today. I'll try that out. Thanks again, Maria

Re: Text categorization / classification

2010-10-27 Thread Lance Norskog
There are tools for this in the Mahout project, which is oriented toward large-scale work. http://mahout.apache.org The learning curve is steep, and you will need to learn some Hadoop. The book 'Collective Intelligence' includes a suite of Python tools for small-scale experiments. On Wed, Oct

Text categorization / classification

2010-10-27 Thread Maria Vazquez
I need to auto-categorize a large number of documents. They are basically news articles from major news sources (nytimes, npr, abcnews, etc). I'd like to categorize them automatically. Any suggestions? Lucene in Action suggests using a set of documents to build category vectors and then comparing
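The category-vector idea mentioned above can be tried at small scale before committing to Mahout. What follows is only a rough sketch, not the method from Lucene in Action: the message is cut off before it says how documents are compared, so cosine similarity over raw term counts is an assumption here, and the function names and the tiny training set are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_category_vectors(labeled_docs):
    """Sum term counts of all training documents per category.

    labeled_docs is an iterable of (category, text) pairs.
    """
    vectors = {}
    for category, text in labeled_docs:
        vectors.setdefault(category, Counter()).update(tokenize(text))
    return vectors


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed metric)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(text, category_vectors):
    """Return the category whose vector is most similar to the document."""
    doc = Counter(tokenize(text))
    return max(category_vectors, key=lambda c: cosine(doc, category_vectors[c]))


# Hypothetical hand-labeled training set; real use needs far more documents.
training = [
    ("sports", "The team won the match after a late goal"),
    ("sports", "The coach praised the players after the game"),
    ("politics", "The senate passed the budget bill after a long vote"),
    ("politics", "The president signed the election reform bill"),
]

category_vectors = build_category_vectors(training)
label = categorize("Players celebrate the winning goal", category_vectors)
print(label)
```

Raw counts let common words such as "the" dominate the vectors, so removing stop words or weighting terms by TF-IDF would likely be the first refinement for real news articles.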