On Tue, Jun 14, 2011 at 12:07 PM, <bangpypers-requ...@python.org> wrote:

> While looking into news.google.co.in site, they find the similar news by
> grouping them..
>
> For example, The following news headlines from different online portal are
> grouped together.
>
> Jayalalithaa meets PM, DMK watches closely
> Jaya to meet PM today in New Delhi
> Jaya-PM meet, 'jittery' DMK watches on Times
>
> How to do this in Python? I think, NLT toolkit is too large for me to learn
> and do.. Any other fun & simpler way to do that?
>

Grouping similar headlines and assigning new articles to a group are both
fairly standard machine learning tasks.

First, you can use clustering
<http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/> to identify
classes - there are quite a few well-known algorithms, such as k-means. Or,
you could choose your classes manually. Then, you need to train a classifier
<http://en.wikipedia.org/wiki/Statistical_classification_%28machine_learning%29>
which will classify new articles into one of your classes.
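
To make the clustering step concrete, here is a rough sketch in plain Python 3
(standard library only). It is not k-means: it is a simpler greedy, single-pass
grouping by cosine similarity of word counts, which needs no fixed number of
clusters. The names ALIASES, STOPWORDS, group_headlines and the 0.3 threshold
are my own assumptions for illustration, not part of any library.

```python
import math
import re
from collections import Counter

# Hypothetical hand-made tables (assumptions for this sketch only).
ALIASES = {"jaya": "jayalalithaa", "meets": "meet"}
STOPWORDS = {"to", "in", "on", "the", "a", "of", "today"}


def tokenize(headline):
    """Lowercase, keep alphabetic words, drop stopwords, map aliases."""
    words = re.findall(r"[a-z]+", headline.lower())
    return [ALIASES.get(w, w) for w in words if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_headlines(headlines, threshold=0.3):
    """Put each headline into the most similar existing group, or start
    a new group if nothing reaches the similarity threshold."""
    groups = []  # each group: {"vector": Counter, "members": [headline, ...]}
    for headline in headlines:
        vec = Counter(tokenize(headline))
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(vec, group["vector"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"vector": Counter(vec), "members": [headline]})
        else:
            best["vector"].update(vec)  # fold the headline into the group
            best["members"].append(headline)
    return [group["members"] for group in groups]


headlines = [
    "Jayalalithaa meets PM, DMK watches closely",
    "Jaya to meet PM today in New Delhi",
    "Jaya-PM meet, 'jittery' DMK watches on Times",
    "Sensex falls 200 points on weak global cues",  # made-up unrelated headline
]
for members in group_headlines(headlines):
    print(members)
```

The alias table is doing a lot of the work here (mapping "Jaya" to
"Jayalalithaa"); a real system would likely need stemming or better features,
and the result depends on the order of the headlines and on the threshold.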

For both these tasks, nltk provides very nice, pythonic tools. You can also
search for other pythonic machine learning toolkits. If you need to do
anything with natural language processing, though, nltk is well worth your
time to learn. It has excellent documentation including a few books.
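
If you want a feel for what the classification step involves before picking a
toolkit, here is a minimal sketch of a naive Bayes text classifier in plain
Python 3 (standard library only, with add-one smoothing). This is not nltk's
implementation; the class name NaiveBayes, its methods, the labels and the
training headlines are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial naive Bayes classifier."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labeled_texts):
        for text, label in labeled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            # Log prior plus log likelihood of each word, add-one smoothed.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training data: two hand-picked classes.
training = [
    ("Jayalalithaa meets PM, DMK watches closely", "politics"),
    ("Jaya to meet PM today in New Delhi", "politics"),
    ("Sensex falls on weak global cues", "markets"),
    ("Rupee slips as Sensex drops", "markets"),
]
classifier = NaiveBayes()
classifier.train(training)
print(classifier.classify("Jaya-PM meet, 'jittery' DMK watches on Times"))
```

With only four training headlines this is a toy; the point is just that the
classes come from labeled examples, and a new article is assigned to whichever
class makes its words most probable.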

HTH,
Vijay

-- 
Targeted direct marketing on Twitter - http://www.wisdomtap.com/