On Tue, Jun 14, 2011 at 12:07 PM, <bangpypers-requ...@python.org> wrote:
> I noticed while looking at the news.google.co.in site that it groups
> similar news stories together.
>
> For example, the following news headlines from different online portals
> are grouped together:
>
> Jayalalithaa meets PM, DMK watches closely
> Jaya to meet PM today in New Delhi
> Jaya-PM meet, 'jittery' DMK watches on Times
>
> How to do this in Python? I think the NLTK toolkit is too large for me to
> learn and use. Any other fun and simpler way to do that?

There are two steps here, clustering and classification, and both are
fairly standard machine learning tasks.

First, you can use clustering
<http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/> to identify
classes. There are quite a few well-known algorithms, such as k-means.
Alternatively, you could select your classes manually.

Then, you need to train a classifier
<http://en.wikipedia.org/wiki/Statistical_classification_%28machine_learning%29>
which will assign new articles to one of your classes.

For both of these tasks, nltk provides very nice, pythonic tools. You can
also search for other pythonic machine learning toolkits. If you need to do
anything with natural language processing, though, nltk is well worth your
time to learn. It has excellent documentation, including a few books.

HTH,
Vijay
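
P.S. If you just want to see the grouping idea in action before picking up
a toolkit, here is a minimal standard-library sketch. It is not k-means and
not an nltk classifier: it simply measures word overlap (Jaccard similarity)
between headlines and links any pair above a threshold. The stopword list,
the 0.2 threshold, the function names and the fourth headline are all
assumptions made up for illustration, so expect to tune them on real data.

```python
import re

# Tiny hand-picked stopword list; an assumption for this sketch only.
STOPWORDS = {"a", "an", "the", "to", "in", "on", "of", "for", "and"}


def tokens(headline):
    """Lowercase the headline and return its set of non-stopword words."""
    return set(re.findall(r"[a-z]+", headline.lower())) - STOPWORDS


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_headlines(headlines, threshold=0.2):
    """Single-link grouping: two headlines share a group if a chain of
    pairwise similarities >= threshold connects them."""
    bags = [tokens(h) for h in headlines]
    parent = list(range(len(headlines)))  # union-find forest

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(bags)):
        for j in range(i + 1, len(bags)):
            if jaccard(bags[i], bags[j]) >= threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i, headline in enumerate(headlines):
        groups.setdefault(find(i), []).append(headline)
    return list(groups.values())


if __name__ == "__main__":
    sample = [
        "Jayalalithaa meets PM, DMK watches closely",
        "Jaya to meet PM today in New Delhi",
        "Jaya-PM meet, 'jittery' DMK watches on Times",
        "Sensex ends flat in choppy trade",  # made-up unrelated headline
    ]
    for group in group_headlines(sample):
        print(group)
```

With these four inputs the three Jaya headlines end up in one group and the
made-up one stays on its own. Note that the first two headlines share only
"pm"; they are joined through the third one, which is why the sketch links
groups transitively instead of comparing each headline to a single
representative. Plain word overlap also cannot tell that "Jaya" and
"Jayalalithaa" are the same person, which is the kind of thing a proper NLP
toolkit helps with.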