On Tue, Jun 14, 2011 at 12:07 PM, Gopalakrishnan Subramani <gopalakrishnan.subram...@gmail.com> wrote:
> Jayalalithaa meets PM, DMK watches closely
> Jaya to meet PM today in New Delhi
> Jaya-PM meet, 'jittery' DMK watches on Times
>
> How to do this in Python? I think, NLT toolkit is too large for me to learn
> and do.. Any other fun & simpler way to do that?
>

1) NLTK is pretty simple. You can do duplicate detection with it fairly easily; look for sample code.

2) Generate keywords from the content and check the correlation between documents.

3) For headlines alone: do substring matching? (But this would ignore the semantics of the text, e.g. 'Jayalalitha was last seen in Kodagu estate' and 'Real estate would get a boost under Jayalalitha' would be put in the same category.)

-V
http://blizzardzblogs.blogspot.com/
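P.S. A rough sketch of option 2, using only the standard library: take the words of each headline as its keywords and score a pair by keyword overlap (Jaccard similarity). The stopword list, the function names and the use of Jaccard as the "correlation" measure are my own assumptions for illustration, not anything taken from NLTK.

```python
import re

# Hypothetical, deliberately tiny stopword list; extend it for real use.
STOPWORDS = {"a", "an", "the", "to", "in", "on", "of",
             "was", "would", "under", "get"}


def keywords(headline):
    """Return the set of lowercase words in a headline, minus stopwords."""
    words = re.findall(r"[a-z]+", headline.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    """Keyword overlap of two headlines: |A & B| / |A | B|, from 0.0 to 1.0."""
    ka, kb = keywords(a), keywords(b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


related = jaccard("Jayalalithaa meets PM, DMK watches closely",
                  "Jaya-PM meet, 'jittery' DMK watches on Times")
unrelated = jaccard("Jayalalitha was last seen in Kodagu estate",
                    "Real estate would get a boost under Jayalalitha")
print(related, unrelated)
```

Note that the two 'estate' headlines score almost as high as the genuinely related pair, so plain keyword overlap has the same blind spot as option 3. Also, 'Jaya' and 'Jayalalithaa' count as different keywords here; some name normalisation or stemming would be needed before picking a threshold.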