On Sep 25, 7:52 pm, Paul Rubin <http://[EMAIL PROTECTED]> wrote:
> "exhuma.twn" <[EMAIL PROTECTED]> writes:
> > Is it possible to calculate a distance between two chunks of text? I
> > suppose one could simply do a simple word-count on the chunks
> > (removing common noise words of course). And then go from there. Maybe
> > even assigning different weighting to words. But maybe there is a
> > well-tested and useful algorithm already available?
>
> There's a huge field of text mining that attempts to do things like
> this. See http://en.wikipedia.org/wiki/Latent_semantic_analysis for some
> info about one approach. Manning & Schütze's book "Foundations of
> Statistical Natural Language Processing" (http://nlp.stanford.edu/fsnlp/)
> is a standard reference about text processing. They also have a
> new one about information retrieval (downloadable as a pdf) that
> looks very good: <http://informationretrieval.org>.

Thanks a lot. This gives me some bed-time reading.
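
For anyone finding this thread later, the word-count idea from the original question can be sketched roughly as below. This is only a minimal illustration, not one of the algorithms from the references above: the noise-word list is a tiny hypothetical one, the weighting is plain term frequency, and cosine distance is just one common way to compare the resulting count vectors. The names NOISE_WORDS, word_counts and cosine_distance are made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny noise-word list; a real one would be longer.
NOISE_WORDS = {"a", "an", "the", "of", "and", "is", "to", "in", "on"}


def word_counts(text):
    """Lower-case the text, split it into words, and drop noise words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in NOISE_WORDS)


def cosine_distance(text_a, text_b):
    """Return 1 - cosine similarity of the two word-count vectors."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Only words present in both chunks contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty chunk counts as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance("the cat sat on the mat", "a cat sat on a rug"))
    print(cosine_distance("the cat sat on the mat", "stock prices fell"))
```

A result near 0 means the chunks use much the same vocabulary, and 1 means they share no non-noise words. Different word weightings (the other idea in the question) would slot in where the raw counts are used.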