"exhuma.twn" <[EMAIL PROTECTED]> writes:
> Is it possible to calculate a distance between two chunks of text? I
> suppose one could simply do a simple word-count on the chunks
> (removing common noise words of course). And then go from there. Maybe
> even assigning different weighting to words. But maybe there is a
> well-tested and useful algorithm already available?
There's a huge field of text mining that attempts to do things like this. See http://en.wikipedia.org/wiki/Latent_semantic_analysis for some info about one approach. Manning & Schütze's book "Foundations of Statistical Natural Language Processing" (http://nlp.stanford.edu/fsnlp/) is a standard reference about text processing. They also have a new one about information retrieval (downloadable as a pdf) that looks very good: <http://informationretrieval.org>.
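
For the simple word-count idea in the question, something like the following might serve as a starting point. It is only a minimal sketch, not latent semantic analysis or any method from the books above: it counts words, drops a tiny made-up stop-word list, and returns one minus the cosine similarity of the two count vectors. The names STOPWORDS, word_counts and cosine_distance are made up for illustration, and the stop-word list is an assumption you would want to replace with a real one.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def word_counts(text):
    """Lowercase the text, split it into words, drop stop words, count the rest."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_distance(text_a, text_b):
    """Return 1 - cosine similarity of the two word-count vectors."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Counter returns 0 for missing words, so unshared words contribute nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen here: a chunk with no counted words is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance("the cat sat on the mat", "a cat sat on a rug"))
```

Chunks with the same word proportions come out near 0, and chunks with no words in common come out at 1. To weight words differently, as the question suggests, you could multiply each count by a per-word weight (for example an inverse document frequency computed over your collection) before taking the dot product and norms.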