On 20 Aug 2007, at 05:19, Lokeya wrote:
Grant Ingersoll-6 wrote:
On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
I want to find the similarity between the contents of two documents.
A common way to do this is to calculate the cosine of the angle
between the two documents' term vectors.
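A minimal sketch of that cosine computation in plain Java, assuming each document's term vector has already been read into parallel arrays of terms and counts (the shape Lucene 2.x's TermFreqVector exposes through getTerms() and getTermFrequencies()). The class and method names are hypothetical and are not part of Lucene:

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

    // Hypothetical helper: builds a term -> count map from parallel arrays,
    // assumed to mirror TermFreqVector.getTerms() / getTermFrequencies().
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            m.merge(terms[i], freqs[i], Integer::sum);
        }
        return m;
    }

    // Cosine of the angle between two sparse count vectors:
    // dot(a, b) / (|a| * |b|). Returns 0 if either vector has zero length.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean length of a count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = toMap(new String[] {"lucene", "search", "index"}, new int[] {2, 1, 1});
        Map<String, Integer> d2 = toMap(new String[] {"lucene", "index", "java"}, new int[] {1, 1, 1});
        System.out.println(cosine(d1, d2));
    }
}
```

For the two toy vectors in main, the shared terms give a dot product of 3 and lengths of sqrt(6) and sqrt(3), so the score is 1/sqrt(2), roughly 0.707. A score of 1 means identical term proportions; 0 means no terms in common.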
I can use getTermFreqVector() on IndexReader to get the vectors, but I am
wondering which API has to be used to compute the similarity between
two such vectors and return a score (document-document similarity, in
essence).
Bob Carpenter wrote an article on the subject for "Lucene in Action".
He also works on LingPipe, a semi-free piece of software that might be
helpful if your Greek kung fu is too weak.
<http://www.alias-i.com/lingpipe/docs/api/com/aliasi/spell/TfIdfDistance.html>
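As a rough illustration of the idea behind a TF-IDF distance, here is a sketch that weights each raw term count by log(N / df) before taking the same cosine. This is a textbook TF-IDF weighting chosen for illustration; it is not LingPipe's API and may not match the exact formula TfIdfDistance uses. The class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TfIdfCosine {

    // Document frequency: how many documents in the corpus contain each term.
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Weight = raw term count * log(N / df). Terms missing from df are skipped.
    static Map<String, Double> tfIdf(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            Integer docFreq = df.get(e.getKey());
            if (docFreq != null) {
                weights.put(e.getKey(), e.getValue() * Math.log((double) numDocs / docFreq));
            }
        }
        return weights;
    }

    // Cosine of the angle between two sparse weight vectors; 0 if either has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("lucene", "index", "search"),
                Arrays.asList("lucene", "index", "java"),
                Arrays.asList("greek", "kung", "fu"));
        Map<String, Integer> df = documentFrequencies(corpus);
        Map<String, Double> w0 = tfIdf(corpus.get(0), df, corpus.size());
        Map<String, Double> w1 = tfIdf(corpus.get(1), df, corpus.size());
        System.out.println(cosine(w0, w1));
    }
}
```

On the toy corpus in main, the first two documents score roughly 0.21, well below the 2/3 their raw counts would give, because the shared terms "lucene" and "index" occur in two of the three documents and are down-weighted relative to the rarer terms.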
--
karl