On 20 Aug 2007, at 05:19, Lokeya wrote:
Grant Ingersoll-6 wrote:
On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
I want to compute the content similarity between documents.

A common way of doing this is by calculating the cosine of the angle
between the two vectors.

I can call getTermFreqVector() on IndexReader to get the vectors. But I am
wondering which API should be used to compute the similarity between two such vectors and return a score (doc-doc similarity, in essence).

Bob Carpenter wrote an article on the subject for "Lucene in Action". He also works on LingPipe, a semi-free piece of software that might be helpful if
your Greek kung fu is too weak.

<http://www.alias-i.com/lingpipe/docs/api/com/aliasi/spell/TfIdfDistance.html>
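If you would rather roll it yourself, the cosine mentioned earlier in the thread is just dot(a, b) / (|a| * |b|) over the two term-frequency vectors. Below is a minimal sketch, not anything shipped with Lucene: the class name CosineSimilarity and the helpers toMap, norm and cosine are made up for illustration. It assumes you have already pulled parallel term and frequency arrays out of each document's term vector, and it uses raw frequencies only (no IDF weighting, which TfIdfDistance would add).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of Lucene or LingPipe.
public class CosineSimilarity {

    /** Builds a term -> frequency map from parallel arrays of equal length. */
    public static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> vector = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            vector.put(terms[i], freqs[i]);
        }
        return vector;
    }

    /** Euclidean length of a term-frequency vector. */
    public static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int freq : vector.values()) {
            sumOfSquares += (double) freq * freq;
        }
        return Math.sqrt(sumOfSquares);
    }

    /**
     * Cosine of the angle between two raw term-frequency vectors.
     * Returns 0.0 when either vector is empty or all zeros.
     */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        // Only terms present in both documents contribute to the dot product.
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for two documents' term vectors.
        Map<String, Integer> doc1 =
            toMap(new String[] {"lucene", "search", "index"}, new int[] {2, 1, 1});
        Map<String, Integer> doc2 =
            toMap(new String[] {"lucene", "index", "vector"}, new int[] {1, 1, 3});
        System.out.println("doc-doc similarity: " + cosine(doc1, doc2));
    }
}
```

With non-negative frequencies the score falls between 0 (no shared terms) and 1 (same term proportions). Raw counts let very common terms dominate the score, which is why a TF-IDF weighted distance is usually the better choice for real documents.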



--
karl




