Hi,
On Aug 16, 2007, at 2:20 PM, Lokeya wrote:
Hi All,
I have the following setup: a) Indexed a set of docs. b) Ran the 1st
query and got the top docs. c) Fetched the ids from those and stored
them in a data structure. d) Ran the 2nd query, got the top docs,
fetched the ids and stored them in a data structure.
Now I have 2 sets of doc ids (set 1) and (set 2).
I want to find out the document content similarity between these 2
sets (just using the doc id information which I have).
Not sure what you mean here. What do the doc ids have to do with the
content?
Qn 1: Is it possible using any Lucene APIs? In that case, can you
point me to the appropriate APIs? I did a search at
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/index.html
but couldn't find anything.
It is possible if you use Term Vectors (see
IndexReader.getTermFreqVector). You will need to store the term
vectors at index time (you ask for this when you construct your
Field), then load them for the documents in question and calculate
the similarity. A common way of doing this is by calculating the
cosine of the angle between the two vectors.
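
To make the cosine step concrete, here is a minimal sketch in plain
Java. It has no Lucene dependency: it assumes you have already pulled
the parallel term and frequency arrays out of each document's term
vector (for example via getTerms() and getTermFrequencies() on the
TermFreqVector that IndexReader.getTermFreqVector returns). The class
name TermVectorCosine and the methods accumulate and cosine are
hypothetical names for illustration, and summing raw term frequencies
over each set of docs is just one possible way to compare two sets,
not the only one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class TermVectorCosine {

    // Add one document's term frequencies into a running term -> count map.
    // Call this once per doc id in a set to build that set's combined vector.
    public static void accumulate(Map<String, Integer> target,
                                  String[] terms, int[] freqs) {
        for (int i = 0; i < terms.length; i++) {
            target.merge(terms[i], freqs[i], Integer::sum);
        }
    }

    // Euclidean length of a sparse term-frequency vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int f : v.values()) {
            sum += (double) f * f;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two sparse term-frequency vectors.
    // Returns 0.0 if either vector is empty (assumed convention).
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        if (na == 0.0 || nb == 0.0) {
            return 0.0;
        }
        return dot / (na * nb);
    }
}
```

Build one map per set by calling accumulate for every doc id in that
set, then pass the two maps to cosine. Raw frequencies favour long
documents and common terms, so you may want to weight the counts
(e.g. by idf) before comparing.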
-Grant
--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ