Hi,

On Aug 16, 2007, at 2:20 PM, Lokeya wrote:


Hi All,

I have the following setup: a) Indexed a set of docs. b) Ran the 1st query and got the top docs. c) Fetched the ids from those and stored them in a data structure.
d) Ran the 2nd query, got the top docs, fetched the ids and stored them in a data
structure.

Now I have 2 sets of doc ids, (set 1) and (set 2).

I want to find out the document content similarity between these 2 sets (just
using the doc id information which I have).


Not sure what you mean here. What do the doc ids have to do with the content?

Qn 1: Is it possible using any Lucene APIs? If so, can you point me
to the appropriate APIs? I did a search at:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/index.html
but couldn't find anything.


It is possible if you use term vectors (see IndexReader.getTermFreqVector). You will need to store the term vectors at index time (when you construct your Field), load them for each document, and then calculate the similarity. A common way of doing this is to calculate the cosine of the angle between the two vectors.
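
As a rough sketch of the cosine calculation only (not Lucene code): the class and method names below are made up for illustration, and the example assumes you have already copied each document's terms and frequencies out of its term vector into parallel arrays, for instance from TermFreqVector.getTerms() and getTermFrequencies(). It uses raw term frequencies with no IDF weighting, which is an assumption on my part and may not be what you want.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: cosine similarity between two
// term-frequency vectors represented as term -> frequency maps.
final class TermVectorCosine {

    // Builds a term -> frequency map from parallel arrays. The arrays are
    // assumed to come from a document's stored term vector.
    static Map<String, Integer> toMap(String[] terms, int[] freqs) {
        Map<String, Integer> vector = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            // Sum frequencies if the same term appears more than once.
            vector.merge(terms[i], freqs[i], Integer::sum);
        }
        return vector;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                // Only terms present in both vectors contribute to the dot product.
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of the frequency vector.
    private static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int freq : vector.values()) {
            sumOfSquares += (double) freq * freq;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

To compare two sets of documents rather than two single documents, one option (again an assumption, not something Lucene does for you) is to pass all the term and frequency arrays for a set through toMap-style merging so each set becomes one summed vector, then take the cosine of those two.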

-Grant

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ


