Thanks for the reply.  It looks like I can use parts of Similarity.

I'll post back once I get it working or at least closer ;-)

Andrew

-----Original Message-----
From: Grant Ingersoll <[EMAIL PROTECTED]>
Sent: Jun 3, 2005 6:51 AM
To: java-user@lucene.apache.org
Subject: RE: calculate wi = tfi * IDFi for each document.

I think the TermFreqVector (reader.getTermFreqVector) has the info you want
per document.  You will need to sort it by frequency to get the top
terms in each document.  It doesn't give you the wi, just tfi, but the
whole score is implied by the fact that you have the top 10 documents, I
think.
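
A minimal sketch of the sort-by-frequency step, in case it helps.  It
assumes you have already pulled the parallel term and frequency arrays
out of the vector (in the Lucene API of this era that should be
TermFreqVector.getTerms() and getTermFrequencies(), and the field has to
be indexed with term vectors stored).  The names TopTerms and topTerms
are made up for illustration, and nothing below touches Lucene itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopTerms {

    /**
     * Returns up to n terms ordered by descending frequency.
     * terms[i] and freqs[i] are assumed to describe the same term.
     * Ties keep their original (input) order because List.sort is stable.
     */
    public static List<String> topTerms(String[] terms, int[] freqs, int n) {
        // Sort indices rather than the terms, so the two arrays stay aligned.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingInt((Integer i) -> freqs[i]).reversed());

        List<String> top = new ArrayList<>();
        for (int k = 0; k < Math.min(n, order.size()); k++) {
            top.add(terms[order.get(k)]);
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical data standing in for one document's term vector.
        String[] terms = {"apache", "index", "lucene", "search"};
        int[] freqs = {2, 5, 9, 5};
        System.out.println(topTerms(terms, freqs, 3));
    }
}
```

You would call this once per hit, passing the arrays for that document
and n = 5.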

-Grant

>>> [EMAIL PROTECTED] 6/2/2005 3:21:35 PM >>>
Ok.  So if I get 10 Documents back from a search and I want to get the
top 5 weighted terms for each of the 10 documents what API call should I
use?  I'm unable to find the connection between Similarity and a
Document.

I know I'm missing the elephant that must be in the middle of the room.
Or maybe it's not there.  Is what I'm trying to do doable?

Thanks,

Andrew

-----Original Message-----
From: Max Pfingsthorn <[EMAIL PROTECTED]>
Sent: Jun 2, 2005 5:33 AM
To: java-user@lucene.apache.org 
Subject: RE: calculate wi = tfi * IDFi for each document.

Hi,

DefaultSimilarity uses exactly this weighting scheme. Makes sense since
it's a pretty standard relevance measure...

Bye!
max

-----Original Message-----
From: Andrew Boyd [mailto:[EMAIL PROTECTED] 
Sent: Thursday, June 02, 2005 11:39
To: java-user@lucene.apache.org 
Subject: calculate wi = tfi * IDFi for each document.


If I have search results, how can I calculate wi = tfi * IDFi for each
document using Lucene's API?

wi   = term weight
tfi  = term frequency in a document
IDFi = inverse document frequency = log(D/dfi)
dfi  = document frequency, or number of documents containing term i
D    = number of documents in my search result
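
To make the definitions above concrete, here is a small sketch of the
arithmetic only.  It assumes the counts are already in hand as plain
ints, uses the natural log, and the names TermWeights, weight and
weights are hypothetical.  Where the counts come from is left open:
IndexReader.docFreq(Term) gives a document frequency, but across the
whole index rather than just the search result, so D and dfi as defined
here would need to be counted over the hits.  Also, as far as I know,
DefaultSimilarity's tf() and idf() use a square root and a smoothed log,
so its numbers will not match this textbook form exactly.

```java
public class TermWeights {

    /**
     * wi = tfi * log(D / dfi), natural log.
     * Returns 0 for non-positive inputs instead of dividing by zero.
     */
    public static double weight(int tf, int df, int numDocs) {
        if (tf <= 0 || df <= 0 || numDocs <= 0) {
            return 0.0;
        }
        return tf * Math.log((double) numDocs / df);
    }

    /** Applies weight() to parallel arrays of tfi and dfi for one document. */
    public static double[] weights(int[] tfs, int[] dfs, int numDocs) {
        double[] w = new double[tfs.length];
        for (int i = 0; i < tfs.length; i++) {
            w[i] = weight(tfs[i], dfs[i], numDocs);
        }
        return w;
    }

    public static void main(String[] args) {
        // Hypothetical counts for three terms in one of D = 10 result documents.
        String[] terms = {"lucene", "search", "the"};
        int[] tfs = {4, 2, 7};
        int[] dfs = {2, 5, 10};
        double[] w = weights(tfs, dfs, 10);
        for (int i = 0; i < terms.length; i++) {
            System.out.printf("%s\t%.3f%n", terms[i], w[i]);
        }
    }
}
```

A term that appears in every result document (dfi = D) gets weight 0
however often it occurs, since log(1) = 0.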

Thanks,

Andrew

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Andrew Boyd
Software Architect
Sun Certified J2EE Architect
B&B Technical Services Inc.
205.422.2557
