Chris Hostetter wrote:
You really, *REALLY* don't want to be doing this using the "Hits" class
like in your example ...
   1) this will re-execute your search behind the scenes many many times
   2) the scores returned by "Hits" are pseudo-normalized ... they will be
      meaningless for any sort of comparison.

Thank you very much, Hoss.

if your concern is making sure that the score you get back matches the
score you would get from executing a search even if you change the
Similarity, you could just make sure you use the lengthNorm and tf
functions from the Similarity class just like TermScorer does

That sounds very good. I can get the term frequency and the document frequency from the IndexReader. The number of tokens in a field (numTokens) for the Similarity.lengthNorm function I can get from the term vector (TermFreqVector), or I can use IndexReader.norms(String field) instead.

The usage of TermQuery in my previous example was a simplification. The documents in my collection have several fields, such as title, abstract, and keywords. The term weights in my document-term matrix should cover all fields of a document for a given word (token), so in reality I used a BooleanQuery that combines the possible TermQueries for a word. Of course, I can also sum the per-field weights of a term.
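
A rough sketch of the per-field weighting and the summing over fields described above. This is not the exact TermScorer computation: query normalization and boosts are left out, and the tf, idf and lengthNorm formulas are the ones I understand DefaultSimilarity to use, rewritten in plain Java so the snippet runs without Lucene. The class name TermWeightSketch, the method names and the {freq, docFreq, numTokens} array layout are made up for illustration; in a real index these numbers would come from the IndexReader and the term vector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class TermWeightSketch {

    // Assumed to mirror DefaultSimilarity.tf: square root of the in-field frequency.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed to mirror DefaultSimilarity.idf: ln(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    // Assumed to mirror DefaultSimilarity.lengthNorm: 1 / sqrt(numTokens).
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    // Weight of one term in one field of one document.
    static double fieldWeight(int freq, int docFreq, int numDocs, int numTokens) {
        if (freq == 0 || numTokens == 0) {
            return 0.0; // term absent from the field, or the field is empty
        }
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(numTokens);
    }

    // Sums the field weights of a term over all fields of a document.
    // Each value is an illustrative triple {freq, docFreq, numTokens}.
    static double documentWeight(Map<String, int[]> statsByField, int numDocs) {
        double sum = 0.0;
        for (int[] s : statsByField.values()) {
            sum += fieldWeight(s[0], s[1], numDocs, s[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, int[]> stats = new LinkedHashMap<>();
        stats.put("title", new int[] {4, 9, 16});   // made-up numbers
        stats.put("abstract", new int[] {1, 9, 4}); // made-up numbers
        System.out.println(documentWeight(stats, 10));
    }
}
```

Computing lengthNorm from the real token count keeps full precision. The values from IndexReader.norms(String field) are stored in a single byte per document, so they are only an approximation of the same quantity.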

... or you
could keep executing a TermQuery for each term like you are now, but using
a HitCollector so you get the raw score)

take a look at the Searcher.search methods that take in a HitCollector.

That seems to be the easiest way for my BooleanQuery. I will start with this and change my current implementation.

Have a nice weekend.

Sören
