Hi Sean,

On Thursday 08 September 2005 05:16, Sean O'Connor wrote:
> Hi,
> I am trying to work through the Hit collection process for a
> PhraseQuery (using an exact phrase). For an example search, say I'm
> looking for:
> "lucene action" (quotes indicating exact phrase)
>
> in a one doc, one field index consisting of:
> wow, lucene rocks, lucene action items are cool, very action packed
>
>
> So my questions are:
> - What does (Default)Similarity.idf() do?
> Just create a base frequency for a term in the query?
> i.e. it creates something of a raw score, based on term (not phrase)
> frequency in a doc

It defines a basic weight for a term in a query. It uses the number of
documents containing the term and the total number of documents in the
index. This value does not vary over different documents in the same
query.

> - Does (Phrase)Weight do the work of finding a phrase match?
> I don't think so, but get confused in the weight/scorer functionality

No. A Weight determines the weights to be used for scoring a query in
one particular search. It collects various weights from parts of the
query and then normalizes these. For nested queries this can involve
some recursion. The javadoc of Weight in the svn trunk is more detailed
than in 1.4.3.

> -Does ExactPhraseScorer do the work of finding a phrase match?

Yes, but it does a bit more: it also counts the number of phrase
matches in the document. The score of a phrase in a document only
depends on this count.

> I think so, but seem to keep missing where
> i.e. where does the first instance of "lucene" (lucene rocks) get
> skipped but the second one (lucene action) become a hit?

In ExactPhraseScorer.phraseFreq() the freq variable is not incremented
for the first instance, but it is incremented for the second one.
Non-exact phrases can have more hits in a document, so they need
another implementation of phraseFreq().

> - What does SegmentTermPositions do?
> I think this is critical to the PhrasePositions process, which
> PhraseScorer uses
> I think it hold pointers to the positions in the index file streams
> (ability to read the indexes?)

Correct. This is one of the things that make Lucene fast.
It is package private because it is not normally used directly in a
Lucene application.
Optimizing an index merges all the segments into one.

Regards,
Paul Elschot
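
P.S. The counting described above can be illustrated with a minimal, standalone sketch. It is not Lucene code: the class name PhraseFreqSketch, the helper positions() and the simple lowercase/split tokenizer are made up for illustration, and the real ExactPhraseScorer walks sorted position streams instead of calling contains(). The idf() body follows the shape of DefaultSimilarity.idf() as I recall it from the 1.4 sources, so treat the exact formula as an assumption and check it against your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a toy model of exact phrase counting, not Lucene's classes.
class PhraseFreqSketch {

    // Assumed shape of DefaultSimilarity.idf(): depends only on how many
    // documents contain the term and on the index size, not on any one document.
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Hypothetical stand-in for the positions stored in the index:
    // maps each term to the token positions at which it occurs in one field.
    static Map<String, List<Integer>> positions(String text) {
        Map<String, List<Integer>> result = new HashMap<>();
        int pos = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            result.computeIfAbsent(token, k -> new ArrayList<>()).add(pos++);
        }
        return result;
    }

    // Counts exact phrase occurrences: a start position of the first term
    // counts only if term i of the phrase occurs at start + i.
    static int phraseFreq(Map<String, List<Integer>> index, String... phrase) {
        if (phrase.length == 0) {
            return 0;
        }
        List<Integer> first = index.get(phrase[0]);
        if (first == null) {
            return 0;
        }
        int freq = 0;
        for (int start : first) {
            boolean match = true;
            for (int i = 1; i < phrase.length && match; i++) {
                List<Integer> p = index.get(phrase[i]);
                match = p != null && p.contains(start + i);
            }
            if (match) {
                freq++; // "lucene rocks" never gets here, "lucene action" does
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = positions(
            "wow, lucene rocks, lucene action items are cool, very action packed");
        // "lucene" is at positions 1 and 3, "action" at 4 and 9,
        // so only the occurrence starting at 3 is a phrase match.
        System.out.println(phraseFreq(index, "lucene", "action")); // prints 1
    }
}
```

With this toy tokenizer the first "lucene" is followed by "rocks", so the inner loop fails and freq stays unchanged; the second "lucene" is followed by "action" and freq becomes 1.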