Hi,
I am trying to work through the Hit collection process for a PhraseQuery (using an exact phrase). For an example search, say I'm looking for:
"lucene action"  (quotes indicating exact phrase)

in a one doc, one field index consisting of:
wow, lucene rocks, lucene action items are cool, very action packed


So my questions are:
- What does (Default)Similarity.idf() do?
   Just create a base frequency for a term in the query?
i.e. it creates something of a raw score, based on term (not phrase) frequency in a doc

- Does (Phrase)Weight do the work of finding a phrase match?
   I don't think so, but get confused in the weight/scorer functionality

-Does ExactPhraseScorer do the work of finding a phrase match?
   I think so, but seem to keep missing where
i.e. where does the first instance of "lucene" (lucene rocks) get skipped but the second one (lucene action) become a hit?

- What does SegmentTermPositions do?
I think this is critical to the PhrasePositions process, which PhraseScorer uses I think it hold pointers to the positions in the index file streams (ability to read the indexes?)

- Where does the code identify a 'proper' hit for something like an exact phrase?

Apologies in advance for the poor sample text above, and the repetition in question matter. Hopefully I am getting closer to getting my head wrapped around the query/hit process (and then work on extending the hits to include offset position).
Thanks

Sean



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to