Hi,
I am trying to work through the Hit collection process for a
PhraseQuery (using an exact phrase). For an example search, say I'm
looking for:
"lucene action" (quotes indicating exact phrase)
in a one doc, one field index consisting of:
wow, lucene rocks, lucene action items are cool, very action packed
So my questions are:
- What does (Default)Similarity.idf() do?
Just create a base frequency for a term in the query?
i.e. it creates something of a raw score, based on term (not phrase)
frequency in a doc
- Does (Phrase)Weight do the work of finding a phrase match?
I don't think so, but get confused in the weight/scorer functionality
-Does ExactPhraseScorer do the work of finding a phrase match?
I think so, but seem to keep missing where
i.e. where does the first instance of "lucene" (lucene rocks) get
skipped but the second one (lucene action) become a hit?
- What does SegmentTermPositions do?
I think this is critical to the PhrasePositions process, which
PhraseScorer uses
I think it hold pointers to the positions in the index file streams
(ability to read the indexes?)
- Where does the code identify a 'proper' hit for something like an
exact phrase?
Apologies in advance for the poor sample text above, and the
repetition in question matter. Hopefully I am getting closer to getting
my head wrapped around the query/hit process (and then work on extending
the hits to include offset position).
Thanks
Sean
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]