Hello,
I am trying to work through term positions and how to get them from
a collection of hits. Does indexing a field with
TermVector.WITH_POSITIONS_OFFSETS save the start/end character offsets
of each term in the source text? (I _think_ it does).
If so, where would I start in making that information
accessible in a "result set"? I believe it would mean extending a query, a
scorer, a hit, and/or a weight object. I want to process ALL
hits, so I think I will need to implement a HitCollector.
As an example of what I want, if I were looking for the offset
position of "brown" in a properly indexed field containing "the lazy
brown fox", I would like to get:
start==10
end==15 (assuming my counting is right)
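To make the counting concrete, here is a minimal plain-Java sketch of the two kinds of numbers involved. It does not use Lucene at all: OffsetSketch, Token and tokenize are hypothetical names made up for illustration, and splitting on whitespace is only an assumption standing in for whatever analyzer the field is really indexed with. A "position" here is the index of the token within the field, while start/end are character offsets into the source string, counted from zero with the end one past the last character.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a toy whitespace tokenizer, not Lucene's analysis chain.
class OffsetSketch {

    /** One token with its position (token index) and character offsets. */
    static final class Token {
        final String text;
        final int position; // 0-based index of the token within the field
        final int start;    // 0-based offset of the token's first character
        final int end;      // offset one past the token's last character

        Token(String text, int position, int start, int end) {
            this.text = text;
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits on whitespace and records position and offsets for each token. */
    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<Token>();
        int i = 0;
        int n = source.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume the token itself.
            while (i < n && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            tokens.add(new Token(source.substring(start, i), tokens.size(), start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("the lazy brown fox")) {
            System.out.println(t.text + " position=" + t.position
                    + " start=" + t.start + " end=" + t.end);
        }
    }
}
```

Counted this way, "brown" is at position 2 with start 9 and end 14; counting characters from one instead gives 10 and 15. Either way, it is the start/end pair I am after, not the position.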
Based on Paul Elschot's previous response to a similar question I
had (which I am still working on), I _think_ I need to extend something
like the ExactPhraseScorer. While debugging with my IDE (Eclipse) I can
see that the weight object in the scorer contains a reference to the
query. The query contains the fields:
Vector positions (just has ints of term positions in phrase?)
Vector terms (vector of Term, just field name and field contents?)
The weight also seems to have an array of TermPositions, which have
SegmentTermPositions. I thought this was what I wanted, but I don't see
the proper start/end fields, or anything which seems to be on the right
track.
Can anyone point me in the right direction?
Thanks,
Sean