On 15/09/13 12:21, Uwe Schindler wrote:
> Using multiple fields is the preferred approach! Internally in the
> index this does the same as a single field with some gaps in the
> positions.

Right, thanks.

> All Tokenizers in Lucene *set* the position increment accordingly,
> but filters are not required to read it (unless they change it
> somehow).
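Uwe's point, that several instances of the same field index just like one field with gaps in the positions, can be sketched in plain Java. This is a toy model, not Lucene itself; the gap of 100 mirrors a common default for the analyzer's position increment gap, but is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how token positions are assigned when the same field is
// added to a document several times: an analyzer-defined gap is inserted
// between the instances, so phrase and slop queries do not match across
// the boundary.
public class PositionGapSketch {
    static final int POSITION_INCREMENT_GAP = 100; // assumed gap value

    // Returns the absolute position of every token, given one token list
    // per field instance (e.g. per email in the mbox).
    static List<Integer> positions(List<List<String>> fieldInstances) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;          // each token advances by its increment (normally 1)
        boolean first = true;
        for (List<String> tokens : fieldInstances) {
            if (!first) pos += POSITION_INCREMENT_GAP; // gap between instances
            first = false;
            for (String ignored : tokens) {
                pos += 1;      // default position increment of 1 per token
                out.add(pos);
            }
        }
        return out;
    }
}
```

With two instances of two tokens each, the second instance starts at position 102 rather than 2, which is exactly the "single field with some gaps" picture.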
On 15/09/13 11:41, Michael McCandless wrote:
> Your understanding is correct: there are two ways to affect the
> indexed position.

Thanks for the confirmation, took me a while to figure that out :-)

> Either approach would work, but if you do the single-field approach,
> the challenge is in making a TokenFilter that knows when one chunk
> ended so it could set the position increment.
>
> I think it'd be easier to just ad
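The single-field approach Mike describes, a TokenFilter that notices a chunk boundary and inflates the position increment of the next token, can be sketched like this. This is a plain-Java model of the idea, not the actual Lucene TokenFilter API; the `<CHUNK>` sentinel token and the gap of 100 are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a boundary-aware filter: it consumes a token stream in which
// an upstream tokenizer has inserted a sentinel at each chunk boundary,
// drops the sentinel, and adds a large position increment to the token
// that follows it.
public class ChunkGapFilterSketch {
    static final String BOUNDARY = "<CHUNK>"; // assumed sentinel token
    static final int GAP = 100;               // assumed gap size

    // A produced token: its term and its position increment.
    record Token(String term, int posInc) {}

    static List<Token> filter(List<String> terms) {
        List<Token> out = new ArrayList<>();
        int pendingGap = 0;
        for (String t : terms) {
            if (BOUNDARY.equals(t)) {
                pendingGap += GAP; // remember the gap, emit nothing
                continue;
            }
            out.add(new Token(t, 1 + pendingGap)); // default increment is 1
            pendingGap = 0;
        }
        return out;
    }
}
```

In a real Lucene TokenFilter the same logic would set the PositionIncrementAttribute in incrementToken(); the hard part, as Mike says, is knowing where one chunk ends, which is why the sentinel has to come from the tokenizer.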
Firstly, some context. I'm indexing a large set of mbox files which
contain multiple email messages, each mbox file being related to a
single issue. I'm therefore indexing each mbox as a single document,
treating each individual mail as a section of the same document.
To control matching acros