Re: Position increment clarification?

2013-09-15 Thread Alan Burlison
On 15/09/13 12:21, Uwe Schindler wrote: Using multiple fields is the preferred approach! Internally, in the index, this does the same as a single field with some gaps in the positions. Right, thanks. All Tokenizers in Lucene *set* the position increment accordingly, but filters are no

RE: Position increment clarification?

2013-09-15 Thread Uwe Schindler
Hi, Using multiple fields is the preferred approach! Internally, in the index, this does the same as a single field with some gaps in the positions. All Tokenizers in Lucene *set* the position increment accordingly, but filters are not required to read it (unless they change it somehow).
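
To illustrate the equivalence described above, here is a minimal sketch in plain Python. It is a simplified model of the position arithmetic, not Lucene code. The function names, the `gap` parameter, and the sample tokens are hypothetical and chosen only for illustration; the model also assumes every chunk contains at least one token.

```python
# Simplified model of token positions (an illustration, not Lucene's implementation).
# Assumption: each token normally advances the position by 1, and the first
# token of a field starts at position 0.

def positions_single_field(chunks, gap):
    """One field for everything: the first token of each later chunk
    carries a larger position increment (1 + gap)."""
    positions = []
    pos = -1
    for chunk_index, chunk in enumerate(chunks):
        for token_index, token in enumerate(chunk):
            increment = 1
            if chunk_index > 0 and token_index == 0:
                increment = 1 + gap  # jump over the gap at a chunk boundary
            pos += increment
            positions.append((token, pos))
    return positions


def positions_multiple_fields(values, gap):
    """Several instances of the same field: the gap is added between
    instances, and every token keeps an increment of 1."""
    positions = []
    pos = -1
    for value_index, value in enumerate(values):
        if value_index > 0:
            pos += gap  # gap between consecutive field instances
        for token in value:
            pos += 1
            positions.append((token, pos))
    return positions


if __name__ == "__main__":
    mails = [["hello", "world"], ["foo", "bar"]]  # hypothetical sample tokens
    print(positions_single_field(mails, gap=100))
    print(positions_multiple_fields(mails, gap=100))
```

Under these assumptions both functions assign the same positions, so a phrase or span match cannot bridge two chunks unless its slop exceeds the gap.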

Re: Position increment clarification?

2013-09-15 Thread Alan Burlison
On 15/09/13 11:41, Michael McCandless wrote: Your understanding is correct: there are two ways to affect the indexed position. Thanks for the confirmation, took me a while to figure that out :-) Either approach would work, but if you do the single-field approach, the challenge is in making a

Re: Position increment clarification?

2013-09-15 Thread Michael McCandless
Your understanding is correct: there are two ways to affect the indexed position. Either approach would work, but if you do the single-field approach, the challenge is in making a TokenFilter that knows when one chunk ended so it could set the position increment. I think it'd be easier to just ad

Position increment clarification?

2013-09-15 Thread Alan Burlison
Firstly, some context. I'm indexing a large set of mbox files which contain multiple email messages, each mbox file being related to a single issue. I'm therefore indexing each mbox as a single document, treating each individual mail as a section of the same document. To control matching acros