On 15/09/13 12:21, Uwe Schindler wrote:

Using multiple fields is the preferred approach! Internally, in the
index, this does the same as a single field with some gaps in the
positions.
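
As a rough illustration of that equivalence, here is a simplified model in plain Java. It is not Lucene code: the class and method names are made up for this sketch, and it assumes only that each token's position is the running sum of the position increments, with the first token landing at position 0. Two values "red car" and "blue bike" added as separate instances with a gap of 10 should end up at the same positions as a single token stream whose third token carries an increment of 11.

```java
import java.util.Arrays;

// Hypothetical model, not part of Lucene: turns per-token position
// increments into absolute positions the way an index writer would.
final class IncrementModel {

    static int[] positionsFromIncrements(int[] increments) {
        int[] positions = new int[increments.length];
        int position = -1; // so that a first increment of 1 yields position 0
        for (int i = 0; i < increments.length; i++) {
            position += increments[i];
            positions[i] = position;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Tokens: red, car, blue, bike. The increment of 11 on "blue"
        // stands in for "normal increment of 1 plus a gap of 10".
        int[] increments = {1, 1, 11, 1};
        System.out.println(Arrays.toString(positionsFromIncrements(increments)));
        // prints [0, 1, 12, 13]
    }
}
```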

Right, thanks.

All Tokenizers in Lucene *set* the position increment
accordingly, but filters are not required to read it (unless they
change it somehow). The attribute is solely for the IndexWriter when
creating the index. To insert manual gaps without multiple fields you
have to write your own TokenFilter or use the deprecated PositionFilter.
But this is in general more work, and much more complicated and
harder to understand, than adding the same field multiple times.

That confirms what I'd thought based on a wander through the source. I'd read Lucene in Action and just got myself confused about what the best approach was.

The position increment gap is only respected by IndexWriter when
indexing; TokenStreams don't see it (because every field instance
gets its own TokenStream).

Yes, that makes sense.

The default position increment gap of all Analyzers has a sensible
value to prevent PhraseQueries from matching across two field instances.
This is the main reason why the gap is there: to prevent
position-sensitive queries from matching across fields.

Are you sure? I see this in Analyzer.java:

* Invoked before indexing a IndexableField instance if
* terms have already been added to that field.  This allows custom
* analyzers to place an automatic position increment gap between
* IndexbleField instances using the same field name.  The default value
* position increment gap is 0.  With a 0 position increment gap and
* the typical default token position increment of 1, all terms in a field,
* including across IndexableField instances, are in successive positions, allowing
* exact PhraseQuery matches, for instance, across IndexableField instance boundaries.

and I can't find where any of the other analyzers override the getPositionIncrementGap method.
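
To make the behaviour described in that Javadoc concrete, the following is a small self-contained sketch in plain Java. It is a simplified model rather than Lucene's implementation: `GapModel`, `Posting`, `index` and `exactPhraseMatches` are hypothetical names, and the model assumes only what the comment states, namely that the gap is added before an instance when terms have already been added to the field, and that each token then advances the position by 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of position assignment across several instances
// of the same field; not Lucene code.
final class GapModel {

    /** One term occurrence: the term text and its position in the field. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /** Assigns positions to the tokens of each instance, adding the gap between instances. */
    static List<Posting> index(List<String[]> instances, int gap) {
        List<Posting> postings = new ArrayList<>();
        int position = -1;
        for (String[] tokens : instances) {
            if (!postings.isEmpty()) {
                position += gap; // only once terms already exist for the field
            }
            for (String token : tokens) {
                position += 1; // typical default token position increment
                postings.add(new Posting(token, position));
            }
        }
        return postings;
    }

    /** True if `second` occurs at the position directly after `first` (exact two-term phrase). */
    static boolean exactPhraseMatches(List<Posting> postings, String first, String second) {
        for (Posting p : postings) {
            if (!p.term.equals(first)) {
                continue;
            }
            for (Posting q : postings) {
                if (q.term.equals(second) && q.position == p.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> values = new ArrayList<>();
        values.add(new String[] {"red", "car"});
        values.add(new String[] {"blue", "bike"});

        // Gap 0: "car blue" spans the instance boundary and still matches.
        System.out.println(exactPhraseMatches(index(values, 0), "car", "blue"));    // true
        // Gap 100: the boundary is no longer adjacent, so the phrase fails.
        System.out.println(exactPhraseMatches(index(values, 100), "car", "blue"));  // false
        // Phrases inside a single instance are unaffected by the gap.
        System.out.println(exactPhraseMatches(index(values, 100), "blue", "bike")); // true
    }
}
```

Under this model a gap of 0 leaves the last token of one instance and the first token of the next in successive positions, which is exactly the case the Javadoc says allows an exact PhraseQuery to match across the boundary.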

I've been using Luke to examine the generated index, but I haven't been able to find a way to display the position value of each instance of a duplicated field, so I wasn't quite sure whether what I was doing was actually working.

--
Alan Burlison
--

