[ 
https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917663#action_12917663
 ] 

Robert Muir commented on LUCENE-2529:
-------------------------------------

bq. Rob, I don't completely follow your first paragraph

What I was trying to say is that there's no way for positions to be properly 
accumulated across multi-valued fields.
For example (I will use the pipe as a field separator and assume English 
stopwords):
{noformat}
brown fox | went to | market
{noformat}

In this case the index will "lose" the 2 position increments caused by "went" 
and "to", and they won't be reflected in the "market" position.
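
To make the effect concrete, here is a minimal sketch of the position arithmetic. It is a toy model, not Lucene's indexing code: the class name, the stopword list, and the whitespace tokenization are all assumptions for illustration. Each value is analyzed on its own, and increments owed by removed stopwords are dropped when a value ends without a following token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical class); this is NOT Lucene's indexing chain.
class MultiValuedPositionSketch {
    // Assumed stopword list, just enough for the example above.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("went", "to"));

    /** Assigns a position to each surviving term, analyzing each value independently. */
    static Map<String, Integer> positions(List<String> values, int gap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1; // position of the last emitted term
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                position += gap; // position increment gap between values
            }
            int pending = 0; // increments owed by removed stopwords
            for (String term : values.get(i).trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                if (STOPWORDS.contains(term)) {
                    pending++;
                    continue;
                }
                position += pending + 1;
                pending = 0;
                result.put(term, position);
            }
            // Increments still pending here are dropped: the value ended
            // and no later token in this value can carry them.
        }
        return result;
    }

    public static void main(String[] args) {
        // Three separate values: "market" lands at position 2.
        System.out.println(positions(Arrays.asList("brown fox", "went to", "market"), 0));
        // The same text as one value: "market" lands at position 4.
        System.out.println(positions(Arrays.asList("brown fox went to market"), 0));
    }
}
```

In this model the two increments from "went" and "to" only survive when a later token in the same value can carry them.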

My suggestion is that if you have values like this with position dependencies, 
they are really one single value, not independent values, and don't belong in a 
multi-valued field.

In this case, you can simply index the entire content as one field and handle 
the separator however you want in your tokenstream. The "market" token will 
then properly reflect whatever you previously did with the tokens, whether via 
that separator, stopwords, or other things.
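
As a rough sketch of that single-value approach, the toy model below treats the pipe as a separator token inside one value and lets it contribute a configurable increment. The class name, the `separatorGap` parameter, and the whitespace tokenization are hypothetical; a real implementation would be a custom TokenStream or TokenFilter.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only (hypothetical class); this is NOT a Lucene TokenStream.
class SingleValueSeparatorSketch {
    // Assumed stopword list and separator for the example.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("went", "to"));
    static final String SEPARATOR = "|";

    /** Assigns positions within one value; the separator adds separatorGap positions. */
    static Map<String, Integer> positions(String value, int separatorGap) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int position = -1; // position of the last emitted term
        int pending = 0;   // increments owed by stopwords and separators
        for (String term : value.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (term.equals(SEPARATOR)) {
                pending += separatorGap; // the separator is handled however we choose
                continue;
            }
            if (STOPWORDS.contains(term)) {
                pending++;
                continue;
            }
            // The next surviving term carries everything accumulated so far.
            position += pending + 1;
            pending = 0;
            result.put(term, position);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(positions("brown fox | went to | market", 0));
        System.out.println(positions("brown fox | went to | market", 10));
    }
}
```

Because everything is one value, the stopword increments and both separator gaps all reach "market": position 4 with a gap of 0, and position 24 with a gap of 10.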

bq. For my problem space, I'm willing to sacrifice the ability to do phrase 
queries.

Right, but my concern is that other users are not. 
I don't think we should discard the first token's position increment value 
completely. Will the QueryParser do this too?

bq. My patch here (and the patch already applied by Koji recently) for this 
issue isn't really code specific to the problem I'm solving, but it is 
necessary for my approach

The previous patch (the one described on the issue) I definitely agreed with. 
But what you speak of here (discarding the first token's position) is 
different, and I'm not convinced it's necessary for your approach (you could 
use a single-valued field).

bq. All existing tests pass. On the basis of that alone, I'm hopeful that you, 
Michael, and other committers are amenable to applying this patch.

Well, unfortunately (not your fault at all!) that isn't very comforting to me. 
For example, the queryparser has very minimal tests wrt this sort of stuff, yet 
as I mentioned above it's important to think about how it consumes tokenstreams, 
because if it's inconsistent with the indexer then queries start returning fewer 
results.


> always apply position increment gap between values
> --------------------------------------------------
>
>                 Key: LUCENE-2529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2529
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9.3, 3.0.2, 3.1, 4.0
>         Environment: (I don't know which version to say this affects since 
> it's some quasi trunk release and the new versioning scheme confuses me.)
>            Reporter: David Smiley
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: 
> LUCENE-2529_always_apply_position_increment_gap_between_values.patch, 
> LUCENE-2529_skip_posIncr_for_1st_token.patch, 
> LUCENE-2529_skip_posIncr_for_1st_token.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'm doing some fancy stuff with span queries that is very sensitive to term 
> positions.  I discovered that the position increment gap on indexing is only 
> applied between values when there are existing terms indexed for the 
> document.  I suspect this logic wasn't deliberate, it's just how it's always 
> been for no particular reason.  I think it should always apply the gap 
> between fields.  Reference DocInverterPerField.java line 82:
>   if (fieldState.length > 0)
>     fieldState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name);
> This is checking fieldState.length.  I think the condition should simply be:  
> if (i > 0).
> I don't think this change will affect anyone at all but it will certainly 
> help me.  Presently, I can either change this line in Lucene, or I can put in 
> a hack so that the first value for the document is some dummy value which is 
> wasteful.
