[ 
https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-2529:
---------------------------------

    Attachment: LUCENE-2529_skip_posIncr_for_1st_token.patch

Always adding the position increment is good but insufficient to solve my 
problem.

A new patch addresses the follow-up situation that I inadvertently reported on 
LUCENE-2668 and should have reported here.  The gist is that 
DocInverterPerField _conditionally_ decrements the position and then always 
increments it, which is problematic when trying to keep positions aligned 
across several multi-valued fields (using an analyzer setting posIncr to 0) 
when the first value generates no tokens (either blank or all stop words).  
Mike McCandless pointed out that the unfortunate existing logic was there to 
prevent the position from becoming -1, which doesn't work with payloads -- 
LUCENE-1542.  

My new patch here has neither a pre-decrement nor a post-increment, and thus I 
find the code easier to follow.  It ignores the provided position increment of 
the first token (typically 1), avoiding the need to shift positions back and 
forth.  There is one oddity included here: I always add 1 to the position 
increment _gap_ (i.e. between values).  With this oddity included, all the 
tests pass (except for the test for this very issue, which I correct in this 
patch)  --yay!  Without this oddity, a handful of tests failed that depended 
on the first token adding one to the position.  My +1 up at the value loop can 
be seen as actually enforcing that the first token's position increment is 1, 
and also as adding a +1 when there is no token for a value (critical for 
aligning multiple fields).  Perhaps this +1 should happen elsewhere in the 
code to be less confusing, but the end result should be the same.
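
To make the accounting above concrete, here is a toy model in Java.  It is 
only one possible reading of the description, not the patch itself and not 
Lucene code: the class name, the method, and the int[][] representation of 
token position increments are all hypothetical.  It assumes the first value 
starts at position 0, every later value advances the position by gap + 1, and 
the first token of each value ignores its own position increment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; nothing here is a Lucene class or API.
class SkipFirstPosIncrSketch {

    /**
     * values[i] holds the position increments of the tokens produced by
     * value i; an empty array means the value produced no tokens.
     * Returns the positions assigned to the tokens of each value.
     */
    static List<List<Integer>> positions(int[][] values, int gap) {
        List<List<Integer>> out = new ArrayList<>();
        int position = 0; // assumption: the first value starts at position 0
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                // The "oddity": gap plus one, applied even when a value is empty.
                position += gap + 1;
            }
            List<Integer> valuePositions = new ArrayList<>();
            for (int t = 0; t < values[i].length; t++) {
                if (t > 0) {
                    // Only tokens after the first use their position increment.
                    position += values[i][t];
                }
                valuePositions.add(position);
            }
            out.add(valuePositions);
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] fieldA = {{}, {1}, {1, 0}}; // first value yields no tokens
        int[][] fieldB = {{1}, {1}, {1}};
        System.out.println(positions(fieldA, 0)); // [[], [1], [2, 2]]
        System.out.println(positions(fieldB, 0)); // [[0], [1], [2]]
    }
}
```

In this model the n-th value of each field lands on the same position even 
though the first value of fieldA is empty, provided later tokens within a 
value use a position increment of 0.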

I expect this is very confusing for many people, especially those who are not 
knee deep in this subject as I am presently.  Mike, hopefully you understand 
what I'm up to here.  The tests pass, remember.

> always apply position increment gap between values
> --------------------------------------------------
>
>                 Key: LUCENE-2529
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2529
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9.3, 3.0.2, 3.1, 4.0
>         Environment: (I don't know which version to say this affects since 
> it's some quasi trunk release and the new versioning scheme confuses me.)
>            Reporter: David Smiley
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: 
> LUCENE-2529_always_apply_position_increment_gap_between_values.patch, 
> LUCENE-2529_skip_posIncr_for_1st_token.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I'm doing some fancy stuff with span queries that is very sensitive to term 
> positions.  I discovered that the position increment gap on indexing is only 
> applied between values when there are existing terms indexed for the 
> document.  I suspect this logic wasn't deliberate; it's just how it's always 
> been for no particular reason.  I think it should always apply the gap 
> between fields.  Reference DocInverterPerField.java line 82:
>   if (fieldState.length > 0)
>     fieldState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name);
> This is checking fieldState.length.  I think the condition should simply be:  
> if (i > 0).
> I don't think this change will affect anyone at all but it will certainly 
> help me.  Presently, I can either change this line in Lucene, or I can put in 
> a hack so that the first value for the document is some dummy value which is 
> wasteful.
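
The difference between the two conditions in the issue description can be 
sketched with a small Java toy model.  This is not Lucene code: the class and 
method names are hypothetical, each value is reduced to a token count, every 
token is assumed to advance the position by exactly 1, and the local variable 
length merely stands in for fieldState.length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; nothing here is a Lucene class or API.
class GapConditionSketch {

    /**
     * tokensPerValue[i] is the number of tokens value i produces.
     * alwaysApplyGap == false models "if (fieldState.length > 0)";
     * alwaysApplyGap == true models the proposed "if (i > 0)".
     * Returns the position of every token in order.
     */
    static List<Integer> positions(int[] tokensPerValue, int gap, boolean alwaysApplyGap) {
        List<Integer> out = new ArrayList<>();
        int position = 0; // position the next token would receive
        int length = 0;   // tokens indexed so far
        for (int i = 0; i < tokensPerValue.length; i++) {
            boolean applyGap = alwaysApplyGap ? i > 0 : length > 0;
            if (applyGap) {
                position += gap;
            }
            for (int t = 0; t < tokensPerValue[i]; t++) {
                out.add(position);
                position++;
                length++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {0, 1, 1}; // the first value yields no tokens
        System.out.println(positions(values, 100, false)); // [0, 101]
        System.out.println(positions(values, 100, true));  // [100, 201]
    }
}
```

With the length-based condition, the gap is skipped for the second value 
because nothing has been indexed yet; with the index-based condition, the gap 
is applied for every value after the first.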


