[ https://issues.apache.org/jira/browse/LUCENE-8730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16806642#comment-16806642 ]
Jim Ferenczi commented on LUCENE-8730:
--------------------------------------
+1 to output the original token first. Is it possible to set the original token
offset (savedTermLength) once, since the value doesn't change? I also wonder if
the first value in the buffer should be excluded from the sort entirely (e.g.
by calling sorter.sort(1, bufferedLen)) to ensure correctness?
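
To make the second suggestion concrete, here is a minimal standalone sketch of sorting only the tail of the buffer so that slot 0 can never move. It does not use the filter's actual sorter or buffer layout: the Tok class, the ORDER comparator (start offset ascending, then longer token first) and the sortKeepingFirst helper are hypothetical names invented for this illustration, and the ordering is an assumption rather than the patch's real comparison.

```java
import java.util.Arrays;
import java.util.Comparator;

public class OriginalFirstSort {
    // Hypothetical stand-in for one buffered token: a label plus start/end offsets.
    static final class Tok {
        final String label;
        final int start;
        final int end;

        Tok(String label, int start, int end) {
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed ordering for illustration: smaller start first, then longer token first.
    static final Comparator<Tok> ORDER = (a, b) -> {
        int cmp = Integer.compare(a.start, b.start);
        return cmp != 0 ? cmp : Integer.compare(b.end, a.end);
    };

    // Sorts buffer[1, bufferedLen) only; buffer[0] (the original token) is never touched.
    static void sortKeepingFirst(Tok[] buffer, int bufferedLen) {
        if (bufferedLen > 2) {
            Arrays.sort(buffer, 1, bufferedLen, ORDER);
        }
    }
}
```

With this shape, a generated part that compares equal to the original token (same start and end) cannot be placed ahead of it, because index 0 is outside the sorted range.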
> Ensure WordDelimiterGraphFilter always emits its original token first
> ---------------------------------------------------------------------
>
> Key: LUCENE-8730
> URL: https://issues.apache.org/jira/browse/LUCENE-8730
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8730.patch
>
>
> WordDelimiterFilter and WordDelimiterGraphFilter behave almost identically
> apart from setting position length; the only other difference is that WDGF can
> sometimes emit its original token as the second output token rather than the
> first. We should change this to conform to the behaviour of the older filter.
> This will make it much easier to remove WDF entirely and cut over tests
> that use it incidentally.