[
https://issues.apache.org/jira/browse/LUCENE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685878#comment-16685878
]
Alan Woodward commented on LUCENE-8564:
---------------------------------------
bq. How does it handle a graph where one of the side paths itself then splits
(after a token or two) into its own set of side paths?
We'd end up with extra routes through the graph, available via incrementGraph().
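Let's imagine a TokenStream that looks like this: z a/b:4 c d/e:2 f g h
(b is stacked on a with a position length of 4, so it rejoins the main path at
g; e is stacked on d with a position length of 2, so it also rejoins at g).
Starting at token z, calling incrementGraphToken() repeatedly will yield the
token sequence z a c d f g h.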
Then we call incrementGraph(); now calling incrementGraphToken() gives us z a c
e g h, following the other branch of the split at d/e.
Call incrementGraph() again and we get z b g h, following the other branch of
the split at a/b.
Now that all routes have been exhausted, calling incrementGraph() will return
false.
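The route order described above amounts to a depth-first walk over the graph: follow the first token at every split, then backtrack to the most recent split and take its next alternative. The following is a minimal, self-contained Java sketch of that traversal order only. It does not use the GraphTokenStream class from the patch; the `TokenGraphRoutes` class, the `Token` record and its `posLen` field are hypothetical stand-ins, and `b:4` / `e:2` are assumed to be position lengths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a token graph; this is NOT the GraphTokenStream API from the patch.
public class TokenGraphRoutes {

    // A token and the number of positions it spans ("b:4" -> posLen 4).
    record Token(String term, int posLen) {}

    // The example stream "z a/b:4 c d/e:2 f g h": one list of alternatives per position.
    static List<List<Token>> exampleGraph() {
        return List.of(
            List.of(new Token("z", 1)),
            List.of(new Token("a", 1), new Token("b", 4)),
            List.of(new Token("c", 1)),
            List.of(new Token("d", 1), new Token("e", 2)),
            List.of(new Token("f", 1)),
            List.of(new Token("g", 1)),
            List.of(new Token("h", 1)));
    }

    // Returns every route through the graph, in depth-first order.
    static List<String> allRoutes(List<List<Token>> graph) {
        List<String> routes = new ArrayList<>();
        walk(graph, 0, new ArrayList<>(), routes);
        return routes;
    }

    private static void walk(List<List<Token>> graph, int pos,
                             List<String> current, List<String> routes) {
        if (pos >= graph.size()) {
            // Ran off the end of the stream: record one complete route.
            routes.add(String.join(" ", current));
            return;
        }
        // The first alternative is followed first; later alternatives are only
        // tried on backtracking, so the most recent split is revisited first.
        for (Token t : graph.get(pos)) {
            current.add(t.term());
            walk(graph, pos + t.posLen(), current, routes);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Each printed line plays the role of one incrementGraph() step.
        for (String route : allRoutes(exampleGraph())) {
            System.out.println(route);
        }
    }
}
```

Running `main` prints z a c d f g h, then z a c e g h, then z b g h, the same three routes in the same order as the walkthrough; after the last one there is nothing left, which corresponds to incrementGraph() returning false.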
How many routes are available depends on how far down the graph you move: if,
in the example above, you only advance as far as 'z a c' on the first branch,
the d/e split is never reached, so incrementGraph() will move directly to the
'z b g' branch.
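The dependence on read-ahead depth can be illustrated by capping how many tokens are read along each route. In this hypothetical sketch (again not the patch's API; `BoundedRoutes`, `Token` and `maxTokens` are illustrative names), a route stops growing once `maxTokens` tokens have been read, so any split beyond that point is never visited. With a cap of three tokens only z a c and z b g are produced, because the d/e split sits at the fourth token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the GraphTokenStream API from the patch.
public class BoundedRoutes {

    // A token and the number of positions it spans ("b:4" -> posLen 4).
    record Token(String term, int posLen) {}

    // The example stream "z a/b:4 c d/e:2 f g h".
    static List<List<Token>> exampleGraph() {
        return List.of(
            List.of(new Token("z", 1)),
            List.of(new Token("a", 1), new Token("b", 4)),
            List.of(new Token("c", 1)),
            List.of(new Token("d", 1), new Token("e", 2)),
            List.of(new Token("f", 1)),
            List.of(new Token("g", 1)),
            List.of(new Token("h", 1)));
    }

    // Routes visible when at most maxTokens tokens are read along each route.
    static List<String> routesUpTo(List<List<Token>> graph, int maxTokens) {
        List<String> routes = new ArrayList<>();
        walk(graph, 0, maxTokens, new ArrayList<>(), routes);
        return routes;
    }

    private static void walk(List<List<Token>> graph, int pos, int maxTokens,
                             List<String> current, List<String> routes) {
        if (pos >= graph.size() || current.size() == maxTokens) {
            // Stop at the end of the stream or at the read-ahead limit;
            // splits further along this route are never seen.
            routes.add(String.join(" ", current));
            return;
        }
        // Same depth-first order: first alternative first, backtrack later.
        for (Token t : graph.get(pos)) {
            current.add(t.term());
            walk(graph, pos + t.posLen(), maxTokens, current, routes);
            current.remove(current.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<List<Token>> graph = exampleGraph();
        System.out.println(routesUpTo(graph, 3)); // [z a c, z b g]
        System.out.println(routesUpTo(graph, 7)); // [z a c d f g h, z a c e g h, z b g h]
    }
}
```

Raising the cap to 7, the length of the longest route, recovers all three routes from the full walkthrough.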
> Make it easier to iterate over graphs in tokenstreams
> -----------------------------------------------------
>
> Key: LUCENE-8564
> URL: https://issues.apache.org/jira/browse/LUCENE-8564
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Attachments: LUCENE-8564.patch
>
>
> We have a number of TokenFilters that read ahead in the token stream (e.g.
> synonyms, shingles), and ideally these would understand token graphs as well
> as linear streams. FixedShingleFilter already has some mechanisms to deal
> with graphs; this issue is to extract this logic into a GraphTokenStream
> class that can then be reused by other token filters.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)