[jira] [Commented] (LUCENE-8564) Make it easier to iterate over graphs in tokenstreams

Alan Woodward (JIRA) Tue, 13 Nov 2018 15:15:24 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685878#comment-16685878
 ]


Alan Woodward commented on LUCENE-8564:
---------------------------------------

bq. How does it handle a graph where one of the side paths itself then splits 
(after a token or two) into its own set of side paths?

We'd end up with extra routes through the graph, each available via incrementGraph().

Let's imagine a TokenStream that looks like this: z a/b:4 c d/e:2 f g h

Starting at position z, calling incrementGraphToken() repeatedly will yield the 
tokenstream z a c d f g h.
Then we call incrementGraph(); now calling incrementGraphToken() gives us 
z a c e g h, following the split at d/e.
Call incrementGraph() again; we get z b g h.
Now that all routes have been exhausted, calling incrementGraph() will return 
false.

How many routes are available depends on how far down the graph you move; if in 
the example above you only advance as far as 'z a c' on the first branch, the 
d/e split has not been reached yet, so incrementGraph() will move directly to 
the 'z b g h' branch.
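
To make the routes in the example above concrete, here is a rough standalone sketch that enumerates them. It does not use Lucene or the patch's API: the TokenGraphRoutes class, its Token type, and the parse() and routes() helpers are all made up for illustration, and the "term:N" notation is assumed to mean a token with position length N. It also walks every route eagerly, so it does not model the case where you stop partway down a branch before calling incrementGraph().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Lucene code and not the API from the patch.
final class TokenGraphRoutes {

    /** Stand-in for a token: its term, start position and position length. */
    static final class Token {
        final String term;
        final int position;
        final int positionLength;

        Token(String term, int position, int positionLength) {
            this.term = term;
            this.position = position;
            this.positionLength = positionLength;
        }
    }

    /** Parses the notation used above, e.g. "z a/b:4 c d/e:2 f g h". */
    static List<Token> parse(String stream) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        for (String slot : stream.trim().split("\\s+")) {
            // Tokens separated by '/' are stacked at the same position.
            for (String alternative : slot.split("/")) {
                String[] parts = alternative.split(":");
                int length = parts.length > 1 ? Integer.parseInt(parts[1]) : 1;
                tokens.add(new Token(parts[0], position, length));
            }
            position++;
        }
        return tokens;
    }

    /** Returns every route through the graph, first-listed token first at each split. */
    static List<List<String>> routes(List<Token> tokens) {
        Map<Integer, List<Token>> byPosition = new TreeMap<>();
        int end = 0;
        for (Token t : tokens) {
            byPosition.computeIfAbsent(t.position, k -> new ArrayList<>()).add(t);
            end = Math.max(end, t.position + t.positionLength);
        }
        List<List<String>> out = new ArrayList<>();
        walk(byPosition, 0, end, new ArrayList<>(), out);
        return out;
    }

    private static void walk(Map<Integer, List<Token>> byPosition, int position, int end,
                             List<String> path, List<List<String>> out) {
        if (position >= end) {
            out.add(new ArrayList<>(path)); // reached the end of the graph
            return;
        }
        List<Token> here = byPosition.getOrDefault(position, Collections.<Token>emptyList());
        for (Token t : here) {
            path.add(t.term);
            // A token of length N lands N positions further along.
            walk(byPosition, t.position + t.positionLength, end, path, out);
            path.remove(path.size() - 1); // backtrack to try the next alternative
        }
    }

    public static void main(String[] args) {
        for (List<String> route : routes(parse("z a/b:4 c d/e:2 f g h"))) {
            System.out.println(String.join(" ", route));
        }
        // Prints:
        // z a c d f g h
        // z a c e g h
        // z b g h
    }
}
```

The backtracking order matches the walk-through: the deepest unexplored split (d/e) is taken first, then the earlier one (a/b).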

> Make it easier to iterate over graphs in tokenstreams
> -----------------------------------------------------
>
>                 Key: LUCENE-8564
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8564
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8564.patch
>
>
> We have a number of TokenFilters that read ahead in the token stream (e.g. 
> synonyms, shingles) and ideally these would understand token graphs as well 
> as linear streams.  FixedShingleFilter already has some mechanisms to deal 
> with graphs; this issue is to extract this logic into a GraphTokenStream 
> class that can then be reused by other token filters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
