[ 
https://issues.apache.org/jira/browse/LUCENE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491501#comment-16491501
 ] 

David Smiley commented on LUCENE-8332:
--------------------------------------

I had TestRandomChains go at it and it uncovered a couple of things.

{{org.apache.lucene.analysis.BaseTokenStreamTestCase#checkResetException}} has 
two checks:
 # ensures incrementToken() fails if reset() wasn't called first.  This was 
pretty straightforward to fix by throwing an IllegalStateException at the 
start of ConcatenateGraphFilter.incrementToken.
 # ensures that if you forgot to close(), trying to get the tokenStream again 
fails.  This one is tricky.  ConcatenateGraphFilter.reset() will consume the 
whole tokenStream, including closing it... and it's hard to disagree with that.  
It calls toAutomaton, which does this, and there are even some callers of 
toAutomaton in the NRTSuggester that assume the stream is going to be 
closed.  I think adding a closed flag isn't enough: when 
Analyzer.tokenStream() is called we want it to fail, but all that does is set 
the reader (which throws if it wasn't closed).  I could make toAutomaton not 
close the input, but then the callers need to deal with that; I'm not sure 
which path to take, or whether I'm missing something.  Or maybe just punt and 
have TestRandomChains ignore this check, as it's a bit too pedantic here?
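
For illustration, the guard described in the first item could look roughly like 
the sketch below.  GuardedStream is a hypothetical stand-in written for this 
example; it is not ConcatenateGraphFilter and does not use Lucene's TokenStream 
API.  It only shows the reset()-before-incrementToken() contract being enforced 
with an IllegalStateException.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream; NOT Lucene's TokenStream API. */
final class GuardedStream {
    private final List<String> tokens;
    private Iterator<String> it;   // null until reset() has been called
    private String current;        // last token returned by incrementToken()

    GuardedStream(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Prepares the stream for consumption. */
    void reset() {
        it = tokens.iterator();
        current = null;
    }

    /** Advances to the next token; fails fast if reset() was skipped. */
    boolean incrementToken() {
        if (it == null) {
            throw new IllegalStateException(
                "reset() must be called before incrementToken()");
        }
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    String current() {
        return current;
    }

    /** Releases state, so a later incrementToken() without reset() fails again. */
    void close() {
        it = null;
        current = null;
    }
}
```

This sketch covers only the first check.  The second one (failing when the 
stream is re-obtained without close()) depends on how the analyzer reuses its 
components, so it is left out here.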

> New ConcatenateGraphTokenStream (move/rename CompletionTokenStream)
> -------------------------------------------------------------------
>
>                 Key: LUCENE-8332
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8332
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Let's move the CompletionTokenStream from the suggest module into the 
> analysis module, renaming it ConcatenateGraphTokenStream. See comments in 
> LUCENE-8323 leading to this idea. Such a TokenStream (or TokenFilter?) has 
> several uses:
>  * for the suggest module
>  * by the SolrTextTagger for NER/ERD use cases – SOLR-12376
>  * for doing complete match search efficiently
> It will need a factory – a TokenFilterFactory, even though we don't have a 
> TokenFilter based subclass of TokenStream.
> It appears there is no back-compat concern in it suddenly disappearing from 
> the suggest module as it's marked experimental and it only seems to be public 
> now perhaps due to some technicality (it has package level constructors).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
