[ https://issues.apache.org/jira/browse/LUCENE-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491501#comment-16491501 ]
David Smiley commented on LUCENE-8332:
--------------------------------------
I had TestRandomChains go at it and it uncovered a couple of things.
{{org.apache.lucene.analysis.BaseTokenStreamTestCase#checkResetException}} has
two checks:
# ensures incrementToken() fails if reset() wasn't called first. This was
straightforward to fix by throwing an IllegalStateException at the
start of ConcatenateGraphFilter.incrementToken().
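A minimal sketch of that first fix, using a toy stream class rather than Lucene's real TokenStream/TokenFilter classes (ResetGuardSketch and ConcatStream are hypothetical names). A flag that reset() sets and close() clears lets incrementToken() fail fast with an IllegalStateException. Tokens are joined with a space only for readability.

```java
import java.util.List;

// Hypothetical toy model, not Lucene's API: a stream that concatenates its
// input tokens into a single output token and enforces the
// "reset() before incrementToken()" part of the TokenStream contract.
public class ResetGuardSketch {

  static final class ConcatStream {
    private final List<String> input;
    private boolean resetCalled = false; // set by reset(), cleared by close()
    private boolean emitted = false;     // true once the single token was produced
    private String term;                 // current output token

    ConcatStream(List<String> input) {
      this.input = input;
    }

    void reset() {
      resetCalled = true;
      emitted = false;
      term = null;
    }

    boolean incrementToken() {
      // The guard: fail fast if reset() was not called first.
      if (!resetCalled) {
        throw new IllegalStateException("reset() must be called before incrementToken()");
      }
      if (emitted) {
        return false; // only one concatenated token per reset()
      }
      term = String.join(" ", input);
      emitted = true;
      return true;
    }

    String term() {
      return term;
    }

    void close() {
      resetCalled = false; // a later incrementToken() needs a fresh reset()
    }
  }
}
```

Calling incrementToken() on a fresh ConcatStream throws; after reset() it yields one token ("new york" for the input [new, york]) and then returns false.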
# ensures that if you forget to call close(), trying to get the tokenStream again
fails. This one is tricky. ConcatenateGraphFilter.reset() consumes the
whole input tokenStream, including closing it... and it's hard to disagree with that.
It calls toAutomaton, which does this, and there are even some callers of this
toAutomaton method in the NRTSuggester that assume the stream will be
closed. I don't think adding a closed flag is enough, since when
Analyzer.tokenStream() is called we want it to fail, but all that does is set
the reader (which throws if it wasn't closed). I could make toAutomaton not
close the input, but then the callers would need to deal with that; I'm not sure which path
to take or whether I'm missing something. Or maybe just punt and have
TestRandomChains ignore this, as it's a bit too pedantic here?
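To illustrate why the second check is hard to satisfy, here is a toy model (all class names hypothetical, not Lucene's API). ToyTokenizer stands in for the component whose reader-setting throws if the previous input wasn't closed; ToyConcatFilter mimics a reset() that drains the upstream stream and then closes it. Because reset() has already closed the input, a caller that forgets close() is never detected. A closed flag on the filter would not help either, since getting the stream again only goes through setReader on the tokenizer.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

public class CloseCheckSketch {

  // Stands in for the head of the chain: setting a new reader fails unless
  // the previous one was released by close().
  static final class ToyTokenizer {
    private Reader input; // null means "closed, ready for a new reader"

    void setReader(Reader r) {
      if (input != null) {
        throw new IllegalStateException("close() call missing");
      }
      input = r;
    }

    // Returns the next whitespace-separated token, or null at end of input.
    String next() {
      try {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
          if (Character.isWhitespace(c)) {
            if (sb.length() > 0) {
              break;
            }
          } else {
            sb.append((char) c);
          }
        }
        return sb.length() > 0 ? sb.toString() : null;
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    void close() {
      if (input == null) {
        return; // already closed: no-op
      }
      try {
        input.close();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      } finally {
        input = null;
      }
    }
  }

  // Mimics the behaviour described above: reset() drains the whole
  // upstream stream into one token and then closes the upstream.
  static final class ToyConcatFilter {
    private final ToyTokenizer upstream;
    private String term;

    ToyConcatFilter(ToyTokenizer upstream) {
      this.upstream = upstream;
    }

    void reset() {
      StringBuilder sb = new StringBuilder();
      for (String t = upstream.next(); t != null; t = upstream.next()) {
        if (sb.length() > 0) {
          sb.append(' ');
        }
        sb.append(t);
      }
      term = sb.toString();
      upstream.close(); // the tokenizer is "closed" before the caller's close()
    }

    String term() {
      return term;
    }

    void close() {
      upstream.close(); // no-op here, since reset() already closed it
    }
  }
}
```

Setting a reader twice on a bare ToyTokenizer throws. Setting it again after ToyConcatFilter.reset() does not, even though close() was never called on the filter, which is the gap the test is probing.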
> New ConcatenateGraphTokenStream (move/rename CompletionTokenStream)
> -------------------------------------------------------------------
>
> Key: LUCENE-8332
> URL: https://issues.apache.org/jira/browse/LUCENE-8332
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Let's move the CompletionTokenStream from the suggest module into the
> analysis module and rename it ConcatenateGraphTokenStream. See the comments in
> LUCENE-8323 that led to this idea. Such a TokenStream (or TokenFilter?) has
> several uses:
> * for the suggest module
> * by the SolrTextTagger for NER/ERD use cases (SOLR-12376)
> * for efficient complete-match search
> It will need a factory: a TokenFilterFactory, even though we don't have a
> TokenFilter-based subclass of TokenStream.
> It appears there is no back-compat concern with it suddenly disappearing from
> the suggest module, as it's marked experimental and it only seems to be public
> now, perhaps due to some technicality (it has package-level constructors).
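A rough sketch of what concatenating a token graph means, under simplifying assumptions: the graph is "sausage" shaped (each position holds one or more single-token alternatives, such as synonyms), tokens are joined with U+001F as an assumed separator, and a hypothetical maxPaths cap guards against combinatorial blow-up. This is not the CompletionTokenStream or ConcatenateGraphTokenStream implementation, which builds an automaton (cf. toAutomaton above).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene's implementation: expand a "sausage" token
// graph (a sequence of positions, each with one or more single-token
// alternatives) into one concatenated token per path.
public class ConcatenatePathsSketch {

  // Assumed separator between tokens (U+001F, INFORMATION SEPARATOR ONE).
  static final char SEP = '\u001F';

  static List<String> concatenate(List<List<String>> positions, int maxPaths) {
    List<String> paths = new ArrayList<>();
    if (positions.isEmpty()) {
      return paths;
    }
    // Paths of length one: each alternative at the first position.
    for (String token : positions.get(0)) {
      addChecked(paths, token, maxPaths);
    }
    // Extend every partial path with every alternative at the next position.
    for (int i = 1; i < positions.size(); i++) {
      List<String> extended = new ArrayList<>();
      for (String prefix : paths) {
        for (String token : positions.get(i)) {
          addChecked(extended, prefix + SEP + token, maxPaths);
        }
      }
      paths = extended;
    }
    return paths;
  }

  // Hypothetical guard against combinatorial blow-up of the path count.
  private static void addChecked(List<String> out, String path, int maxPaths) {
    if (out.size() >= maxPaths) {
      throw new IllegalArgumentException("more than " + maxPaths + " paths");
    }
    out.add(path);
  }
}
```

For positions [new] [york | yorkshire] [city] this yields two tokens, new␟york␟city and new␟yorkshire␟city. Indexing a field this way and analyzing the query the same way turns a complete-match search into a single-term lookup.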