[
https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Rowe updated SOLR-4619:
-----------------------------
Attachment: SOLR-4619.patch
{quote}
bq. A new analyzer class employing PreAnalyzedTokenizer could override
initReader() or setReader(). I'll try with setReader(), since the docs for
initReader() are focused on reader conditioning via char filters.
I was referring to TokenStreamComponents.setReader() here, which is called as
part of Analyzer.tokenStream(): A subclass created in the new analyzer's
overridden createComponents() could call a new method on PreAnalyzedTokenizer
to consume the input reader and in so doing provide the attributes.
{quote}
This patch implements the idea by splitting reader consumption out of reset()
into its own method, decodeInput(). This method first removes all attributes
from PreAnalyzedTokenizer's AttributeSource, then adds the needed ones as a side
effect of parsing the input.
There is a kludge here: because TokenStreamComponents.setReader() doesn't throw
an exception, PreAnalyzedAnalyzer overrides createComponents() to create a
TokenStreamComponents instance that catches any exception encountered during
reader consumption and stores it on the stream's PreAnalyzedTokenizer instance,
whose reset() method then throws the stored exception, if any.
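The store-then-rethrow arrangement described above can be sketched in plain Java. This is a minimal illustration only, not code from the patch: DeferredErrorTokenizer, DeferredErrorComponents and setStoredException() are hypothetical stand-ins, and the "empty input is an error" rule is an assumption made just to have something that fails.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical stand-in for PreAnalyzedTokenizer; not the Lucene class.
class DeferredErrorTokenizer {
    private IOException stored;  // failure captured while consuming the reader
    private String decoded;      // result of the last successful decode

    // Parsing happens here rather than in reset(), and it may throw.
    void decodeInput(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int c = reader.read(); c != -1; c = reader.read()) {
            sb.append((char) c);
        }
        if (sb.length() == 0) {
            // Assumed failure condition, for illustration only.
            throw new IOException("empty pre-analyzed input");
        }
        decoded = sb.toString();
    }

    void setStoredException(IOException e) {
        stored = e;
    }

    // reset() is allowed to throw, so the stored failure surfaces here.
    void reset() throws IOException {
        if (stored != null) {
            IOException e = stored;
            stored = null;
            throw e;
        }
    }

    String decoded() {
        return decoded;
    }
}

// Hypothetical stand-in for the TokenStreamComponents subclass.
class DeferredErrorComponents {
    final DeferredErrorTokenizer tokenizer = new DeferredErrorTokenizer();

    // No "throws" clause, mirroring the constraint on setReader().
    void setReader(Reader reader) {
        try {
            tokenizer.setStoredException(null);  // forget any earlier failure
            tokenizer.decodeInput(reader);
        } catch (IOException e) {
            tokenizer.setStoredException(e);     // defer it until reset()
        }
    }
}
```

In this sketch the caller sees the failure at the first point where throwing is permitted, which is the trade-off that makes it a kludge: the error is reported later than it occurred.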
With this patch, PreAnalyzedAnalyzer can be reused; previously
PreAnalyzedTokenizer reuse would ignore new input and re-emit tokens
deserialized from the initial input.
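To illustrate why moving reader consumption out of reset() makes reuse safe, here is a small sketch in plain Java. ReusableSketchTokenizer is hypothetical, and the whitespace-separated input format is an assumption chosen for brevity; the real tokenizer parses a serialized token stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tokenizer: input is whitespace-separated tokens.
class ReusableSketchTokenizer {
    private final List<String> tokens = new ArrayList<>();
    private int pos;

    // Called for every new reader, so stale tokens never survive reuse.
    void decodeInput(Reader reader) throws IOException {
        tokens.clear();
        StringBuilder sb = new StringBuilder();
        for (int c = reader.read(); c != -1; c = reader.read()) {
            sb.append((char) c);
        }
        for (String t : sb.toString().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
    }

    // reset() only rewinds; it no longer touches the reader.
    void reset() {
        pos = 0;
    }

    // Returns the next token, or null when the stream is exhausted.
    String next() {
        return pos < tokens.size() ? tokens.get(pos++) : null;
    }
}
```

Decoding "one two" and then "three" with the same instance yields only "three" on the second pass, which is the behavior the patch is described as restoring.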
With this patch, PreAnalyzedField analysis works like this:
# If a query analyzer is specified in the schema then it will be used at query
time.
# If an analyzer is specified in the schema with no type (i.e., it is of
neither "index" nor "query" type), then this analyzer will be used for query
parsing, but will be ignored at index time.
# If only an analyzer of "index" type is specified in the schema, then this
analyzer will be used for query parsing, but will be ignored at index time.
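The three rules above can be read as a small selection function. The sketch below is a hypothetical model, not Solr code: AnalyzerConfigSketch and its fields are invented for illustration, analyzers are represented by plain names, and the precedence of an untyped analyzer over an "index" analyzer when both are present is an assumption the list does not settle. What happens when no analyzer is declared at all is also not stated, so the sketch returns an empty result in that case.

```java
import java.util.Optional;

// Hypothetical model of the <analyzer> declarations for one field type.
final class AnalyzerConfigSketch {
    private final String queryAnalyzer;    // <analyzer type="query">, or null
    private final String untypedAnalyzer;  // <analyzer> with no type, or null
    private final String indexAnalyzer;    // <analyzer type="index">, or null

    AnalyzerConfigSketch(String queryAnalyzer, String untypedAnalyzer, String indexAnalyzer) {
        this.queryAnalyzer = queryAnalyzer;
        this.untypedAnalyzer = untypedAnalyzer;
        this.indexAnalyzer = indexAnalyzer;
    }

    // Rule 1 first, then rule 2, then rule 3 (ordering of 2 vs. 3 is assumed).
    Optional<String> analyzerAtQueryTime() {
        if (queryAnalyzer != null) {
            return Optional.of(queryAnalyzer);
        }
        if (untypedAnalyzer != null) {
            return Optional.of(untypedAnalyzer);
        }
        return Optional.ofNullable(indexAnalyzer);
    }

    // At index time the schema analyzers are ignored: input is pre-analyzed.
    Optional<String> schemaAnalyzerAtIndexTime() {
        return Optional.empty();
    }
}
```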
This patch adds a new method removeAllAttributes() to AttributeSource, to
support reuse of token streams with variable attributes, like
PreAnalyzedTokenizer.
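As a rough illustration of why a stream with variable attributes needs removeAllAttributes(), the sketch below models an attribute holder in plain Java. AttributeHolderSketch is hypothetical and far simpler than Lucene's AttributeSource (attributes here are just named strings), and the "name=value;name=value" input format is an assumption made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, heavily simplified holder; not Lucene's AttributeSource.
class AttributeHolderSketch {
    private final Map<String, String> attributes = new LinkedHashMap<>();

    void addAttribute(String name, String value) {
        attributes.put(name, value);
    }

    boolean hasAttribute(String name) {
        return attributes.containsKey(name);
    }

    // Counterpart of the new method: drop everything before re-decoding.
    void removeAllAttributes() {
        attributes.clear();
    }

    // Clears first, then adds attributes as a side effect of parsing.
    // Assumed input format: "name=value;name=value".
    void decodeInput(String input) {
        removeAllAttributes();
        for (String part : input.split(";")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                addAttribute(part.substring(0, eq), part.substring(eq + 1));
            }
        }
    }
}
```

Without the clearing step, decoding "term=foo;payload=x" followed by "term=bar" would leave a stale payload attribute attached to the second input.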
I think it's ready to go.
> Improve PreAnalyzedField query analysis
> ---------------------------------------
>
> Key: SOLR-4619
> URL: https://issues.apache.org/jira/browse/SOLR-4619
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: Trunk
>
> Attachments: SOLR-4619.patch, SOLR-4619.patch, SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via
> <analyzer> schema elements.
> Instead it should extend TextField and support all query analysis supported
> by that type.