[ https://issues.apache.org/jira/browse/SOLR-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15097417#comment-15097417 ]
Steve Rowe edited comment on SOLR-4619 at 1/14/16 1:34 AM:
-----------------------------------------------------------
Patch that brings Andrzej's patch up to date with trunk, and adds tests for
query-time functionality.
I had assumed that {{PreAnalyzedField}} fields would use the
{{PreAnalyzedTokenizer}} at query time, but that is not (currently) the case:
instead {{FieldType.DefaultAnalyzer}} is used. This patch changes the behavior
so that, when no analyzer is specified, {{PreAnalyzedTokenizer}} is used
instead.
However, there is a chicken-and-egg interaction between
{{PreAnalyzedTokenizer}} and {{QueryBuilder.createFieldQuery()}}, which aborts
before performing any tokenization if the supplied analyzer's attribute factory
doesn't contain a {{TermToBytesRefAttribute}}. But {{PreAnalyzedTokenizer}}
doesn't have any attributes defined until the input stream is consumed, in
{{reset()}}. [~rcmuir] added a comment as part of LUCENE-5388 to
{{PreAnalyzedTokenizer}}'s ctor, where
{{AttributeFactory.DEFAULT_ATTRIBUTE_FACTORY}} is set as the attribute factory
rather than the default packed implementation: "we don't pack attributes: since
we are used for (de)serialization and dont want bloat."
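The ordering problem described above can be illustrated with a toy model. This
is only a sketch under stated assumptions: {{LazyStream}} and
{{checkThenReset()}} are hypothetical stand-ins invented for illustration, not
Lucene's {{TokenStream}} or {{QueryBuilder}} API, and attributes are modeled as
plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical classes, NOT Lucene's API.
class LazyAttributeDemo {
  static final String TERM = "TermToBytesRefAttribute";

  /** Stand-in for a tokenizer that registers its attributes only in reset(). */
  static class LazyStream {
    private final Set<String> attributes = new HashSet<>();
    private final String input;
    private List<String> tokens = new ArrayList<>();

    LazyStream(String input) {
      this.input = input;
    }

    boolean hasAttribute(String name) {
      return attributes.contains(name);
    }

    void reset() {
      // The attribute set is known only after the input has been parsed.
      attributes.add(TERM);
      tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    List<String> tokens() {
      return tokens;
    }
  }

  /** Mimics a caller that checks for the term attribute BEFORE reset(). */
  static List<String> checkThenReset(LazyStream stream) {
    if (!stream.hasAttribute(TERM)) {
      return null; // gives up before any tokenization happens
    }
    stream.reset();
    return stream.tokens();
  }

  public static void main(String[] args) {
    // The lazy stream has no attributes yet, so the check aborts early.
    System.out.println(checkThenReset(new LazyStream("pre analyzed text")));
  }
}
```

In this model the caller never reaches {{reset()}}, so the stream never gets
the chance to declare the attribute the caller is looking for.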
This patch moves the {{stream.reset()}} call in
{{QueryBuilder.createFieldQuery()}} in front of the {{TermToBytesRefAttribute}}
check, so that {{PreAnalyzedTokenizer}} (and other tokenizers that don't have a
pre-added set of attributes) has a chance to populate its attributes. It also
moves the {{addAttribute(PositionIncrementAttribute.class)}} call to after the
{{TermToBytesRefAttribute}} check, since that attribute won't be needed if no
tokenization will be performed.
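The reordering can be sketched roughly as follows. This is again a toy model
under stated assumptions, not the patch itself: {{LazyStream}} and
{{resetThenCheck()}} are hypothetical names, attributes are plain strings, and
an empty input is used (arbitrarily) to model a stream that never declares a
term attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical classes, NOT Lucene's QueryBuilder or TokenStream.
class ResetFirstDemo {
  static final String TERM = "TermToBytesRefAttribute";
  static final String POS_INC = "PositionIncrementAttribute";

  /** Stand-in for a token stream whose attributes appear only in reset(). */
  static class LazyStream {
    final Set<String> attributes = new HashSet<>();
    private final String input;
    private List<String> tokens = new ArrayList<>();

    LazyStream(String input) {
      this.input = input;
    }

    void reset() {
      // Assumption of this model: empty input means no term attribute at all.
      if (!input.trim().isEmpty()) {
        attributes.add(TERM);
        tokens = Arrays.asList(input.trim().split("\\s+"));
      }
    }

    List<String> tokens() {
      return tokens;
    }
  }

  /** Reordered flow: reset first, then check, then add the extra attribute. */
  static List<String> resetThenCheck(LazyStream stream) {
    stream.reset(); // let the stream populate its attributes first
    if (!stream.attributes.contains(TERM)) {
      return null; // nothing to tokenize, so skip the extra attribute
    }
    stream.attributes.add(POS_INC); // only added when tokens will be read
    return stream.tokens();
  }

  public static void main(String[] args) {
    System.out.println(resetThenCheck(new LazyStream("pre analyzed text")));
    System.out.println(resetThenCheck(new LazyStream("")));
  }
}
```

With the check placed after {{reset()}}, the lazily populated stream yields its
tokens, while a stream with no term attribute still causes an early return and
never receives the position increment attribute.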
An alternative approach to fixing the chicken-and-egg problem might be to have
{{PreAnalyzedTokenizer}} always include a dummy {{TermToBytesRefAttribute}}
implementation, and then remove it when {{reset()}} is called, but that seems
hackish.
I haven't run the full test suite yet with this patch, but the included
query-time {{PreAnalyzedField}} tests pass.
I welcome feedback.
> Improve PreAnalyzedField query analysis
> ---------------------------------------
>
> Key: SOLR-4619
> URL: https://issues.apache.org/jira/browse/SOLR-4619
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 4.0, 4.1, 4.2, 4.2.1, Trunk
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Fix For: Trunk
>
> Attachments: SOLR-4619.patch, SOLR-4619.patch
>
>
> PreAnalyzed field extends plain FieldType and mistakenly uses the
> DefaultAnalyzer as query analyzer, and doesn't allow for customization via
> <analyzer> schema elements.
> Instead it should extend TextField and support all query analysis supported
> by that type.