[ https://issues.apache.org/jira/browse/LUCENE-5470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911900#comment-13911900 ]
Tim Allison edited comment on LUCENE-5470 at 2/25/14 7:27 PM:
--------------------------------------------------------------
{quote}Can we just analyze multiterm queries without trying to parse around
wildcards or whatnot? This is basically what Solr is doing today. I think
trying to interpret the syntax is a bit too funky and error-prone, and it's
better if someone who wants "magic" puts that in their QP itself.{quote}
I'm of two minds on this. From the Solr perspective, absolutely, getMultiterm
is sufficient. From the Lucene perspective, users may want to use
off-the-shelf analyzers like StandardAnalyzer and then be puzzled that they
don't work for multiterm queries. AnalyzingQueryParser fills this need for
wildcard queries (though not for regex). Some thoughts on this:
1) Take the least-harm option: consolidate getMultitermTerm as a public static
method in QueryParserBase and let AnalyzingQueryParser keep doing its wildcard
handling as is.
2) Do the above, but also add an AnalyzingQueryParserBase layer that does the
wildcard trickery (and maybe add something for regex)? Classic QueryParser and
others could then subclass AnalyzingQP. The benefit of this is that we could
get rid of AnalyzingQP and add multiterm analysis to other parsers that
currently subclass QPBase. This would only benefit people working at the
Lucene level.
3) A more drastic step would be to move the Solr MultitermAware processing in
FieldTypePluginLoader down into the Lucene layer, but this wouldn't solve the
problem of Lucene users misusing off-the-shelf Analyzers.
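The "wildcard trickery" in option 2 amounts, roughly, to analyzing only the literal pieces of a wildcard term and passing the `*` and `?` operators through untouched. The following is a minimal sketch of that idea under stated assumptions, not AnalyzingQueryParser's actual code: the class name `WildcardChunkAnalysis` is hypothetical, a plain lowercasing function stands in for a real Analyzer, escaped wildcards are ignored, and each chunk is assumed to map to exactly one string (a real analyzer could emit zero or several tokens per chunk, which a parser would have to reject).

```java
import java.util.Locale;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Lucene's API): analyze only the literal chunks of a
 * wildcard term, leaving the '*' and '?' operators unanalyzed.
 */
final class WildcardChunkAnalysis {

    private WildcardChunkAnalysis() {}

    /**
     * Splits {@code term} on '*' and '?', runs each literal chunk through
     * {@code analyzeChunk}, and reassembles the result with the wildcards in place.
     * Backslash-escaped wildcards are NOT handled in this sketch.
     */
    static String analyzeWildcard(String term, Function<String, String> analyzeChunk) {
        StringBuilder out = new StringBuilder();
        StringBuilder chunk = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                flush(chunk, out, analyzeChunk);
                out.append(c); // wildcard operators pass through as-is
            } else {
                chunk.append(c);
            }
        }
        flush(chunk, out, analyzeChunk); // analyze any trailing literal chunk
        return out.toString();
    }

    /** Analyzes the pending literal chunk (if any) and appends it to the output. */
    private static void flush(StringBuilder chunk, StringBuilder out,
                              Function<String, String> analyzeChunk) {
        if (chunk.length() > 0) {
            out.append(analyzeChunk.apply(chunk.toString()));
            chunk.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for an analyzer that only lowercases.
        Function<String, String> lowercase = s -> s.toLowerCase(Locale.ROOT);
        System.out.println(analyzeWildcard("FoO*BaR?", lowercase)); // prints "foo*bar?"
    }
}
```

Keeping this splitting in one shared base layer, rather than in each parser, is what would let other QPBase subclasses pick it up.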
was (Author: [email protected]):
{quote}
> Refactoring multiterm analysis
> ------------------------------
>
> Key: LUCENE-5470
> URL: https://issues.apache.org/jira/browse/LUCENE-5470
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/queryparser
> Affects Versions: 5.0
> Reporter: Tim Allison
> Priority: Minor
> Attachments: LUCENE-5470.patch
>
>
> There are currently three methods to analyze multiterms in Lucene and Solr:
> 1) QueryParserBase
> 2) AnalyzingQueryParser
> 3) TextField (Solr)
> The code in QueryParserBase and in TextField does not consume the tokenstream
> if more than one token is generated by the analyzer. (Admittedly, thanks to
> the magic of MultitermAwareComponents in Solr, this type of exception
> probably never happens and the unconsumed stream problem is probably
> non-existent in Solr.)
> I propose consolidating the multiterm analysis code into one place:
> QueryBuilder in Lucene core.
> This is part of a refactoring that will also help reduce duplication of code
> with LUCENE-5205.
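A consolidated multiterm helper could avoid the unconsumed-stream problem by always draining the stream and calling end() and close() before deciding whether the term produced exactly one token. The sketch below illustrates that ordering under stated assumptions: `SimpleTokenStream`, `MultitermAnalysis`, and `ListTokenStream` are hypothetical stand-ins, not Lucene classes. Lucene's real TokenStream follows a reset() / incrementToken() / end() / close() workflow with token attributes, and whatever ends up in QueryBuilder may look different. Throwing IllegalArgumentException is also an assumption.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal token stream; NOT Lucene's TokenStream. */
interface SimpleTokenStream extends AutoCloseable {
    void reset();
    /** Returns the next token, or null once the stream is exhausted. */
    String next();
    void end();
    @Override
    void close(); // narrowed: no checked exception
}

/** Hypothetical helper: analyze a multiterm query term that must yield exactly one token. */
final class MultitermAnalysis {

    private MultitermAnalysis() {}

    static String analyzeMultitermTerm(SimpleTokenStream stream, String part) {
        String first = null;
        int count = 0;
        try (SimpleTokenStream ts = stream) {
            ts.reset();
            // Consume the WHOLE stream, not just the first token, so that end()
            // and close() are reached even when too many tokens come back.
            for (String tok = ts.next(); tok != null; tok = ts.next()) {
                if (count == 0) {
                    first = tok;
                }
                count++;
            }
            ts.end();
        } // close() runs here
        // Only now, with the stream fully consumed and closed, validate the count.
        if (count != 1) {
            throw new IllegalArgumentException(
                "expected exactly one token for multiterm term '" + part + "', got " + count);
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(analyzeMultitermTerm(new ListTokenStream(List.of("foo")), "Foo")); // prints "foo"
        ListTokenStream two = new ListTokenStream(List.of("foo", "bar"));
        try {
            analyzeMultitermTerm(two, "foo-bar");
        } catch (IllegalArgumentException e) {
            // prints "rejected; ended=true, closed=true"
            System.out.println("rejected; ended=" + two.ended + ", closed=" + two.closed);
        }
    }
}

/** Test double that replays a fixed token list and records lifecycle calls. */
final class ListTokenStream implements SimpleTokenStream {
    private final List<String> tokens;
    private Iterator<String> it;
    boolean ended;
    boolean closed;

    ListTokenStream(List<String> tokens) { this.tokens = tokens; }

    @Override public void reset() { it = tokens.iterator(); }
    @Override public String next() { return it.hasNext() ? it.next() : null; }
    @Override public void end() { ended = true; }
    @Override public void close() { closed = true; }
}
```

Deferring the "too many tokens" check until after the stream is closed is the design choice here: the error path and the success path leave the stream in the same state.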
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)