[jira] [Commented] (SOLR-3589) Edismax parser does not honor mm parameter if analyzer splits a token

Jack Krupansky (JIRA) Wed, 15 Aug 2012 14:56:39 -0700

    [ https://issues.apache.org/jira/browse/SOLR-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435558#comment-13435558 ]


Jack Krupansky commented on SOLR-3589:
--------------------------------------

The root problem is that when automatic phrase query generation is turned off 
(the default, and the setting for the text_general field in particular), the 
core Lucene query parser builds a nested query for the tuple of sub-terms and 
combines them with the default query operator, which is "OR" unless changed. 
There is no notion of "mm" (min-match) down at that level: Lucene knows nothing 
about Solr, edismax, or request parameters.
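To make the mechanism concrete, here is a deliberately simplified Python model (not Solr or Lucene code). It assumes that the query is split on whitespace into top-level clauses, that a hypothetical analyzer splits each chunk on hyphens, and that mm is counted only over those top-level clauses, while the sub-terms of a split chunk are combined with the default operator. All function names are illustrative only.

```python
# Simplified, hypothetical model of the behaviour described above; NOT Solr/Lucene code.
# Assumptions: whitespace splits the query into top-level clauses, a hypothetical
# analyzer splits each chunk on '-', and mm is applied only to top-level clauses.


def analyze(chunk):
    """Hypothetical analyzer: lowercase, then split on '-' ('fire-fly' -> ['fire', 'fly'])."""
    return [t for t in chunk.lower().split("-") if t]


def top_level_clauses(q):
    """One clause per whitespace-separated chunk; a split chunk becomes a nested group."""
    return [analyze(chunk) for chunk in q.split()]


def group_matches(sub_terms, doc_terms, default_op):
    """Sub-terms inside a nested group use the default operator; mm never reaches here."""
    if default_op == "AND":
        return all(t in doc_terms for t in sub_terms)
    return any(t in doc_terms for t in sub_terms)


def edismax_like_match(q, doc, mm_percent=100, default_op="OR"):
    """True if enough top-level clauses match to satisfy mm (percentage rounded down)."""
    doc_terms = set(doc.lower().split())
    clauses = top_level_clauses(q)
    required = len(clauses) * mm_percent // 100
    hits = sum(group_matches(c, doc_terms, default_op) for c in clauses)
    return hits >= required


doc = "a fire in the forest"
# "fire-fly" is ONE top-level clause, so mm=100% only requires that one group to match,
# and with the default OR operator "fire" alone is enough.
print(edismax_like_match("fire-fly", doc))
# Typed with a space, "fire fly" gives two top-level clauses, so mm=100% requires both.
print(edismax_like_match("fire fly", doc))
# Setting the default operator to AND makes the nested group require both sub-terms.
print(edismax_like_match("fire-fly", doc, default_op="AND"))
```

The contrast between the first two calls is the bug in a nutshell: the same words produce different top-level clause counts depending on whether the split happens in the user's input or inside the analyzer.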

As things stand, the only option is to set the default query operator, "q.op", 
to "AND".
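A sketch of what such a request might look like, built with Python's `urllib.parse.urlencode`. The host, the core name `collection1` and the `qf` field `text` are placeholders, not values from this issue; `defType`, `q`, `qf`, `mm` and `q.op` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr host and core name; adjust for your own deployment.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def edismax_url(q, qf, mm="100%", q_op="AND"):
    """Build an edismax select URL with q.op=AND as the workaround for split tokens."""
    params = {
        "defType": "edismax",
        "q": q,
        "qf": qf,      # placeholder field list
        "mm": mm,      # still applies to the top-level clauses
        "q.op": q_op,  # makes the analyzer-split sub-terms required
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(edismax_url("fire-fly", "text"))
```

Note that `urlencode` escapes the `%` in `100%` as `%25`, which Solr decodes back to `100%`.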

You can, of course, also turn on autoGeneratePhraseQueries or select an analyzer 
that doesn't split terms.
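Turning on autoGeneratePhraseQueries is not equivalent to setting q.op to AND. The hypothetical sketch below (plain Python, not Lucene code) compares what the nested group for "fire-fly" requires when treated as a phrase, which is what autoGeneratePhraseQueries produces, versus as an AND of its sub-terms. The analyzer output `["fire", "fly"]` is assumed.

```python
# Hypothetical comparison, NOT Lucene code: phrase semantics (adjacent, in order)
# versus AND semantics (each sub-term anywhere in the field).


def phrase_match(sub_terms, doc_tokens):
    """True if sub_terms occur as one contiguous, ordered run in doc_tokens."""
    n = len(sub_terms)
    return any(doc_tokens[i:i + n] == sub_terms for i in range(len(doc_tokens) - n + 1))


def and_match(sub_terms, doc_tokens):
    """True if every sub-term occurs somewhere in doc_tokens."""
    return all(t in doc_tokens for t in sub_terms)


sub_terms = ["fire", "fly"]  # assumed analyzer output for "fire-fly"
for doc in ["the fire fly glowed", "a fly landed near the fire"]:
    tokens = doc.split()
    print(doc, "| phrase:", phrase_match(sub_terms, tokens), "| AND:", and_match(sub_terms, tokens))
```

The phrase form is stricter and is likely closer to the user's intent for a hyphenated word; AND is more permissive and can also match documents where the sub-terms appear unrelated to each other.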

At this point, I would advise resolving this issue as "Won't Fix", although it 
could also be spun off into a Lucene issue to add support for min-match down at 
that level, which edismax could then pass its mm value to.


                
> Edismax parser does not honor mm parameter if analyzer splits a token
> ---------------------------------------------------------------------
>
>                 Key: SOLR-3589
>                 URL: https://issues.apache.org/jira/browse/SOLR-3589
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 3.6
>            Reporter: Tom Burton-West
>
> With edismax mm set to 100%, if one of the tokens is split into two tokens by 
> the analyzer chain (e.g. "fire-fly" => fire fly), the mm parameter is 
> ignored and the equivalent of an OR query, "fire OR fly", is produced.
> This is particularly a problem for languages that do not use white space to 
> separate words, such as Chinese or Japanese.
> See these messages for more discussion:
> http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-hypenated-words-WDF-splitting-etc-tc3991911.html
> http://lucene.472066.n3.nabble.com/edismax-parser-ignores-mm-parameter-when-tokenizer-splits-tokens-i-e-CJK-tc3991438.html
> http://lucene.472066.n3.nabble.com/Why-won-t-dismax-create-multiple-DisjunctionMaxQueries-when-autoGeneratePhraseQueries-is-false-tc3992109.html

--
This message is automatically generated by JIRA.
