[ 
https://issues.apache.org/jira/browse/SOLR-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892291#action_12892291
 ] 

Michael McCandless commented on SOLR-2015:
------------------------------------------

Don't forget that this auto-phrase-gen is buggy: if the user's query
is wi fi, then this will *not* turn into a phrase.

Really, it's QueryParser that's buggy: it should not assume it can
pre-split on whitespace.
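
A rough sketch of the failure, in case it helps. This is a toy model,
not Lucene's QueryParser or any real analyzer: the names analyze and
parse are made up for illustration, and the "analyzer" simply
lowercases and splits on non-alphanumerics.

```python
import re

def analyze(chunk):
    """Toy analyzer (hypothetical): lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", chunk.lower()) if t]

def parse(query, auto_phrase=True):
    """Toy parser (hypothetical): pre-split on whitespace, analyze each chunk alone."""
    clauses = []
    for chunk in query.split():
        tokens = analyze(chunk)
        if auto_phrase and len(tokens) > 1:
            # One whitespace chunk produced several tokens: auto-generate a phrase.
            clauses.append(("phrase", tokens))
        else:
            clauses.extend(("term", [t]) for t in tokens)
    return clauses

print(parse("wi-fi"))  # [('phrase', ['wi', 'fi'])]
print(parse("wi fi"))  # [('term', ['wi']), ('term', ['fi'])]
```

Because the analyzer only ever sees one whitespace-delimited chunk at
a time, it never gets the chance to treat "wi fi" the way it treats
"wi-fi".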

As Robert has pointed out, even if the feature weren't buggy, there's
no evidence auto-phrase-gen actually improves relevance even for
English.

Yet it's most definitely disastrous for non-whitespace languages (CJK,
Thai, etc.).
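
One way to see the problem, as a toy sketch only: assume an analyzer
that emits one token per character (a simplification; analyze_unigram
and parse_cjk are hypothetical names, not Lucene or Solr APIs). A
query with no whitespace is a single chunk, so every multi-character
query becomes a phrase.

```python
def analyze_unigram(chunk):
    """Toy analyzer (hypothetical): every character becomes its own token."""
    return list(chunk)

def parse_cjk(query, auto_phrase):
    """Toy parser (hypothetical): pre-split on whitespace, analyze each chunk alone."""
    clauses = []
    for chunk in query.split():
        tokens = analyze_unigram(chunk)
        if auto_phrase and len(tokens) > 1:
            # The whole unsegmented query is one chunk, so it turns into one phrase.
            clauses.append(("phrase", tokens))
        else:
            clauses.extend(("term", [t]) for t in tokens)
    return clauses

print(parse_cjk("東京大学", auto_phrase=True))
# [('phrase', ['東', '京', '大', '学'])]
print(parse_cjk("東京大学", auto_phrase=False))
# [('term', ['東']), ('term', ['京']), ('term', ['大']), ('term', ['学'])]
```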

This is why, in my opinion, if we must pick a single global default
(for the 'text' field in Solr's example schema.xml), it should be
disabled by default: it's buggy for English and catastrophic for
non-whitespace languages.

To fix this "correctly", we somehow need a better QueryParser/Analyzer
interaction, such that all variants of wifi (WiFi, wifi, wi fi, wi-fi)
are consistently mapped during indexing and searching.  Just adding a
new per-token attr doesn't fix it (the wi fi example, above).

{quote}
I'm not sure what that would accomplish by itself though... it's not like solr 
is much of an out-of-the-box solution for anything.
We have a default example so that people can easily run through the tutorial, 
and execute examples on wiki pages.
{quote}

I suspect many apps take the default solrconfig/schema and run with
it / iteratively tweak it.

bq. Solr doesn't have an installer though... you unzip and "cd example; java 
-jar start.jar".

Maybe we insert a "cp {english,cjk}schema.xml schema.xml" in between
those two steps?  This would avoid the global default, ie, force an
explicit choice.

Or maybe we make separate default fieldTypes in schema.xml
(text_whitespace, text_non_whitespace -- need better names)?

Or, maybe we make this setting take three values: unset, on, off.  It
defaults to unset, but Solr refuses to run with this value, throwing
an exception saying you must set it?
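
A minimal sketch of that three-valued idea, under stated assumptions:
the AutoPhrase enum, the resolve_auto_phrase helper and the error text
are all hypothetical, and this is not how Solr reads its config.

```python
from enum import Enum

class AutoPhrase(Enum):
    """Hypothetical three-valued setting."""
    UNSET = "unset"
    ON = "on"
    OFF = "off"

def resolve_auto_phrase(raw=None):
    """Return True/False for on/off; refuse to proceed if the value is unset."""
    setting = AutoPhrase.UNSET if raw is None else AutoPhrase(raw.strip().lower())
    if setting is AutoPhrase.UNSET:
        raise ValueError(
            "autoGeneratePhraseQueries must be set explicitly to 'on' or 'off'"
        )
    return setting is AutoPhrase.ON

print(resolve_auto_phrase("on"))   # True
print(resolve_auto_phrase("off"))  # False
# resolve_auto_phrase() raises ValueError: the choice has to be explicit.
```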

Something along these lines would let us avoid having to agree on a
global default, ie, make the choice explicit.

This is just like what we did with maxFieldLength a while back.  Previously
it silently truncated after 10K terms, which was a dangerous default.  So we
forced the choice by making it a required param in IndexWriter.  (Later we
changed the default to no truncation and made it no longer required.)


> add a config hook for autoGeneratePhraseQueries
> -----------------------------------------------
>
>                 Key: SOLR-2015
>                 URL: https://issues.apache.org/jira/browse/SOLR-2015
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 3.1, 4.0
>            Reporter: Koji Sekiguchi
>            Assignee: Yonik Seeley
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2015.patch, SOLR-2015.patch, SOLR-2015.patch
>
>
> Now that LUCENE-2458 has been committed, a hook for autoGeneratePhraseQueries
> would be convenient in some situations.


