[jira] Commented: (SOLR-2015) add a config hook for autoGeneratePhraseQueries

Yonik Seeley (JIRA) Mon, 26 Jul 2010 07:12:16 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892314#action_12892314 ]


Yonik Seeley commented on SOLR-2015:
------------------------------------

bq. is wi fi, then this will not turn into a phrase.

Right - but there's just a lack of information that can't be helped?
So while one might want stuff like this as a phrase, I don't think it's a bug 
that it's not.

What *is* a problem though is the lack of ability for the user to add 
additional context to fix the issue (i.e. a SynonymFilter to manually map "wi 
fi" wouldn't work, since it would get "wi" and then "fi" in separate runs).
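
To illustrate the "separate runs" point, here is a minimal Python sketch. It is 
a toy model, not Solr or Lucene code: apply_synonyms, parse_query and the 
SYNONYMS table are hypothetical names, and it assumes a query parser that 
splits on whitespace before handing each chunk to the analyzer.

```python
# Toy model: a multi-word synonym rule only fires if the analyzer
# sees both words in the same run of text.
SYNONYMS = {("wi", "fi"): ["wifi"]}  # hypothetical rule: "wi fi" -> "wifi"


def apply_synonyms(tokens):
    """Replace any token sequence that matches a rule key with its mapping."""
    out, i = [], 0
    while i < len(tokens):
        for key, replacement in SYNONYMS.items():
            if tuple(tokens[i:i + len(key)]) == key:
                out.extend(replacement)
                i += len(key)
                break
        else:
            # No rule matched at this position: keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


def parse_query(query, split_first):
    """Analyze a query either chunk by chunk or as one run."""
    if split_first:
        # The parser splits on whitespace, then analyzes each chunk alone,
        # so the rule for ("wi", "fi") never sees both words together.
        result = []
        for chunk in query.split():
            result.extend(apply_synonyms([chunk]))
        return result
    # The analyzer sees the whole string at once.
    return apply_synonyms(query.split())
```

With split_first=True the query "wi fi" stays as two tokens; with 
split_first=False the rule can fire and produce "wifi".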

Another problem is that if the original doc contained "wifi" then a 
query of "wi-fi" won't match (since it queries for "wi fi").  We work around 
this today (for people that really need it) by indexing a second field that 
catenates instead of splits the parts of a split token.  It's certainly not 
ideal, but people tend to be happy with the cases we can match.
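
A rough Python sketch of that two-field workaround follows. It is a toy model 
under stated assumptions, not the actual Solr analysis chain: the field names 
text_split and text_cat and all function names are made up for illustration, 
and "parts" are simply runs of ASCII letters and digits.

```python
import re

PART = re.compile(r"[A-Za-z0-9]+")


def split_parts(text):
    """Split on anything non-alphanumeric: "wi-fi" -> ["wi", "fi"]."""
    return [p.lower() for p in PART.findall(text)]


def catenate_parts(text):
    """Join the parts of each whitespace chunk: "wi-fi" -> ["wifi"]."""
    joined = ("".join(PART.findall(chunk)).lower() for chunk in text.split())
    return [tok for tok in joined if tok]


def index_doc(text):
    """Index the same text into a splitting field and a catenating field."""
    return {
        "text_split": set(split_parts(text)),
        "text_cat": set(catenate_parts(text)),
    }


def matches(doc, query):
    """A doc matches if every query token is found in either field."""
    split_hit = all(t in doc["text_split"] for t in split_parts(query))
    cat_hit = all(t in doc["text_cat"] for t in catenate_parts(query))
    return split_hit or cat_hit
```

In this model a doc containing "wifi" matches the query "wi-fi" through the 
catenated field, while a doc containing "wi fi" still does not match the query 
"wifi", which is the kind of case that remains unmatched.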

So while our current system is far from perfect (and we should work on 
improving it), the problem is not that we have an incorrect solution, but an 
incomplete solution.
Let's assume we had a QP that didn't split on whitespace (or whatever our 
optimal solution is).
IMO, I would still want tokens joined by a dash to form a phrase query, just 
like tokens surrounded by quotes.
It's important information and shouldn't be discarded.
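
As a sketch of what "tokens joined by a dash form a phrase query" means, here 
is a small Python model. It is not Lucene's query parser: build_query and its 
auto_phrase flag are hypothetical stand-ins for the behavior that 
autoGeneratePhraseQueries would toggle, and chunks are split on 
non-alphanumeric characters for simplicity.

```python
import re

PART = re.compile(r"[A-Za-z0-9]+")


def build_query(query, auto_phrase=True):
    """Turn a query string into a list of (kind, tokens) clauses.

    A whitespace-delimited chunk that analyzes to several tokens becomes
    one phrase clause when auto_phrase is on, and separate term clauses
    when it is off.
    """
    clauses = []
    for chunk in query.split():
        parts = [p.lower() for p in PART.findall(chunk)]
        if not parts:
            continue  # chunk was pure punctuation
        if len(parts) > 1 and auto_phrase:
            clauses.append(("phrase", parts))
        else:
            clauses.extend(("term", [p]) for p in parts)
    return clauses
```

With auto_phrase=True the dash in "wi-fi" is kept as adjacency information; 
with auto_phrase=False it is discarded and "wi" and "fi" become independent 
terms.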

bq.  there's no evidence auto-phrase-gen actually improves relevance even for 
English.

IMO, it's a case of "the customer is always right".   Many people have asked 
how to do this sort of matching over the years and I think there is plenty of 
evidence that it increases relevancy.

bq. Maybe we insert a "cp {english,cjk}schema.xml schema.xml" in between those 
two steps? This would avoid the global default, ie, force an explicit choice.

And the tutorial that's in English would tell them to copy the English one... 
that only hurts English speakers and doesn't help anyone else.
We can have different text field types in a single schema - it's just a matter 
of adding another one that's good for non-whitespace-delimited languages?


> add a config hook for autoGeneratePhraseQueries
> -----------------------------------------------
>
>                 Key: SOLR-2015
>                 URL: https://issues.apache.org/jira/browse/SOLR-2015
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 3.1, 4.0
>            Reporter: Koji Sekiguchi
>            Assignee: Yonik Seeley
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2015.patch, SOLR-2015.patch, SOLR-2015.patch
>
>
> Now that LUCENE-2458 has been committed, a hook for autoGeneratePhraseQueries 
> would be convenient in some situations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


