[
https://issues.apache.org/jira/browse/SOLR-2015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892314#action_12892314
]
Yonik Seeley commented on SOLR-2015:
------------------------------------
bq. is wi fi, then this will not turn into a phrase.
Right - but there's just a lack of information that can't be helped?
So while one might want stuff like this as a phrase, I don't think it's a bug
that it's not.
What *is* a problem, though, is that the user has no way to add additional
context to fix the issue (e.g. a SynonymFilter to manually map "wi fi"
wouldn't work, since it would get "wi" and then "fi" in separate runs).
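To make the "separate runs" point concrete, here is a toy Python sketch. It is not Solr code: the SYNONYMS table and all function names are hypothetical, and the analyzer is reduced to lowercasing plus a two-word synonym lookup. It only shows why a rule for "wi fi" can never fire when the query parser splits on whitespace before analysis.

```python
# Toy model, not Solr code: a query parser that splits on whitespace
# hands each chunk to the analyzer in a separate run.
SYNONYMS = {("wi", "fi"): "wifi"}  # hypothetical multi-word synonym rule


def apply_synonyms(tokens):
    """Replace any adjacent token pair found in SYNONYMS with its mapping."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def analyze_whole(text):
    """The analyzer sees the full text in one run, so the pair is visible."""
    return apply_synonyms(text.lower().split())


def analyze_per_chunk(query):
    """Whitespace-splitting parser: one analyzer run per chunk."""
    result = []
    for chunk in query.split():
        result.extend(apply_synonyms(chunk.lower().split()))
    return result
```

With this model, analyze_whole("wi fi") sees both words together and can apply the rule, while analyze_per_chunk("wi fi") only ever sees one word at a time.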
Another problem is that if the original doc contained "wifi", then a query of
"wi-fi" won't match (since it queries for "wi fi"). We work around this today
(for people that really need it) by indexing a second field that catenates
the parts of a split token instead of splitting them. It's certainly not
ideal, but people tend to be happy with the cases we can match.
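The two-field workaround can be sketched roughly as follows. This is toy Python under stated assumptions, not the actual analysis chain: the field names text_split and text_cat are made up, and "splitting" here just means breaking on any non-alphanumeric character.

```python
import re


def split_parts(token):
    """Split on non-alphanumerics: 'wi-fi' -> ['wi', 'fi']."""
    return [p for p in re.split(r"[^0-9a-z]+", token.lower()) if p]


def catenate_parts(token):
    """Join the parts instead of splitting: 'wi-fi' -> ['wifi']."""
    parts = split_parts(token)
    return ["".join(parts)] if parts else []


def index_doc(text):
    """Index the same text into two hypothetical fields."""
    words = text.split()
    return {
        "text_split": [t for w in words for t in split_parts(w)],
        "text_cat": [t for w in words for t in catenate_parts(w)],
    }


def contains_run(tokens, run):
    """True if `run` occurs as a consecutive sub-sequence of `tokens`."""
    n = len(run)
    return n > 0 and any(
        tokens[i:i + n] == run for i in range(len(tokens) - n + 1)
    )


def matches(doc, query_word):
    """Match if either the split field or the catenated field hits."""
    return (contains_run(doc["text_split"], split_parts(query_word))
            or contains_run(doc["text_cat"], catenate_parts(query_word)))
```

For a doc indexed from "my wifi router", the split field alone does not contain "wi" followed by "fi", but the catenated field contains "wifi", so a query of "wi-fi" matches through the second field.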
So while our current system is far from perfect (and we should work on
improving it), the problem is not that we have an incorrect solution, but an
incomplete one.
Let's assume we had a QP that didn't split on whitespace (or whatever our
optimal solution is).
IMO, I would still want tokens joined by a dash to form a phrase query, just
like tokens surrounded by quotes.
It's important information and shouldn't be discarded.
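As a rough illustration of the behaviour being argued for, here is a toy Python sketch. It is not the query parser itself: build_query and the tuple shapes it returns are hypothetical, and only the flag name mirrors autoGeneratePhraseQueries from the issue title. A single chunk that analyzes to several tokens becomes a phrase when the flag is on, and a set of separate terms with no position constraint when it is off.

```python
import re


def split_parts(chunk):
    """Toy analyzer: split one query chunk on non-alphanumerics."""
    return [p for p in re.split(r"[^0-9a-z]+", chunk.lower()) if p]


def build_query(chunk, auto_generate_phrase_queries=True):
    """Build a toy query for one whitespace-delimited chunk.

    Returns a (kind, terms) tuple, where kind is one of:
      "term"   - the chunk produced at most one token
      "phrase" - several tokens that must appear adjacent and in order
      "terms"  - several tokens with no position constraint
    """
    parts = split_parts(chunk)
    if len(parts) <= 1:
        return ("term", parts)
    if auto_generate_phrase_queries:
        return ("phrase", parts)
    return ("terms", parts)
```

Here build_query("wi-fi") treats the dash like a pair of quotes around "wi fi", while passing auto_generate_phrase_queries=False discards the adjacency information.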
bq. there's no evidence auto-phrase-gen actually improves relevance even for
English.
IMO, it's a case of "the customer is always right". Many people have asked
how to do this sort of matching over the years and I think there is plenty of
evidence that it increases relevancy.
bq. Maybe we insert a "cp {english,cjk}schema.xml schema.xml" in between those
two steps? This would avoid the global default, ie, force an explicit choice.
And the tutorial that's in English would tell them to copy the English one...
that only hurts English speakers and doesn't help anyone else.
We can have different text field types in a single schema - it's just a matter
of adding another one that's good for non-whitespace delimited languages?
> add a config hook for autoGeneratePhraseQueries
> -----------------------------------------------
>
> Key: SOLR-2015
> URL: https://issues.apache.org/jira/browse/SOLR-2015
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 3.1, 4.0
> Reporter: Koji Sekiguchi
> Assignee: Yonik Seeley
> Priority: Blocker
> Fix For: 3.1, 4.0
>
> Attachments: SOLR-2015.patch, SOLR-2015.patch, SOLR-2015.patch
>
>
> After LUCENE-2458 was committed, a hook for autoGeneratePhraseQueries would
> be convenient in some situations.