Yonik,
Could you please revert your commit until we've reached some
consensus in this discussion?
Maybe post alternative patches on the issue (SOLR-2519), and we can
iterate there?
Adding a new example field type ("text_nwd") is one way to go, and I
agree it is the least risk/effort, a "quick fix", but I don't think we
should use a quick fix here.
I think it's important for Solr to have good out-of-the-box defaults
for all languages, like ElasticSearch, even if that means we have to
do some extra work now (i.e., fixing up the wiki/tutorials) to make that
change.
More below:
On Sun, May 15, 2011 at 12:20 PM, Yonik Seeley
<[email protected]> wrote:
> As far as Solr defaults... perhaps way way back "text" should have
> been named "text_en".
> But any changes now should be comprehensive (we need to consider
> impacts to the example
> data, the example schema, the solr tutorial which relies on some of
> the current behavior, and a ton of documentation
> on the wiki related to both analysis components (multi-word synonyms,
> WDF, etc) and other quickstart guides).
>
> Anyway, changes to the example schema (or the behavior of the example
> schema) can have a large impact.
I agree: we need to fix the wiki pages/examples that rely on
auto-phrase.
But, really, how much work is this? Can you point to an example or
two in the wiki/tutorial that "advertise"/rely on auto-phrase? This
would help me get a sense of how much additional work I'm signing up
for ;)
I just went through the tutorial and didn't see one...
(Also, we should add some CJK docs and queries to the tutorial... a
simple pair is the test case in my patch on SOLR-2519.)
We shouldn't avoid/fear good changes to our defaults just because
making them will be more work, especially if someone (me!) is signing
up to do that work...
> I personally think that adding a new field is much easier and less
> disruptive, and given the potential impact
I agree the quick fix is somewhat easier than doing it right, but I
think in this case we should do it right. Solr really should just
work well out-of-the-box on all (including non-whitespace) languages.
> we should hear what others have to say about it too
+1
Mike
http://blog.mikemccandless.com