Cassandra Targett created SOLR-10252:
----------------------------------------

             Summary: Example spellcheck config uses _text_ as default field
                 Key: SOLR-10252
                 URL: https://issues.apache.org/jira/browse/SOLR-10252
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: spellchecker
    Affects Versions: 6.4.2
            Reporter: Cassandra Targett


SOLR-8381 made the {{_text_}} field the default field for spellchecking in the 
basic_configs and data_driven_schema_configs example configsets. This field is 
a copyField destination that gets all its data from every other field in the index.

This field is also of the text_general type, whose default analysis chain 
includes stopwords and synonyms. If someone has a large synonym list, perhaps 
with many overlapping entries, every synonym a query term expands into would 
also be spellchecked. I recently saw a parsed query that looked like this:

{code}"+(((_text_:partn _text_:gesellschaft _text_:teilhab _text_:konkubinat 
_text_:eheahn _text_:eheahn _text_:konkubinatspaar _text_:konkubinatspartn 
_text_:konkubinatsvertrag _text_:lebenspartn _text_:nichteheahn 
_text_:nichteheahn _text_:nichtehe _text_:wild _text_:registriert 
_text_:eingetrag _text_:eingetrag _text_:registriert _text_:vertragspartei 
_text_:kontrahent _text_:partei _text_:vertragspartn)/no_coord) 
((_text_:gemeinschaft _text_:lebensgemeinschaft _text_:gemeinschaft 
_text_:lebensgemeinschaft _text_:lebensgemeinschaft _text_:ehe 
_text_:partnerschaft _text_:partnerschaft _text_:partn 
_text_:partnerschaft)/no_coord) _text_:gleichgeschlecht _text_:paar) 
+_text_:gestorb"
{code}

Since we recommend a lightly analyzed field for spell checking, starting people 
out with {{_text_}} and text_general is a problematic example. The query above 
means a lot of extra spellcheck work for little benefit.
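To put a rough number on that extra work, here is a small Python sketch that counts the {{_text_}} clauses in the parsed query quoted in this issue and how many of them are distinct terms. It assumes, as an approximation, that each clause corresponds to one term that ends up being spellchecked; the helper name field_terms is hypothetical and not part of Solr.

```python
import re
from collections import Counter

# The parsed query quoted in this issue, copied verbatim (outer quotes dropped).
PARSED_QUERY = (
    "+(((_text_:partn _text_:gesellschaft _text_:teilhab _text_:konkubinat "
    "_text_:eheahn _text_:eheahn _text_:konkubinatspaar _text_:konkubinatspartn "
    "_text_:konkubinatsvertrag _text_:lebenspartn _text_:nichteheahn "
    "_text_:nichteheahn _text_:nichtehe _text_:wild _text_:registriert "
    "_text_:eingetrag _text_:eingetrag _text_:registriert _text_:vertragspartei "
    "_text_:kontrahent _text_:partei _text_:vertragspartn)/no_coord) "
    "((_text_:gemeinschaft _text_:lebensgemeinschaft _text_:gemeinschaft "
    "_text_:lebensgemeinschaft _text_:lebensgemeinschaft _text_:ehe "
    "_text_:partnerschaft _text_:partnerschaft _text_:partn "
    "_text_:partnerschaft)/no_coord) _text_:gleichgeschlecht _text_:paar) "
    "+_text_:gestorb"
)


def field_terms(parsed: str, field: str = "_text_") -> list[str]:
    """Hypothetical helper: return every term written as `field:term`."""
    pattern = re.escape(field) + r":(\w+)"
    return re.findall(pattern, parsed)


terms = field_terms(PARSED_QUERY)
counts = Counter(terms)

# prints: 35 clauses, 25 distinct terms
print(len(terms), "clauses,", len(counts), "distinct terms")
# Terms that the synonym expansion produced more than once.
print("repeated:", sorted(t for t, n in counts.items() if n > 1))
```

With this query, 10 of the 35 clauses repeat a term already seen, on top of the expansion itself.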

I'm not sure what a better field would be: those two configsets are minimal by 
design, and we can't know in advance which fields a user's index will have, so 
the default has to work out of the box. Perhaps we could at least consider a 
better field type?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
