Cassandra Targett created SOLR-10252:
----------------------------------------
Summary: Example spellcheck config uses _text_ as default field
Key: SOLR-10252
URL: https://issues.apache.org/jira/browse/SOLR-10252
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: spellchecker
Affects Versions: 6.4.2
Reporter: Cassandra Targett
SOLR-8381 made the {{_text_}} field the default field for spellchecking for the
basic_configs and data_driven_schema_configs example configsets. This is a
copyField that gets all its data from every other field in the index.
This field is also of text_general type, which has a default analysis chain
that includes stopwords and synonyms. If someone has a large synonym list,
perhaps with a lot of overlapping matches, this would cause spell checking to
occur on every one of those terms. I recently saw a parsed query that looked
like this:
{code}"+(((_text_:partn _text_:gesellschaft _text_:teilhab _text_:konkubinat
_text_:eheahn _text_:eheahn _text_:konkubinatspaar _text_:konkubinatspartn
_text_:konkubinatsvertrag _text_:lebenspartn _text_:nichteheahn
_text_:nichteheahn _text_:nichtehe _text_:wild _text_:registriert
_text_:eingetrag _text_:eingetrag _text_:registriert _text_:vertragspartei
_text_:kontrahent _text_:partei _text_:vertragspartn)/no_coord)
((_text_:gemeinschaft _text_:lebensgemeinschaft _text_:gemeinschaft
_text_:lebensgemeinschaft _text_:lebensgemeinschaft _text_:ehe
_text_:partnerschaft _text_:partnerschaft _text_:partn
_text_:partnerschaft)/no_coord) _text_:gleichgeschlecht _text_:paar)
+_text_:gestorb"
{code}
Since we recommend that users use a lightly analyzed field for spell checking,
{{_text_}} with text_general seems a problematic example to start people out
with. The query above shows the cost: every expanded synonym term gets
spell-checked, which is a lot of extra work for little benefit.
I'm not sure what a better field is - those two examples are minimal by design,
and we can't be sure what field they might have in the index to make it work
out of the box. However, perhaps we can consider a better field type?
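One possible direction, sketched here under assumptions: a field type that only tokenizes and lowercases, and a spellchecker pointed at a field of that type. The names {{text_spell}} and {{spell}} are hypothetical and are not part of either configset. Which source fields would feed {{spell}} is exactly the open question above, so no copyField is shown.
{code:xml}
<!-- managed-schema: hypothetical lightly analyzed type (no stopwords, synonyms, or stemming) -->
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<!-- hypothetical field; how it gets populated is left open -->
<field name="spell" type="text_spell" indexed="true" stored="false" multiValued="true"/>

<!-- solrconfig.xml: spellcheck component reading from the field above instead of _text_ -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
{code}
With an analysis chain like this, each query token should map to a single spellcheck term rather than to the expanded synonym set shown in the parsed query.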