On 6/12/23 21:06, gnandre wrote:
> Hi,
>
> I am using Solr 9.2.1 (official docker image).
>
> When I try to index a document, I get the error shown at the bottom of this
> email.

I added your fieldType to a config, added a field using that type, uploaded it to ZK, created a collection with that config, and then indexed a document that included that new field.

It worked without issues.  My version info (built from source, branch_9x):

solr-spec
9.3.0
solr-impl
9.3.0-SNAPSHOT 555cb35480ec34caca04903e440a2c7b336346ad [snapshot build, details omitted]
lucene-spec
9.5.0
lucene-impl
9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59

FYI: You should not use a stopword filter. In days long past, stopword removal caused a SIGNIFICANT increase in search performance. But it came at a high price ... certain queries do not work well when stopwords are removed. The classic example of a query that stopword removal breaks is "to be or not to be". But I have a more relevant one for recent times: "the who".

These days, system capacities are a lot better than they were in those days, so stopword removal does not offer as much of a performance boost. Most people who are familiar with search technology feel that the reduction in query correctness is not worth the performance gain that has steadily dwindled over the years.

Your stopword list is particularly long. Any query using any of the words in that list will NOT function correctly ... and with the long list you've got, that's a LOT of words that won't work.
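
To make that concrete, here is a rough sketch in Python of what a stopword filter does to a query. This is not Solr code, and the stopword set below is a made-up short list for illustration, not your actual list. The analyze function is also hypothetical: it just lowercases, splits on whitespace, and drops stopwords.

```python
# Hypothetical stopword list for illustration only; not the poster's list.
STOPWORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to", "who"}


def analyze(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, and drop any token that is a stopword."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


if __name__ == "__main__":
    for query in ("to be or not to be", "the who", "the who tour dates"):
        # Show which tokens are left to search on after stopword removal.
        print(repr(query), "->", analyze(query, STOPWORDS))
```

With a list like that, the first two queries end up with no tokens at all, and the third keeps only "tour" and "dates", so the band name has no effect on what matches.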

What is this text "development-environment-solr-9-1" that is scattered throughout the error message you pasted? I have never seen anything like that before.

I am not familiar with creating a schema using values like:

<filter name="lowercase"/>

So I do not know how to spot problems with that kind of schema. This is what a complex fieldType looks like in my schema:

<fieldType name="text" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.ICUTokenizerFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.ICUTokenizerFactory"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
      <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1" generateWordParts="1" splitOnNumerics="1" catenateAll="1" catenateWords="1"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

It's worth noting that my synonym list is the one that comes with Solr. I only left the synonym filter there in case some future version of me decides I want to use a synonym.

#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterGraphFilter coming
#after us won't split it into two words.

# Synonym mappings can be used for spelling correction too
pixima => pixma
#-----------------------------------------------------------------------

Thanks,
Shawn
