Re: java.lang.NullPointerException: stopWords - solr 9.2.1

Shawn Heisey Mon, 12 Jun 2023 21:33:12 -0700

On 6/12/23 21:06, gnandre wrote:

Hi,


I am using Solr 9.2.1 (official docker image).

When I try to index a document, I get the error shown at the bottom of this
email.

I added your fieldType to a config, added a field using that type, uploaded it to ZK, created a collection with that config, and then indexed a document that included that new field.


It worked without issues.  My version info (built from source, branch_9x):

solr-spec    9.3.0
solr-impl    9.3.0-SNAPSHOT 555cb35480ec34caca04903e440a2c7b336346ad [snapshot build, details omitted]
lucene-spec  9.5.0
lucene-impl  9.5.0 13803aa6ea7fee91f798cfeded4296182ac43a21 - 2023-01-25 16:44:59
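
For anyone repeating the test described above: besides the fieldType itself, it needs a field that uses that type and a document that populates the field. A minimal sketch in Solr's schema and XML update formats follows. The names `test_field` and `text_custom` are hypothetical placeholders for whatever the config under test actually uses, and the sketch assumes the schema's uniqueKey is `id`.

```xml
<!-- In the schema: a field that uses the fieldType under test.
     "test_field" and "text_custom" are hypothetical names. -->
<field name="test_field" type="text_custom" indexed="true" stored="true"/>

<!-- A separate XML update message (sent to the collection's /update handler)
     that indexes one document containing the new field.
     Assumes the uniqueKey field is "id". -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="test_field">Some text to run through the analyzer</field>
  </doc>
</add>
```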

An FYI: You should not use a stopword filter. In days long past, stopword removal caused a SIGNIFICANT increase in search performance. But it came at a high price ... certain queries do not work well when stopwords are removed. The classic example of a query that stopword removal breaks is "to be or not to be": every word in it is on a typical English stopword list, so nothing is left to search for. But I have a relevant one for more recent times: "the who".

These days, system capacities are a lot better than they were in those days, so stopword removal does not offer as much of a performance boost. Most people who are familiar with search technology feel that the reduction in query correctness is not worth the performance gain that has steadily dwindled over the years.

Your stopword list is particularly long. Any query that depends on any of the words in that list will NOT function correctly, because those words are discarded during analysis ... and with the long list you've got, that's a LOT of words that won't work.
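
Following this advice is a one-line change per analyzer: delete the stop filter. A minimal sketch, assuming an analyzer written in the `name=` style the original config appears to use, and assuming the stop filter reads a file called `stopwords.txt` (the original fieldType is not reproduced in this thread, so the other components shown are hypothetical):

```xml
<!-- Hypothetical analyzer; only the stop filter line matters here.
     An <analyzer> with no "type" attribute is used at both index and query time. -->
<analyzer>
  <tokenizer name="standard"/>
  <filter name="lowercase"/>
  <!-- Delete this line to stop discarding stopwords: -->
  <filter name="stop" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
```

Because index-time analysis changes, documents that are already indexed have to be reindexed before the stopwords become searchable.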

What is this text "development-environment-solr-9-1" that is scattered throughout the error message you pasted? I have never seen anything like that before.


I am not familiar with creating a schema using values like:

<filter name="lowercase"/>

So I do not know how to spot problems with that kind of schema. This is what a complex fieldType looks like in my schema:

<fieldType name="text" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1"
            generateWordParts="1" splitOnNumerics="1" catenateAll="1"
            catenateWords="1"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.WordDelimiterGraphFilterFactory"
            catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="1"
            generateWordParts="1" splitOnNumerics="1" catenateAll="1"
            catenateWords="1"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
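
For comparison with the `<filter name="lowercase"/>` style quoted earlier: Solr 9 lets an analysis component be declared either by factory class, as in the fieldType above, or by its SPI name. Within each group below, the three lines are intended to be equivalent, pair by pair. This is an illustrative sketch, not taken from either config in this thread; the `stopwords.txt` file name is an assumption.

```xml
<!-- Declared by factory class (the style used in the fieldType above) -->
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

<!-- The same three components declared by SPI name -->
<tokenizer name="standard"/>
<filter name="lowercase"/>
<filter name="stop" ignoreCase="true" words="stopwords.txt"/>
```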

It's worth noting that my synonym list is the one that comes with Solr. I only left the synonym filter there in case some future version of me decides I want to use synonyms.


#-----------------------------------------------------------------------
#some test synonym mappings unlikely to appear in real input text
aaafoo => aaabar
bbbfoo => bbbfoo bbbbar
cccfoo => cccbar cccbaz
fooaaa,baraaa,bazaaa

# Some synonym groups specific to this example
GB,gib,gigabyte,gigabytes
MB,mib,megabyte,megabytes
Television, Televisions, TV, TVs
#notice we use "gib" instead of "GiB" so any WordDelimiterGraphFilter coming
#after us won't split it into two words.

# Synonym mappings can be used for spelling correction too
pixima => pixma
#-----------------------------------------------------------------------

Thanks,
Shawn
