Bug in SynonymGraphFilterFactory

Chunyoku Takahashi Thu, 24 Jul 2025 06:27:19 -0700

Hi,

I am writing to confirm a bug that I believe I am facing in Solr 9.8.0 with Lucene 9.11.1 related to the SynonymGraphFilterFactory. When trying to configure my SynonymGraphFilterFactory with a JapaneseTokenizerFactory, it seems the SynonymGraphFilterFactory is not able to properly use the tokenizerFactory.userDictionary that I specified in the arguments. From my understanding, it seems to be a similar issue to the one mentioned in this bug ticket, https://issues.apache.org/jira/browse/SOLR-13861, which uses the SimplePatternTokenizerFactory instead. The fieldType definition is included at the bottom of the email.

When I attached a debugger to my local Solr instance and added a breakpoint to the SynonymGraphFilterFactory's inform() method (https://github.com/apache/lucene/blob/releases/lucene/9.11.1/lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.java#L135), it ran twice. In the first execution, the JapaneseTokenizerFactory looked as expected, with the userDictionary and the mode matching my config. However, when it stopped again at the same breakpoint, the arguments were all gone and the JapaneseTokenizerFactory was using the default values (userDictionary was null and mode was set to "SEARCH").
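As an illustration only, the sketch below shows one general way a symptom like this can arise: a factory constructor that removes entries from a shared argument map as it reads them, so a second construction from the same map sees nothing and falls back to defaults. It is plain Java with hypothetical names (ConsumedArgsSketch, FakeTokenizerFactory) and does not use or reproduce Lucene's code. Whether this is the actual cause of the behavior described above is unconfirmed.

```java
import java.util.HashMap;
import java.util.Map;

public class ConsumedArgsSketch {

    /** Hypothetical stand-in for a tokenizer factory; NOT the real JapaneseTokenizerFactory. */
    static final class FakeTokenizerFactory {
        final String mode;
        final String userDictionary;

        FakeTokenizerFactory(Map<String, String> args) {
            // remove() consumes each entry, so the caller's map shrinks as it is read.
            String configuredMode = args.remove("mode");
            // Assumed default for this sketch, chosen to mirror the value seen in the debugger.
            this.mode = (configuredMode != null) ? configuredMode : "SEARCH";
            this.userDictionary = args.remove("userDictionary");
        }

        @Override
        public String toString() {
            return "mode=" + mode + ", userDictionary=" + userDictionary;
        }
    }

    public static void main(String[] unused) {
        // One mutable map shared across two constructions, like two calls to the same setup method.
        Map<String, String> tokArgs = new HashMap<>();
        tokArgs.put("mode", "normal");
        tokArgs.put("userDictionary", "lang/userdict_ja_2.txt");

        FakeTokenizerFactory first = new FakeTokenizerFactory(tokArgs);
        FakeTokenizerFactory second = new FakeTokenizerFactory(tokArgs);

        System.out.println("first:  " + first);   // mode=normal, userDictionary=lang/userdict_ja_2.txt
        System.out.println("second: " + second);  // mode=SEARCH, userDictionary=null
    }
}
```

In this toy version, passing a copy of the map (for example `new HashMap<>(tokArgs)`) to each construction keeps the second instance configured the same as the first.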

Please let me know if you would like more details and if I should create a new ticket for this issue.


Thank you,

Chunyoku Takahashi

P.S.

Here is the fieldType definition:

```

<fieldType name="text_ja_ma" class="solr.TextField" positionIncrementGap="100">

   <analyzer type="query">
      <tokenizer class="solr.JapaneseTokenizerFactory"
        mode="normal"
        discardPunctuation="true"
        userDictionary="lang/userdict_ja_1.txt"/>

      <filter class="solr.SynonymGraphFilterFactory"
        synonyms="lang/synonyms_ja_1.txt"
        expand="true"
        ignoreCase="true"
        tokenizerFactory="solr.JapaneseTokenizerFactory"
        tokenizerFactory.mode="normal"
        tokenizerFactory.userDictionary="lang/userdict_ja_2.txt"
        tokenizerFactory.userDictionaryEncoding="UTF-8"
        />
      <filter class="solr.JapaneseBaseFormFilterFactory"/>

      <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt"/>

      <filter class="solr.CJKWidthFilterFactory"/>

      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/>
      <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>

      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
```

The userdict_ja_2.txt contains:

```
コールセンター,コールセンター,コールセンター,カスタム名詞
予約センター,予約センター,予約センター,カスタム名詞
スカイスイート767,スカイスイート767,スカイスイート767,カスタム名詞

シーマン,シーマン,シーマン,カスタム名詞
```

The synonyms_ja_1.txt contains:

```
コールセンター,予約センター
コルセンタ,予約センタ
```
