Hi,
I am writing to confirm a bug that I believe I have hit in Solr 9.8.0
with Lucene 9.11.1, related to the SynonymGraphFilterFactory. When I
configure a SynonymGraphFilterFactory with a JapaneseTokenizerFactory as
its tokenizerFactory, the SynonymGraphFilterFactory does not seem to use
the tokenizerFactory.userDictionary that I specified in its arguments.
As far as I can tell, this is similar to the issue reported in
https://issues.apache.org/jira/browse/SOLR-13861, which involves the
SimplePatternTokenizerFactory instead. The fieldType definition is
included at the bottom of this email.
When I attached a debugger to my local Solr instance and set a
breakpoint in the SynonymGraphFilterFactory's inform() method
https://github.com/apache/lucene/blob/releases/lucene/9.11.1/lucene/analysis/common/src/java/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.java#L135,
the breakpoint was hit twice. On the first hit, the
JapaneseTokenizerFactory looked as expected, with the userDictionary and
the mode matching my config. On the second hit, however, the arguments
were all gone and the JapaneseTokenizerFactory was using the default
values (userDictionary was null and mode was set to "SEARCH").
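To make the symptom concrete, here is a minimal plain-Java sketch of one
pattern that would produce the same behavior. It does not use any Lucene
or Solr classes: FakeTokenizerFactory, FakeSynonymFactory and
ArgsConsumedDemo are hypothetical stand-ins, and the idea that the
tokenizer factory consumes a shared argument map is only an assumption
about the cause, not a confirmed diagnosis of the Lucene code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a tokenizer factory; NOT a Lucene class. */
final class FakeTokenizerFactory {
    final String mode;
    final String userDictionary;

    // Assumption: the constructor consumes its arguments by removing
    // them from the map it is handed, falling back to defaults.
    FakeTokenizerFactory(Map<String, String> args) {
        String m = args.remove("mode");
        this.mode = (m == null) ? "SEARCH" : m.toUpperCase(Locale.ROOT);
        this.userDictionary = args.remove("userDictionary");
    }
}

/** Hypothetical stand-in for the synonym filter factory; NOT a Lucene class. */
final class FakeSynonymFactory {
    private final Map<String, String> tokArgs = new HashMap<>();
    FakeTokenizerFactory lastTokenizerFactory;

    // Collect every "tokenizerFactory.*" argument once, at construction time.
    FakeSynonymFactory(Map<String, String> args) {
        String prefix = "tokenizerFactory.";
        for (Map.Entry<String, String> e : args.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                tokArgs.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
    }

    // Each call hands the SAME map to a new tokenizer factory, so the
    // first call empties it and later calls only see defaults.
    void inform() {
        lastTokenizerFactory = new FakeTokenizerFactory(tokArgs);
    }
}

final class ArgsConsumedDemo {
    public static void main(String[] unused) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("tokenizerFactory.mode", "normal");
        cfg.put("tokenizerFactory.userDictionary", "lang/userdict_ja_2.txt");

        FakeSynonymFactory factory = new FakeSynonymFactory(cfg);
        for (int i = 1; i <= 2; i++) {
            factory.inform();
            System.out.println("inform() #" + i
                + ": mode=" + factory.lastTokenizerFactory.mode
                + ", userDictionary=" + factory.lastTokenizerFactory.userDictionary);
        }
    }
}
```

Running ArgsConsumedDemo prints mode=NORMAL with the configured
userDictionary for the first inform() call, and mode=SEARCH with
userDictionary=null for the second, which is the same pattern the
debugger showed. In this sketch, passing a copy such as
new HashMap<>(tokArgs) to the tokenizer factory makes both calls see the
configured values.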
Please let me know if you would like more details, and whether I should
create a new ticket for this issue.
Thank you,
Chunyoku Takahashi
P.S.
Here is the fieldType definition:
```
<fieldType name="text_ja_ma" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.JapaneseTokenizerFactory"
               mode="normal"
               discardPunctuation="true"
               userDictionary="lang/userdict_ja_1.txt"/>
    <filter class="solr.SynonymGraphFilterFactory"
            synonyms="lang/synonyms_ja_1.txt"
            expand="true"
            ignoreCase="true"
            tokenizerFactory="solr.JapaneseTokenizerFactory"
            tokenizerFactory.mode="normal"
            tokenizerFactory.userDictionary="lang/userdict_ja_2.txt"
            tokenizerFactory.userDictionaryEncoding="UTF-8"/>
    <filter class="solr.JapaneseBaseFormFilterFactory"/>
    <filter class="solr.JapanesePartOfSpeechStopFilterFactory"
            tags="lang/stoptags_ja.txt"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_ja.txt"/>
    <filter class="solr.JapaneseKatakanaStemFilterFactory"
            minimumLength="4"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```
The userdict_ja_2.txt contains:
```
コールセンター,コールセンター,コールセンター,カスタム名詞
予約センター,予約センター,予約センター,カスタム名詞
スカイスイート767,スカイスイート767,スカイスイート767,カスタム名詞
シーマン,シーマン,シーマン,カスタム名詞
```
The synonyms_ja_1.txt contains:
```
コールセンター,予約センター
コルセンタ,予約センタ
```