Thanks a lot. That solved that problem. I am now facing another problem:
----
Caused by: java.lang.IllegalArgumentException: resource tokenization/sentence-boundary-model.bin not found.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220) ~[?:?]
        at com.google.common.io.Resources.getResource(Resources.java:196) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor.fromDefaultModel(TurkishSentenceExtractor.java:51) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor.access$100(TurkishSentenceExtractor.java:28) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor$Singleton.<init>(TurkishSentenceExtractor.java:261) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor$Singleton.<clinit>(TurkishSentenceExtractor.java:256) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor.<clinit>(TurkishSentenceExtractor.java:33) ~[?:?]
        at com.github.yasar11732.lucene_zemberek.ZemberekTokenizerFactory.<clinit>(ZemberekTokenizerFactory.java:16) ~[?:?]
----

However, tokenization/sentence-boundary-model.bin is present inside the zemberek-tokenization-0.17.1.jar file, which I also copied into the lib dir. Interestingly, I can use the same zemberek.tokenization.TurkishSentenceExtractor class (which in turn loads tokenization/sentence-boundary-model.bin) from a simple console application, so I don't know why the resource is not found when the class is loaded into Solr (see the diagnostic sketch at the end of this message).

Best Regards,

On Fri, 31 Jan 2025 at 03:40, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : However, I am getting "A SPI class of type
> : org.apache.lucene.analysis.TokenizerFactory with name
> : 'zemberekTokenizer' does not exist."
> :
> : I have defined NAME on my tokenizer factory as can be seen here:
> : https://github.com/yasar11732/lucene-zemberek/blob/master/src/main/java/com/github/yasar11732/lucene_zemberek/ZemberekTokenizerFactory.java#L14
> :
> : Is there any other step I should take before Solr will recognize my tokenizer?
>
> The key piece you are missing is Java-level "Service Provider Interface"
> registration of your class as an implementation of the TokenizerFactory
> "Service Interface" ...
>
> https://docs.oracle.com/javase/tutorial/sound/SPI-intro.html
>
> ...this is done using files under the META-INF/services/ path inside your jar.
> You can see an example of how Lucene registers some of its
> TokenizerFactories here...
>
> https://github.com/apache/lucene/blob/main/lucene/analysis/common/src/resources/META-INF/services/org.apache.lucene.analysis.TokenizerFactory
>
> ...if you wanted to implement your own TokenFilter or CharFilter, those
> would need to go in their own corresponding "Interface"-based file name in
> your jar.
>
> Note: the "resources/" path is just where Lucene keeps the source of that
> file in git, not a path that exists in the jar...
>
> $ jar tf lucene-analysis-common-9.11.1.jar | grep META-INF/services/org.apache.lucene
> META-INF/services/org.apache.lucene.analysis.CharFilterFactory
> META-INF/services/org.apache.lucene.analysis.TokenFilterFactory
> META-INF/services/org.apache.lucene.analysis.TokenizerFactory
>
> (I have very little experience with Maven, but I believe if you create a
> 'src/main/resources/META-INF/services/org.apache.lucene.analysis.TokenizerFactory'
> file in your repo, the Maven 'jar' plugin will do the right thing for you.)
>
> -Hoss
> http://www.lucidworks.com/
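
For reference, the registration file described in the quoted reply is just a one-line text file: in the Maven layout Hoss suggests it would live at src/main/resources/META-INF/services/org.apache.lucene.analysis.TokenizerFactory and contain the single line com.github.yasar11732.lucene_zemberek.ZemberekTokenizerFactory (the class name is taken from the stack trace above). A quick way to confirm the registration actually ended up on the classpath is to ask Lucene which tokenizer names it can see. This is only a sketch, and the class name CheckTokenizerSpi is made up for illustration:

----
import java.util.Set;
import org.apache.lucene.analysis.TokenizerFactory;

// Lists every TokenizerFactory name registered through META-INF/services
// files visible on the classpath. If the plugin jar is packaged correctly,
// the set should include the NAME declared in ZemberekTokenizerFactory
// ("zemberekTokenizer", per the earlier error message).
public class CheckTokenizerSpi {
  public static void main(String[] args) {
    Set<String> names = TokenizerFactory.availableTokenizers();
    System.out.println(names);
    System.out.println("zemberekTokenizer registered: " + names.contains("zemberekTokenizer"));
  }
}
----

Run with lucene-analysis-common and the plugin jar on the classpath, this mirrors the lookup Solr performs when it resolves the tokenizer name from the schema.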
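
On the new resource error itself: the stack trace shows the model being resolved through Guava's Resources.getResource, which prefers the current thread's context classloader over the classloader that loaded the Zemberek classes. In a plain console application those are usually the same loader, but inside Solr the jars in a core's lib dir are typically loaded by a separate plugin classloader, so the context classloader may not see them. The sketch below is a diagnostic only (the class name ModelResourceProbe is made up); the same two lookups could also be logged from the tokenizer factory's static initializer to see what Solr actually resolves:

----
import java.net.URL;
import zemberek.tokenization.TurkishSentenceExtractor;

// Reports which classloaders can see the Zemberek model resource. Guava's
// Resources.getResource() uses the thread context classloader when one is
// set, so if only the first lookup succeeds inside Solr, that would explain
// the IllegalArgumentException in the stack trace above.
public class ModelResourceProbe {

  private static final String MODEL = "tokenization/sentence-boundary-model.bin";

  public static String probe() {
    ClassLoader zemberekLoader = TurkishSentenceExtractor.class.getClassLoader();
    ClassLoader contextLoader = Thread.currentThread().getContextClassLoader();
    URL viaZemberek = zemberekLoader.getResource(MODEL);
    URL viaContext = (contextLoader == null) ? null : contextLoader.getResource(MODEL);
    return "via Zemberek's classloader: " + viaZemberek
        + System.lineSeparator()
        + "via context classloader:    " + viaContext;
  }

  public static void main(String[] args) {
    // In a console app both lines usually print the same jar URL; the
    // interesting output is what this prints when invoked inside Solr.
    System.out.println(probe());
  }
}
----

If the context classloader turns out to be the one that cannot see the jar, one workaround (my assumption, not something suggested in this thread) is to temporarily set the context classloader to the plugin's own classloader around the call that triggers TurkishSentenceExtractor.fromDefaultModel(), restoring the original loader afterwards.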