Re: Using the NOT operator with the AND operator
: @hoss, did that replace the previous article by Erick? I can't find the old
: one anymore.

If you mean the article originally titled "Why Not And Or And Not?" ...

1) That's this article
2) I wrote the original, Erick just really liked sharing it :)
3) SEO "Experts" have gradually sucked the soul out of its title, headline,
   and url (and broken the redirects from the original URL in the process)

-Hoss
http://www.lucidworks.com/
Re: How to create a tokenizer in a way solr will recognize
Welcome to java ClassLoader hell!

: Caused by: java.lang.IllegalArgumentException: resource
: tokenization/sentence-boundary-model.bin not found.
: at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220) ~[?:?]
: at com.google.common.io.Resources.getResource(Resources.java:196) ~[?:?]
: at zemberek.tokenization.TurkishSentenceExtractor.fromDefaultModel(TurkishSentenceExtractor.java:51)
...
: However, I have tokenization/sentence-boundary-model.bin inside
: zemberek-tokenization-0.17.1.jar file, which I also copied into lib
: dir.

You have to be specific: which "lib" directory are you talking about here?
If you mean this...

https://solr.apache.org/guide/solr/latest/configuration-guide/libs.html

...i would advise against this approach in general, and instead suggest that
you put your custom code in a custom module directory...

https://solr.apache.org/guide/solr/latest/configuration-guide/solr-modules.html

...or using the "package manager" (but i have very little experience with
this)...

https://solr.apache.org/guide/solr/latest/configuration-guide/package-manager.html

Doing one of these (instead of <lib/> directives in your solrconfig.xml)
*may* fix your problem.

As to what exactly your problem is... By the looks of it, based on the first
google search result i found, I'm guessing this is the underlying code you
are using...

https://github.com/ahmetaa/zemberek-nlp/blob/a9c0f88210dd6a4a1b6152de88d117054a105879/tokenization/src/main/java/zemberek/tokenization/TurkishSentenceExtractor.java#L49

...which uses the single argument version of
com.google.common.io.Resources.getResource(...) which is documented to use
the context classloader -- if you can change that code to use the two
argument getResource(...)
and pass in the TurkishSentenceExtractor.class, then it *should* always be
the correct classloader that solr has created for your
plugin/module/SolrCore (regardless of how exactly you've pointed Solr at
your jar files)

If you can't modify the TurkishSentenceExtractor class directly, you could
maybe change your Factory so that instead of using
TurkishSentenceExtractor.DEFAULT you create your own instance of
TurkishSentenceExtractor passing in the loaded weights yourself.

(Hmm... Except i think your problem is actually in the static initializers
during the classloading of TurkishSentenceExtractor? ... so that would
probably still be a problem)

Alternatively: You can do an end run around all of this and shove your jars
into the WEB-INF/lib of solr itself -- which should make all the classes
findable no matter what classloader is used. There is some (bad) precedent
set for this in one of the spatial plugins...

https://solr.apache.org/guide/solr/latest/query-guide/spatial-search.html#jts-and-polygons-flat

-Hoss
http://www.lucidworks.com/
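The classloader-safe lookup described above can be sketched like this. This is a hypothetical helper, not zemberek's actual code: Guava's two-argument Resources.getResource(Class, String) ultimately delegates to Class.getResource, so the sketch uses plain Class.getResourceAsStream directly (no Guava dependency needed). The resource path tokenization/sentence-boundary-model.bin is taken from the stack trace; the class and method names here are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

public class ModelLoaderSketch {

  /**
   * Opens the model resource via the classloader that loaded THIS class
   * (i.e. Solr's per-plugin/module/SolrCore loader), instead of the thread
   * context classloader that Guava's one-argument
   * Resources.getResource(String) consults.
   */
  public static InputStream openDefaultModel() throws IOException {
    // A leading "/" makes Class.getResourceAsStream resolve the name
    // against the root of the classpath rather than relative to this
    // class's package -- matching where the .bin sits in the jar.
    InputStream in = ModelLoaderSketch.class
        .getResourceAsStream("/tokenization/sentence-boundary-model.bin");
    if (in == null) {
      throw new IOException(
          "tokenization/sentence-boundary-model.bin not found on classpath");
    }
    return in;
  }
}
```

In a real factory you would pass the stream (or the bytes read from it) into whatever constructor or loader the tokenization library exposes, instead of relying on its context-classloader-based singleton.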
Re: Using the NOT operator with the AND operator
@hoss, did that replace the previous article by Erick? I can't find the old
one anymore.

On Thu, Jan 30, 2025 at 5:48 PM Chris Hostetter wrote:
>
> Obligatory reading about "boolean" queries in lucene & solr -- still very
> relevant ~13 years later...
>
> https://lucidworks.com/post/solr-boolean-operators/
>
> -Hoss
> http://www.lucidworks.com/

--
http://www.needhamsoftware.com (work)
https://a.co/d/b2sZLD9 (my fantasy fiction book)
Re: How to create a tokenizer in a way solr will recognize
Thanks a lot. That solved that problem. I am now facing another problem;

Caused by: java.lang.IllegalArgumentException: resource tokenization/sentence-boundary-model.bin not found.
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220) ~[?:?]
        at com.google.common.io.Resources.getResource(Resources.java:196) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor.fromDefaultModel(TurkishSentenceExtractor.java:51) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor.access$100(TurkishSentenceExtractor.java:28) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor$Singleton.<init>(TurkishSentenceExtractor.java:261) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor$Singleton.<clinit>(TurkishSentenceExtractor.java:256) ~[?:?]
        at zemberek.tokenization.TurkishSentenceExtractor.<clinit>(TurkishSentenceExtractor.java:33) ~[?:?]
        at com.github.yasar11732.lucene_zemberek.ZemberekTokenizerFactory.<init>(ZemberekTokenizerFactory.java:16) ~[?:?]

However, I have tokenization/sentence-boundary-model.bin inside
zemberek-tokenization-0.17.1.jar file, which I also copied into lib dir.
Interestingly, I can instantiate a simple console application using the same
zemberek.tokenization.TurkishSentenceExtractor class (that in turn tries to
load tokenization/sentence-boundary-model.bin file) so I don't know why it
doesn't find tokenization/sentence-boundary-model.bin when loaded into solr.

Best Regards,

On Fri, Jan 31, 2025 at 3:40 AM Chris Hostetter wrote:
>
> : However, I am getting "A SPI class of type
> : org.apache.lucene.analysis.TokenizerFactory with name
> : 'zemberekTokenizer' does not exist."
> :
> : I have defined NAME on my tokenizer factory as can be seen here:
> : https://github.com/yasar11732/lucene-zemberek/blob/master/src/main/java/com/github/yasar11732/lucene_zemberek/ZemberekTokenizerFactory.java#L14
> :
> : Is there any other step I should take before solr will recognize my
> : tokenizer?
>
> The key piece you are missing is java level "Service Provider" Interface
> registration of your class as an implementation of the TokenizerFactory
> "Service Interface" ...
>
> https://docs.oracle.com/javase/tutorial/sound/SPI-intro.html
>
> ...this is done using files under the META-INF/services/ path inside your
> jar. You can see an example of how Lucene registers some of its
> TokenizerFactories here...
>
> https://github.com/apache/lucene/blob/main/lucene/analysis/common/src/resources/META-INF/services/org.apache.lucene.analysis.TokenizerFactory
>
> ...if you wanted to implement your own TokenFilter or CharFilter those
> would need to go in their own corresponding "Interface" based file name in
> your jar.
>
> Note: the "resources/" path is just where lucene keeps the source of that
> file in git, not a path that exists in the jar...
>
> $ jar tf lucene-analysis-common-9.11.1.jar | grep META-INF/services/org.apache.lucene
> META-INF/services/org.apache.lucene.analysis.CharFilterFactory
> META-INF/services/org.apache.lucene.analysis.TokenFilterFactory
> META-INF/services/org.apache.lucene.analysis.TokenizerFactory
>
> (I have very little experience with maven, but i believe if you create a
> 'src/main/resources/META-INF/services/org.apache.lucene.analysis.TokenizerFactory'
> file in your repo, the maven 'jar' plugin will do the right thing for you)
>
> -Hoss
> http://www.lucidworks.com/
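The SPI registration described in the quoted reply can be sketched as a few shell commands. This assumes the standard Maven src/main/resources layout mentioned above; the factory class name is taken from the GitHub link earlier in the thread.

```shell
# Create the SPI "provider configuration" directory that ends up at
# META-INF/services/ inside the built jar (Maven copies everything under
# src/main/resources/ into the jar root).
mkdir -p src/main/resources/META-INF/services

# The file is named after the service interface being implemented; each
# line inside it is the fully qualified name of one implementation class.
echo 'com.github.yasar11732.lucene_zemberek.ZemberekTokenizerFactory' \
  > src/main/resources/META-INF/services/org.apache.lucene.analysis.TokenizerFactory

# After "mvn package", you can confirm the entry made it into the jar with
# something like:
#   jar tf target/<your-jar>.jar | grep META-INF/services
```

With that file in place, Lucene's SPI lookup can resolve the factory by the NAME it declares, so the "A SPI class ... does not exist" error should go away.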