List of common kstem overrides

2018-06-20 Thread Florian Hopf
Hi, we are using the kstem stemmer, which works fine most of the time but, like most stemmers, has its problems as well. Did anybody ever come across a list of common overrides to apply for the stemmer? I know that this depends a lot on the data that is being indexed but I was wondering if th

Change suggestion for ComplexPhraseQueryParser

2018-06-20 Thread Otmar Caduff
Dear committers, Recently I wanted to be able to extend wildcard queries over phrases. To do so, I dived into ComplexPhraseQueryParser. It turned out that making a small change to that class allows me to achieve my goal. Because I thought that change might help others, I opened a Jira issue and at

Re: Lucene same search result for worlds with and without spaces

2018-06-20 Thread András Péteri
An n-gram tokenizer/filter might also work for you: http://lucene.apache.org/core/7_3_1/analyzers-common/org/apache/lucene/analysis/ngram/NGramTokenizer.html Regards, András On Wed, Jun 20, 2018 at 11:53 AM, Markus Jelsma wrote: > Hi Egorlex, > > Set the tokenSeparator to "" and ShingleFilter w
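As a rough illustration of why character n-grams can help here, the following plain-Python sketch (not Lucene code; the function names `char_ngrams` and `shared_ngrams` are made up for this example) generates character n-grams and shows that "similarissues" and "similar issues" share most of their trigrams, so either spelling could match the other at search time.

```python
def char_ngrams(text, min_gram=1, max_gram=2):
    """Return every character n-gram of text with min_gram <= n <= max_gram.

    Illustrative only: this is an assumption about the general idea of
    n-gram tokenization, not a reimplementation of Lucene's NGramTokenizer.
    """
    grams = []
    for start in range(len(text)):
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(text):
                grams.append(text[start:start + n])
    return grams


def shared_ngrams(a, b, n=3):
    """Return the set of n-grams of length n that both strings contain."""
    return set(char_ngrams(a, n, n)) & set(char_ngrams(b, n, n))


if __name__ == "__main__":
    common = shared_ngrams("similarissues", "similar issues")
    print(sorted(common))
```

The trade-off is index size: every token is expanded into many overlapping fragments, so the gram lengths are worth choosing conservatively.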

RE: Lucene same search result for worlds with and without spaces

2018-06-20 Thread Markus Jelsma
Hi Egorlex, Set the tokenSeparator to "" and ShingleFilter will concatenate all shingles without whitespace. Keep in mind that this will greatly increase the size of the index, so it might not be a good idea to concatenate all pairs of words. If you are trying to find "similarissues" with "sim
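The effect of an empty token separator can be sketched in plain Python. This is only a toy model of the idea, not Lucene's ShingleFilter: the `shingles` function and its parameters are hypothetical, and it emits only the shingles themselves.

```python
def shingles(tokens, size=2, separator=" "):
    """Join each run of `size` adjacent tokens with `separator`.

    Hypothetical helper for illustration; with separator="" adjacent
    words are glued together without whitespace.
    """
    return [
        separator.join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
    ]


if __name__ == "__main__":
    tokens = "similar issues".split()
    # Default separator keeps the words apart.
    print(shingles(tokens))
    # Empty separator concatenates each adjacent pair.
    print(shingles(tokens, separator=""))
```

With the empty separator, the indexed text "similar issues" yields the extra term "similarissues", which is what lets a query typed without the space match. Every adjacent word pair adds a term, which is the index growth mentioned above.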

Re: Lucene same search result for worlds with and without spaces

2018-06-20 Thread egorlex
Thanks for the reply! Sorry, could you help a little? According to the example, "given the phrase “Shingles is a viral disease”, a shingle filter might produce: “Shingles is”, “is a”, “a viral”, “viral disease”". I do not quite understand how this ShingleFilter can turn "similarissues" into "similar issues" Tha