Re: StandardTokenizer#setMaxTokenLength

2015-07-20 Thread Steve Rowe
Hi Piotr, The behavior you mention is an intentional change from the behavior in Lucene 4.9.0 and earlier, when tokens longer than maxTokenLength were silently ignored: see LUCENE-5897[1] and LUCENE-5400[2]. The new behavior is as follows: token matching rules are no longer allowed to match against …
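[Editor's note: a minimal sketch, not from the thread, of how one could emulate the old "silently ignore" behavior on Lucene 5.x. The idea is to leave the tokenizer's own maxTokenLength at its default so long words are not split, and instead drop them with a LengthFilter; the sample text and the limit of 5 are arbitrary choices for illustration.]

    import java.io.StringReader;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.miscellaneous.LengthFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class DropLongTokens {
        public static void main(String[] args) throws Exception {
            // Keep the tokenizer's default maxTokenLength (255) so long words
            // pass through whole, then discard anything longer than 5 chars,
            // roughly matching the pre-4.10 "silently dropped" behavior.
            StandardTokenizer src = new StandardTokenizer();
            src.setReader(new StringReader("short extraordinarily"));
            TokenStream stream = new LengthFilter(src, 1, 5);
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                System.out.println(term);  // prints only "short"
            }
            stream.end();
            stream.close();
        }
    }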

Re: StandardTokenizer#setMaxTokenLength

2015-07-20 Thread Piotr Idzikowski
Hello. Btw, I think ClassicAnalyzer has the same problem. Regards

On Fri, Jul 17, 2015 at 4:40 PM, Steve Rowe wrote:
> Hi Piotr,
>
> Thanks for reporting!
>
> See https://issues.apache.org/jira/browse/LUCENE-6682
>
> Steve
> www.lucidworks.com
>
> > On Jul 16, 2015, at 4:47 AM, Piotr Idzikowski …

Re: StandardTokenizer#setMaxTokenLength

2015-07-20 Thread Piotr Idzikowski
I should add that this is Lucene 4.10.4, but I have checked it on version 5.2.1 and got the same result. Regards, Piotr

On Mon, Jul 20, 2015 at 9:44 AM, Piotr Idzikowski wrote:
> Hello Steve,
> It is always a pleasure to help you develop such a great lib.
> Talking about StandardTokenizer …

Re: StandardTokenizer#setMaxTokenLength

2015-07-20 Thread Piotr Idzikowski
Hello Steve, It is always a pleasure to help you develop such a great lib. Talking about StandardTokenizer and setMaxTokenLength, I think I have found another problem. It looks like when a word is longer than the max length, the analyzer adds two tokens -> word.substring(0, maxLength) and word.substring(maxLength) …
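[Editor's note: a small sketch, mine rather than Piotr's, that should reproduce the split he describes on Lucene 5.x; the word "wonderful" and the limit of 6 are made up for illustration.]

    import java.io.StringReader;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class SplitTokenDemo {
        public static void main(String[] args) throws Exception {
            StandardTokenizer tokenizer = new StandardTokenizer();
            tokenizer.setMaxTokenLength(6);
            tokenizer.setReader(new StringReader("wonderful"));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();
            while (tokenizer.incrementToken()) {
                // Expect "wonder" then "ful": word.substring(0, 6) followed
                // by word.substring(6), as reported above.
                System.out.println(term);
            }
            tokenizer.end();
            tokenizer.close();
        }
    }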

Re: StandardTokenizer#setMaxTokenLength

2015-07-17 Thread Steve Rowe
Hi Piotr,

Thanks for reporting!

See https://issues.apache.org/jira/browse/LUCENE-6682

Steve
www.lucidworks.com

> On Jul 16, 2015, at 4:47 AM, Piotr Idzikowski wrote:
>
> Hello.
> I am developing my own analyzer based on StandardAnalyzer.
> I realized that tokenizer.setMaxTokenLength is called …

StandardTokenizer#setMaxTokenLength

2015-07-16 Thread Piotr Idzikowski
Hello. I am developing my own analyzer based on StandardAnalyzer. I realized that tokenizer.setMaxTokenLength is called many times:

    protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final StandardTokenizer src = new StandardTokenizer(getVersion(), …
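[Editor's note: the snippet above is cut off in the archive. For context, the method in Lucene 4.10.x's StandardAnalyzer looks approximately like the sketch below (a reconstruction, not the exact source). The anonymous TokenStreamComponents subclass re-applies maxTokenLength inside setReader, which is why the setter runs again every time the reusable components receive a new reader.]

    // Inside org.apache.lucene.analysis.standard.StandardAnalyzer (4.10.x, approximate)
    @Override
    protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
        final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
        src.setMaxTokenLength(maxTokenLength);
        TokenStream tok = new StandardFilter(getVersion(), src);
        tok = new LowerCaseFilter(getVersion(), tok);
        tok = new StopFilter(getVersion(), tok, stopwords);
        return new TokenStreamComponents(src, tok) {
            @Override
            protected void setReader(final Reader reader) throws IOException {
                // Re-apply the limit each time this reusable component is reset
                // with a new reader -- the repeated calls Piotr observed.
                src.setMaxTokenLength(StandardAnalyzer.this.maxTokenLength);
                super.setReader(reader);
            }
        };
    }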