Hi Piotr,
The behavior you mention is an intentional change from the behavior in Lucene
4.9.0 and earlier, when tokens longer than maxTokenLength were silently ignored:
see LUCENE-5897[1] and LUCENE-5400[2].
The new behavior is as follows: Token matching rules are no longer allowed to
match against input longer than maxTokenLength; a match that would have been
longer is instead emitted as tokens of at most maxTokenLength characters each,
rather than being dropped.
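To make that concrete, here is a minimal sketch (5.x API; the class name, the
10-character input and the limit of 5 are made up for the demo) showing that an
over-long word now comes out split rather than silently dropped:

import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class MaxTokenLengthDemo {
  public static void main(String[] args) throws Exception {
    StandardTokenizer tokenizer = new StandardTokenizer();
    tokenizer.setMaxTokenLength(5);                       // arbitrary small limit for the demo
    tokenizer.setReader(new StringReader("abcdefghij"));  // a single 10-character word
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    tokenizer.reset();
    while (tokenizer.incrementToken()) {
      System.out.println(term.toString());                // should print "abcde" then "fghij"
    }
    tokenizer.end();
    tokenizer.close();
  }
}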
Hello.
Btw, I think ClassicAnalyzer has the same problem.
Regards
On Fri, Jul 17, 2015 at 4:40 PM, Steve Rowe wrote:
> Hi Piotr,
>
> Thanks for reporting!
>
> See https://issues.apache.org/jira/browse/LUCENE-6682
>
> Steve
> www.lucidworks.com
>
> > On Jul 16, 2015, at 4:47 AM, Piotr Idzikowski
I should add that this is Lucene 4.10.4, but I have also checked it on
version 5.2.1 and got the same result.
Regards
Piotr
On Mon, Jul 20, 2015 at 9:44 AM, Piotr Idzikowski wrote:
> Hello Steve,
> It is always pleasure to help you develop such a great lib.
> Talking about StandardTokenize
Hello Steve,
It is always a pleasure to help you develop such a great lib.
Talking about StandardTokenizer and setMaxTokenLength, I think I have found
another problem.
It looks like when a word is longer than the max length, the analyzer adds two
tokens -> word.substring(0, maxLength) and word.substring(maxLength).
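For reference, here is a small self-contained sketch that reproduces what you
describe (5.x API; the class name and the field name "f" are arbitrary, and it
relies on StandardAnalyzer's default maxTokenLength of 255):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SplitTokenRepro {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer();   // default maxTokenLength = 255
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 300; i++) {
      sb.append('a');                                     // one 300-character "word"
    }
    try (TokenStream ts = analyzer.tokenStream("f", sb.toString())) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        // expected: first a 255-char token (word.substring(0, maxLength)),
        // then a 45-char token (word.substring(maxLength))
        System.out.println(term.length());
      }
      ts.end();
    }
    analyzer.close();
  }
}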
Hi Piotr,
Thanks for reporting!
See https://issues.apache.org/jira/browse/LUCENE-6682
Steve
www.lucidworks.com
> On Jul 16, 2015, at 4:47 AM, Piotr Idzikowski
> wrote:
>
> Hello.
> I am developing own analyzer based on StandardAnalyzer.
> I realized that tokenizer.setMaxTokenLength is called
Hello.
I am developing my own analyzer based on StandardAnalyzer.
I realized that tokenizer.setMaxTokenLength is called many times.
protected TokenStreamComponents createComponents(final String fieldName,
    final Reader reader) {
  final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
  src.setMaxTokenLength(maxTokenLength);
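For context, the reason setMaxTokenLength shows up more than once is that the
4.10.x StandardAnalyzer applies it when the components are first created and
then again in a setReader override, each time the cached components are reused
with a new reader. A rough sketch of that pattern in a custom analyzer (the
MyAnalyzer class is hypothetical and paraphrases the 4.10.x wiring, not the
verbatim Lucene source):

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public final class MyAnalyzer extends Analyzer {
  private final int maxTokenLength;

  public MyAnalyzer(int maxTokenLength) {
    this.maxTokenLength = maxTokenLength;
  }

  @Override
  protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
    final StandardTokenizer src = new StandardTokenizer(getVersion(), reader);
    src.setMaxTokenLength(maxTokenLength);                // call #1: when the components are built
    TokenStream tok = new StandardFilter(getVersion(), src);
    tok = new LowerCaseFilter(getVersion(), tok);
    return new TokenStreamComponents(src, tok) {
      @Override
      protected void setReader(final Reader reader) throws IOException {
        // called again every time the analyzer is reused with a new reader
        src.setMaxTokenLength(maxTokenLength);
        super.setReader(reader);
      }
    };
  }
}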