After testing on 4,800 fairly complex queries, I see a performance gain of
about 10% after running indexWriter.forceMerge(1); indexWriter.commit();
query time dropped from 209 ms per query to 185 ms per query.
The queries are quite complex, often about 30 or more words, of the format OR
text:
The index went from 214 files down to 14 after the forceMerge.
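
(For context, the change described above comes down to the two calls sketched
below. This is a minimal sketch assuming a recent Lucene release; the index
path and analyzer are placeholders, not details from the original mail.)

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class ForceMergeSketch {
        public static void main(String[] args) throws Exception {
            // Open the existing index directory (placeholder path).
            try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
                 IndexWriter writer =
                         new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                // Merge all segments into a single segment, then commit.
                // forceMerge(1) rewrites the whole index and is expensive,
                // so it is usually run once on a mostly static index,
                // not after every update.
                writer.forceMerge(1);
                writer.commit();
            }
        }
    }
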
Thank you! I will give it a try and share my findings with you all.
Regards
Amitesh
On Thu, Sep 21, 2023 at 08:18 Uwe Schindler wrote:
> The problem with WhitespaceTokenizer is that it splits only on
> whitespace. If you have text like "This is, was some test." then you get
> tokens like "is," and "test." including the punctuation.
The problem with WhitespaceTokenizer is that it splits only on
whitespace. If you have text like "This is, was some test." then you get
tokens like "is," and "test." including the punctuation.
This is the reason why StandardTokenizer is normally used for human
readable text. WhitespaceTokenizer
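
(A small sketch of the difference described above, assuming a recent Lucene
release; the field name "text" and the explicit empty stop-word set are
illustrative choices, not from the original mail, and package locations have
shifted a bit across Lucene versions.)

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.CharArraySet;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizerComparison {
        // Print the tokens an analyzer produces for the given text.
        static void printTokens(Analyzer analyzer, String text) throws Exception {
            try (TokenStream ts = analyzer.tokenStream("text", text)) {
                CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
                ts.reset();
                while (ts.incrementToken()) {
                    System.out.print("[" + term + "] ");
                }
                ts.end();
                System.out.println();
            }
        }

        public static void main(String[] args) throws Exception {
            String text = "This is, was some test.";
            // WhitespaceTokenizer keeps punctuation attached to the tokens:
            // [This] [is,] [was] [some] [test.]
            printTokens(new WhitespaceAnalyzer(), text);
            // StandardTokenizer plus lowercasing strips the punctuation:
            // [this] [is] [was] [some] [test]
            printTokens(new StandardAnalyzer(CharArraySet.EMPTY_SET), text);
        }
    }
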
Hello,
I'm surprised and doubt this can happen. Would you mind uploading a short
test that reproduces it?
On Wed, Sep 20, 2023 at 11:44 PM Amitesh Kumar
wrote:
> Thanks Mikhail!
>
> I have tried all the other tokenizers from Lucene 4.4. In the case of
> WhitespaceTokenizer, it loses the romanization of special characters