Unfortunately, WhitespaceTokenizer does not work properly for the real data.
I also tried KeywordAnalyzer, because the data I need to process are just
IDs, but with that there is no output at all.
On 9 June 2017 at 14:09, Uwe Schindler wrote:
> Hi,
>
> the tokens are matched as is. It is only a mat
Hi,
The tokens are matched as is. It is only a match if the tokens are exactly the
same bytes. No substring matching is ever done, just a simple comparison
of bytes.
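To make the exact-match rule concrete, here is a minimal plain-Java sketch. It is not Lucene code and does not use the Lucene API: the class ExactTermMatchDemo and all of its methods are hypothetical names made up for illustration, and it compares Java strings rather than raw bytes. It only shows why a lookup hits when the query term equals an indexed token, and why treating the whole field as one token (roughly what a keyword-style analyzer does) returns nothing for a single ID out of several.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy illustration only: NOT Lucene code. All names here are hypothetical.
final class ExactTermMatchDemo {

    // Mimics a whitespace tokenizer: one token per whitespace-separated chunk.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Mimics a keyword-style analyzer: the whole input becomes a single token.
    static List<String> keywordTokens(String text) {
        return List.of(text);
    }

    // Inverted index: token -> ids of the documents containing that exact token.
    static Map<String, Set<Integer>> index(List<List<String>> tokenizedDocs) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int docId = 0; docId < tokenizedDocs.size(); docId++) {
            for (String token : tokenizedDocs.get(docId)) {
                postings.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
            }
        }
        return postings;
    }

    // A term lookup is an exact comparison: no substring or prefix matching.
    static Set<Integer> search(Map<String, Set<Integer>> postings, String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        String doc = "ID-42 ID-77";
        Map<String, Set<Integer>> ws = index(List.of(whitespaceTokens(doc)));
        Map<String, Set<Integer>> kw = index(List.of(keywordTokens(doc)));

        System.out.println(search(ws, "ID-42"));       // hit: exact token
        System.out.println(search(ws, "ID-4"));        // no hit: only a substring
        System.out.println(search(kw, "ID-42"));       // no hit: token is the whole field
        System.out.println(search(kw, "ID-42 ID-77")); // hit: whole field matches
    }
}
```

In this sketch the query term has to be produced the same way as the indexed tokens, otherwise the lookup finds nothing.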
To get fuzzier matches, you have to do text analysis right. This includes
splitting of tokens (Tokenizer), but also
Hi Ahmed,
That works! Still, I do not understand how that stuff works. I just know
that the analyzer cuts the indexed text into tokens, but I do not know how the
matching is done.
Can you recommend a good book to read? I would prefer something with less maths
and more examples.
The only one I found is the free "An