Re: Penalize fact the searched term is within a word

2017-06-09 Thread Jacek Grzebyta
Unfortunately, WhitespaceTokenizer does not work properly for the real data. I also tried KeywordAnalyzer, because the data I need to process are just IDs, but with that there is no output at all. On 9 June 2017 at 14:09, Uwe Schindler wrote: > Hi, > > the tokens are matched as is. It is only a mat…
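To illustrate the difference the poster is seeing, here is a minimal sketch that mimics what Lucene's WhitespaceTokenizer and KeywordAnalyzer produce, without depending on Lucene itself. The ID values are hypothetical examples, not from the thread.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: these helpers imitate the token output of Lucene's
// WhitespaceTokenizer and KeywordAnalyzer; they are not the real classes.
public class AnalyzerSketch {
    // WhitespaceTokenizer-style: split the input on whitespace, one token per chunk.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // KeywordAnalyzer-style: the entire field value becomes one single token.
    static List<String> keywordTokens(String text) {
        return List.of(text);
    }

    public static void main(String[] args) {
        String field = "GO:0008150 GO:0003674";   // hypothetical ID data
        System.out.println(whitespaceTokens(field)); // [GO:0008150, GO:0003674]
        System.out.println(keywordTokens(field));    // [GO:0008150 GO:0003674]
        // With the keyword-style analysis, the only indexed token is the full
        // string, so a query for a single ID like "GO:0008150" finds nothing,
        // matching the "no output at all" symptom described above.
    }
}
```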

RE: Penalize fact the searched term is within a word

2017-06-09 Thread Uwe Schindler
Hi, the tokens are matched as is. A token matches only if the bytes are exactly the same; substring matches are never done, just a simple comparison of bytes. To get fuzzier matching, you have to do the text analysis right. This includes splitting into tokens (Tokenizer), but also…
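The exact-byte matching described above can be sketched in a few lines; this is an illustrative stand-in for what happens at term-lookup time, not actual Lucene code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the point above: an indexed token matches a query token only if
// the encoded bytes are identical; there is no substring comparison.
public class ExactMatchSketch {
    static boolean termMatches(String indexedToken, String queryToken) {
        byte[] a = indexedToken.getBytes(StandardCharsets.UTF_8);
        byte[] b = queryToken.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(a, b);   // exact byte equality, nothing fuzzier
    }

    public static void main(String[] args) {
        System.out.println(termMatches("wordplay", "word")); // false: substring is not enough
        System.out.println(termMatches("word", "word"));     // true: identical bytes
    }
}
```

This is why any fuzziness must come from analysis (tokenizing, lowercasing, stemming, n-grams) applied consistently at index time and query time, rather than from the match itself.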

Re: Penalize fact the searched term is within a word

2017-06-09 Thread Jacek Grzebyta
Hi Ahmed, that works! Still, I do not understand how that stuff works. I just know that the analyzer cuts the indexed text into tokens, but I do not know how the matching is done. Can you recommend a good book to read? I would prefer something with less maths and more examples. The only one I found is the free "An…
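For the "how is the matching done" question, the core idea can be sketched as a tiny inverted index: tokens recorded at index time are looked up by exact equality at query time. This is a conceptual toy, not how Lucene is actually implemented, and the sample documents are made up for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Conceptual sketch of an inverted index: at index time each document is cut
// into tokens and each token maps to the set of documents containing it; at
// query time the query token is looked up by exact equality.
public class InvertedIndexSketch {
    final Map<String, Set<Integer>> postings = new HashMap<>();

    void addDocument(int docId, String text) {
        // Whitespace tokenization, mirroring a simple analyzer.
        for (String token : text.trim().split("\\s+")) {
            postings.computeIfAbsent(token, t -> new HashSet<>()).add(docId);
        }
    }

    Set<Integer> search(String queryToken) {
        // Exact lookup: "word" will not find a document containing "wordplay".
        return postings.getOrDefault(queryToken, Set.of());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.addDocument(1, "the searched word");
        idx.addDocument(2, "wordplay everywhere");
        System.out.println(idx.search("word"));     // [1]
        System.out.println(idx.search("wordplay")); // [2]
    }
}
```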