Re: AlphaNumeric analyzer/tokenizer

2019-08-19 Martin Grigorov
Hi,

On Mon, Aug 19, 2019 at 9:31 AM Uwe Schindler wrote:
> You already got many responses. Check your inbox.

"many" made me think that I've also missed something. https://markmail.org/message/ohv5qcvxilj3n3fb

> Uwe
>
> On August 19, 2019 6:23:20 AM UTC, Abhishek Chauhan <
> abhishe

Re: AlphaNumeric analyzer/tokenizer

2019-08-18 Uwe Schindler
You already got many responses. Check your inbox.

Uwe

On August 19, 2019 6:23:20 AM UTC, Abhishek Chauhan wrote:
> Hi,
>
> Can someone please check the above mail and provide some feedback?
>
> Thanks and Regards,
> Abhishek
>
> On Fri, Aug 16, 2019 at 2:52 PM Abhishek Chauhan <
> abhishek.chauhan...

Re: AlphaNumeric analyzer/tokenizer

2019-08-18 Abhishek Chauhan
Hi,

Can someone please check the above mail and provide some feedback?

Thanks and Regards,
Abhishek

On Fri, Aug 16, 2019 at 2:52 PM Abhishek Chauhan <abhishek.chauhan...@gmail.com> wrote:
> Hi,
>
> We have been using SimpleAnalyzer, which keeps only letters in its tokens.
> This limits us to se
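To make the SimpleAnalyzer limitation concrete, here is a rough JDK-only emulation of its behavior: maximal runs of Unicode letters, lowercased, with every other character acting as a separator. It uses java.util.regex rather than Lucene's own LetterTokenizer and LowerCaseFilter, so treat it as an approximation; the class name and sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of Lucene.
public class LetterOnlyTokens {
    // Maximal runs of Unicode letters; digits and punctuation only separate tokens.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    /** Approximates SimpleAnalyzer: letter-only tokens, lowercased. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LETTERS.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "X5" loses its digit and "2019" disappears entirely.
        System.out.println(tokenize("Model X5 released in 2019"));
        // prints [model, x, released, in]
    }
}
```

A query for "x5" or "2019" therefore has nothing to match in an index built this way.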

RE: AlphaNumeric analyzer/tokenizer

2019-08-16 Uwe Schindler
Hi,

The easiest approach is to use PatternTokenizer as part of your analyzer. It uses a regular expression to split text into words. Just use a regular expression that matches the Unicode ranges for letters and digits. To build your Analyzer, use the CustomAnalyzer class and its builder API to construct your own a
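The suggestion above comes down to a regular expression over Unicode character classes. The sketch below uses java.util.regex directly, not Lucene, to show which tokens a pattern like [\p{L}\p{N}]+ would extract: \p{L} matches any Unicode letter and \p{N} any Unicode number, so non-ASCII letters and digits are kept too. The pattern, the lowercasing step, and the class name are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of Lucene.
public class AlphaNumericTokens {
    // Runs of Unicode letters (\p{L}) or numbers (\p{N}); anything else separates tokens.
    private static final Pattern ALNUM = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Extracts whole regex matches as tokens and lowercases them. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = ALNUM.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mixed letter/digit tokens such as "x5" and pure numbers such as "2019" survive.
        System.out.println(tokenize("Model X5 released in 2019"));
        // prints [model, x5, released, in, 2019]
    }
}
```

In Lucene itself, the same pattern would go into PatternTokenizerFactory's "pattern" parameter with "group" set to "0", so that whole matches become tokens (the default group of -1 instead splits on the matches, in which case the negated class [^\p{L}\p{N}]+ would be used). A sketch, to be checked against the Lucene version in use: CustomAnalyzer.builder().withTokenizer(PatternTokenizerFactory.class, "pattern", "[\\p{L}\\p{N}]+", "group", "0").addTokenFilter(LowerCaseFilterFactory.class).build().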