Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea

Ah, you want to do it the hard way. Sorry, can't help you there - I prefer to do things the simple way - easier to write and to maintain and, in my experience, usually more robust in the long run.

-- Ian.

On Tue, Feb 17, 2015 at 11:42 AM, Ravikumar Govindarajan wrote:
> Thanks Ian
>
> What I

Re: URL/Email tokenizer

2015-02-17 Thread Ravikumar Govindarajan

Thanks Ian.

What I am currently doing is duplicating the data into 2 different fields and having my own PerFieldAnalyzerWrapper, just like you pointed out. Is there a good way to do this in a single pass? Like how Bi-Grams or Common-Grams do…

-- Ravi

On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea wrote

Re: URL/Email tokenizer

2015-02-17 Thread Ian Lea

Sounds like a job for org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.

-- Ian.

On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan wrote:
> We have a requirement in that E-mail addresses need to be added in a
> tokenized form to one field while untokenized form is added to

URL/Email tokenizer

2015-02-17 Thread Ravikumar Govindarajan

We have a requirement that e-mail addresses be added in tokenized form to one field, while the untokenized form is added to another field.

Ex: "I have mailed a...@xyz.com". It should tokenize as below:

body = {"I", "have", "mailed", "abc", "xyz", "com"};

I also have a body-addr field. To
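
To make the requirement concrete, here is a minimal plain-Java sketch of the two token streams the question describes. It is not Lucene code and not anyone's actual implementation from this thread: the class name EmailFieldSketch, the methods bodyTokens and addressTokens, and the simplified address regex are all hypothetical. The sample address abc@xyz.com is inferred from the token list in the message, since the archive obscures the original. The sketch also assumes body-addr is meant to hold each address as a single token, which the truncated message implies but does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailFieldSketch {
    // Simplified address pattern; an assumption for illustration,
    // far looser than a real e-mail grammar.
    private static final Pattern ADDRESS =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // Runs of letters and digits; '@' and '.' act as separators.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // "body" view: every word, with addresses split into their parts.
    static List<String> bodyTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // "body-addr" view (assumed): only whole addresses, one token each.
    static List<String> addressTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "I have mailed abc@xyz.com";
        System.out.println("body      = " + bodyTokens(text));
        System.out.println("body-addr = " + addressTokens(text));
    }
}
```

Both methods scan the same input independently, which mirrors the two-field, two-analyzer approach discussed in the replies rather than the single-pass approach asked about later in the thread.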