Re: multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
…ooleanClauses are set appropriately. > > -----Original Message----- > From: Valentin Popov [mailto:valentin...@gmail.com] > Sent: Monday, December 15, 2014 8:35 AM > To: java-user@lucene.apache.org > Subject: Re: multiterm numbers regexp search > > Mike, thanks. > >

RE: multiterm numbers regexp search

2014-12-15 Thread Allison, Timothy B.
…'t vouch for performance with the above options... Whichever path you take, make sure that the MultiTermQuery.RewriteMethod and/or maxBooleanClauses are set appropriately. -----Original Message----- From: Valentin Popov [mailto:valentin...@gmail.com] Sent: Monday, December 15, 2014 8:35 AM To: jav
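Why the clause limit matters, as a rough JDK-only sketch (this is not Lucene code): if a regexp query is rewritten with a method that builds one Boolean clause per matching term (in Lucene 4.x-era APIs, e.g. MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE), the number of distinct matching terms in the index must stay under BooleanQuery's clause limit, whose default is 1024; past that, BooleanQuery.TooManyClauses is thrown. A constant-score filter rewrite (MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE in 4.x) avoids building per-term clauses. The class name, the synthetic term list and the pattern below are hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration: how many index terms a regexp could expand to,
// compared with Lucene's default BooleanQuery clause limit of 1024.
final class ClauseBudget {
    // Default value of BooleanQuery's max clause count.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Counts terms in a (hypothetical) term dictionary that the pattern matches
    // in full, the way a regexp query is tested against whole terms.
    static long countMatchingTerms(List<String> terms, Pattern p) {
        return terms.stream().filter(t -> p.matcher(t).matches()).count();
    }

    // True if one Boolean clause per matching term would exceed the limit.
    static boolean exceedsLimit(long matchingTerms, int maxClauses) {
        return matchingTerms > maxClauses;
    }

    public static void main(String[] args) {
        // Synthetic terms: 2000 distinct 16-digit numbers starting with "51".
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 2000; i++) {
            terms.add(String.format("51%014d", i));
        }
        Pattern mc = Pattern.compile("5[1-5][0-9]{14}");
        long n = countMatchingTerms(terms, mc);
        System.out.println(n + " terms, exceeds default limit: "
                + exceedsLimit(n, DEFAULT_MAX_CLAUSES));
    }
}
```

With hundreds of millions of emails, the number of distinct card-like terms can easily run far past 1024, which is why the rewrite method choice matters more than raising the limit.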

Re: multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
Mike, thanks. The problem is that we can't change the analyzer: for compliance the bank needs to search for more than just card numbers, and the existing store already holds hundreds of millions of emails. My thinking is to build a multiterm regexp search query, or a search combining several regexp queries with some distance between…
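The "combination of regexp queries with some distance between them" idea can be sketched outside Lucene. Assuming a number written as 5106 7922 9469 8422 ends up as four adjacent tokens, the question becomes whether four per-group patterns match four consecutive tokens in order. In Lucene this is typically expressed with span queries (for example a SpanNearQuery over RegexpQuery clauses wrapped in SpanMultiTermQueryWrapper); the JDK-only code below merely simulates that matching logic with slop 0. The class name, the group patterns and the sample tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Simulates an ordered "regexp clauses next to each other" match over a token
// stream: each pattern must match one whole token, tokens adjacent and in order.
final class AdjacentRegexpMatch {
    // Hypothetical per-group patterns for a 4-4-4-4 MasterCard layout.
    static final List<Pattern> GROUPS = Arrays.asList(
            Pattern.compile("5[1-5][0-9]{2}"),
            Pattern.compile("[0-9]{4}"),
            Pattern.compile("[0-9]{4}"),
            Pattern.compile("[0-9]{4}"));

    // Returns the token positions at which all group patterns match in order.
    static List<Integer> findStarts(List<String> tokens, List<Pattern> groups) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + groups.size() <= tokens.size(); i++) {
            boolean all = true;
            for (int j = 0; j < groups.size(); j++) {
                if (!groups.get(j).matcher(tokens.get(i + j)).matches()) {
                    all = false;
                    break;
                }
            }
            if (all) {
                starts.add(i);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Tokens as a tokenizer might emit them for "card 5106 7922 9469 8422 please".
        List<String> tokens = Arrays.asList("card", "5106", "7922", "9469", "8422", "please");
        System.out.println(findStarts(tokens, GROUPS)); // prints [1]
    }
}
```

Allowing a non-zero slop would tolerate stray tokens between groups at the cost of more false positives.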

Re: multiterm numbers regexp search

2014-12-15 Thread Michael Sokolov
You probably don't want to use StandardAnalyzer: maybe try WhitespaceAnalyzer, but you'll need to enhance your regex a little to deal with punctuation, since WA may give you tokens like 5106-7922-9469-8422. or "5106-7922-9469-8422" etc. -Mike On 12/15/14 3:45 AM, Valentin Popov wrote: I have
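A JDK-only sketch of the "enhance your regex to deal with punctuation" suggestion: whitespace tokenization leaves quotes, periods and commas attached to tokens, so the pattern below allows optional hyphens between the 4-digit groups and non-digit characters around the number. It uses java.util.regex, not Lucene's RegExp class. The pattern sticks to character classes, `?`, `*`, `{n}` and grouping, which I believe both syntaxes accept, but it would need checking before being used in a RegexpQuery. Splitting on whitespace is only a stand-in for WhitespaceAnalyzer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: with whitespace-only tokenization, punctuation stays attached to
// tokens, so the card pattern tolerates optional hyphens between 4-digit
// groups and non-digit characters before and after the number.
final class WhitespaceCardTokens {
    // Hypothetical pattern; each token must match it in full.
    static final Pattern CARD = Pattern.compile(
            "[^0-9]*5[1-5][0-9]{2}(-?[0-9]{4}){3}[^0-9]*");

    // Rough stand-in for whitespace tokenization: split on runs of whitespace.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the tokens that look like a MasterCard number.
    static List<String> cardTokens(String text) {
        List<String> hits = new ArrayList<>();
        for (String t : whitespaceTokens(text)) {
            if (CARD.matcher(t).matches()) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String body = "Card: 5106-7922-9469-8422. Also \"5106792294698422\", not 4111-1111-1111-1111.";
        System.out.println(cardTokens(body));
    }
}
```

Because the surrounding `[^0-9]*` cannot absorb digits, a longer digit run such as a 17-digit number is still rejected.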

Re: multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
Nope, this is for a compliance request for a banking system; have a look at PCI DSS. @wmartinusa, please don't add to the traffic if you have nothing to say about the subject. > On 15 Dec 2014, at 11:54, wmartinusa wrote: > Sounds crooked. R u a criminal?

multiterm numbers regexp search

2014-12-15 Thread Valentin Popov
I need to find MasterCard numbers with a regular expression. I'm using Query query = new RegexpQuery(new Term("body", "5{1}<1-5>{1}<0-9>{14}"), RegExp.ALL) to search for numbers in the email body, and StandardAnalyzer is used to index the body. So a number like 5106792294698422 will be indexed as it…
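A JDK-only sketch of what the query above asks for. My reading of the Lucene RegExp pattern "5{1}<1-5>{1}<0-9>{14}" is: a 5, then one digit from 1 to 5 (the `<n-m>` numeric interval), then 14 digits, which java.util.regex writes as "5[1-5][0-9]{14}". A RegexpQuery is tested against each indexed term as a whole, so the sketch uses matches() rather than find(). It also shows why the token layout matters: the same digits split into four 4-digit terms produce no match at all. The class name and sample terms are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// java.util.regex rendering of the Lucene RegExp "5{1}<1-5>{1}<0-9>{14}":
// a '5', then one digit 1-5, then 14 digits (16 digits in total).
final class WholeTermMatch {
    static final Pattern MASTERCARD_16 = Pattern.compile("5[1-5][0-9]{14}");

    // A regexp query tests each term as a whole, so use matches(), not find():
    // a 16-digit run inside a longer term must not count.
    static boolean termMatches(String term) {
        return MASTERCARD_16.matcher(term).matches();
    }

    static long countMatchingTerms(List<String> terms) {
        return terms.stream().filter(WholeTermMatch::termMatches).count();
    }

    public static void main(String[] args) {
        // One 16-digit term versus the same digits split into four terms.
        List<String> joined = Arrays.asList("5106792294698422");
        List<String> split = Arrays.asList("5106", "7922", "9469", "8422");
        System.out.println(countMatchingTerms(joined) + " vs " + countMatchingTerms(split)); // prints 1 vs 0
    }
}
```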