Or take a look at the search.regex.RegexQuery contrib module. You won't be able to use that via QueryParser either.
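
To make the pattern concrete, here is a rough sketch in plain java.util.regex rather than Lucene, so it runs on its own. SsnScrubber and its methods are made-up names for illustration, and the expression assumes the hyphenated ddd-dd-dddd form. Whether the same expression would match through RegexQuery depends on whether your analyzer keeps the number as a single term, which is an assumption I have not verified.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene.
class SsnScrubber {
    // Three digits, two digits, four digits, separated by hyphens.
    // The \b anchors stop it from matching inside a longer run of digits.
    private static final Pattern SSN =
            Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    // True if the text contains at least one SSN-shaped number.
    static boolean containsSsn(String text) {
        return SSN.matcher(text).find();
    }

    // Replaces every SSN-shaped number with a fixed placeholder.
    static String redact(String text) {
        return SSN.matcher(text).replaceAll("XXX-XX-XXXX");
    }

    public static void main(String[] args) {
        String doc = "Employee 123-45-6789 joined in 2012.";
        System.out.println(containsSsn(doc)); // true
        System.out.println(redact(doc));      // Employee XXX-XX-XXXX joined in 2012.
    }
}
```

The same check can be run on the raw text of each document, independent of how it was tokenized.
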
It might make more sense to do the sanitizing before indexing rather than after.

--
Ian.

On Fri, Mar 2, 2012 at 7:26 AM, Trejkaz <trej...@trypticon.org> wrote:
> On Fri, Mar 2, 2012 at 6:22 PM, su ha <s_han...@yahoo.com> wrote:
>> Hi,
>> I'm new to Lucene. I've indexed some documents with Lucene and need to
>> sanitize it to ensure
>> that they do not have any social security numbers (3-digits 2-digits
>> 4-digits).
>>
>> (How) Can I write a query (with the QueryParser) that searches for this
>> pattern?
>>
>> e.g. I can do [000 to 999] or [00 to 99] or [0000 to 9999], but this causes
>> hits with any 2, 3 or 4 digit number.
>> Something like "[000 to 999] [00 TO 99] [0000 TO 9999]", I get no hits at
>> all.
>>
>> Is this possible with the default QueryParser?
>> Or is there some other programmatic way to do it?
>
> The programmatic way is to use SpanMultiTermQueryWrapper around each
> RangeQuery and then SpanNearQuery around the lot.
>
> The default QueryParser probably can't do it. I believe someone was
> enhancing it for wildcards but I'm not sure if range queries were
> included in all that.
>
> TX
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org