On Jul 31, 2008, at 10:06 PM, Christopher M Collins wrote:
I'm trying to use SpanRegexQuery as one of the clauses in my SpanQuery. When I give it a regex like: "L[a-z]+ing" and do a rewrite on the final query I get terms like "Labinger" and "Lackonsingh" along with the expected
terms "Labeling", "Lacing", etc.  It's as if the regex is treated as a
"find()" and not a "match()" in Java. Is there a way to make it behave
like a full match, and not a prefix regex?

There are two implementations of the regex engine built into SpanRegexQuery, one using Java's java.util.regex, the other using Jakarta Regexp. The default implementation is java.util.regex, which matches like this:

  pattern.matcher(string).lookingAt()

And Jakarta Regexp matches like this:

  regexp.match(string)

I'm not sure of the exact differences between the two without running some tests, but they certainly should, ahem, match, at least in whether or not they imply ^string$. At a quick glance at the respective javadocs, though, it does seem like the java.util.regex implementation should be using pattern.matcher(string).matches() instead. lookingAt() always starts at the beginning of the term, so there is an implied ^string effect, but it does not have to consume the whole term, which is why "Labinger" matches "L[a-z]+ing" (via the prefix "Labing"). The Jakarta Regexp implementation has no such implied ^.
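A quick standalone test along those lines, using only java.util.regex (not Lucene's classes), compares the three Matcher methods on the terms from the original question. The class name RegexModes and the extra term "BLacing" are made up for illustration, and find() is used only as a rough stand-in for unanchored, search-anywhere semantics; whether it exactly mirrors Jakarta Regexp's match() is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical test class: compares lookingAt(), matches() and find()
// for the same pattern over a few terms.
public class RegexModes {
    // lookingAt(): anchored at the start of the term, but not at the end
    static boolean lookingAt(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // matches(): the entire term must match (an implied ^...$)
    static boolean matches(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    // find(): the pattern may match anywhere in the term (no implied anchors)
    static boolean find(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        String regex = "L[a-z]+ing";
        String[] terms = {"Labeling", "Lacing", "Labinger", "Lackonsingh", "BLacing"};
        for (String t : terms) {
            System.out.printf("%-12s lookingAt=%-5b matches=%-5b find=%b%n",
                t, lookingAt(regex, t), matches(regex, t), find(regex, t));
        }
    }
}
```

With "L[a-z]+ing", "Labinger" and "Lackonsingh" pass lookingAt() (the prefixes "Labing" and "Lackonsing" match) but fail matches(), while "BLacing" passes only find().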

As Daniel mentioned, putting a $ at the end should do the trick, and it seems to me that it really should be necessary... but then so should a ^ in front, if you want the match to start at the beginning of the term rather than anywhere in it.
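A small sketch of the anchoring workaround, again in plain java.util.regex with a made-up class name (AnchorDemo): under lookingAt() semantics a trailing $ is enough, while a search-anywhere engine (approximated here with find(), an assumption about the unanchored case) also needs the leading ^.

```java
import java.util.regex.Pattern;

// Hypothetical demo of explicit anchors under two matching styles.
public class AnchorDemo {
    // Start-anchored style, like the java.util.regex implementation's lookingAt()
    static boolean lookingAt(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Unanchored, search-anywhere style
    static boolean find(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        String[] terms = {"Labeling", "Labinger", "BLacing"};
        for (String t : terms) {
            System.out.printf("%-9s lookingAt(L[a-z]+ing$)=%-5b find(L[a-z]+ing$)=%-5b find(^L[a-z]+ing$)=%b%n",
                t,
                lookingAt("L[a-z]+ing$", t),
                find("L[a-z]+ing$", t),
                find("^L[a-z]+ing$", t));
        }
    }
}
```

Here "Labinger" is rejected as soon as the $ is added, while "BLacing" still slips through find("L[a-z]+ing$") until the ^ is added as well.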

Changing JavaUtilRegexCapabilities to use matches() seems like the right thing to do, but that'd break backwards compatibility. *ugh*
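For comparison, a matches()-based variant might look roughly like the following. This is a hypothetical, standalone class (FullMatchRegexCapabilities, with compile() and match() methods chosen for illustration); it does not implement or reproduce Lucene's actual interface. It also shows the compatibility concern: a term such as "Labinger" that matched "L[a-z]+ing" under lookingAt() no longer matches.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Lucene's JavaUtilRegexCapabilities:
// compiles a pattern once and requires each term to match in full.
public class FullMatchRegexCapabilities {
    private Pattern pattern;

    public void compile(String regex) {
        pattern = Pattern.compile(regex);
    }

    // matches() requires the whole term to match (an implied ^...$),
    // unlike lookingAt(), which only has to match a prefix.
    public boolean match(String term) {
        if (pattern == null) {
            throw new IllegalStateException("compile() must be called first");
        }
        return pattern.matcher(term).matches();
    }

    public static void main(String[] args) {
        FullMatchRegexCapabilities caps = new FullMatchRegexCapabilities();
        caps.compile("L[a-z]+ing");
        for (String term : new String[] {"Labeling", "Lacing", "Labinger", "Lackonsingh"}) {
            System.out.println(term + " -> " + caps.match(term));
        }
    }
}
```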

        Erik

