On Jul 31, 2008, at 10:06 PM, Christopher M Collins wrote:
I'm trying to use SpanRegexQuery as one of the clauses in my SpanQuery.
When I give it a regex like: "L[a-z]+ing" and do a rewrite on the final
query I get terms like "Labinger" and "Lackonsingh" along with the
expected terms "Labeling", "Lacing", etc. It's as if the regex is treated
as a "find()" and not a "match()" in Java. Is there a way to make it
behave like a full match, and not a prefix regex?
There are two implementations of the regex engine built into
SpanRegexQuery, one using Java's java.util.regex, the other using
Jakarta Regexp. The default implementation is java.util.regex, which
matches like this:
pattern.matcher(string).lookingAt()
And Jakarta Regexp matches like this:
regexp.match(string)
I'm not sure of the exact differences between these two without doing
some tests, but certainly they should, ahem, match in at least the
expectation of whether there is an implied ^string$ or not. At a
quick glance at the respective javadocs, it does seem like the
java.util.regex implementation should be using
pattern.matcher(string).matches() instead. lookingAt() always starts
at the beginning, so there is an implied ^string effect, but not so
with the Jakarta Regexp implementation.
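
To make the lookingAt() versus matches() difference concrete, here is a small standalone sketch using only java.util.regex (no Lucene involved). The class and method names are made up for illustration; the terms are the ones from the original question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Lucene.
public class RegexAnchoringDemo {

    // Anchored at the start only: succeeds if some prefix of the term matches.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Anchored at both ends: the entire term must match.
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        String regex = "L[a-z]+ing";
        String[] terms = {"Labeling", "Lacing", "Labinger", "Lackonsingh"};
        for (String term : terms) {
            System.out.println(term
                + " lookingAt=" + prefixMatch(regex, term)
                + " matches=" + fullMatch(regex, term));
        }
    }
}
```

With lookingAt(), "Labinger" is accepted because the prefix "Labing" satisfies the pattern; matches() rejects it because the trailing "er" is left over.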
As Daniel mentioned, putting a $ at the end should do the trick, and
it seems to me that it really should be necessary... but then so should
a ^ in front if you want the match to start at the beginning and not
anywhere in the string.
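
A rough sketch of the anchoring workaround, again with plain java.util.regex rather than Lucene. The class and method names are hypothetical, and Matcher.find() is used here only as a stand-in for any engine that searches anywhere in the string; I have not checked it against Jakarta Regexp itself.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Lucene.
public class EndAnchorWorkaround {

    // Same call the java.util.regex implementation makes today.
    static boolean lookingAt(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Stand-in for an engine with no implied anchors at all.
    static boolean findAnywhere(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        // A trailing $ is enough when the start is already anchored.
        System.out.println(lookingAt("L[a-z]+ing$", "Labeling"));
        System.out.println(lookingAt("L[a-z]+ing$", "Labinger"));

        // Without an implied start anchor, ^ is needed as well.
        System.out.println(findAnywhere("L[a-z]+ing$", "unLabeling"));
        System.out.println(findAnywhere("^L[a-z]+ing$", "unLabeling"));
    }
}
```

Writing the pattern as "^L[a-z]+ing$" should behave the same under either style of matching, which is probably the safest thing to do until the two implementations agree.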
Changing JavaUtilRegexCapabilities to use matches() seems like the
right thing to do, but that'd break backwards compatibility. *ugh*
Erik