Re: Python regular expression question!

Ant Wed, 20 Sep 2006 12:41:49 -0700

unexpected wrote:
> > \b matches the beginning/end of a word (characters a-zA-Z_0-9).
> > So that regex will match e.g. MULTX-FOO but not MULTX-.
> >
>
> So is there a way to get \b to include - ?


No, but you can get the behaviour you want using a negative lookahead.
The following regex behaves like the end-of-word case of \b, with -
treated as a word character:

pattern = r"(?![a-zA-Z0-9_-])"

This matches at any position where the next character is not in the set
[a-zA-Z0-9_-] (or at the end of the string), without consuming that
character. For example:

>>> import re
>>> p = re.compile(r".*?(?![a-zA-Z0-9_-])(.*)")
>>> s = "aabbcc_d-f-.XXX YYY"
>>> m = p.search(s)
>>> print(m.group(1))
.XXX YYY

Note that the regex recognises the '.' as the end of the word, but
doesn't use it up in the match, so it is present in the final capturing
group. Contrast it with:

>>> p = re.compile(r".*?[^a-zA-Z0-9_-](.*)")
>>> s = "aabbcc_d-f-.XXX YYY"
>>> m = p.search(s)
>>> print(m.group(1))
XXX YYY

Note here that "[^a-zA-Z0-9_-]" still denotes the end of the word, but
this time consumes it, so it doesn't appear in the final captured group.
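
The lookahead above only covers the end of a word. If you also need the
start-of-word side, one way to sketch it is to pair the lookahead with a
negative lookbehind on the same character set. The helper name bounded()
and the test strings below are made up for illustration, and I am
assuming the token you want to match is the literal MULTX- from earlier
in the thread:

```python
import re

# Characters treated as "word" characters: the usual \w set plus '-'.
WORD = r"[a-zA-Z0-9_-]"

def bounded(literal):
    """Hypothetical helper: match `literal` only when it is not
    preceded or followed by a character from WORD."""
    return re.compile(r"(?<!%s)%s(?!%s)" % (WORD, re.escape(literal), WORD))

p = bounded("MULTX-")

print(bool(p.search("see MULTX- here")))     # True: stands alone
print(bool(p.search("see MULTX-FOO here")))  # False: followed by 'F'
print(bool(p.search("see XMULTX- here")))    # False: preceded by 'X'
```

Both assertions are zero-width, so neither neighbouring character ends
up in the match, and the pattern also matches at the very start or end
of the string.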

-- 
http://mail.python.org/mailman/listinfo/python-list
