Re: regex walktrough

MRAB Sat, 08 Dec 2012 10:12:24 -0800

On 2012-12-08 17:48, rh wrote:

  Looking through some code, I found this and wondered what it does:
^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$


Here's my walk through:

1) ^ match at start of string
2) ?P<salsipuedes> if a match is found it will be accessible in a variable
salsipuedes
3) [0-9A-Za-z-_.//] this is the one that looks wrong to me, see below
4) + one or more from the preceding char class
5) () the grouping we want returned (see #2)
6) $ end of the string to match against but before any newline


more on #3
The z-_ part looks wrong: it seems that the - should be at the start
of the char set, otherwise we get another range, z-_. Or does the a-z
preceding it prevent z-_ from becoming a range? The "."
might be OK inside a char set. The two slashes look wrong, but maybe
they have some special meaning in some case? I think only one slash is
needed.

I've looked at pydoc re, but it's cursory.

Python itself will help you:

>>> re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$", flags=re.DEBUG)
at at_beginning
subpattern 1
  max_repeat 1 65535
    in
      range (48, 57)
      range (65, 90)
      range (97, 122)
      literal 45
      literal 95
      literal 46
      literal 47
      literal 47
at at_end
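
The numbers in the DEBUG output are character codes. A small sketch to
turn them back into characters (the variable names are mine, not from
the code being discussed):

```python
# Code points copied by hand from the DEBUG output above.
ranges = [(48, 57), (65, 90), (97, 122)]
literals = [45, 95, 46, 47, 47]

# chr() maps a code point back to its character.
decoded_ranges = [f"{chr(lo)}-{chr(hi)}" for lo, hi in ranges]
decoded_literals = [chr(cp) for cp in literals]

print(decoded_ranges)    # ['0-9', 'A-Z', 'a-z']
print(decoded_literals)  # ['-', '_', '.', '/', '/']
```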

Inside the character set: "0-9", "A-Z" and "a-z" are ranges; "-", "_",
"." and "/" are literals. Doubling the "/" is unnecessary (it has no
special meaning). The "-" after "a-z" is a literal because it immediately
follows a range, so it can't start another one. A "-" defines a range
only when it sits between two single characters; directly before the
closing "]" (or directly after the opening "[") it is a literal too, so
r"[a-]" is the same as r"[a\-]".
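
One way to check this is to compare the original pattern with a tidied
spelling on a few strings. This is only a sketch: the "tidied" pattern
and the sample strings are my own assumptions, not something from the
code in question.

```python
import re

original = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$")
# Hypothetical tidied spelling: "-" moved to the end, a single "/".
tidied = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z_./-]+)$")

samples = ["a-b", "path/to/file.txt", "snake_case", "a^b", "a b", ""]

# For each sample, record whether each pattern matches.
results = [(s, bool(original.match(s)), bool(tidied.match(s)))
           for s in samples]

for s, a, b in results:
    print(repr(s), a, b)

# A trailing "-" in a set is a literal, escaped or not.
unescaped = re.fullmatch(r"[a-]+", "a-a") is not None
escaped = re.fullmatch(r"[a\-]+", "a-a") is not None
print(unescaped, escaped)  # True True
```

Both patterns should agree on every sample: the first three match, the
last three do not.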

As for "(?P<salsipuedes>...)", it won't be accessible in a variable
"salsipuedes", but will be accessible as a named group in the match
object:

>>> m = re.match(r"(?P<foo>[a-z]+)", "xyz")
>>> m.group("foo")
'xyz'
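
The same group can also be reached by position or through
groupdict(). A short sketch reusing the "foo" example above (the
variable names are only for illustration):

```python
import re

m = re.match(r"(?P<foo>[a-z]+)", "xyz")

by_name = m.group("foo")   # look the group up by its name
by_index = m.group(1)      # the same group, by position
as_dict = m.groupdict()    # all named groups as a dict

print(by_name, by_index, as_dict)  # xyz xyz {'foo': 'xyz'}
```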

--
http://mail.python.org/mailman/listinfo/python-list
