On 2012-12-08 17:48, rh wrote:
Looking through some code, I found this and wondered what it does:
^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$
Here's my walk through:
1) ^ match at start of string
2) ?P<salsipuedes> if a match is found it will be accessible in a variable
salsipuedes
3) [0-9A-Za-z-_.//] this is the one that looks wrong to me, see below
4) + one or more from the preceding char class
5) () the grouping we want returned (see #2)
6) $ end of the string to match against but before any newline
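One way to check the walk-through is to run the pattern against a few strings. This is a minimal sketch. The helper name `check` and the sample strings are made up for illustration, and `fullmatch` needs Python 3.4 or later. It also shows a detail of item 6: without re.MULTILINE, "$" matches at the very end or just before a single trailing newline. So a string ending in "\n" still matches, while `fullmatch` rejects it.

```python
import re

# The pattern from the post, compiled once.
PATTERN = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$")

def check(s):
    """Return the captured text, or None if the pattern doesn't match (hypothetical helper)."""
    m = PATTERN.match(s)
    return m.group("salsipuedes") if m else None

print(check("release-1.2_final/docs"))  # every character is in the set
print(check("has space"))               # space is not in the set, so None
print(check(""))                        # "+" needs at least one character, so None
print(check("abc\n"))                   # "$" matches before the trailing newline: abc
print(PATTERN.fullmatch("abc\n"))       # fullmatch must consume the "\n" too, so None
```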
more on #3
The z-_ part looks wrong; it seems the - should be at the start
of the char set, otherwise we get another range, z-_. Or does the a-z
preceding the z-_ stop z-_ from becoming a range? The "."
might be OK inside a char set. The two slashes look wrong, but maybe
they have some special meaning in some case? I think only one slash is
needed.
I've looked at pydoc re, but it's cursory.
Python itself will help you:
>>> re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$", flags=re.DEBUG)
at at_beginning
subpattern 1
  max_repeat 1 65535
    in
      range (48, 57)
      range (65, 90)
      range (97, 122)
      literal 45
      literal 95
      literal 46
      literal 47
      literal 47
at at_end
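Another way to let Python answer is to test every ASCII character against the character set on its own. In this small sketch, the name `CHAR_SET` is just for illustration. The result is the three ranges plus the four literals "-", ".", "/" and "_", with "/" appearing once even though the pattern writes it twice.

```python
import re

# The character set from the pattern, on its own.
CHAR_SET = re.compile(r"[0-9A-Za-z-_.//]")

# Collect every ASCII character that the set accepts, in code-point order.
accepted = "".join(chr(i) for i in range(128) if CHAR_SET.fullmatch(chr(i)))
print(accepted)
# -./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
```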
Inside the character set, "0-9", "A-Z" and "a-z" are ranges; "-", "_",
"." and "/" are literals. Doubling the "/" is unnecessary: "/" has no
special meaning, so the second one just repeats a character already in
the set. The "-" is a literal because it immediately follows a range, so
it can't start another range. A "-" that immediately followed a literal
would define a range, unless it was immediately followed by an unescaped
"]"; that's why r"[a-]" is the same as r"[a\-]".
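The hyphen rules can be checked the same way. This is a minimal sketch, and the helper name `accepted` is made up for the example. It shows three things. First, r"[a-]" and r"[a\-]" accept the same characters. Second, a real z-_ range is rejected at compile time, because "_" (95) comes before "z" (122). Third, a tidier spelling of the original set, with the hyphen moved to the end and a single slash, accepts the same ASCII characters.

```python
import re

def accepted(char_class):
    """Return the ASCII characters matched by a one-character regex (hypothetical helper)."""
    pat = re.compile(char_class)
    return "".join(chr(i) for i in range(128) if pat.fullmatch(chr(i)))

# "-" right before an unescaped "]" is a literal, just like an escaped "\-".
print(accepted(r"[a-]") == accepted(r"[a\-]"))   # True

# "-" between two literals makes a range; z-_ runs backwards, so it's an error.
try:
    re.compile(r"[z-_]")
except re.error as exc:
    print("re.error:", exc)

# Hyphen moved to the end, slash written once: same set as the original.
print(accepted(r"[0-9A-Za-z-_.//]") == accepted(r"[0-9A-Za-z_./-]"))  # True
```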
As for "(?P<salsipuedes>...)", it won't be accessible in a variable
"salsipuedes", but will be accessible as a named group in the match
object:
>>> m = re.match(r"(?P<foo>[a-z]+)", "xyz")
>>> m.group("foo")
'xyz'
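Named groups are also numbered, and the match object can return all of them at once as a dict. This short sketch uses the pattern from the question with an invented input string. Indexing a match with m["name"] needs Python 3.6 or later.

```python
import re

pattern = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$")
m = pattern.match("some/path_1.0")

if m:
    print(m.group("salsipuedes"))   # by name
    print(m.group(1))               # the same group by number
    print(m["salsipuedes"])         # subscript shorthand (Python 3.6+)
    print(m.groupdict())            # {'salsipuedes': 'some/path_1.0'}

# The group names live on the compiled pattern as well.
print(pattern.groupindex)           # maps 'salsipuedes' to group number 1
```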
--
http://mail.python.org/mailman/listinfo/python-list