On 8/12/12 18:48:13, rh wrote:
> Looking through some code I found this and wondered about what it does:
> ^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$
>
> Here's my walk through:
>
> 1) ^ match at start of string
> 2) ?P<salsipuedes> if a match is found it will be accessible in a
>    variable salsipuedes

I wouldn't call it a variable. If m is a match object produced by this
regex, then m.group('salsipuedes') will return the part that was captured.

I'm not sure, though, why you'd want to define a group that effectively
spans the whole regex. If there's a match, then m.group(0) will return
the matching substring, and m.group('salsipuedes') will return the
substring that matched the parenthesized part of the pattern. These two
substrings will be equal, since the only bits of the pattern outside the
parentheses are zero-width assertions.

> 3) [0-9A-Za-z-_.//] this is the one that looks wrong to me, see below
> 4) + one or more from the preceding char class
> 5) () the grouping we want returned (see #2)
> 6) $ end of the string to match against but before any newline
>
> more on #3
> the z-_ part looks wrong and seems that the - should be at the start
> of the char set otherwise we get another range z-_ or does the a-z
> preceding the z-_ negate the z-_ from becoming a range?

The latter: a-z is a range, and it blocks z-_ from being a range. The z
is already used up as the end point of a-z, so the - that follows is
taken literally. Consequently, the -_ bit matches only - and _.

> The "." might be ok inside a char set.

It is. Most special characters lose their special meaning inside a
char set.

> The two slashes look wrong but maybe it has some special meaning
> in some case? I think only one slash is needed.

You're correct: there's no special meaning and only one slash is needed.
But then, a char set is a set and duplicates are simply ignored, so it
does no harm.

Perhaps the person who wrote this was confusing slashes and backslashes.

> I've looked at pydoc re, but it's cursory.

That's one way of putting it.

Hope this helps,

-- HansM
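
P.S. If you'd rather check these points than take my word for it, here is a minimal sketch you can run. The sample strings and the variable names (rx, one_slash, samples and so on) are made up for illustration; only the pattern comes from your post.

```python
import re

# The pattern from the original post, unchanged.
rx = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$")

# The named group spans everything except the zero-width ^ and $,
# so it captures the same text as group(0).
m = rx.match("some/path-to_file.txt")
whole = m.group(0)
named = m.group("salsipuedes")
groups_equal = whole == named

# After the range a-z, the '-' and '_' are plain literals.
dash_matches = rx.match("-") is not None
underscore_matches = rx.match("_") is not None

# On its own, z-_ would be a reversed range ('z' sorts after '_'),
# which the re module refuses to compile.
try:
    re.compile(r"[z-_]")
    reversed_range_compiles = True
except re.error:
    reversed_range_compiles = False

# Inside a char set '.' is a literal dot, not "any character".
dot_is_literal = rx.match("a.b") is not None and rx.match("a!b") is None

# The doubled slash is redundant: a variant with a single slash
# accepts and rejects exactly the same sample strings.
one_slash = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_./]+)$")
samples = ["a/b", "a//b", "a b", "", "x_y-z.1"]
same_verdicts = all(
    (rx.match(s) is None) == (one_slash.match(s) is None) for s in samples
)

# $ also matches just before a single trailing newline.
trailing_newline = rx.match("abc\n")
```

If you want to rule out that trailing newline, re.fullmatch() or \Z in place of $ should do it.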