On Sunday, January 26, 2014 12:08:01 PM UTC-5, Chris Angelico wrote:
> On Mon, Jan 27, 2014 at 3:59 AM, Blake Adams <blakesad...@gmail.com> wrote:
> > If I want to set up a match replicating the '\w' pattern I would assume
> > that would be done with '[A-z0-9_]'. However, when I run the following:
> >
> > re.findall('[A-z0-9_]','^;z %C\@0~_') it matches ['^', 'z', 'C', '\\', '0',
> > '_']. I would expect the match to be ['z', 'C', '0', '_'].
> >
> > Why does this happen?
>
> Because \w is not the same as [A-z0-9_]. Quoting from the docs:
>
> """
> \w For Unicode (str) patterns: Matches Unicode word characters; this
> includes most characters that can be part of a word in any language,
> as well as numbers and the underscore. If the ASCII flag is used, only
> [a-zA-Z0-9_] is matched (but the flag affects the entire regular
> expression, so in such cases using an explicit [a-zA-Z0-9_] may be a
> better choice). For 8-bit (bytes) patterns: Matches characters
> considered alphanumeric in the ASCII character set; this is equivalent
> to [a-zA-Z0-9_].
> """
>
> If you're working with a byte string, then you're close, but A-z is
> quite different from A-Za-z. The set [A-z] is equivalent to
> [ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz] (that's
> a literal backslash in there, btw), so it'll also catch several
> non-alphabetic characters. With a Unicode string, it's quite
> distinctly different. Either way, \w means "word characters", though,
> so just go ahead and use it whenever you want word characters :)
>
> ChrisA
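
For anyone who finds this thread later, here is a minimal sketch of the difference, assuming Python 3 and the same sample string as in the question. The variable names (text, loose, strict, ascii_word, extras) are just for illustration and are not from the original posts; the sample is written as a raw string so the backslash stays literal.

```python
import re

# Same characters as the sample in the question, written as a raw string
# so that the backslash is taken literally.
text = r'^;z %C\@0~_'

# [A-z] is a single range from 'A' (0x41) to 'z' (0x7A), so it also covers
# the six punctuation characters that sit between 'Z' and 'a' in ASCII.
loose = re.findall(r'[A-z0-9_]', text)

# Two separate ranges give only the letters, digits and underscore.
strict = re.findall(r'[A-Za-z0-9_]', text)

# \w with the ASCII flag is documented as equivalent to [a-zA-Z0-9_].
ascii_word = re.findall(r'\w', text, flags=re.ASCII)

# The non-letters that the A-z range picks up.
extras = [chr(c) for c in range(ord('A'), ord('z') + 1)
          if not chr(c).isalpha()]

print(loose)       # ['^', 'z', 'C', '\\', '0', '_']
print(strict)      # ['z', 'C', '0', '_']
print(ascii_word)  # ['z', 'C', '0', '_']
print(extras)      # ['[', '\\', ']', '^', '_', '`']
```

Without re.ASCII, \w on a str pattern would also match non-ASCII word characters, which is the "quite distinctly different" case mentioned above.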

Thanks Chris
-- 
https://mail.python.org/mailman/listinfo/python-list