Re: re Questions

Mark Lawrence Sun, 26 Jan 2014 09:33:45 -0800

On 26/01/2014 17:15, Blake Adams wrote:

On Sunday, January 26, 2014 12:08:01 PM UTC-5, Chris Angelico wrote:

On Mon, Jan 27, 2014 at 3:59 AM, Blake Adams <[email protected]> wrote:

If I want to set up a match replicating the '\w' pattern I would assume that 
would be done with '[A-z0-9_]'.  However, when I run the following:

re.findall('[A-z0-9_]','^;z %C\@0~_') it matches ['^', 'z', 'C', '\\', '0', 
'_'].  I would expect the match to be ['z', 'C', '0', '_'].

Why does this happen?




Because \w is not the same as [A-z0-9_]. Quoting from the docs:



"""

\w For Unicode (str) patterns: Matches Unicode word characters; this

includes most characters that can be part of a word in any language,

as well as numbers and the underscore. If the ASCII flag is used, only

[a-zA-Z0-9_] is matched (but the flag affects the entire regular

expression, so in such cases using an explicit [a-zA-Z0-9_] may be a

better choice). For 8-bit (bytes) patterns: Matches characters

considered alphanumeric in the ASCII character set; this is equivalent

to [a-zA-Z0-9_].

"""
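
As a rough illustration of the quoted documentation (not part of the original message), the sketch below runs the same \w pattern with and without the ASCII flag. The sample string and variable names are made up for the example.

```python
import re

# Hypothetical sample text; \u00ef is i-diaeresis and \u00e9 is e-acute.
text = "na\u00efve_caf\u00e9 42"

# Unicode (str) pattern: \w also matches the accented letters.
unicode_matches = re.findall(r"\w+", text)

# Same pattern with re.ASCII: only [a-zA-Z0-9_] counts as a word character,
# so the accented letters split the word.
ascii_matches = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_matches)
print(ascii_matches)
```

With the flag set, the accented letters are no longer word characters, so the first word comes back in pieces.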



If you're working with a byte string, then you're close, but A-z is

quite different from A-Za-z. The set [A-z] is equivalent to

[ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz] (that's

a literal backslash in there, btw), so it'll also catch several

non-alphabetic characters. With a Unicode string, it's quite

distinctly different. Either way, \w means "word characters", though,

so just go ahead and use it whenever you want word characters :)
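
A small sketch of the point above, assuming a str pattern on Python 3: it lists the non-letters that fall inside the range A-z and reruns the original findall next to the corrected character class. The variable names are illustrative only.

```python
import re

# The string from the original question, written as a raw string.
sample = r'^;z %C\@0~_'

# Every character in the range A-z (code points 65 through 122).
covered = ''.join(chr(c) for c in range(ord('A'), ord('z') + 1))

# The non-letters that the range picks up between 'Z' and 'a'.
extras = [ch for ch in covered if not ch.isalpha()]

wide = re.findall(r'[A-z0-9_]', sample)       # the class from the question
narrow = re.findall(r'[A-Za-z0-9_]', sample)  # the class that was intended

print(extras)   # ['[', '\\', ']', '^', '_', '`']
print(wide)     # ['^', 'z', 'C', '\\', '0', '_']
print(narrow)   # ['z', 'C', '0', '_']
```

The six extra characters sit between 'Z' and 'a' in ASCII, which is why '^' and the backslash showed up in the original result.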



ChrisA


Thanks Chris


I'm pleased to see that your question has been answered.

Now would you please read and action this https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the double line spacing above, thanks.

--

My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.


Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list
