Re: regular expression, unicode

2009-04-30 Thread Simon Strobl
Thanks for your hints. Usually, all my files are utf-8. Obviously, I somehow managed to inadvertently switch the encoding when creating this specific file. I have no idea how this could happen.

Simon
--
http://mail.python.org/mailman/listinfo/python-list
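The resolution above suggests the pattern itself was fine and the input file was simply not UTF-8. A minimal sketch of how one might guard against that, assuming the failure came from decoding the input: read bytes instead of text and decode each line explicitly. The helper names `decode_line` and `matching_lines` and the Latin-1 fallback are illustrative assumptions, not something from the thread.

```python
import io
import re

# Pattern from the original question; in Python 3 a str pattern is Unicode.
good = re.compile("^[A-ZÄÖÜ].*")


def decode_line(raw: bytes) -> str:
    """Decode as UTF-8, falling back to Latin-1 (an assumed fallback)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def matching_lines(stream):
    """Yield decoded lines from a binary stream that match the pattern."""
    for raw in stream:
        line = decode_line(raw).rstrip("\n")
        if good.match(line):
            yield line


# Simulated input: one UTF-8 line, one Latin-1 line, one non-matching line.
sample = io.BytesIO(
    "Äpfel\n".encode("utf-8") + "Öl\n".encode("latin-1") + b"birne\n"
)
result = list(matching_lines(sample))
print(result)
```

In a real script the binary stream would be `sys.stdin.buffer` rather than the simulated `io.BytesIO` object.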

regular expression, unicode

2009-04-29 Thread Simon Strobl
Hello,

why can't I use this statement in python3:

good = re.compile("^[A-ZÄÖÜ].*")

According to the documentation, patterns can be unicode strings. I get this error message:

Traceback (most recent call last):
  File "./get.py", line 8, in <module>
    for line in sys.stdin:
  File "/usr/lib64/python3.

Re: regular expression, unicode

2009-04-29 Thread MRAB
Simon Strobl wrote:
> Hello,
>
> why can't I use this pattern
>
> good = re.compile("^[A-ZÄÖÜ].*")
>
> in python3? According to the documentation, patterns may be unicode
> strings. I get this error message:
>
> Traceback (most recent call last):
>   File "./get.py", line 8, in <module>
>     for line in sys.stdin:
>   File

Re: regular expression, unicode

2009-04-29 Thread Rhodri James
On Wed, 29 Apr 2009 12:44:12 +0100, Simon Strobl wrote:
> why can't I use this pattern
>
> good = re.compile("^[A-ZÄÖÜ].*")
>
> in python3? According to the documentation, patterns may be unicode
> strings. I get this error message:
>
> Traceback (most recent call last):
>   File "./get.py", line 8, in

regular expression, unicode

2009-04-29 Thread Simon Strobl
Hello,

why can't I use this pattern

good = re.compile("^[A-ZÄÖÜ].*")

in python3? According to the documentation, patterns may be unicode strings. I get this error message:

Traceback (most recent call last):
  File "./get.py", line 8, in <module>
    for line in sys.stdin:
  File "/usr/lib64/python3.0/

Re: regular expression unicode character class trouble

2005-09-05 Thread Diez B. Roggisch
Steven Bethard wrote:
> I'd use something like r"[^_\d\W]", that is, all things that are neither
> underscores, digits or non-alphas. In action:
>
> py> re.findall(r'[^_\d\W]+', '42badger100x__xxA1BC')
> ['badger', 'x', 'xxA', 'BC']
>
> HTH,

Seems so, great!

Diez
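For reference, a small sketch of the quoted character class under Python 3, where `str` patterns are Unicode-aware by default (an assumption about the reader's interpreter; the 2005 thread predates it, and older versions would need the `re.UNICODE` flag). The second sample string is made up for illustration.

```python
import re

# Word characters minus underscore and digits. For str patterns in
# Python 3, \w, \d and \W follow Unicode rules without extra flags.
letters = re.compile(r"[^_\d\W]+")

# The example string from the thread.
ascii_result = letters.findall("42badger100x__xxA1BC")

# A made-up string with non-ASCII letters.
unicode_result = letters.findall("Äpfel_123_Öl")

print(ascii_result)    # ['badger', 'x', 'xxA', 'BC']
print(unicode_result)  # ['Äpfel', 'Öl']
```

The double negation works because a character passes the class only if it is a word character (not `\W`) and is neither `_` nor a decimal digit. Note that `\w` covers everything Unicode treats as alphanumeric, so the result may be slightly broader than "letters only" for unusual numeric characters.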

Re: regular expression unicode character class trouble

2005-09-04 Thread Steven Bethard
Diez B. Roggisch wrote:
> Hi,
>
> I need in a unicode-environment the character-class
>
> set("\w") - set("[0-9]")
>
> or alpha w/o num. Any ideas how to create that?

I'd use something like r"[^_\d\W]", that is, all things that are neither underscores, digits or non-alphas. In action:

py> re

regular expression unicode character class trouble

2005-09-04 Thread Diez B. Roggisch
Hi,

in a Unicode environment I need the character class

set("\w") - set("[0-9]")

i.e. alpha w/o num. Any ideas how to create that? And what performance implications do I have to fear? I guess that character classes aren't implemented as sets, but as a comparison function that compares a