Re: Regex similar to "^(?u)\w$", but without digits?

Mark Tolonen Sun, 12 Apr 2009 21:22:26 -0700

"Andreas Pfrengle" <a.pfren...@gmail.com> wrote in message news:26d3bec3-8329-4432-a680-05c17f930...@3g2000yqk.googlegroups.com...

On 12 Apr., 02:31, "Mark Tolonen" <metolone+gm...@gmail.com> wrote:

"Andreas" <a.pfren...@gmail.com> wrote in message


news:f953c845-3660-4bb5-8ba7-00b93989c...@b1g2000vbc.googlegroups.com...

> Hello,

> I'd like to create a regex that captures any unicode character, but
> not the underscore and the digits 0-9. "^(?u)\w$" captures them also.
> Is there a possibility to restrict an expression like "\w" to "\w
> without [0-9_]"?

'(?u)[^\W0-9_]' removes 0-9_ from \w.
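
A quick way to check what that pattern accepts, as a minimal sketch in Python 3 (where str patterns are already Unicode-aware, so the (?u) flag is redundant; on Python 2 the flag and a u'' pattern would be needed). The name WORD_NO_DIGIT and the helper function are made up for illustration. It also shows that non-ASCII numeric characters such as a superscript two still get through, because only 0-9 and _ are subtracted from \w:

```python
import re

# [^\W0-9_] is a double negation: "not a non-word character, and not
# 0-9 or underscore", i.e. \w minus the ASCII digits and the underscore.
WORD_NO_DIGIT = re.compile(r'^[^\W0-9_]$')

def matches_single_char(ch):
    """Return True if ch is exactly one character accepted by the pattern."""
    return WORD_NO_DIGIT.match(ch) is not None

for ch in [u'a', u'\u00e9', u'_', u'5', u'\u00b2']:
    # u'\u00e9' is e-acute, u'\u00b2' is superscript two
    print(repr(ch), matches_single_char(ch))
```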

-Mark


Hello Mark,

haven't tried it yet, but it looks good!
@John: Sorry for being imprecise, I meant *letters*, not *characters*,
so requirement 2 fits my needs.

Note that \w matches alphanumeric Unicode characters. If you only want letters, consider that superscripts (¹²³), fractions (¼½¾), and other characters are also numbers to Unicode. See the unicodedata.category function and http://www.unicode.org/Public/UNIDATA/UCD.html#General_Category_Values.

If you only want letters as considered by the Unicode standard, something like this would give you only Unicode letters (it could be optimized to list ranges of characters):

import unicodedata as ud
u'(?u)[' + u''.join(unichr(n) for n in xrange(65536) if ud.category(unichr(n))[0] == 'L') + u']'
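
For anyone on Python 3, the same idea might look roughly like the sketch below. It is an adaptation, not the original expression: chr and range replace unichr and xrange, and the names is_letter and LETTER_RE are invented here. Like the original, it only covers the Basic Multilingual Plane (code points below 0x10000).

```python
import re
import unicodedata

def is_letter(ch):
    """True if ch belongs to a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith('L')

# Collect every BMP code point whose general category starts with 'L'.
# No letter is a regex metacharacter, so the class needs no escaping.
letters = ''.join(chr(n) for n in range(0x10000) if is_letter(chr(n)))

# One character class listing every letter explicitly; anchored to a
# single character, like the "^...$" form in the original question.
LETTER_RE = re.compile('^[' + letters + ']$')

for ch in ['a', '\u00e9', '\u0416', '5', '_', '\u00b2', '\u00bd']:
    print(repr(ch), LETTER_RE.match(ch) is not None)
```

If a regex is not strictly required, calling is_letter on each character directly avoids building the large character class.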

Hmm, maybe Python 3.0 with its default Unicode strings needs a regex extension to specify the Unicode category to match.


-Mark


--
http://mail.python.org/mailman/listinfo/python-list
