On 03/09/2015 01:26 PM, Antoon Pardon wrote:
> On 09-03-15 at 12:17, Tim Chase wrote:
>> On 2015-03-09 11:37, Wolfgang Maier wrote:
>>> On 03/09/2015 11:23 AM, Antoon Pardon wrote:
>>>> Does anyone know what regular expression to use for a sequence of
>>>> letters? There is a class for alphanumerics but I can't find one
>>>> for just letters, which I find odd.
>>>
>>> how about [a-zA-Z] ?
>>
>> That breaks if you have Unicode letters.  While ugly, since "\w" is
>> composed of "letters, numbers, and underscores", you can assert that
>> the "\w" you find is not a number or underscore by using
>>
>>    (?:(?!_|\d)\w)
>
> So if I understand correctly the following should be a regular expression for
> a python3 identifier.
>
>    (?:(?!_|\d)\w)\w+


No, that is not it. For one thing, a leading underscore is fine in identifier names, although that is easy to fix in your expression. Then there are the Other_ID_Start and Other_ID_Continue properties defined in http://www.unicode.org/Public/6.3.0/ucd/PropList.txt, e.g.,

>>> '\u212E'
'℮'
>>> ℮ = 10
>>> ℮
10

though ℮ is not included in \w.
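A small sketch that makes these points concrete, using only the standard re module. It compares [a-zA-Z] with the \w-based letter pattern, shows that the expression from the question also needs at least two characters (so it rejects a name like x), and then tries one assumed way of applying the underscore fix, (?!\d)\w+. Even with that fix the regex still rejects ℮, because \w never matches it. The names LETTERS, ORIGINAL, IDENT_APPROX and looks_like_identifier are made up for illustration.

```python
import re

# Tim Chase's "letter" trick: a \w character that is neither "_" nor a
# decimal digit.  [^\W\d_] is an equivalent, more compact spelling.
LETTER = r"(?:(?!_|\d)\w)"
LETTERS = re.compile(LETTER + "+")
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")

text = "café_123 naïve"
print(ASCII_LETTERS.findall(text))  # ['caf', 'na', 've']: splits at é and ï
print(LETTERS.findall(text))        # ['café', 'naïve']

# The pattern from the question needs at least two characters.
ORIGINAL = re.compile(LETTER + r"\w+")
print(ORIGINAL.fullmatch("x"))      # None

# One assumed way to apply the underscore fix: a run of \w characters
# that does not start with a decimal digit.  Using \w+ for the whole
# name also admits one-character names.
IDENT_APPROX = re.compile(r"(?!\d)\w+")

def looks_like_identifier(s: str) -> bool:
    return IDENT_APPROX.fullmatch(s) is not None

for name in ["x", "_private", "café", "2fast", "\u212e"]:
    # U+212E is valid in a Python identifier (Other_ID_Start), but \w
    # does not match it, so the regex reports False for it.
    print(repr(name), looks_like_identifier(name))
```

Note that \d only covers decimal digits (category Nd), so other numeric characters such as '½' still pass the letter pattern; str.isalpha() is the stricter test if only letters are wanted.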

> It seems odd that one should need such an ugly expression for something that is
> used rather frequently for parsing computer languages and the like.


There is str.isidentifier(), which returns True if the string is a valid identifier name:

>>> '℮'.isidentifier()
True
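For parsing, one option (a sketch, not the only approach) is to let str.isidentifier() decide what counts as a name instead of a regex. The Python documentation notes that it also returns True for reserved words such as "class" and points to keyword.iskeyword() for those, so the sketch combines the two. is_usable_name and is_dotted_name are hypothetical helper names.

```python
import keyword

def is_usable_name(s: str) -> bool:
    # str.isidentifier() follows the language's identifier rules,
    # including Other_ID_Start characters such as U+212E, but it also
    # accepts reserved words, so those are filtered out separately.
    return s.isidentifier() and not keyword.iskeyword(s)

def is_dotted_name(s: str) -> bool:
    # Hypothetical helper: validate a dotted path such as "os.path.join"
    # by checking each component; an empty component fails isidentifier().
    return all(is_usable_name(part) for part in s.split("."))

for candidate in ["\u212e", "x", "_tmp", "class", "2fast", "a-b"]:
    print(repr(candidate), candidate.isidentifier(), is_usable_name(candidate))

print(is_dotted_name("os.path.join"), is_dotted_name("os..path"))
```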



--
https://mail.python.org/mailman/listinfo/python-list
