[racket-users] Are Regular Expression classes Unicode aware?

Peter W A Wood Thu, 09 Jul 2020 07:19:48 -0700

I was experimenting with regular expressions to try to emulate Python's
isalpha() string method. A simple [a-zA-Z] character class worked for the
English alphabet (ASCII letters):

> (regexp-match? #px"^[a-zA-Z]+$" "hello")
#t
> (regexp-match? #px"^[a-zA-Z]+$" "h1llo")
#f 

It then dawned on me that Python's isalpha() method is Unicode aware:

>>> "é".isalpha()
True
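For comparison, a character-level Racket analogue of isalpha() might look like the sketch below. It assumes Python's documented definition: the string is non-empty and every character has Unicode general category Lu, Ll, Lt, Lm or Lo. The name `isalpha?` is hypothetical, not a Racket built-in.

```racket
#lang racket/base
;; Sketch of a Python-style isalpha? predicate (hypothetical name).
;; Assumes Python's definition: non-empty, and every character is in a
;; Unicode "Letter" general category.

(define letter-categories '(lu ll lt lm lo))

(define (isalpha? s)
  (and (positive? (string-length s))      ; Python: "".isalpha() is False
       (for/and ([c (in-string s)])
         ;; memq returns a list tail or #f; normalise to a boolean
         (and (memq (char-general-category c) letter-categories) #t))))

(isalpha? "hello")  ; => #t
(isalpha? "héllo")  ; => #t
(isalpha? "h1llo")  ; => #f
(isalpha? "")       ; => #f
```

Racket's `char-alphabetic?` would be shorter, but it tests Unicode's Alphabetic property, which also admits some characters outside the Letter categories (for example letter-like numerals in category Nl), so it is not an exact match for Python's behaviour.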

I started scratching my head as to how to achieve the equivalent using a
regular expression in Racket. I tried the same regular expression on a string
containing a non-English character. To my surprise, the regular expression
accepted the string despite the non-ASCII character:

> (regexp-match? #px"^[a-zA-Z]+$" "héllo")
#t

Are Racket regular expression character classes Unicode aware, or is there
some other explanation for why this regular expression matches?
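One way to probe this is sketched below, assuming Racket's #px (pregexp) syntax and its \p{property} escape. It spells é as the explicit escape \u00E9 so the result cannot depend on how the terminal encodes typed input, and it lists the string's code points so one can see exactly what the regexp was given.

```racket
#lang racket/base
;; Compare an ASCII range class with the Unicode letter property \p{L}.
;; "h\u00E9llo" is "héllo" written with an explicit code point.

(define s "h\u00E9llo")

;; [a-zA-Z] is a range of characters; U+00E9 lies outside both ranges.
(regexp-match? #px"^[a-zA-Z]+$" s)    ; => #f

;; \p{L} matches any character in a Letter general category.
;; The backslash is doubled because it sits inside a string literal.
(regexp-match? #px"^\\p{L}+$" s)      ; => #t

;; Inspect what the string actually contains.
(map char->integer (string->list s))  ; => '(104 233 108 108 111)
```

If the code points printed for a string typed at the REPL differ from the list above, the input was not read as the intended characters.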

Peter

