[go-nuts] regexp syntax and named Unicode character classes

Tom Payne Tue, 07 Jan 2020 10:22:44 -0800

Hi,

tl;dr How should I use named Unicode character classes in regexps?


I'm trying to write a regular expression that matches Go identifiers 
<https://golang.org/ref/spec#Identifiers>, which start with a Unicode 
letter or underscore followed by zero or more Unicode letters, decimal 
digits, and/or underscores.

Based on the regexp syntax <https://godoc.org/regexp/syntax>, and the variables 
in the unicode package <https://golang.org/pkg/unicode/#pkg-variables> which 
mention the classes "Letter" and "Number, decimal digit", I was expecting 
to write something like:

  identifierRegexp := regexp.MustCompile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`)

However, this pattern does not compile, giving the error:

  regexp: Compile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`): error parsing regexp: invalid character class range: `\p{Letter}`
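
One way to investigate the error is to check which names the unicode package actually exposes as table keys. The following is a minimal sketch, assuming (this is not confirmed in this message) that the regexp package resolves \p{...} names against the keys of unicode.Categories and unicode.Scripts. The helper isCategoryKey is a hypothetical name introduced only for illustration.

```go
package main

import (
	"fmt"
	"unicode"
)

// isCategoryKey reports whether name is a key of the unicode.Categories map.
// (Hypothetical helper, introduced for illustration only.)
func isCategoryKey(name string) bool {
	_, ok := unicode.Categories[name]
	return ok
}

func main() {
	// Compare the short category codes with the descriptive names
	// that appear in the unicode package documentation.
	names := []string{"L", "Nd", "Letter", "Number, decimal digit"}
	for _, name := range names {
		fmt.Printf("%-24q in unicode.Categories: %v\n", name, isCategoryKey(name))
	}
}
```

If the assumption holds, only names that appear as keys would be usable inside \p{...}. The descriptive names appear in the documentation comments rather than as keys.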

Using the short names for the character classes (L for Letter, Nd for Number, 
decimal digit) does work, however:

  identifierRegexp := regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)

You can play with these regexps on play.golang.org 
<https://play.golang.org/p/WITTbqvom9F>.
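
For a self-contained check, here is a small sketch that runs the short-name pattern against a few sample strings. The function name isIdentifier and the sample inputs are made up for illustration. The pattern only checks the shape of an identifier, so it does not exclude Go keywords.

```go
package main

import (
	"fmt"
	"regexp"
)

// identifierRegexp uses the short Unicode class names:
// \pL for letters and \p{Nd} for decimal digits.
var identifierRegexp = regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)

// isIdentifier reports whether s has the shape of a Go identifier:
// a letter or underscore followed by letters, decimal digits, or underscores.
// (Hypothetical helper, introduced for illustration only.)
func isIdentifier(s string) bool {
	return identifierRegexp.MatchString(s)
}

func main() {
	samples := []string{
		"x",      // single letter
		"_tmp",   // leading underscore
		"αβ1",    // non-ASCII letters followed by a digit
		"9lives", // starts with a digit, so it should not match
		"a-b",    // contains a hyphen, so it should not match
		"",       // empty string, so it should not match
	}
	for _, s := range samples {
		fmt.Printf("%q: %v\n", s, isIdentifier(s))
	}
}
```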

Is it simply an oversight that long-form Unicode class names like "Letter" 
and "Number, decimal digit" are not available for use in regexps, or should 
I be using them differently?

Many thanks,
Tom

