Thank you :) Is this worth adding to the regexp/syntax documentation? I'd happily contribute a patch.
On Tuesday, January 7, 2020 at 7:36:02 PM UTC+1, Ian Lance Taylor wrote:
>
> On Tue, Jan 7, 2020 at 10:22 AM Tom Payne <twp...@gmail.com> wrote:
> >
> > tl;dr How should I use named Unicode character classes in regexps?
> >
> > I'm trying to write a regular expression that matches Go identifiers,
> > which start with a Unicode letter or underscore followed by zero or more
> > Unicode letters, decimal digits, and/or underscores.
> >
> > Based on the regexp syntax, and the variables in the unicode package
> > which mention the classes "Letter" and "Number, decimal digit", I was
> > expecting to write something like:
> >
> > identifierRegexp := regexp.MustCompile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`)
> >
> > However, this pattern does not compile, giving the error:
> >
> > regexp: Compile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`): error parsing regexp: invalid character class range: `\p{Letter}`
> >
> > Using the short names for the character classes (L for Letter, Nd for
> > Number, decimal digit) does work, however:
> >
> > identifierRegexp := regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)
> >
> > You can play with these regexps on play.golang.org.
> >
> > Is it simply an oversight that Unicode character classes like "Letter"
> > and "Number, decimal digit" are not available for use in regexps, or
> > should I be using them differently?
>
> The strings you can use with \p are the ones listed in
> unicode.Categories and unicode.Scripts. So use \pL as you do in the
> second example.
>
> Ian