Re: [go-nuts] regexp syntax and named Unicode character classes

Tom Payne Tue, 07 Jan 2020 10:40:08 -0800

Thank you :) Is this worth adding to the regexp/syntax documentation? I'd 
happily contribute a patch.


On Tuesday, January 7, 2020 at 7:36:02 PM UTC+1, Ian Lance Taylor wrote:
>
> On Tue, Jan 7, 2020 at 10:22 AM Tom Payne <twp...@gmail.com> wrote:
> > 
> > tl;dr How should I use named Unicode character classes in regexps? 
> > 
> > I'm trying to write a regular expression that matches Go identifiers,
> > which start with a Unicode letter or underscore followed by zero or more
> > Unicode letters, decimal digits, and/or underscores.
> > 
> > Based on the regexp syntax, and the variables in the unicode package
> > which mention the classes "Letter" and "Number, decimal digit", I was
> > expecting to write something like:
> > 
> >   identifierRegexp := regexp.MustCompile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`)
> > 
> > However, this pattern does not compile, giving the error: 
> > 
> >   regexp: Compile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`): error parsing regexp: invalid character class range: `\p{Letter}`
> > 
> > Using the short names for character classes (L for Letter, Nd for Number,
> > decimal digit) does work, however:
> >
> >   identifierRegexp := regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)
> > 
> > You can play with these regexps on play.golang.org. 
> > 
> > Is this simply an oversight that Unicode character classes like "Letter"
> > and "Number, decimal digit" are not available for use in regexps, or should
> > I be using them differently?
>
> The strings you can use with \p are the ones listed in 
> unicode.Categories and unicode.Scripts.  So use \pL as you do in the 
> second example. 
>
> Ian 
>
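
For anyone reading this in the archive, the reply above can be checked with a small Go program. This is only a minimal sketch and not part of the original exchange: the helper validClassName and the sample strings are hypothetical names picked for illustration. It looks names up in unicode.Categories and unicode.Scripts, the two tables Ian mentions, and then applies the short-name pattern from the question. The result of compiling \p{Letter} is printed rather than assumed, because it may depend on the Go release in use.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// validClassName (hypothetical helper) reports whether name is a key of
// unicode.Categories or unicode.Scripts, the tables named in the reply.
func validClassName(name string) bool {
	if _, ok := unicode.Categories[name]; ok {
		return true
	}
	_, ok := unicode.Scripts[name]
	return ok
}

// identifierRegexp is the working pattern from the question: a letter or
// underscore, followed by any number of letters, decimal digits, or underscores.
var identifierRegexp = regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)

func main() {
	// Check a few candidate names against the two tables.
	for _, name := range []string{"L", "Nd", "Greek", "Letter"} {
		fmt.Printf("%-6s listed: %v\n", name, validClassName(name))
	}

	// Print whatever the installed Go release says about the long name.
	_, err := regexp.Compile(`\p{Letter}`)
	fmt.Println(`compile \p{Letter}:`, err)

	// Try the short-name pattern on some sample strings.
	for _, s := range []string{"foo", "_bar9", "état", "9lives", "a-b"} {
		fmt.Printf("%q matches: %v\n", s, identifierRegexp.MatchString(s))
	}
}
```

Script names such as Greek come from unicode.Scripts and can be used the same way, for example \p{Greek}.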

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/a22421cc-becb-496e-8d32-b41506536a54%40googlegroups.com.
