PG Doc comments form <nore...@postgresql.org> writes:
> On https://www.postgresql.org/docs/11/functions-matching.html paragraph
> 9.7.3.2. Bracket Expressions says "Standard character class names are:
> alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper,
> xdigit". The class "ascii" exists, but is not mentioned (probably a
> combination of some of the other classes). Are there any other classes?

Hm, fair question.  I think the text means to say that these are the
character class names required by the POSIX regexp spec, which is
accurate.  A look into our src/backend/regex/regc_locale.c will show
you that we also implement "ascii", and no others.  That probably
ought to be documented.

> Do they work only for ASCII characters (e.g. '\u00A0' is not picked up
> by '[:blank:]')?

The POSIX ones are implemented by calling the C library, so it's
whatever the ctype.h and wctype.h functions think is appropriate
for your LC_CTYPE setting.

The 20-year-old reference in our text to ctype(3) seems rather
unhelpful today.  In the first place, there's no such man page on my
Linux systems; in the second place, wctype(3) is more important if it
exists; and in the third place, what a reader actually wants to know
is that this is controlled by the LC_CTYPE server parameter.  It'd
likely be better to dump the man-page reference altogether and
instead point readers to our "Locale Support" chapter.

			regards, tom lane
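
To see for yourself what the C library thinks about a given character, something along these lines may help. It is only a sketch of the libc side of the question, not PostgreSQL's code: in_class() is a hypothetical helper name, it assumes wchar_t values are Unicode code points (true on glibc, not guaranteed everywhere), and the answer for non-ASCII characters such as 0x00A0 depends on the platform and on which locales are installed.

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * in_class (hypothetical helper, not part of PostgreSQL):
 * ask the C library whether code point c belongs to the named
 * character class under the given LC_CTYPE locale.
 *
 * Returns 1 if it does, 0 if it does not, and -1 if the locale
 * cannot be selected or the class name is unknown to the library.
 *
 * Note: setlocale() changes process-wide state, so this is meant
 * for a small test program, not for library code.
 */
int
in_class(const char *locale_name, const char *class_name, wint_t c)
{
    wctype_t cls;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;              /* locale not available here */

    cls = wctype(class_name);   /* e.g. "blank", "alpha", "digit" */
    if (cls == (wctype_t) 0)
        return -1;              /* class name not known in this locale */

    return iswctype(c, cls) != 0;
}
```

Calling in_class("C", "blank", L' ') should report a member, while in_class("en_US.UTF-8", "blank", 0x00A0) reports whatever that platform's locale data says about the no-break space, or -1 if the locale is not installed.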