On 02/27/2014 12:31 PM, Aharon Robbins wrote:
> What a mouthful!  Is all that really necessary?


You should have seen it before I trimmed it down; it listed every POSIX character. I dunno, maybe it could be trimmed, but I was worried about oddball character sets like the unibyte JIS character set that's like ASCII but substitutes Yen-sign for '\', and a couple of other substitutions like that. I figured better safe than sorry. No big deal of course.

> I'd suggest parentheses around the bit with the bitwise operator, both for readability and to match the rest of the code.

Done, with the attached patch. Oh, and I fixed an xdigit buglet I found too, in the second patch in the attachment.
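
For anyone reading along, here is a minimal sketch of why the added parentheses are a readability change and not a behavior change: in C, binary & binds more tightly than &&, so both spellings parse the same way. DEMO_RE_CHAR_CLASSES and the two helper functions are made up for illustration and are not part of dfa.c; the flag value is arbitrary.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real syntax flag; the value is
   arbitrary and chosen only for this demonstration.  */
#define DEMO_RE_CHAR_CLASSES (1UL << 2)

/* Unparenthesized spelling, as in the line the patch removes.
   It parses as (c1 == ':') && (syntax_bits & DEMO_RE_CHAR_CLASSES).  */
static bool
class_allowed_old (int c1, unsigned long syntax_bits)
{
  return c1 == ':' && syntax_bits & DEMO_RE_CHAR_CLASSES;
}

/* Parenthesized spelling, as in the line the patch adds.  */
static bool
class_allowed_new (int c1, unsigned long syntax_bits)
{
  return c1 == ':' && (syntax_bits & DEMO_RE_CHAR_CLASSES);
}
```

Calling both helpers with the same arguments, for example (':', DEMO_RE_CHAR_CLASSES) or (':', 0), should give identical results, which is why the first patch is safe to apply purely for style.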

>@@ -1000,7 +1043,10 @@ parse_bracket_exp (void)
>                /* Fetch bracket.  */
>                FETCH_WC (c, wc, _("unbalanced ["));
>                if (c1 == ':')
>-                /* build character class.  */
>+                /* Build character class.  POSIX allows character
>+                   classes to match multicharacter collating elements,
>+                   but the regex code does not support that, so do not
>+                   worry about that possibility.  */
> I thought GLIBC did support them?
The source code says no. That is, [[:alpha:]] never matches a multicharacter collating element. [[=a=]] might do so, but [[:alpha:]] doesn't. (Unless I'm reading the source code wrong, which is possible. It's not documented either way, as far as I know.)
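
One way to probe this on a given system is to ask the POSIX regcomp/regexec interface how many bytes a bracket expression consumed. The helper below is a rough sketch with a made-up name (demo_match_length); it is not taken from grep or glibc, and it proves nothing about locales it has not been run in.

```c
#include <regex.h>
#include <stddef.h>

/* Return the length in bytes of the first match of the basic regular
   expression PATTERN in TEXT, or -1 if PATTERN does not compile or
   does not match.  The result is in bytes, so a single multibyte
   character can legitimately yield a length greater than 1.  */
static long
demo_match_length (const char *pattern, const char *text)
{
  regex_t re;
  regmatch_t m;
  long len = -1;

  if (regcomp (&re, pattern, 0) != 0)
    return -1;
  if (regexec (&re, text, 1, &m, 0) == 0)
    len = (long) (m.rm_eo - m.rm_so);
  regfree (&re);
  return len;
}
```

For instance, demo_match_length ("^[[:alpha:]]", "ch") can be compared before and after a call to setlocale with a locale that defines "ch" as a collating element. A result covering both letters would indicate that the class matched a multicharacter element; a result covering only the first letter is consistent with the reading above.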
From 7725d64fb955e9491a0f1e9a95a655f67e0ab74e Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 27 Feb 2014 13:17:45 -0800
Subject: [PATCH 1/2] * src/dfa.c (parse_bracket_exp): Parenthesize.

---
 src/dfa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/dfa.c b/src/dfa.c
index 65ab5d6..a49b834 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1023,7 +1023,7 @@ parse_bracket_exp (void)
           char str[MAX_BRACKET_STRING_LEN + 1];
           FETCH_WC (c1, wc1, _("unbalanced ["));
 
-          if ((c1 == ':' && syntax_bits & RE_CHAR_CLASSES)
+          if ((c1 == ':' && (syntax_bits & RE_CHAR_CLASSES))
               || c1 == '.' || c1 == '=')
             {
               size_t len = 0;
-- 
1.8.5.3


From 73dc80d42091a2c3d49dd2d9684e65b1107334a2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 27 Feb 2014 13:19:33 -0800
Subject: [PATCH 2/2] * src/dfa.c (prednames): POSIX allows [[:xdigit:]] to
 match multibyte chars.

---
 src/dfa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/dfa.c b/src/dfa.c
index a49b834..f4590da 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -926,7 +926,7 @@ static const struct dfa_ctype prednames[] = {
   {"upper", isupper, false},
   {"lower", islower, false},
   {"digit", isdigit, true},
-  {"xdigit", isxdigit, true},
+  {"xdigit", isxdigit, false},
   {"space", isspace, false},
   {"punct", ispunct, false},
   {"alnum", isalnum, false},
-- 
1.8.5.3
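
To make the second patch easier to follow, here is a rough sketch of how a table shaped like prednames could be declared and searched. The struct, field, and function names (demo_ctype, single_byte_only, demo_find_pred) are assumptions for illustration; the patch itself shows only the table values. The third field is assumed to mean "this class can match only single-byte characters", which is what the commit subject suggests.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int predicate (int);

/* Field names are assumed; only the initializer values appear in the
   patch.  */
struct demo_ctype
{
  const char *name;        /* class name as written in [[:name:]] */
  predicate *func;         /* <ctype.h> predicate for the class */
  bool single_byte_only;   /* assumed meaning of the third field */
};

/* A subset of the table, with the xdigit entry as it reads after the
   patch.  The NULL entry is a sentinel added for this sketch.  */
static const struct demo_ctype demo_prednames[] = {
  {"digit", isdigit, true},
  {"xdigit", isxdigit, false},
  {"space", isspace, false},
  {NULL, NULL, false}
};

/* Return the entry whose name equals STR, or NULL if there is none.  */
static const struct demo_ctype *
demo_find_pred (const char *str)
{
  for (size_t i = 0; demo_prednames[i].name; i++)
    if (strcmp (str, demo_prednames[i].name) == 0)
      return &demo_prednames[i];
  return NULL;
}
```

Under that assumption, a caller that looks up "xdigit" now sees single_byte_only as false and so no longer takes whatever single-byte shortcut the flag enables, while "digit" keeps it.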
