Re: bash "extglob" needs to upgrade at least like zsh "kshglob"

Koichi Murase Mon, 28 Nov 2022 02:52:39 -0800

On Wed, Nov 23, 2022 at 5:24, Chet Ramey <chet.ra...@case.edu> wrote:
> I attached the latest patch against bash-5.2.9.
----


> commit 3c9dd4565792bc53de3a94ec38a65a1989f3fe2f (upstream/devel)
>
>     associative array elements; last set of changes to globbing
>     bracket expressions; fix for timing subshell commands

Thank you for the discussion and for applying the changes.  I am
sorry to reopen the behavior of BRACKMATCH, so that commit turned out
not to be the last set of changes.  After the above fix, I moved on
to checking the behavior of PATSCAN, where I found inconsistent
results caused by differences between BRACKMATCH and PATSCAN in
parsing bracket expressions.  I also checked other parts of the
codebase and found additional inconsistencies.


Description
-----------

First, let me introduce the labels (A)..(D) so that I can refer later
to the implementations of bracket expressions in the codebase.  There
are four independent pieces of code that implement rules for
extracting a bracket expression in the current codebase:

- (A) The main loop of BRACKMATCH: This handles sub-expressions of a
  bracket expression when a matching sub-expression is not found.

- (B) The section of the `matched' label in BRACKMATCH: This handles
  sub-expressions of the bracket expression after a matching
  sub-expression is found.

- (C) PATSCAN: This skips bracket expressions to determine the end of
  the extglob constructs of the form @(...), ?(...), +(...), etc.

- (D) MATCHLEN (lib/glob/gm_loop.c): This function handles bracket
  expressions to count the number of characters that a fixed-length
  pattern can match.

In fact, each of the four implements a distinct rule that matches
none of the other three.  These implementations need to be adjusted
to follow one identical, consistent rule.


Repeat-By
---------

The differences between (A)..(D) cause various strange behaviors.

1. Strange behavior caused by an inconsistency between (A/B) and (C)

  This is the problem I first ran into.  The following shows the
  result of [example4.sh] with the current devel, where column 4
  `{yes,no}/{yes,no}' shows `(result)/(what I expect)':

  --- PATSCAN vs BRACKMATCH ---
  #1: pat=@([[.].])A])         str=]                no/yes
  #2: pat=@([[.].])A])         str===]A])           no/no
  #3: pat=@([[.].])A])         str=AA])             yes/no
  #4: pat=@([[=]=])A])         str=]                no/no
  #5: pat=@([[=]=])A])         str===]A])           no/yes
  #6: pat=@([[=]=])A])         str=AA])             yes/no

  Obviously strange behavior can be seen in cases #3 and #6, where `A'
  matches twice even though the pattern contains only one `A' and no
  repetition such as `*()' or `+()'.  This is because PATSCAN (C)
  considers the bracket expression to be `[[.].]' while BRACKMATCH
  (A/B) considers the bracket expression to be `[[.].])A]'.  First,
  PATSCAN extracts `@([[.].])', but BRACKMATCH then matches the first
  `A' in the input string using the pattern character `A' that lies
  outside the construct `@()'.  Finally, the remaining part `A])' of
  the pattern is matched literally.

2. Inconsistency between (A) and (B):

  To fix the above item for (A/B) vs (C), I checked the detailed
  behavior of both and found this.  The parsing of [.ch.], [=a=], and
  [:cclass:] is not entirely consistent before and after a matching
  sub-expression is found.  The following is the second section of the
  result of [example4.sh]:

  --- BRACKMATCH: after match vs before match ---
  #7: pat=[[=]=]ab]            str=a                yes/no
  #8: pat=[[.[=.]ab]           str=a                yes/yes
  #9: pat=[[.[==].]ab]         str=a                yes/yes

  #10: pat=[a[=]=]b]            str=a                no/no
  #11: pat=[a[.[=.]b]           str=a                no/yes
  #12: pat=[a[.[==].]b]         str=a                no/yes

  #13: pat=[a[=]=]b]            str=b                yes/no
  #14: pat=[a[=]=]b]            str=a=]b]            yes/yes
  #15: pat=[a[.[=.]b]           str=b                yes/yes
  #16: pat=[a[.[=.]b]           str=ab]              yes/no
  #17: pat=[a[.[==].]b]         str=b                yes/yes
  #18: pat=[a[.[==].]b]         str=ab]              yes/no

  Cases #7..#9 succeed, which means that `[=]=]', `[.[=.]', and
  `[.[==].]' form an equivalence class or collating symbols in
  BRACKMATCH (A).  However, cases #10..#12 (which are bracket
  expressions with the same sub-expressions in a different order)
  fail, which means that `[=]=]', `[.[=.]', and `[.[==].]' do not
  form an equivalence class or a collating symbol in BRACKMATCH (B).

  Also, cases #13 vs #14, #15 vs #16, and #17 vs #18 demonstrate that
  the same pattern consisting of bracket expressions and normal
  characters can match different numbers of characters.  This means
  that the boundary of the bracket expression can change depending on
  the input string.

  Strictly speaking, these patterns are undefined by the standard:
  an equivalence class shall not contain `]' (cases #7, #10, #13, and
  #14), and the opening `[.' and `[=' shall be followed by the
  corresponding `.]' and `=]', respectively (the other cases).
  Nevertheless, even if the behavior is undefined, I expect at least
  the same results within each of the pairs (#7, #10), (#8, #11), and
  (#9, #12).  I also expect that only one case from each of the pairs
  (#13, #14), (#15, #16), and (#17, #18) succeeds.  Otherwise, we
  cannot determine the extent of the bracket expression before seeing
  the input string.

3. Differences for incomplete [:cclass:], [=a=], and [.ch.] within (A)

  While trying to write a common implementation for (A)..(D), I also
  noticed that the behaviors of incomplete [:cclass:], [=a=], and
  [.ch.] differ from one another within BRACKMATCH (A):

  --- incomplete POSIX brackets ---
  #19: pat=x[a[:y]              str=x[               no/???
  #20: pat=x[a[:y]              str=x:               yes/???
  #21: pat=x[a[:y]              str=xy               yes/???
  #22: pat=x[a[:y]              str=x[ay             no/???

  #23: pat=x[a[.y]              str=x[               no/???
  #24: pat=x[a[.y]              str=x.               no/???
  #25: pat=x[a[.y]              str=xy               no/???
  #26: pat=x[a[.y]              str=x[ay             yes/???

  #27: pat=x[a[=y]              str=x[               yes/???
  #28: pat=x[a[=y]              str=x=               yes/???
  #29: pat=x[a[=y]              str=xy               yes/???
  #30: pat=x[a[=y]              str=x[ay             no/???

  These special POSIX bracket expressions ([:cclass:], [=a=], and
  [.ch.]) are implemented separately in BRACKMATCH (A), whereas the
  other parts (B)..(D) handle them together.

  The variations in the behavior of [:cclass:], [=a=], and [.ch.] in
  (A) do not actually violate the standard because the behavior is
  undefined by the standard when there is no corresponding ending
  bracket `:]', `.]', or `=]'.  However, if we keep these variations
  in BRACKMATCH (A), we need to implement the same seemingly random
  variations in (B)..(D) to make the behaviors of (A)..(D) match.  I
  think the opposite is more reasonable, i.e., to change (A) to
  handle incomplete [:cclass:], [=a=], and [.ch.] in a consistent
  manner and match its behavior to (B)..(D).

  However, it is still unclear what the preferred behavior would be
  because each of (B)..(D) implements its own distinct rule.  Instead,
  I checked the behavior of other shells/implementations.  Here is the
  summary of the comparison:

  No. pattern input | bash  fnmatch/osh  zsh  ksh yash/busybox
  --- ------- ----- | ----- ------------ ---- --- ------------
  #19 x[a[:y] x[    | no        yes      yes  no  no
  #20 x[a[:y] x:    | yes       yes      yes  no  no
  #21 x[a[:y] xy    | yes       yes      yes  no  no
  #22 x[a[:y] x[ay  | yes       no       no   no  yes
  #23 x[a[.y] x[    | no        no       yes  no  no
  #24 x[a[.y] x.    | no        no       yes  no  no
  #25 x[a[.y] xy    | no        no       yes  no  no
  #26 x[a[.y] x[ay  | yes       no       no   no  yes
  #27 x[a[=y] x[    | yes       yes      yes  no  no
  #28 x[a[=y] x=    | yes       yes      yes  no  no
  #29 x[a[=y] xy    | yes       yes      yes  no  no
  #30 x[a[=y] x[ay  | no        no       no   no  yes

  The behavior of fnmatch(3) of GNU/Linux and osh (oilshell) is
  slightly different from that of Bash but is essentially similar.  I
  guess this is because the Bash implementation seems to be derived
  from a fnmatch(3) implementation, and I guess osh calls fnmatch(3)
  internally.  The other shells zsh, ksh, yash, and busybox sh each
  produce consistent results across `[:cclass:]', `[=a=]', and
  `[.ch.]', though they differ from one another: zsh treats unclosed
  `[:', `[.', and `[=' as normal characters that are members of the
  bracket expression, ksh considers the entire pattern to be invalid
  and does not let it match anything, and yash and busybox sh consider
  the bracket expression to be invalid and let the first bracket `['
  match literally.

  I decided to adopt the behavior of zsh because it is consistent
  across [:cclass:], [.ch.], and [=a=] and is closest to the current
  behavior of Bash.

Also, the following points are (partly) addressed in this report.

4. PATSCAN currently does not handle `/' when parsing bracket
  expressions with FNM_PATHNAME.

5. PATSCAN currently does not handle FNM_NOESCAPE.


Fixes
-----

I attach the first patch
[r0037.brackmatch8.incomplete-subbracket.patch.txt] against devel to
solve "Repeat-By 2 and 3" above for BRACKMATCH (A/B).  It introduces
a new helper function PARSE_SUBBRACKET that extracts [:cclass:],
[.ch.], and [=a=] in the same way (except for the special rule for
[.].] in POSIX XBD 9.3.5, item 1).  In the patch, I turned the
existing function PARSE_COLLSYM into PARSE_SUBBRACKET.  I initially
tried to add an independent function PARSE_SUBBRACKET and modify
PARSE_COLLSYM to use PARSE_SUBBRACKET internally, but the resulting
PARSE_COLLSYM became trivial, so I decided to remove that function.
That is how the patch ended up in its current shape.

The fix for PATSCAN (C) is included in the second patch
[r0037.patscan1.parse_subbracket.patch.txt], which solves "Repeat-By 1
and 4" by using PARSE_SUBBRACKET from the first patch.  This patch
applies to devel after the first patch.  Note that this is only a
partial fix for "Repeat-By 4"; there are still failing cases after
applying this patch.  For the complete fix, I would like to use a
helper function introduced for the new DFA engine, so I will submit a
patch once it is decided whether the new DFA engine is accepted or
rejected.

The third patch [r0037.patscan2.fnm_noescape.patch.txt] addresses
"Repeat-By 5". This is a single-line fix.  This patch applies to devel
after applying the second patch.

The adjustments of MATCHLEN (D) are not included here because I
intend to remove (or re-implement) MATCHLEN in the new DFA engine,
and the current code will be discarded if the new DFA engine is
accepted.  I will submit another patch in case the new DFA engine is
rejected.  (To be clear, PATSCAN and BRACKMATCH are still used in the
new DFA engine, which is why I have recently been submitting changes
to these functions.)

--
Koichi

From ece2c094335ded56143c06d87c8e42b9c97a9fba Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Fri, 25 Nov 2022 04:31:36 +0900
Subject: [PATCH 1/4] fix(BRACKMATCH): normalize behavior on failure of special
 POSIX bracket expressions

---
 lib/glob/sm_loop.c | 165 +++++++++++++++++----------------------------
 lib/glob/smatch.c  |   4 +-
 2 files changed, 63 insertions(+), 106 deletions(-)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index 5d62e60b..d1024495 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -27,7 +27,7 @@ struct STRUCT
 int FCT PARAMS((CHAR *, CHAR *, int));
 
 static int GMATCH PARAMS((CHAR *, CHAR *, CHAR *, CHAR *, struct STRUCT *, int));
-static CHAR *PARSE_COLLSYM PARAMS((CHAR *, INT *));
+static CHAR *PARSE_SUBBRACKET PARAMS((CHAR *, int));
 static CHAR *BRACKMATCH PARAMS((CHAR *, U_CHAR, int));
 static int EXTMATCH PARAMS((INT, CHAR *, CHAR *, CHAR *, CHAR *, int));
 
@@ -380,36 +380,31 @@ fprintf(stderr, "gmatch: pattern = %s; pe = %s\n", pattern, pe);
   return (FNM_NOMATCH);
 }
 
-/* Parse a bracket expression collating symbol ([.sym.]) starting at P, find
-   the value of the symbol, and move P past the collating symbol expression.
-   The value is returned in *VP, if VP is not null. */
+#define SLASH_PATHNAME(c)      (c == L('/') && (flags & FNM_PATHNAME))
+
+/* Parse special POSIX bracket expressions ([.sym.], [=ch=], and [:cclass:])
+   starting at P and return the position of the ending `.]', `=]', or `:]'.
+   The argument P specifies the position after the opening bracket `['.  */
 static CHAR *
-PARSE_COLLSYM (p, vp)
+PARSE_SUBBRACKET (p, flags)
      CHAR *p;
-     INT *vp;
+     int flags;
 {
-  register int pc;
-  INT val;
-
-  p++;                         /* move past the `.' */
-         
-  for (pc = 0; p[pc]; pc++)
-    if (p[pc] == L('.') && p[pc+1] == L(']'))
-      break;
-   if (p[pc] == 0)
-    {
-      if (vp)
-       *vp = INVALID;
-      return (p + pc);
-    }
-   val = COLLSYM (p, pc);
-   if (vp)
-     *vp = val;
-   return (p + pc + 2);
+  CHAR type = *p;      /* `.', `=', or `:' (The second character after the
+                          opening `[') */
+
+  /* POSIX XCU 9.3.5.1 says `The <right-square-bracket> ( ']' ) shall
+     [...].  Otherwise, it shall terminate the bracket expression,
+     unless it appears in a collating symbol (such as "[.].]" ) or is
+     the ending <right-square-bracket> for a collating symbol,
+     equivalence class, or character class.', so we check `]' when
+     TYPE is not `.'. */
+  while (*++p != L('\0') && SLASH_PATHNAME(*p) == 0 && !(type != L('.') && *p == L(']')))
+    if (*p == type && p[1] == L(']'))
+      return p;
+  return NULL;
 }
 
-#define SLASH_PATHNAME(c)      (c == L('/') && (flags & FNM_PATHNAME))
-
 /* Use prototype definition here because of type promotion. */
 static CHAR *
 #if defined (PROTOTYPES)
@@ -423,10 +418,10 @@ BRACKMATCH (p, test, flags)
 {
   register CHAR cstart, cend, c;
   register int not;    /* Nonzero if the sense of the character class is inverted.  */
-  int brcnt, forcecoll, isrange;
+  int forcecoll, isrange;
   INT pc;
   CHAR *savep;
-  CHAR *brchrp;
+  CHAR *close;
   U_CHAR orig_test;
 
   orig_test = test;
@@ -451,18 +446,13 @@ BRACKMATCH (p, test, flags)
 
       /* POSIX.2 equivalence class:  [=c=].  See POSIX.2 2.8.3.2.  Find
         the end of the equivalence class, move the pattern pointer past
-        it, and check for equivalence.  XXX - this handles only
-        single-character equivalence classes, which is wrong, or at
-        least incomplete. */
-      if (c == L('[') && *p == L('=') && p[2] == L('=') && p[3] == L(']'))
+        it, and check for equivalence. */
+      if (c == L('[') && *p == L('=') && (close = PARSE_SUBBRACKET (p, flags)) != NULL)
        {
-         pc = FOLD (p[1]);
-         p += 4;
-
-         /* Finding a slash in a bracket expression means you have to
-            match the bracket as an ordinary character (see below). */
-         if (pc == L('/') && (flags & FNM_PATHNAME))
-           return ((test == L('[')) ? savep : (CHAR *)0); /*]*/
+         p++;
+         pc = COLLSYM (p, close - p);
+         pc = FOLD (pc);
+         p = close + 2;
 
          if (COLLEQUIV (test, pc))
            {
@@ -486,30 +476,21 @@ BRACKMATCH (p, test, flags)
        }
 
       /* POSIX.2 character class expression.  See POSIX.2 2.8.3.2. */
-      if (c == L('[') && *p == L(':'))
+      if (c == L('[') && *p == L(':') && (close = PARSE_SUBBRACKET (p, flags)) != NULL)
        {
-         CHAR *close, *ccname;
+         CHAR *ccname;
 
          pc = 0;       /* make sure invalid char classes don't match. */
-         /* Find end of character class name */
-         for (close = p + 1; *close != '\0' && SLASH_PATHNAME(*close) == 0; close++)
-           if (*close == L(':') && *(close+1) == L(']'))
-             break;
 
-         if (*close != L('\0') && SLASH_PATHNAME(*close) == 0)
+         ccname = (CHAR *)malloc ((close - p) * sizeof (CHAR));
+         if (ccname)
            {
-             ccname = (CHAR *)malloc ((close - p) * sizeof (CHAR));
-             if (ccname == 0)
-               pc = 0;
-             else
-               {
-                 bcopy (p + 1, ccname, (close - p - 1) * sizeof (CHAR));
-                 *(ccname + (close - p - 1)) = L('\0');
-                 /* As a result of a POSIX discussion, char class names are
-                    allowed to be quoted (?) */
-                 DEQUOTE_PATHNAME (ccname);
-                 pc = IS_CCLASS (orig_test, (XCHAR *)ccname);
-               }
+             bcopy (p + 1, ccname, (close - p - 1) * sizeof (CHAR));
+             *(ccname + (close - p - 1)) = L('\0');
+             /* As a result of a POSIX discussion, char class names are
+                allowed to be quoted (?) */
+             DEQUOTE_PATHNAME (ccname);
+             pc = IS_CCLASS (orig_test, (XCHAR *)ccname);
              if (pc == -1)
                {
                  /* CCNAME is not a valid character class in the current
@@ -521,14 +502,12 @@ BRACKMATCH (p, test, flags)
                     string. If we don't want to do that, take out the update
                     of P. */
                  pc = 0;
-                 p = close + 2;
                }
-             else
-               p = close + 2;          /* move past the closing `]' */
-
-             free (ccname);
            }
-           
+         free (ccname);
+
+         p = close + 2;
+
          if (pc)
            {
 /*[*/        /* Move past the closing `]', since the first thing we do at
@@ -556,13 +535,11 @@ BRACKMATCH (p, test, flags)
         the symbol name, make sure it is terminated by `.]', translate
         the name to a character using the external table, and do the
         comparison. */
-      if (c == L('[') && *p == L('.'))
+      if (c == L('[') && *p == L('.') && (close = PARSE_SUBBRACKET (p, flags)) != NULL)
        {
-         p = PARSE_COLLSYM (p, &pc);
-         /* An invalid collating symbol cannot be the first point of a
-            range.  If it is, we set cstart to one greater than `test',
-            so any comparisons later will fail. */
-         cstart = (pc == INVALID) ? test + 1 : pc;
+         p++;
+         cstart = COLLSYM (p, close - p);
+         p = close + 2;
          forcecoll = 1;
        }
 
@@ -616,13 +593,11 @@ BRACKMATCH (p, test, flags)
            return ((test == L('[')) ? savep : (CHAR *)0);
          else if (cend == L('/') && (flags & FNM_PATHNAME))
            return ((test == L('[')) ? savep : (CHAR *)0);
-         if (cend == L('[') && *p == L('.'))
+         if (cend == L('[') && *p == L('.') && (close = PARSE_SUBBRACKET (p, flags)) != NULL)
            {
-             p = PARSE_COLLSYM (p, &pc);
-             /* An invalid collating symbol cannot be the second part of a
-                range expression.  If we get one, we set cend to one fewer
-                than the test character to make sure the range test fails. */
-             cend = (pc == INVALID) ? test - 1 : pc;
+             p++;
+             cend = COLLSYM (p, close - p);
+             p = close + 2;
              forcecoll = 1;
            }
          cend = FOLD (cend);
@@ -658,46 +633,28 @@ BRACKMATCH (p, test, flags)
 matched:
   /* Skip the rest of the [...] that already matched.  */
   c = *--p;
-  brcnt = 1;
-  brchrp = 0;
-  while (brcnt > 0)
+  while (1)
     {
-      int oc;
-
       /* A `[' without a matching `]' is just another character to match. */
       if (c == L('\0'))
        return ((test == L('[')) ? savep : (CHAR *)0);
       else if (c == L('/') && (flags & FNM_PATHNAME))
        return ((test == L('[')) ? savep : (CHAR *)0);
 
-      oc = c;
       c = *p++;
       if (c == L('[') && (*p == L('=') || *p == L(':') || *p == L('.')))
        {
-         brcnt++;
-         brchrp = p++;         /* skip over the char after the left bracket */
-         c = *p;
-         if (c == L('\0'))
-           return ((test == L('[')) ? savep : (CHAR *)0);
-         else if (c == L('/') && (flags & FNM_PATHNAME))
-           return ((test == L('[')) ? savep : (CHAR *)0);
-         /* If *brchrp == ':' we should check that the rest of the characters
-            form a valid character class name. We don't do that yet, but we
-            keep BRCHRP in case we want to. */
-       }
-      /* We only want to check brchrp if we set it above. */
-      else if (c == L(']') && brcnt > 1 && brchrp != 0 && oc == *brchrp)
-       {
-         brcnt--;
-         brchrp = 0;           /* just in case */
+         if ((close = PARSE_SUBBRACKET (p, flags)) != NULL)
+           p = close + 2;
        }
       /* Left bracket loses its special meaning inside a bracket expression.
          It is only valid when followed by a `.', `=', or `:', which we check
-         for above. Technically the right bracket can appear in a collating
-         symbol, so we check for that here. Otherwise, it terminates the
-         bracket expression. */
-      else if (c == L(']') && (brchrp == 0 || *brchrp != L('.')) && brcnt >= 1)
-       brcnt = 0;
+         for above.  The right brackets terminating collating symbols,
+         equivalence classes, or character classes are processed by
+         PARSE_SUBBRACKET.  The other right brackets terminate the bracket
+         expression. */
+      else if (c == L(']'))
+       break;
       else if (!(flags & FNM_NOESCAPE) && c == L('\\'))
        {
          if (*p == '\0')
@@ -1001,7 +958,7 @@ fprintf(stderr, "extmatch: flags = %d\n", flags);
 #undef FCT
 #undef GMATCH
 #undef COLLSYM
-#undef PARSE_COLLSYM
+#undef PARSE_SUBBRACKET
 #undef PATSCAN
 #undef STRCOMPARE
 #undef EXTMATCH
diff --git a/lib/glob/smatch.c b/lib/glob/smatch.c
index a40b9e5e..5cae874d 100644
--- a/lib/glob/smatch.c
+++ b/lib/glob/smatch.c
@@ -322,7 +322,7 @@ is_cclass (c, name)
 #define FCT                    internal_strmatch
 #define GMATCH                 gmatch
 #define COLLSYM                        collsym
-#define PARSE_COLLSYM          parse_collsym
+#define PARSE_SUBBRACKET               parse_subbracket
 #define BRACKMATCH             brackmatch
 #define PATSCAN                        glob_patscan
 #define STRCOMPARE             strcompare
@@ -578,7 +578,7 @@ posix_cclass_only (pattern)
 #define FCT                    internal_wstrmatch
 #define GMATCH                 gmatch_wc
 #define COLLSYM                        collwcsym
-#define PARSE_COLLSYM          parse_collwcsym
+#define PARSE_SUBBRACKET               parse_wcsubbracket
 #define BRACKMATCH             brackmatch_wc
 #define PATSCAN                        glob_patscan_wc
 #define STRCOMPARE             wscompare
-- 
2.37.2

example4.sh
Description: Bourne shell script

From d79ad47938a3dc8a39822aed90fc42506b4827ab Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Fri, 25 Nov 2022 17:14:06 +0900
Subject: [PATCH 2/4] fix(PATSCAN): match the behavior with BRACKMATCH for
 bracket expressions

---
 lib/glob/sm_loop.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index d1024495..113aac83 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -33,7 +33,7 @@ static int EXTMATCH PARAMS((INT, CHAR *, CHAR *, CHAR *, CHAR *, int));
 
 extern void DEQUOTE_PATHNAME PARAMS((CHAR *));
 
-/*static*/ CHAR *PATSCAN PARAMS((CHAR *, CHAR *, INT));
+/*static*/ CHAR *PATSCAN PARAMS((CHAR *, CHAR *, INT, int));
 
 int
 FCT (pattern, string, flags)
@@ -192,7 +192,7 @@ fprintf(stderr, "gmatch: pattern = %s; pe = %s\n", pattern, pe);
                     that's OK, since we can match 0 or 1 occurrences.
                     We need to skip the glob pattern and see if we
                     match the rest of the string. */
-                 newn = PATSCAN (p + 1, pe, 0);
+                 newn = PATSCAN (p + 1, pe, 0, flags);
                  /* If NEWN is 0, we have an ill-formed pattern. */
                  p = newn ? newn : pe;
                }
@@ -225,7 +225,7 @@ fprintf(stderr, "gmatch: pattern = %s; pe = %s\n", pattern, pe);
                     that's OK, since we can match 0 or more occurrences.
                     We need to skip the glob pattern and see if we
                     match the rest of the string. */
-                 newn = PATSCAN (p + 1, pe, 0);
+                 newn = PATSCAN (p + 1, pe, 0, flags);
                  /* If NEWN is 0, we have an ill-formed pattern. */
                  p = newn ? newn : pe;
                }
@@ -691,16 +691,15 @@ matched:
    first character after the matching DELIM or NULL if the pattern is
    empty or invalid. */
 /*static*/ CHAR *
-PATSCAN (string, end, delim)
+PATSCAN (string, end, delim, flags)
      CHAR *string, *end;
      INT delim;
+     int flags;
 {
   int pnest, bnest, skip;
-  INT cchar;
-  CHAR *s, c, *bfirst;
+  CHAR *s, c, *bfirst, *t;
 
   pnest = bnest = skip = 0;
-  cchar = 0;
   bfirst = NULL;
 
   if (string == end)
@@ -736,7 +735,11 @@ PATSCAN (string, end, delim)
              bnest++;
            }
          else if (s[1] == L(':') || s[1] == L('.') || s[1] == L('='))
-           cchar = s[1];
+           {
+             t = PARSE_SUBBRACKET (s + 1, flags);
+             if (t)
+               s = t + 2 - 1;  /* -1 to cancel s++ in `for (;; s++)' */
+           }
          break;
 
        /* `]' is not special if it's the first char (after a leading `!'
@@ -745,9 +748,7 @@ PATSCAN (string, end, delim)
        case L(']'):
          if (bnest)
            {
-             if (cchar && s[-1] == cchar)
-               cchar = 0;
-             else if (s != bfirst)
+             if (s != bfirst)
                {
                  bnest--;
                  bfirst = 0;
@@ -836,7 +837,7 @@ fprintf(stderr, "extmatch: p = %s; pe = %s\n", p, pe);
 fprintf(stderr, "extmatch: flags = %d\n", flags);
 #endif
 
-  prest = PATSCAN (p + (*p == L('(')), pe, 0); /* ) */
+  prest = PATSCAN (p + (*p == L('(')), pe, 0, flags); /* ) */
   if (prest == 0)
     /* If PREST is 0, we failed to scan a valid pattern.  In this
        case, we just want to compare the two as strings. */
@@ -859,7 +860,7 @@ fprintf(stderr, "extmatch: flags = %d\n", flags);
         string. */
       for (psub = p + 1; ; psub = pnext)
        {
-         pnext = PATSCAN (psub, pe, L('|'));
+         pnext = PATSCAN (psub, pe, L('|'), flags);
          for (srest = s; srest <= se; srest++)
            {
              /* Match this substring (S -> SREST) against this
@@ -896,7 +897,7 @@ fprintf(stderr, "extmatch: flags = %d\n", flags);
         rest of the string. */
       for (psub = p + 1; ; psub = pnext)
        {
-         pnext = PATSCAN (psub, pe, L('|'));
+         pnext = PATSCAN (psub, pe, L('|'), flags);
          srest = (prest == pe) ? se : s;
          for ( ; srest <= se; srest++)
            {
@@ -917,7 +918,7 @@ fprintf(stderr, "extmatch: flags = %d\n", flags);
          m1 = 0;
          for (psub = p + 1; ; psub = pnext)
            {
-             pnext = PATSCAN (psub, pe, L('|'));
+             pnext = PATSCAN (psub, pe, L('|'), flags);
              /* If one of the patterns matches, just bail immediately. */
              if (m1 = (GMATCH (s, srest, psub, pnext - 1, NULL, flags) == 0))
                break;
-- 
2.37.2

From c3b402ecdb819b546869e7b9ef010b7573a7efe4 Mon Sep 17 00:00:00 2001
From: Koichi Murase <myoga.mur...@gmail.com>
Date: Mon, 28 Nov 2022 18:31:31 +0900
Subject: [PATCH 3/4] fix(PATSCAN): support FNM_NOESCAPE

---
 lib/glob/sm_loop.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/glob/sm_loop.c b/lib/glob/sm_loop.c
index 113aac83..c3334c8a 100644
--- a/lib/glob/sm_loop.c
+++ b/lib/glob/sm_loop.c
@@ -717,7 +717,8 @@ PATSCAN (string, end, delim, flags)
       switch (c)
        {
        case L('\\'):
-         skip = 1;
+         if (!(flags & FNM_NOESCAPE))
+           skip = 1;
          break;
 
        case L('\0'):
-- 
2.37.2
