On 11/14/22 6:06 AM, Koichi Murase wrote:

However, I noticed two strange behaviors of the current engine.
Before adjusting the behavior of the new engine and submitting it for
review, I would like to confirm the (expected) behavior of the current
engine in the current devel.

These two behaviors finally turned out to be both related to the
matching of bracket expression by the function `BRACKMATCH'
(lib/glob/sm_loop.c).

----------------------------------------------------------------------
1. pattern [[=B=]][c] matches with c

   $ bash-devel --norc
   $ [[ Bc == [[=B=]][c] ]]; echo $?
   0      <-- OK. This is expected.
   $ [[ c == [[=B=]][c] ]]; echo $?
   0      <-- This is unexpected.

This is clearly a problem, and your fix is a good one.


----------------------------------------------------------------------
2. bracket expression sometimes match or unmatch the slash with
   FNM_PATHNAME.

   FNM_PATHNAME is only used in two places in the Bash codebase.  1)
   One is for the glob matching for the filenames in the directory
   (lib/glob/glob.c).  However, this actually does not seem to have an
   actual effect because FNM_PATHNAME only causes differences in the
   matching when the target string contains a slash but the filenames
   do not contain any slashes.  2) The other is the filtering of the
   pathname-expansion results with GLOBIGNORE (pathexp.c).  So the only
   way to test the behavior of Bash's strmatch with FNM_PATHNAME
   (without writing a C program to directly use the function
   `strmatch') is to use GLOBIGNORE.

This is true, but the two uses above are not directly analogous to
fnmatch(). The bash implementation, due to its origin as an fnmatch()
implementation, uses the same flag names but uses FNM_PATHNAME to signal
the matching engine that it's matching filenames for pathname expansion.

That triggers the difference. The first part of your patch is correct:
a bracket expression can never match a slash in the string. This is
common to both fnmatch and what POSIX calls "Patterns Used For Filename
Expansion."

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13_03

fnmatch with FNM_PATHNAME only has to avoid matching the slash with a
bracket expression. The shell has an additional constraint: a slash that
appears in a bracket expression renders the bracket expression void and
requires the `[' to be matched explicitly. That's why there have to be
tests for slash in BRACKMATCH. There are two bugs: the off-by-one error
you note and matching the open bracket explicitly.


   * In fnmatch(3), a bracket expression never matches a / with
     FNM_PATHNAME.

This is one part of the fix.
   * In Bash's strmatch, a bracket expression sometimes matches `/' and
     sometimes does not.  In the codebase, `/' is excluded only when it
     explicitly appears after another character in the bracket
     expression (lib/glob/sm_loop.c:574) even though the comment
     mentions [/].  This is the reason that only [a/] fails with Bash's
     implementation in cases #2..#6 in the above result.

That's the consequence of the test being in the wrong place in BRACKMATCH.


   * What is happening with Bash's implementation in cases #7..#12 is
     related the assumption of the backtracking trick for `*': The
     trick for `*' backtracking explained in the code comment
     lib/glob/sm_loop.c:320---"This is a clever idea from
     glibc"---relies on the behavior that the bracket expression never
matches a slash that `*' cannot match.

These two changes should satisfy that.

The cleverness is due to Russ Cox, a really smart guy who figured this
stuff out first:

https://research.swtch.com/glob

(https://swtch.com/~rsc/regexp/ is a collection of his writing on regular
expressions. It's well worth reading.)

I attached the patch I applied. I didn't include your fix to issue 1 above.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
*** ../bash-5.2-patched/lib/glob/sm_loop.c      2021-08-03 10:24:49.000000000 
-0400
--- lib/glob/sm_loop.c  2022-11-16 16:22:42.000000000 -0500
***************
*** 344,347 ****
--- 344,352 ----
              return (FNM_NOMATCH);
  
+           /* If we are matching pathnames, we can't match a slash with a
+              bracket expression. */
+           if (sc == L('/') && (flags & FNM_PATHNAME))
+             return (FNM_NOMATCH);
+ 
            /* `?' cannot match `.' or `..' if it is the first character of the
               string or if it is the first character following a slash and
***************
*** 452,455 ****
--- 457,466 ----
          pc = FOLD (p[1]);
          p += 4;
+ 
+         /* Finding a slash in a bracket expression means you have to
+            match the bracket as an ordinary character (see below). */
+         if (c == L('/') && (flags & FNM_PATHNAME))
+           return ((test == L('[')) ? savep : (CHAR *)0); /*]*/
+ 
          if (COLLEQUIV (test, pc))
            {
***************
*** 566,569 ****
--- 579,592 ----
        return ((test == L('[')) ? savep : (CHAR *)0);
  
+       /* POSIX.2 2.13.3 says: `If a <slash> character is found following an
+          unescaped <left-square-bracket> character before a corresponding
+          <right-square-bracket> is found, the open bracket shall be treated
+          as an ordinary character.' If we find a slash in a bracket
+          expression and the flags indicate we're supposed to be treating the
+          string like a pathname, we have to treat the `[' as just a character
+          to be matched. */
+       if ((flags & FNM_PATHNAME) && c == L('/'))
+       return ((test == L('[')) ? savep : (CHAR *)0);
+ 
        c = *p++;
        c = FOLD (c);
***************
*** 572,579 ****
        return ((test == L('[')) ? savep : (CHAR *)0);
  
-       if ((flags & FNM_PATHNAME) && c == L('/'))
-       /* [/] can never match when matching a pathname.  */
-       return (CHAR *)0;
- 
        /* This introduces a range, unless the `-' is the last
         character of the class.  Find the end of the range
--- 595,598 ----

Reply via email to