On Thu, Oct 11, 2018 at 05:19:06AM -0500, dana wrote:
> Hello,
>
> I'm a contributor to ripgrep, which is a grep-like tool that supports using
> gitignore files to control which files are searched in a repo (or any other
> directory tree). ripgrep's support for the patterns in these files is based on
> git's official documentation, as seen here:
>
> https://git-scm.com/docs/gitignore
>
> One of the most common reports on the ripgrep bug tracker is that it does not
> allow patterns like the following real-world examples, where a ** is used
> along with other text within the same path component:
>
> **/**$$*.java
> **.orig
> **local.properties
> !**.sha1
>
> The reason it doesn't allow them is that the gitignore documentation
> explicitly states that they're invalid:
>
> ...

I've checked the code and run some tests. There is a twist here: "**"
is only special when matched in "pathname" mode, that is, when the
pattern contains at least one slash. Of your patterns above, that only
applies to the first one.

When '**' is special and it is not one of '**/', '/**/' or '/**', it
_is_ considered invalid (i.e. a bad pattern) and the pattern will not
match anything.

The confusion arises because '**' is not special in the remaining
three patterns; there it is treated as a regular '*' and still matches
stuff.
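
To make the rule above concrete, here is a rough standalone sketch (it
is not the wildmatch.c code). classify_starstar() and the enum are
made-up names for illustration, and the sketch deliberately ignores
backslash escapes and whatever prefix/suffix handling gitignore does
before a pattern reaches the matcher:

```c
#include <string.h>

/* Hypothetical classification, following the description in this mail. */
enum starstar_kind {
	SS_NOT_SPECIAL,	/* no slash in pattern: '**' acts like '*' */
	SS_OK,		/* pathname mode, every '**' is a whole component */
	SS_MALFORMED	/* pathname mode, some '**' touches other text */
};

static enum starstar_kind classify_starstar(const char *pattern)
{
	const char *p;

	/* "pathname" mode only applies when the pattern has a slash */
	if (!strchr(pattern, '/'))
		return SS_NOT_SPECIAL;

	for (p = pattern; *p; p++) {
		if (p[0] == '*' && p[1] == '*') {
			const char *start = p;

			/* skip the whole run of asterisks */
			while (*p == '*')
				p++;
			/* must be preceded by start-of-pattern or '/' ... */
			if (start != pattern && start[-1] != '/')
				return SS_MALFORMED;
			/* ... and followed by end-of-pattern or '/' */
			if (*p != '\0' && *p != '/')
				return SS_MALFORMED;
			if (*p == '\0')
				break;
		}
	}
	return SS_OK;
}
```

With this, "**/**$$*.java" comes out as SS_MALFORMED (the second '**'
is followed by '$'), while "**.orig", "**local.properties" and
"!**.sha1" contain no slash and come out as SS_NOT_SPECIAL.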

So, I think we have two options. The document could be clarified with
something like this:

-- 8< --
diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
index d107daaffd..500cd43939 100644
--- a/Documentation/gitignore.txt
+++ b/Documentation/gitignore.txt
@@ -100,7 +100,8 @@ PATTERN FORMAT
    a shell glob pattern and checks for a match against the
    pathname relative to the location of the `.gitignore` file
    (relative to the toplevel of the work tree if not from a
-   `.gitignore` file).
+   `.gitignore` file). Note that the "two consecutive asterisks" rule
+   below does not apply.
 
  - Otherwise, Git treats the pattern as a shell glob: "`*`" matches
    anything except "`/`", "`?`" matches any one character except "`/`"
@@ -129,7 +130,8 @@ full pathname may have special meaning:
    matches zero or more directories. For example, "`a/**/b`"
    matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
 
- - Other consecutive asterisks are considered invalid.
+ - Other consecutive asterisks are considered invalid and the pattern
+   is ignored.
 
 NOTES
 -----
-- 8< --

Or we could make the behavior consistent: if '**' is invalid, just
treat it as two separate regular '*'. Then all four of your patterns
will behave the same way. The change for that is quite simple:

-- 8< --
diff --git a/wildmatch.c b/wildmatch.c
index d074c1be10..64087bf02c 100644
--- a/wildmatch.c
+++ b/wildmatch.c
@@ -104,8 +104,10 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
 					    dowild(p + 1, text, flags) == WM_MATCH)
 						return WM_MATCH;
 					match_slash = 1;
-				} else
-					return WM_ABORT_MALFORMED;
+				} else {
+					/* without WM_PATHNAME, '*' == '**' */
+					match_slash = flags & WM_PATHNAME ? 0 : 1;
+				}
 			} else
 				/* without WM_PATHNAME, '*' == '**' */
 				match_slash = flags & WM_PATHNAME ? 0 : 1;
-- 8< --
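
For illustration only, here is a toy matcher sketching how the second
option would behave. It handles literals and '*' only and is nothing
like the real dowild(); mini_match() and mini_wildmatch() are
hypothetical names, and the "pathname" argument merely stands in for
WM_PATHNAME:

```c
#include <string.h>

/*
 * Toy matcher, NOT git's dowild(). A run of asterisks that is not a
 * whole path component is treated as a single regular '*' instead of
 * being rejected as malformed.
 */
static int mini_match(const char *pat, const char *p, const char *text,
		      int pathname)
{
	for (; *p; p++, text++) {
		const char *run;
		int match_slash;

		if (*p != '*') {
			if (*p != *text)
				return 0;
			continue;
		}

		run = p;
		while (*p == '*')
			p++;

		if (!pathname) {
			/* no pathname mode: '*' and '**' both cross '/' */
			match_slash = 1;
		} else if (p - run >= 2 &&
			   (run == pat || run[-1] == '/') &&
			   (*p == '\0' || *p == '/')) {
			/* '**' as a whole component may match zero dirs */
			if (*p == '/' && mini_match(pat, p + 1, text, pathname))
				return 1;
			match_slash = 1;
		} else {
			/* option two: other '**' runs act like plain '*' */
			match_slash = 0;
		}

		/* try every possible length for the asterisk run */
		for (;; text++) {
			if (mini_match(pat, p, text, pathname))
				return 1;
			if (!*text || (!match_slash && *text == '/'))
				return 0;
		}
	}
	return *text == '\0';
}

static int mini_wildmatch(const char *pattern, const char *text, int pathname)
{
	return mini_match(pattern, pattern, text, pathname);
}
```

Under these assumptions,
mini_wildmatch("**/**$$*.java", "src/Foo$$Bar.java", 1) returns 1,
whereas the current code bails out with WM_ABORT_MALFORMED for that
pattern.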
Which way should we go? I'm leaning towards the second one...
--
Duy