Hello Sergey,

> The proposed patch considerably speed-ups the exclude module
> for large exclusion lists of non-wildcard patterns.
Great idea!

'is_fnmatch_pattern' is probably a misnomer, because its argument is by
definition already an fnmatch pattern. What the function is testing is
whether the pattern contains wildcard characters [1], in other words,
whether it matches a single string (modulo case, optionally) or more.
Possibly better names are:

  fnmatch_pattern_has_wildcards
  is_fnmatch_pattern_constant   (for the opposite sense)

Btw, this function does not handle multiple adjacent backslashes
correctly, i.e. patterns such as 'ab\\[cd]'.

Also, I'm surprised that this function does not take an 'options'
argument. For example, the string 'ab\[cd\]' is constant if FNM_NOESCAPE
is off, but non-constant if FNM_NOESCAPE is on.

I would like to see some unit tests committed to this module before the
big optimization.

Thinking more into the future, there are now multiple optimizations of
regex / fnmatch coming from various sides. (Recall that fnmatch patterns
can be transformed into regular expressions.)

  - grep and awk use a wrapper around regex, composed of 'kwset' and
    'dfa', for speeding up regex. 'kwset' optimizes fgrep-like uses.

  - fnmatch suffers from an API that provides no distinct "compile" and
    "execute" passes. Ondrej Bilka was working on this in May 2009.

  - Now for the 'exclude' module we have a particular optimization for
    the case '^\(word1\|word2\|...\|more complex stuff\)$'.

Also, the regex API has a few other shortcomings:

  - It uses the current locale and does not have an explicit locale_t
    argument.

  - It does not work on Unicode strings or with Unicode character
    predicates.

I hope there will be a better, unified, optimized regex in GNU some day.

Bruno

[1] http://en.wikipedia.org/wiki/Wildcard_character
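P.S. To make the two points above concrete, here is a rough sketch (not a
proposed patch) of what a wildcard test with an 'options' argument could
look like, using the name fnmatch_pattern_has_wildcards suggested above.
It treats a backslash pair as quoting itself, so 'ab\\[cd]' is correctly
seen as non-constant, and it respects FNM_NOESCAPE:

```c
#include <stdbool.h>
#include <fnmatch.h>

/* Hypothetical sketch, not the actual gnulib function.
   Return true if PATTERN contains fnmatch wildcard characters,
   taking OPTIONS (e.g. FNM_NOESCAPE) into account.  */
static bool
fnmatch_pattern_has_wildcards (const char *pattern, int options)
{
  const char *p;

  for (p = pattern; *p != '\0'; p++)
    switch (*p)
      {
      case '\\':
        if (!(options & FNM_NOESCAPE))
          {
            /* A backslash quotes the next character, including another
               backslash; skip it so 'ab\\[cd]' leaves '[' unquoted.  */
            if (p[1] != '\0')
              p++;
          }
        /* With FNM_NOESCAPE, '\\' is an ordinary character.  */
        break;
      case '?':
      case '*':
      case '[':
        return true;
      default:
        break;
      }
  return false;
}
```

With this, 'ab\[cd\]' is reported as constant by default (the brackets
are quoted) but as non-constant under FNM_NOESCAPE, and 'ab\\[cd]' is
non-constant in both cases.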