Hello Ondrej, Nice work already!
> *abc* > en_US.UTF-8 > user: 2.71 > user: 0.23 > ... > *ab[a-z]* > en_US.UTF-8 > user: 2.85 > user: 3.32 > C > user: 0.86 > user: 1.25 Hmm, well, users are expecting a speedup through your module always. If it some (not too rare) cases the glibc fnmatch is faster, it's difficult to convince people to use your function. > Still I assume UTF8 or singlebyte encoding otherwise I fallback. The general and hard case is the multibyte encoding case, where the encoding is something like GB18030, BIG5, or similar. Single-byte encodings may be worth optimizing for, because the C locale on glibc systems uses a single-byte encoding. Whether UTF-8 is worth special-casing, depends on the complexity. > I could generalize this to wide characters but if I dont know any character > is single wchar I cannot do much (UTF16 wchar in windows). It's OK to pretend that every wchar_t is a single character. I.e. you can assume that when you use mbstowcs on a string, you get an array of characters. Yes, on Windows, wchar_t is UTF-16 probably, but in practice this does not make much of a difference, because the Unicode characters in the range U+10000..U+10FFFF are not among the "interesting" characters (patterns that contain such a character inside brackets are rare) and because they occur rarely enough. > + This program is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 2, or (at your option) > + any later version. > + > + This program is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program; if not, write to the Free Software Foundation, > + Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ For code in gnulib, please use the same copyright header as in many other files, namely GPLv3+ with a pointer to the FSF's web site rather than with its postal address. > +int fnmatch_exec (fnmatch_state * pattern, const char *string); Could this function prototype also be int fnmatch_exec (const fnmatch_state * pattern, const char *string); ? If not, this means fnmatch_exec modifies the compiled state, which makes it not multithread-safe to use the same fnmatch_state in multiple threads? > +struct _fnmatch_state; > +typedef struct _fnmatch_state fnmatch_state; Is it really a "state", i.e. a data structure that is modified over and over again? I would have expected a compiled pattern data structure, that is created by fnmatch_compile and then left unchanged until fnmatch_free. > +#include <stdint.h> > + > +#define _GNU_SOURCE "#define GNU_SOURCE" has no effect after a system include file such as <stdint.h> was already included. Also, "#define _GNU_SOURCE" has no effect on Solaris. For this reason, gnulib has a module 'extensions' which does the right thing (and which you're already using). > +#define CHARSPERINT 4 For better portability, write this as #define CHARSPERINT (sizeof (int) / sizeof (char)) > + uint32_t hash[8]; For better portability, write (UCHAR_MAX / 32) + 1 instead of 8. > +int > +ascii_compatible_encoding (char *c) Could this function be 'static'? And take a 'const char *' argument? > +void > +strfold (const char *str, char *buf) > +{ > + while (*str) > + *buf++ = tolower (*str++); tolower() is only useful in single-byte locales. For support of multibyte locales, gnulib offers the functions mbmemcasecmp, mbscasecmp, mbspcasecmp. > + for (pass = 0; pass < 2; pass++) > + { // two passes in first we compute values > we use > + patsize = size; // in second to optimalize. Not all C compilers support ISO C 99 / C++ style comments. Better use the old style of comments, even if it means more typing. > + case '[':; > + int siz = parse_bracket (s + 1, pat, flags); Declarations after statements also are only a C99 / C++ feature. For ANSI C, you need to put braces around this block. Like Ralf, I have no read through most of the code yet. Bruno