Eric,

> A couple of questions.  First, in hash-pjw.c, should we be using unsigned
> char instead of char to iterate through the NUL-terminated string?

I believe it should usually have no effect on the average number of
collisions (= average length of a non-empty hash bucket), but I would be
more comfortable with this change if you could post some concrete figures.
I would assume that the gcc-generated machine code for both cases is
equally fast.

> Second, would it be worth adding a case-insensitive version of hash_pjw,
> so that strings can be hashed to the same value regardless of their case?
> It only makes sense for single-byte locales, but that's all the more that
> hash_pjw accommodates at the moment.

The majority of locales in use nowadays are multibyte locales (UTF-8,
GB18030 and EUC-*). Therefore I would concentrate on a solution that works
for both kinds of locales.

Bruno
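
A minimal sketch of the char versus unsigned char question, assuming a
rotate-and-add loop in the style of hash_pjw. The names mix_signed,
mix_unsigned and rotl9, and the 9-bit rotation, are illustrative
assumptions and not a copy of hash-pjw.c. Both functions return the
unreduced value; a caller would still take it modulo the table size.

```c
#include <limits.h>
#include <stddef.h>

#define SIZE_BITS (sizeof (size_t) * CHAR_BIT)

/* Rotate h left by 9 bits (assumed mixing step, for illustration only).  */
static size_t
rotl9 (size_t h)
{
  return (h << 9) | (h >> (SIZE_BITS - 9));
}

/* Walk the NUL-terminated string through 'signed char'.  A byte >= 0x80
   is read as a negative value, which wraps to a very large size_t when
   it is added to the hash.  */
size_t
mix_signed (const void *x)
{
  const signed char *s;
  size_t h = 0;

  for (s = x; *s; s++)
    h = *s + rotl9 (h);
  return h;
}

/* Walk the same string through 'unsigned char'.  Every byte contributes
   a value in the range 1..255.  */
size_t
mix_unsigned (const void *x)
{
  const unsigned char *s;
  size_t h = 0;

  for (s = x; *s; s++)
    h = *s + rotl9 (h);
  return h;
}
```

For a pure ASCII string such as "hello" the two functions return the same
value. For a string containing the single byte 0xE9, mix_unsigned adds 233
while mix_signed adds (size_t) -23, so the results differ. Either variant
is deterministic on a given platform; what could change is which bucket a
non-ASCII key lands in, which is why measured collision figures would be
the deciding evidence.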