Re: case-insensitive hash of strings

Bruno Haible Tue, 21 Aug 2007 13:59:32 -0700

Eric,

> A couple of questions.  First, in hash-pjw.c, should we be using unsigned
> char instead of char to iterate through the NUL-terminated string?


I believe it should usually have no effect on the average number of
collisions (= average length of a non-empty hash bucket), but I would be
more comfortable with this change if you could post some concrete figures.

I would assume that the gcc-generated machine code for both cases is equally
fast.
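
To make the question concrete, here is a minimal sketch of the two variants. It assumes a rotate-by-9-and-add loop in the style of hash_pjw; the names pjw_signed and pjw_unsigned are made up for illustration and are not taken from hash-pjw.c. The two functions agree on pure ASCII input and can only differ on bytes >= 0x80, where the signed variant adds a sign-extended value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SIZE_BITS (sizeof (size_t) * CHAR_BIT)

/* Hypothetical helper: reads the bytes as signed char, which is how
   plain char behaves on platforms where char is signed.  */
size_t
pjw_signed (const char *x, size_t tablesize)
{
  const signed char *s = (const signed char *) x;
  size_t h = 0;

  assert (x != NULL && tablesize > 0);
  for (; *s; s++)
    /* A negative byte is sign-extended before being added.  */
    h = (size_t) *s + ((h << 9) | (h >> (SIZE_BITS - 9)));
  return h % tablesize;
}

/* Hypothetical helper: reads the bytes as unsigned char, so every
   byte contributes a value in the range 1..255.  */
size_t
pjw_unsigned (const char *x, size_t tablesize)
{
  const unsigned char *s = (const unsigned char *) x;
  size_t h = 0;

  assert (x != NULL && tablesize > 0);
  for (; *s; s++)
    h = (size_t) *s + ((h << 9) | (h >> (SIZE_BITS - 9)));
  return h % tablesize;
}
```

Running both over a realistic set of keys and comparing the bucket occupancy would give the kind of figures asked for above; the sketch itself only shows where the two loops diverge.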

> Second, would it be worth adding a case-insensitive version of hash_pjw,
> so that strings can be hashed to the same value regardless of their case?
>  It only makes sense for single-byte locales, but that's all the more that
> hash_pjw accommodates at the moment.

The majority of locales in use nowadays are multibyte locales (UTF-8,
GB18030 and EUC-*). Therefore I would concentrate on a solution that works
for both kinds of locales.
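
One possible direction, sketched under assumptions rather than proposed as the final design: decode the string with mbrtowc according to the current locale, lower-case each wide character with towlower, and feed the result into the same rotate-and-add step. The name hash_pjw_casefold is hypothetical, the rotate amount of 9 is assumed, and towlower only performs a simple per-character mapping, so this is not full Unicode case folding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

#define SIZE_BITS (sizeof (size_t) * CHAR_BIT)

/* Hypothetical function, not an existing gnulib API.  Hashes S so that
   strings differing only in case (per towlower) get the same value.  */
size_t
hash_pjw_casefold (const char *s, size_t tablesize)
{
  size_t h = 0;
  size_t remaining;
  mbstate_t state;

  assert (s != NULL && tablesize > 0);
  memset (&state, 0, sizeof state);
  remaining = strlen (s);

  while (remaining > 0)
    {
      wchar_t wc;
      size_t unit;
      size_t n = mbrtowc (&wc, s, remaining, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: hash the raw byte
             unchanged and restart decoding at the next byte.  */
          unit = (unsigned char) *s;
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else
        {
          if (n == 0)
            n = 1;  /* Defensive: always advance by at least one byte.  */
          unit = (size_t) towlower ((wint_t) wc);
        }

      h = unit + ((h << 9) | (h >> (SIZE_BITS - 9)));
      s += n;
      remaining -= n;
    }
  return h % tablesize;
}
```

The caller would need to call setlocale (LC_ALL, "") first so that mbrtowc decodes according to the user's locale; in a single-byte locale each byte simply becomes one wide character, so the same code covers both cases.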

Bruno
