Paul Eggert wrote:
> > - strstr: This function's behaviour is not clearly defined. POSIX says
> >   that it compares a "string" with a "sequence of bytes". Which a priori
> >   is nonsense, since the elements of strings are characters.
>
> No, elements of "character strings" are characters. Elements of "strings"
> are bytes. See:
>
> http://www.opengroup.org/susv3/basedefs/xbd_chap03.html#tag_03_92
> http://www.opengroup.org/susv3/basedefs/xbd_chap03.html#tag_03_367

It's hard to know POSIX as well as you do :-)

> So strstr's behavior is clearly defined: it operates on strings (i.e.,
> byte strings), not character strings.

Indeed. And strstr cannot be respecified to operate on "character strings"
without breaking backward compatibility :-(

> > It was tempting to make a clear API nomenclature: c-str* for the C locale
> > emulation, str* for the internationalized functions. But if you're right
> > with strstr, then we should find new names for the internationalized
> > versions of these functions.
>
> I think we have to find new names, yes.

Yup. It appears that Microsoft did their homework regarding str* functions
and multibyte strings, while the ISO C and POSIX communities didn't.

I'll be adding the following functions to gnulib, attempting to fix the
hole that ISO C and POSIX left:

    mbschr        like strchr
    mbsrchr       like strrchr
    mbsstr        like strstr
    mbscasecmp    like strcasecmp
    mbscasestr    like strcasestr
    mbscspn       like strcspn
    mbspbrk       like strpbrk
    mbsspn        like strspn
    mbstok_r      like strtok_r

The prefix "mbs" coincides with the precedent "mbswidth" in gnulib and with
the precedents "mbspbrk" and "mbsrchr" on HP-UX. It does not conflict with
the Microsoft names, since Microsoft uses "_mbs", but the functions have the
same calling convention as Microsoft's functions, except that MS uses
'unsigned char *' as the multibyte string type.

Bruno
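To make the byte-string vs. character-string distinction concrete, here is a minimal C sketch of a character-boundary-aware counterpart to strchr. The name `mbschr_sketch` is hypothetical; this is not gnulib's mbschr, and its signature is not taken from gnulib or Microsoft. It assumes the target is a single-byte character and relies only on ISO C `mbrtowc` under the current LC_CTYPE locale. Treating malformed or truncated sequences as lone bytes is an assumption of this sketch, not a documented gnulib behaviour.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch, not gnulib's mbschr: return a pointer to the first
   occurrence of the single-byte character C in S that begins a character
   in the current LC_CTYPE locale, or NULL if there is none.  Unlike
   strchr, it never matches a byte in the middle of a multibyte character.
   As with strchr, searching for '\0' yields the terminating null byte.  */
char *
mbschr_sketch (const char *s, char c)
{
  const char *end = s + strlen (s);
  mbstate_t state;
  memset (&state, 0, sizeof state);

  while (s < end)
    {
      /* Length in bytes of the character starting at S.  */
      size_t n = mbrtowc (NULL, s, (size_t) (end - s), &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or truncated sequence: assume (for this sketch) that
             the byte stands alone, and restart from the initial state.  */
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else if (n == 0)
        n = 1;  /* Guard against an embedded null wide character.  */

      /* Only a one-byte character can equal the single-byte C.  */
      if (n == 1 && *s == c)
        return (char *) s;
      s += n;
    }

  return c == '\0' ? (char *) end : NULL;
}
```

For example, strchr ("\xc3\xa9", '\xa9') returns a pointer to the second byte of the UTF-8 encoding of "é", because strchr works on bytes. Once setlocale (LC_ALL, "") has selected a UTF-8 locale, mbschr_sketch returns NULL for the same input, since 0xA9 is only part of a character there. In the C locale every byte is a unit of its own, so the sketch behaves exactly like strchr.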