On 8 March 2010 15:57, Gregor Best <g...@ring0.de> wrote:
> On Mon, Mar 08, 2010 at 03:44:28PM +0000, Anselm R Garbe wrote:
>> [...]
>> Sure, but according to the spec:
>>
>> "The strlen() function shall compute the number of bytes in the string
>> to which s points, not including the terminating null byte."
>>
>> strlen() should not count multi-char characters as 1 but rather return
>> number of bytes. Do you disagree?
>> [...]
>
> I never read the actual docs of that function (a few glances at the
> manpage aside), and if it definitely says "count the number of bytes",
> fine. But intuitively, I would've thought it gives the length of a
> string, as in "how many letters appear on my screen if I printf()
> this?".

Well, if so, many C programs would fall over completely, because it is
common to allocate buffers of the length returned by strlen(). If that
returned only the number of UTF-8 characters rather than bytes, we'd
have buffer overflows with text in nearly any language except English,
presumably.

The only place where UTF-8 might matter is sorting routines, but I
wouldn't bother too much about it, because in most cases < or > on a
per-byte basis will still lead to reasonable results, which is another
reason for the beauty of UTF-8. And if you really want more
sophisticated sorting routines, I'd recommend Plan 9's Rune functions
(http://swtch.com/plan9port/man/man3/rune.html) on top of the plain
byte handling.

Cheers,
Anselm