Re: [dev] GSoC 2010

Anselm R Garbe Mon, 08 Mar 2010 08:07:49 -0800

On 8 March 2010 15:57, Gregor Best <[email protected]> wrote:
> On Mon, Mar 08, 2010 at 03:44:28PM +0000, Anselm R Garbe wrote:
>> [...]
>> Sure, but according to the spec:
>>
>> "The strlen() function shall compute the number of bytes in the string
>> to which s points, not including the terminating null byte."
>>
>> strlen() should not count multi-byte characters as 1 but rather return
>> the number of bytes. Do you disagree?
>> [...]
>
> I never read the actual docs of that function (a few glances at the
> manpage aside), and if it definitely says "count the number of bytes",
> fine. But intuitively, I would've thought it gives the length of a
> string, as in "how many letters appear on my screen if I printf()
> this?".


If that were so, many C programs would fall over completely, because
it is common to allocate buffers of the length returned by strlen(),
and if it returned only the number of UTF-8 characters, we would have
buffer overflows in nearly every language except, presumably, English.
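
To make the buffer argument concrete, here is a minimal sketch. It
assumes valid UTF-8 input, and utf8_count() and copy_string() are
hypothetical helpers for illustration, not standard library functions.
utf8_count() counts code points by skipping continuation bytes (bit
pattern 10xxxxxx); copy_string() sizes its buffer from strlen(), i.e.
the byte count, plus one byte for the terminating NUL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count UTF-8 code points. Every byte that is not
 * a continuation byte (10xxxxxx) starts a new character.
 * Assumes valid UTF-8; no error handling. */
size_t utf8_count(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}

/* Hypothetical helper: duplicate a string. The allocation must use the
 * byte count from strlen(), plus one byte for the terminating NUL. */
char *copy_string(const char *s)
{
	size_t len = strlen(s);
	char *buf = malloc(len + 1);

	if (buf != NULL)
		memcpy(buf, s, len + 1);
	return buf;
}
```

For "na\xc3\xafve" ("naïve"), strlen() returns 6 while utf8_count()
returns 5, so a buffer sized from the character count would be one
byte too short for the copy.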

The only place where UTF-8 might matter is sorting routines, but I
wouldn't worry too much about it: in most cases, comparing with < or >
on a per-byte basis still gives reasonable results, because the
byte-wise order of UTF-8 strings matches Unicode code point order.
That is another aspect of the beauty of UTF-8. And if you really want
more sophisticated sorting routines, I'd recommend the Plan 9 rune
routines (http://swtch.com/plan9port/man/man3/rune.html) on top of the
plain byte handling.
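
A small sketch of the sorting point, assuming valid UTF-8 input. The
first comparator is plain strcmp(), which the C standard defines to
compare bytes as unsigned char. decode() and cmp_runes() are a
hand-rolled, hypothetical stand-in for rune-based comparison (not the
API of Plan 9's chartorune()); they compare code point by code point,
and agree with the byte-wise order.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: plain byte-wise comparison via strcmp(). */
int cmp_bytes(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Hypothetical decoder: read one UTF-8 sequence at *s, return its code
 * point and advance *s past it. Assumes valid UTF-8; no error checks. */
unsigned long decode(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	unsigned long c;
	int extra;

	if (p[0] < 0x80)      { c = p[0];        extra = 0; }
	else if (p[0] < 0xE0) { c = p[0] & 0x1F; extra = 1; }
	else if (p[0] < 0xF0) { c = p[0] & 0x0F; extra = 2; }
	else                  { c = p[0] & 0x07; extra = 3; }
	for (int i = 1; i <= extra; i++)
		c = (c << 6) | (p[i] & 0x3F);
	*s += extra + 1;
	return c;
}

/* Compare two UTF-8 strings code point by code point. */
int cmp_runes(const char *a, const char *b)
{
	while (*a && *b) {
		unsigned long ca = decode(&a), cb = decode(&b);

		if (ca != cb)
			return ca < cb ? -1 : 1;
	}
	return (*a != 0) - (*b != 0);
}
```

Sorting {"zebra", "\xc3\xa9t\xc3\xa9" ("été"), "apple"} with qsort()
and cmp_bytes gives apple, zebra, été, the same order cmp_runes()
produces: U+00E9 comes after U+007A. Neither comparison is
locale-aware, which is where the "more sophisticated" routines come in.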

Cheers,
Anselm
