Well, it's hard to grep for '3' in a big codebase. They should have used UTFmax in the first place.
I see though that currently in libc.h I have

enum
{
	UTFmax		= 4,		/* maximum bytes per rune */
	Runesync	= 0x80,		/* cannot represent part of a UTF sequence (<) */
	Runeself	= 0x80,		/* rune and UTF sequences are the same (<) */
	Runeerror	= 0xFFFD,	/* decoding error in UTF */
	Runemax		= 0x10FFFF,	/* 21-bit rune */
	Runemask	= 0x1FFFFF,	/* bits used by runes (see grep) */
};

so Runemax seems to indicate we never produce a rune using more than 3 bytes, no?
So maybe buf[3] is safe?

On Jun 18, 2014, at 10:36 AM, erik quanstrom <quans...@quanstro.net> wrote:

> On Wed Jun 18 13:36:09 EDT 2014, mirtchov...@gmail.com wrote:
>> used to be 3 :)
>>
>> "UTFmax, defined as 3 in <libc.h>, is the maximum number of bytes
>> required to represent a rune."
>
> which is exactly why this should have been caught.
> this one's my fault.
>
> - erik
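
On the buf[3] question, here is a quick sketch for checking it, assuming the standard UTF-8 encoding ranges. The helper name utf8len is made up for illustration and is not something in libc.h; it ignores surrogates and only counts bytes.

```c
/*
 * Hypothetical helper, not part of libc.h: number of bytes the
 * standard UTF-8 encoding needs for code point c, or 0 if c is
 * above 0x10FFFF (the value of Runemax in the enum quoted above).
 */
int
utf8len(unsigned long c)
{
	if(c < 0x80)
		return 1;	/* 7 bits of payload */
	if(c < 0x800)
		return 2;	/* 11 bits of payload */
	if(c < 0x10000)
		return 3;	/* 16 bits of payload */
	if(c <= 0x10FFFF)
		return 4;	/* 21 bits of payload */
	return 0;		/* out of range */
}
```

Under those assumptions utf8len(0xFFFF) is 3 but utf8len(0x10FFFF) is 4, so a rune at or near Runemax would not fit in buf[3]. Sizing the buffer as buf[UTFmax] looks like the safer choice.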