2017-07-24 21:12 GMT+02:00 J Decker via Unicode <unicode@unicode.org>:
>
> On Mon, Jul 24, 2017 at 10:57 AM, Costello, Roger L. via Unicode <
> unicode@unicode.org> wrote:
>
>> Hi Folks,
>>
>> 2. (Bug) The sending application performs the folding process - inserts
>> CRLF plus white space characters - and the receiving application does the
>> unfolding process but doesn't properly delete all of them.
>>
>> The RFC doesn't say 'characters' but either a space or a tab character
> (singular)
>
> back scanning is simple enough
>
> while( ( from[0] & 0xC0 ) == 0x80 )
>     from--;
>

Certainly not like this! Backscanning should only use a single assignment to the last known start position, with no loop at all!

UTF-8 security is based on the fact that its sequences are strictly limited in length, so that you will never have more than 3 trailing bytes.

If you don't have that last position in a variable, just use 3 tests but NO loop at all: if all 3 tests fail to reach a non-trailing byte, you know the input was not valid at all. That error will not be solved by using a very insecure unbounded loop like the one above, but by exiting and returning an error immediately, or by throwing an exception.

The code should rather be:

    if ((from[0] & 0xC0) == 0x80) from--;
    if ((from[0] & 0xC0) == 0x80) from--;
    if ((from[0] & 0xC0) == 0x80) from--;
    if ((from[0] & 0xC0) == 0x80) throw (some exception);
    // continue here with the character encoded as UTF-8 starting at "from"
    // (an ASCII byte or a UTF-8 leading byte)

Note the parentheses around the mask: in C and C++, == binds tighter than &, so from[0]&0xC0 == 0x80 would be parsed as from[0] & (0xC0 == 0x80), which is always 0.

It should also be secured with a guard byte at the start of the buffer that "from" points into, so that the scan never reads outside the buffer and can report an error instead.
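
For illustration, here is a minimal sketch in C of the bounded back-scan described above, with the start of the buffer passed in explicitly instead of relying on a guard byte. The names is_trail and utf8_seq_start are hypothetical and not from this thread, and the function returns NULL where the text above would throw. It only locates a non-trailing byte within 3 steps; it does not check that the lead byte's declared length matches the number of trailing bytes.

```c
#include <stddef.h>

/* True if b is a UTF-8 trailing (continuation) byte: 10xxxxxx. */
static int is_trail(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper (an assumption for this sketch).
 * buf  : first byte of the buffer
 * from : position inside the buffer, from >= buf
 * Returns the start of the sequence containing *from, or NULL if no
 * non-trailing byte is found within 3 steps back. */
static const unsigned char *
utf8_seq_start(const unsigned char *buf, const unsigned char *from)
{
    /* Three tests, no loop: a valid sequence has at most 3 trailing bytes.
     * The "from > buf" check keeps the scan inside the buffer. */
    if (is_trail(*from) && from > buf) from--;
    if (is_trail(*from) && from > buf) from--;
    if (is_trail(*from) && from > buf) from--;

    /* Still on a trailing byte: the input is malformed. */
    return is_trail(*from) ? NULL : from;
}
```

A caller would treat a NULL result as a decoding error and stop, rather than continuing to scan backwards.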