On Thu, 1 Jun 2017 17:10:54 -0700 Ken Whistler via Unicode <unicode@unicode.org> wrote:
> Well, working from the *current* specification:
>
> FC 80 80 80 80 80
> and
> FF FF FF FF FF FF
>
> are equal trash, uninterpretable as *anything* in UTF-8.
>
> By definition D39b, either sequence of bytes, if encountered by a
> conformant UTF-8 conversion process, would be interpreted as a
> sequence of 6 maximal subparts of an ill-formed subsequence.

There is a very good argument that 0xFC and 0xFF are not code units
(D77) - they are not used in the representation of any Unicode scalar
values.  By that argument, you have 5 maximal subparts and seven
garbage bytes.

Richard.
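To make the two counts concrete, here is a rough Python sketch of both readings. It is only an illustration, not anything taken from the standard's text or from an existing library: the names `classify`, `tally`, `NEVER_USED` and the `never_used_is_garbage` flag are invented for this example, and the byte ranges are my transcription of the well-formed UTF-8 byte sequence table. With the flag off, every byte that cannot continue a sequence counts as a one-byte maximal subpart (the first reading above). With the flag on, bytes that never occur in well-formed UTF-8 (C0, C1, F5..FF) are tallied separately as garbage (the second reading).

```python
from collections import Counter

# Bytes that never occur in any well-formed UTF-8 sequence.
NEVER_USED = {0xC0, 0xC1} | set(range(0xF5, 0x100))


def _trail_ranges(lead):
    """Allowed (lo, hi) range for each trailing byte, or None if `lead`
    cannot start a multi-byte well-formed sequence."""
    if 0xC2 <= lead <= 0xDF:
        return [(0x80, 0xBF)]
    if 0xE0 <= lead <= 0xEF:
        first = {0xE0: (0xA0, 0xBF), 0xED: (0x80, 0x9F)}.get(lead, (0x80, 0xBF))
        return [first, (0x80, 0xBF)]
    if 0xF0 <= lead <= 0xF4:
        first = {0xF0: (0x90, 0xBF), 0xF4: (0x80, 0x8F)}.get(lead, (0x80, 0xBF))
        return [first, (0x80, 0xBF), (0x80, 0xBF)]
    return None


def classify(data, never_used_is_garbage=False):
    """Split `data` into (chunk, kind) pairs, where kind is
    'well-formed', 'subpart' or 'garbage'."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            out.append((bytes([lead]), "well-formed"))
            i += 1
            continue
        ranges = _trail_ranges(lead)
        j = i + 1
        complete = False
        if ranges is not None:
            # Consume trailing bytes for as long as they stay in range.
            for lo, hi in ranges:
                if j < n and lo <= data[j] <= hi:
                    j += 1
                else:
                    break
            complete = (j - i) == 1 + len(ranges)
        if complete:
            kind = "well-formed"
        elif never_used_is_garbage and lead in NEVER_USED:
            kind = "garbage"
        else:
            kind = "subpart"
        out.append((bytes(data[i:j]), kind))
        i = j
    return out


def tally(data, **kwargs):
    """Count chunks of each kind."""
    return Counter(kind for _, kind in classify(data, **kwargs))


if __name__ == "__main__":
    for hexstr in ("FC8080808080", "FFFFFFFFFFFF"):
        raw = bytes.fromhex(hexstr)
        print(hexstr, dict(tally(raw)), dict(tally(raw, never_used_is_garbage=True)))
```

Summing the second reading over both sequences gives the 5 maximal subparts (the five 0x80 bytes) and seven garbage bytes (one 0xFC plus six 0xFF) mentioned above.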