Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Alastair Houghton via Unicode Wed, 31 May 2017 11:18:11 -0700

> On 30 May 2017, at 18:11, Shawn Steele via Unicode <[email protected]> 
> wrote:
> 
>> Which is to completely reverse the current recommendation in Unicode 9.0. 
>> While I agree that this might help you fending off a bug report, it would 
>> create chances for bug reports for Ruby, Python3, many if not all Web 
>> browsers,...
> 
> & Windows & .Net
> 
> Changing the behavior of the Windows / .Net SDK is a non-starter.
> 
>> Essentially, "overlong" is a word like "dragon" or "ghost": Everybody knows 
>> what it means, but everybody knows they don't exist.
> 
> Yes, this is trying to improve the language for a scenario that CANNOT 
> HAPPEN.  We're trying to optimize a case for data that implementations should 
> never encounter.  It is sort of exactly like optimizing for the case where 
> your data input is actually a dragon and not UTF-8 text.  
> 
> Since it is illegal, then the "at least 1 FFFD but as many as you want to 
> emit (or just fail)" is fine.


And *that* is what the specification says.  The whole problem here is that 
someone elevated one choice to the status of “best practice”, and it’s a choice 
that some of us don’t think *should* be considered best practice.

Perhaps the “best practice” text should simply be altered to say that 
implementations should *clearly document* their behaviour for invalid UTF-8 
sequences, and that code should not rely on the number of U+FFFDs generated, 
rather than recommending one particular behaviour?
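
To make "not relying on the number of U+FFFDs" concrete, here is a rough 
sketch in Python. The helper names (decode_lossy, normalize_replacements, 
is_well_formed) are made up for illustration and are not from any 
specification or library. It only assumes the standard bytes.decode() error 
handlers. The idea is to collapse each run of U+FFFD before comparing decoded 
text, so the result does not depend on how many replacement characters a given 
decoder chose to emit.

```python
import re

REPLACEMENT = "\ufffd"


def decode_lossy(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for ill-formed input.

    How many U+FFFDs come out for a given bad sequence is the decoder's
    choice, so callers should not depend on that number.
    """
    return data.decode("utf-8", errors="replace")


def normalize_replacements(text: str) -> str:
    """Collapse every run of U+FFFD into a single U+FFFD.

    Caveat (an assumption of this sketch): adjacent U+FFFDs that were
    genuinely present in the input are collapsed as well.
    """
    return re.sub(REPLACEMENT + "+", REPLACEMENT, text)


def is_well_formed(data: bytes) -> bool:
    """Return True only if data is valid UTF-8 (the "just fail" option)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

With this, the output for an "overlong" input such as b"a\xc0\xafb" compares 
equal after normalization whether the decoder emitted one, two or three 
U+FFFDs, and is_well_formed() rejects that input outright.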

Kind regards,

Alastair.

--
http://alastairs-place.net
