> On 30 May 2017, at 18:11, Shawn Steele via Unicode <unicode@unicode.org> wrote:
>
>> Which is to completely reverse the current recommendation in Unicode 9.0.
>> While I agree that this might help you fending off a bug report, it would
>> create chances for bug reports for Ruby, Python3, many if not all Web
>> browsers,...
>
> & Windows & .Net
>
> Changing the behavior of the Windows / .Net SDK is a non-starter.
>
>> Essentially, "overlong" is a word like "dragon" or "ghost": Everybody knows
>> what it means, but everybody knows they don't exist.
>
> Yes, this is trying to improve the language for a scenario that CANNOT
> HAPPEN.  We're trying to optimize a case for data that implementations should
> never encounter.  It is sort of exactly like optimizing for the case where
> your data input is actually a dragon and not UTF-8 text.
>
> Since it is illegal, then the "at least 1 FFFD but as many as you want to
> emit (or just fail)" is fine.

And *that* is what the specification says.  The whole problem here is that someone elevated one choice to the status of “best practice”, and it’s a choice that some of us don’t think *should* be considered best practice.

Perhaps “best practice” should simply be altered to say that you *clearly document* your behaviour in the case of invalid UTF-8 sequences, and that code should not rely on the number of U+FFFDs generated, rather than suggesting a behaviour?

Kind regards,

Alastair.

--
http://alastairs-place.net
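
To illustrate the point about not relying on the number of U+FFFDs, here is a minimal Python sketch. The helper names (`decode_lossy`, `had_invalid_utf8`) and the sample bytes are made up for illustration; only `bytes.decode` and its standard `errors` argument come from the language itself. The idea is that calling code checks for "at least one U+FFFD" or uses strict decoding to detect bad input, and never asserts an exact replacement count, since that count can differ between decoders.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for invalid input.

    How many U+FFFDs appear per invalid sequence depends on the decoder,
    so callers should not depend on that number.
    """
    return data.decode("utf-8", errors="replace")


def had_invalid_utf8(data: bytes) -> bool:
    """Report whether the input contains any invalid UTF-8 (the 'just fail' option)."""
    try:
        data.decode("utf-8")  # strict mode raises on the first invalid byte
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical sample: "A", an overlong encoding of U+0000 (C0 80), then "B".
sample = b"A\xc0\x80B"
text = decode_lossy(sample)

# Portable checks: the valid text survives and at least one U+FFFD was emitted.
assert text.startswith("A") and text.endswith("B")
assert text.count(REPLACEMENT) >= 1
assert had_invalid_utf8(sample)
```

Code written this way behaves the same whether a decoder emits one U+FFFD for the whole ill-formed sequence or one per byte.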