On 31 May 2017, at 20:24, Shawn Steele via Unicode wrote:
>
> > For implementations that emit FFFD while handling text conversion and
> > repair (ie, converting ill-formed
> > UTF-8 to well-formed), it is best for interoperability if they get the same
> > results, so that indices within the
> >
On 31 May 2017, at 20:42, Shawn Steele via Unicode wrote:
>
>> And *that* is what the specification says. The whole problem here is that
>> someone elevated
>> one choice to the status of “best practice”, and it’s a choice that some of
>> us don’t think *should*
>> be considered best practice.
On Wed, May 31, 2017 at 8:11 PM, Richard Wordingham via Unicode
wrote:
> On Wed, 31 May 2017 15:12:12 +0300
> Henri Sivonen via Unicode wrote:
>> I am not claiming it's too difficult to implement. I think it
>> inappropriate to ask implementations, even from-scratch ones, to take
>> on added comp
On 1 Jun 2017, at 10:32, Henri Sivonen via Unicode wrote:
>
> On Wed, May 31, 2017 at 10:42 PM, Shawn Steele via Unicode
> wrote:
>> * As far as I can tell, there are two (maybe three) sane approaches to this
>> problem:
>>* Either a "maximal" emission of one U+FFFD for every byte that
On 6/1/2017 2:32 AM, Henri Sivonen via Unicode wrote:
O
On Wed, May 31, 2017 at 10:38 PM, Doug Ewell via Unicode
wrote:
Henri Sivonen wrote:
If anything, I hope this thread results in the establishment of a
requirement for proposals to come with proper research about what
multiple prominent i
I think that the (or a) key problem is that the current "best practice" is
treated as "SHOULD" in RFC parlance, when what this really needs is a "MAY".
People reading standards tend to treat "SHOULD" and "MUST" as the same thing.
So, when an implementation deviates, then you get bugs (as we se
On 6/1/2017 10:41 AM, Shawn Steele via
Unicode wrote:
I think that the (or a) key problem is that the current "best practice" is treated as "SHOULD" in RFC parlance, when what this really needs is a "MAY".
People reading standards tend to treat "SHOULD" and "MUST"
But those are IETF definitions. They don’t have to mean the same thing in
Unicode - except that people working in this field probably expect them to.
From: Unicode [mailto:unicode-boun...@unicode.org] On Behalf Of Asmus Freytag
via Unicode
Sent: Thursday, June 1, 2017 11:44 AM
To: unicode@unico
On Thu, 1 Jun 2017 12:32:08 +0300
Henri Sivonen via Unicode wrote:
> On Wed, May 31, 2017 at 8:11 PM, Richard Wordingham via Unicode
> wrote:
> > On Wed, 31 May 2017 15:12:12 +0300
> > Henri Sivonen via Unicode wrote:
> >> I am not claiming it's too difficult to implement. I think it
> >> ina
On 6/1/2017 11:53 AM, Shawn Steele wrote:
But those are IETF definitions. They don’t have to mean the same
thing in Unicode - except that people working in this field probably
expect them to.
That's the thing. And even if Unicode had its own version of RFC 2119
one would consider it r
Richard Wordingham wrote:
> even supporting 6-byte patterns just in case 20.1 bits eventually turn
> out not to be enough,
Oh, gosh, here we go with this.
What will we do if 31 bits turn out not to be enough?
--
Doug Ewell | Thornton, CO, US | ewellic.org
On Thu, 01 Jun 2017 12:54:45 -0700
Doug Ewell via Unicode wrote:
> Richard Wordingham wrote:
>
> > even supporting 6-byte patterns just in case 20.1 bits eventually
> > turn out not to be enough,
>
> Oh, gosh, here we go with this.
You were implicitly invited to argue that there was no need
This is still very unlikely to occur. There is a lot of discussion about emojis, but
they still don't count for much in the total.
The major updates were expected for CJK sinograms, but even the rate of
updates has slowed down, and we will eventually have another
sinographic plane, but it will not come soon a
On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote:
You were implicitly invited to argue that there was no need to handle
5 and 6 byte invalid sequences.
Well, working from the *current* specification:
FC 80 80 80 80 80
and
FF FF FF FF FF FF
are equal trash, uninterpretable as *anyth
On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode wrote:
> On 6/1/2017 2:39 PM, Richard Wordingham via Unicode wrote:
> > You were implicitly invited to argue that there was no need to
> > handle 5 and 6 byte invalid sequences.
> >
>
> Well, working from the *current* specification:
>
On Thu, 1 Jun 2017 17:10:54 -0700
Ken Whistler via Unicode wrote:
> Well, working from the *current* specification:
>
> FC 80 80 80 80 80
> and
> FF FF FF FF FF FF
>
> are equal trash, uninterpretable as *anything* in UTF-8.
>
> By definition D39b, either sequence of bytes, if encountered by a
On 6/1/2017 6:21 PM, Richard Wordingham via Unicode wrote:
By definition D39b, either sequence of bytes, if encountered by a
conformant UTF-8 conversion process, would be interpreted as a
sequence of 6 maximal subparts of an ill-formed subsequence.
("D39b" is a typo for "D93b".)
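For anyone who wants to check the "6 maximal subparts" reading against a real decoder, here is a minimal sketch. It assumes CPython 3's built-in UTF-8 codec with errors="replace", which is one implementation's behaviour and not a statement of what the standard requires; the helper name count_replacements is made up for illustration.

```python
# Sketch: count the U+FFFD characters CPython's UTF-8 decoder emits for the
# two byte sequences discussed above. One decoder's behaviour only; other
# implementations may legitimately differ.

REPLACEMENT = "\ufffd"


def count_replacements(raw: bytes) -> int:
    """Decode raw as UTF-8, replacing errors, and count the U+FFFD emitted."""
    return raw.decode("utf-8", errors="replace").count(REPLACEMENT)


samples = {
    "FC 80 80 80 80 80": bytes.fromhex("FC8080808080"),
    "FF FF FF FF FF FF": bytes.fromhex("FFFFFFFFFFFF"),
}

for label, raw in samples.items():
    print(label, "->", count_replacements(raw), "x U+FFFD")
```

Under that assumption both sequences come out as six replacement characters, one per byte, since none of FC, FF, or a lone 80 can begin a well-formed UTF-8 sequence.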
Sorry about
On Thu, 1 Jun 2017 19:19:51 -0700
Ken Whistler via Unicode wrote:
> > and therefore should start a
> > sequence of 6 characters.
>
> That is completely false, and has nothing to do with the current
> definition of UTF-8.
>
> The current, normative definition of UTF-8, in the Unicode Standa
On 6/1/2017 8:32 PM, Richard Wordingham via Unicode wrote:
TUS Section 3 is like the Augean Stables. It is a complete mess as a
standards document,
That is a matter of editorial taste, I suppose.
imputing mental states to computing processes.
That, however, is false. The rhetorical turn i