Of course, I leave all of this to the authors and ultimately the working group. But a few comments:
Tim Bray <tb...@textuality.com> writes:
> You have a point, but I’m reluctant, for several reasons. First, I disagree
> that the doc is organized per PRECIS; in fact, it makes use of exactly zero
> of the considerable apparatus that PRECIS builds to support its profile
> definitions. First, the organization is, rather, “Here the problems with
> code points, and here are three subsets that, to a varying degree, exclude
> them, and here are the necessary declarations to use these as PRECIS
> profiles.” Second, I don’t want to create the expectation that every
> off-the-shelf PRECIS library will know about “XML characters” or
> “Unicode assignable”. Third, I don’t want to create the impression that a
> specifier must understand PRECIS to use Unichars. PRECIS is a big and
> complicated, really quite a heavy lift.

Hmmmm. Is it possible to get that point (those points!) across
briefly in section 1? When I was writing the review, I had the
feeling that this I-D is somehow founded on PRECIS. I mean, all of
the IANA actions are additions to PRECIS tables. And literally 4% of
the words in the Abstract are "PRECIS". Clarifying how Unichars
relates to PRECIS would be helpful.

>> This is an awkward mix of singular and plural usages. Inquire of
>> Editor the best way to phrase this.

I've gradually come to believe that when one is talking of generic
entities, it's easier to make the sentences work if you use singulars
whenever possible.

>> I think the usual terminology would be "variable-length sequences of
>> 8-bit chunks" or better "variable-length sequences of octets".
>
> Really? The document is written for programmers for whom “variable-length
> byte sequences” is super-idiomatic.

Yes, it is idiomatic. OTOH, RFCs seem to have labored long and hard
to not use "byte". ... OTOOH, about as many RFCs use octet (3490) as
byte (3531). Wikipedia says

    The octet is a unit of digital information in computing and
    telecommunications that consists of eight bits.
    The term is often used when the term byte might be ambiguous, as
    the byte has historically been used for storage units of a variety
    of sizes.

Historically, I would expect the distinction to be driven by the use
of the PDP-10. I suspect my personal attitude is driven by the fact
that the SIP RFCs consistently use octets or characters, as needed,
and not bytes. And the matter hasn't been settled recently; RFC 9659
and RFC 9661 are the most recent contrasting pair. Really, the Editor
should have an opinion about this.

>> [RFC9413] emphasizes that when encountering problematic input,
>> software should consider the field as a whole, not individual code
>> points or bytes.
>>
>> This needs to be clarified; RFC 9413 does not contain the word
>> "field", and only one instance of "as a whole" (in the phrase
>> "protocol as a whole").
>
> This is embarrassing. In commentary on an earlier draft, a person I tend to
> believe (I forget who) said “Of course, RFC9413 says…” and that language
> sounded wise and we included it without checking. In fact, 9413 says no
> such thing. I still think it’s a sensible idea and, while I would like to
> have an actually-accurate citation, would also like to retain the
> suggestion even if we can’t.

Oh, yes, I didn't mean to omit the guidance, rather to make sure
there's a good pointer to the discussion of *why* just e.g. dropping
individual bogus bytes may not be a good strategy, given that it is
obvious (and simple to implement).

>> [...] surrogates, legacy C0 Controls, and the noncharacters U+FFFE [...]
>>
>> The phrase "legacy C0 Controls" is not defined. I think you mean "C0
>> Controls".
>
> The phrase “Legacy Controls which are C0 Controls” relies only on defined
> words. I think it might be forgivable to include the “C0” in the middle of
> the term defined in 2.2.2.2 as more readable?

My reflex is to disagree, but I am picky about using defined terms exactly.
I mean, exactly what does "legacy C0 controls" mean if the term is
used nowhere else in this document? The underlying problem is that
while everybody else sensibly considers the C1 controls pretty much
like the C0 controls (as in "legacy controls"), XML for some reason
treats the C1 controls like normal characters.

>> It isn't your problem, but currently the URL
>> <http://www.unicode.org/versions/latest/> goes to a page titled
>> "Unicode(R) 16.0.0", but that page gives only a summary of changes,
>> not the contents of Unicode 16. You have to go to
>> e.g. <https://www.unicode.org/versions/Unicode15.0.0/> to see the
>> standard.
>
> Yes, and I’ve been working to try to find the right Unicode person to yell
> at about this.

Heh!

Dale
_______________________________________________
Gen-art mailing list -- gen-art@ietf.org
To unsubscribe send an email to gen-art-le...@ietf.org