On 9/22/2015 1:45 AM, Philippe Verdy wrote:
I would not use the clumsy "7-bit ASCII", because it has long caused confusion: it could refer to any national version of ISO 646, which reassigns some code positions in the range 0x00 to 0x7F to characters outside the range U+0000 to U+007F while still remaining a 7-bit encoding. So instead of "7-bit ASCII" I highly prefer the term "US-ASCII", to make sure it refers to the encoding that maps the 7-bit code positions exactly to U+0000..U+007F.

So for code positions outside 0x00..0x7F, I would call them "not US-ASCII" (none of them are bound to any Unicode "character" or "code point" or "scalar value", they are just "code positions" or more precisely "octet values with their most significant bit set to 1" which is really long: "not US-ASCII" is fine as a shorter term).
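As a rough illustration of the "octet values with their most significant bit set to 1" wording above, here is a minimal Python sketch. The function name `non_us_ascii_octets` and the sample string are made up for the example; nothing here comes from any standard.

```python
def non_us_ascii_octets(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, value) for every octet whose most significant bit is 1."""
    # 0x80 is the eighth bit; any octet with it set lies outside 0x00..0x7F.
    return [(offset, octet) for offset, octet in enumerate(data) if octet & 0x80]


# Hypothetical sample: the trailing letter needs two octets in UTF-8,
# and both of them have the most significant bit set.
sample = "café".encode("utf-8")
for offset, octet in non_us_ascii_octets(sample):
    print(f"offset {offset}: 0x{octet:02X} is not US-ASCII")
```

The test only looks at the raw octets; it says nothing about which character, if any, those octets are meant to represent.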

Again having just read through ANSI X3.4-1986 (R1997), I would like to clarify some things.

The standard itself is titled:
American National Standard for Information Systems - Coded Character Sets - 7-Bit American National Standard Code for Information Interchange (7-Bit ASCII)

However, Clause 1.1 states:
This standard specifies a set of 128 characters (control characters and graphic characters, such as letters, digits, and symbols) with their coded representation. The American National Standard Code for Information Interchange may also be identified by the acronym ASCII (pronounced ask-ee). To explicitly designate a particular (perhaps prior) edition of this standard, the last two digits of the year of issue may be appended, as in "ASCII 68" or "ASCII 86".


According to the title, "7-Bit ASCII" is proper. However, according to the text, "ASCII" is sufficient. The "7-Bit" part really just emphasizes the fact that it is a 7-bit standard. The eighth bit is outside the scope of the standard (but see clause 2.1.1). (Incidentally, Clause 1.1 is not Y2K compliant! Thus you should '86 that part of ASCII 86...hehe)

The term "US-ASCII" (see also RFC 2046 for a lot of discussion) is similarly redundant. After all, it is the *American* *National* Standard Code for Information Interchange. Even if you remove the term "National" (which does not appear in ASCII 68 or ASCII 63), it's still American. However, ASCII 68 (partially reprinted in RFC 20: <https://tools.ietf.org/html/rfc20>) actually permits "the notation ASCII (pronounced as'-key) or USASCII (pronounced you-sas'-key) [...] to mean the code prescribed by the latest issue of the standard". That is probably the genesis of US-ASCII. I wasn't alive at the time so I don't know. My suspicion is that "US-ASCII" was meant to disambiguate ASCII 86 from ASCII 68 (which is referred to as "ASCII" in RFC 821) without referring to the year, and since 68 and 86 are transposed numerals, "US-ASCII" eliminates possible mix-ups.


My conclusion here is that "ASCII" is sufficient when talking about the range of (code or character) positions 0 - 127, regardless of how they are encoded, so long as they logically evaluate to the bit combinations of the 7-bit code described in ANSI X3.4-1986.
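To make the "regardless of how they are encoded" point concrete, here is a small Python sketch under my own assumptions: the helper `in_ascii_range` is a hypothetical name, and the list of encodings is just a sample. The check is applied to the decoded positions, not to the octets on the wire.

```python
def in_ascii_range(text: str) -> bool:
    """True if every position in the text logically evaluates to 0..127."""
    return all(ord(ch) <= 0x7F for ch in text)


# The same text stored in several encodings: the octet count differs,
# but the decoded positions stay within 0..127 in every case.
for encoding in ("ascii", "utf-8", "utf-16-le", "utf-32-be"):
    raw = "ASCII 86".encode(encoding)
    print(encoding, len(raw), in_ascii_range(raw.decode(encoding)))
```

Python 3.7 and later also offer `str.isascii()`, which performs the same range test; the explicit loop is used here only to show the comparison against 127.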

"Basic Latin" also works if you want to avoid the historic reference. But there are many systems in use that are ASCII-based (including the Internet, as RFC 20 is still in force), and the term "ASCII" is peppered throughout the Unicode Standard 8.0 with greater frequency than "Basic Latin" (which is acknowledged to be a synonym for "ASCII" in Sections 5.7 and 6.2).

Sean


