Hello Sean,

On 2015/09/20 23:48, Sean Leonard wrote:
> What is the most concise term for characters or code points
So we already have two different things we might need a term for.
> outside of the US-ASCII range (U+0000 - U+007F)? Sometimes I have referred to these as "extended characters"
Most of the characters outside the US-ASCII range are perfectly simple and basic characters. I don't think the term 'extended' fits well here. It gives the impression that everything except US-ASCII is somewhat extraordinary, which in this day and age shouldn't be the case anymore.
> or "non-ASCII Unicode" but I do not find those terms precise. We are talking about the code points U+0080 - U+10FFFF. I suppose that this also refers to code points/scalar values that are not formally Unicode characters, such as U+FFFF.
Again we may need different terms depending on whether these are included or not.
> Basically, I am looking for a concise term for values that would require multiple UTF-8 octets if encoded in UTF-8 (without referring to UTF-8 encoding specifically). "Non-ASCII" is not precise enough since character sets like Shift-JIS are non-ASCII.
Well, the non-ASCII characters in Shift-JIS are also contained in Unicode, so depending on exactly what you want to talk about, "non-ASCII characters" may be good enough.
> Also a citation to a relevant standard (whether Unicode or otherwise) would be helpful. The terms "supplementary character" and "supplementary code point" are defined in the Unicode standard, referring to characters or code points above U+FFFF. I am looking for something like those, but for characters or code points above U+007F.
And then in some cases, you may want to exclude the C0 area (U+0000 - U+001F), or part of it, or some syntactically significant characters (e.g. punctuation) in the remaining part.
Anyway, what I wanted to show is that depending on what you need it for, there are so many different variations that it doesn't pay off to create specific short terms for all of them, and the term you use currently may be short enough.
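To make the variations concrete, here is a minimal sketch in Python of a few of the sets discussed above. The function names are hypothetical and are not terms from any standard. Treating surrogates as outside the "needs multiple UTF-8 octets" set is my assumption, based on the fact that surrogate code points cannot be encoded in well-formed UTF-8 at all.

```python
# Hypothetical predicates over integer code points (0 .. 0x10FFFF).
# The names are illustrative only; they are not defined by Unicode.

def is_non_ascii_code_point(cp: int) -> bool:
    """Any code point in U+0080 - U+10FFFF, surrogates and noncharacters included."""
    return 0x80 <= cp <= 0x10FFFF

def is_surrogate(cp: int) -> bool:
    """Surrogate code points U+D800 - U+DFFF; these are not scalar values."""
    return 0xD800 <= cp <= 0xDFFF

def is_noncharacter(cp: int) -> bool:
    """U+FDD0 - U+FDEF, plus the last two code points of every plane (e.g. U+FFFF)."""
    return (0xFDD0 <= cp <= 0xFDEF) or (cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE)

def is_c0_control(cp: int) -> bool:
    """The C0 area, U+0000 - U+001F."""
    return 0x00 <= cp <= 0x1F

def needs_multiple_utf8_octets(cp: int) -> bool:
    """Non-ASCII scalar values (assumption: surrogates are excluded)."""
    return is_non_ascii_code_point(cp) and not is_surrogate(cp)

def utf8_length(cp: int) -> int:
    """Number of octets Python's UTF-8 encoder produces for a scalar value."""
    return len(chr(cp).encode("utf-8"))
```

For example, U+00E9 and U+FFFF both satisfy `needs_multiple_utf8_octets`, but only the second is a noncharacter, so a single short term would have to say which of these cases it covers.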
Regards, Martin.

