On 02.12.22 22:39, thebluepandabear wrote:
Hm, that specifically might not be. The thing is, I thought a UTF-8 code unit can store 1-4 bytes for each character, so how is it right to say that `char` is a UTF-8 code unit? It seems like it's just an ASCII code unit.

You're simply not using the term "code unit" correctly. A UTF-8 code unit is just one of those 1-4 bytes. Together they form a "sequence" which encodes a "code point".
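To make the distinction concrete, here is a small sketch in Python (chosen only for illustration; the helper name `utf8_code_units` is made up for this example). It lists the individual UTF-8 code units, i.e. bytes, that make up the sequence for a single code point:

```python
def utf8_code_units(ch):
    """Return the UTF-8 code units (single bytes) that encode one code point."""
    return list(ch.encode("utf-8"))

# One code point each, needing 1, 2, 3 and 4 code units respectively.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    units = utf8_code_units(ch)
    hex_units = " ".join(f"{u:02X}" for u in units)
    print(f"U+{ord(ch):04X}: {len(units)} code unit(s): {hex_units}")
```

Each element of the returned list is one code unit. Only the whole sequence identifies the code point.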

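And all (true, i.e. 7-bit) ASCII code units are indeed also valid UTF-8 code units, because UTF-8 is a superset of ASCII. If you save a file as ASCII and open it as UTF-8, that works. But it doesn't work the other way around in general: as soon as the UTF-8 file contains a non-ASCII character, it contains bytes with the high bit set, which are not valid ASCII.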
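A quick way to check this yourself, again sketched in Python with made-up variable names: ASCII bytes decode fine as UTF-8, while UTF-8 bytes for a non-ASCII character are rejected by an ASCII decoder.

```python
# ASCII-encoded data is byte-for-byte valid UTF-8.
ascii_bytes = "hello".encode("ascii")
ascii_read_as_utf8 = ascii_bytes.decode("utf-8")

# UTF-8 data containing a non-ASCII character has bytes >= 0x80,
# which a strict ASCII decoder refuses.
utf8_bytes = "h\u00e9llo".encode("utf-8")
try:
    utf8_bytes.decode("ascii")
    utf8_read_as_ascii_ok = True
except UnicodeDecodeError:
    utf8_read_as_ascii_ok = False

print(ascii_read_as_utf8, utf8_read_as_ascii_ok)
```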
