On 2017-10-13 05:28, Gregory Ewing <greg.ew...@canterbury.ac.nz> wrote:
> Grant Edwards wrote:
>> On 2017-10-13, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
>>>   1 byte
>>>
>>>   addressable unit of data storage large enough to hold
>>>   any member of the basic character set of the execution
>>>   environment«
>>>
>>>   ISO C standard
>
> Hmmm. So an architecture with memory addressed in octets
> and Unicode as the basic character set would have a
> char of 8 bits and a byte of 32 bits?

No, because a char is also "large enough to store any member of the
basic execution character set" (§6.2.5). A "byte" is just the amount of
storage a "char" occupies:

| The sizeof operator yields the size (in bytes) of its operand [...]
| When applied to an operand that has type char, unsigned char, or signed
| char, (or a qualified version thereof) the result is 1.
(§6.5.3.4)

So if a C implementation used Unicode as the basic character set, a byte
would have to be at least 21 bits wide, a char would be the same size,
and the sizes of all other types would have to be multiples of that. On
any modern architecture that would be rounded up to 32 bits. (I am quite
certain that there was at least one computer with a 21 bit word size,
but I can't find it: lots of 18 bit and 24 bit machines, but nothing in
between.)

An implementation could also choose the BMP as the basic character set
and the rest of Unicode as the extended character set. That would result
in a 16 bit byte and char (and most likely UTF-16 as the multibyte
character representation).

> Not only does "byte" not always mean "8 bits", but
> "char" isn't always short for "character"...

True. A character often occupies more space than a char, and you can
store non-character data in a char.

        hp

-- 
   _  | Peter J. Holzer    | Fluch der elektronischen Textverarbeitung:
|_|_) |                    | Man feilt solange an seinen Text um, bis
| |   | h...@hjp.at        | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
-- 
https://mail.python.org/mailman/listinfo/python-list
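
PS: The relation between "byte", "char" and sizeof described above can be checked on any given implementation. What follows is only a minimal sketch, assuming a C11 compiler (for _Static_assert); the helper names bits_in and report_widths are made up for illustration and are not part of any standard API.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* sizeof(char) is 1 by definition, so CHAR_BIT is the width of a byte. */
_Static_assert(sizeof(char) == 1, "guaranteed by the C standard");
_Static_assert(CHAR_BIT >= 8, "the standard requires at least 8 bits");

/* Hypothetical helper: width in bits of an object occupying n_bytes bytes. */
static size_t bits_in(size_t n_bytes)
{
    return n_bytes * (size_t)CHAR_BIT;
}

/* Hypothetical helper: print the byte and type widths of this implementation. */
static void report_widths(void)
{
    printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
    printf("sizeof(char): %zu\n", sizeof(char));
    printf("bits in int:  %zu\n", bits_in(sizeof(int)));
    printf("bits in long: %zu\n", bits_in(sizeof(long)));
}
```

Calling report_widths() from main prints the values for the implementation at hand. On the hypothetical implementation with Unicode as the basic character set, CHAR_BIT would be at least 21, while sizeof(char) would still be 1.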