Hi Johannes,

On Saturday, 2021-07-31 12:37:08 +0200, 'Johannes Köhler' via vim_use wrote:

> > It's not that simple unfortunately, UTF-16 (let's leave aside UCS-2, it
> > shouldn't matter) cannot be assumed to always have two bytes per
>
> UCS: _Uni_versal _Cod_ed Character Set
>
> In my mind, UCS is the mathematical quantum and UTF the
> encoding/decoding function using this:
> magnitudes: 16(32)bit
> plurality: charset / coded character

You are confusing things. UCS-4, and UTF-32 as its subset, represent
each assigned Unicode character directly by its code point. UCS-2 is a
fixed-width 2-byte encoding that can represent 65536 code points, i.e.
just the Unicode Basic Multilingual Plane (BMP). UTF-16 can encode the
entire Unicode character range. It is almost identical to UCS-2 for the
first 64k characters, except for the "escape sequences" it reserves
there: the surrogate code units (0xD800 to 0xDFFF), which it combines
into surrogate pairs to represent characters of the higher planes.

> Assuming that the data of the hdd partition tables (e.g.UID),
> used by the operating system, are encoded in 16bit Unicode.
> Well, my inferring thoughts were that UCS-2 is a
> hardware encoding, UTF-8 for ASCII purpose, UTF-32 a
> high level programmer attitude and UTF-16 the real unicode.

That's all nonsense. Really.

> In the end that means, the controller is made for 2-byte.
> The old ASCII code needs 7bit and probably one for
> sth., now than UTF-8 has to work with a different endian.

There is no endianness in UTF-8: its code unit is a single byte, so
there is no byte order to swap. Unless your hardware has fewer than
8 bits per word...

> And... why should i use a deprecated ASCII scheme
> at my system, when i can have lots of advantage
> using utf-16 (e.g. control/hash functions). It fells
> like utf-8 is a "work around" wrapper for
> the ASCII scheme...

UTF-8 is an efficient encoding that needs only 1 byte per character for
Unicode characters <128 (which are identical to ASCII, itself a subset
of Unicode), whereas UTF-16 needs at least 2 bytes for every character.

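To make the sizes concrete, here is a minimal Python sketch, assuming
Python 3 and its built-in codecs. The helper names encoded_sizes and
utf16_code_units are made up for illustration; the "-le"/"-be" codec
variants are used so that no byte order mark is counted.

```python
def encoded_sizes(ch):
    """Bytes needed for one character in each encoding (no BOM)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


def utf16_code_units(ch):
    """Split the big-endian UTF-16 form of ch into 16-bit code units."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# An ASCII letter, a BMP character (euro sign) and a character outside the BMP.
for ch in ["A", "\u20ac", "\U0001F600"]:
    units = " ".join(f"{u:04X}" for u in utf16_code_units(ch))
    print(f"U+{ord(ch):04X}", encoded_sizes(ch), "UTF-16 units:", units)

# UTF-16 depends on byte order, UTF-8 does not: there is only one UTF-8 form.
print("A".encode("utf-16-le"), "A".encode("utf-16-be"), "A".encode("utf-8"))
```

The character outside the BMP takes two UTF-16 code units (a surrogate
pair), so UTF-16 is not a fixed 2 bytes per character, while the ASCII
letter takes 1 byte in UTF-8 and 2 in UTF-16.
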
UTF-16 is a workaround for those who wanted Unicode and started off with
UCS-2 but then realized there's more than just the BMP.

Or, UTF-16 is the devil's work:
https://robert.ocallahan.org/2008/01/string-theory_08.html

  Eike

-- 
OpenPGP/GnuPG encrypted mail preferred in all private communication.
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563 2D3A
Use LibreOffice! https://www.libreoffice.org/
