Re: unicode: UTF / UCS

Gabriele F Tue, 27 Jul 2021 12:37:09 -0700

'Johannes Köhler' via vim_use wrote:

*disorientation*
The unix _manpage_ utf-8 describes unicode with a 2-byte encoding. But _wikipedia_ also indicates a 1-byte unicode
with ascii compatibility.

(If I remember correctly) the first versions of Unicode had only a 2-byte encoding, so that (part of the) manpage is very old.
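
A quick way to see the 1-byte ASCII compatibility is to encode a few characters and count the bytes. Below is a minimal Python 3.8+ sketch. The sample characters and the helper name `utf8_length` are arbitrary choices for illustration:

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses to encode one character."""
    return len(ch.encode("utf-8"))


# ASCII letter, accented Latin letter, euro sign, and a character above U+FFFF.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {encoded.hex(' ')} ({utf8_length(ch)} byte(s))")

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
assert "hello".encode("ascii") == "hello".encode("utf-8")
```

UTF-8 is variable-length: 1 byte for the ASCII range and up to 4 bytes for other code points.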

Furthermore, I am interested myself in the filesystem behavior
and unicode with ucs-2. Is it possible, in principle, to use a linux
filesystem with a 2-byte unicode encoding?

I'm not so strong on Linux, but filesystems shouldn't have anything to do with text file encodings.
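
To illustrate: a file's contents are just bytes, and the program writing them decides which text encoding those bytes represent. UTF-16 text can therefore sit on any ordinary filesystem. Below is a minimal Python sketch. The helper `write_and_read_back` and the file name are hypothetical:

```python
import os
import tempfile


def write_and_read_back(text: str, encoding: str) -> bytes:
    """Write text with the given encoding, then return the raw bytes on disk."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.txt")  # hypothetical file name
        # newline="" stops Python from translating line endings.
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        with open(path, "rb") as f:
            return f.read()


# "utf-16-le" is used instead of "utf-16" so that no byte order mark is written.
for enc in ("utf-8", "utf-16-le"):
    raw = write_and_read_back("hi", enc)
    print(f"{enc:9}: {raw.hex(' ')} ({len(raw)} bytes)")
```

This only concerns file contents. File names on Linux are byte strings in which NUL and '/' are reserved, so UTF-16 names would not work as-is, since they contain NUL bytes for ASCII characters.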

Because linux creates a 2-byte file
(1-byte character & 1-byte EOF) when I create it with
touch and insert one character into it with vim.

I think it's vim that adds that second byte, not the touch program or linux: it's a newline (end-of-line) character rather than an EOF marker (see :help 'fixendofline').
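
You can check which byte was added by dumping the file in hex (inside Vim, `:%!xxd` does this). The Python 3.9+ sketch below does the same thing. For illustration, it assumes the file holds what Vim writes by default for a one-character line in a Unix-format buffer: the character followed by LF. The helper `hex_dump` and the file name are hypothetical:

```python
import os
import tempfile


def hex_dump(path: str) -> list[str]:
    """Return each byte of the file as a two-digit hex string."""
    with open(path, "rb") as f:
        return [f"{b:02x}" for b in f.read()]


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "one_char.txt")  # hypothetical file name
    # Assumption: simulate Vim's default output for a buffer containing "x".
    with open(path, "wb") as f:
        f.write(b"x\n")
    dump = hex_dump(path)

print(dump)  # ['78', '0a']: 'x' followed by a line feed; Unix files have no EOF byte
```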

The bottom line is a 1-byte ascii file... or a 1-byte
unicode file with ascii compatibility (that's what I meant by
endian abuse appearance).

I haven't understood this or other parts of the first message, but you're probably thinking too far ahead; these issues likely have nothing to do with endianness.
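
For what it's worth, byte order only comes into play for encodings whose code units are wider than one byte. The small Python sketch below shows that UTF-8 has a single byte sequence for "A", while UTF-16 has little- and big-endian forms. It also shows that Python's plain "utf-16" codec prepends a byte order mark (BOM) in the platform's native order:

```python
import codecs

ch = "A"
forms = {
    "utf-8": ch.encode("utf-8"),          # one byte, no byte order involved
    "utf-16-le": ch.encode("utf-16-le"),  # low byte first
    "utf-16-be": ch.encode("utf-16-be"),  # high byte first
    "utf-16": ch.encode("utf-16"),        # BOM, then native byte order
}
for name, data in forms.items():
    print(f"{name:9}: {data.hex(' ')}")

# The BOM tells a reader which byte order the "utf-16" data uses.
bom = forms["utf-16"][:2]
print("little-endian" if bom == codecs.BOM_UTF16_LE else "big-endian")
```

A 1-byte ASCII file is therefore also valid UTF-8 with no endianness question attached.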

At present, I am teaching myself electric circuits and
their logical behavior. With that in mind, it should be
faster to use 2 bytes everywhere instead of a 1-byte/2-byte
decision in the encoder and decoder.

It's not that simple, unfortunately: UTF-16 (let's leave aside UCS-2, it shouldn't matter) cannot be assumed to always have two bytes per character, and some tests indicated that UTF-8 usually ends up being better overall (utf8everywhere.org is certainly worth a look; I don't remember if I agreed with it completely, but it is certainly an interesting document).
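
To make the first point concrete: UTF-16 uses one 16-bit code unit for characters up to U+FFFF. Above that it uses a surrogate pair (two units, 4 bytes). A UTF-16 decoder therefore has to branch too, much like a UTF-8 decoder branches on the lead byte. The minimal Python 3.9+ sketch below shows both checks. It follows the standard UTF-16 surrogate formula and the UTF-8 lead-byte ranges of RFC 3629. The function names are hypothetical:

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf8_sequence_length(lead: int) -> int:
    """Length of a UTF-8 sequence, decided from its first (lead) byte."""
    if lead <= 0x7F:
        return 1  # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


emoji = "\U0001F600"  # U+1F600, outside the 16-bit range
high, low = utf16_surrogates(ord(emoji))
print(f"UTF-16 code units: {high:04X} {low:04X}")  # D83D DE00
print("UTF-16 bytes:", len(emoji.encode("utf-16-le")))  # 4
print("UTF-8 length from lead byte:", utf8_sequence_length(emoji.encode("utf-8")[0]))  # 4
```

The "always 2 bytes" shortcut only holds for UCS-2, which cannot represent characters above U+FFFF at all.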

All in all, it's nice if you want to understand how things work at the lower levels, and it's quite fun to know, but to achieve that for text files these days you need to read the Unicode specification, at least its first parts; other sources are quite likely to cause more confusion than clarity. To tackle the varied things you can run into on the web and in other information sources, you'll probably also need to know some of the earlier history of Unicode and of the older encodings / character sets.



Kind regards,
Gabriele

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/vim_use/0165ce1c-cd38-2d24-72b2-365849a8f788%40tiscali.it.
