'Johannes Köhler' via vim_use wrote:
> *disorientation*
> The Unix utf-8 _manpage_ describes Unicode with a 2-byte encoding. But
> _wikipedia_ also indicates a 1-byte Unicode
> with ASCII compatibility.
(If I remember correctly) the first versions of Unicode had only a
2-byte encoding, so that manpage (or at least part of it) is very old.
> Furthermore, I'm interested in filesystem behavior
> and Unicode with UCS-2. Is it possible, in principle, to use a Linux
> filesystem with a 2-byte Unicode encoding?
I'm not that strong on Linux, but filesystems shouldn't have anything to
do with the encoding of text files: to the filesystem, a file's content
is just a sequence of bytes.
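Here is a minimal Python sketch (the file names and the sample text are made up for illustration) showing that the filesystem simply stores whatever bytes a program writes. The same text is saved once as UTF-16 and once as UTF-8, and the result is two files of different sizes that each read back byte for byte. The encoding is an agreement between the programs that write and read the file, not something the filesystem knows about.

```python
import os
import tempfile

text = "hello"  # hypothetical sample text

# Encode explicitly: UTF-16 little-endian (no BOM) and UTF-8.
utf16_bytes = text.encode("utf-16-le")
utf8_bytes = text.encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    p16 = os.path.join(tmp, "sample-utf16.txt")  # hypothetical names
    p8 = os.path.join(tmp, "sample-utf8.txt")

    # Binary mode: raw bytes go to the filesystem, no encoding step.
    with open(p16, "wb") as f:
        f.write(utf16_bytes)
    with open(p8, "wb") as f:
        f.write(utf8_bytes)

    # Reading back in binary mode returns exactly what was written.
    with open(p16, "rb") as f:
        stored16 = f.read()
    with open(p8, "rb") as f:
        stored8 = f.read()

    size16 = os.path.getsize(p16)
    size8 = os.path.getsize(p8)

print(size16, size8)  # 10 5
```

This covers file contents only. File names are a separate question. Linux treats a name as a byte string that may contain any byte except '/' and NUL. UTF-16 encodes ASCII characters with a 0x00 byte, so raw UTF-16 names aren't possible.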
> Because Linux creates a 2-byte file
> (a 1-byte character and a 1-byte EOF) when I create it with
> touch and insert one character into it with vim.
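I think it's Vim that adds that second byte (see :help 'fixendofline'),
not the touch program or Linux. It's a newline (end-of-line) character
rather than an EOF marker: Unix files don't store an EOF byte, the file
simply ends after its last byte.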
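The following Python sketch shows what ends up on disk, assuming Vim's default 'fixendofline' behavior. A freshly touched file is empty, and after one character is saved it holds two bytes: the character and a newline (0x0A). The code writes the bytes directly instead of running touch and Vim, so it only imitates their result. The file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "one-char.txt")  # hypothetical file name

    # Like `touch` on a new file: create it and write nothing.
    with open(path, "wb"):
        pass
    empty_size = os.path.getsize(path)

    # Imitate what Vim saves for a buffer holding the single line "a"
    # when 'fixendofline' is on (the default): the character plus "\n".
    with open(path, "wb") as f:
        f.write(b"a\n")
    with open(path, "rb") as f:
        data = f.read()

print(empty_size)             # 0
print(len(data), data.hex())  # 2 610a
```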
> The bottom line is a 1-byte ASCII file... or a 1-byte
> Unicode with ASCII compatibility (that's what I meant by
> "endian abuse appearance").
I haven't understood this or some other parts of your first message, but
you're probably thinking too far ahead; these issues most likely have
nothing to do with endianness.
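To make the ASCII-compatibility point concrete, here is a minimal Python sketch (the sample string is arbitrary). ASCII text encoded as UTF-8 gives exactly the same bytes as plain ASCII. UTF-8 is a sequence of single bytes, so it has no little-endian or big-endian variants. Byte order only becomes a question for encodings built on 2-byte units, such as UTF-16.

```python
text = "vim"  # arbitrary plain-ASCII sample

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16le = text.encode("utf-16-le")
utf16be = text.encode("utf-16-be")

# UTF-8 is ASCII-compatible: ASCII text gives identical bytes in both.
print(ascii_bytes == utf8_bytes)     # True
# UTF-8 works on single bytes, so there is no byte order to choose.
print(utf8_bytes.hex())              # 76696d
# UTF-16 uses 2-byte units, so the same text has two byte orders.
print(utf16le.hex(), utf16be.hex())  # 760069006d00 00760069006d
```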
> At present, I'm teaching myself electric circuits and
> their logical behavior. With that in mind, it should be
> faster to use 2 bytes everywhere instead of having the
> encoder/decoder decide between 1 byte and 2 bytes.
Unfortunately it's not that simple. UTF-16 (let's leave UCS-2 aside, it
shouldn't matter here) cannot be assumed to always use two bytes per
character: characters above U+FFFF need a surrogate pair, i.e. four
bytes. Also, some tests indicated that UTF-8 usually ends up being
better overall (utf8everywhere.org is certainly worth a look; I don't
remember whether I agreed with all of it, but it's certainly an
interesting document).
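This quick Python check shows how many bytes each character takes (the sample characters are arbitrary picks, one per UTF-8 length). UTF-8 uses 1 to 4 bytes. UTF-16 uses 2 bytes for characters up to U+FFFF but a 4-byte surrogate pair above that, which covers most emoji.

```python
# Arbitrary sample characters, one per UTF-8 length class.
samples = {
    "U+0041 LATIN CAPITAL LETTER A": "\u0041",
    "U+00E9 LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "U+20AC EURO SIGN": "\u20ac",
    "U+1F600 GRINNING FACE": "\U0001F600",
}

lengths = {}
for name, ch in samples.items():
    utf8_len = len(ch.encode("utf-8"))
    # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" would prepend.
    utf16_len = len(ch.encode("utf-16-le"))
    lengths[name] = (utf8_len, utf16_len)
    print(f"{name}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

# Above U+FFFF, UTF-16 needs two 16-bit units (a surrogate pair).
pair = "\U0001F600".encode("utf-16-be").hex()
print(pair)  # d83dde00
```

So the length decision doesn't go away with UTF-16. A UTF-8 decoder reads the length from the high bits of the lead byte (0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx). A UTF-16 decoder has to check whether a unit is a high surrogate (0xD800 to 0xDBFF) and, if so, read the next unit too.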
All in all, it's nice to want to understand how things work at the
lower levels, and it's quite fun to know. But to do that for text files
these days you need to read the Unicode specification, at least its
first parts; other sources are quite likely to cause more confusion than
clarity. To make sense of the varied things you can run into on the web
and in other sources, you'll probably also need to know some of the
early history of Unicode and the older encodings / character sets.
Kind regards,
Gabriele
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/vim_use/0165ce1c-cd38-2d24-72b2-365849a8f788%40tiscali.it.