Hi, first of all, you seem to have some misunderstandings about what UTF-8 and the other Unicode encodings are. If you're interested and comfortable with low-level details, I advise you to learn exactly what they are. The relevant portions of the Unicode specification (unicode.org) are neither very long nor exceedingly hard to understand, but you may also be able to find a more accessible description.

Most importantly, UTF-8 is (normally) absolutely indistinguishable from plain US-ASCII until you use characters that are not in US-ASCII; for example, most English files will be bit-for-bit identical whether written in US-ASCII or UTF-8.
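To make this concrete, here is a quick Python sketch of my own (standard library only, nothing Vim-specific) that encodes the same pure-ASCII text with several codecs. The ASCII and UTF-8 bytes come out identical, which is also why file -i can only call a file containing just ASCII characters "us-ascii" while Vim says fenc=utf-8: both are right, because the bytes carry no further information.

```python
# Encode the same pure-ASCII English text with several codecs and compare the bytes.
text = "Hello, Vim users!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
# The "-le" variants use a fixed little-endian byte order and write no BOM.
utf16_bytes = text.encode("utf-16-le")
utf32_bytes = text.encode("utf-32-le")

# For characters in the US-ASCII range, UTF-8 produces exactly the same bytes as ASCII.
print("ASCII == UTF-8:", ascii_bytes == utf8_bytes)

for name, data in [("ascii", ascii_bytes), ("utf-8", utf8_bytes),
                   ("utf-16-le", utf16_bytes), ("utf-32-le", utf32_bytes)]:
    # Show the total size and the first few bytes in hex.
    print(f"{name:10} {len(data):3} bytes  {data[:6].hex(' ')}")
```

For this 17-character string, ASCII and UTF-8 both give 17 bytes, UTF-16 gives 34 and UTF-32 gives 68. Two bytes per character belongs to UTF-16 (UCS-2) and four to UTF-32 (UCS-4); UTF-8 stores each ASCII character in a single byte.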

Then, there are many fairly complex issues in how files are read, converted and written by the various parts of the system. Vim is an especially problematic part; I attempted to make sense of it in the message https://www.mail-archive.com/[email protected]/msg57383.html and the others in that thread. But you probably won't get much out of it until you know at least how UTF-8 is encoded.
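The core of UTF-8 fits in a few lines. The following Python function is a simplified encoder written only for illustration (the name utf8_encode is hypothetical, and unlike a real encoder it does not reject the surrogate range U+D800..U+DFFF). It shows how a code point is spread over 1 to 4 bytes: the leading byte announces the length, every continuation byte starts with the bits 10, and code points below 0x80 stay single bytes identical to US-ASCII. Each result is cross-checked against Python's built-in UTF-8 codec.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch, no surrogate check)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx, identical to US-ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# "A", "ö", "€" and an emoji: 1, 2, 3 and 4 bytes respectively.
for ch in "A\u00f6\u20ac\U0001F600":
    mine = utf8_encode(ord(ch))
    # Cross-check against Python's own UTF-8 codec.
    assert mine == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {mine.hex(' ')} ({len(mine)} byte(s))")
```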

Finally, if you really want to be sure that all your files are encoded in Unicode (UTF-8 or another Unicode encoding), then I applaud you and share your concern, and I suggest the way I do it (yes, there actually is a way): https://www.mail-archive.com/[email protected]/msg57385.html . The BOM (byte order mark) mentioned there is a byte sequence that can be placed at the beginning of a text file and that Unicode-aware software interprets as a sort of invisible declaration that the file is in a particular Unicode encoding.
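As an illustration of what a BOM looks like on disk, here is a small Python sketch (the function name sniff_bom is hypothetical, and this is not necessarily the method from the linked message) that compares the first bytes of some data with the BOM constants from the standard codecs module. The UTF-32-LE BOM has to be tested before the UTF-16-LE one, because FF FE 00 00 begins with FF FE. Keep in mind that a BOM is a convention, not a proof: data that happens to start with these bytes is not guaranteed to be text in that encoding.

```python
import codecs
from typing import Optional

# Longer BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding announced by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


# Python's "utf-8-sig" codec writes a UTF-8 BOM (EF BB BF) before the text.
sample = "Gr\u00fc\u00dfe".encode("utf-8-sig")
print(sample[:3].hex(" "), sniff_bom(sample))   # the BOM bytes and the detected encoding
print(sniff_bom("plain ascii".encode("utf-8")))  # plain UTF-8 without a BOM gives None
```

On the Vim side, the 'bomb' option controls whether Vim writes a BOM when it saves a file in a Unicode encoding (see :help 'bomb').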

By the way, all of this means that it's not ASCII that is "deprecated", but the various complementary or alternative encodings that were (and partly still are) used to support non-English characters.
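A short Python sketch shows why those legacy encodings, not ASCII, are the real problem (the three code pages are just examples I picked). Bytes in the ASCII range decode to the same text everywhere, including UTF-8, but a single byte above 0x7F means a different character in each legacy encoding, and on its own it is not even valid UTF-8.

```python
ascii_part = b"Vim"
high_byte = bytes([0xE4])  # a single byte above 0x7F

legacy = ["latin-1", "iso8859-5", "cp437"]

# Bytes in the ASCII range mean the same thing in all of these encodings and in UTF-8.
print(all(ascii_part.decode(enc) == "Vim" for enc in legacy + ["utf-8"]))

# A byte above 0x7F is a different character in each legacy encoding...
decoded = {enc: high_byte.decode(enc) for enc in legacy}
for enc, ch in decoded.items():
    print(f"{enc:10} -> U+{ord(ch):04X}")

# ...and alone it is invalid UTF-8 (0xE4 starts a 3-byte sequence), so it becomes U+FFFD.
print(high_byte.decode("utf-8", errors="replace") == "\ufffd")
```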

Kind regards,
Gabriele

P.S. I'm not sure I'll be able to reply further in the next few days; I'm in a complicated situation.

'Johannes Köhler' via vim_use wrote:

Beloved Vim users!

Until recently, I had never thought about
the text encoding of the files on my hard disk.

I had set Unicode in my locales and took it for
granted that my files were UTF-8 encoded.

But after a file crash, while playing around with an
old ext2 filesystem and GNU tar, I was left with a file
header in my inodes but no file, like a capacitor without
a charge :) So, out of curiosity, I experimented a bit with
Vim, files and UTF-8 (but on btrfs) on an up-to-date Arch Linux.

Then I realized that there are three views of the encoding:
the keyboard, the display (terminal) and Vim, like decoding
pipes feeding an encoded socket. The encoded socket, the file
itself, behaves somewhat inconsistently across Vim, xterm and
the Unix tool file.

Setup: I create a file with touch in an xterm console,
then open it with Vim.

Vim: enc & fenc = utf-8
BUT file -i: us-ascii

The resulting file has 2 bytes per character, like
US-ASCII inside a Unicode container. However, I would
like to have real Unicode, not some endianness variant
of US-ASCII that uses 2 bytes instead of 1.

Then, in Vim, I change the encoding to UCS-2 with :set fenc=ucs-2.
I read in the Vim documentation that UCS-2 and UTF-8 are similar on Linux.
Now, on :write, Vim tells me [converted], and
file (sometimes) reports UTF-8, as expected. The file
size increases to 4 bytes per character, as I would expect
for UCS-4. When I reread the file in Vim, it shows unreadable
content, and I have to force it back to UCS-2 with ++enc. So
inside Vim, UCS-2 and UTF-8 seem to be different, and on Linux
UCS-2 takes up as much file space as UCS-4.

My speculation: my system-wide (or kernel-level) UTF-8
differs from real Unicode UTF-8 by some abuse of endianness,
maybe for compatibility reasons... That would explain why the
file tool behaves inconsistently (sometimes it reports binary
data instead of a text encoding).

Is there a way to make sure I am working with true UTF-8,
or better, UTF-16 files? My aim is to work with source
files in Unicode and to get rid of the deprecated ASCII...

Sincerely
-kefko

