[issue21081] missing vietnamese codec TCVN 5712:1993 in Python

Antti Haapala Fri, 21 Oct 2016 13:48:44 -0700

Antti Haapala added the comment:

I found the full document on SlideShare: 
http://www.slideshare.net/sacobat/tcvn-5712-1993-cng-ngh-thng-tin-b-m-chun-8bit-k-t-vit-dng-trong-trao-i-thng-tin


As far as I can understand, they're "subsets" of each other only in the sense 
that VN1 has the widest mapping of characters, but that mapping also partially 
overlaps the C0 and C1 control-character ranges of the ISO code pages - there 
are 139 additional characters!

VN2 then lets C0 and C1 retain their ISO-8859 meanings by sacrificing some 
capital vowels (Ezio perhaps remembers that Italy is Ý in Vietnamese - sorry, 
can't write it in upper case in VN2). VN3 then sacrifices even more characters 
to leave more code points free, possibly for application-specific uses (the 
standard is very vague about that).

The text of the standard is copy-pasteable at 
http://luatvn.net/tieu-chuan-viet-nam/tieu-chuan-viet-nam-tcvn5712_1993.2.171673.html
 - however, it lacks some of the tables.

The standard additionally has both UCS-2 mappings and Unicode names of the 
characters, but they're in pictures; so it would be preferable to get the 
mapping from the iconv output, say.
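
A rough sketch of how the mapping could be pulled out of iconv, one byte at a 
time. It assumes an `iconv` command-line tool is on the PATH and that it knows 
the charset under the name "TCVN5712-1" (glibc's name, as far as I know; other 
builds may spell it differently or not have it at all). The function names 
here are made up for illustration and are not part of any existing patch.

```python
import subprocess


def iconv_decode_byte(byte_value, charset="TCVN5712-1"):
    """Decode a single byte through the iconv CLI.

    Returns the decoded text, or None if iconv rejects the byte.
    The charset name is an assumption; adjust it for the local iconv.
    """
    result = subprocess.run(
        ["iconv", "-f", charset, "-t", "UTF-8"],
        input=bytes([byte_value]),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8")


def build_decoding_table(decode_byte):
    """Map every byte value 0..255 to whatever decode_byte returns for it."""
    return {b: decode_byte(b) for b in range(256)}


def format_row(byte_value, text):
    """Render one mapping row, e.g. '0xE9<TAB>U+00E9'."""
    if text is None:
        return "0x%02X\tUNDEFINED" % byte_value
    code_points = " ".join("U+%04X" % ord(ch) for ch in text)
    return "0x%02X\t%s" % (byte_value, code_points)
```

Calling `build_decoding_table(iconv_decode_byte)` and printing `format_row` for 
each entry would give a table that can be checked by eye against the pictures 
in the standard. Note that this only captures what that particular iconv build 
implements, which may not match any one of VN1, VN2 or VN3 exactly.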

----------
nosy: +ztane

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21081>
_______________________________________
