On 8 March 2015 at 18:53, "Philipp Hahn" <h...@univention.de> wrote:
>
> Hello,
>
> On 08.03.2015 02:53, Xiaodong Gong wrote:
> > the encoding type of parent location must be utf8 or utf16le, according
> > to the draft
>
> Yes, the SPEC for VPC/VHD specifies the character encoding to use, which
> is good for being portable.
>
> > ascii is the encoding type to store the string of parent location in
> > memory and to use fopen()
>
> No: For the (Linux) kernel the filename is a sequence of 8 bit bytes,
> where only '\0'=end_of_string and '/'=path_separator are handled
> specially. All other bytes have no special meaning and are passed in and
> out as is.
>
> Only the applications are doing the character encoding. Normally this is
> not a problem as you set up your system once with one encoding (nowadays
> UTF-8) and use that consistently: If you enter ä on the keyboard,
> the kernel's input layer returns \u00E4 as the two-byte UTF-8 sequence
> > $ echo -n ä | xxd -g 1
> > 0000000: c3 a4
> Any application can either just pass the byte sequence around as a CLOB
> (or use any other encoding internally - but then it must know that the
> input-encoding is UTF-8), but when again doing any system call, they
> will again pass that same byte sequence as the file-name, which the
> kernel will store on disk.
> If you take that disk to another computer, which does NOT use Unicode,
> you have a problem: If, for example, that one is still using the old
> ISO-8859-1 encoding used in western Europe, your file will be named
> differently:
> > $ echo -n ä | iconv -f ISO-8859-1 -t UTF-8
> > Ã¤
>
> (The reverse is even more painful, as not every ISO-8859-1 character
> sequence is a valid UTF-8 byte sequence - several years back when I
> moved from my old ISO-8859-1 to a more modern UTF-8 setup, I had to
> rename lots of files to be readable again)
>
> You can even test that locally on one system by creating a file
> containing an umlaut in its name and then displaying that in a non-UTF-8
> terminal / environment:
> > $ touch ä
> > $ LANG=C ls -NQ
> > "\303\244"
>
> > ascii need to translate to other encoding type according to LANG when to
> > show the information of the vhd file using the qemu-info and so on
>
> No: your assumption that ASCII is used is IMHO wrong: ASCII is only 7
> bit, but the kernel interface is 8 bit. The terminal input- and output
> layer nowadays are UTF-8, so as long as you're working on the console
> everything is fine. If you mix in GUIs and libraries doing their own
> encoding/decoding, things get more interesting.
>
> But when you do explicit character conversion like you do for VHD, you
> must honor the user configured character encoding of the environment
> yourself, that is use LC_CTYPE for any conversion of input and output,
> which includes file names.
>
> I checked xen/tools/blktap2/vpc/lib/libvhd.c #
> vhd_initialize_header_parent_name()
> which also (wrongly) assumes ASCII. Because of that, creating a snapshot
> using vhd-utils is also broken:
>
> > $ /usr/bin/vhd-util create -n ä.vhd -s 1
> > $ /usr/bin/vhd-util snapshot -n snap.vhd -p ä.vhd ; echo $?
> > 84
>
> Next I checked
> <https://technet.microsoft.com/de-de/library/gg318052%28v=ws.10%29.aspx>
> to create a VHD using umlauts with Windows 7:
>
> > cmd # as Admin
> > diskpart
> > create vdisk file="C:\ä.vhd" maximum=2000 type=expandable
> > create vdisk file="C:\snap.vhd" parent="C:\ä.vhd"
>
> But vhd-utils from Xen is broken:
>
> > $ /usr/bin/vhd-util read -n snap.vhd -p
> > VHD Header Summary:
> ...
> > Parent name : failed to read name
> ...
> > VHD Parent Locators:
> > --------------------
> > locator: : 0
> ....
> > failed to read parent name
>
> With the attached patch it works:
>
> > VHD Header Summary:
> > -------------------
> ...
> > Parent name : /ä.vhd
> ...
> > VHD Parent Locators:
> > --------------------
> > locator: : 0
> > code : PLAT_CODE_W2KU
> ...
> > decoded name : /ä.vhd
> >
> > locator: : 1
> > code : PLAT_CODE_W2RU
> ...
> > decoded name : ./ä.vhd
>
> Hope that clarified things.
>
> Philipp
First, your patch is very clear, a good sample.

What I said before about storing ASCII in the kernel was a mistake. What I
meant is that glibc functions such as fopen(path) need their arguments in
ASCII.

I think:
iconv_open(utf16le, ascii) in encode
iconv_open(ascii, utf16le) in decode
iconv_open(codeset, ascii) in show
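
A minimal sketch of these three conversions, in Python rather than C for
brevity. Python's codecs stand in for iconv_open()/iconv() (whose arguments
are tocode, fromcode). The function names are hypothetical, UTF-16LE is
assumed for the locator as above, and the file-name codeset is taken from the
locale, as the quoted message suggests with LC_CTYPE, instead of being fixed
to ASCII.

```python
import locale
from typing import Optional


def encode_parent_name(path: str) -> bytes:
    """Encode: text -> UTF-16LE bytes for the locator (assumed encoding, no BOM)."""
    return path.encode("utf-16-le")


def decode_parent_name(raw: bytes) -> str:
    """Decode: UTF-16LE locator bytes -> text."""
    return raw.decode("utf-16-le")


def to_filename_bytes(path: str, codeset: Optional[str] = None) -> bytes:
    """Show/open: text -> bytes in the user's codeset.

    If no codeset is given, fall back to the locale's preferred encoding
    (assumption: the process locale reflects the user's environment).
    """
    if codeset is None:
        codeset = locale.getpreferredencoding(False)
    # Raises UnicodeEncodeError if the name cannot be represented,
    # for example "ä" with codeset="ascii".
    return path.encode(codeset)


if __name__ == "__main__":
    raw = encode_parent_name("/ä.vhd")
    name = decode_parent_name(raw)
    print(raw.hex(" "))                      # locator bytes
    print(to_filename_bytes(name, "utf-8"))  # bytes handed to the OS
```

With codeset="ascii" the last step raises an error for ä.vhd, which seems to
be the kind of failure the quoted message attributes to the ASCII assumption
in libvhd.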