On Wednesday, 5 June 2013 at 8:40:39 AM UTC+3, Michael Torrie wrote:
> On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
> > One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
> > filename with spaces. Is there a problem when a filename contains both
> > English and Greek letters? Isn't it still a unicode string?
> >
> > All I did on my CentOS box was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
> > Ιησού.mp3"'
> >
> > and the filename that 'ls -l' displayed afterwards was:
> >
> > -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
> > \364\357\365\ \311\347\363\357\375.mp3
> >
> > Is there no way at all to check the charset used to store it on the HDD?
> > It should be UTF-8, but it doesn't look like it. Is there some Linux
> > command or some Python command that will print out the actual
> > encoding of '\305\365\367\336\ \364\357\365\
> > \311\347\363\357\375.mp3'?
>
> I can see that you are starting to understand things. I can't answer
> your question (don't know the answer), but you're correct about one
> thing. A filename is just a sequence of bytes. We'd hope it would be
> utf-8, but it could be anything. Even worse, it's not possible to tell
> from a byte stream what encoding it is unless we just try one and see
> what happens. Text editors, for example, have to either make a guess
> (utf-8 is a good one these days), or ask, or try to read the first
> line of the file as ascii and see if there's a source code character
> set declaration to give them an idea.
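
The "try one and see what happens" idea from the quoted reply can be sketched in Python. This is only a minimal sketch, not a real charset detector: the byte string is copied from the 'ls -l' output quoted above (with the backslash-escaped spaces turned back into plain spaces), and `try_decodings` and its list of candidate encodings are hypothetical names picked for illustration.

```python
# Bytes exactly as 'ls -l' printed them (octal escapes), spaces restored.
raw_name = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"


def try_decodings(data, candidates=("utf-8", "iso-8859-7", "cp1253")):
    """Return {encoding: decoded text, or None if the bytes are invalid in it}."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


for enc, text in try_decodings(raw_name).items():
    print(enc, "->", text if text is not None else "not valid in this encoding")
```

With these bytes the UTF-8 attempt fails, while ISO-8859-7 gives back "Ευχή του Ιησού.mp3" (cp1253 agrees for these particular letters). That suggests the name was written in a Greek single-byte encoding rather than UTF-8. A successful decode only shows that an encoding is plausible, not that it is the one that was actually used.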
Um, is there a way, even if we don't actually know which encoding CentOS used to store the filename on the HDD, to tell Python to just open the bytestream as it is? I don't know if it's possible, but I am looking for a way to skip the encoding, since we have no way of knowing what it is.

This is very weird because:

ni...@superhost.gr [~]# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
ni...@superhost.gr [~]#

All I did was a simple rename from English to Greek. Since the locale is set to use UTF-8, shouldn't the result on the HDD be a UTF-8 bytestream?
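
On the question of skipping the encoding altogether: as far as I know, Python's `os` functions accept `bytes` paths and then return `bytes` names, so no decoding is attempted at all. Below is a minimal sketch under that assumption, for a Linux filesystem like the CentOS one here. `list_raw_names` and `read_raw` are hypothetical helper names, and the demo writes a throwaway file into a temporary directory instead of touching the real mp3.

```python
import os
import tempfile


def list_raw_names(directory):
    """List directory entries as bytes, with no decoding applied."""
    return os.listdir(os.fsencode(directory))


def read_raw(directory, raw_name):
    """Open a file by its raw byte name and return its contents."""
    path = os.path.join(os.fsencode(directory), raw_name)
    with open(path, "rb") as fh:
        return fh.read()


# Demo: create a file whose name is NOT valid UTF-8 (same bytes as above).
demo_dir = tempfile.mkdtemp()
raw_name = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"
with open(os.path.join(os.fsencode(demo_dir), raw_name), "wb") as fh:
    fh.write(b"fake mp3 data")

for name in list_raw_names(demo_dir):
    print(name, len(read_raw(demo_dir, name)))
```

Passing a `str` directory instead would make Python 3 decode the names with the filesystem encoding (using the surrogateescape error handler for bytes it cannot decode), so the bytes form is the more direct way to leave the name untouched.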