On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
> One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
> filename with spaces. Is there a problem when a filename contain both
> english and greek letters? Isn't it still a unicode string?
>
> All i did in my CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
> Ιησού.mp3"
>
> and the displayed filename after 'ls -l' returned was:
>
> is -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
> \364\357\365\ \311\347\363\357\375.mp3
>
> There is no way at all to check the charset used to store it in hdd?
> It should be UTF-8, but it doesn't look like it. Is there some linxu
> command or some python command that will print out the actual
> encoding of '\305\365\367\336\ \364\357\365\
> \311\347\363\357\375.mp3' ?
I can see that you are starting to understand things. I can't answer your question (I don't know the answer), but you're correct about one thing: a filename is just a sequence of bytes. We'd hope it would be UTF-8, but it could be anything.

Even worse, there is no reliable way to tell from a byte stream alone what encoding it is in. All we can do is try one and see what happens. Text editors, for example, have to either make a guess (UTF-8 is a good one these days), or ask the user, or read the first line of the file as ASCII and see whether it contains a source-code encoding declaration that gives them an idea.
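
To illustrate the "try one and see what happens" approach, here is a minimal Python 3 sketch. The function name try_encodings and the candidate list are my own assumptions, not anything from your system, and a clean decode does not prove the encoding is right: many single-byte encodings will decode almost any bytes without error, so a human still has to judge whether the result looks sensible.

```python
# Raw filename bytes copied from the quoted `ls -l` output.
# ls printed non-ASCII bytes as octal escapes and spaces as "\ ".
raw = b"\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3"


def try_encodings(data, candidates=("utf-8", "iso8859_7")):
    """Return (encoding, text) pairs for each candidate that decodes cleanly.

    The candidate list is an assumption; extend it with whatever
    encodings are plausible for the machine that created the file.
    """
    results = []
    for enc in candidates:
        try:
            results.append((enc, data.decode(enc)))
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            pass
    return results


for enc, text in try_encodings(raw):
    print(enc, "->", text)
```

With these particular bytes, the UTF-8 attempt fails and the ISO-8859-7 (Greek) attempt produces "Ευχή του Ιησού.mp3", which suggests the terminal or shell that ran mv was using a Greek single-byte encoding rather than UTF-8. To get the raw bytes of real filenames, pass a bytes path to os.listdir, for example os.listdir(b"."), which returns the names as bytes objects.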