On 05/06/2013 06:40, Michael Torrie wrote:
> On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:
>> One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
>> filename with spaces. Is there a problem when a filename contains
>> both English and Greek letters? Isn't it still a unicode string?
>>
>> All I did in my CentOS was
>> 'mv "Euxi tou Ihsou.mp3" "Ευχή του Ιησού.mp3"'
>> and the filename displayed by 'ls -l' was:
>>
>> -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3
>>
>> Is there no way at all to check the charset used to store it on the
>> hard disk? It should be UTF-8, but it doesn't look like it. Is there
>> some Linux command or some Python command that will print out the
>> actual encoding of
>> '\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3'?
>
> I can see that you are starting to understand things. I can't answer
> your question (don't know the answer), but you're correct about one
> thing. A filename is just a sequence of bytes. We'd hope it would be
> UTF-8, but it could be anything. Even worse, it's not possible to
> tell from a byte stream what encoding it is unless we just try one
> and see what happens. Text editors, for example, have to either make
> a guess (UTF-8 is a good one these days), or ask, or try to read the
> first line of the file as ASCII and see if there's a source code
> character set declaration to give it an idea.

From the previous posts I guessed that the filename might be encoded
using ISO-8859-7:
>>> s = b"\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3"
>>> s.decode("iso-8859-7")
'Ευχή\\ του\\ Ιησού.mp3'
Yes, that looks the same.
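
The "try one and see what happens" approach can be sketched in a few
lines. This is only an illustration, not a real detector: the candidate
list and the helper name plausible_decodings are my own assumptions, and
a decode that succeeds only shows the bytes are valid in that encoding,
not that it is the one actually used.

```python
# Hypothetical candidate list; extend it with whatever encodings are
# plausible for the system in question.
CANDIDATES = ("utf-8", "iso-8859-7", "cp1253")


def plausible_decodings(raw, candidates=CANDIDATES):
    """Return {encoding: text} for each candidate that decodes raw cleanly."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; rule it out.
            pass
    return results


# Rebuild the bytes from the 'ls -l' listing, without the shell's
# backslash escapes before the spaces.
raw = "Ευχή του Ιησού.mp3".encode("iso-8859-7")

for enc, text in plausible_decodings(raw).items():
    print(enc, "->", text)
```

On a real system the raw bytes of each filename can be obtained by
passing a bytes path, e.g. os.listdir(b"."), which returns the names as
bytes objects instead of decoding them. More than one candidate may
decode without error, so a human still has to look at the results and
pick the one that reads correctly.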
--
http://mail.python.org/mailman/listinfo/python-list
