On Sun, 25 Jun 2017 04:57 pm, Peter Otten wrote:

> Steve D'Aprano wrote:
>
>> On Sun, 25 Jun 2017 07:17 am, Peter Otten wrote:
>>
>>> Then I'd fix the name manually...
>>
>> The file name isn't broken.
>>
>>
>> What's broken is parts of the OP's code which assumes that non-ASCII file
>> names are broken...
>
> Hm, the OP says
>
> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>
> Shouldn't it be
>
> '06 - Todd’s Song (Post-Spiderland Song in Progress).flac'
It should, if the OP did everything right.

He has a file name containing the word "Todd’s":

# Python 3.5
py> fname = 'Todd’s'
py> repr(fname)
"'Todd’s'"

On disk, that is represented in UTF-8:

py> repr(fname.encode('utf-8'))
"b'Todd\\xe2\\x80\\x99s'"

The OP appears to be using Python 2, so when he calls os.listdir() with an
ordinary str argument such as '.', he gets the file names as bytes, not
Unicode. That means:

- the file name will be a Python 2 str, which is a *byte string*, not a text
  string;

- so he won't see Unicode characters, but rather the individual bytes of the
  UTF-8 encoding of the file name.

So in Python 2.7 instead of 3.5 above:

py> fname = u'Todd’s'
py> repr(fname)
"u'Todd\\u2019s'"
py> repr(fname.encode('utf-8'))
"'Todd\\xe2\\x80\\x99s'"

> if everything worked correctly? Though I don't understand why the OP doesn't
> see
>
> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>
> which is the repr() that I get.

That's mojibake and is always wrong :-) I'm not sure how you got that.
Something to do with an accidental decode as Latin-1?

# Python 2.7
py> repr(fname.encode('utf-8').decode('latin-1'))
"u'Todd\\xe2\\x80\\x99s'"

# Python 3.5
py> repr(fname.encode('utf-8').decode('latin-1'))
"'Toddâ\\x80\\x99s'"

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
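For anyone who wants to check the byte-level claims in this thread, here is a minimal Python 3 sketch (not part of the original post). It encodes "Todd’s" to UTF-8, shows that decoding with the right codec round-trips, and reproduces the Latin-1 mojibake Peter saw. The final repair step is an addition of mine and assumes the name was mangled by exactly one accidental Latin-1 decode; the variable names are illustrative.

```python
# Sketch (Python 3): U+2019 RIGHT SINGLE QUOTATION MARK in "Todd’s",
# its UTF-8 bytes, and the mojibake from wrongly decoding them as Latin-1.

fname = "Todd\u2019s"  # the same string as 'Todd’s'

# The UTF-8 bytes stored on disk, which is also what Python 2's
# os.listdir('.') returns as a (byte) str.
raw = fname.encode("utf-8")
assert raw == b"Todd\xe2\x80\x99s"  # one character became three bytes

# Correct round trip: decode with the same encoding that was used to encode.
assert raw.decode("utf-8") == fname

# The accidental decode: Latin-1 turns each byte into one code point,
# so the three bytes become three characters ('â', '\x80', '\x99').
mojibake = raw.decode("latin-1")
assert repr(mojibake) == "'Toddâ\\x80\\x99s'"

# Assumption: exactly one wrong Latin-1 decode happened. Latin-1 maps
# bytes 0x00-0xFF one-to-one, so re-encoding recovers the original bytes.
repaired = mojibake.encode("latin-1").decode("utf-8")
assert repaired == fname
```

Because Latin-1 can decode any byte sequence without error, this kind of mistake never raises an exception; it silently produces wrong text, which is why the repr() is the place to spot it.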