Rod Person wrote:

> Ok...so after reading all the replies in the thread, I thought it would
> be easier to send a general reply and include some links to screenshots.
>
> As Peter mentioned, the logical thing to do would be to fix the file name
> to what I actually thought it was, and if this was for work that's
> probably what I would have done, but since I want to understand what's
> going on I decided to waste time on that.
>
> I have to admit, I didn't think the file system was utf-8, as seeing what
> looked to be an apostrophe sent me down the road of "why is this
> apostrophe screwed up" instead of "ah, this must be unicode".
>
> But doing a simple ls of that directory shows it is unicode, but with the
> offending character replaced.
>
> http://rodperson.com/graphics/uc/ls.png

Have you set LANG to something that implies ASCII?

$ touch Todd’s ähnlich üblich löblich
$ ls
ähnlich  löblich  Todd’s  üblich
$ LANG=C ls
Todd???s  l??blich  ??hnlich  ??blich
$ python3 -c 'import os; print(os.listdir())'
['Todd’s', 'üblich', 'ähnlich', 'löblich']
$ LANG=C python3 -c 'import os; print(os.listdir())'
['Todd\udce2\udc80\udc99s', '\udcc3\udcbcblich', '\udcc3\udca4hnlich', 'l\udcc3\udcb6blich']
$ LANG=en_US.utf-8 python3 -c 'import os; print(os.listdir())'
['Todd’s', 'üblich', 'ähnlich', 'löblich']

For file names Python resorts to surrogates whenever a byte does not
translate into a character in the advertised encoding.

> I am in fact using Python 3.5. I may be lacking in unicode skills but I
> do have enough sense to know the version of Python I am invoking.

I've made so many "stupid errors" myself that I always consider them
first ;)

> So I included this screenshot showing the version of Python and the
> file list returned by os.walk:
>
> http://rodperson.com/graphics/uc/files.png
>
> So the fact that it shows as a string and not bytes in the debugger was
> throwing me for a loop. In my log section I was trying to determine if
> it was unicode and decode it, and if not, don't do anything, which
> wasn't working:
>
> http://rodperson.com/graphics/uc/log_section.png
>
> On Sun, 25 Jun 2017 10:47:18 +0200
> Peter Otten <__pete...@web.de> wrote:
>
>> Steve D'Aprano wrote:
>>
>> > On Sun, 25 Jun 2017 04:57 pm, Peter Otten wrote:
>>
>> >> if everything worked correctly? Though I don't understand why the
>> >> OP doesn't see
>> >>
>> >> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>> >>
>> >> which is the repr() that I get.
>> >
>> > That's mojibake and is always wrong :-)
>>
>> Yes, that's my very point.
>>
>> > I'm not sure how you got that.
>>
>> I took the OP's string at face value and pasted it into the
>> interpreter:
>>
>> # python 3.4
>> >>> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>>
>> > Something to do with an accidental decode to Latin-1?
>>
>> If the above filename is the only one or one of a few that seem
>> broken, and other non-ascii filenames look OK the OP's
>> toolchain/filesystem may work correctly and the odd name might have
>> been produced elsewhere, e. g. by copying an already messed-up
>> freedb.org entry.
>>
>> [Heureka]
>>
>> However, the most likely explanation is that the filename is correct
>> and that the OP is not using Python 3 as he claims but Python 2.
>>
>> Yes, it took that long for me to realise ;) Python 2 is slowly
>> sinking into oblivion...

-- 
https://mail.python.org/mailman/listinfo/python-list
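A minimal Python 3 sketch (not from the thread) of the two effects discussed above: the Latin-1 mojibake that a Python 2 byte-string repr turns into when pasted into Python 3, and the surrogate escapes Python produces for file names under an ASCII locale. It assumes the name is stored on disk as UTF-8 and hardcodes it instead of reading a directory, calling the codecs directly so the result does not depend on the current LANG setting.

```python
# The "apostrophe" in Todd’s is U+2019 RIGHT SINGLE QUOTATION MARK.
name = "06 - Todd\u2019s Song (Post-Spiderland Song in Progress).flac"

# Assumption: the file system stores the name as UTF-8, so U+2019 -> E2 80 99.
raw = name.encode("utf-8")
assert b"\xe2\x80\x99" in raw

# 1. Mojibake: decoding the UTF-8 bytes as Latin-1 maps each byte to one
#    code point, giving 'â' plus two unprintable C1 control characters.
mojibake = raw.decode("latin-1")
print(repr(mojibake))
# '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'

# Python 2 shows the byte string with \xNN escapes; pasted into Python 3,
# those escapes in a str literal become code points, i.e. the same mojibake.
pasted = '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
assert pasted == mojibake

# No bytes were lost, so the damage can be undone.
assert mojibake.encode("latin-1").decode("utf-8") == name

# 2. Surrogates: with an ASCII locale, each byte 0xNN that does not decode
#    becomes the lone surrogate U+DCNN ('surrogateescape' error handler).
escaped = raw.decode("ascii", errors="surrogateescape")
print(repr(escaped))
assert "Todd\udce2\udc80\udc99s" in escaped

# Encoding with the same handler restores the exact original bytes, and
# decoding those as UTF-8 recovers the intended text.
restored = escaped.encode("ascii", errors="surrogateescape")
assert restored == raw
assert restored.decode("utf-8") == name
```

For real directory listings on POSIX, os.listdir() with a str argument and os.fsdecode() apply the file system encoding with the surrogateescape handler; the sketch calls the codecs by hand only to make the output independent of the locale.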