On 2017-06-24 19:57, Rod Person wrote:
Hi,

I'm working on a program that walks a file system and cleans the ID3
tags of MP3 and FLAC files. Everything works great until the
following file is found:

'06 - Todd's Song (Post-Spiderland Song in Progress).flac'

For some reason that I can't understand, os.walk() returns this file
name as

'06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'

which then causes me more than a little bit of hell. I don't
understand why the apostrophe (') becomes \xe2\x80\x99, or what I can
do about it.

The script is Python 3, and the file system it runs on is HAMMER on
DragonFlyBSD. The audio files reside on a QNAP NAS, which runs some
kind of Linux, so it's probably ext3/4. The files came from various
systems (Mac, Windows, FreeBSD).

If you treat it as a bytestring b'\xe2\x80\x99' and decode it:

>>> c = b'\xe2\x80\x99'.decode('utf-8')
>>> ascii(c)
"'\\u2019'"
>>> import unicodedata
>>> unicodedata.name(c)
'RIGHT SINGLE QUOTATION MARK'

It's not an apostrophe, it's '\u2019' ('\N{RIGHT SINGLE QUOTATION MARK}').
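
A small check makes the difference concrete: the ASCII apostrophe is a single byte in UTF-8, while U+2019 takes exactly the three bytes that showed up in the file name. This is just an illustration using the standard unicodedata module:

```python
import unicodedata

apostrophe = "'"
right_quote = "\N{RIGHT SINGLE QUOTATION MARK}"

# The two characters look alike but are different code points,
# and they have different UTF-8 encodings.
for ch in (apostrophe, right_quote):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))
```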

It looks like the filename is encoded as UTF-8, but Python thinks that the filesystem encoding is something like Latin-1, so each of the three UTF-8 bytes is decoded as a separate character.
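
If that guess is right, something like the following might help. It is only a sketch under that assumption (a UTF-8 name mis-decoded as Latin-1); fix_latin1_mojibake is a hypothetical helper name, not part of any library. It prints the encoding Python is actually using for file names, reproduces the garbled name, and reverses the damage:

```python
import sys

# Which codec Python uses for file names on this machine.
print(sys.getfilesystemencoding())

# The name as stored on disk: U+2019 is written as the bytes E2 80 99.
raw = "06 - Todd\N{RIGHT SINGLE QUOTATION MARK}s Song.flac".encode("utf-8")

# Decoding those UTF-8 bytes as Latin-1 gives three wrong characters instead of one.
garbled = raw.decode("latin-1")
print(ascii(garbled))  # '06 - Todd\xe2\x80\x99s Song.flac'


def fix_latin1_mojibake(name: str) -> str:
    """Hypothetical helper: undo a UTF-8 name that was decoded as Latin-1."""
    try:
        # Latin-1 maps code points 0-255 back to the same byte values,
        # so this recovers the original bytes, which are then read as UTF-8.
        return name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not Latin-1-decoded UTF-8; leave the name unchanged.
        return name


print(ascii(fix_latin1_mojibake(garbled)))
```

Fixing the locale so that sys.getfilesystemencoding() reports 'utf-8' is the cleaner cure. Another option is to pass a bytes path to os.walk(), which then yields bytes names that you can decode as UTF-8 yourself.
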
--
https://mail.python.org/mailman/listinfo/python-list