On Sat, 24 Jun 2017 13:28:55 -0600 Michael Torrie <torr...@gmail.com> wrote:
> On 06/24/2017 12:57 PM, Rod Person wrote:
> > Hi,
> >
> > I'm working on a program that will walk a file system and clean the
> > ID3 tags of mp3 and flac files, and everything is working great until
> > the following file is found:
> >
> > '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
> >
> > For some reason that I can't understand, os.walk() returns this file
> > name as
> >
> > '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>
> That's basically a UTF-8 string there:
>
> $ python3
> >>> a = b'06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> >>> print(a.decode('utf-8'))
> 06 - Todd’s Song (Post-Spiderland Song in Progress).flac
> >>>
>
> The NAS is just happily reading the UTF-8 bytes and passing them on
> the wire.
>
> > which then causes more hell than a little bit for me. I don't
> > understand why the apostrophe (') becomes \xe2\x80\x99, or what I can
> > do about it.
>
> It's clearly not an apostrophe in the original filename, but probably
> U+2019 (’).
>
> > The script is Python 3, and the file system it is running on is a
> > HAMMER filesystem on DragonFlyBSD. The audio files reside on a QNAP
> > NAS which runs some kind of Linux, so it's probably ext3/4. The files
> > came from various systems (Mac, Windows, FreeBSD).
>
> It's the file serving protocol that dictates how filenames are
> transmitted. In your case it's probably SMB. SMB (Samba) is just
> passing the native bytes along from the file system. Since you know
> the native file system is just UTF-8, you can just decode every
> filename from UTF-8 bytes into Unicode.

This is the impression I was under. My Unicode isn't that strong, so
maybe my understanding is off... but I tried:
    file_name = file_name.decode('utf-8', 'ignore')

but when I get to my logging code:

    logfile.write(file_name)

that throws the error:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-41: ordinal not in range(128)

-- 
Rod
http://www.rodperson.com

Who at Clitorius fountain thirst remove
Loath Wine and, abstinent, meer Water love.
 - Ovid

-- 
https://mail.python.org/mailman/listinfo/python-list
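A minimal sketch of the walk-decode-log path, under two assumptions. First, the walk is started from a bytes path, so os.walk() yields raw bytes names (with a str path Python 3 decodes names itself, and str has no .decode() method). Second, the 'ascii' error comes from a log file opened without an explicit encoding, so open() fell back to the locale's encoding, which on this box appears to be ASCII. The helper name log_tree() and its parameters are hypothetical. It decodes each name as UTF-8 and opens the log with encoding='utf-8' so the write no longer depends on the locale. It also swaps the 'ignore' error handler for 'replace', so undecodable bytes show up as U+FFFD instead of vanishing silently.

```python
import os
import tempfile


def log_tree(top, log_path):
    """Hypothetical helper: walk `top` and log every filename as UTF-8.

    `top` must be a bytes path, so os.walk() yields raw bytes names
    instead of decoding them with the filesystem/locale encoding.
    Returns the decoded names so the caller can reuse them.
    """
    names = []
    # An explicit encoding means the write no longer depends on the
    # locale; under an ASCII locale the default would reject U+2019.
    with open(log_path, "w", encoding="utf-8") as logfile:
        for _dirpath, _dirnames, filenames in os.walk(top):
            for raw in filenames:
                # 'replace' turns any non-UTF-8 bytes into U+FFFD rather
                # than silently dropping them, as 'ignore' would.
                name = raw.decode("utf-8", "replace")
                names.append(name)
                logfile.write(name + "\n")
    return names


if __name__ == "__main__":
    # Recreate the problem filename from its raw UTF-8 bytes.
    raw_name = b"06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac"
    # The log lives in a separate directory so it is not itself walked.
    with tempfile.TemporaryDirectory() as music, \
            tempfile.TemporaryDirectory() as logs:
        open(os.path.join(os.fsencode(music), raw_name), "wb").close()
        log_path = os.path.join(logs, "clean.log")
        for name in log_tree(os.fsencode(music), log_path):
            # ascii() keeps console output ASCII-only, sidestepping the
            # same locale problem on stdout.
            print(ascii(name))
```

Decoding at the edge (bytes in, str inside the program, explicit encoding out) keeps every conversion visible, rather than relying on whatever the locale happens to be on the machine running the script.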