On Sat, 24 Jun 2017 21:28:45 +0200 Peter Otten <__pete...@web.de> wrote:
> Rod Person wrote:
>
> > Hi,
> >
> > I'm working on a program that will walk a file system and clean the
> > id3 tags of mp3 and flac files. Everything is working great until
> > the following file is found
> >
> > '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
> >
> > for some reason that I can't understand os.walk() returns this file
> > name as
> >
> > '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> >
> > which then causes more hell than a little bit for me. I don't
> > understand why apostrophe(') becomes \xe2\x80\x99, or what I can do
> > about it.
>
> >>> import unicodedata
> >>> b"\xe2\x80\x99".decode("utf-8")
> '’'
> >>> unicodedata.name(_)
> 'RIGHT SINGLE QUOTATION MARK'
>
> So it's '’' rather than "'".
>
> > The script is Python 3, the file system it is running on is a hammer
> > filesystem on DragonFlyBSD. The audio files reside on a QNAP NAS
> > which runs some kind of Linux so it's probably ext3/4. The files came
> > from various systems (Mac, Windows, FreeBSD).
>
> There seems to be a mismatch between the assumed and the actual file
> system encoding somewhere in this mix. Is this the only glitch or are
> there similar problems with other non-ascii characters?
>

This is the only glitch in file names so far.

--
Rod

http://www.rodperson.com

Who at Clitorius fountain thirst remove
Loath Wine and, abstinent, meer Water love.
- Ovid
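
For anyone hitting the same thing, here is a minimal sketch of how the
encoding mismatch discussed above could be checked. It assumes the names
on disk are really UTF-8 bytes; decode_name, non_ascii_report and
walk_decoded are hypothetical helper names made up for this illustration,
not anything from the original script. The idea is to ask os.walk() for
raw bytes and decode them explicitly rather than rely on whatever file
system encoding Python assumed.

```python
import os
import sys
import unicodedata


def decode_name(raw, encoding="utf-8"):
    """Decode a bytes file name; undecodable bytes become surrogate escapes."""
    return raw.decode(encoding, errors="surrogateescape")


def non_ascii_report(name):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNNAMED")) for ch in name if ord(ch) > 127]


def walk_decoded(top, encoding="utf-8"):
    """Walk 'top' as bytes and yield (raw_name, decoded_name) for each file.

    Assumption: the names on disk are encoded in 'encoding' (UTF-8 here).
    """
    for _dirpath, _dirnames, filenames in os.walk(os.fsencode(top)):
        for raw in filenames:
            yield raw, decode_name(raw, encoding)


# The bytes from the post, decoded explicitly as UTF-8.
raw = b"06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac"
name = decode_name(raw)

print("file system encoding Python assumes:", sys.getfilesystemencoding())
print(non_ascii_report(name))
```

If the encoding printed by sys.getfilesystemencoding() is not UTF-8 on
the machine running the script, that would be consistent with the
mismatch suggested above, though it does not prove it is the cause.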