On 2017-06-24 20:47, Rod Person wrote:
On Sat, 24 Jun 2017 13:28:55 -0600
Michael Torrie <torr...@gmail.com> wrote:

On 06/24/2017 12:57 PM, Rod Person wrote:
> Hi,
>
> I'm working on a program that will walk a file system and clean the
> id3 tags of mp3 and flac files, everything is working great until
> the following file is found
>
> '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
>
> for some reason that I can't understand os.walk() returns this file
> name as
>
> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
That's basically a UTF-8 string there:

$ python3
>>> a = b'06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
>>> print(a.decode('utf-8'))
06 - Todd’s Song (Post-Spiderland Song in Progress).flac
>>>
The NAS is just happily reading the UTF-8 bytes and passing them on
the wire.

> which then causes more hell than a little bit for me. I don't
> understand why the apostrophe (') becomes \xe2\x80\x99, or what I can do
> about it.
It's clearly not an apostrophe in the original filename, but probably
U+2019 (’), the right single quotation mark.
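
One way to check which character is hiding in a name is to decode the
bytes and ask the standard unicodedata module about anything outside
ASCII. This is only a sketch; the byte string is a shortened copy of the
filename quoted above, and the variable names are made up for the
example.

```python
import unicodedata

# Shortened copy of the bytes from the filename quoted above.
raw = b"Todd\xe2\x80\x99s"
text = raw.decode("utf-8")

# Report every non-ASCII character with its code point and Unicode name.
for ch in text:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The three bytes e2 80 99 are the UTF-8 encoding of that single
character, which is why one typographic quote shows up as three escapes.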

> The script is Python 3, the file system it is running on is a hammer
> filesystem on DragonFlyBSD. The audio files reside on a QNAP NAS
> which runs some kind of Linux so it is probably ext3/4. The files came
> from various systems (Mac, Windows, FreeBSD).
It's the file serving protocol that dictates how filenames are
transmitted. In your case it's probably SMB. The SMB server (Samba) is
just passing the native bytes along from the file system.  Since you know
the native file system stores names as UTF-8, you can just decode every
filename from UTF-8 bytes into Unicode.
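
A rough sketch of that idea, assuming the names really are UTF-8:
passing a bytes path to os.walk() makes it return bytes names, which can
then be decoded explicitly. The helper name walk_decoded is made up for
this example, and errors="replace" is just one possible policy for
names that turn out not to be valid UTF-8.

```python
import os

def walk_decoded(top, encoding="utf-8"):
    """Yield (directory, filename) pairs as str, decoded from raw bytes.

    Hypothetical helper: walking a bytes path makes os.walk() hand back
    bytes, so the decoding step is explicit and under our control.
    """
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(top)):
        for name in filenames:
            # "replace" leaves a visible U+FFFD marker for undecodable
            # bytes instead of silently dropping them like "ignore".
            yield (dirpath.decode(encoding, "replace"),
                   name.decode(encoding, "replace"))
```

Something like "for directory, name in walk_decoded('/mnt/nas/music')"
would then give str names throughout; the mount point here is only a
placeholder.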

This is the impression that I was under, my unicode isn't that strong, so
maybe my understanding is off...but I tried.

        file_name = file_name.decode('utf-8', 'ignore')

but when I get to my logging code:

        logfile.write(file_name)

that throws the error:
        UnicodeEncodeError: 'ascii' codec can't encode characters in
        position 39-41: ordinal not in range(128)


Your logfile was opened with the 'ascii' encoding, so you can't write anything outside the ASCII range.

Open it with the 'utf-8' encoding instead.
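
For example, something along these lines. It's only a sketch: the log
path and the sample file name are placeholders, not taken from your
script.

```python
import os
import tempfile

# Sample name containing U+2019, standing in for a decoded file name.
file_name = "06 - Todd\u2019s Song (Post-Spiderland Song in Progress).flac"

# Placeholder location for the log file.
log_path = os.path.join(tempfile.gettempdir(), "cleanup.log")

# Passing encoding= explicitly means the locale default (which may be
# ASCII) is never used for this file.
with open(log_path, "w", encoding="utf-8") as logfile:
    logfile.write(file_name + "\n")

# Read it back with the same encoding to confirm the round trip.
with open(log_path, encoding="utf-8") as logfile:
    logged = logfile.read()
```

Without the encoding argument, open() falls back to the locale's
preferred encoding, which is where the 'ascii' codec in the traceback
comes from.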
--
https://mail.python.org/mailman/listinfo/python-list
