[EMAIL PROTECTED] wrote:
> I'm using the ID3 tag of an mp3 file to query musicbrainz to get their
> sort-name for the artist. A simple example is "The Beatles" ->
> MusicBrainz -> "Beatles, The". I then want to rename the mp3 file
> using this information. However, I would like the filename to contain
> only ascii characters, while musicbrainz gives unicode back. So far,
> I've got something like:
>
> >>> artist = u'B\xe9la Fleck'
> >>> artist.encode('ascii', 'ignore')
> 'Bla Fleck'

Why do you want only ASCII characters? What platform are you running on?
If it's just a display problem, and the Unicode doesn't stray outside
the first 256 codepoints, you shouldn't have a problem, e.g.:

Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
[snip]
IDLE 1.1.3
>>> artist = u'B\xe9la Fleck'
>>> artist
u'B\xe9la Fleck'
>>> print artist
Béla Fleck
>>> import sys
>>> sys.stdout.encoding
'cp1252'
>>> print artist.encode('latin1')
Béla Fleck

On a *x box, using latin1 should work.

> However, I'd like to see the more sensible "Bela Fleck" instead of
> dropping '\xe9' entirely. I believe this sort of translation can be
> done using:
>
> >>> artist.translate(XXXX)
>
> The trick is finding the right XXXX. Has someone attempted this
> before, or am I stuck writing my own solution?

However, if you really insist on having only ASCII characters, then
you've pretty much got to make up your own translation table. There was
a thread or two on this topic within the last few months.

Merely stripping accents, umlauts, cedillas, etc. off most European
scripts where the basic alphabet is Roman/Latin is easy enough. However,
some scripts use characters which are not Latin letters with detachable
decorations, and you will need 2 characters out for 1 in (e.g. German
eszett, Icelandic thorn (the name of the god with the hammer is shown in
ASCII as Thor, not Por!)). Scripts like Greek and Cyrillic would need
even more work.

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list
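
A minimal sketch of the translation described above, for illustration only. It combines a small hand-made table for the "2 characters out for 1 in" cases with the standard library's unicodedata.normalize to detach accents. Using unicodedata is a substitution that was not proposed in this thread, and both the helper name to_ascii and the entries in SPECIAL are assumptions, not a complete or authoritative mapping.

```python
import unicodedata

# Hand-made table for characters that are not a Latin letter plus a
# detachable decoration. Illustrative entries only (an assumption);
# extend it for whatever data you actually meet.
SPECIAL = {
    ord(u'\xdf'): u'ss',  # German eszett
    ord(u'\xde'): u'Th',  # Icelandic capital thorn
    ord(u'\xfe'): u'th',  # Icelandic small thorn
}

def to_ascii(text):
    """Return an ASCII-only approximation of text (hypothetical helper)."""
    # Step 1: expand the 1-in, 2-out characters from the hand-made table.
    text = text.translate(SPECIAL)
    # Step 2: NFKD splits an accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Step 3: drop the combining marks.
    stripped = u''.join(c for c in decomposed if not unicodedata.combining(c))
    # Step 4: silently discard anything that is still outside ASCII.
    return stripped.encode('ascii', 'ignore').decode('ascii')
```

With this sketch, to_ascii(u'B\xe9la Fleck') gives 'Bela Fleck'. Note that step 4 still drops characters it cannot map, so Greek or Cyrillic input comes back empty unless you add a transliteration table of your own.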