On 10/17/06, TiNo <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I am trying to compare my Itunes Library xml to the actual files on my
> computer.
> As the xml file is in UTF-8 encoding, I decided to do the comparison of the
> filenames in that encoding.
> It all works, except with one file. It is named 'The Chemical
> Brothers-Elektrobank-04 - Don't Stop the Rock (Electronic Battle Weapon
> Version).mp3'. It goes wrong with the apostrophe in Don't. That is actually
> not an apostrophe, but ASCII char 180: ´

It's actually Unicode char #180, not ASCII. ASCII characters are in the
0..127 range.

> In the Itunes library it is encoded as: Don%E2%80%99t

Looks like a UTF-8 encoded string, then encoded like a URL.

> I do some conversions with both the library path names and the folder
> path names. Here is the code:
> (in the comment I display how the Don't part looks. I got this using print
> repr(filename))
> -------------------------------------------------------------
> #Once I have the filenames from the library I clean them using the following
> code (as filenames are in the format '
> file://localhost/m:/music/track%20name.mp3')
>
> filename = urlparse.urlparse(filename)[2][1:]  # u'Don%E2%80%99t' ; side
> question, anybody who knows a way to do this in a more fashionable way?
> filename = urllib.unquote (filename) # u'Don\xe2\x80\x99t'

This doesn't work for me in Python 2.4: unquote expects str type, not
unicode. So it should be:

filename = urllib.unquote(filename.encode('ascii')).decode('utf-8')

> filename = os.path.normpath(filename) # u'Don\xe2\x80\x99t'
>
> I get the files in my music folder with the os.walk method and then
> I do:
>
> filename = os.path.normpath(os.path.join (root,name))  # 'Don\x92t'
> filename = unicode(filename,'latin1') # u'Don\x92t'
> filename = filename.encode('utf-8') # 'Don\xc2\x92t'
> filename = unicode(filename,'latin1') # u'Don\xc2\x92t'

This looks like calling random methods with random parameters :)
Python is able to return you unicode file names right away, you just
need to pass input parameters as unicode strings:

>>> os.listdir(u"/")
[u'alarm', u'ARCSOFT' ...]

So in your case you need to make sure the start directory parameter
for the walk function is unicode.

--
http://mail.python.org/mailman/listinfo/python-list
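
For anyone reading this thread later, here is a minimal sketch of the same idea in Python 3, where str is already unicode and os.walk returns text file names when given a text start directory. This is an assumption about a newer Python than the 2.4 discussed above, and the helper names library_path and disk_paths are made up for illustration. Note that %E2%80%99 is the UTF-8 encoding of U+2019 (right single quotation mark). The 'Don\x92t' seen from os.walk is probably the same character in Windows cp1252, which would explain why decoding it as latin1 never matched.

```python
import os
from urllib.parse import unquote, urlparse


def library_path(location):
    """Turn an iTunes-style file:// URL into a decoded path string."""
    # Path component of the URL, e.g. '/m:/music/Don%E2%80%99t.mp3'.
    path = urlparse(location).path
    # unquote() decodes the percent-escapes as UTF-8 by default in Python 3.
    # The [1:] drops the leading slash, as in the original snippet.
    return unquote(path)[1:]


def disk_paths(root):
    """Collect normalized paths of all files below root as text strings."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.add(os.path.normpath(os.path.join(dirpath, name)))
    return paths
```

With both sides as text, the comparison becomes a plain set lookup, for example os.path.normpath(library_path(location)) in disk_paths(root), with no manual encode/decode round trips.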