That is actually
> not an apostrophe, but ASCII char 180: ´
It's actually Unicode char #180 (U+00B4), not ASCII. ASCII characters are in
the 0..127 range.
Yep, that's what I meant... :D
> In the iTunes library it is encoded as: Don%E2%80%99t
Looks like a UTF-8 encoded string that was then URL-encoded.
It is. I just found out it is Unicode character U+2019. So in the iTunes library it is not Unicode char 180, but it looks exactly the same...
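To make the difference between the two look-alike characters visible, here is a minimal sketch. It is written for Python 3 (an assumption; the thread itself uses Python 2.4), where urllib.parse.unquote decodes the percent-escapes as UTF-8 by default. The variable names are only illustrative.

```python
import unicodedata
from urllib.parse import unquote

# Percent-decode the name as stored in the library; UTF-8 is the default.
decoded = unquote("Don%E2%80%99t")
quote_char = decoded[3]   # the character between "Don" and "t"
acute = "\u00b4"          # Unicode code point 180 (decimal)

# Print the code point and the official Unicode name of each character.
for ch in (quote_char, acute):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```

The two characters print with different code points and names, which is why a plain string comparison between the library name and the file name fails.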
> I do some conversions with both the library path names and the folder
> path names. Here is the code:
> (in the comments I display how the Don't part looks. I got this using print
> repr(filename))
> -------------------------------------------------------------
> #Once I have the filenames from the library I clean them using the following
> code (as filenames are in the format '
> file://localhost/m:/music/track%20name.mp3')
>
> filename = urlparse.urlparse(filename)[2][1:] # u'Don%E2%80%99t' ; side
> question: does anybody know a more elegant way to do this?
> filename = urllib.unquote (filename) # u'Don\xe2\x80\x99t'
This doesn't work for me in Python 2.4; unquote expects a str, not
unicode. So it should be:
filename = urllib.unquote(filename.encode('ascii')).decode('utf-8')
It works for me with python 2.4.3. It returns a unicode string.
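As for the side question about a tidier way to clean the library names, one possible sketch follows. It assumes Python 3, where unquote accepts and returns text, so the explicit encode/decode step from the Python 2.4 version above is not needed. The function name library_url_to_path is made up for this example.

```python
from urllib.parse import unquote, urlparse


def library_url_to_path(url):
    """Turn a file URL from the library into a plain path string.

    Hypothetical helper, not part of any library.
    """
    # .path is the named equivalent of index [2] of the parse result.
    path = urlparse(url).path        # e.g. '/m:/music/track%20name.mp3'
    # Drop the leading slash, then undo the percent-encoding (UTF-8).
    return unquote(path[1:])


print(library_url_to_path("file://localhost/m:/music/track%20name.mp3"))
```

Using the .path attribute instead of [2] does the same thing but says what is being picked out of the parsed URL.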
> filename = os.path.normpath(filename) # u'Don\xe2\x80\x99t'
>
> I get the files in my music folder with the os.walk method and then
> I do:
>
> filename = os.path.normpath(os.path.join (root,name)) # 'Don\x92t'
> filename = unicode(filename,'latin1') # u'Don\x92t'
> filename = filename.encode('utf-8') # 'Don\xc2\x92t'
> filename = unicode(filename,'latin1') # u'Don\xc2\x92t'
This looks like calling random methods with random parameters :)
It is... Well, not totally random. I figured I needed a unicode string to be able to encode it to UTF-8 (otherwise it gives an error). After that it appears not to be a unicode string anymore (no u in front of it), so I decided to unicode it again...
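For reference, the chain of conversions quoted above can be replayed step by step. This is a sketch in Python 3 (an assumption; there bytes.decode plays the role of unicode(s, 'latin1')). The last line is an extra comparison that is not in the original code: it shows what the same byte would mean if the file system used the Windows-1252 code page, which is only a guess about the machine in question.

```python
raw = b"Don\x92t"                  # bytes as shown in the os.walk comment

step1 = raw.decode("latin1")       # 'Don\x92t': byte 0x92 becomes U+0092
step2 = step1.encode("utf-8")      # b'Don\xc2\x92t': U+0092 is two bytes in UTF-8
step3 = step2.decode("latin1")     # 'Don\xc2\x92t': each byte read as one character

# Assumption: if the bytes were really cp1252, 0x92 is a different character.
as_cp1252 = raw.decode("cp1252")   # 'Don\u2019t'

print(repr(step1), repr(step2), repr(step3), repr(as_cp1252))
```

The final latin1 decode does not undo the UTF-8 encoding; it turns the two UTF-8 bytes into two separate characters, so step3 is one character longer than step1.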
It worked, but I now accomplish the same with just the encode line and the following:
Python can return unicode file names right away; you just
need to pass the input parameters as unicode strings:
>>> os.listdir(u"/")
[u'alarm', u'ARCSOFT' ...]
So in your case you need to make sure the start directory parameter
for the walk function is unicode.
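A small sketch of that advice, assuming Python 3, where every str is already unicode and os.walk returns text names for a text start directory. In the Python 2 of this thread the start directory would be passed as a unicode object instead. The helper name walk_files and the temporary directory are only for illustration.

```python
import os
import tempfile


def walk_files(start_dir):
    """Yield the normalized path of every file below start_dir.

    Hypothetical helper; a text start_dir gives text paths back.
    """
    for root, _dirs, names in os.walk(start_dir):
        for name in names:
            yield os.path.normpath(os.path.join(root, name))


# Demonstration on a throwaway directory standing in for the music folder.
with tempfile.TemporaryDirectory() as music_dir:
    open(os.path.join(music_dir, "track name.mp3"), "w").close()
    found = list(walk_files(music_dir))

print(found)
```

With text coming back from the walk directly, none of the later decode and encode steps on the folder names are needed.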
That does not matter much for me. Then I will have to convert the path name to unicode, as it is user input. (ok, it still saves me converting a string to unicode a thousand times, so I'll do it :)
Now I know where the problem lies. The character in the actual file path is U+00B4 (acute accent) and in the iTunes library it is U+2019 (a right curly quote). Somehow iTunes manages to treat these two as the same...?
As it is the only file that gave me trouble, I changed the accent in the file name to an apostrophe and re-imported it in iTunes. But I would still like to hear whether there is a solution for this problem.
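One possible workaround, offered only as a sketch: fold the apostrophe look-alikes to a plain ASCII apostrophe on both sides before comparing names. The set of characters, the table and both function names are assumptions made for this example; this is not how iTunes itself matches files, which is unknown here. The code is Python 3.

```python
# Characters treated as equivalent to "'" (an assumed, incomplete list):
# acute accent, right and left single quotation marks, grave accent.
APOSTROPHE_LIKE = "\u00b4\u2019\u2018\u0060"
FOLD_TABLE = {ord(ch): "'" for ch in APOSTROPHE_LIKE}


def fold_apostrophes(name):
    """Replace apostrophe look-alikes with a plain ASCII apostrophe."""
    return name.translate(FOLD_TABLE)


def same_track(library_name, disk_name):
    """Compare two names while ignoring which apostrophe variant is used."""
    return fold_apostrophes(library_name) == fold_apostrophes(disk_name)


print(same_track("Don\u2019t", "Don\u00b4t"))
```

The folded form is only meant for comparison; the original names should still be used when opening files.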
-- http://mail.python.org/mailman/listinfo/python-list