>> I would advise against such a strategy. Instead, you should first
>> understand what the encodings of the file names actually *are*, on
>> a real system, and draw conclusions from that.
> I don't follow you here. The encoding of file names *on* a real system are
> (for Linux) byte strings of potentially *any* encoding.

No. On a real system, nothing is potential, but everything is actual.
So on *your* system, today: what encoding are the file names encoded
in? We are not talking about arbitrary files, right, but about font
files? What *actual* file names do these font files have?

On my system, all font files have ASCII-only file names, even if they
are for non-ASCII characters.

> os.listdir() may even
> fail to grok some of them. So, I will have a few elements in a list that are
> not unicode, I can't ask the O/S for any help and therefore I should be able
> to pass that byte string to a function as suggested in the article to at
> least take one last stab at identifying it.

It won't identify it. It will just give you *some* Unicode string.

> Or is that a waste of time because os.listdir() has already tried something
> similar (and prob. better)?

"better" is a difficult notion here. Is it better to produce some
result, possibly incorrect, or is it better to give up?

> I forgot to mention the command-line interface... I actually had trouble with
> that too. The user can start the app like this:
> fontypython /some/folder/
> or
> fontypython SomeFileName
> And that introduces input in some kind of encoding. I hope that
> locale.getpreferredencoding() will be the right one to handle that.

If the user has set up his machine correctly: yes.

>> I see no problem with that:
>> >>> u"M\xd6gul".encode("ascii","ignore")
>> 'Mgul'
>> >>> u"M\xd6gul".encode("ascii","replace")
>> 'M?gul'
> Well, that was what I expected to see too. I must have been doing something
> stupid.

Most likely, you did not invoke .encode on a Unicode string.

Regards,
Martin
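
P.S. A rough sketch of the two points above, in case it helps. It is
written in Python 3 syntax, which is an assumption on my part (the
session quoted above is Python 2, where the results print without the
b prefix). decode_filename is a made-up helper name for illustration,
not a standard library function. It shows the "produce some result,
possibly incorrect" choice: it always returns *some* Unicode string,
without identifying the real encoding.

```python
import locale


def decode_filename(raw, encoding=None):
    """Turn a byte-string file name into text without ever raising.

    Hypothetical helper: it does not detect the encoding, it only
    applies the one it is given (or the locale's preferred one).
    """
    if encoding is None:
        encoding = locale.getpreferredencoding()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Give up on correctness: undecodable bytes become U+FFFD.
        return raw.decode(encoding, "replace")


# .encode must be called on a Unicode (text) string.
name = u"M\xd6gul"
ascii_ignore = name.encode("ascii", "ignore")    # drops the O-umlaut
ascii_replace = name.encode("ascii", "replace")  # substitutes '?'

# A Latin-1 encoded name read as if it were UTF-8 still yields a string,
# just not the right one.
guessed = decode_filename(b"M\xd6gul", "utf-8")
```

Whether the "replace" fallback is preferable to letting the
UnicodeDecodeError propagate is exactly the "better" question above;
the sketch merely picks one side.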