> So on *your* system, today: what encoding are the filenames encoded in?
> We are not talking about arbitrary files, right, but about font files?
> What *actual* file names do these font files have?
>
> On my system, all font files have ASCII-only file names, even if they
> are for non-ASCII characters.

I guess I'm confused by that. I can ls them, so they appear and their characters are displayed. I can open and cat them, so the O/S can access them, but I don't know whether their characters are strictly within the ASCII range or drawn from a larger set like Unicode. I have certainly seen Japanese characters in filenames on my system, and those can't be ASCII.
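One way to settle the ASCII question for a given directory is to ask os.listdir() for raw bytes and test each name. Below is a minimal sketch, assuming current Python 3 (3.7 or later, which added `bytes.isascii()`). The function name `classify_names` is hypothetical, and this is a quick diagnostic rather than a complete tool.

```python
import os

def classify_names(directory="."):
    """Split the raw (bytes) filenames in `directory` into ASCII-only and non-ASCII."""
    ascii_names, other_names = [], []
    # Passing a bytes path makes os.listdir() return the names exactly as the
    # filesystem stores them, so nothing is decoded (or mis-decoded) on the way.
    for raw in os.listdir(os.fsencode(directory)):
        if raw.isascii():
            ascii_names.append(raw)
        else:
            other_names.append(raw)
    return ascii_names, other_names

if __name__ == "__main__":
    ascii_names, other_names = classify_names(".")
    print(len(ascii_names), "ASCII-only names")
    for raw in other_names:
        # repr() shows non-ASCII bytes as \x.. escapes, so this never fails to print.
        print(repr(raw))
```

Only the names in the second list depend on an encoding at all. Their repr() form also gives an unambiguous way to refer to names that are hard to type.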
You see, I have a large collection of fonts going back over 10 years. They came from Usenet years ago, so their filenames are mangled all to hell. I can't always *type* some of their names and have to use copy/paste to, for example, ls one of them.

Again, I'm working from (my own) ignorance: I assume filenames in different countries will be in character sets that I have never seen (and never will). But I have to cover them somehow.

> > Or is that a waste of time because os.listdir() has already tried
> > something similar (and prob. better)?
> "better" is a difficult notion here. Is it better to produce some
> result, possibly incorrect, or is it better to give up?

I think I see. Combined with your previous advice, I will keep the byte strings alongside their Unicode versions. Where I can't decode a name to Unicode, I will keep an 'ignore' or 'replace' decoded version for display, but I will still have the byte string and will use that to access the file anyway.

> If the user has set up his machine correctly: yes.

Meaning, I am led to assume, primarily the LANG variable?

\d
--
http://mail.python.org/mailman/listinfo/python-list
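The plan described above (keep the raw bytes, decode only for display) might look like the following minimal sketch, assuming current Python 3 on a POSIX system. `list_fonts` and `read_font` are hypothetical names. The display decode uses `sys.getfilesystemencoding()`, which on POSIX normally follows the locale settings (LC_ALL, then LC_CTYPE, then LANG, so LANG only matters when the more specific variables are unset), unless Python's UTF-8 mode overrides it.

```python
import os
import sys

def list_fonts(directory="."):
    """Return (raw_name, display_name) pairs for the entries in `directory`.

    raw_name is the exact bytes stored by the filesystem and is what should be
    used to open the file. display_name is for showing to a human only: byte
    sequences that do not decode become U+FFFD, so it may be lossy.
    """
    encoding = sys.getfilesystemencoding()
    pairs = []
    # A bytes argument makes os.listdir() return undecoded bytes names.
    for raw in os.listdir(os.fsencode(directory)):
        display = raw.decode(encoding, errors="replace")
        pairs.append((raw, display))
    return pairs

def read_font(directory, raw_name):
    """Read a file by its raw bytes name, never by its (possibly lossy) display name."""
    path = os.path.join(os.fsencode(directory), raw_name)
    with open(path, "rb") as f:
        return f.read()
```

If a lossless text form is preferred over a 'replace' one, os.fsdecode() decodes with the surrogateescape error handler on POSIX, so os.fsencode(os.fsdecode(raw)) returns the original bytes. The catch is that such strings may contain lone surrogates, which cannot be printed or encoded as ordinary UTF-8.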