Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Martin v. Löwis
> Has it been decided how Python 3.0 will implement os.listdir on Unix?
> Will there be only a single attempt to encode using the current locale
> or will there be a backup technique?
That's what it currently does.
> I'd probably define an optional
> encoding parameter so you can ask for os.li

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Neil Hodgson
Martin v. Löwis:
> That's not true. Try open("\xff","w"), then try interpreting the file
> name as UTF-8. Some byte strings are not meaningful UTF-8, hence that
> approach cannot work.
Has it been decided how Python 3.0 will implement os.listdir on Unix? Will there be only a single attempt t
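
The point about "\xff" can be checked directly. The following is only a minimal sketch in present-day Python 3 (not the Python 2 discussed in the thread), and the helper name is_valid_utf8 is made up for illustration:

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The byte 0xFF never occurs in well-formed UTF-8, so this name is rejected.
print(is_valid_utf8(b"\xff"))              # False
# The bytes C3 96 are the UTF-8 encoding of U+00D6 (Ö), so this name is accepted.
print(is_valid_utf8(b"M\xc3\x96gul.pog"))  # True
```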

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Martin v. Löwis
> Ah. Can one call it after the full call has been done:
> locale.setlocale(locale.LC_ALL,'')
> locale.setlocale(locale.LC_ALL)
> Without any issues?
If you pass LC_ALL, then some systems will give you funny results (semicolon-separated enumerations of all the categories). Instead, pick a specifi

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Donn
> You get the full locale name with locale.setlocale(category) (i.e.
> without the second argument)
Ah. Can one call it after the full call has been done:
locale.setlocale(locale.LC_ALL,'')
locale.setlocale(locale.LC_ALL)
Without any issues?
> > I need that two-letter code that's hidden in a
> > t

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Martin v. Löwis
> Given that getlocale() is not to be used, what's the best way to get the
> locale later in the app?
You get the full locale name with locale.setlocale(category) (i.e. without the second argument)
> I need that two-letter code that's hidden in a
> typical locale like en_ZA.utf8 -- I want that
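
A rough sketch of how the query form of setlocale() could be combined with a small parser to get the 'en' part. This is Python 3; the function names language_code and current_language are hypothetical, and the choice of LC_CTYPE as the category to query is an assumption, not something the thread prescribes:

```python
import locale


def language_code(locale_name: str) -> str:
    """Strip modifier, codeset and territory: 'en_ZA.utf8' becomes 'en'."""
    for sep in ("@", ".", "_"):
        locale_name = locale_name.split(sep, 1)[0]
    return locale_name


def current_language(category=locale.LC_CTYPE) -> str:
    """Query (not set) one category and return its language part.

    LC_CTYPE is an assumed default; querying a single category avoids the
    semicolon-separated string some systems return for LC_ALL.
    """
    return language_code(locale.setlocale(category))


print(language_code("en_ZA.utf8"))  # en
print(current_language())           # depends on the environment, e.g. "C"
```

Note that the plain "C" locale has no language part, so the helper returns "C" unchanged; an application would probably want to treat that case separately.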

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Donn
Given that getlocale() is not to be used, what's the best way to get the locale later in the app? I need that two-letter code that's hidden in a typical locale like en_ZA.utf8 -- I want that 'en' part. BTW - things are hanging-together much better now, thanks to your info. I have it running in

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Martin v. Löwis
> I have decided to keep the test for a decode error because files created under
> different locales should not be written-to under the current one. I don't
> know if one can mix encodings in a single text file, but I don't have time to
> find out.
Of course it's *possible*. However, you nee

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-14 Thread Donn
> Can you please type
> paf = ['/home/donn/.fontypython/M\xc3\x96gul.pog']
> f = open(paf, "r")
I think I was getting a ghost error from another try somewhere higher up. You are correct, this does open the file - no matter what the locale is. I have decided to keep the test for a decode error be
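
The observation that a byte-string path opens regardless of locale can be tried out with a throwaway file. This is a sketch in Python 3 using a temporary directory rather than the real ~/.fontypython path, and it passes a single bytes path to open() rather than a list:

```python
import os
import tempfile

# The file name as raw bytes; C3 96 is the UTF-8 encoding of Ö.
name = b"M\xc3\x96gul.pog"

with tempfile.TemporaryDirectory() as tmp:
    # Join bytes with bytes so no decoding step is involved anywhere.
    path = os.path.join(os.fsencode(tmp), name)

    with open(path, "wb") as f:
        f.write(b"example contents\n")

    # Opening by the same byte string works without consulting the locale.
    with open(path, "rb") as f:
        data = f.read()

    # Listing with a bytes argument returns the names as bytes too.
    listed = os.listdir(os.fsencode(tmp))

print(data)
print(listed)
```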

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
> Now you are mixing two important concepts - the *contents*
> of the file with the *name* of the file.
Then I suspect the error may be due to the contents having been written in utf8 from previous runs. Phew! It's bedtime on my end, so I'll try it again when I get a chance during the week. Th

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Martin v. Löwis
> Now, I want to open that file from Python, and I create a path with
> os.path.join() and an os.listdir() which results in this byte string:
> paf = ['/home/donn/.fontypython/M\xc3\x96gul.pog']
>
> I *think* that the situation is impossible because the system cannot resolve
> the correct filena

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
Well, that didn't take me long... Can you help with this situation? I have a file named "MÖgul.pog" in this directory: /home/donn/.fontypython/ I set my LANG=C Now, I want to open that file from Python, and I create a path with os.path.join() and an os.listdir() which results in this byte string

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
Martin, I want to thank you for your patience, you have been sterling. I have an overview this evening that I did not have this morning. I have started fixing my code and the repairs may not be that extreme after all. I'll hack-on and get it done. I *might* bug you again, but I'll resist at all

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Martin v. Löwis
> What happens if there is a filename that cannot be represented in its
> entirety? i.e. every character is 'replaced'. Does it simply vanish, or does
> it appear as "?" ? :)
The latter. I did open(u"\u20ac\u20ac","w") in a UTF-8 locale, then did "LANG=C ls", and it gave me ?? (as

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
> No. It may use replacement characters (i.e. a question mark, or an empty
> square box), but if you don't see such characters, then the terminal has
> successfully decoded the file names. Whether it also correctly decoded
> them is something for you to check (i.e. do they look right?)
Okay. So, t

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Martin v. Löwis
> Could it not be that the app doing the output (say konsole) could be
> displaying a filename as best as it can (doing the ignore/replace trick and
> using whatever fonts it can reach) and this would disguise the situation?
No. It may use replacement characters (i.e. a question mark, or an emp

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
> If you can ls them all, and if the file names come out right, then
> they'll have the same encoding.
Could it not be that the app doing the output (say konsole) could be displaying a filename as best as it can (doing the ignore/replace trick and using whatever fonts it can reach) and this woul

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Martin v. Löwis
> I guess I'm confused by that. I can ls them, so they appear and thus have
> characters displayed. I can open and cat them and thus the O/S can access
> them, but I don't know whether their characters are strictly in ascii-limits
> or drawn from a larger set like unicode. I mean, I have seen Ja

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
> So on *your* system, today: what encoding are the filenames encoded in?
> We are not talking about arbitrary files, right, but about font files?
> What *actual* file names do these font files have?
>
> On my system, all font files have ASCII-only file names, even if they
> are for non-ASCII chara

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Martin v. Löwis
>> I would advise against such a strategy. Instead, you should first
>> understand what the encodings of the file names actually *are*, on
>> a real system, and draw conclusions from that.
> I don't follow you here. The encoding of file names *on* a real system are
> (for Linux) byte strings of po

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
Martin,
> Yes. It does so when it fails to decode the byte string according to the
> file system encoding (which, in turn, bases on the locale).
That's at least one way I can weed-out filenames that are going to give me trouble; if Python itself can't figure out how to decode it, then I can also

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Martin v. Löwis
> I have found that os.listdir() does not always return unicode objects when
> passed a unicode path. Sometimes "byte strings" are returned in the list,
> mixed-in with unicodes.
Yes. It does so when it fails to decode the byte string according to the file system encoding (which, in turn, bases
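
The weeding-out idea can be sketched explicitly. This is Python 3, where os.listdir() given a bytes path returns the names as bytes, so the decode attempt is made by the caller rather than by listdir itself. The function name split_by_decodability is hypothetical, and the sample names are invented for the demonstration:

```python
def split_by_decodability(names, encoding):
    """Split raw file names into (decoded strings, undecodable byte strings)."""
    decoded, undecodable = [], []
    for raw in names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw bytes so the file can still be opened or reported.
            undecodable.append(raw)
    return decoded, undecodable


# Invented sample names: plain ASCII, valid UTF-8, and an invalid byte.
sample = [b"a.ttf", b"M\xc3\x96gul.pog", b"\xff.pog"]

good, bad = split_by_decodability(sample, "utf-8")
print(good)  # ['a.ttf', 'MÖgul.pog']
print(bad)   # [b'\xff.pog']
```

In a real run the list would come from something like os.listdir(b".") and the encoding from sys.getfilesystemencoding(); both are shown here only as one plausible way to wire it up.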

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Donn
Martin,
Thanks, food for thought indeed.
> On Unix, yes. On Windows, NTFS and VFAT represent file names as Unicode
> strings always, independent of locale. POSIX file names are byte
> strings, and there isn't any good support for recording what their
> encoding is.
I get my filenames from two sour

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-13 Thread Martin v. Löwis
> It seems to me that filenames are like snapshots of the locales where they
> originated.
On Unix, yes. On Windows, NTFS and VFAT represent file names as Unicode strings always, independent of locale. POSIX file names are byte strings, and there isn't any good support for recording what their e

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-12 Thread Donn
Martin,
I really appreciate your reply. I have been working in a vacuum on this and without any experience. I hope you don't mind if I ask you a bunch of questions. If I can get over some conceptual 'humps' then I'm sure I can produce a better app.
> That's a bug in the app. It shouldn't assume

Re: LANG, locale, unicode, setup.py and Debian packaging

2008-01-12 Thread Martin v. Löwis
> 2. If this returns "C" or anything without 'utf8' in it, then things start
> to go downhill:
> 2a. The app assumes unicode objects internally. i.e. Whenever there is
> a "string like this" in a var it's supposed to be unicode. Whenever
> something comes into the app (from a filename, a file's c

LANG, locale, unicode, setup.py and Debian packaging

2008-01-12 Thread Donn Ingle
Hello,
I hope someone can illuminate this situation for me. Here's the nutshell:
1. On start I call locale.setlocale(locale.LC_ALL,''), then getlocale.
2. If this returns "C" or anything without 'utf8' in it, then things start to go downhill:
2a. The app assumes unicode objects internally. i.e.