> Has it been decided how Python 3.0 will implement os.listdir on Unix?
> Will there be only a single attempt to encode using the current locale
> or will there be a backup technique?
That's what it currently does.
> I'd probably define an optional
> encoding parameter so you can ask for os.li
Martin v. Löwis:
> That's not true. Try open("\xff","w"), then try interpreting the file
> name as UTF-8. Some byte strings are not meaningful UTF-8, hence that
> approach cannot work.
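
A minimal sketch of the point above, assuming Python 3 and a hypothetical helper named `is_valid_utf8` (not part of the thread): the byte 0xFF can never appear in well-formed UTF-8, so a strict decode of that file name fails, whereas the two-byte sequence for Ö decodes without trouble.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The file name created by open("\xff", "w") in the example above.
print(is_valid_utf8(b"\xff"))
# The UTF-8 bytes for "MÖgul.pog".
print(is_valid_utf8(b"M\xc3\x96gul.pog"))
```
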
Has it been decided how Python 3.0 will implement os.listdir on
Unix? Will there be only a single attempt t
> Ah. Can one call it after the full call has been done:
> locale.setlocale(locale.LC_ALL,'')
> locale.setlocale(locale.LC_ALL)
> Without any issues?
If you pass LC_ALL, then some systems will give you funny results
(semicolon-separated enumerations of all the categories). Instead,
pick a specifi
> You get the full locale name with locale.setlocale(category) (i.e.
> without the second argument)
Ah. Can one call it after the full call has been done:
locale.setlocale(locale.LC_ALL,'')
locale.setlocale(locale.LC_ALL)
Without any issues?
> > I need that two-letter code that's hidden in a
> > t
> Given that getlocale() is not to be used, what's the best way to get the
> locale later in the app?
You get the full locale name with locale.setlocale(category) (i.e.
without the second argument)
> I need that two-letter code that's hidden in a
> typical locale like en_ZA.utf8 -- I want that
Given that getlocale() is not to be used, what's the best way to get the
locale later in the app? I need that two-letter code that's hidden in a
typical locale like en_ZA.utf8 -- I want that 'en' part.
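
One possible way to get at that 'en' part, sketched under assumptions: the helper names `language_code` and `current_language` are hypothetical, the locale name is assumed to follow the usual language[_TERRITORY][.codeset][@modifier] layout, and LC_CTYPE is only one plausible choice of category to query.

```python
import locale


def language_code(locale_name):
    """Return the language part of a name such as 'en_ZA.utf8', or None.

    Hypothetical helper: it only splits the string and does not validate it.
    """
    # Drop an optional '@modifier' and '.codeset', then the '_TERRITORY' part.
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    lang = base.split("_", 1)[0]
    # "C" and "POSIX" carry no language information.
    if lang in ("", "C", "POSIX"):
        return None
    return lang


def current_language(category=locale.LC_CTYPE):
    """Query a single category (not LC_ALL) and extract its language code."""
    # Calling setlocale without a second argument only queries the setting.
    return language_code(locale.setlocale(category))
```

With this sketch, `language_code("en_ZA.utf8")` gives `'en'`, and a plain `"C"` locale gives `None`, which the caller has to handle.
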
BTW - things are hanging-together much better now, thanks to your info. I have
it running in
> I have decided to keep the test for a decode error because files created
> under different locales should not be written-to under the current one. I
> don't know if one can mix encodings in a single text file, but I don't have
> time to find out.
Of course it's *possible*. However, you nee
> Can you please type
> paf = ['/home/donn/.fontypython/M\xc3\x96gul.pog']
> f = open(paf, "r")
I think I was getting a ghost error from another try somewhere higher up. You
are correct, this does open the file - no matter what the locale is.
I have decided to keep the test for a decode error be
> Now you are mixing two important concepts - the *contents*
> of the file with the *name* of the file.
Then I suspect the error may be due to the contents having been written in
utf8 from previous runs. Phew!
It's bedtime on my end, so I'll try it again when I get a chance during the
week.
Th
> Now, I want to open that file from Python, and I create a path with
> os.path.join() and an os.listdir() which results in this byte string:
> paf = ['/home/donn/.fontypython/M\xc3\x96gul.pog']
>
> I *think* that the situation is impossible because the system cannot resolve
> the correct filena
Well, that didn't take me long... Can you help with this situation?
I have a file named "MÖgul.pog" in this directory:
/home/donn/.fontypython/
I set my LANG=C
Now, I want to open that file from Python, and I create a path with
os.path.join() and an os.listdir() which results in this byte string
Martin,
I want to thank you for your patience, you have been sterling. I have an
overview this evening that I did not have this morning. I have started fixing
my code and the repairs may not be that extreme after all.
I'll hack-on and get it done. I *might* bug you again, but I'll resist at all
> What happens if there is a filename that cannot be represented in its
> entirety? i.e. every character is 'replaced'. Does it simply vanish, or does
> it appear as "?" ? :)
The latter. I did open(u"\u20ac\u20ac","w") in an UTF-8 locale, then did
"LANG=C ls", and it gave me ?? (as
> No. It may use replacement characters (i.e. a question mark, or an empty
> square box), but if you don't see such characters, then the terminal has
> successfully decoded the file names. Whether it also correctly decoded
> them is something for you to check (i.e. do they look right?)
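
The difference between "successfully decoded" and "correctly decoded" can be illustrated with a small Python 3 sketch. The file name is the one used elsewhere in this thread; the helper `has_replacement` is a hypothetical name introduced only for this example.

```python
raw = b"M\xc3\x96gul.pog"  # UTF-8 bytes for "MÖgul.pog"

# Correct: the bytes are interpreted with the encoding they were written in.
as_utf8 = raw.decode("utf-8")

# Successful but wrong: Latin-1 accepts every byte, so no error is raised,
# yet the result does not look right.
as_latin1 = raw.decode("latin-1")

# Lossy: bytes that ASCII cannot decode become U+FFFD replacement characters.
as_ascii = raw.decode("ascii", errors="replace")


def has_replacement(text):
    """Return True if text contains the U+FFFD replacement character."""
    return "\ufffd" in text
```

Only the third result contains visible replacement characters; the second one has to be caught by looking at the name and asking whether it reads as intended.
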
Okay.
So, t
> Could it not be that the app doing the output (say konsole) could be
> displaying a filename as best as it can (doing the ignore/replace trick and
> using whatever fonts it can reach) and this would disguise the situation?
No. It may use replacement characters (i.e. a question mark, or an emp
> If you can ls them all, and if the file names come out right, then
> they'll have the same encoding.
Could it not be that the app doing the output (say konsole) could be
displaying a filename as best as it can (doing the ignore/replace trick and
using whatever fonts it can reach) and this woul
> I guess I'm confused by that. I can ls them, so they appear and thus have
> characters displayed. I can open and cat them and thus the O/S can access
> them, but I don't know whether their characters are strictly in ascii-limits
> or drawn from a larger set like unicode. I mean, I have seen Ja
> So on *your* system, today: what encoding are the filenames encoded in?
> We are not talking about arbitrary files, right, but about font files?
> What *actual* file names do these font files have?
>
> On my system, all font files have ASCII-only file names, even if they
> are for non-ASCII chara
>> I would advise against such a strategy. Instead, you should first
>> understand what the encodings of the file names actually *are*, on
>> a real system, and draw conclusions from that.
> I don't follow you here. The encodings of file names *on* a real system are
> (for Linux) byte strings of po
Martin,
> Yes. It does so when it fails to decode the byte string according to the
> file system encoding (which, in turn, is based on the locale).
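
The thread describes the behaviour of the Python of its day, where undecodable names come back as byte strings mixed in with unicode objects. A rough Python 3 sketch of the same weeding-out idea, assuming hypothetical helper names `partition_names` and `list_directory`, asks for bytes explicitly (`os.listdir` returns bytes names when given a bytes path) and tries to decode each name itself:

```python
import os
import sys


def partition_names(raw_names, encoding):
    """Split byte-string names into (decoded, undecodable) lists."""
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the raw bytes so the file can still be opened or reported.
            undecodable.append(raw)
    return decoded, undecodable


def list_directory(path, encoding=None):
    """List path as bytes and partition the names (hypothetical wrapper)."""
    if encoding is None:
        # Assumption: the file system encoding is the one worth trying first.
        encoding = sys.getfilesystemencoding()
    return partition_names(os.listdir(os.fsencode(path)), encoding)
```

Names in the second list are the ones that would give trouble under the chosen encoding; whether to skip them or show them with replacement characters is left to the application.
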
That's at least one way I can weed-out filenames that are going to give me
trouble; if Python itself can't figure out how to decode it, then I can also
> I have found that os.listdir() does not always return unicode objects when
> passed a unicode path. Sometimes "byte strings" are returned in the list,
> mixed-in with unicodes.
Yes. It does so when it fails to decode the byte string according to the
file system encoding (which, in turn, is based
Martin,
Thanks, food for thought indeed.
> On Unix, yes. On Windows, NTFS and VFAT represent file names as Unicode
> strings always, independent of locale. POSIX file names are byte
> strings, and there isn't any good support for recording what their
> encoding is.
I get my filenames from two sour
> It seems to me that filenames are like snapshots of the locales where they
> originated.
On Unix, yes. On Windows, NTFS and VFAT represent file names as Unicode
strings always, independent of locale. POSIX file names are byte
strings, and there isn't any good support for recording what their
e
Martin,
I really appreciate your reply. I have been working in a vacuum on this and
without any experience. I hope you don't mind if I ask you a bunch of
questions. If I can get over some conceptual 'humps' then I'm sure I can
produce a better app.
> That's a bug in the app. It shouldn't assume
> 2. If this returns "C" or anything without 'utf8' in it, then things start
> to go downhill:
> 2a. The app assumes unicode objects internally. i.e. Whenever there is
> a "string like this" in a var it's supposed to be unicode. Whenever
> something comes into the app (from a filename, a file's c
Hello,
I hope someone can illuminate this situation for me.
Here's the nutshell:
1. On start I call locale.setlocale(locale.LC_ALL,''), then getlocale().
2. If this returns "C" or anything without 'utf8' in it, then things start
to go downhill:
2a. The app assumes unicode objects internally. i.e.