Duncan Booth wrote:
Windows (when using NTFS) stores all the filenames in unicode, and Python
uses the unicode api to implement listdir (when given a unicode path). This
means that the filename never gets encoded to a byte string either by the
OS or Python. If you use a byte string path then the
Duncan Booth wrote:
> Martin v. Löwis wrote:
>
> > Serge Orlov wrote:
> >> Shouldn't os.path.join do that? If you pass a unicode string
> >> and a byte string it currently tries to convert bytes to
> >> characters
> >> but it makes more sense to convert the unicode string to bytes
> >> and return t
Martin v. Löwis wrote:
> Serge Orlov wrote:
>> Shouldn't os.path.join do that? If you pass a unicode string
>> and a byte string it currently tries to convert bytes to characters
>> but it makes more sense to convert the unicode string to bytes
>> and return two byte strings concatenated.
>
> Sou
On Wed, Feb 23, 2005 at 10:07:19PM +0100, "Martin v. Löwis" wrote:
> So we have three options:
> 1. skip this string, only return the ones that can be
>    converted to Unicode. Give the user the impression
>    the file does not exist.
> 2. return the string as a byte string
> 3. refuse to listdir
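
The three options listed above can be sketched as a small pure function. This is only an illustration, not how os.listdir is implemented: the function name `decode_names` and its `policy` parameter are hypothetical, and the sketch assumes Python 3, where byte strings are `bytes` and character strings are `str`.

```python
def decode_names(raw_names, encoding="utf-8", policy="bytes"):
    """Decode byte-string file names under one of three policies.

    `policy` is a hypothetical parameter invented for this sketch:
      "skip"   - drop names that cannot be decoded (option 1)
      "bytes"  - keep undecodable names as byte strings (option 2)
      "refuse" - re-raise the decoding error for the whole listing (option 3)
    """
    if policy not in ("skip", "bytes", "refuse"):
        raise ValueError("unknown policy: %r" % (policy,))
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if policy == "skip":
                continue            # the file silently disappears from the listing
            if policy == "bytes":
                result.append(raw)  # mixed list: str and bytes entries
            else:
                raise               # "refuse": no listing at all
    return result
```

With option 2 the caller receives a list of mixed types and has to be prepared for both, which is the trade-off the thread is debating.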
Serge Orlov wrote:
Shouldn't os.path.join do that? If you pass a unicode string
and a byte string it currently tries to convert bytes to characters
but it makes more sense to convert the unicode string to bytes
and return two byte strings concatenated.
Sounds reasonable. OTOH, this would be the onl
"Martin v. Löwis" wrote:
>> My goal is to build generalized code that consistently works with all
>> kinds of filenames.
>
> Then it is best to drop the notion that file names are
> character strings (because some file names aren't). You
> do so by converting your path variable into a byte
> string
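
A minimal sketch of the byte-string approach suggested above, assuming Python 3 (where `os.fsencode` exists and `os.listdir` returns `bytes` entries when given a `bytes` path). The helper name `list_as_bytes` is hypothetical.

```python
import os


def list_as_bytes(path):
    """Return the entries of `path` as full byte-string paths.

    The path is converted to bytes first, so every name that comes back
    from os.listdir is a byte string and no decoding is attempted.
    """
    bpath = os.fsencode(path)  # str -> bytes using the filesystem encoding
    return [os.path.join(bpath, name) for name in os.listdir(bpath)]
```

Because both arguments to `os.path.join` are byte strings here, the question of mixing character and byte strings never arises.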
Kenneth Pronovici wrote:
1) Why LC_ALL has any effect on the os.listdir() result?
The operating system (POSIX) does not have the inherent notion
that file names are character strings. Instead, in POSIX, file
names are primarily byte strings. There are some bytes which
are interpreted as charact
Kenneth Pronovici wrote:
I think that I can solve my problem by just converting any unicode
strings from configuration into utf-8 simple strings using encode().
Using this solution, all of my existing regression tests still pass, and
my code seems to make it past the unusual directory.
See my other
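
The workaround described above (encoding configuration strings to UTF-8 byte strings before using them as paths) might look roughly like this. It is a sketch in Python 3 terms, where the character-string type is `str`; the helper name `to_utf8_bytes` is hypothetical, and UTF-8 is the encoding named in the message, not necessarily the one the filesystem uses.

```python
def to_utf8_bytes(value):
    """Encode a character string to UTF-8 bytes; pass byte strings through."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value
```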
On Wed, Feb 23, 2005 at 01:03:56AM -0600, Kenneth Pronovici wrote:
[snip]
> Today, I accidentally ran across a directory containing three "normal"
> files (with ASCII filenames) and one file with a two-character unicode
> filename. My code, which was doing something like this:
>
>for entry