Re: sax barfs on unicode filenames

2006-10-04 Martin v. Löwis
Fredrik Lundh wrote:
> Martin v. Löwis wrote:
>
>> Yes. While you can pass Unicode strings as file names to many Python
>> functions, you can't pass them to Expat, as Expat requires the file name
>> as a byte string. Hence the error.
>
> sounds like a bug in the xml.sax layer, really (ET also u

Re: sax barfs on unicode filenames

2006-10-04 Fredrik Lundh
Martin v. Löwis wrote:
> Yes. While you can pass Unicode strings as file names to many Python
> functions, you can't pass them to Expat, as Expat requires the file name
> as a byte string. Hence the error.

sounds like a bug in the xml.sax layer, really (ET also uses Expat, and doesn't seem to ha
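Fredrik's remark suggests the same Expat parser can be driven without ever handing it the file name. A minimal Python 3 sketch of that idea, assuming (the thread does not confirm this) that the relevant difference is that the caller opens the file and gives Expat only bytes: the default SAX driver is an IncrementalParser, so it accepts raw chunks through feed() and finishes with close(), and no file name or system ID is involved. TagCollector, parse_by_feeding and make_sample_file are hypothetical names, and the sample document is invented.

```python
import os
import tempfile
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records the name of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_by_feeding(path, chunk_size=8192):
    """Open the file ourselves and push raw bytes into the SAX parser.

    The parser only ever receives byte chunks, never the (possibly non-ASCII) path.
    """
    handler = TagCollector()
    parser = xml.sax.make_parser()        # default Expat driver supports feed()/close()
    parser.setContentHandler(handler)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            parser.feed(chunk)            # Expat copes with chunks split mid-tag
    parser.close()                        # final (empty) feed, then endDocument()
    return handler.tags


def make_sample_file(filename="\u8116.xml"):
    """Hypothetical helper: write a small XML file with a non-ASCII name to a temp dir."""
    path = os.path.join(tempfile.mkdtemp(), filename)
    with open(path, "wb") as f:
        f.write(b"<root><item/><item/></root>")
    return path
```

Reading in fixed-size chunks keeps memory use flat for large files; feeding the whole file in one call would behave the same for small ones.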

Re: sax barfs on unicode filenames: workaround

2006-10-04 Martin v. Löwis
Edward K. Ream wrote:
> Happily, the workaround is easy. Replace theFile with:
>
> # Use cStringIO to avoid a crash in sax when inputFileName has unicode
> # characters.
> s = theFile.read()
> theFile = cStringIO.StringIO(s)
>
> My first attempt at a workaround was to use:
>
> s = theFile.read

Re: sax barfs on unicode filenames

2006-10-04 Martin v. Löwis
Fredrik Lundh wrote:
> Diez B. Roggisch wrote:
>
>> Filenames are expected to be bytestrings. So what happens is that the
>> unicode string you pass as filename gets implicitly converted using the
>> default encoding.
>
> it is ?

Yes. While you can pass Unicode strings as file names to many Py
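The conversion Diez describes and Martin confirms is Python 2 behaviour: where a byte string was required, a unicode object was encoded with the default encoding, which was ASCII, so any non-ASCII file name failed. Python 3 never converts str to bytes implicitly, so this sketch spells the old behaviour out as an explicit ASCII encode beside a deliberate encode with a caller-chosen codec. The function names and the sample file name are illustrative only.

```python
def py2_implicit_conversion(name: str) -> bytes:
    """Mimic Python 2's implicit unicode -> str conversion (default encoding: ASCII)."""
    return name.encode("ascii")  # raises UnicodeEncodeError for any non-ASCII character


def explicit_conversion(name: str, encoding: str = "utf-8") -> bytes:
    """Encode deliberately, with a codec chosen by the caller."""
    return name.encode(encoding)


def conversion_fails(name: str) -> bool:
    """True if the Python 2 style implicit conversion would have raised."""
    try:
        py2_implicit_conversion(name)
    except UnicodeEncodeError:
        return True
    return False
```

Pure-ASCII names survive the implicit path unchanged, which is why the problem only shows up once a name contains characters such as the Chinese ones in Edward's test folder.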

Re: sax barfs on unicode filenames: workaround

2006-10-04 Edward K. Ream
Happily, the workaround is easy. Replace theFile with:

    # Use cStringIO to avoid a crash in sax when inputFileName has unicode characters.
    s = theFile.read()
    theFile = cStringIO.StringIO(s)

My first attempt at a workaround was to use:

    s = theFile.read()
    parser.parseString(s)

but the expat pars
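Python 3 has no cStringIO module; for bytes read from a file opened in binary mode, io.BytesIO is the usual counterpart. A minimal sketch of Edward's workaround under that assumption, together with xml.sax.parseString(), which is a function in the xml.sax module rather than a method of the parser object. ElementCounter and the two parse_via_* functions are hypothetical.

```python
import io
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts start tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_via_bytesio(the_file) -> int:
    """Edward's workaround, Python 3 style: copy the contents into an in-memory stream."""
    data = the_file.read()            # bytes when the_file was opened with "rb"
    handler = ElementCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # A BytesIO object has no .name attribute, so no file name travels with it.
    parser.parse(io.BytesIO(data))
    return handler.count


def parse_via_parse_string(data: bytes) -> int:
    """Same result using the module-level helper instead of a parser object."""
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.count
```

Both routes hold the whole document in memory, which is fine for files the size of a typical .leo outline but worth keeping in mind for very large inputs.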

Re: sax barfs on unicode filenames

2006-10-04 John Machin
Diez B. Roggisch wrote:
> Edward K. Ream wrote:
>
> > Hi. Presumably this is an easy question, but anyone who understands the
> > sax docs thinks completely differently than I do :-)
> >
> > Following the usual cookbook examples, my app parses an open file as
> > follows::
> >

Re: sax barfs on unicode filenames

2006-10-04 Edward K. Ream
> Filenames are expected to be bytestrings.

The exception happens in a method to which no fileName is passed as an argument.

parse_leo_file: 'C:\\prog\\tigris-cvs\\leo\\test\\unittest\\chinese?folder\\chinese?test.leo' (trace of converted fileName)
Unexpected exception parsing C:\prog\tigris
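Edward's trace shows the error even though his code hands sax an open file, not a name. A plausible explanation, hedged because the thread does not confirm it: when xml.sax is given a file object, it copies the object's name attribute into the InputSource as the system ID, and the Expat driver later passes that ID to Expat as the document base, which in Python 2 meant the same implicit ASCII conversion. A cStringIO object has no name attribute, which would fit with Edward's workaround succeeding. The Python 3 sketch below shows only the first step, using the documented helper xml.sax.saxutils.prepare_input_source(); system_id_for and make_named_file are hypothetical helpers.

```python
import io
import os
import tempfile
from xml.sax.saxutils import prepare_input_source


def system_id_for(stream):
    """Return the system ID xml.sax would attach to this file-like object."""
    # For file-like objects, prepare_input_source copies a str .name attribute
    # (if the object has one) into the InputSource's system ID.
    return prepare_input_source(stream).getSystemId()


def make_named_file(filename="\u8116.xml"):
    """Hypothetical helper: write a tiny XML file into a fresh temp directory."""
    path = os.path.join(tempfile.mkdtemp(), filename)
    with open(path, "wb") as f:
        f.write(b"<root/>")
    return path
```

A real file carries its path along as the system ID, while an in-memory stream yields None, so nothing derived from the file name is available to pass on.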

Re: sax barfs on unicode filenames

2006-10-04 Fredrik Lundh
Diez B. Roggisch wrote:
> Filenames are expected to be bytestrings. So what happens is that the
> unicode string you pass as filename gets implicitly converted using the
> default encoding.

it is ?

>>> f = open(u"\u8116", "w")
>>> f.write("hello")
>>> f.close()
>>> f = open(u"\u8116", "r")
>>>
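Fredrik's session, ported to Python 3 as a minimal sketch. Two assumptions: the file is created in a fresh temporary directory rather than the current one, and the file system can represent U+8116. In Python 3 every str is Unicode, so the u prefix is unnecessary and open() takes the name as is; roundtrip is a hypothetical name.

```python
import os
import tempfile


def roundtrip(directory: str) -> str:
    """Write and re-read a file whose name is the single character U+8116."""
    path = os.path.join(directory, "\u8116")  # same character Fredrik used
    with open(path, "w") as f:
        f.write("hello")
    with open(path, "r") as f:
        return f.read()
```

Usage: roundtrip(tempfile.mkdtemp()) gives back "hello", mirroring the point of the original session that open() itself has no trouble with such names.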

Re: sax barfs on unicode filenames

2006-10-04 Diez B. Roggisch
Edward K. Ream wrote:
> Hi. Presumably this is an easy question, but anyone who understands the
> sax docs thinks completely differently than I do :-)
>
> Following the usual cookbook examples, my app parses an open file as
> follows::
>
> parser = xml.sax.make_parser()
>
> parser.s
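Edward's snippet breaks off after "parser.s", so the rest of his code is not shown here. Purely for orientation, a minimal Python 3 sketch of the usual make_parser() cookbook shape, parsing a file object opened in binary mode from a path with a non-ASCII name. AttributeCollector, parse_open_file and make_sample_file are hypothetical, and the sample XML is invented; none of it is Leo's actual code.

```python
import os
import tempfile
import xml.sax


class AttributeCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the value of every 'name' attribute."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        value = attrs.get("name")
        if value is not None:
            self.names.append(value)


def parse_open_file(theFile):
    """The usual cookbook shape: build a parser, attach a handler, parse an open file."""
    handler = AttributeCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(theFile)
    return handler.names


def make_sample_file(filename="\u8116.xml"):
    """Hypothetical helper: write a small XML file with a non-ASCII name to a temp dir."""
    path = os.path.join(tempfile.mkdtemp(), filename)
    with open(path, "wb") as f:
        f.write(b'<doc><node name="a"/><node name="b"/></doc>')
    return path
```

Under Python 3 this pattern runs as written; the failure discussed in this thread is specific to how Python 2 handed unicode names to Expat.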