Re: How to ask sax for the file encoding

2006-10-04 Thread Fredrik Lundh
Martin v. Löwis wrote: > A common problem is to save the data in the same encoding that they > originally had; this is what an editor typically does (you may know > Edward Ream for writing editors). XML parsers are notoriously bad > in supporting editors. There are too many lexical details that may

Re: How to ask sax for the file encoding

2006-10-04 Thread Martin v. Löwis
Irmen de Jong schrieb: > As others have tried to explain, the encoding in the xml header is > not part of the document data itself, it says something about the data. > It would be a bad design decision imo to rely on this meta information > if you really meant that information to be part of the dat

Re: How to ask sax for the file encoding

2006-10-04 Thread Irmen de Jong
Edward K. Ream wrote: >> Please consider adding some elements to the document itself that > describe the desired output format, > > Well, that's what the encoding field in the xml line was supposed to do. As others have tried to explain, the encoding in the xml header is not part of the document

Re: How to ask sax for the file encoding

2006-10-04 Thread Martin v. Löwis
Edward K. Ream schrieb: > Can anyone tell me how the content handler can determine the encoding of the > file? Can sax provide this info? That's not supported in SAX. If you use Expat directly (module pyexpat), you can set the XmlDeclHandler, which is called when the XML declaration is received
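
A minimal sketch of the pyexpat approach described above, assuming the whole document is available as bytes. The helper name `sniff_declared_encoding` is made up for illustration; `ParserCreate`, `XmlDeclHandler` and `Parse` are the standard `xml.parsers.expat` names.

```python
from xml.parsers import expat


def sniff_declared_encoding(data):
    """Return the encoding named in the XML declaration, or None.

    `data` is the raw document as bytes. None is returned when there is
    no XML declaration or when it carries no encoding attribute.
    """
    found = {}

    def on_xml_decl(version, encoding, standalone):
        # Expat calls this once, when it sees the <?xml ...?> declaration.
        found["encoding"] = encoding

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(data, True)  # True: this is the final chunk of input
    return found.get("encoding")
```

Note that this parses the whole document just to read the declaration, so it is only a starting point; the value could be recorded once and then handed to whatever code writes the file back out.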

Re: How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
> Try this: [snip] Parser.XmlDeclHandler = self.XmlDecl [snip] Excellent! Thanks so much. Edward Edward K. Ream email: [EMAIL PROTECTED] Leo: http://webpages.charter.net/edreamleo/front.html -

Re: How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
> the encoding isn't *in* the XML file, it's an artifact of the > serialization model used for a specific XML infoset. the XML > data is pure Unicode. Sorry, but no. The *file* is what I am talking about, and the way it is encoded does, in fact, really make a difference to some users. They ha

Re: How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
> Please consider adding some elements to the document itself that describe the desired output format, Well, that's what the encoding field in the xml line was supposed to do. Not a bad idea though, except it changes the file format, and I would really rather not do that. Edward ---

Re: How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
> are you expecting your users to write XML by hand? Of course not. Leo has the following option: @string new_leo_file_encoding = utf-8 Edward

Re: How to ask sax for the file encoding

2006-10-04 Thread Fredrik Lundh
Edward K. Ream wrote: > What suits me best is what the *user* specified, and that got put in the > first xml line. are you expecting your users to write XML by hand? ouch. -- http://mail.python.org/mailman/listinfo/python-list

Re: How to ask sax for the file encoding

2006-10-04 Thread Fredrik Lundh
Edward K. Ream wrote: > I'm asking this question because my app needs it :-) Imo, there is *no* > information in any xml file that can be considered irrelevant. the encoding isn't *in* the XML file, it's an artifact of the serialization model used for a specific XML infoset. the XML data is pu

Re: How to ask sax for the file encoding

2006-10-04 Thread Irmen de Jong
Edward K. Ream wrote: > What suits me best is what the *user* specified, and that got put in the > first xml line. > I'm going to have to parse this line myself. Please consider adding some elements to the document itself that describe the desired output format, such as: ... utf-8 ... Thi

Re: How to ask sax for the file encoding

2006-10-04 Thread Rob Wolfe
"Edward K. Ream" <[EMAIL PROTECTED]> writes: > Can anyone tell me how the content handler can determine the encoding of the > file? Can sax provide this info? Try this: from xml.parsers import expat s = """ Title Chapter 1 """ class MyParser(object): def XmlDecl(self, version, encodin

Re: How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
> The encoding _is_ irrelevant, in the very moment you get unicode strings. We shall have to disagree about this. My use case is perfectly reasonable, imo. > If you write out xml again, use whatever encoding suits you best. What suits me best is what the *user* specified, and that got put in t
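
If the first XML line does end up being read by hand, something along these lines might do. It is only a sketch: the function name `declared_encoding` is hypothetical, and it assumes the file is in an ASCII-compatible encoding (UTF-8, Latin-1 and similar), so UTF-16 files would need separate handling.

```python
import re

# Matches an XML declaration at the very start of the data and captures
# the encoding name, if one is given. Assumes an ASCII-compatible file.
_XML_DECL = re.compile(
    br'<\?xml\s[^>]*?\bencoding\s*=\s*(["\'])([A-Za-z][A-Za-z0-9._-]*)\1'
)


def declared_encoding(head):
    """Return the declared encoding from the first bytes of a file, or None."""
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]  # skip a UTF-8 byte order mark
    match = _XML_DECL.match(head)
    return match.group(2).decode("ascii") if match else None
```

Reading the first hundred bytes or so of the file and passing them as `head` should be enough, since the declaration has to come first in the document.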

Re: How to ask sax for the file encoding

2006-10-04 Thread Diez B. Roggisch
Edward K. Ream wrote: >> [The value of the encoding field] _could_ be retained, but for what >> purpose? > > I'm asking this question because my app needs it :-) > Imo, there is *no* > information in any xml file that can be considered irrelevant. It sure is! The encoding _is_ irrelevant, in

Re: How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
> [The value of the encoding field] _could_ be retained, but for what > purpose? I'm asking this question because my app needs it :-) Imo, there is *no* information in any xml file that can be considered irrelevant. My app will want to know the original encoding when writing the file. Edward
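
For the writing side, here is a rough sketch of how a remembered encoding could be reused, assuming the output goes through `xml.sax.saxutils.XMLGenerator`. The function name `write_sample` and the `doc` element are invented for the example; nothing here is taken from Leo's actual file-writing code.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_sample(out, encoding):
    """Write a tiny document to the binary stream `out` in `encoding`."""
    gen = XMLGenerator(out, encoding=encoding)
    gen.startDocument()            # emits the <?xml ...?> declaration
    gen.startElement("doc", {})    # hypothetical element name
    gen.characters("caf\u00e9")    # text is Unicode; encoded on output
    gen.endElement("doc")
    gen.endDocument()


buf = io.BytesIO()
write_sample(buf, "iso-8859-1")
```

The declaration written by `startDocument()` names the same encoding that is used for the bytes, so the file stays consistent with what the user originally chose.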

Re: How to ask sax for the file encoding

2006-10-04 Thread Diez B. Roggisch
Edward K. Ream wrote: >>> Can anyone tell me how the content handler can determine the encoding of >>> the file? Can sax provide this info? > >> there is no encoding on the "inside" of an XML document; it's all >> Unicode. > > True, but sax is reading the file, so sax is producing the unicode,

Re: How to ask sax for the file encoding

2006-10-04 Thread Fredrik Lundh
Edward K. Ream wrote: > > > so it would seem reasonable for sax to be able to return 'utf-8' somehow. why? that's an encoding detail, and should be completely irrelevant for your application. > Am I missing something? you're confusing artifacts of an external serialization format with the act

Re: How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
>> Can anyone tell me how the content handler can determine the encoding of >> the file? Can sax provide this info? > there is no encoding on the "inside" of an XML document; it's all Unicode. True, but sax is reading the file, so sax is producing the unicode, so it should (must) be able to de

Re: How to ask sax for the file encoding

2006-10-04 Thread Fredrik Lundh
Edward K. Ream wrote: > Can anyone tell me how the content handler can determine the encoding of the > file? Can sax > provide this info? there is no encoding on the "inside" of an XML document; it's all Unicode.

How to ask sax for the file encoding

2006-10-04 Thread Edward K. Ream
Following the usual cookbook examples, my app parses an open file as follows::

    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges,1)
    # Hopefully the content handler can figure out the encoding from the element.
    handler = saxContentHandler(c,inputFileName,