Daniel Molina Wegener wrote:
> Stefan Behnel <stefan...@behnel.de>
> on Sunday 19 April 2009 02:25
> wrote in comp.lang.python:
>
>> Daniel Molina Wegener wrote:
>>> * Every serialization is made into unicode objects.
>> Hmm, does that mean that when I serialise, I get a unicode object back?
>> What about the XML declaration? How can a user create well-formed XML from
>> your output? Or is that not the intention?
>
> Yes, if you serialize an object you get an XML string as a
> unicode object, since unicode objects support UTF-8 and
> some other encodings.

That's not what I meant. I was wondering why you chose to use a unicode string instead of a byte string (which XML is defined for). If your only intention is to deserialise the unicode string into a tree, that may be acceptable. However, as soon as you start writing the data to a file or through a network pipe, or pass it to an XML parser, you'd better make it well-formed XML.

So you either need to encode it as UTF-8 (for which you do not need a declaration), or you will need to encode it in a different byte encoding, and then prepend a declaration yourself. In any case, this is a lot more overhead (and more cumbersome for users) than writing out a correctly serialised byte string directly.

You seemed to be very interested in good performance, so I don't quite understand why you want to require an additional step with a relatively high performance impact that only makes it harder for users to use the tool correctly.

Stefan
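
To illustrate the extra step a user would have to perform on a unicode result, here is a minimal sketch. It assumes Python 3 (where `str` is the unicode type) and a serialiser that returns the XML text without a declaration; the helper name `to_wellformed_bytes` is hypothetical and not part of any library discussed in this thread.

```python
import xml.etree.ElementTree as ET


def to_wellformed_bytes(xml_text, encoding="utf-8"):
    """Encode declaration-free XML text into a well-formed byte string.

    Hypothetical helper: UTF-8 output needs no XML declaration, any
    other encoding gets a declaration naming it prepended first.
    """
    if encoding.lower() in ("utf-8", "utf8"):
        # UTF-8 is a default encoding for XML, so no declaration is required.
        return xml_text.encode("utf-8")
    # For other encodings the parser must be told which one was used.
    declaration = '<?xml version="1.0" encoding="%s"?>\n' % encoding
    return (declaration + xml_text).encode(encoding)


# Sample document with one non-ASCII character.
text = '<greeting lang="fr">caf\u00e9</greeting>'

as_utf8 = to_wellformed_bytes(text)
as_latin1 = to_wellformed_bytes(text, "iso-8859-1")

# Both byte strings can be handed to a parser, a file or a socket as-is.
assert ET.fromstring(as_utf8).text == ET.fromstring(as_latin1).text
```

A serialiser that returned bytes in the first place would make this helper unnecessary, which is the point of the objection above.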