* Alexander Karelas <ak...@zoo.gr> [2016-07-04 21:48]:
> The same question applies to parsing: should the XML documents that
> the module parses be byte strings or character strings?
An XML document must be bytes, because it specifies its encoding in the <?xml?> declaration at the top (even if only implicitly), and that makes no sense any other way. But an XML fragment must be characters, because text in XML is Unicode and fragments do not have an encoding.

But this gets a little metaphysical when you deal with concrete data, because the XML PI is optional. You can’t distinguish XML fragments from XML documents just by looking at them. It’s like a string that sticks to ASCII: is that bytes or characters? The distinction is not in the data; it’s in the programmer’s intent behind the code that handles the data… but you have to keep that in mind to write code that actually works correctly.

(Which is to say we’re talking about types. The type is not in the data. This is where an actual type system helps: having one means you can express that concretely.)

So the I-don’t-believe-in-abstractions answer is… just allow the user to get the data as both characters and bytes, and make them say which one. For that case I would argue that the default ought to be bytes.

The more abstractionista answer would be to let the user ask for a node to be rendered as an XML fragment. In that case, to get characters they must ask for the document element to be rendered as a string, and if they ask for the whole document they always get bytes.

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
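A minimal sketch of both answers, assuming Python’s standard xml.etree.ElementTree as a stand-in for the module being discussed. The names parse_document, parse_fragment, serialize, render_document and render_fragment are hypothetical, not an existing API. Whole documents go in and come out as bytes, fragments as str, and the low-abstraction serialize() defaults to bytes as suggested above.

```python
import xml.etree.ElementTree as ET

# Hypothetical API sketch; ElementTree stands in for the module under discussion.


def parse_document(data: bytes) -> ET.Element:
    """Parse a whole XML document, which must arrive as bytes."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("an XML document must be bytes; its declaration names the encoding")
    # Expat reads the <?xml ... encoding=...?> declaration, or falls back to
    # UTF-8/UTF-16 detection when the declaration is absent.
    return ET.fromstring(data)


def parse_fragment(text: str) -> ET.Element:
    """Parse a single-element fragment, which must arrive as characters."""
    if not isinstance(text, str):
        raise TypeError("an XML fragment must be str; fragments have no encoding")
    return ET.fromstring(text)


def serialize(node: ET.Element, *, as_text: bool = False, encoding: str = "utf-8"):
    """Low-abstraction design: the caller picks bytes or characters; bytes by default."""
    if as_text:
        # encoding="unicode" makes ElementTree return str with no declaration.
        return ET.tostring(node, encoding="unicode")
    # Bytes carry an explicit declaration so they describe their own encoding
    # (the xml_declaration parameter requires Python 3.8+).
    return ET.tostring(node, encoding=encoding, xml_declaration=True)


def render_document(root: ET.Element, encoding: str = "utf-8") -> bytes:
    """Abstractionista design: the whole document is always bytes."""
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


def render_fragment(node: ET.Element) -> str:
    """Abstractionista design: any node rendered as a fragment is always characters."""
    return ET.tostring(node, encoding="unicode")


if __name__ == "__main__":
    raw = "<?xml version='1.0' encoding='iso-8859-1'?><p>café</p>".encode("iso-8859-1")
    root = parse_document(raw)
    print(render_fragment(root))  # <p>café</p>
    print(render_document(root))  # bytes, beginning with an XML declaration
```

The type check at the entry points is one way to make the “programmer intent” explicit in a dynamically typed language; in a language with a richer type system the bytes/str split would be carried by the signatures alone.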