* Alexander Karelas <ak...@zoo.gr> [2016-07-04 21:48]:
> The same question applies to parsing: should the XML documents that
> the module parses be byte strings or character strings?

An XML document must be bytes, because it specifies its encoding in
the <?xml?> declaration at the top (even if only implicitly), and that
makes no sense any other way.
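To illustrate with Python's standard xml.etree.ElementTree (used here
only as an example; it is not the module under discussion): the parser
is handed raw bytes and decodes them according to the encoding named in
the declaration, falling back to UTF-8 when there is none. A minimal
sketch:

```python
import xml.etree.ElementTree as ET

# The same text, "café", encoded two different ways. Each document
# declares its own encoding, so the parser has to receive raw bytes.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9</p>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><p>caf\xc3\xa9</p>'

# With no declaration at all, XML defaults to UTF-8 (or UTF-16 with a BOM).
implicit_doc = b'<p>caf\xc3\xa9</p>'

for doc in (latin1_doc, utf8_doc, implicit_doc):
    # Parsing decodes the bytes; the element text comes back as characters.
    print(repr(ET.fromstring(doc).text))  # each iteration prints 'café'
```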

But an XML fragment must be characters because text in XML is Unicode
and fragments do not have an encoding.
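In the same illustrative Python setting, serializing a single element
with encoding="unicode" yields a str with no declaration, which is the
natural shape of a fragment. Asking for bytes instead forces some
encoding choice that the fragment itself has nowhere to record:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(b'<doc><p>caf\xc3\xa9</p></doc>')
p = root.find('p')

# encoding="unicode" returns characters (str) and emits no <?xml?> declaration.
fragment_text = ET.tostring(p, encoding="unicode")

# The default encoding ("us-ascii") returns bytes; "é" cannot be represented
# in ASCII, so it is written as the character reference &#233;.
fragment_bytes = ET.tostring(p)

print(type(fragment_text).__name__, fragment_text)
print(type(fragment_bytes).__name__, fragment_bytes)
```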

But this gets a little metaphysical when you deal with concrete data
because the XML PI is optional. You can’t distinguish XML fragments from
XML documents just by looking at them.
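A sketch of why sniffing cannot settle it: a hypothetical helper
looks_like_document (not from any library) can positively identify a
document only when a byte-order mark or an XML declaration is present.
For anything else the honest answer is "unknown":

```python
from typing import Optional

# Byte-order marks for UTF-8, UTF-16LE and UTF-16BE.
_BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")

def _has_xml_declaration(data: bytes) -> bool:
    # "<?xml" must be followed by whitespace; "<?xml-stylesheet ...?>"
    # is an ordinary processing instruction, not the declaration.
    return data[:5] == b"<?xml" and data[5:6].isspace()

def looks_like_document(data: bytes) -> Optional[bool]:
    """Hypothetical helper: True if the data is evidently an encoded
    XML document, None if the data alone cannot tell us."""
    if data.startswith(_BOMS) or _has_xml_declaration(data):
        return True
    # No BOM, no declaration: this could be a complete document (the
    # declaration is optional) or a fragment that happens to be ASCII-only.
    return None

print(looks_like_document(b'<?xml version="1.0"?><p>hi</p>'))  # True
print(looks_like_document(b'<p>hi</p>'))  # None
```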

It’s like a string that sticks to ASCII: is that bytes or characters?
The distinction is not in the data; it’s in the programmer’s intent behind
the code that handles the data… but you have to keep that in mind to write
code that actually works correctly. (Which is to say we’re talking about
types. The type is not in the data. This is where an actual type system
helps – having one means you can express that concretely.)
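As a small Python illustration of expressing that intent in types:
XmlDocument and XmlFragment below are hypothetical names, and
typing.NewType only has teeth under a static checker such as mypy (it
does nothing at runtime). The same ASCII data can be wrapped as either:

```python
from typing import NewType

# Hypothetical types: the distinction lives in the annotation, not the data.
XmlDocument = NewType("XmlDocument", bytes)  # encoded; may carry <?xml?>
XmlFragment = NewType("XmlFragment", str)    # characters; has no encoding

def fragment_to_document(fragment: XmlFragment) -> XmlDocument:
    """Encode a fragment as a UTF-8 document with an explicit declaration.
    Assumes the fragment has a single root element."""
    return XmlDocument(
        b'<?xml version="1.0" encoding="UTF-8"?>' + fragment.encode("utf-8")
    )

ascii_data = "<p>hi</p>"
doc = fragment_to_document(XmlFragment(ascii_data))
print(doc)
# A static checker rejects fragment_to_document(doc): bytes is not XmlFragment.
```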

So the I-don’t-believe-in-abstractions answer is… just allow the user to
get the data as both characters and bytes, and make them say which one.
For that approach I would argue that the default ought to be bytes.
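One way such an interface might look, sketched in Python on top of
xml.etree.ElementTree (Python 3.8+ for the xml_declaration argument).
The function render and its as_text flag are hypothetical, not the
module's actual API: the caller says which representation they want,
and bytes are the default.

```python
import xml.etree.ElementTree as ET
from typing import Union

def render(elem: ET.Element, as_text: bool = False) -> Union[bytes, str]:
    """Hypothetical API: serialize elem as characters if as_text is true,
    otherwise (the default) as UTF-8 bytes with an XML declaration."""
    if as_text:
        return ET.tostring(elem, encoding="unicode")
    return ET.tostring(elem, encoding="UTF-8", xml_declaration=True)

p = ET.fromstring(b"<p>caf\xc3\xa9</p>")
print(render(p))                # bytes, starting with a declaration
print(render(p, as_text=True))  # str: <p>café</p>
```

In a statically typed setting the flag would more naturally be two
functions (or an overload), so the return type is known at compile time.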

The more abstractionista answer would be to let the user ask for any node
to be rendered as an XML fragment. In that case, to get characters they
must ask for the document element rendered to a string, and if they ask
for the whole document they always get bytes.
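A rough Python sketch of that split, again with hypothetical names:
node_to_fragment accepts any element and returns characters, while
document_to_bytes accepts only a whole ElementTree and always returns
bytes with the encoding recorded in the declaration. Characters for the
whole document come from rendering its document element as a fragment.

```python
import io
import xml.etree.ElementTree as ET

def node_to_fragment(node: ET.Element) -> str:
    """Hypothetical: any node renders as an XML fragment, i.e. characters."""
    return ET.tostring(node, encoding="unicode")

def document_to_bytes(tree: ET.ElementTree) -> bytes:
    """Hypothetical: a whole document always renders as bytes, with its
    encoding (UTF-8 here) recorded in the XML declaration."""
    buf = io.BytesIO()
    tree.write(buf, encoding="UTF-8", xml_declaration=True)
    return buf.getvalue()

tree = ET.ElementTree(ET.fromstring(b"<doc><p>caf\xc3\xa9</p></doc>"))

# To get characters, ask for the document element as a fragment.
print(node_to_fragment(tree.getroot()))  # <doc><p>café</p></doc>
# Asking for the whole document always yields bytes.
print(document_to_bytes(tree))
```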

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
