"Adam Tauno Williams" <awill...@whitemice.org> wrote in message
news:mailman.287.1293319780.6505.python-l...@python.org...
On Sat, 2010-12-25 at 22:34 +0000, Nobody wrote:
On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
>> XML works extremely well for large datasets.
One advantage it has over many legacy formats is that there are no
inherent 2^31/2^32 limitations. Many binary formats inherently cannot
support files larger than 2 GiB or 4 GiB due to the use of 32-bit offsets in
indices.
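To make the offset limit concrete, here is a minimal sketch using Python's struct module. The helper name fits() is made up for illustration, and the field layouts ("<i", "<I", "<q") are only assumed examples of how a binary format might store an index offset, not any particular format's layout.

```python
import struct

def fits(fmt, offset):
    """Return True if `offset` can be stored in a field with struct format `fmt`."""
    try:
        struct.pack(fmt, offset)
        return True
    except struct.error:
        # The value is out of range for this field width.
        return False

TWO_GIB = 2**31
FOUR_GIB = 2**32

signed_ok = fits("<i", TWO_GIB - 1)      # last offset a signed 32-bit field can hold
signed_over = fits("<i", TWO_GIB)        # one byte past 2 GiB
unsigned_ok = fits("<I", FOUR_GIB - 1)   # last offset an unsigned 32-bit field can hold
unsigned_over = fits("<I", FOUR_GIB)     # one byte past 4 GiB
wide_ok = fits("<q", 2**40)              # a 64-bit field has room to spare
```

A text format has no fixed-width offset field to overflow in the first place, which is the point being made above.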
And what legacy format has support for code pages, namespaces, schema
verification, or comments? None.
> Of course, one advantage of XML is that with so much redundant text, it
> compresses well. We typically see gzip compression ratios of 20:1.
> But, that just means you can archive them efficiently; you can't do
> anything useful until you unzip them.
XML is typically processed sequentially, so you don't need to create a
decompressed copy of the file before you start processing it.
Yep.
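As a rough sketch of what that looks like in Python, the standard gzip and xml.etree.ElementTree modules can be chained so the parser reads straight from the compressed stream. The function name count_elements() and the sample <log>/<entry> document are invented for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET

def count_elements(gz_source, tag):
    """Count <tag> elements in gzip-compressed XML without writing an unpacked copy."""
    count = 0
    # gzip.open accepts a filename or an already open binary file object.
    with gzip.open(gz_source, "rb") as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == tag:
                count += 1
            # Discard the element's text and children once it has been handled.
            elem.clear()
    return count

# Build a small compressed document in memory to stand in for a real file.
xml_bytes = b"<log>" + b"<entry level='info'>ok</entry>" * 1000 + b"</log>"
compressed = gzip.compress(xml_bytes)
n_entries = count_elements(io.BytesIO(compressed), "entry")
```

Only the decompressor's buffer and the element currently being parsed need to be in memory at once, though the root element here still keeps the emptied child nodes around.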
If file size is that much of an issue,
Which it isn't.
Only if you're prepared to squander resources that could be put to better
use.
XML is so redundant, anyone (even me :-) could probably spend an afternoon
coming up with a compression scheme to reduce it to a fraction of its size.
It can even be a custom format, provided you also send along the few dozen
lines of Python (or whatever language) needed to decompress. Although if
it's done properly, it might be possible to create an XML library that works
directly on the compressed format and serves as a drop-in replacement for a
conventional library.
That will likely save time and memory.
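For what it's worth, here is one very rough sketch of the "send the decompressor along" idea. It is not a purpose-built XML scheme: it just uses zlib with a preset dictionary of the markup both ends expect. TAG_DICT, pack() and unpack() are made-up names, and the record/name/value tags are an assumed example vocabulary.

```python
import zlib

# Hypothetical shared dictionary: markup that both sender and receiver agree on.
TAG_DICT = b"<record><name></name><value></value></record>"

def pack(xml_bytes):
    """Compress XML using the shared tag dictionary as a head start."""
    comp = zlib.compressobj(level=9, zdict=TAG_DICT)
    return comp.compress(xml_bytes) + comp.flush()

def unpack(blob):
    """Reverse pack(); the receiver must hold the same TAG_DICT."""
    decomp = zlib.decompressobj(zdict=TAG_DICT)
    return decomp.decompress(blob) + decomp.flush()

sample = (b"<records>"
          + b"<record><name>a</name><value>1</value></record>" * 500
          + b"</records>")
packed = pack(sample)
restored = unpack(packed)
```

The preset dictionary matters most for short documents, where ordinary deflate has not yet seen enough of the tags to refer back to them.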
Anyway there seem to be existing schemes for binary XML, indicating some
people do think it is an issue.
I'm just concerned about the waste of computer power. (I used to think HTML
was bad, for example repeating the same long-winded font name hundreds of
times over in the same document. And PDF: years ago I was sent a 1MB document
for a modem; perhaps some substantial user manual for it? No, just a simple
diagram showing how to plug it into the phone socket!)
--
Bartc