On Sat, 2010-12-25 at 22:34 +0000, Nobody wrote:
> On Sat, 25 Dec 2010 14:41:29 -0500, Roy Smith wrote:
>
>> XML works extremely well for large datasets.
>
> One advantage it has over many legacy formats is that there are no
> inherent 2^31/2^32 limitations. Many binary formats inherently cannot
> support files larger than 2 GiB or 4 GiB due to the use of 32-bit
> offsets in indices.
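[The 32-bit offset ceiling quoted above is easy to demonstrate; this is an illustrative sketch, not taken from any particular file format.]

```python
import struct

# An index that stores file offsets as 32-bit unsigned integers cannot
# address anything past 2**32 - 1 bytes (roughly 4 GiB), regardless of
# how the rest of the format is designed.
MAX_32BIT_OFFSET = 2**32 - 1          # last addressable byte

packed = struct.pack("<I", MAX_32BIT_OFFSET)  # fits in 4 bytes
assert len(packed) == 4

try:
    struct.pack("<I", MAX_32BIT_OFFSET + 1)   # first byte past 4 GiB
except struct.error:
    print("offset does not fit in 32 bits")
```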
And what legacy format has support for code pages, namespaces, schema
validation, or comments? None.

> > Of course, one advantage of XML is that with so much redundant text,
> > it compresses well. We typically see gzip compression ratios of 20:1.
> > But, that just means you can archive them efficiently; you can't do
> > anything useful until you unzip them.
>
> XML is typically processed sequentially, so you don't need to create a
> decompressed copy of the file before you start processing it.

Yep.

> If file size is that much of an issue,

Which it isn't.

> eventually we'll see a standard for compressing XML. This could easily
> result in smaller files than using a dedicated format compressed with
> general-purpose compression algorithms, as a widely-used format such
> as XML merits more effort than any application-specific format.

Agreed; and there actually already is a standard compression scheme -
HTTP compression [supported by every modern web server] - so the data
is compressed at the only point where it matters [during transfer].

Again: "XML works extremely well for large datasets".
--
http://mail.python.org/mailman/listinfo/python-list
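[For anyone curious, the sequential-processing point above is easy to sketch in Python: ElementTree's iterparse accepts any file-like object, so a gzip stream can be parsed directly. The in-memory document and tag names here are made up for illustration.]

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Build a small gzip-compressed XML document in memory (a stand-in for
# a large compressed file on disk).
xml_bytes = b"<records><record>1</record><record>2</record></records>"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(xml_bytes)
buf.seek(0)

# Stream-parse the compressed data: GzipFile decompresses on the fly
# and iterparse consumes it sequentially, so no decompressed copy of
# the file is ever written out.
values = []
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    for event, elem in ET.iterparse(f, events=("end",)):
        if elem.tag == "record":
            values.append(elem.text)
            elem.clear()  # free elements already processed

print(values)  # ['1', '2']
```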