Paul McGuire wrote:
> meat of the data can be relatively small. Note also that this XML overhead
> is directly related to the verbosity of the XML designer's choice of tag
> names, and whether the designer was predisposed to using XML elements over
> attributes. Imagine a record structure for a 3D coordinate point (described
> here in no particular coding language):
> struct ThreeDimPoint:
>     xValue : integer,
>     yValue : integer,
>     zValue : integer
> Directly translated to XML gives:
> <ThreeDimPoint>
>     <xValue>4</xValue>
>     <yValue>5</yValue>
>     <zValue>6</zValue>
> </ThreeDimPoint>

This is essentially true, but it should not cause the OP's problem. After
parsing, the overhead of XML is gone, and a long tag name is nothing but a
pointer to a string which happens to be long (unless *all* tags in the XML
are named differently, which would require a huge DTD/XSD as well).

> This expands 3 integers to a whopping 101 characters. Throw in namespaces
> for good measure, and you inflate the data even more.

In the DOM, it contracts to 3 integers and a few pointers, which is
essentially what a reasonably written data structure would need anyway.

> Try zipping your 10Gb file, and see what kind of compression you get - I'll
> bet it's close to 30:1. If so, convert the data to a real data storage

In this case, his DOM (or whatever equivalent data structure he uses, i.e.
what he *must* process) would be 300 MB + pointers. I'd even go so far as to
say that the best thing that can happen to him is a huge overhead: it would
mean he has little data in a rather spongy file (which collapses on
parsing).

> medium. Even a SQLite database table should do better, and you can ship it
> around just like a file (just can't open it up like a text file).

A table helps only if the data is tabular (i.e. a single relation), which is
probably never the case (otherwise the sending side would have shipped
something like CSV).

Ralf
-- 
http://mail.python.org/mailman/listinfo/python-list
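
The overhead argument in this thread can be tried out with a small sketch. It is only an illustration under stated assumptions: it uses Python's standard xml.etree.ElementTree and zlib modules, the helper names parse_point and compression_ratio are made up for this example, and the test document is just the quoted ThreeDimPoint record repeated many times, so it compresses far better than real data would. The character count of the record also depends on how much whitespace the file carries.

```python
import xml.etree.ElementTree as ET
import zlib

# The record from the quoted example, written without indentation.
POINT_XML = (
    "<ThreeDimPoint>"
    "<xValue>4</xValue>"
    "<yValue>5</yValue>"
    "<zValue>6</zValue>"
    "</ThreeDimPoint>"
)


def parse_point(xml_text):
    """Hypothetical helper: reduce one record to a plain tuple of 3 ints."""
    root = ET.fromstring(xml_text)
    return tuple(int(root.findtext(tag)) for tag in ("xValue", "yValue", "zValue"))


def compression_ratio(data):
    """Hypothetical helper: raw size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


# After parsing, only the three integers need to be kept.
point = parse_point(POINT_XML)

# Artificial document: the same record repeated, so the ratio is optimistic.
doc = "<points>" + POINT_XML * 10000 + "</points>"
ratio = compression_ratio(doc.encode("utf-8"))

print("characters per record:", len(POINT_XML))
print("parsed point:", point)
print("compression ratio: %.1f" % ratio)
```

A high ratio on a real file would point the same way as the reply above: most of the file is markup, and the structure that has to be held in memory after parsing is much smaller than the file on disk.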