On Sun, 26 Dec 2010 01:05:53 +0000, Tim Harig wrote:

>> XML is typically processed sequentially, so you don't need to create a
>> decompressed copy of the file before you start processing it.
>
> Sometimes XML is processed sequentially. When the markup footprint is
> large enough it must be. Quite often, as in the case of the OP, you only
> want to extract a small piece out of the total data. In those cases,
> being forced to read all of the data sequentially is both inconvenient and
> a performance penalty unless there is some way to address the data you
> want directly.
Actually, I should have said "must be processed sequentially". Even if you only care about a small portion of the data, you have to read it sequentially to locate that portion.

IOW, anything you can do with uncompressed XML can be done with compressed XML; you can't do random access with either.

If XML has a drawback over application-specific formats, it's the sequential nature of XML rather than its (uncompressed) size.

OTOH, formats designed for random access tend to be more limited in their utility. You can only perform random access based upon criteria which match the format's indexing. Once you step outside that, you often have to walk the entire file anyhow.

-- 
http://mail.python.org/mailman/listinfo/python-list
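A minimal sketch (not from the thread) of the claim that compressed XML can be handled exactly like uncompressed XML: the standard library's `gzip.GzipFile` decompresses on the fly while `xml.etree.ElementTree.iterparse` pulls bytes from it, so no decompressed copy of the file is ever written. The function name `find_record` and the `<record id="...">` layout are hypothetical, chosen only for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def find_record(compressed_stream, tag, key, value):
    """Hypothetical helper: return the text of the first <tag> element whose
    attribute `key` equals `value`, or None if there is no match.

    The stream is decompressed and parsed incrementally, but it is still
    read sequentially from the start up to the match.
    """
    # GzipFile decompresses lazily as iterparse reads from it.
    with gzip.GzipFile(fileobj=compressed_stream) as xml_stream:
        for _event, elem in ET.iterparse(xml_stream, events=("end",)):
            if elem.tag == tag:
                if elem.get(key) == value:
                    return elem.text
                # Discard the finished element's contents to limit memory use.
                elem.clear()
    return None


if __name__ == "__main__":
    # Build a small gzip-compressed document in memory for demonstration.
    doc = (
        b"<records>"
        + b"".join(b'<record id="%d">value %d</record>' % (i, i) for i in range(1000))
        + b"</records>"
    )
    blob = io.BytesIO(gzip.compress(doc))
    print(find_record(blob, "record", "id", "742"))  # value 742
```

Finding record 742 still means decompressing and parsing records 0 through 741 first; compression changes nothing about the access pattern, only whether an uncompressed copy has to exist.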