K.S.Sreeram wrote:
> Fredrik Lundh wrote:
> > both ElementTree and cElementTree support "sax-style" event generation
> > (through XMLTreeBuilder/XMLParser) and incremental parsing (through
> > iterparse). the cElementTree versions of these are even faster than
> > pyexpat.
> >
> > the iterparse interface is described here:
> >
> > http://effbot.org/zone/element-iterparse.htm
> >
> Thats cool! Thanks for the info!
>
> For a multi-gigabyte file, I would still recommend C/C++, because the
> processing code which sits on top of the XML library needs to be Python,
> and that could turn out to be a significant overhead in such extreme cases.
>
> Of course, the exact strategy to follow would depend on the specifics of
> the case, and all this speculation may not really apply! :)
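For anyone who hasn't used it, the iterparse idiom mentioned above looks roughly like this. This is only a minimal sketch using the standard library's xml.etree.ElementTree; the helper names (iter_records, collect_ids), the "entry" tag, and the sample data are made up for illustration and are not taken from any real file or from Amara.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield each completed <tag> element, then release it from memory."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is always the "start" of the root element.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # Once the caller is done with the element, drop everything
            # accumulated under the root so memory use stays roughly flat.
            root.clear()


# Hypothetical sample data standing in for a much larger document.
SAMPLE = b"<log><entry id='1'>first</entry><entry id='2'>second</entry></log>"


def collect_ids(data=SAMPLE):
    """Read (id, text) pairs one record at a time."""
    return [(e.get("id"), e.text) for e in iter_records(io.BytesIO(data), "entry")]
```

Only one record is fully built at any moment, so the Python-level work per record is whatever you do inside the loop. Whether that overhead matters for a multi-gigabyte file is something you would have to measure.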
Honestly, I think that legitimate use cases for multi-gigabyte XML are very rare. Many people abuse XML as some sort of DBMS replacement, and this abuse is part of the reason why so many developers are hostile to XML. XML is best for documents, and documents can get to the multi-gigabyte range, but rarely do. Usually, when they do, there is a logical way to decompose them, process them, and re-compose them. With XML used as a DBMS replacement, relations and datatyping complicate such natural divide-and-conquer techniques.

I always say that if you're dealing with gigabyte XML, it's well worth considering whether you're using a hammer to screw in a bolt.

If monster XML is inevitable, then I'll extend Fredrik's earlier mention of Amara to say that Pushdom allows you to pre-declare the chunks of XML you're interested in. It then processes the XML in streaming mode, instantiating only the chunks of interest, one at a time. This allows for handling of huge files with a very simple programming idiom.

http://uche.ogbuji.net/tech/4suite/amara/

--
Uche Ogbuji
Fourthought, Inc.
http://uche.ogbuji.net
http://fourthought.com
http://copia.ogbuji.net
http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/
--
http://mail.python.org/mailman/listinfo/python-list