[EMAIL PROTECTED] wrote:
> Thanks guys for all your posts...
>
> So I am a bit confused... Fuzzy, the code I saw looks like it
> decompresses as a stream (i.e. per byte). Is this the case or are you
> just compressing for file storage but the actual data set has to be
> exploded in memory?
>
It wasn't my code. If you zip the 10GB file and read from the zip into a DOM-style tree, you haven't gained anything except additional CPU requirements for the decompression. You still have to load the entire thing into memory.

There are differences between XML parsers, but IN EVERY LANGUAGE a poorly written parser is a poorly written parser. Using the wrong IDIOM is more of a problem than anything else. DOM parsers are good when you need to read and process every element and attribute and the data is "small". Granted, "small" is relative, but nobody will consider 10GB "small". A SAX-style or pull parser has to be used when the data is "large" or when you don't really need to process every element and attribute.

This problem looks like it is just a data export / import problem. In that case you will either have to use a SAX-style parser and parse the 10GB file or, as I suggested in another reply, export the data in smaller chunks and process them separately, which in almost EVERY case is the better way to do batch processing. You should always break processing up into as many discrete steps as possible. That makes for easier debugging, and you can restart in the middle much more easily. Even if you just write a simple SAX-style parser to break the file up into smaller pieces before actually processing it, you will be ahead of the game.

We have systems that process streaming data coming from sockets in XML format, that run in Java with a very small memory footprint and very little CPU usage. At 50 megabits a second, that is about 4 terabits (roughly 540GB) a day. C wouldn't read from a socket any faster than the NBIO; actually it would be harder to get the same performance in C, because we would have to duplicate all the SEDA-style NBIO.
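To make the "break the file up into smaller pieces" idea concrete, here is a rough sketch in Python. It uses the standard library's `xml.etree.ElementTree.iterparse`, which is a pull-style incremental parser rather than a true SAX handler, and it assumes the records are direct children of the root element. The function name `split_records`, the `row` tag, and the chunk size are all made up for illustration; the real export format may look quite different.

```python
import io
import xml.etree.ElementTree as ET


def split_records(source, record_tag, chunk_size):
    """Yield lists of serialized record elements, chunk_size at a time.

    Assumes every record is a direct child of the root element.
    The input is streamed, so memory use tracks one chunk rather
    than the whole document.
    """
    chunk = []
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            elem.tail = None  # drop whitespace that follows the record
            chunk.append(ET.tostring(elem, encoding="unicode"))
            root.clear()  # detach finished records so the tree stays small
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
    if chunk:
        yield chunk  # last, possibly short, chunk


# Tiny in-memory stand-in for the real 10GB export (hypothetical layout).
sample = (
    b"<export>"
    b"<row id='0'><v>a</v></row>"
    b"<row id='1'><v>b</v></row>"
    b"<row id='2'><v>c</v></row>"
    b"<row id='3'><v>d</v></row>"
    b"<row id='4'><v>e</v></row>"
    b"</export>"
)
chunks = list(split_records(io.BytesIO(sample), "row", 2))
```

Each yielded chunk could then be written to its own file and handed to a separate processing step, which is what makes restarting in the middle cheap. For a real file you would pass an open binary file object (or a path) instead of the `io.BytesIO` stand-in.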