On Thu, 10 Feb 2011 07:22:55 -0800 (PST) Marko Topolnik <marko.topol...@gmail.com> wrote:
> I am required to process a huge XML file with 300,000 records. The
> structure is like this:
>
> <root>
>   <header>
>     ....
>   </header>
>   <body>
>     <record>...</record>
>     <record>...</record>
>     ... 299,998 more
>   </body>
> </root>
>
> Obviously, it is of key importance not to allocate memory for all the
> records at once.

I don't think it's obvious. Maybe I'm missing something? Like - how
big are the records? If they're under 1 KB each, that's at most 300 MB
in core - which is large, but not impossible on modern hardware. I've
been handling 0.5 GB data structures in core for the last few years
(in Python, anyway). I have run into at least one stupid garbage
collector that insisted on scanning such structures even though they
weren't changing, which pretty much killed performance. Maybe you have
a "fast startup" requirement that building the initial data structure
would kill. Maybe something else?

Thanks,
<mike

--
Mike Meyer <m...@mired.org>  http://www.mired.org/consulting.html
Independent Network/Unix/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
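P.S. If streaming does turn out to be necessary, here's a rough sketch
of the lazy approach. I'm assuming clojure.data.xml is on the
classpath (its parse is StAX-backed and realizes element content
lazily); process-records and the handler f are names I made up for
illustration:

    (require '[clojure.data.xml :as xml]
             '[clojure.java.io :as io])

    (defn process-records
      "Apply f to each <record> element in the file at path, one at a
      time. Nothing here holds the head of the lazy content seq, so
      records already handed to f can be garbage-collected."
      [f path]
      (with-open [rdr (io/reader path)]
        (doseq [rec (->> (xml/parse rdr)  ; lazy, StAX-backed tree
                         :content         ; children of <root>
                         (filter #(= :body (:tag %)))
                         first
                         :content)        ; children of <body>
                :when (= :record (:tag rec))] ; skip whitespace text nodes
          (f rec))))

    ;; e.g. count the records without retaining them:
    ;; (let [n (atom 0)]
    ;;   (process-records (fn [_] (swap! n inc)) "big.xml")
    ;;   @n)

The main thing to watch is accidentally holding the head of the
content seq - e.g. binding the <body> element to a local around the
loop - which would pin every realized record in memory and defeat the
point.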