On Thu, 10 Feb 2011 07:22:55 -0800 (PST)
Marko Topolnik <marko.topol...@gmail.com> wrote:

> I am required to process a huge XML file with 300,000 records. The
> structure is like this:
> 
> <root>
>   <header>
>     ....
>   </header>
>   <body>
>     <record>...</record>
>     <record>...</record>
>     ... 299,998 more
>   </body>
> </root>
> 
> Obviously, it is of key importance not to allocate memory for all the
> records at once.

I don't think it's obvious. Maybe I'm missing something? Like - how
big are the records? If they're under 1K each, that's at most 300 MB
in core - which is large, but not impossible on modern hardware. I've
been handling 0.5 GB data structures in core for the last few years
(in Python, anyway). I've run into at least one stupid garbage
collector that insisted on scanning such structures even though they
weren't changing, which pretty much killed performance. Maybe you
have a "fast startup" requirement, which building the initial data
structure up front would kill. Maybe something else?
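
For what it's worth, if it turns out you really can't hold them all,
the usual answer on the JVM is a streaming parse rather than building
the whole tree up front. Here's a minimal sketch, assuming
clojure.data.xml (whose parse is lazy, driven by a StAX stream under
the hood); "records.xml" and process-record are placeholders for your
file and your per-record work:

  (require '[clojure.java.io :as io]
           '[clojure.data.xml :as xml])

  (defn process-all [process-record]
    (with-open [rdr (io/reader "records.xml")] ; placeholder file name
      (let [root (xml/parse rdr)               ; children realized lazily
            body (first (filter #(= :body (:tag %)) (:content root)))]
        ;; Iterates one <record> element at a time; nothing here
        ;; retains the head of the seq, so records already processed
        ;; can be garbage-collected as you go.
        (doseq [rec (:content body)
                :when (= :record (:tag rec))]
          (process-record rec)))))

The catch is the usual one with laziness: hang on to the head of that
record seq anywhere (a def, a closed-over local) and the whole 300 MB
gets realized and retained anyway, and you're back where you started.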

      Thanks,
      <mike
-- 
Mike Meyer <m...@mired.org>             http://www.mired.org/consulting.html
Independent Network/Unix/SCM consultant, email for more information.
