I am required to process a huge XML file with 300,000 records. The
structure is like this:

<root>
  <header>
    ....
  </header>
  <body>
    <record>...</record>
    <record>...</record>
    ... 299,998 more
  </body>
</root>

Obviously, it is essential not to allocate memory for all the
records at once. If I do this:

(use '[clojure.contrib.lazy-xml :only [parse-trim]])
(use '[clojure.java.io :only [reader]])

;; grab the <body> element's tag without walking its contents
(-> (parse-trim (reader "huge.xml"))
    :content
    second
    :tag)

I expected this to parse only as far as the start tag of <body>, but it
parses all the way down to </body> -- or at least it tries to, failing
with an OutOfMemoryError.

Am I wrong to expect that the entire contents of <body> will not be
parsed? :content is supposed to be a lazy seq, so even if I access its
head, it should parse no more than the first <record> element, right?
