As far as I know, using zippers like that requires the whole XML data
structure to be in memory.  data.xml returns quickly because it's lazy (it
uses pull parsing): until you start traversing the structure, it won't
parse any more of the file.  data.xml should also be fully streaming, so it
shouldn't need the full 500 MB XML file in memory unless something in your
code holds on to it, for example a reference to the root element (such as
your def of root) that is kept while you walk the tree.

Traversing the structure that data.xml emits directly should not exhaust
the heap, but you wouldn't be able to use XPath.  I've not used it, but
there is an XPath wrapper library here:
https://github.com/kyleburton/clj-xpath.  From a brief look at the code, it
appears to use DOM parsing, so it would load the whole document onto the
heap.  If you have the memory to spare and aren't worried about the docs
getting larger, you could raise your max heap (-Xmx on the java command
line).
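
To make the direct traversal a bit more concrete, here is a minimal sketch
(not something I've run against a file that size).  It assumes the parts you
want are direct children of the root element; the function name
count-matching and the :record tag and :type attribute in the usage line are
made up for illustration and are not from your file.

```clojure
(ns example.streaming
  (:require [clojure.java.io :as io]
            [clojure.data.xml :as xml]))

(defn count-matching
  "Counts the direct children of the root element that have the given tag
  and for which (pred element) is truthy.  `source` is anything that
  clojure.java.io/input-stream accepts (a path, a File, a byte array...)."
  [source tag pred]
  (with-open [in (io/input-stream source)]
    ;; The root element is never bound to a var or a local, so children
    ;; that have already been examined can be garbage collected.
    ;; reduce is eager, so all the work finishes before the stream closes.
    (reduce (fn [n el]
              ;; Text between elements shows up as plain strings;
              ;; (:tag "some string") is nil, so those are skipped.
              (if (and (= tag (:tag el)) (pred el))
                (inc n)
                n))
            0
            (:content (xml/parse in)))))
```

Usage would look something like
(count-matching "/path-to-large-file" :record #(= "a" (get-in % [:attrs :type]))).
The important part is the shape rather than the counting: walk :content
inside a reduce (or doseq) and avoid anything like (def root ...), which
keeps every parsed element reachable.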

-Ryan


On Tue, Dec 17, 2013 at 4:57 AM, Peter Ullah <peterul...@gmail.com> wrote:

>
> Hi all,
>
> I'm attempting to parse a large (500MB) XML file; specifically, I am
> trying to extract various parts using XPath. I've been using the examples
> presented here:
> http://clojure-doc.org/articles/tutorials/parsing_xml_with_zippers.html
> and all was going well when tested against small files. However, now that
> I am using the larger file, Fireplace/Vim just hangs, my laptop gets hot,
> and then I get a memory exception.
>
> I've been playing around with various other libraries such as
> clojure.data.xml and found that the following works perfectly well for
> parsing... but when I come to search inside root, things start to snarl up
> again.
>
> (ns example.core
>   (:require [clojure.java.io :as java.io]
>             [clojure.data.xml :as data.xml]))
>
> (def large-file "/path-to-large-file")
>
> ;; using clojure.data.xml returns quickly with no problems whereas
> ;; clojure.xml/parse from the link above causes problems..
> (def root
>   (-> large-file
>       java.io/input-stream
>       data.xml/parse))
>
> (class root) ;clojure.data.xml.Element
>
> Does anyone know a way of searching within root that won't consume the
> heap?
>
> Forgive me, I'm new to Clojure and these forums, I've searched through
> previous posts but not managed to answer my own question.
>
> Thanks in advance.
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
