Good question. Every library that came to mind for querying the tree of
Elements ({:tag _, :attrs _, :content _}) that clojure.data.xml/parse
returns works on zippers, which apparently keep the whole tree in memory.

One option is to use `clojure.data.xml/source-seq` to get back a lazy
sequence of Events {:type _, :name _, :attrs _, :str _} where the event
:type is :start-element, :end-element, or :chars (character data).

For example, "<strong>Hello</strong>" would parse into the events
[:start-element :strong], [:chars "Hello"], [:end-element :strong]
(writing each event as its type followed by its name or text). You could
use loop/recur to manage state as you consume the sequence.
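
Here is a minimal sketch of that loop/recur idea. It is not tied to the
real library: the events are hand-built maps that only mimic the Event
shape described above, `sample-events` and `texts-inside` are names made
up for illustration, and it assumes character data arrives with :type
:chars and that element names are keywords (adjust if your data.xml
version differs). The only state carried along is a stack of open tag
names plus the results collected so far.

```clojure
;; Hand-built stand-ins for the lazy event sequence (an assumption for
;; illustration; a real run would consume the parser's events instead).
(def sample-events
  [{:type :start-element :name :doc    :attrs {}  :str nil}
   {:type :start-element :name :strong :attrs {}  :str nil}
   {:type :chars         :name nil     :attrs nil :str "Hello"}
   {:type :end-element   :name :strong :attrs nil :str nil}
   {:type :start-element :name :em     :attrs {}  :str nil}
   {:type :chars         :name nil     :attrs nil :str "skip me"}
   {:type :end-element   :name :em     :attrs nil :str nil}
   {:type :start-element :name :strong :attrs {}  :str nil}
   {:type :chars         :name nil     :attrs nil :str "World"}
   {:type :end-element   :name :strong :attrs nil :str nil}
   {:type :end-element   :name :doc    :attrs nil :str nil}])

(defn texts-inside
  "Collects the text found directly inside every `tag` element.
  State is a stack of open tag names and the matches so far; the
  element tree itself is never built."
  [tag events]
  (loop [events events
         path   []    ; stack of currently open element names
         found  []]   ; text collected so far
    (if-let [[{etype :type ename :name text :str} & more] (seq events)]
      (case etype
        :start-element (recur more (conj path ename) found)
        :end-element   (recur more (pop path) found)
        :chars         (recur more path
                              (if (= tag (peek path))
                                (conj found text)
                                found))
        ;; ignore any other event type
        (recur more path found))
      found)))

(texts-inside :strong sample-events)
;; => ["Hello" "World"]
```

For a large file, pass the lazy sequence straight into the function
rather than binding it with def first; a var that points at the head of
a lazy sequence keeps every realized event reachable.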

That's actually how I'm used to working with SAX parsers anyway. Here are
some naive Ruby examples if it's new to you:
https://gist.github.com/danneu/3977120.

Of course, I imagine the ideal solution would involve some way to express
selectors on the Element tree like I'm used to doing with raynes/laser on
zippers:
https://github.com/Raynes/laser/blob/master/docs/guide.md#screen-scraping.


On Tuesday, December 17, 2013 4:57:32 AM UTC-6, Peter Ullah wrote:
>
>
> Hi all, 
>
> I'm attempting to parse a large (500MB) XML file; specifically, I am trying
> to extract various parts using XPath. I've been using the examples presented
> here:
> http://clojure-doc.org/articles/tutorials/parsing_xml_with_zippers.html
> and all was going well when tested against small files. However, now that I
> am using the larger file, Fireplace/Vim just hangs, my laptop gets hot, and
> then I get a memory exception.
>
> I've been playing around with various other libraries such as
> clojure.data.xml and found that the following works perfectly well for
> parsing, but when I come to search inside root, things start to snarl up
> again.
>
> (ns example.core
>   (:require [clojure.java.io :as java.io]
>             [clojure.data.xml :as data.xml]))
>
> (def large-file "/path-to-large-file")
>
> ;; using clojure.data.xml returns quickly with no problems, whereas
> ;; clojure.xml/parse from the link above causes problems.
> (def root
>   (-> large-file
>       java.io/input-stream
>       data.xml/parse))
>
> (class root) ;clojure.data.xml.Element
>
> Does anyone know a way of searching within root that won't consume the 
> heap?
>
> Forgive me, I'm new to Clojure and these forums, I've searched through 
> previous posts but not managed to answer my own question.
>
> Thanks in advance.
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en