Good question. Every lib that came to mind when I saw clojure.data.xml/parse's tree of Elements {:tag _, :attrs _, :content _} only works on zippers, which apparently need the whole tree to sit in memory.

One option is to use `clojure.data.xml/source-seq` to get back a lazy sequence of Events {:type _, :name _, :attrs _, :str _} where the event :type is either :start-element, :end-element, or :characters. For example, "<strong>Hello</strong>" would parse into the events [:start-element :strong], [:characters "Hello"], [:end-element :strong].

You could use loop/recur to manage state as you consume the sequence. That's actually how I'm used to working with SAX parsers anyway. Here are some naive Ruby examples if it's new to you: https://gist.github.com/danneu/3977120.

Of course, I imagine the ideal solution would involve some way to express selectors on the Element tree like I'm used to doing with raynes/laser on zippers: https://github.com/Raynes/laser/blob/master/docs/guide.md#screen-scraping.

On Tuesday, December 17, 2013 4:57:32 AM UTC-6, Peter Ullah wrote:
>
> Hi all,
>
> I'm attempting to parse a large (500MB) XML file; specifically, I am trying to
> extract various parts using XPath. I've been using the examples presented
> here:
> http://clojure-doc.org/articles/tutorials/parsing_xml_with_zippers.html
> and all was going well when tested against small files, however now that I am
> using the larger file Fireplace/Vim just hangs and my laptop gets hot, then
> I get a memory exception.
>
> I've been playing around with various other libraries such as
> clojure.data.xml and found that the following works perfectly well for
> parsing... but when I come to search inside root, things start to snarl up
> again.
>
> (ns example.core
>   (:require [clojure.java.io :as java.io]
>             [clojure.data.xml :as data.xml]))
>
> (def large-file "/path-to-large-file")
>
> ;; using clojure.data.xml returns quickly with no problems whereas
> ;; clojure.xml/parse from the link above causes problems..
> (def root
>   (-> large-file
>       java.io/input-stream
>       data.xml/parse))
>
> (class root) ;clojure.data.xml.Element
>
> Does anyone know a way of searching within root that won't consume the
> heap?
>
> Forgive me, I'm new to Clojure and these forums, I've searched through
> previous posts but not managed to answer my own question.
>
> Thanks in advance.
>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
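
For what it's worth, the loop/recur idea from the reply above could look roughly like the sketch below. It is only an illustration, not a tested recipe for the 500MB file: `texts-in` and `sample-events` are hypothetical names, and the function only assumes each event is a map-like value with :type, :name and :str keys, as described for the Events that `source-seq` returned in the clojure.data.xml release current at the time of this thread. It walks the events once and keeps just a depth counter and the text of the currently open element, so the full Element tree is never built.

```clojure
(defn texts-in
  "Returns a vector with the text found inside each `tag` element of
  `events`. `tag` and the event :name may be keywords or strings.
  Hypothetical helper, written for illustration only."
  [tag events]
  (loop [events (seq events)
         depth  0     ; how many `tag` elements are currently open
         text   ""    ; text collected for the outermost open `tag`
         out    []]   ; finished results (only matched text is retained)
    (if-not events
      out
      (let [ev   (first events)
            type (:type ev)
            hit? (and (#{:start-element :end-element} type)
                      (= (name tag) (name (:name ev))))]
        (cond
          ;; entering a matching element
          (and hit? (= type :start-element))
          (recur (next events) (inc depth) text out)

          ;; leaving a matching element: emit when the outermost one closes
          (and hit? (= type :end-element))
          (if (= depth 1)
            (recur (next events) 0 "" (conj out text))
            (recur (next events) (dec depth) text out))

          ;; text inside a matching element
          (and (= type :characters) (pos? depth))
          (recur (next events) depth (str text (:str ev)) out)

          ;; anything else is skipped
          :else
          (recur (next events) depth text out))))))

;; Hand-written events for <p>Say <strong>Hello</strong></p>
(def sample-events
  [{:type :start-element :name :p      :attrs {} :str nil}
   {:type :characters    :name nil     :attrs nil :str "Say "}
   {:type :start-element :name :strong :attrs {} :str nil}
   {:type :characters    :name nil     :attrs nil :str "Hello"}
   {:type :end-element   :name :strong :attrs nil :str nil}
   {:type :end-element   :name :p      :attrs nil :str nil}])

(texts-in :strong sample-events)

(comment
  ;; Assumption: a clojure.data.xml version that still provides source-seq.
  ;; The :strong tag is just a placeholder for whatever you are looking for.
  (require '[clojure.data.xml :as data.xml]
           '[clojure.java.io :as io])
  (with-open [in (io/input-stream "/path-to-large-file")]
    (texts-in :strong (data.xml/source-seq in))))
```

Because the loop is eager, it finishes before `with-open` closes the stream. If the matches themselves are too numerous to hold in a vector, the same loop could hand each finished text to a callback instead of conj-ing it onto `out`.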