Thank you everyone for your advice. I found it useful and think I am part-way to a solution using clojure.data.xml/source-seq, as suggested by danneu.
I'll post what I have done so far in the hope it might help someone else... comments on style welcome.

*Solution*:

Given the following XML,

<head>
  <title>This is some text</title>
  <body>
    <h1>This is a header</h1>
  </body>
</head>

data.xml/source-seq will return a lazy seq of data.xml.Event items:

#clojure.data.xml.Event{:type :start-element, :name :head, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :characters, :name nil, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :start-element, :name :title, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :characters, :name nil, :attrs nil, :str This is some text}
#clojure.data.xml.Event{:type :end-element, :name :title, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :start-element, :name :body, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :start-element, :name :h1, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :characters, :name nil, :attrs nil, :str This is a header}
#clojure.data.xml.Event{:type :end-element, :name :h1, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :end-element, :name :body, :attrs nil, :str nil}
#clojure.data.xml.Event{:type :end-element, :name :head, :attrs nil, :str nil}

This is perfect for finding elements with a particular name, but not enough on its own if I want to find an element based on its location in the document.

So I maintain a stack: each :start-element pushes the element name, and each :end-element pops it. Joining the stack with "/" gives the path of the current element.

(require 'clojure.string)

;; xml is the lazy seq of events returned by data.xml/source-seq.
(let [stack (atom [])
      search-pattern "vmware/collectionHost/Object/Property/Property"]
  ;; Just test with the first 100 events in the seq.
  (doseq [x (take 100 xml)]
    (cond
      (= (:type x) :start-element) (swap! stack conj (name (:name x)))
      (= (:type x) :end-element)   (swap! stack pop))
    ;; Print the current path whenever it matches the search pattern.
    (when (= search-pattern (clojure.string/join "/" @stack))
      (println (clojure.string/join "/" @stack)))))

This is a work in progress and does not take account of attributes on the elements, but I would appreciate any comments.

Thanks
Pete

On Wednesday, December 18, 2013 7:23:21 AM UTC, danneu wrote:
>
> Good question. Every lib that came to mind when I saw
> clojure.data.xml/parse's tree of Elements {:tag _, :attrs _, :content _}
> only works on zippers which apparently sit in memory.
>
> One option is to use `clojure.data.xml/source-seq` to get back a lazy
> sequence of Events {:type _, :name _, :attrs _, :str _} where the event
> :type is either :start-element, :end-element, or :characters.
>
> For example, "<strong>Hello</strong>" would parse into the events
> [:start-element "strong"], [:characters "Hello"], [:end-element "strong"].
> You could use loop/recur to manage state as you consume the sequence.
>
> That's actually how I'm used to working with SAX parsers anyways. Here are
> some naive Ruby examples if it's new to you:
> https://gist.github.com/danneu/3977120.
>
> Of course, I imagine the ideal solution would involve some way to express
> selectors on the Element tree like I'm used to doing with raynes/laser on
> zippers:
> https://github.com/Raynes/laser/blob/master/docs/guide.md#screen-scraping.
>
>
> On Tuesday, December 17, 2013 4:57:32 AM UTC-6, Peter Ullah wrote:
>>
>>
>> Hi all,
>>
>> I'm attempting to parse a large (500MB) XML file; specifically, I am
>> trying to extract various parts using XPath. I've been using the examples
>> presented here:
>> http://clojure-doc.org/articles/tutorials/parsing_xml_with_zippers.html
>> and all was going well when tested against small files. However, now that
>> I am using the larger file, Fireplace/Vim just hangs and my laptop gets
>> hot, then I get a memory exception.
>>
>> I've been playing around with various other libraries such as
>> clojure.data.xml and found that the following works perfectly well for
>> parsing... but when I come to search inside root, things start to snarl up
>> again.
>>
>> (ns example.core
>>   (:require [clojure.java.io :as java.io]
>>             [clojure.data.xml :as data.xml]))
>>
>> (def large-file "/path-to-large-file")
>>
>> ;; using clojure.data.xml returns quickly with no problems whereas
>> ;; clojure.xml/parse from the link above causes problems..
>> (def root
>>   (-> large-file
>>       java.io/input-stream
>>       data.xml/parse))
>>
>> (class root) ;clojure.data.xml.Element
>>
>> Does anyone know a way of searching within root that won't consume the
>> heap?
>>
>> Forgive me, I'm new to Clojure and these forums, I've searched through
>> previous posts but not managed to answer my own question.
>>
>> Thanks in advance.
>>
>

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
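
A possible variation on the stack idea from the top of this message, offered only as a sketch: instead of an atom and doseq, the stack can be threaded through the event sequence with `reductions`, so that matches come back as a lazy sequence rather than being printed. The names `step`, `text-at`, `sample-events`, and the `example.path-search` namespace are hypothetical, and the events here are plain maps with the same :type, :name, and :str keys as the Event records shown earlier (an assumption made so the example runs without clojure.data.xml on the classpath).

```clojure
(ns example.path-search
  (:require [clojure.string :as str]))

(defn step
  "Returns the element stack after seeing one event.
  Assumes well-formed input, so every :end-element has a matching start."
  [stack event]
  (case (:type event)
    :start-element (conj stack (name (:name event)))
    :end-element   (pop stack)
    stack))

(defn text-at
  "Lazily returns the :str of each :characters event whose enclosing
  element path (names joined with \"/\") equals path."
  [path events]
  ;; (rest (reductions ...)) pairs each event with the stack as it
  ;; stands after that event; a :characters event leaves it unchanged.
  (let [stacks (rest (reductions step [] events))]
    (for [[stack event] (map vector stacks events)
          :when (and (= :characters (:type event))
                     (= path (str/join "/" stack)))]
      (:str event))))

;; Hand-written stand-in for the events of the small <head> document.
(def sample-events
  [{:type :start-element :name :head}
   {:type :start-element :name :title}
   {:type :characters :str "This is some text"}
   {:type :end-element :name :title}
   {:type :start-element :name :body}
   {:type :start-element :name :h1}
   {:type :characters :str "This is a header"}
   {:type :end-element :name :h1}
   {:type :end-element :name :body}
   {:type :end-element :name :head}])

(text-at "head/body/h1" sample-events)
```

With the real file, the idea would be to pass the seq returned by data.xml/source-seq in place of sample-events. Whether memory stays flat on a 500MB input depends on the caller not holding on to the head of that seq, which I have not measured.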