Re: Lazily read XML larger than memory: my take at parser API

2011-05-31 Thread Avram
Interestingly, I had planned on using Jackson, but found that because my JSON data was not always well-formed and needed minor cleaning steps (e.g. double newlines without interleaving commas between JSON chunks), I first needed to produce well-formed JSON chunks in a streaming sort of way.
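A minimal sketch of that cleanup step, assuming the chunks are separated by exactly one blank line (file names here are made up): blank lines become the missing commas and the whole stream is wrapped in a JSON array, processed line by line so the file never has to fit in memory.

(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Turn a reader over blank-line-separated JSON chunks into the lines
;; of a single well-formed JSON array (lazy, line by line).
(defn repaired-json-lines [rdr]
  (concat ["["]
          (map #(if (str/blank? %) "," %) (line-seq rdr))
          ["]"]))

(comment
  ;; hypothetical input/output files
  (with-open [rdr (io/reader "big.json")
              out (io/writer "big-fixed.json")]
    (doseq [line (repaired-json-lines rdr)]
      (.write out line)
      (.write out "\n"))))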

Re: Lazily read XML larger than memory: my take at parser API

2011-05-31 Thread Ulises
Jackson can read/parse large JSON files through its streaming API: http://wiki.fasterxml.com/JacksonInFiveMinutes#Streaming_API_Example U
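For reference, a minimal sketch of driving that streaming API from Clojure, assuming Jackson 1.x (org.codehaus.jackson, current at the time) and a made-up file name:

(import '(org.codehaus.jackson JsonFactory JsonParser JsonToken))

;; Lazy sequence of JsonToken values pulled from the parser; the file
;; is read incrementally, so it can be larger than memory.
(defn token-seq [^JsonParser parser]
  (lazy-seq
    (when-let [tok (.nextToken parser)]
      (cons tok (token-seq parser)))))

(comment
  (let [parser (.createJsonParser (JsonFactory.) (java.io.File. "big.json"))]
    ;; e.g. count the objects in the document without holding it all in memory
    (count (filter #(= % JsonToken/START_OBJECT) (token-seq parser)))))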

Re: Lazily read XML larger than memory: my take at parser API

2011-05-31 Thread Avram
Just a quick comment on a generic, similar issue. I need to parse gigabyte-sized files of multi-line JSON (which is a similar problem to parsing gigabytes of XML) where the record delimiter is not a newline. My strategy is to determine record separators (e.g. by counting the level of nesting) as chunks are read.
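A hedged sketch of that nesting-count idea (the input here is made up, and braces inside strings are ignored for brevity): scan characters, track the {/} depth, and cut a record every time the depth returns to zero.

;; Lazily split a character sequence of concatenated JSON objects into
;; one string per top-level object, by tracking brace depth.
(defn records-by-nesting [chars]
  (lazy-seq
    (loop [cs chars, depth 0, acc []]
      (when-let [c (first cs)]
        (let [depth (cond (= c \{) (inc depth)
                          (= c \}) (dec depth)
                          :else depth)
              acc   (conj acc c)]
          (if (and (zero? depth) (= c \}))
            (cons (apply str acc) (records-by-nesting (rest cs)))
            (recur (rest cs) depth acc)))))))

(comment
  (records-by-nesting "{\"a\":1}\n\n{\"b\":{\"c\":2}}")
  ;; => ("{\"a\":1}" "\n\n{\"b\":{\"c\":2}}")
  ;; inter-record whitespace sticks to the next record
  )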

Re: Lazily read XML larger than memory: my take at parser API

2011-05-31 Thread Ilya Kasnacheev
Forgot to mention some things: the code is at https://github.com/alamar/clojure-xml-stream on GitHub; I'm yet to figure out that Leiningen thing, so it builds with Ant; and the two-step handler system (there's a function that takes a method and returns a handler, and the handler accepts the item being constructed and the stream reader) seems…
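Not the library's actual API, but a rough Clojure approximation of that two-step shape, with made-up field names: a factory takes a "method" (here just a keyword) and returns a handler; the handler takes the item under construction plus the stream reader and returns the updated item.

(import '(javax.xml.stream XMLStreamReader))

;; Step 1: factory that turns a field keyword into a handler.
;; Step 2: the handler itself, called with the item and the reader.
(defn text-field-handler [k]
  (fn [item ^XMLStreamReader reader]
    (assoc item k (.getElementText reader))))

(comment
  (def handlers {"name"  (text-field-handler :name)
                 "price" (text-field-handler :price)})
  ;; the parsing loop would look up (handlers (.getLocalName reader))
  ;; on each START_ELEMENT and thread the item through the handler
  )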

Lazily read XML larger than memory: my take at parser API

2011-05-31 Thread Ilya Kasnacheev
Hi *! I've tried a few searches on parsing XML files larger than memory, didn't find anything, and wrote a simple framework for parsing XML via StAX into a lazy sequence of defrecords. It is therefore capable of reading several GB of XML without much problem. It is quite declarative but also quite ugly.
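For readers who want the general idea without digging into the repository, a minimal, hedged sketch of the approach (not the project's actual code; the element and record names are invented): pull events from a StAX XMLStreamReader and expose matching elements as a lazy sequence of records, so only one record is realized at a time.

(import '(javax.xml.stream XMLInputFactory XMLStreamReader XMLStreamConstants))
(require '[clojure.java.io :as io])

(defrecord Item [name price])

;; Read one <item><name>..</name><price>..</price></item> element,
;; assuming the reader is positioned just past its START_ELEMENT.
(defn- read-item [^XMLStreamReader r]
  (loop [item (->Item nil nil)]
    (let [ev (.next r)]
      (cond
        (and (= ev XMLStreamConstants/START_ELEMENT)
             (= "name" (.getLocalName r)))
        (recur (assoc item :name (.getElementText r)))

        (and (= ev XMLStreamConstants/START_ELEMENT)
             (= "price" (.getLocalName r)))
        (recur (assoc item :price (.getElementText r)))

        (and (= ev XMLStreamConstants/END_ELEMENT)
             (= "item" (.getLocalName r)))
        item

        :else (recur item)))))

;; Lazy sequence of Item records; nothing beyond what you consume
;; forces more of the file to be read.
(defn item-seq [^XMLStreamReader r]
  (lazy-seq
    (when (.hasNext r)
      (if (and (= (.next r) XMLStreamConstants/START_ELEMENT)
               (= "item" (.getLocalName r)))
        (cons (read-item r) (item-seq r))
        (item-seq r)))))

(comment
  (let [r (.createXMLStreamReader (XMLInputFactory/newInstance)
                                  (io/input-stream "huge.xml"))]
    (take 5 (item-seq r))))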