Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
In fact, it is enough to replace (drop-last sibs) with (remove seq? sibs).

On Feb 12, 9:54 pm, Marko Topolnik wrote:
> On Feb 12, 7:55 pm, Marko Topolnik wrote:
> > How about replacing
> >   (drop-last sibs)
> > with
> >   (remove vector? sibs)
> > ?
>
> This was slightly naive. We also need ...

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
On Feb 12, 7:55 pm, Marko Topolnik wrote:
> How about replacing
>   (drop-last sibs)
> with
>   (remove vector? sibs)
> ?

This was slightly naive. We also need these changes:

In siblings:
  :end-element [[(rest s)]]

In mktree:
  (cons (struct element (:name elem) (:attrs elem) (remove ...

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
Also, the xpp-based parser is almost an order of magnitude slower than the
sax-based one. The only thing it lacks is a couple of type hints:

  (defn- attrs [^XmlPullParser xpp] ...
  (defn- ns-decs [^XmlPullParser xpp] ...
  (let [step (fn [^XmlPullParser xpp] ...

These hints increase the performance from 400% ...
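The gain comes from avoiding reflective interop calls. A minimal sketch of
the effect (illustration only, not the library source; it assumes the XPP3
jar is on the classpath, as in the thread):

  (import 'org.xmlpull.v1.XmlPullParser)

  (set! *warn-on-reflection* true)

  ;; Without a hint the compiler cannot resolve .getAttributeCount at
  ;; compile time, so every call goes through reflection.
  (defn attr-count-slow [xpp]
    (.getAttributeCount xpp))            ; reflection warning here

  ;; With the ^XmlPullParser hint the call is resolved statically.
  (defn attr-count-fast [^XmlPullParser xpp]
    (.getAttributeCount xpp))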

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
How about replacing
  (drop-last sibs)
with
  (remove vector? sibs)
?

remove will not access the next seq member in advance, and the only vector
in sibs is the last element. I tried this change and it works for the test
code from the original post.

On Feb 12, 4:43 pm, Chouser wrote:
> On Sat, Feb ...
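To see the look-ahead difference concretely, here is a small REPL
experiment (traced is a throwaway helper, not from the thread; it just
prints each element of an unchunked lazy seq as it gets realized):

  (defn traced [coll]
    (lazy-seq
      (when-let [s (seq coll)]
        (println "realizing" (first s))
        (cons (first s) (traced (rest s))))))

  ;; drop-last must look one element ahead to know whether the current
  ;; element is the last one, so taking the first item realizes two:
  (first (drop-last (traced [1 2 [:marker]])))
  ;; realizing 1
  ;; realizing 2
  ;; => 1

  ;; remove only has to test the element it is about to return, so
  ;; nothing beyond it gets realized:
  (first (remove vector? (traced [1 2 [:marker]])))
  ;; realizing 1
  ;; => 1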

Re: Problems with lazy-xml

2011-02-12 Thread Chouser
On Sat, Feb 12, 2011 at 4:16 AM, Marko Topolnik wrote:
>> Just guessing, but is it something to do with this (from the docstring
>> of parse-seq)?
>>
>> "it will be run in a separate thread and be allowed to get
>>  ahead by queue-size items, which defaults to maxint".
>
> As I've figured ...

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
> Just guessing, but is it something to do with this (from the docstring
> of parse-seq)?
>
> "it will be run in a separate thread and be allowed to get
>  ahead by queue-size items, which defaults to maxint".

As I've figured it out, when there's XPP on the classpath, and I'm using
it, the ...
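The mechanism that docstring describes can be pictured with a toy bounded
queue (a sketch of the idea only, not the parse-seq source): a filler
thread pushes parse events onto the queue and the consumer's lazy seq just
takes them off. With a capacity of Integer/MAX_VALUE the filler never
blocks, so it can parse the whole file ahead of a slow consumer.

  (import 'java.util.concurrent.LinkedBlockingQueue)

  ;; produce-events is any fn returning a (possibly lazy) seq of events.
  (defn queued-events [produce-events queue-size]
    (let [q (LinkedBlockingQueue. (int queue-size))]
      (future                            ; the "separate thread"
        (doseq [e (produce-events)]
          (.put q e)))                   ; blocks once the queue is full
      ;; consumer side; a real version would also signal end-of-stream
      (repeatedly #(.take q))))

With a small queue-size the producer stays only a few items ahead; with
maxint it is effectively unbounded.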

Re: Problems with lazy-xml

2011-02-11 Thread Chouser
On Fri, Feb 11, 2011 at 2:35 PM, Chris Perkins wrote:
> On Feb 11, 5:07 am, Marko Topolnik wrote:
>> http://db.tt/iqTo1Q4
>>
>> This is a sample XML file with 1000 records -- enough to notice a
>> significant delay when evaluating the code from the original post.
>>
>> Chouser, could you spare a ...

Re: Problems with lazy-xml

2011-02-11 Thread Chris Perkins
On Feb 11, 5:07 am, Marko Topolnik wrote:
> http://db.tt/iqTo1Q4
>
> This is a sample XML file with 1000 records -- enough to notice a
> significant delay when evaluating the code from the original post.
>
> Chouser, could you spare a second here? I've been looking and looking
> at mktree and siblings ...

Re: Problems with lazy-xml

2011-02-11 Thread Benny Tsai
I can confirm that the same thing is happening on my end as well. The XML
is parsed lazily:

user=> (time (let [root (parse-trim (reader "huge.xml"))]
               (-> root :content type)))
"Elapsed time: 45.57367 msecs"
clojure.lang.LazySeq

...but as soon as I try to do anything with the struct map for the D...
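For contrast (my sketch, not part of the message above): force the
:content seq instead of only asking for its type, and the whole file has
to be parsed before time returns.

  ;; Counting the children realizes every record, so this is where the
  ;; delay -- and the memory growth -- shows up.
  (time (let [root (parse-trim (reader "huge.xml"))]
          (count (:content root))))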

Re: Problems with lazy-xml

2011-02-11 Thread Marko Topolnik
http://db.tt/iqTo1Q4

This is a sample XML file with 1000 records -- enough to notice a
significant delay when evaluating the code from the original post.

Chouser, could you spare a second here? I've been looking and looking at
mktree and siblings for two days now and can't for the life of me find ...

Re: Problems with lazy-xml

2011-02-11 Thread Benny Tsai
Can you post a link to a (sanitized, if need be) sample file?

On Feb 11, 1:21 am, Marko Topolnik wrote:
> Right now I'm working with a 300k-record file, but the code must scale
> into the millions, and, as I mentioned, it is already spewing
> OutOfMemory errors. Also, on a more abstract level, it'...

Re: Problems with lazy-xml

2011-02-11 Thread Marko Topolnik
Right now I'm working with a 300k-record file, but the code must scale
into the millions, and, as I mentioned, it is already spewing OutOfMemory
errors. Also, on a more abstract level, it's just not right to thrash the
memory of a concurrent server-side component for absolutely no good
reason.

Re: Problems with lazy-xml

2011-02-10 Thread Mike Meyer
On Thu, 10 Feb 2011 07:22:55 -0800 (PST) Marko Topolnik wrote:
> I am required to process a huge XML file with 300,000 records. The
> structure is like this:
>
> [the XML sample was stripped by the archive]
> ...
> ...
> ... 299,998 more
>
> Obviously, it is of key importance not to allocate ...

Problems with lazy-xml

2011-02-10 Thread Marko Topolnik
I am required to process a huge XML file with 300,000 records. The
structure is like this:

  [the XML sample was stripped by the archive]
  ...
  ...
  ... 299,998 more

Obviously, it is of key importance not to allocate memory for all the
records at once. If I do this:

  (use ['clojure.contrib.lazy-xml :only ['pars...
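The kind of one-record-at-a-time traversal the post is after looks roughly
like this (a sketch only: the file name and process-record are
placeholders, not code from the thread):

  (use '[clojure.contrib.lazy-xml :only [parse-trim]])
  (require '[clojure.java.io :as io])

  (defn process-record [record]          ; placeholder for the real work
    (println (:tag record)))

  (with-open [rdr (io/reader "records.xml")]
    (doseq [record (:content (parse-trim rdr))]
      ;; each record should become garbage as soon as we move past it;
      ;; the thread above is about why that was not happening
      (process-record record)))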