Alan,

Apologies for the delayed reply - I remember Iota well (there was some 
cross-fertilisation between it and foldable-seq a few months back IIRC :-)

Having said that, I don't think Iota will help in my particular situation 
(although I'd be delighted to be proven wrong). The file I'm processing is 
XML, so it has to pass through an XML parser; unless I write an XML parser 
on top of the reducers framework, I'm stuck with dealing with sequences at 
some point along the way.

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: p...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

On 30 Sep 2013, at 11:15, Alan Busby <thebu...@gmail.com> wrote:

> Sorry to jump in, but I thought it worthwhile to add a couple points; (sorry 
> for being brief)
> 
> 1. Reducers work fine with data much larger than memory; you just need to 
> mmap() the data you're working with so Clojure thinks everything is in memory 
> when it isn't. Reducer access is fairly sequential, not random, so spinning 
> disks work great here. 
> 
> 2. A 40GB XML file is very often many many smaller XML documents aggregated 
> together. It's often faster to separate each document onto its own line (via 
> various UNIX tools) and parse each line separately. I typically do something 
> like $ zcat bigxml.gz | tr '\n' ' ' | sed 's/<foo>/\n<foo>/g' | grep '^<foo>' 
> > records.xml . 
> 
> 3. Check out the Iota library, https://github.com/thebusby/iota/ . I often 
> use it for reducing over hundreds of GB of text data. It does what Jozef 
> suggests, and makes a text file a foldable collection.
> 
> 4. While pmap is great for advertising the power of Clojure, it's likely safe 
> to say that it should be ignored if you're actually looking for performance. 
> 
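The record-per-line preprocessing in point 2 can be sketched as a small shell 
function. This is only an illustration under stated assumptions: the name 
split_records and the sample data are hypothetical, <foo> is the placeholder 
record tag from the message, records are assumed not to nest, and GNU sed is 
assumed (for "\n" in the replacement text).

```shell
#!/bin/sh
# Hypothetical helper: split_records TAG
# Reads XML on stdin and writes one <TAG>... record per line on stdout.
# Assumes GNU sed and non-nested records.
split_records() {
    tag="$1"
    tr '\n' ' ' |                     # collapse the whole input onto one line
        sed "s/<$tag>/\n<$tag>/g" |   # break before EVERY opening tag (g flag)
        grep "^<$tag>"                # keep only lines that start with a record
}

# Made-up sample standing in for the output of `zcat bigxml.gz`.
sample='<docs>
<foo><id>1</id>
</foo>
<foo><id>2</id></foo>
</docs>'

records=$(printf '%s\n' "$sample" | split_records foo)
printf '%s\n' "$records"
```

With this sample the function emits two lines, one per <foo> record. Anything 
that follows the final record on the collapsed line (here the closing 
</docs>) stays attached to the last line, so a real pipeline would probably 
want to trim it. Note also that sed buffers a full line at a time, so 
collapsing a very large file onto one line may need a lot of memory.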
> 
> Hope this helps,
>     Alan Busby
> 
> 
> 
> 
> -- 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
