Sorry to jump in, but I thought it worthwhile to add a couple of points (apologies for being brief):
1. Reducers work fine with data much larger than memory; you just need to mmap() the data you're working with so Clojure thinks everything is in memory when it isn't. Reducer access is fairly sequential rather than random, so even spinning disks work well here.

2. A 40GB XML file is very often many, many smaller XML documents aggregated together. It's often faster to separate each document onto its own line (via various UNIX tools) and parse each line separately. I typically do something like:

  $ zcat bigxml.gz | tr '\n' ' ' | sed 's/<foo>/\n<foo>/g' | grep '^<foo>' > records.xml

3. Check out the Iota library, https://github.com/thebusby/iota/ . I often use it for reducing over hundreds of GBs of text data. It does what Jozef suggests and makes a text file a foldable collection. (There's a sketch of points 1 and 3 together at the end of this message.)

4. While pmap is great for advertising the power of Clojure, it's likely safe to say that it should be ignored if you're actually looking for performance; reducers' fold is usually the better tool (see the comparison sketch below).

Hope this helps,
Alan Busby
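
To make points 1 and 3 concrete, here's a minimal sketch of folding over a memory-mapped file with Iota. It assumes the one-record-per-line records.xml produced in point 2, and the per-record work (here just counting) is a placeholder you'd swap for real parsing:

  (require '[iota :as iota]
           '[clojure.core.reducers :as r])

  ;; Fold over a memory-mapped file without loading it into RAM.
  ;; iota/seq yields the file's lines and can be folded in parallel.
  (def record-count
    (->> (iota/seq "records.xml")    ; mmap'd file, one record per line
         (r/filter identity)         ; skip blank entries
         (r/map (fn [_record] 1))    ; placeholder: real parsing goes here
         (r/fold +)))                ; parallel reduce + combine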
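And a rough illustration of point 4. The function name and workload below are made up for the example, but timing pmap against reducers' fold on your own data usually makes the gap obvious:

  (require '[clojure.core.reducers :as r])

  ;; Hypothetical stand-in for real per-item work.
  (defn expensive [x]
    (reduce + (range (mod x 1000))))

  (def data (vec (range 1000000)))

  ;; pmap: lazy and chunked, with per-item coordination overhead,
  ;; so cores often sit idle unless each item is very expensive.
  (time (reduce + (pmap expensive data)))

  ;; r/fold: divide-and-conquer over the vector on a fork/join pool,
  ;; which usually keeps all cores busy.
  (time (r/fold + (r/map expensive data)))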