On 29 Sep 2013, at 22:58, Paul Mooser <taron...@gmail.com> wrote:

> Paul, is there any easy way to get the (small) dataset you're working with,
> so we can run your actual code against the same data?

The dataset I'm using is a Wikipedia dump, which hardly counts as "small" :-)

Having said that, the first couple of million lines is all you need to reproduce the results I'm getting, which you can download with:

    curl http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2 | bunzip2 | head -n 2000000 > enwiki-short.xml

--
paul.butcher->msgCount++

Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?

http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: p...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher

--
--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en