Alan,
Apologies for the delayed reply - I remember Iota well (there was some
cross-fertilisation between it and foldable-seq a few months back IIRC :-)
Having said that, I don't think that Iota will help in my particular situation
(although I'd be delighted to be proven wrong)? Given that the f
Sorry to jump in, but I thought it worthwhile to add a couple of points;
(sorry for being brief)
1. Reducers work fine with data much larger than memory, you just need to
mmap() the data you're working with so Clojure thinks everything is in
memory when it isn't. Reducer access is fairly sequential,
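The mmap() idea can be sketched via java.nio; this is a minimal illustration of mine (not code from the thread), mapping a file read-only so the OS pages data in on demand instead of loading it onto the JVM heap:

```clojure
(import '[java.io RandomAccessFile]
        '[java.nio.channels FileChannel$MapMode])

(defn mmap-file
  "Memory-maps the whole file at path read-only, returning a MappedByteBuffer.
  The mapping stays valid after the channel is closed; the OS faults pages
  in on demand, so mostly-sequential reducer access fits this well."
  [^String path]
  (with-open [raf (RandomAccessFile. path "r")
              ch  (.getChannel raf)]
    (.map ch FileChannel$MapMode/READ_ONLY 0 (.size ch))))
```

Turning the resulting buffer into something reducible is left out here; the point is only that the working set lives outside the heap.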
On the other hand it is 2013, not 2003. 40G is small in terms of modern
hardware. Terabyte-RAM servers have been available for a while, at prices
within the reach of many projects. "Large" data in this decade is measured
in petabytes, at least.
On Sunday, September 29, 2013 5:13:14 PM UTC-7, Pa
Thanks - when I said "small", I was referring to the fact that your tests
were using the first 1 pages, as opposed to the entire data dump. Sorry
if I was unclear or misunderstood.
On Sunday, September 29, 2013 3:20:38 PM UTC-7, Paul Butcher wrote:
>
> The dataset I'm using is a Wikipedia d
On 29 Sep 2013, at 22:58, Paul Mooser wrote:
> Paul, is there any easy way to get the (small) dataset you're working with,
> so we can run your actual code against the same data?
The dataset I'm using is a Wikipedia dump, which hardly counts as "small" :-)
Having said that, the first couple of
Paul, is there any easy way to get the (small) dataset you're working with,
so we can run your actual code against the same data?
On Saturday, May 25, 2013 9:34:15 AM UTC-7, Paul Butcher wrote:
>
>
> The example counts the words contained within a Wikipedia dump. It should
> respond well to para
To be clear, I don't object to the approach, only to naming it "fold"
and/or tying it to interfaces related to folding.
Stu
On Sat, Sep 28, 2013 at 5:29 PM, Paul Butcher wrote:
> On 28 Sep 2013, at 22:00, Alex Miller wrote:
>
> Reducers (and fork/join in general) are best suited for fine-grai
On 28 Sep 2013, at 22:00, Alex Miller wrote:
> Reducers (and fork/join in general) are best suited for fine-grained
> computational parallelism on in-memory data. The problem in question involves
> processing more data than will fit in memory.
>
> So the question is then what is the best way t
Thanks Alex - I've made both of these changes. The shutdown-agents did get rid
of the pause at the end of the pmap solution, and the -server argument made a
very slight across-the-board performance improvement. But neither of them
fundamentally changes the basic result (that the implementation th
Couldn't your last proposed solution be implemented on top of the f/j pool
instead? Is it possible to beat the f/j pool's performance with an ad-hoc
thread pool in situations where there are thousands of tasks?
JW
On Sat, Sep 28, 2013 at 11:00 PM, Alex Miller wrote:
> Reducers (and fork/join in general) are be
I am hoping that this will be fixed for 1.6 but no one is actually
"working" on it afaik. If someone wants to take it on, I would GREATLY
appreciate a patch on this ticket (must be a contributor of course).
On Saturday, September 28, 2013 11:24:18 AM UTC-5, Paul Butcher wrote:
>
> On 28 Sep 2013
Reducers (and fork/join in general) are best suited for fine-grained
computational parallelism on in-memory data. The problem in question
involves processing more data than will fit in memory.
So the question is then what is the best way to parallelize computation
over the stream. There are man
For your timings, I would also strongly recommend altering your project.clj
to force the -server hotspot:
:jvm-opts ^:replace ["-Xmx1g" "-server" ... and whatever else you want
here ... ]
By default lein will use tiered compilation to optimize REPL startup, which
is not what you want for timing.
On 28 Sep 2013, at 19:51, Jozef Wagner wrote:
> Anyway, I think the bottleneck in your code is at
> https://github.com/paulbutcher/parallel-word-count/blob/master/src/wordcount/core.clj#L9
> Instead of creating new persistent map for each word, you should use a
> transient here.
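The suggested change can be sketched like this (a hypothetical count-words of mine, not the actual code in the linked core.clj): build the frequency map with a transient and only make it persistent at the end.

```clojure
(defn count-words
  "Builds a word-frequency map using a transient to avoid allocating a
  new persistent map for every word."
  [words]
  (persistent!
   (reduce (fn [counts w]
             (assoc! counts w (inc (get counts w 0))))
           (transient {})
           words)))

(count-words ["to" "be" "or" "not" "to" "be"])
;; => {"to" 2, "be" 2, "or" 1, "not" 1}
```

Note that clojure.core/frequencies already uses exactly this transient pattern internally.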
I would love
If a Clojure ticket is triaged, it means that one of the Clojure screeners
believes the ticket's description describes a real issue with Clojure that
ought to be changed in some way, and would like Rich Hickey to look at it
and see whether he agrees. If he does, it becomes vetted. A diagram of
the
Or even better, use guava's Multiset there...
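For illustration, the Multiset version might look like this (assuming Guava is on the classpath; count-words is a hypothetical name, not the thread's actual code):

```clojure
(import 'com.google.common.collect.HashMultiset)

(defn count-words
  "Counts words into a mutable Guava HashMultiset, avoiding per-word
  allocation of new map instances entirely."
  [words]
  (let [ms (HashMultiset/create)]
    (doseq [w words]
      (.add ms w))
    ms))

(.count (count-words ["to" "be" "or" "not" "to" "be"]) "to")
;; => 2
```

The trade-off is mutability: this only works safely when each thread counts into its own multiset, merging afterwards.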
On Saturday, September 28, 2013 8:51:56 PM UTC+2, Jozef Wagner wrote:
>
> Well it should be possible to implement a foldseq variant which takes a
> reducible collection as an input. This would speed things up, as you don't
> create as much garbage with
Well it should be possible to implement a foldseq variant which takes a
reducible collection as an input. This would speed things up, as you don't
create as much garbage with reducers. An XML parser which produces a
reducible collection will be a bit harder :).
Anyway, I think the bottleneck in your c
On 28 Sep 2013, at 17:42, Jozef Wagner wrote:
> I mean that you should forget about lazy sequences and sequences in general,
> if you want cutting-edge performance with reducers. An example of a
> reducible slurp, https://gist.github.com/wagjo/6743885 , does not hold onto
> the head.
OK
I mean that you should forget about lazy sequences and sequences in
general, if you want cutting-edge performance with reducers. An example
of a reducible slurp, https://gist.github.com/wagjo/6743885 , does not
hold onto the head.
JW
On Sat, Sep 28, 2013 at 6:24 PM, Paul Butcher wrote:
>
On 28 Sep 2013, at 17:14, Jozef Wagner wrote:
> I would go a bit further and suggest that you do not use sequences at
> all and work only with reducible/foldable collections. Make an input reader
> which returns a foldable collection and you will have the most performant
> solution. The t
Ah - one mystery down. Thanks Andy!
--
paul.butcher->msgCount++
Snetterton, Castle Combe, Cadwell Park...
Who says I have a one track mind?
http://www.paulbutcher.com/
LinkedIn: http://www.linkedin.com/in/paulbutcher
MSN: p...@paulbutcher.com
AIM: paulrabutcher
Skype: paulrabutcher
On 28 Sep 20
I would go a bit further and suggest that you do not use sequences at
all and work only with reducible/foldable collections. Make an input reader
which returns a foldable collection and you will have the most performant
solution. The thing about holding onto the head is being worked on righ
I do not know about the most important parts of your performance difficulties,
but on a more trivial point I might be able to shed some light.
See the ClojureDocs page for pmap, which refers to the page for future, linked
below. If you call (shutdown-agents) the 60-second wait to exit should go away.
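The fix can be sketched as follows; pmap realizes its results on futures, which run on the non-daemon agent thread pool:

```clojure
(defn -main [& args]
  ;; pmap runs inc on future threads drawn from the agent pool.
  (println (reduce + (pmap inc (range 1000))))  ; prints 500500
  ;; Without this call, the pool's non-daemon threads keep the JVM
  ;; alive for up to a minute after -main returns.
  (shutdown-agents))
```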
On 28 Sep 2013, at 01:22, Rich Morin wrote:
>> On Sat, May 25, 2013 at 12:34 PM, Paul Butcher wrote:
>> I'm currently working on a book on concurrent/parallel development for The
>> Pragmatic Programmers. ...
>
> Ordered; PDF just arrived (:-).
Cool - very interested to hear your feedback onc
On 28 Sep 2013, at 00:27, Stuart Halloway wrote:
> I have posted an example that shows partition-then-fold at
> https://github.com/stuarthalloway/exploring-clojure/blob/master/examples/exploring/reducing_apple_pie.clj.
>
> I would be curious to know how this approach performs with your data. W
> On Sat, May 25, 2013 at 12:34 PM, Paul Butcher wrote:
> I'm currently working on a book on concurrent/parallel development for The
> Pragmatic Programmers. ...
Ordered; PDF just arrived (:-).
I don't know yet whether the book has anything like this, but I'd
like to see a table that shows whi
Hi Paul,
I have posted an example that shows partition-then-fold at
https://github.com/stuarthalloway/exploring-clojure/blob/master/examples/exploring/reducing_apple_pie.clj
.
I would be curious to know how this approach performs with your data. With
the generated data I used, the partition+fold
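The partition-then-fold idea can be sketched like this (a minimal version of my own, which assumes chunks are realized into vectors so r/fold can split them; see the linked reducing_apple_pie.clj for the actual example):

```clojure
(require '[clojure.core.reducers :as r])

(defn merge-counts
  "Combining fn for fold: the zero-arity gives the identity value,
  the two-arity merges two partial frequency maps."
  ([] {})
  ([a b] (merge-with + a b)))

(defn count-chunk
  "Folds one realized chunk (a vector) into a word-frequency map."
  [chunk]
  (r/fold merge-counts
          (fn [counts w] (assoc counts w (inc (get counts w 0))))
          chunk))

(defn parallel-word-count
  "Partitions a (possibly lazy) word seq into vectors, folds each chunk
  in parallel, and merges the partial counts."
  [words]
  (->> (partition-all 4096 words)
       (map vec)
       (pmap count-chunk)
       (reduce merge-counts)))
```

Because only one chunk of 4096 words needs to be realized at a time, this avoids holding the head of the full sequence while still getting fork/join parallelism within each chunk.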
I'm currently working on a book on concurrent/parallel development for The
Pragmatic Programmers. One of the subjects I'm covering is parallel programming
in Clojure, but I've hit a roadblock with one of the examples. I'm hoping that
I can get some help to work through it here.
The example coun