01-Feb-2013 19:42, Sparsh Mittal wrote:
It got posted before I completed it! Sorry.

I am parallelizing a program which follows this structure:

    immutable int numberOfThreads = 2

    for iter = 1 to MAX_ITERATION
    {
        myLocalBarrier = new Barrier(numberOfThreads + 1);
        for i = 1 to numberOfThreads
        {
            spawn(&myFunc, args)
        }
        myLocalBarrier.wait();
    }

    void myFunc(args)
    {
        // do the task
        myLocalBarrier.wait()
    }

When I run it and compare this parallel version with its serial version, I only get a speedup of just under 1.3 for 2 threads. When I write the same program in Go, the scaling is nearly 2. Also, in D, running "top" shows only about 130% CPU usage, not the 180% to 200% I would expect. So I was wondering whether I am doing it properly. Please help me.
I can't tell much without the whole source, or at least a compilable standalone piece. The '//do the task' part is critical to understanding the problem, as is the declaration of myLocalBarrier.
Also, why not use std.parallelism?

-- Dmitry Olshansky
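For reference, here is a rough sketch of how the structure from the question might look with std.parallelism. It is not the original program: doTask, maxIterations and numberOfTasks are hypothetical stand-ins for the '//do the task' part, MAX_ITERATION and numberOfThreads, and the loop body is dummy work. A parallel foreach returns only once every work unit has finished, so it takes the place of the per-iteration Barrier and the spawn calls.

```d
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

enum maxIterations = 4; // hypothetical stand-in for MAX_ITERATION
enum numberOfTasks = 2; // mirrors numberOfThreads in the question

// Hypothetical placeholder for the "//do the task" part: pure CPU work,
// touching no data shared with the other task.
double doTask(int id)
{
    double acc = 0;
    foreach (k; 0 .. 1_000_000)
        acc += (id + 1) * 1e-6;
    return acc;
}

void main()
{
    auto results = new double[numberOfTasks];

    foreach (iter; 0 .. maxIterations)
    {
        // Work unit size 1 hands each index to the pool as its own unit.
        // The loop does not return until all units are done, so no
        // explicit barrier is needed between iterations.
        foreach (i; parallel(iota(numberOfTasks), 1))
            results[i] = doTask(i);
    }

    writeln(results);
}
```

This reuses the default task pool across iterations instead of spawning fresh threads each time. Whether that closes the gap with the Go version still depends on what the real task does, so the full source would be needed to say more.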
