Hi,
the current solution seems to be (master == thread-0):
* for (int c = 1; c < core_count; ++c) thread-0 waits for thread-c*
One could instead do something like this:
* for (int dc = 1; dc < core_count; dc *= 2)
  {
      parallel(
          thread-n waits for thread-(n+dc) if (thread-(n+dc) < core_count)
      )
  }*
Same for start-up. In our case the time would be reduced from 80*tsync
to 7*tsync, which would give us about 11 times the current performance.
/// Jürgen
On 04/06/2014 05:28 PM, Elias Mårtenson wrote:
What part of the join should be parallel? The join itself is
essentially the main thread waiting for all other threads to finish.
What is it that can be parallelised?
Regards,
Elias
On 6 April 2014 22:32, Juergen Sauermann
<juergen.sauerm...@t-online.de <mailto:juergen.sauerm...@t-online.de>>
wrote:
Hi,
one more plot that might explain a lot. I have plotted the startup
times and the total times
vs. the number of cores (1024×1024 array).
For small core counts (i.e. < 6...10), the startup time is
moderate and the total time decreases rapidly.
For more cores, the total time increases again. This is most
likely because the time per core becomes negligible
and the join time begins to dominate the total time.
Both start and join times seem to be more-or-less linear in the
number of cores, probably because
the master thread is doing all of that. It would have been smarter to
do the start and join in parallel, which
would then cost O(log P) instead of O(P) for P cores.
/// Jürgen