Hi,

the current solution seems to be (master == thread-0):

*    for (int c = 1; c < core_count; ++c)   thread-0 waits for thread-c*
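As a C++ illustration of that linear join, assuming the workers are plain std::threads
held by the master (join_linear() and its parameter are placeholders, not the actual code):

    #include <thread>
    #include <vector>

    // linear join: thread-0 waits for thread-1, thread-2, ... in turn,
    // so the join costs roughly (core_count - 1) * tsync
    void join_linear(std::vector<std::thread> & workers)   // thread-1 .. thread-(core_count-1)
    {
       for (size_t c = 0; c < workers.size(); ++c)
          workers[c].join();                               // thread-0 waits for thread-(c+1)
    }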

One could instead do something like this:

*    for (int dc = 1; dc < core_count; dc += dc)
        {
           parallel, for every n that is a multiple of 2*dc:
               thread-n waits for thread-(n+dc)   if (n+dc < core_count)
        }*
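A minimal, self-contained C++ sketch of such a log-time join (the done[] flags, wait_for()
and worker() below only illustrate the idea and are not the actual GNU APL code; a real
implementation would use the existing per-thread synchronisation primitive instead of a
spin-wait):

    #include <atomic>
    #include <cstdio>
    #include <functional>
    #include <thread>
    #include <vector>

    // one "finished" flag per thread; stands in for whatever per-thread
    // synchronisation primitive the real code would use
    using Flags = std::vector<std::atomic<bool>>;

    static void wait_for(const Flags & done, int n)   // wait until thread-n has signalled
    {
       while (!done[n].load(std::memory_order_acquire))
          std::this_thread::yield();
    }

    // tree join: thread-n first waits for the sub-trees rooted at
    // thread-(n+dc) for dc = 1, 2, 4, ... (as long as n is a multiple of
    // 2*dc), then signals its own completion. Every thread therefore takes
    // part in at most ceil(log2(core_count)) waiting rounds.
    static void worker(Flags & done, int n, int core_count)
    {
       // ... the parallel work of thread-n would go here ...

       for (int dc = 1; n % (2*dc) == 0 && dc < core_count; dc += dc)
          if (n + dc < core_count)   wait_for(done, n + dc);

       done[n].store(true, std::memory_order_release);
    }

    int main()
    {
       const int core_count = 8;
       Flags done(core_count);
       for (auto & f : done)   f.store(false);      // no thread has finished yet

       std::vector<std::thread> threads;
       for (int n = 1; n < core_count; ++n)
          threads.emplace_back(worker, std::ref(done), n, core_count);

       worker(done, 0, core_count);   // the master is thread-0
       // when the master returns from worker(), all sub-trees have signalled

       for (auto & t : threads)   t.join();         // reclaim the OS threads
       std::printf("all %d threads joined\n", core_count);
       return 0;
    }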

The same trick works for start-up. In our case (80 cores) the time would be reduced from
80*tsync to ⌈log2 80⌉*tsync = 7*tsync, which would give us about 11 times the current
performance for start and join.

/// Jürgen


On 04/06/2014 05:28 PM, Elias Mårtenson wrote:
What part of the join should be parallel? The join itself is essentially the main thread waiting for all other threads to finish. What is it that can be parallelised?

Regards,
Elias


On 6 April 2014 22:32, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:

    Hi,

    one more plot that might explain a lot. I have plotted the startup
    times and the total times
    vs. the number of cores (1024×1024 array).

    For small core counts (i.e. < 6...10), the startup time is
    moderate and the total time decreases rapidly.

    For more cores, the total time increases again. This is most
    likely because the time per core becomes negligible
    and the join time begins to dominate the total time.

    Both start and join times seem to be more-or-less linear in the
    number of cores, which is probably because the master thread is
    doing all of it. It would have been smarter to do the start and
    join in parallel, which would then cost O(log P) instead of O(P)
    for P cores.

    /// Jürgen



