Hi Elias,

yes - actually a lot. I haven't looked through all the files yet, but I did look
at the 80, 60, and small core counts.

The good news is that all results look plausible now. There are some variations
in the data, of course, but the trend is clear:

The total time for OMP (the rightmost value in the plot, i.e. x == corecount + 10) is consistently
about twice the total time for a hand-crafted fork/sync. The benchmark was made in such a way that
it only shows the fork/join times. Column N ≤ corecount shows the time at which the Nth core
started executing its task.
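
For reference, here is a minimal sketch of how such a benchmark can record the per-core start
times and the total fork/join time around an OMP parallel region. This is hypothetical code,
not the actual Parallel / Parallel_OMP source, and it assumes an x86 cycle counter
(compile with g++ -O2 -fopenmp):

// hypothetical sketch: per-thread start cycles plus total fork + join cycles
#include <omp.h>
#include <x86intrin.h>   // __rdtsc()
#include <cstdio>
#include <cstdlib>
#include <vector>

int main(int argc, char * argv[])
{
   const int cores = (argc > 1) ? atoi(argv[1]) : 4;
   omp_set_num_threads(cores);

   std::vector<unsigned long long> start(cores, 0);
   const unsigned long long t0 = __rdtsc();          // fork point

#pragma omp parallel
   {
      // record when this thread actually begins executing its task
      start[omp_get_thread_num()] = __rdtsc();
   }

   const unsigned long long total = __rdtsc() - t0;   // fork + join

   for (int n = 0; n < cores; ++n)
      printf("core %d started after %llu cycles\n", n, start[n] - t0);
   printf("total fork/join time: %llu cycles\n", total);
   return 0;
}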

I have attached a plot of the 80-core result (4 hand-crafted runs in red and 4 OMP runs in green),
along with the gnuplot script that created the plots.

/// Jürgen


On 08/01/2014 03:16 PM, Elias Mårtenson wrote:

Were you able to deduce anything from the test results?

On 11 May 2014 23:02, "Juergen Sauermann" <juergen.sauerm...@t-online.de> wrote:

    Hi Elias,

    thanks, that is already interesting. If you could loop over the core count:

    for ((i=1; $i<=80; ++i)); do
        ./Parallel $i
        ./Parallel_OMP $i
    done

    then I could understand the data better. I am also not sure whether something
    is wrong with the benchmark program. On my new 4-core machine with OMP I get
    fluctuations from:

    eedjsa@server65 ~/apl-1.3/tools $ ./Parallel_OMP 4
    Pass 0: 4 cores/threads, 8229949 cycles total
    Pass 1: 4 cores/threads, 8262 cycles total
    Pass 2: 4 cores/threads, 4035 cycles total
    Pass 3: 4 cores/threads, 4126 cycles total
    Pass 4: 4 cores/threads, 4179 cycles total

    to:

    eedjsa@server65 ~/apl-1.3/tools $ ./Parallel_OMP 4
    Pass 0: 4 cores/threads, 11368032 cycles total
    Pass 1: 4 cores/threads, 4042228 cycles total
    Pass 2: 4 cores/threads, 7251419 cycles total
    Pass 3: 4 cores/threads, 3846 cycles total
    Pass 4: 4 cores/threads, 2725 cycles total

    The fluctuations with the manual parallel for are smaller:

    Pass 0: 4 cores/threads, 87225 cycles total
    Pass 1: 4 cores/threads, 245046 cycles total
    Pass 2: 4 cores/threads, 84632 cycles total
    Pass 3: 4 cores/threads, 63619 cycles total
    Pass 4: 4 cores/threads, 93437 cycles total

    but still considerable. The picture so far suggests that OMP fluctuates much
    more (in the start-up + sync time) than the manual version, with the highest
    OMP start-up time above the manual one and the lowest far below it. One change
    on my TODO list is to use futexes instead of mutexes (like OMP does); this is
    probably not an issue under Solaris, since futexes are Linux-specific.
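
    For reference, a minimal sketch of the futex idea (Linux-only, purely hypothetical
    code, not taken from GNU APL): the fast path is a plain atomic flag, and the kernel
    is only entered to sleep or to wake sleepers.

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <atomic>
    #include <climits>
    #include <thread>

    static std::atomic<int> go(0);        // 0: workers shall wait, 1: run

    static void worker_wait()
    {
       // sleep in the kernel only while 'go' is still 0
       while (go.load(std::memory_order_acquire) == 0)
          syscall(SYS_futex, &go, FUTEX_WAIT, 0, nullptr, nullptr, 0);
    }

    static void master_wake_all()
    {
       go.store(1, std::memory_order_release);
       syscall(SYS_futex, &go, FUTEX_WAKE, INT_MAX, nullptr, nullptr, 0);
    }

    int main()
    {
       std::thread t(worker_wait);   // worker blocks until released
       master_wake_all();
       t.join();
       return 0;
    }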

    /// Jürgen


    On 05/11/2014 04:23 AM, Elias Mårtenson wrote:
    Here are the files that I promised earlier.

    Regards,
    Elias


#!/bin/bash
# Create the result plot for one core count (passed as the first argument):
# the 4 "manual" passes with one line type (red with the default png terminal)
# and the 4 OMP passes with another (green).

CORES=$1

cat << ENDCAT > plot_cmd
#! /usr/bin/gnuplot

set terminal png
set output "result_$CORES.png"

plot \
"./cores_${CORES}_pass_1.manual" with lines lt 1, \
"./cores_${CORES}_pass_2.manual" with lines lt 1, \
"./cores_${CORES}_pass_3.manual" with lines lt 1, \
"./cores_${CORES}_pass_4.manual" with lines lt 1, \
"./cores_${CORES}_pass_1.omp" with lines lt 2, \
"./cores_${CORES}_pass_2.omp" with lines lt 2, \
"./cores_${CORES}_pass_3.omp" with lines lt 2, \
"./cores_${CORES}_pass_4.omp" with lines lt 2

ENDCAT

chmod 755 plot_cmd
./plot_cmd
