Hi,

one more plot that might explain a lot. I have plotted the startup times and the total times
vs. the number of cores (1024×1024 array).

For small core counts (i.e. up to about 6...10 cores), the startup time is moderate and the total time decreases rapidly.

For more cores, the total time increases again. This is most likely because the time per core becomes negligible
and the join time begins to dominate the total time.

Both the start and join times seem to be more or less linear in the number of cores, which is probably because the master thread does all of that sequentially. It would have been smarter to do the start and join in parallel, which would then cost O(log P) instead of O(P) for P cores (see the sketch below).
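
To illustrate the idea, here is a minimal C++11 sketch of the technique (my own toy code, not GNU APL's actual thread handling): each worker forks its two "children" before doing its own share of the work and joins them afterwards, so the critical path for starting and joining P workers grows like log P instead of P.

#include <cstdio>
#include <thread>

static void do_work(int id)
{
   // placeholder for this worker's slice of the array operation
   std::printf("worker %d done\n", id);
}

static void tree_worker(int id, int P)
{
   std::thread left, right;
   const int l = 2 * id + 1;
   const int r = 2 * id + 2;

   if (l < P)   left  = std::thread(tree_worker, l, P);   // fork children first
   if (r < P)   right = std::thread(tree_worker, r, P);

   do_work(id);                                            // then do the own share

   if (left.joinable())    left.join();                    // join children last
   if (right.joinable())   right.join();
}

int main()
{
   const int P = 8;         // number of workers, e.g. one per core
   tree_worker(0, P);       // worker 0 acts as the master; depth is about log2(P)
   return 0;
}

With the current scheme, where the master forks and joins all P workers itself, both phases are inherently O(P), which would explain the roughly linear start and join curves in the plot.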

/// Jürgen


On 04/04/2014 04:52 PM, Elias Mårtenson wrote:
Thanks. I'll look into that a bit later.

I wouldn't expect much strangeness in terms of the environment. The machine was more or less idle at the time (as far as I remember). Hyperthreading is also turned off on the machine in order to provide reliable multicore behaviour (if it were enabled, the operating system would report 160 CPUs).

Also, Solaris tends to be very reliable when it comes to multithreaded behaviour.

I believe you are on to something when you talk about the overhead of dispatching and joining the threads. It's likely that with 80 threads the job itself is simply too short to be a significant portion of the total time taken. This, of course, brings us back to whether coalescing is something that should be done.
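
To put some (entirely made-up) numbers on that trade-off: if starting and joining costs roughly c cycles per thread and the whole job is W cycles of work, the total is roughly T(P) = c*P + W/P, which is smallest near P = sqrt(W/c) and gets worse again beyond that point. A small stand-alone C++ sketch, with c and W assumed purely for illustration:

#include <cmath>
#include <cstdio>

int main()
{
    const double c = 1.0e5;   // assumed start+join overhead per thread, in cycles
    const double W = 2.0e7;   // assumed total work for the whole array, in cycles

    // T(P) = c*P + W/P: the overhead grows with P, the work per thread shrinks
    for (int P = 1; P <= 80; P *= 2)
        std::printf("P = %2d   T(P) = %5.2f million cycles\n",
                    P, (c * P + W / P) / 1.0e6);

    std::printf("optimum near P = %.0f threads\n", std::sqrt(W / c));
    return 0;
}

Where exactly the optimum sits depends entirely on the real c and W, but the qualitative shape (rapid improvement first, then a slow rise as the per-thread overhead takes over) is the same pattern described for the start/total plot above.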

Regards,
Elias


On 4 April 2014 22:47, Juergen Sauermann <juergen.sauerm...@t-online.de> wrote:

    Hi Elias,

    thanks, very interesting figures. Looking at the 1024×1024 numbers, the
    behavior of your machine is rather non-linear. Not in the "scales badly"
    sense, but completely irregular.

    For example, the 6 threads have finished after about 23 million cycles,
    while the 70 threads take about 91 million cycles. At the point in time
    where the 6 threads are finished, none of the 70 threads has even started.

    Often the startup time, and even more often the join time, is longer than
    the active execution time. On my 1-CPU, 2-core box this looks completely
    different.

    There could be several reasons:

    - inter-CPU synchronization takes much longer than inter-core
      synchronization (on a 10-CPU × 8-core machine)?
    - does the cycle counter work reliably across multiple CPUs?
      (the sketch below shows one way to cross-check that)
    - wrong core affinities?
    - CPUs busy with other things?
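
    One way to separate a timer problem from a real scheduling delay would be
    to re-measure the thread start latencies with a monotonic wall clock that
    is not tied to any per-CPU cycle counter. A minimal, self-contained C++11
    sketch of that cross-check (my own code, not GNU APL's instrumentation;
    the thread count is chosen arbitrarily):

    #include <chrono>
    #include <cstdio>
    #include <thread>
    #include <vector>

    int main()
    {
        using clock = std::chrono::steady_clock;
        const int P = 80;                    // number of threads to start

        std::vector<clock::time_point> started(P);
        std::vector<std::thread> workers;

        const clock::time_point t0 = clock::now();
        for (int i = 0; i < P; ++i)          // the master starts all P threads
            workers.emplace_back([&started, i] { started[i] = clock::now(); });

        for (auto & w : workers)             // and joins them again
            w.join();

        for (int i = 0; i < P; ++i)
            std::printf("thread %2d started after %lld us\n", i,
                (long long)std::chrono::duration_cast<std::chrono::microseconds>(
                    started[i] - t0).count());
        return 0;
    }

    If these start latencies look reasonable while the cycle-counter-based
    ones do not, then the counter (or its synchronization across the 10 CPUs)
    is the likely culprit; if they are just as irregular, the delay is real.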

    If you want to visualize that, run gnuplot in the same directory as the
    .txt result files and then give the following commands:

    set xtics 32                              # one tick mark every 32 cores
    set grid
    plot   "./results_6.txt"  with lines linetype 1
    replot "./results_70.txt" with lines linetype 2
    pause -1                                  # keep the plot window open

    /// Jürgen



    On 04/04/2014 05:49 AM, Elias Mårtenson wrote:
    Here are the results. I modified main.cc so that it accepts the
    thread count as an argument, and ran the test 80 times. I then
    increased the array size from 1024×1024 to 6000×1024 and re-ran
    the test. The results are attached.

    If I make the array any larger than that, the interpreter crashes.

    Regards,
    Elias



<<attachment: start-total.png>>
