Re: [Bug-apl] 80 core performance results

2014-08-23 Thread Juergen Sauermann
Hi Elias, I believe the gain of coalescing functions (or slicing the values involved) is somewhat limited and occurs only when your APL values are small. For large values, computing one function after the other gives better cache locality. And it has a
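To make the two strategies concrete, here is a minimal C++ sketch for the expression -1+2+X, which APL evaluates right to left as -(1+(2+X)). The function names `eval_unfused` and `eval_sliced` and the `slice` parameter are hypothetical and are not taken from the GNU APL sources. The first function computes one scalar function after the other over the whole value; the second coalesces all three functions over one slice at a time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration of -1+2+X, i.e. -(1+(2+X)), over doubles.

// One function after the other: three full passes, each producing a
// complete intermediate vector, as an interpreter does per primitive.
std::vector<double> eval_unfused(const std::vector<double>& x) {
    std::vector<double> t1(x.size()), t2(x.size()), r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) t1[i] = 2.0 + x[i];
    for (std::size_t i = 0; i < x.size(); ++i) t2[i] = 1.0 + t1[i];
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = -t2[i];
    return r;
}

// Coalesced over slices: all three functions are applied to one slice of
// `slice` elements before moving on, so the slice can stay in cache.
std::vector<double> eval_sliced(const std::vector<double>& x, std::size_t slice) {
    if (slice == 0) slice = 1;  // guard against an endless loop
    std::vector<double> r(x.size());
    for (std::size_t lo = 0; lo < x.size(); lo += slice) {
        const std::size_t hi = std::min(x.size(), lo + slice);
        for (std::size_t i = lo; i < hi; ++i) r[i] = 2.0 + x[i];
        for (std::size_t i = lo; i < hi; ++i) r[i] = 1.0 + r[i];
        for (std::size_t i = lo; i < hi; ++i) r[i] = -r[i];
    }
    return r;
}
```

Both functions perform the same floating-point operations in the same order, so their results are identical. Only the memory traffic differs, and whether slicing pays off depends on the value size, as discussed above.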

Re: [Bug-apl] 80 core performance results

2014-08-22 Thread Elias Mårtenson
Thanks, that's interesting indeed. What about the idea of coalescing multiple functions so that each thread can stream multiple operations in a row without synchronising? It seems to me that it would be hugely beneficial if the expression -1+2+X could stream the three operations (two additions, one neg
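The streaming idea can be sketched with plain `std::thread`, under the assumption that the value is split into one contiguous chunk per thread. Each worker applies the whole chain -(1+(2+X)) to its own chunk, so the only synchronisation is the final join rather than one barrier per scalar function. The name `eval_streamed` is hypothetical; this is not how the interpreter's OMP code is organised.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: each worker thread streams all three functions of
// -1+2+X over its own chunk; threads synchronise only once, at join().
std::vector<double> eval_streamed(const std::vector<double>& x, unsigned nthreads) {
    std::vector<double> r(x.size());
    if (nthreads == 0) nthreads = 1;
    const std::size_t chunk = (x.size() + nthreads - 1) / nthreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t lo = std::min(x.size(), t * chunk);
        const std::size_t hi = std::min(x.size(), lo + chunk);
        workers.emplace_back([&x, &r, lo, hi] {
            // Each thread writes a disjoint range of r, so no locking is needed.
            for (std::size_t i = lo; i < hi; ++i)
                r[i] = -(1.0 + (2.0 + x[i]));  // all three functions, no barrier in between
        });
    }
    for (auto& w : workers) w.join();  // the single synchronisation point
    return r;
}
```

Parallelising each primitive separately would instead need a join (or barrier) after 2+X, after 1+…, and after the negation, i.e. three synchronisation points for this expression.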

Re: [Bug-apl] 80 core performance results

2014-08-22 Thread Juergen Sauermann
Hi Elias, I am working on it. As a preparation, I have created a new command ]PSTAT that shows how many CPU cycles the different scalar functions take. You can run the new workspace ScalarBenchmark_1.apl to see the results (SVN 444).
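The kind of per-function cost that ]PSTAT reports can be approximated outside the interpreter. The sketch below is an assumption, not the ]PSTAT implementation: ]PSTAT counts CPU cycles, while this portable stand-in uses `std::chrono::steady_clock` and therefore reports nanoseconds per element. The helper name `ns_per_element` is hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sketch: average cost per element of one scalar function,
// applied in place `repetitions` times over v. Reports nanoseconds,
// not CPU cycles as ]PSTAT does.
template <typename F>
double ns_per_element(std::vector<double>& v, F scalar_fn, int repetitions) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int rep = 0; rep < repetitions; ++rep)
        for (double& e : v) e = scalar_fn(e);
    const auto stop = clock::now();
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    const double n = static_cast<double>(v.size()) * repetitions;
    return n > 0 ? ns / n : 0.0;  // avoid dividing by zero for empty runs
}
```

For example, `ns_per_element(v, [](double a) { return -a; }, 10)` times monadic negation over `v`. Writing the results back into `v` keeps the compiler from discarding the loop.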

Re: [Bug-apl] 80 core performance results

2014-08-22 Thread Elias Mårtenson
Have the results of this been integrated into the interpreter?
On 1 August 2014 21:57, Juergen Sauermann wrote:
> Hi Elias,
>
> yes - actually a lot. I haven't looked through all files, but
> at 80, 60, and small core counts.
>
> The good news is that all results look plausible now. There are so

Re: [Bug-apl] 80 core performance results

2014-08-01 Thread Juergen Sauermann
Hi Elias, yes - actually a lot. I haven't looked through all files, but at 80, 60, and small core counts. The good news is that all results look plausible now. There are some variations in the data, of course, but the trend is clear: The total time for OMP (the rightmost value in the plot, i.

Re: [Bug-apl] 80 core performance results

2014-08-01 Thread Elias Mårtenson
Were you able to deduce anything from the test results?
On 11 May 2014 23:02, "Juergen Sauermann" wrote:
> Hi Elias,
>
> thanks, already interesting. If you could loop around the core count:
>
>     for ((i=1; $i<=80; ++i)); do
>         ./Parallel $i
>         ./Parallel_OMP $i
>     done
>
> then I could un

Re: [Bug-apl] 80 core performance results

2014-05-13 Thread Juergen Sauermann
Hi, I think I know what went wrong. The workload per thread was so small (reading the CPU cycle counter and that was it) that the first threads had already finished while the tasks were still being distributed. Due to the lack of core binding, some cores would therefore be used several times and c
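Core binding of the kind mentioned above can be sketched on Linux with glibc's `sched_getaffinity` and `pthread_setaffinity_np`. This is an assumption for illustration, not what the Parallel benchmark does: each thread pins itself to one of the CPUs the process is allowed to use (wrapping around if there are more threads than CPUs) and then reads its mask back. The function name `run_pinned` is hypothetical, and the code is not portable beyond Linux/glibc.

```cpp
#include <pthread.h>
#include <sched.h>
#include <thread>
#include <vector>

// Linux/glibc-specific sketch: bind thread t to allowed CPU number t
// (wrapping around), so a thread cannot drift onto a core that finished
// early. Returns, per thread, whether its mask now holds exactly that CPU.
std::vector<bool> run_pinned(unsigned nthreads) {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) == 0)
        for (int c = 0; c < CPU_SETSIZE; ++c)
            if (CPU_ISSET(c, &allowed)) cpus.push_back(c);
    if (cpus.empty()) cpus.push_back(0);  // fallback: assume CPU 0 exists

    std::vector<char> ok(nthreads, 0);  // char, not bool: threads write distinct elements
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const int cpu = cpus[t % cpus.size()];
        workers.emplace_back([&ok, t, cpu] {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            pthread_setaffinity_np(pthread_self(), sizeof(one), &one);
            cpu_set_t check;
            CPU_ZERO(&check);
            pthread_getaffinity_np(pthread_self(), sizeof(check), &check);
            ok[t] = (CPU_COUNT(&check) == 1 && CPU_ISSET(cpu, &check));
            // ... the per-thread benchmark workload would run here ...
        });
    }
    for (auto& w : workers) w.join();
    return std::vector<bool>(ok.begin(), ok.end());
}
```

Binding alone does not fix the other half of the diagnosis: with a workload as small as one cycle-counter read, thread start-up still dominates, so a larger per-thread workload is needed for meaningful timings.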

Re: [Bug-apl] 80 core performance results

2014-05-11 Thread Juergen Sauermann
Hi Elias, thanks, already interesting. If you could loop around the core count:

    for ((i=1; $i<=80; ++i)); do
        ./Parallel $i
        ./Parallel_OMP $i
    done

then I could understand the data better. I am also not sure whether something is wrong with the benchmark program. On my new 4-core with OMP I