Re: [Bug-apl] Performance optimisations: Results

2014-04-06 Thread Juergen Sauermann
Hi, the current solution seems to be (master == thread-0): *for (int c = 1; c < core_count; ++c) thread-0 waits for thread-c* One could instead do something like this: *for (int dc = 1; dc < core_count; dc += dx) { parallel( thread-n waits for thread-n+d
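
A minimal C++ sketch of the tree-shaped join described above, assuming std::thread workers and per-thread "done" flags; the names and structure are illustrative, not GNU APL's actual code. In each round with distance dist, thread n waits for thread n+dist, so the wait chain shrinks from core_count-1 sequential waits on thread-0 to about log2(core_count) rounds spread over the workers:

    #include <atomic>
    #include <thread>
    #include <vector>

    // busy-wait until another thread has published its "done" flag
    static void spin_until(std::atomic<bool> &flag) {
        while (!flag.load(std::memory_order_acquire))
            std::this_thread::yield();
    }

    int main() {
        const int core_count = 8;                         // assumed thread count
        std::vector<std::atomic<bool>> done(core_count);
        for (auto &d : done) d.store(false);

        std::vector<std::thread> pool;
        for (int n = 0; n < core_count; ++n) {
            pool.emplace_back([&, n] {
                // ... this thread's slice of the parallel work would run here ...

                // tree join: while n is aligned to 2*dist, wait for thread n+dist;
                // as soon as it is not, publish our own done flag and stop
                for (int dist = 1; dist < core_count; dist *= 2) {
                    if (n % (2 * dist) != 0) break;        // someone else waits for us
                    if (n + dist < core_count) spin_until(done[n + dist]);
                }
                done[n].store(true, std::memory_order_release);
            });
        }
        for (auto &t : pool) t.join();   // thread-0's flag implies all others are done
    }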

Re: [Bug-apl] Performance optimisations: Results

2014-04-06 Thread Elias Mårtenson
What part of the join should be parallel? The join itself is essentially the main thread waiting for all other threads to finish. What is it that can be parallelised? Regards, Elias On 6 April 2014 22:32, Juergen Sauermann wrote: > Hi, > > one more plot that might explain a lot. I have plotted

Re: [Bug-apl] Performance optimisations: Results

2014-04-06 Thread Juergen Sauermann
Hi, one more plot that might explain a lot. I have plotted the startup times and the total times vs. the number of cores (1024×1024 array). For small core counts (i.e. < 6...10), the startup time is moderate and the total time decreases rapidly. For more cores, the total time increases again

Re: [Bug-apl] Performance optimisations: Results

2014-04-03 Thread Juergen Sauermann
Hi, what I can see is that all this runs on two cores only. Maybe you want to change: omp_set_num_threads(2); in main.cc to something higher? /// Jürgen On 04/03/2014 07:09 AM, Elias Mårtenson wrote: Here are the data files when run on an 80-core machine (8 × 10-core Xeon processors, 0.
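
For reference, a minimal OpenMP sketch of the suggested change, assuming one simply raises the hard-coded limit to the processor count reported by the runtime; this is illustrative, not the actual surroundings of main.cc:

    #include <cstdio>
    #include <omp.h>

    int main() {
        // instead of omp_set_num_threads(2); let OpenMP use every processor it sees
        omp_set_num_threads(omp_get_num_procs());

        #pragma omp parallel
        {
            #pragma omp single
            printf("running with %d threads\n", omp_get_num_threads());
        }
        return 0;
    }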

Re: [Bug-apl] Performance optimisations: Results

2014-04-02 Thread Juergen Sauermann
Hi Elias, there are a number of other functions that have side effects: many ⎕ functions/vars, all user defined functions, and in particular everybody else who creates a value (unless new is atomic; the Value constructor is currently not). Some of these side effects are internal (e.g. ⎕RL, ⎕EA).
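
A hypothetical illustration of the point about new/Value (not GNU APL's actual class): if construction updates shared bookkeeping, two threads creating values at the same time race on that state unless it is made atomic or locked.

    #include <atomic>

    struct Value {
        // shared side effect of every construction; a plain "static long" here
        // would be a data race under parallel execution
        static std::atomic<long> value_count;

        Value()  { value_count.fetch_add(1, std::memory_order_relaxed); }
        ~Value() { value_count.fetch_sub(1, std::memory_order_relaxed); }
    };
    std::atomic<long> Value::value_count{0};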

Re: [Bug-apl] Performance optimisations: Results

2014-04-02 Thread Elias Mårtenson
Thanks. I'm at home now, but I'll run the tests tomorrow at the office. I would say, though, that being able to run, say, the EACH operator on a user function (or lambda) would provide some tremendous opportunities for parallelisation. Would it be safe to say that as long as the function being ca

Re: [Bug-apl] Performance optimisations: Results

2014-04-02 Thread Juergen Sauermann
Hi, the output is meant to be gnuplotted. You either copy-and-paste the data lines into some file, or else apl > file (in that case you have to type blindly and remove the non-data lines with an editor). The first 256 data lines are the cycle counter of the CPU before the nth iteration at t
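
A hedged sketch of how such per-iteration cycle-counter lines could be produced, assuming the x86 time-stamp counter is sampled before each of the 256 iterations; the output format and helper are assumptions, not GNU APL's actual instrumentation:

    #include <cstdint>
    #include <cstdio>
    #include <x86intrin.h>   // __rdtsc() on GCC/Clang, x86 only

    int main() {
        const int iterations = 256;          // matches the 256 data lines
        for (int n = 0; n < iterations; ++n) {
            uint64_t before = __rdtsc();     // cycle counter before the n-th iteration
            printf("%d %llu\n", n, (unsigned long long)before);
            // ... the measured parallel work for iteration n would run here ...
        }
        return 0;
    }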

Re: [Bug-apl] Performance optimisations: Results

2014-04-02 Thread Elias Mårtenson
By the way, I believe the hang comes from GNU APL trying to format the 1024 by 1024 array for printing. Regards, Elias On 2 April 2014 18:43, Elias Mårtenson wrote: > Thanks, > > Now I have an OpenMP enabled build on Solaris, and I'm ready to test. How > am I supposed to interpret the output f

Re: [Bug-apl] Performance optimisations: Results

2014-04-02 Thread Elias Mårtenson
Thanks, Now I have an OpenMP enabled build on Solaris, and I'm ready to test. How am I supposed to interpret the output from this command? Regards, Elias On 2 April 2014 01:27, Juergen Sauermann wrote: > Hi Elias, > > I have attached the changed files. Note that this is very quick-and-dirty.

Re: [Bug-apl] Performance optimisations: Results

2014-04-01 Thread Juergen Sauermann
To: "bug-apl@gnu.org" Cc: Date: Fri, 14 Mar 2014 22:22:15 +0800 Subject: [Bug-apl] Performance optimisations: Results Hello guys, I've spent some time experimenting with various performance optimisation

Re: [Bug-apl] Performance optimisations: Results

2014-03-15 Thread Juergen Sauermann
Hi, one more thing is proper ./configure (see README-2-configure). In particular VALUE_CHECK_WANTED=no and ASSERT_LEVEL_WANTED=0, otherwise you get a sequential time component proportional to the result size. /// Jürgen On 03/14/2014 07:56 PM, David Lamkins wrote: Hmm, I hadn't thought of

Re: [Bug-apl] Performance optimisations: Results

2014-03-14 Thread Elias Mårtenson
t using OpenMP is that there's no > hand-coding necessary. All you do is add #pragmas to your program; the > compiler takes care of the rewrites. > > > ---------- Forwarded message ---------- > >> From: "Elias Mårtenson" >> To: "bug-apl@gnu.org"

Re: [Bug-apl] Performance optimisations: Results

2014-03-14 Thread Juergen Sauermann
es. -- Forwarded message -- From: "Elias Mårtenson" <loke...@gmail.com> To: "bug-apl@gnu.org" Cc: Date: Fri, 14 Mar 2014 22:22:15 +0800 Subject: [Bug-apl] Performance optimisati

Re: [Bug-apl] Performance optimisations: Results

2014-03-14 Thread David Lamkins
- > From: "Elias Mårtenson" > To: "bug-apl@gnu.org" > Cc: > Date: Fri, 14 Mar 2014 22:22:15 +0800 > Subject: [Bug-apl] Performance optimisations: Results > Hello guys, > > I've spent some time experimenting with various performance optimisations

[Bug-apl] Performance optimisations: Results

2014-03-14 Thread Elias Mårtenson
Hello guys, I've spent some time experimenting with various performance optimisations and I would like to share my latest results with you: I've run a lot of tests using Callgrind, which is part of the Valgrind tool (documentation here