Hi Elias,
could you please send the output of ./configure (and maybe config.log)?
It seems that something is wrong with the thread synchronization.
/// Jürgen
On 10/24/2014 07:16 AM, Elias Mårtenson wrote:
OK, I started some tests on my 80-core machine. At first I ran exactly the same thing you ran above.
As you can see, before I set the dyadic threshold, I got the expected results. After setting it, the same command hangs with 200% CPU usage. As I write this mail, it has been sitting like that for about 30 minutes.
Here's the log of what I did. GNU APL was compiled with CORE_COUNT_WANTED=-3.
∇Z ← NCPU time LEN;T;X;tmp
[1] ⎕SYL[26;2] ← NCPU
[2] X ← LEN⍴2J2
[3] T ← ⎕TS
[4] tmp ← X⋆X
[5] Z←1 1 1 24 60 60 1000⊥⎕TS - T
[6] ∇
(⍳8) ∘.time 10⋆⍳7
0 0 1 7 40 414 4180
0 0 0 3 38 409 4178
0 0 0 4 39 412 4212
0 0 1 4 38 416 4204
0 0 0 5 39 417 4225
0 0 0 4 39 417 4232
0 0 0 4 38 417 4245
0 0 0 4 38 417 4241
)COPY 5 FILE_IO
loading )DUMP file /home/emartenson/src/apl/wslib5/FILE_IO.apl...
1 FIO∆set_dyadic_threshold '⋆'
888
(⍳8) ∘.time 10⋆⍳7
(Hangs here)
Regards,
Elias
On 26 September 2014 20:04, Juergen Sauermann wrote:
Hi Elias,
if you are using a recent SVN version, then you need to set the thresholds (vector sizes) above which parallel execution is performed:
(⍳4) ∘.time 10⋆⍳7
0 0 1 3 29 254 2593
0 0 1 2 25 252 2618
0 0 1 2 26 258 2682
0 0 1 2 26 263 2866
)COPY 5 FILE_IO
loading )DUMP file /usr/local/lib/apl/wslib5/FILE_IO.apl...
1 FIO∆set_dyadic_threshold '⋆' ⍝ returns the previous threshold for dyadic ⋆
8070450532247928832
(⍳4) ∘.time 10⋆⍳7
0 0 0 2 30 250 2590
0 0 0 1 15 149 1580
0 0 0 1 11 113 1225
0 3 0 0 12 103 1120
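For reference, the speedups implied by the last column above (LEN = 10⋆7) can be computed directly. This small sketch assumes, as in the ∘.time call, that row k of the result corresponds to NCPU = k; the timings are copied from the run above.

```python
# Times in ms for LEN = 10*7, taken from the run above after setting the threshold.
# Row k of (⍳4) ∘.time 10⋆⍳7 corresponds to NCPU = k.
times_ms = {1: 2590, 2: 1580, 3: 1225, 4: 1120}


def speedups(times):
    """Return {cores: speedup relative to 1 core}, rounded to 2 decimals."""
    base = times[1]
    return {n: round(base / t, 2) for n, t in times.items()}


print(speedups(times_ms))  # → {1: 1.0, 2: 1.64, 3: 2.11, 4: 2.31}
```

So 4 cores give roughly a 2.3x speedup for this size, well short of linear.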
I am currently working on a benchmark workspace that determines the optimal thresholds for the different scalar functions (and those thresholds will become the future defaults). Right now the default thresholds are so high that you will always get sequential execution.
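The threshold rule described above (parallel only when the vector length exceeds the threshold) can be modelled roughly as follows. This is a hypothetical sketch, not GNU APL's actual C++ implementation: the function name `dyadic_scalar`, the even chunking, and the use of a thread pool are all assumptions. In CPython the GIL prevents real speedup for this CPU-bound work, so the sketch only illustrates the dispatch decision and the splitting of work.

```python
from concurrent.futures import ThreadPoolExecutor


def dyadic_scalar(f, xs, ys, threshold, n_workers):
    """Hypothetical model: apply f element-wise, in parallel only above threshold."""
    n = len(xs)
    # At or below the threshold (or with one worker), run sequentially.
    # With a huge default threshold this branch is always taken.
    if n <= threshold or n_workers <= 1:
        return [f(x, y) for x, y in zip(xs, ys)]

    # Otherwise split the index range into roughly equal chunks, one per task.
    chunk = (n + n_workers - 1) // n_workers

    def work(start):
        stop = start + chunk
        return [f(x, y) for x, y in zip(xs[start:stop], ys[start:stop])]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(work, range(0, n, chunk))  # map preserves chunk order
        return [z for part in parts for z in part]
```

For example, `dyadic_scalar(pow, xs, xs, threshold=1, n_workers=4)` with `xs = [complex(2, 2)] * 10` takes the parallel path and returns the same values as the sequential path.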
/// Jürgen
On 09/26/2014 07:22 AM, Elias Mårtenson wrote:
I've tested this code, and I don't see much of an improvement as I increase the core count.
Given the following function:
∇Z ← NCPU time LEN;T;X;tmp
⎕SYL[26;2] ← NCPU
X ← LEN⍴2J2
T ← ⎕TS
tmp ← X⋆X
Z←1 1 1 24 60 60 1000⊥⎕TS - T
∇
I'm running this command on my 8-core workstation:
(⍳8) ∘.time 10⋆⍳7
0 0 0 2 19 188 2139
0 0 1 2 19 189 2147
0 0 1 2 19 210 2256