The way MPI processes are being assigned to hardware threads is probably
neither controlled nor optimal. On the hyperthreaded (HT) nodes, two
processes may end up sharing the same physical core, with correspondingly
poorer performance.
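One quick way to check whether that is happening is to look at which
logical CPU each a.out process is sitting on while the job runs. This is
only a sketch, and it assumes your nodes have the Linux procps ps (the
psr column shows the logical CPU a process last ran on):
% ps -eo pid,psr,comm | grep a.out
If two PIDs on os228 or os229 show psr values that belong to the same
physical core, you have found the problem.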
Try submitting your job like this:
% cat myrankfile1
rank 0=os223 slot=0
rank 1=os221 slot=0
rank 2=os222 slot=0
rank 3=os224 slot=0
rank 4=os228 slot=0
rank 5=os229 slot=0
rank 6=os223 slot=1
rank 7=os221 slot=1
rank 8=os222 slot=1
rank 9=os224 slot=1
rank 10=os228 slot=1
rank 11=os229 slot=1
rank 12=os223 slot=2
rank 13=os221 slot=2
rank 14=os222 slot=2
rank 15=os224 slot=2
rank 16=os228 slot=2
rank 17=os229 slot=2
% mpirun -host os221,os222,os223,os224,os228,os229 -np 18 --rankfile myrankfile1 ./a.out
You can also try:
% cat myrankfile2
rank 0=os223 slot=0
rank 1=os221 slot=0
rank 2=os222 slot=0
rank 3=os224 slot=0
rank 4=os228 slot=0
rank 5=os229 slot=0
rank 6=os223 slot=1
rank 7=os221 slot=1
rank 8=os222 slot=1
rank 9=os224 slot=1
rank 10=os228 slot=2
rank 11=os229 slot=2
rank 12=os223 slot=2
rank 13=os221 slot=2
rank 14=os222 slot=2
rank 15=os224 slot=2
rank 16=os228 slot=4
rank 17=os229 slot=4
% mpirun -host os221,os222,os223,os224,os228,os229 -np 18 --rankfile myrankfile2 ./a.out
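With either rankfile, you can verify that the placement you asked for is
the placement you got by having mpirun print the bindings it applies.
This assumes your Open MPI version supports the --report-bindings option:
% mpirun -host os221,os222,os223,os224,os228,os229 -np 18 --rankfile myrankfile1 --report-bindings ./a.out
Each rank then reports the processor set it was bound to, so you can see
directly whether two ranks on os228 or os229 landed on the same core.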
Which file reproduces your problem and which one avoids it depends on
how the BIOS numbers your hardware threads. Once you have confirmed that
you understand the problem, you (with the help of this list) can devise
a solution approach for your situation.
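To see how the BIOS numbers the hardware threads, you can pair each
logical processor number with its physical core id on os228 or os229.
A minimal sketch, using only what /proc/cpuinfo already reports:
% awk '/^processor/ {p=$3} /^core id/ {print "logical CPU", p, "-> core", $4}' /proc/cpuinfo
If logical CPUs 0 and 1 map to the same core id, the HTs are numbered
adjacently and the slot 0/2/4 spacing in myrankfile2 is what reaches
distinct cores; if the core ids repeat only after logical CPUs 0-3, then
slots 0, 1, and 2 already land on distinct cores.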
Saygin Arkan wrote:
Hello,
I'm running MPI jobs on a non-homogeneous cluster. Four of my machines
(os221, os222, os223, os224) have the following properties:
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
stepping : 7
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
bogomips : 4999.40
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
The two problematic, hyperthreaded machines, os228 and os229, are as
follows:
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
stepping : 5
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni
monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
ida
bogomips : 5396.88
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
The problem is that those 2 machines appear to have 8 cores, although
the actual number of physical cores is 4.
When I submit an MPI job, I measure the comparison times across the
cluster, and I get strange results.
I'm running the job on 6 nodes, 3 cores per node, and sometimes (in
roughly 1/3 of the tests) os228 or os229 returns strange results: 2 of
its cores are slow (slower than those on the first 4 nodes) but the 3rd
core is extremely fast.
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - RANK(0) Printing Times...
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(1) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(2) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(3) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(4) :37 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(5) :34 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(6) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(7) :39 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(8) :37 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(9) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(10) :48 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(11) :35 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(12) :38 sec
2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(13) :37 sec
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os222 RANK(14) :37 sec
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os224 RANK(15) :38 sec
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os228 RANK(16) :43 sec
2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os229 RANK(17) :35 sec
TOTAL CORRELATION TIME: 48 sec
Or another test:
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - RANK(0) Printing Times...
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(1) :170 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os222 RANK(2) :161 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os224 RANK(3) :158 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os228 RANK(4) :142 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os229 RANK(5) :256 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os223 RANK(6) :156 sec
2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(7) :162 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(8) :159 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(9) :168 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(10) :141 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(11) :136 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os223 RANK(12) :173 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os221 RANK(13) :164 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(14) :171 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(15) :156 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(16) :136 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(17) :250 sec
2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - TOTAL CORRELATION TIME: 256 sec
Do you have any idea why this is happening?
I assume that it assigns 2 jobs to 2 "cores" on os229 that are actually
the same physical core.
If so, how can I fix it? The longest rank determines the overall
runtime, and a 100-second delay is too much for a 250-second comparison
that might otherwise have finished in around 160 seconds.