Hi Gus,

1 - First of all, turning off hyper-threading is not an option. And it
gives pretty good results, if I can find a way to arrange the cores.

2 - Actually, Eugene (in one of the messages in this thread) had already
suggested arranging the slots. I did, and wrote up the results: the cores
are still assigned randomly, nothing changed. I haven't tried the
-loadbalance option yet, but -byslot and -bynode are not going to help.
(See the sketch after these points for roughly what I ran.)

3 - Could you give me a bit more detail on how processor affinity works,
and what it actually does?
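For reference, here is roughly what I ran, plus the affinity flag from
your suggestion 3. The hostfile contents are only illustrative of my
setup, and "correlate_job" is a placeholder for my actual binary:

  # hostfile: one line per node; "slots" caps the ranks placed there
  # (hostnames and slot counts follow my cluster; adjust as needed)
  os221 slots=3
  os222 slots=3
  os223 slots=3
  os224 slots=3
  os228 slots=3
  os229 slots=3

  # fill each node's slots before moving on (the default, -byslot)
  mpirun -np 18 --hostfile hostfile -byslot ./correlate_job

  # round-robin the ranks across nodes instead
  mpirun -np 18 --hostfile hostfile -bynode ./correlate_job

  # the option I still need to test: spread the load uniformly
  mpirun -np 18 --hostfile hostfile -loadbalance ./correlate_job

  # your suggestion 3: pin each rank to a core, so two ranks cannot
  # end up time-sliced on the same physical core
  mpirun -np 18 --hostfile hostfile -mca mpi_paffinity_alone 1 ./correlate_job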
Thanks a lot for your suggestions,

Saygin

On Wed, Aug 11, 2010 at 6:18 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Saygin
>
> You could:
>
> 1) turn off hyperthreading (in the BIOS), or
>
> 2) use the mpirun options (you didn't send your mpirun command)
> to distribute the processes across the nodes, cores, etc.
> "man mpirun" is a good resource; see the explanations of
> the -byslot, -bynode, and -loadbalance options.
>
> 3) In addition, you can use the MCA parameters to set processor affinity
> on the mpirun command line: "mpirun -mca mpi_paffinity_alone 1 ..."
> I don't know how this will play on a hyperthreaded machine,
> but it works fine on our dual-processor quad-core computers
> (not hyperthreaded).
>
> Depending on your code, hyperthreading may not help performance anyway.
>
> I hope this helps,
> Gus Correa
>
> Saygin Arkan wrote:
>
>> Hello,
>>
>> I'm running MPI jobs on a non-homogeneous cluster. Four of my machines
>> (os221, os222, os223, os224) have the following properties:
>>
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 23
>> model name : Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
>> stepping : 7
>> cache size : 3072 KB
>> physical id : 0
>> siblings : 4
>> core id : 3
>> cpu cores : 4
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 10
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
>> constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est
>> tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
>> bogomips : 4999.40
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 36 bits physical, 48 bits virtual
>>
>> The two problematic, hyper-threaded machines (os228 and os229) are as
>> follows:
>>
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 26
>> model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
>> stepping : 5
>> cache size : 8192 KB
>> physical id : 0
>> siblings : 8
>> core id : 3
>> cpu cores : 4
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 11
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx
>> est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm ida
>> bogomips : 5396.88
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 36 bits physical, 48 bits virtual
>>
>> The problem is that those 2 machines appear to have 8 cores (virtual
>> ones; the actual core count is 4).
>> When I submit an MPI job, I measure the per-rank comparison times
>> across the cluster, and I get strange results.
>>
>> I'm running the job on 6 nodes, 3 cores per node. Sometimes (in about
>> 1/3 of the tests) os228 or os229 returns strange results: 2 cores are
>> slow (slower than on the first 4 nodes) but the 3rd core is extremely
>> fast.
>>
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - RANK(0) Printing Times...
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(1)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(2)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(3)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(4)  :37 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(5)  :34 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(6)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(7)  :39 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(8)  :37 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(9)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(10) :*48 sec*
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(11) :35 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(12) :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(13) :37 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os222 RANK(14) :37 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os224 RANK(15) :38 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os228 RANK(16) :*43 sec*
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os229 RANK(17) :35 sec
>> TOTAL CORRELATION TIME: 48 sec
>>
>> or another test:
>>
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - RANK(0) Printing Times...
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(1)  :170 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os222 RANK(2)  :161 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os224 RANK(3)  :158 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os228 RANK(4)  :142 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os229 RANK(5)  :*256 sec*
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os223 RANK(6)  :156 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(7)  :162 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(8)  :159 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(9)  :168 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(10) :141 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(11) :136 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os223 RANK(12) :173 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os221 RANK(13) :164 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(14) :171 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(15) :156 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(16) :136 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(17) :*250 sec*
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - TOTAL CORRELATION TIME: 256 sec
>>
>> Do you have any idea why this is happening?
>> I assume it hands 2 jobs to 2 "cores" on os229, but those 2 are
>> actually the same physical core.
>> If so, how can I fix it? The longest rank dominates the overall
>> timing: a 100 sec delay is too much for a 250 sec comparison, which
>> might otherwise have finished in around 160 sec.
>>
>> --
>> Saygin
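By the way, the hyper-threading mismatch on os228/os229 is visible
directly in /proc/cpuinfo, without MPI involved; this is a standard
Linux check, nothing Open MPI specific:

  # logical CPUs the kernel sees: 8 on os228/os229, 4 on the Q9300 boxes
  grep -c '^processor' /proc/cpuinfo

  # compare siblings (logical) with cpu cores (physical):
  # siblings > cpu cores means hyper-threading is enabled
  grep -E '^(siblings|cpu cores)' /proc/cpuinfo | sort -u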
--
Saygin
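P.S. One more idea I may test, if anyone can confirm it behaves on
hyper-threaded nodes: Open MPI's rankfile mapping, which pins each rank
to an explicit host and slot. This is only a sketch; I have not verified
how slot numbers map to HT siblings on the i7 boxes, and the binary name
is again a placeholder:

  # rankfile: explicit rank -> host/slot mapping
  # (sketch for 6 ranks; the real job would list all 18)
  rank 0=os221 slot=0
  rank 1=os222 slot=0
  rank 2=os223 slot=0
  rank 3=os224 slot=0
  rank 4=os228 slot=0
  rank 5=os229 slot=0

  mpirun -np 6 -rf rankfile ./correlate_job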