Hi Gus,

1 - First of all, turning off hyper-threading is not an option. And it
gives pretty good results, if I can find a way to arrange the cores.

2 - Actually, Eugene (in one of the messages in this thread) had already
suggested arranging the slots. I did, and wrote up the results: the cores
are still assigned randomly, nothing changed. I haven't tried the
-loadbalance option yet, but -byslot and -bynode are not going to help.
(See the sketch after these points for roughly what I ran.)

3 - Could you give me a bit more detail on how processor affinity works,
and what it actually does?
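For reference, here is roughly what I ran, plus the affinity flag from
your suggestion 3. The hostfile contents are only illustrative of my
setup, and "correlate_job" is a placeholder for my actual binary:

  # hostfile: one line per node; "slots" caps the ranks placed there
  # (hostnames and slot counts follow my cluster; adjust as needed)
  os221 slots=3
  os222 slots=3
  os223 slots=3
  os224 slots=3
  os228 slots=3
  os229 slots=3

  # fill each node's slots before moving on (the default, -byslot)
  mpirun -np 18 --hostfile hostfile -byslot ./correlate_job

  # round-robin the ranks across nodes instead
  mpirun -np 18 --hostfile hostfile -bynode ./correlate_job

  # the option I still need to test: spread the load uniformly
  mpirun -np 18 --hostfile hostfile -loadbalance ./correlate_job

  # your suggestion 3: pin each rank to a core, so two ranks cannot
  # end up time-sliced on the same physical core
  mpirun -np 18 --hostfile hostfile -mca mpi_paffinity_alone 1 ./correlate_job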
Thanks a lot for your suggestions,

Saygin

On Wed, Aug 11, 2010 at 6:18 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Hi Saygin
>
> You could:
>
> 1) turn off hyperthreading (in the BIOS), or
>
> 2) use the mpirun options (you didn't send your mpirun command)
> to distribute the processes across the nodes, cores, etc.
> "man mpirun" is a good resource; see the explanations of
> the -byslot, -bynode, and -loadbalance options.
>
> 3) In addition, you can use the MCA parameters to set processor affinity
> on the mpirun command line: "mpirun -mca mpi_paffinity_alone 1 ..."
> I don't know how this will play on a hyperthreaded machine,
> but it works fine on our dual-processor quad-core computers
> (not hyperthreaded).
>
> Depending on your code, hyperthreading may not help performance anyway.
>
> I hope this helps,
> Gus Correa
>
> Saygin Arkan wrote:
>
>> Hello,
>>
>> I'm running MPI jobs on a non-homogeneous cluster. Four of my machines
>> (os221, os222, os223, os224) have the following properties:
>>
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 23
>> model name : Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
>> stepping : 7
>> cache size : 3072 KB
>> physical id : 0
>> siblings : 4
>> core id : 3
>> cpu cores : 4
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 10
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
>> constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx smx est
>> tm2 ssse3 cx16 xtpr sse4_1 lahf_lm
>> bogomips : 4999.40
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 36 bits physical, 48 bits virtual
>>
>> The two problematic, hyper-threaded machines (os228 and os229) are as
>> follows:
>>
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 26
>> model name : Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
>> stepping : 5
>> cache size : 8192 KB
>> physical id : 0
>> siblings : 8
>> core id : 3
>> cpu cores : 4
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 11
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> rdtscp lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx
>> est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm ida
>> bogomips : 5396.88
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 36 bits physical, 48 bits virtual
>>
>> The problem is that those 2 machines appear to have 8 cores (virtual
>> ones; the actual core count is 4).
>> When I submit an MPI job, I measure the per-rank comparison times
>> across the cluster, and I get strange results.
>>
>> I'm running the job on 6 nodes, 3 cores per node. Sometimes (in about
>> 1/3 of the tests) os228 or os229 returns strange results: 2 cores are
>> slow (slower than on the first 4 nodes) but the 3rd core is extremely
>> fast.
>>
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - RANK(0) Printing Times...
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(1)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(2)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(3)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(4)  :37 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(5)  :34 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(6)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(7)  :39 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os222 RANK(8)  :37 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os224 RANK(9)  :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os228 RANK(10) :*48 sec*
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os229 RANK(11) :35 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os223 RANK(12) :38 sec
>> 2010-08-05 14:30:58,926 50672 DEBUG [0x7fcadf98c740] - os221 RANK(13) :37 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os222 RANK(14) :37 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os224 RANK(15) :38 sec
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os228 RANK(16) :*43 sec*
>> 2010-08-05 14:30:58,926 50673 DEBUG [0x7fcadf98c740] - os229 RANK(17) :35 sec
>> TOTAL CORRELATION TIME: 48 sec
>>
>> or another test:
>>
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - RANK(0) Printing Times...
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(1)  :170 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os222 RANK(2)  :161 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os224 RANK(3)  :158 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os228 RANK(4)  :142 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os229 RANK(5)  :*256 sec*
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os223 RANK(6)  :156 sec
>> 2010-08-09 15:28:10,947 272904 DEBUG [0x7f27dec27740] - os221 RANK(7)  :162 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(8)  :159 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(9)  :168 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(10) :141 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(11) :136 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os223 RANK(12) :173 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os221 RANK(13) :164 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os222 RANK(14) :171 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os224 RANK(15) :156 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os228 RANK(16) :136 sec
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - os229 RANK(17) :*250 sec*
>> 2010-08-09 15:28:10,947 272905 DEBUG [0x7f27dec27740] - TOTAL CORRELATION TIME: 256 sec
>>
>> Do you have any idea why this is happening?
>> I assume it hands 2 jobs to 2 "cores" on os229, but those 2 are
>> actually the same physical core.
>> If so, how can I fix it? The longest rank dominates the overall
>> timing: a 100 sec delay is too much for a 250 sec comparison, which
>> might otherwise have finished in around 160 sec.
>>
>> --
>> Saygin
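By the way, the hyper-threading mismatch on os228/os229 is visible
directly in /proc/cpuinfo, without MPI involved; this is a standard
Linux check, nothing Open MPI specific:

  # logical CPUs the kernel sees: 8 on os228/os229, 4 on the Q9300 boxes
  grep -c '^processor' /proc/cpuinfo

  # compare siblings (logical) with cpu cores (physical):
  # siblings > cpu cores means hyper-threading is enabled
  grep -E '^(siblings|cpu cores)' /proc/cpuinfo | sort -u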
--
Saygin
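P.S. One more idea I may test, if anyone can confirm it behaves on
hyper-threaded nodes: Open MPI's rankfile mapping, which pins each rank
to an explicit host and slot. This is only a sketch; I have not verified
how slot numbers map to HT siblings on the i7 boxes, and the binary name
is again a placeholder:

  # rankfile: explicit rank -> host/slot mapping
  # (sketch for 6 ranks; the real job would list all 18)
  rank 0=os221 slot=0
  rank 1=os222 slot=0
  rank 2=os223 slot=0
  rank 3=os224 slot=0
  rank 4=os228 slot=0
  rank 5=os229 slot=0

  mpirun -np 6 -rf rankfile ./correlate_job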