Hi again,
I think the problem is solved. Thanks to Gus, I tried
mpirun -mca mpi_paffinity_alone 1
while running the program, and from a quick search it seems this ensures
that every process runs on a specific core
(correct me if I'm wrong).
I've run over 20 tests, and now it works.
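For anyone hitting the same problem, the complete command would look roughly
like this (the process count and program name here are only placeholders):

  % mpirun -np 16 -mca mpi_paffinity_alone 1 ./my_mpi_program

With mpi_paffinity_alone set to 1, Open MPI binds each process to its own
processor, so processes stop migrating between cores or piling onto the
same one.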
Hi Gus,
1 - First of all, turning off hyper-threading is not an option. And it gives
pretty good results if I can find a way to arrange the cores.
2 - Actually Eugene (in one of her messages in this thread) had suggested
arranging the slots.
I did, and wrote down the results; it assigns the cores randomly,
for example:
...
rank 13=os221 slot=2
rank 14=os222 slot=2
rank 15=os224 slot=2
rank 16=os228 slot=4
rank 17=os229 slot=4
I tried that, and here are the results; the same thing happened:
2010-08-12 11:09:28,814 59759 DEBUG [0x7fbd3fdce740] - RANK(0) Printing
Times...
2010-08-12 11:09:28,814 59759 DEBUG [0x7fbd3fd
The way MPI processes are being assigned to hardware threads is perhaps
neither controlled nor optimal. On the HT nodes, two processes may end
up sharing the same core, with poorer performance.
Try submitting your job like this:
% cat myrankfile1
rank 0=os223 slot=0
rank 1=os221 slot=0
rank
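The rankfile is truncated in the archive, but assuming it lists every rank,
it would be passed to mpirun with the -rf (--rankfile) option; the hostnames
and process count below are guesses based on the excerpts above:

  % mpirun -np 18 -rf myrankfile1 ./my_mpi_program

Each "rank N=host slot=S" line pins rank N to slot S on the named host, so
two ranks never share a core unless the file says so.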
Hi Saygin
You could:
1) turn off hyperthreading (in the BIOS), or
2) use the mpirun options (you didn't send your mpirun command)
to distribute the processes across the nodes, cores, etc.
"man mpirun" is a good resource, see the explanations about
the -byslot, -bynode, -loadbalance options.
3) In
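To illustrate option 2, those mapping options might be used like this (again,
the process count and program name are placeholders):

  % mpirun -np 16 -bynode ./my_mpi_program    (round-robin ranks across nodes)
  % mpirun -np 16 -byslot ./my_mpi_program    (fill one node's slots, then the next)

-bynode spreads consecutive ranks over different nodes, while -byslot packs
them onto each node's slots first.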
Saygin,
You can use the mpstat tool to see the load on each core at runtime.
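For example, this prints per-core utilization once a second (mpstat ships
with the sysstat package on most Linux distributions):

  % mpstat -P ALL 1

A core stuck near 100% while its neighbors sit idle is a hint that several
processes are sharing it.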
Do you know exactly which calls are taking a longer time?
You could run just those two computations (one at a time) on a different
machine and check whether the other machines show similar or shorter
computation times.
- P