Hi,

I am evaluating the performance of a clustering program written in Java
with MPI+threads and would like some insight into a peculiar case. I've
attached a performance graph to illustrate it.

In essence, the tests were carried out as TxPxN, where T is the number of
threads per process, P is the number of processes per node, and N is the
number of nodes. I noticed an inefficiency with the Tx1xN cases in general
(the tall bars in the graph).
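To make the notation concrete, here is a small sketch that splits a TxPxN label into its parts and derives the process and thread counts. The label 4x2x8 is only an illustrative value that I made up for this example; it is not necessarily one of the measured configurations.

```shell
#!/bin/sh
# Split a TxPxN label into threads per process (T), processes per node (P),
# and number of nodes (N).
# NOTE: the label below is a made-up example, not a configuration from the graph.
label="4x2x8"

# Read the three fields, using "x" as the separator for this read only.
IFS=x read -r T P N <<EOF
$label
EOF

procs=$((P * N))        # total MPI processes across all nodes
threads=$((T * P * N))  # total threads across all nodes
per_node=$((T * P))     # threads running on each node

echo "T=$T P=$P N=$N procs=$procs threads=$threads per_node=$per_node"
```

In this notation a Tx1xN case is simply one where P is 1, so each node runs a single MPI process with T threads.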

To elaborate a bit further:
1. Each node has 2 sockets with 4 cores each (8 cores in total).
2. I used OpenMPI 1.7.5rc5 (I later tested with 1.8 and observed the same).
3. I tried the following options:
     A.) --map-by node:PE=4 and --bind-to core
     B.) --map-by node:PE=8 and --bind-to-core
     C.) --map-by socket and --bind-to none
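As a rough sketch of how these three option sets would sit on an mpirun command line, the script below only prints the launch lines (a dry run) rather than executing them. The process count NP and the application command APP are placeholders assumed for illustration; they are not the values from the actual runs.

```shell
#!/bin/sh
# Dry run: print the three launch lines instead of executing them.
# ASSUMPTION: NP and APP are placeholders, not the values used in the real tests.
NP=8
APP="java -cp app.jar Program"

# Option sets A, B, and C exactly as listed above.
opts_a="--map-by node:PE=4 --bind-to core"
opts_b="--map-by node:PE=8 --bind-to-core"
opts_c="--map-by socket --bind-to none"

for opts in "$opts_a" "$opts_b" "$opts_c"; do
    echo "mpirun -np $NP $opts $APP"
done
```

Only the mapping and binding options differ between the three lines; everything else is held constant.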

The timings of A, B, and C came out as A < B < C, so I used the results
from option A for the Tx1xN cases in the graph.

Could you please suggest anything that may help speed up these Tx1xN
cases? Also, I expected B to perform better than A, since the threads could
then utilize all 8 cores, but that wasn't the case.

Thank you,
Saliya


[image: Inline image 1]

-- 
Saliya Ekanayake esal...@gmail.com
Cell 812-391-4914 Home 812-961-6383
http://saliya.org
