[OMPI users] using multiple IB connections between hosts

2015-01-28 Thread Dave Turner
I ran some aggregate bandwidth tests between two hosts connected by both QDR InfiniBand and RoCE-enabled 10 Gbps Mellanox cards. The tests measured the aggregate performance for 16 cores on one host communicating with 16 cores on the second host. I saw the same performance as with the QDR InfiniBand
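
A minimal sketch of how such a test might be launched with an Open MPI 1.8-era openib BTL, for anyone wanting to reproduce it; the adapter names (mlx4_0, mlx4_1) and the benchmark binary are placeholders, so check ibv_devinfo for the names on your own hosts:

    # 16 ranks per host, restricted to one adapter/port
    mpirun -np 32 --map-by ppr:16:node \
           --mca btl openib,sm,self \
           --mca btl_openib_if_include mlx4_0:1 \
           ./aggregate_bw_test

    # list both adapters so the openib BTL can use the IB and RoCE links together
    mpirun -np 32 --map-by ppr:16:node \
           --mca btl openib,sm,self \
           --mca btl_openib_if_include mlx4_0,mlx4_1 \
           ./aggregate_bw_test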

Re: [OMPI users] 1.8.1 query [SEC=UNCLASSIFIED]

2015-01-28 Thread Ralph Castain
Ah, indeed - sounds like we are not correctly picking up the cpuset. Can you pass me the environ from the procs, and the contents of the $PBS_HOSTFILE? IIRC, Torque isn't going to bind us to those cores, but instead sets something into the environ or the allocation that we need to correctly parse.
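
For reference, a sketch of how that information might be gathered from inside the job (Torque normally exports the hostfile path as PBS_NODEFILE; the exact variable names are worth checking on the installation in question):

    # environment as seen by a launched process (one rank is enough)
    mpirun -np 1 env | grep -E 'PBS|OMPI'

    # the allocation Torque handed out
    echo $PBS_NODEFILE
    cat $PBS_NODEFILE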

[OMPI users] 1.8.1 query [SEC=UNCLASSIFIED]

2015-01-28 Thread DOHERTY, Greg
Thank you, Ralph, for the advice. I will move on to try 1.8.4 as soon as I can. The first torque job asks for nodes=1:ppn=16:whatever. The second job asks for nodes=1:ppn=16:whatever. Both jobs happen to finish up on the same 64-core node. Each is running on its own set of 16 cores, 0-15 and 16-31 respectively
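
For context, a minimal sketch of the kind of submission being described, assuming a Torque/Maui setup and an Open MPI 1.8 mpirun; the script and program names are placeholders:

    #!/bin/bash
    #PBS -l nodes=1:ppn=16
    #PBS -N mc-run
    cd $PBS_O_WORKDIR
    # --report-bindings prints the cores each rank is bound to, which shows
    # directly whether the two 16-core jobs end up overlapping on the node
    mpirun -np 16 --report-bindings ./monte_carlo.x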

Re: [OMPI users] 1.8.1 [SEC=UNCLASSIFIED]

2015-01-28 Thread Ralph Castain
I'm not entirely clear on the sequence of commands here. Is the user requesting a new allocation from maui/torque for each run? If so, it's possible we aren't correctly picking up the external binding from Torque. This would likely be a bug we would need to fix. Or is the user obtaining a s
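
A quick way to check whether Torque's external binding is actually in effect when the job starts (a sketch; the cpuset mount point varies between installations):

    # CPU affinity of the job script's shell as Torque launched it
    taskset -cp $$

    # on installs with Torque cpuset support, the per-job CPU list is typically
    # kept under /dev/cpuset/torque/<jobid> (adjust the path for your system)
    cat /dev/cpuset/torque/$PBS_JOBID/cpus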

[OMPI users] 1.8.1 [SEC=UNCLASSIFIED]

2015-01-28 Thread DOHERTY, Greg
This might or might not be related to openmpi 1.8.1. I have not seen the problem with the same program and previous versions of openmpi. We have 64-core AMD nodes. I have recently recompiled a large Monte Carlo program using version 1.8.1 of openmpi. Users start this program using maui/torque as