I ran some aggregate bandwidth tests between two hosts connected by
both QDR InfiniBand and RoCE-enabled 10 Gbps Mellanox cards. The tests
measured the aggregate performance of 16 cores on one host communicating
with 16 on the second host. I saw the same performance as with the QDR
InfiniBand.
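For reference, one way to run that kind of aggregate test is the multi-pair
bandwidth benchmark from the OSU micro-benchmarks; the hostfile contents and
binary path below are assumptions, not the exact setup used here:

    # hostfile with 16 slots per node, so ranks 0-15 land on hostA and 16-31 on hostB
    $ cat hosts
    hostA slots=16
    hostB slots=16

    # 16 communicating pairs, every pair crossing the inter-host link
    $ mpirun -np 32 -hostfile hosts ./osu_mbw_mr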
Ah, indeed - sounds like we are not correctly picking up the cpuset. Can
you pass me the environ from the procs, and the contents of the
$PBS_HOSTFILE? IIRC, Torque isn't going to bind us to those cores, but
instead sets something into the environ or the allocation that we need to
correctly parse.
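Something along these lines, run from inside the job, should collect what
Ralph is asking for (on most Torque installs the node list is exported as
$PBS_NODEFILE; the grep patterns are only a guess at what is relevant):

    # node list Torque handed to the job
    cat "$PBS_NODEFILE"

    # PBS-related variables in the job's own environment
    env | grep -i pbs

    # environment of a running rank (substitute the rank's pid for <pid>)
    tr '\0' '\n' < /proc/<pid>/environ | grep -i -e pbs -e ompi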
Thank you, Ralph, for the advice. I will try 1.8.4 as soon as I can.
The first Torque job asks for nodes=1:ppn=16:whatever
The second job asks for nodes=1:ppn=16:whatever
Both jobs happen to end up on the same 64-core node. Each is running on its
own set of 16 cores (0-15 and 16-31, respectively).
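One way to confirm that each job really is confined to its own 16 cores is to
check the affinity of its processes from inside the job; the program name
below is a placeholder:

    # cores this job's shell is allowed to run on
    grep Cpus_allowed_list /proc/self/status

    # affinity of every running rank (replace the placeholder program name)
    for pid in $(pgrep monte_carlo); do taskset -cp "$pid"; done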
I'm not entirely clear on the sequence of commands here. Is the user
requesting a new allocation from maui/torque for each run? If so, it's
possible we aren't correctly picking up the external binding from Torque;
that would likely be a bug we would need to fix.
Or is the user obtaining a single allocation and then running both jobs inside it?
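A quick way to see what binding Open MPI actually applies in each scenario is
mpirun's --report-bindings option in the 1.8 series, together with a check
that the build has Torque (tm) support; the program name is a placeholder:

    # print the binding chosen for every rank at launch
    mpirun --report-bindings -np 16 ./monte_carlo

    # confirm the tm (Torque) ras/plm components are present in this build
    ompi_info | grep -i " tm "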
This might or might not be related to Open MPI 1.8.1. I have not seen the
problem with the same program under previous versions of Open MPI.
We have 64-core AMD nodes. I recently recompiled a large Monte Carlo
program against version 1.8.1 of Open MPI. Users start this program through
maui/torque as
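A typical maui/torque submission of such a program might look something like
the sketch below; the resource request mirrors the one quoted above, but the
walltime, program name, and mpirun invocation are assumptions rather than the
user's actual setup:

    #!/bin/bash
    #PBS -l nodes=1:ppn=16:whatever
    #PBS -l walltime=24:00:00
    cd "$PBS_O_WORKDIR"
    mpirun -np 16 ./monte_carlo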