We would rather that OpenMPI use the shared-memory (sm) module for
intra-node communication.
Doesn't PSM use shared memory to communicate between peers on the same
node?
Possibly, yes (I'm not sure). Even if it does, it appears to consume a
'hardware context' for each peer, and that is what we want to avoid.
We believe that, given our scheduler's allocation policy (packing)
and our job mix, we might be able to add nodes to this cluster with
only one HCA per node (again, we would rather not use 'shared
contexts').
Are you saying that you want Open MPI to not use PSM when the job
entirely fits within a single node?
Yes. Using sm instead of PSM would conserve hardware contexts (and
thus reduce the number of HCAs we need).
If so, you might want to experiment with the pre-job hook in your job
scheduler. If the hook can tell whether the job fits entirely on a
single node, you could try setting MCA parameters as environment
variables (e.g., setenv OMPI_MCA_pml ob1). Forcing the ob1 PML
excludes the CM PML and, therefore, the PSM MTL.
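A minimal sketch of the pre-job hook idea, assuming a Torque/PBS-style scheduler that lists one hostname per allocated slot in the file named by PBS_NODEFILE. The function names are hypothetical, and whether a real pre-job hook can inject variables into the job's environment depends on the scheduler; here the script just prints sh-style export lines that a job wrapper could eval (csh users would print setenv lines instead).

```python
import os
from typing import Dict, Iterable, List


def allocated_hosts(nodefile_lines: Iterable[str]) -> List[str]:
    """Distinct hostnames, in first-seen order, from a one-slot-per-line nodefile."""
    return list(dict.fromkeys(line.strip() for line in nodefile_lines if line.strip()))


def mca_overrides(nodefile_lines: Iterable[str]) -> Dict[str, str]:
    """MCA environment overrides for the job (hypothetical helper).

    A single-node job gets OMPI_MCA_pml=ob1, so the CM PML (and with it
    the PSM MTL) is not selected. Multi-node or empty allocations get no
    overrides and keep Open MPI's default component selection.
    """
    if len(allocated_hosts(nodefile_lines)) == 1:
        return {"OMPI_MCA_pml": "ob1"}
    return {}


def main() -> None:
    # Assumption: Torque/PBS exposes the allocation via PBS_NODEFILE.
    path = os.environ.get("PBS_NODEFILE")
    if not path:
        return  # no allocation info: leave Open MPI's defaults alone
    with open(path) as f:
        for name, value in mca_overrides(f).items():
            print(f"export {name}={value}")


if __name__ == "__main__":
    main()
```

Inside a job script this might be used as `eval "$(python3 mca_hook.py)"` before calling mpirun (the script name is illustrative).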
Does that help?
That's an interesting idea that I will investigate.
Thank you,
Tom
Tom Harvill
hcc.unl.edu
402.472.5660