Hi All,
I am seeing some funky behavior and am hoping someone has ideas on
where to start looking. I have installed Open MPI 4.1.4 via Spack on this
cluster, built Slurm-aware. For context, I then built ORCA against it, also
via Spack. ORCA calls MPI under the hood with a simple `mpirun -np X ...`.
However, on some nodes I get `While computing bindings, we found no
available cpus on the following node:` whenever I use more than `-np 2`.
If I add both `--oversubscribe` and `--host [hostname]`, the job runs
successfully.
The other odd part is that this does not happen on all of my compute
nodes, even though every compute node is installed identically with
Rocky 8.
Here are examples:
```
[user@node2428 sbatch_scripts]$ mpirun --display-allocation -np 4 hostname
====================== ALLOCATED NODES ======================
node2428: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
--------------------------------------------------------------------------
While computing bindings, we found no available cpus on
the following node:
Node: node2428
Please check your allocation.
--------------------------------------------------------------------------
[user@node2428 sbatch_scripts]$ mpirun --display-allocation --oversubscribe -np 4 hostname
====================== ALLOCATED NODES ======================
node2428: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
--------------------------------------------------------------------------
While computing bindings, we found no available cpus on
the following node:
Node: node2428
Please check your allocation.
--------------------------------------------------------------------------
[user@node2428 sbatch_scripts]$ mpirun --display-allocation --oversubscribe --host node2428 -np 4 hostname
====================== ALLOCATED NODES ======================
node2428: flags=0x11 slots=4 max_slots=0 slots_inuse=0 state=UP
=================================================================
node2428
node2428
node2428
node2428
```
Thanks in advance!
--
Morgan Ludwig
Techsquare Inc.
http://www.techsquare.com/
mlud...@techsquare.com