Sounds to me like your Maui scheduler didn’t provide any allocated slots on the 
nodes - did you check $PBS_NODEFILE?
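
A quick way to check is to dump the nodefile from inside the job - e.g. a
minimal sketch of your own jobscript with the check added (nothing Open MPI
specific in it):

#!/bin/sh
#PBS -o Out
#PBS -e Err
#PBS -l nodes=2:ppn=1
cd $PBS_O_WORKDIR
# print what the scheduler actually allocated; with nodes=2:ppn=1 this should
# list two different hostnames, one line per allocated slot
echo "nodefile: $PBS_NODEFILE"
cat $PBS_NODEFILE
mpirun -np 2 -pernode ./pingpong 4000000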

> On Aug 9, 2017, at 12:41 PM, A M <amm.p...@gmail.com> wrote:
> 
> 
> Hello,
> 
> I have just run into a strange issue with "mpirun". Here is what happened:
> 
> I successfully installed Torque 6.1.1.1 with the plain pbs_sched scheduler on a 
> minimal set of 2 IB nodes. Then I added Open MPI 2.1.1, compiled with verbs and 
> tm support, and verified that mpirun works as it should with a small "pingpong" 
> program. 
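> 
> In case it is relevant, this is roughly how I checked that the tm and verbs 
> support made it into the build (just a quick sketch of the commands, not an 
> exhaustive check):
> 
> # the tm (Torque) components should show up in the ompi_info output
> ompi_info | grep -i ": tm"
> # the openib (verbs) byte transfer layer should be listed as well
> ompi_info | grep -i openib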
> 
> Here is the minimal Torque jobscript which I used to check the IB message 
> passing:
> 
> #!/bin/sh
> #PBS -o Out
> #PBS -e Err
> #PBS -l nodes=2:ppn=1
> cd $PBS_O_WORKDIR
> mpirun -np 2 -pernode ./pingpong 4000000
> 
> The job correctly used IB as the default message-passing interface and reached 
> a "pingpong" bandwidth of 3.6 Gb/sec, which is the expected figure in my case, 
> since the two batch nodes have QDR HCAs.
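> 
> (A way to make sure the traffic really goes over IB rather than TCP is to pin 
> the btl selection explicitly - a sketch, assuming the openib component is 
> present in this build:
> 
> mpirun -np 2 -pernode --mca btl openib,self ./pingpong 4000000
> 
> which should error out rather than silently fall back to TCP if verbs were not 
> usable.)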
> 
> I then stopped "pbs_sched" and started the Maui 3.3.1 scheduler instead. 
> Serial jobs work without any problem, but the same jobscript is now failing 
> with the following message:
> 
> --------
> Your job has requested more processes than the ppr for this topology can 
> support:
> App: /lustre/work/user/testus/pingpong
> Number of procs:  2
> PPR: 1:node
> Please revise the conflict and try again.
> --------
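> 
> One more thing I can try is to print the allocation that mpirun believes it 
> was given, using the --display-allocation and --display-map options - a sketch 
> ("hostname" is just a stand-in test binary):
> 
> mpirun -np 2 -pernode --display-allocation --display-map hostname
> 
> which should make it obvious whether the Maui-scheduled job hands mpirun fewer 
> slots than the pbs_sched one did.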
> 
> I then tried to play with the --nooversubscribe and "--pernode 2" options, 
> but the error persisted. It looks like the new "mpirun" is picking up something 
> from the running Maui scheduler that it does not like: it is enough to go back 
> to "pbs_sched" and everything works like a charm. I used the preexisting 
> "maui.cfg" file, which still works fine on an older CentOS 6 system with the 
> old 1.8.5 version of Open MPI.
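> 
> For reference, the remaining variants I can think of trying look like this (a 
> sketch; the ppr mapping should be equivalent to -pernode, and the nodefile dump 
> just shows what the batch system handed to the job):
> 
> # one process per node, spelled with the --map-by syntax
> mpirun -np 2 --map-by ppr:1:node ./pingpong 4000000
> # what did Torque/Maui actually allocate to the job?
> cat $PBS_NODEFILE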
> 
> Thanks in advance for any hint or comment on how to address this. Are there any 
> other mpirun options I should try? Should I try downgrading Open MPI to the 
> latest 1.x series?
> 
> Andy.
> 
> P.S. The second variant of the command, with the TCP btl excluded:
> 
> mpirun -np 2 -pernode --mca btl ^tcp ./pingpong 4000000
> 

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
