Hmmm - try adding a value for nprocs instead of leaving it blank. Say "-np 7"
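For reference, a sketch of what that would look like: this is the command from the report quoted below, unchanged except for the added "-np 7".

    mpirun -np 7 --mca btl_openib_warn_no_device_params_found 0 \
        --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 \
        --mca btl_openib_receive_queues P,65536,120,64,32 \
        -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1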
Sent from my iPhone

> On Nov 1, 2018, at 11:56 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>
> Hello Ralph,
>
> Here is the output for a failing machine:
>
> [130_02:44:13_aleblanc@farbauti]{~}$ mpirun --mca btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts --mca ras_base_verbose 5 IMB-MPI1
>
> ======================   ALLOCATED NODES   ======================
>     farbauti: flags=0x11 slots=1 max_slots=0 slots_inuse=0 state=UP
>     hyperion-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     io-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     jarnsaxa-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     rhea-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     tarqeq-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     tarvos-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
> =================================================================
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 7 slots
> that were requested by the application:
>   10
>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
>
> Here is the output for a passing machine:
>
> [1_02:54:26_aleblanc@hyperion]{~}$ mpirun --mca btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts --mca ras_base_verbose 5 IMB-MPI1
>
> ======================   ALLOCATED NODES   ======================
>     hyperion: flags=0x11 slots=1 max_slots=0 slots_inuse=0 state=UP
>     farbauti-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     io-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     jarnsaxa-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     rhea-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     tarqeq-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>     tarvos-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
> =================================================================
>
> Yes, the hostfile is available on all nodes through an NFS mount for all of our home directories.
>
>> On Thu, Nov 1, 2018 at 2:44 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>
>> ---------- Forwarded message ---------
>> From: Ralph H Castain <r...@open-mpi.org>
>> Date: Thu, Nov 1, 2018 at 2:34 PM
>> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
>> To: Open MPI Users <users@lists.open-mpi.org>
>>
>> I'm a little under the weather and so will only be able to help a bit at a time. However, a couple of things to check:
>>
>> * add -mca ras_base_verbose 5 to the cmd line to see what mpirun thought the allocation was
>>
>> * is the hostfile available on every node?
>>
>> Ralph
>>
>>> On Nov 1, 2018, at 10:55 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>
>>> Hello Ralph,
>>>
>>> Attached below is the verbose output for a failing machine and a passing machine.
>>>
>>> Thanks,
>>> Adam LeBlanc
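As an aside on the "make more slots available" alternative mentioned in the "not enough slots" error above: one way to do that would be to raise the slot counts in the hostfile. The value 2 below is only an illustration and should match the cores actually present on each node; the remaining hosts would be edited the same way.

    farbauti-ce.ofa.iol.unh.edu slots=2
    hyperion-ce.ofa.iol.unh.edu slots=2
    io-ce.ofa.iol.unh.edu slots=2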
>>>
>>>> On Thu, Nov 1, 2018 at 1:41 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Ralph H Castain <r...@open-mpi.org>
>>>> Date: Thu, Nov 1, 2018 at 1:07 PM
>>>> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
>>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>>
>>>> Set rmaps_base_verbose=10 for debugging output
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Nov 1, 2018, at 9:31 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>>>
>>>>> The Open MPI version, by the way, is 3.1.2.
>>>>>
>>>>> -Adam LeBlanc
>>>>>
>>>>>> On Thu, Nov 1, 2018 at 12:05 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am an employee of the UNH InterOperability Lab, and we are in the process of testing OFED-4.17-RC1 for the OpenFabrics Alliance. We have purchased some new hardware that has one processor, and we noticed an issue when running MPI jobs on nodes that do not have similar processor counts. If we launch the MPI job from a node that has 2 processors, it fails, stating that there are not enough resources, and will not start the run, like so:
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> There are not enough slots available in the system to satisfy the 14 slots
>>>>>> that were requested by the application:
>>>>>>   IMB-MPI1
>>>>>>
>>>>>> Either request fewer slots for your application, or make more slots available
>>>>>> for use.
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> If we launch the MPI job from the node with one processor, without changing the mpirun command at all, it runs as expected.
>>>>>>
>>>>>> Here is the command being run:
>>>>>>
>>>>>> mpirun --mca btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1
>>>>>>
>>>>>> Here is the hostfile being used:
>>>>>>
>>>>>> farbauti-ce.ofa.iol.unh.edu slots=1
>>>>>> hyperion-ce.ofa.iol.unh.edu slots=1
>>>>>> io-ce.ofa.iol.unh.edu slots=1
>>>>>> jarnsaxa-ce.ofa.iol.unh.edu slots=1
>>>>>> rhea-ce.ofa.iol.unh.edu slots=1
>>>>>> tarqeq-ce.ofa.iol.unh.edu slots=1
>>>>>> tarvos-ce.ofa.iol.unh.edu slots=1
>>>>>>
>>>>>> This seems like a bug, and we would like some help to explain and fix what is happening. The IBTA plugfest saw similar behaviours, so this should be reproducible.
>>>>>>
>>>>>> Thanks,
>>>>>> Adam LeBlanc
>>>
>>> <passing_verbose_output.txt><failing_verbose_output.txt>
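Regarding the rmaps_base_verbose=10 suggestion earlier in the thread, a sketch of one way it could be passed is shown below; it simply adds another --mca pair to the command from the original report, with nothing else changed.

    mpirun --mca rmaps_base_verbose 10 \
        --mca btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0 \
        --mca btl openib,vader,self --mca pml ob1 \
        --mca btl_openib_receive_queues P,65536,120,64,32 \
        -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1

Equivalently, it can usually be set in the environment as OMPI_MCA_rmaps_base_verbose=10 before invoking mpirun.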