Hello Ralph,

Is there any update on this?
Thanks,
Adam LeBlanc

On Fri, Nov 2, 2018 at 11:06 AM Adam LeBlanc <alebl...@iol.unh.edu> wrote:

> Hello Ralph,
>
> When I do -np 7 it still fails with "There are not enough slots
> available in the system to satisfy the 7 slots that were requested by the
> application", but when I do -np 2 it will actually run from a machine that
> was failing, though only to one other machine; in this case it ran from a
> machine with 2 processors to a machine with only 1 processor. If I try to
> make -np higher than 2 it will also fail.
>
> -Adam LeBlanc
>
> On Thu, Nov 1, 2018 at 3:53 PM Ralph H Castain <r...@open-mpi.org> wrote:
>
>> Hmmm - try adding a value for nprocs instead of leaving it blank. Say
>> "-np 7"
>>
>> Sent from my iPhone
>>
>> On Nov 1, 2018, at 11:56 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>
>> Hello Ralph,
>>
>> Here is the output for a failing machine:
>>
>> [130_02:44:13_aleblanc@farbauti]{~}$ > mpirun --mca
>> btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0
>> --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues
>> P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts --mca
>> ras_base_verbose 5 IMB-MPI1
>>
>> ====================== ALLOCATED NODES ======================
>> farbauti: flags=0x11 slots=1 max_slots=0 slots_inuse=0 state=UP
>> hyperion-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> io-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> jarnsaxa-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> rhea-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarqeq-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarvos-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> =================================================================
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 7 slots
>> that were requested by the application:
>> 10
>>
>> Either request fewer slots for your application, or make more slots
>> available for use.
>> --------------------------------------------------------------------------
>>
>> Here is the output for a passing machine:
>>
>> [1_02:54:26_aleblanc@hyperion]{~}$ > mpirun --mca
>> btl_openib_warn_no_device_params_found 0 --mca orte_base_help_aggregate 0
>> --mca btl openib,vader,self --mca pml ob1 --mca btl_openib_receive_queues
>> P,65536,120,64,32 -hostfile /home/soesterreich/ce-mpi-hosts --mca
>> ras_base_verbose 5 IMB-MPI1
>>
>> ====================== ALLOCATED NODES ======================
>> hyperion: flags=0x11 slots=1 max_slots=0 slots_inuse=0 state=UP
>> farbauti-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> io-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> jarnsaxa-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> rhea-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarqeq-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> tarvos-ce: flags=0x10 slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
>> =================================================================
>>
>> Yes, the hostfile is available on all nodes through an NFS mount for all
>> of our home directories.
>>
>> On Thu, Nov 1, 2018 at 2:44 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>
>>> ---------- Forwarded message ---------
>>> From: Ralph H Castain <r...@open-mpi.org>
>>> Date: Thu, Nov 1, 2018 at 2:34 PM
>>> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>
>>> I'm a little under the weather and so will only be able to help a bit at
>>> a time. However, a couple of things to check:
>>>
>>> * add -mca ras_base_verbose 5 to the cmd line to see what mpirun thought
>>> the allocation was
>>>
>>> * is the hostfile available on every node?
>>>
>>> Ralph
>>>
>>> On Nov 1, 2018, at 10:55 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>
>>> Hello Ralph,
>>>
>>> Attached below is the verbose output for a failing machine and a passing
>>> machine.
>>>
>>> Thanks,
>>> Adam LeBlanc
>>>
>>> On Thu, Nov 1, 2018 at 1:41 PM Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>
>>>> ---------- Forwarded message ---------
>>>> From: Ralph H Castain <r...@open-mpi.org>
>>>> Date: Thu, Nov 1, 2018 at 1:07 PM
>>>> Subject: Re: [OMPI users] Bug with Open-MPI Processor Count
>>>> To: Open MPI Users <users@lists.open-mpi.org>
>>>>
>>>> Set rmaps_base_verbose=10 for debugging output
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Nov 1, 2018, at 9:31 AM, Adam LeBlanc <alebl...@iol.unh.edu> wrote:
>>>>
>>>> The Open MPI version, by the way, is 3.1.2.
>>>>
>>>> -Adam LeBlanc
>>>>
>>>> On Thu, Nov 1, 2018 at 12:05 PM Adam LeBlanc <alebl...@iol.unh.edu>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am an employee of the UNH InterOperability Lab, and we are in the
>>>>> process of testing OFED-4.17-RC1 for the OpenFabrics Alliance. We have
>>>>> purchased some new hardware that has one processor, and noticed an issue
>>>>> when running MPI jobs on nodes that do not have similar processor counts.
>>>>> If we launch the MPI job from a node that has 2 processors, it fails,
>>>>> stating there are not enough resources, and will not start the run, like
>>>>> so:
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> There are not enough slots available in the system to satisfy the 14 slots
>>>>> that were requested by the application:
>>>>> IMB-MPI1
>>>>>
>>>>> Either request fewer slots for your application, or make more slots
>>>>> available for use.
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> If we launch the MPI job from the node with one processor, without
>>>>> changing the mpirun command at all, it runs as expected. Here is the
>>>>> command being run:
>>>>>
>>>>> mpirun --mca btl_openib_warn_no_device_params_found 0 --mca
>>>>> orte_base_help_aggregate 0 --mca btl openib,vader,self --mca pml ob1 --mca
>>>>> btl_openib_receive_queues P,65536,120,64,32 -hostfile
>>>>> /home/soesterreich/ce-mpi-hosts IMB-MPI1
>>>>>
>>>>> Here is the hostfile being used:
>>>>>
>>>>> farbauti-ce.ofa.iol.unh.edu slots=1
>>>>> hyperion-ce.ofa.iol.unh.edu slots=1
>>>>> io-ce.ofa.iol.unh.edu slots=1
>>>>> jarnsaxa-ce.ofa.iol.unh.edu slots=1
>>>>> rhea-ce.ofa.iol.unh.edu slots=1
>>>>> tarqeq-ce.ofa.iol.unh.edu slots=1
>>>>> tarvos-ce.ofa.iol.unh.edu slots=1
>>>>>
>>>>> This seems like a bug, and we would like some help explaining and fixing
>>>>> what is happening. The IBTA plugfest saw similar behaviour, so this
>>>>> should be reproducible.
>>>>>
>>>>> Thanks,
>>>>> Adam LeBlanc
>>>
>>> <passing_verbose_output.txt><failing_verbose_output.txt>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
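
A note for anyone landing on this thread with the same "not enough slots" error: below is a rough sketch of things to try against a hostfile like the one above. These are standard mpirun options in Open MPI 3.x rather than fixes confirmed in this thread, and the extra farbauti hostfile line is only a hypothesis suggested by the two ALLOCATED NODES listings, where the launch node appears under its primary name (farbauti / hyperion) rather than as its -ce entry.

    # Request exactly one rank per hostfile entry (7 hosts -> -np 7) and map by
    # node rather than by slot, so one rank lands on each listed host:
    mpirun -np 7 --map-by node -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1

    # Diagnostic only: allow oversubscription, so mpirun launches even when it
    # believes fewer slots are available than the ranks requested:
    mpirun -np 7 --oversubscribe -hostfile /home/soesterreich/ce-mpi-hosts IMB-MPI1

    # Hypothetical hostfile addition: if mpirun counts the launch node under its
    # primary name instead of matching the farbauti-ce entry, naming it
    # explicitly with its real processor count might reconcile the two views:
    #   farbauti.ofa.iol.unh.edu slots=2

Re-running with --mca rmaps_base_verbose 10, as suggested earlier in the thread, should show how mpirun maps ranks onto the listed slots in each of these cases.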