There is a typo in your command line. You should use --mca (minus minus) instead of -mca.
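One way to catch this kind of typo before running is to scan the arguments for non-ASCII characters. In the quoted command the character in front of mca appears to be an en dash (–) rather than two ASCII hyphens, which would explain why mpirun reports –mca as the application. The sketch below is only an illustration: check_ascii_args is a hypothetical helper name, and it assumes a POSIX shell and a grep that compares raw bytes under LC_ALL=C (GNU grep does).

```shell
# Hypothetical helper (not part of Open MPI): report every argument that
# contains a byte outside printable ASCII, such as an en dash (U+2013)
# pasted in place of "--".
check_ascii_args() {
    rc=0
    for arg in "$@"; do
        # LC_ALL=C makes grep work on raw bytes; [^ -~] matches any byte
        # that is not a printable ASCII character.
        if printf '%s' "$arg" | LC_ALL=C grep -q '[^ -~]'; then
            printf 'suspicious argument: %s\n' "$arg"
            rc=1
        fi
    done
    return "$rc"
}

# The command from this thread, with the en dash in front of "mca".
check_ascii_args mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose 100 which mpirun \
    || echo "fix the flagged arguments before running mpirun"
```

The helper returns non-zero when it flags anything, so it can gate a wrapper script. It only inspects characters and does not check whether an option is valid for mpirun.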
Also, you can try --machinefile instead of -machinefile.

Cheers,

Gilles

There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

–mca

On Mon, Nov 14, 2022 at 11:04 AM timesir via users <users@lists.open-mpi.org> wrote:

> *(py3.9) ➜ /share mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose 100 --mca ras_base_verbose 100 which mpirun*
> [computer01:04570] mca: base: component_find: searching NULL for ras
> components
> [computer01:04570] mca: base: find_dyn_components: checking NULL for ras
> components
> [computer01:04570] pmix:mca: base: components_register: registering
> framework ras components
> [computer01:04570] pmix:mca: base: components_register: found loaded
> component simulator
> [computer01:04570] pmix:mca: base: components_register: component
> simulator register function successful
> [computer01:04570] pmix:mca: base: components_register: found loaded
> component pbs
> [computer01:04570] pmix:mca: base: components_register: component pbs
> register function successful
> [computer01:04570] pmix:mca: base: components_register: found loaded
> component slurm
> [computer01:04570] pmix:mca: base: components_register: component slurm
> register function successful
> [computer01:04570] mca: base: components_open: opening ras components
> [computer01:04570] mca: base: components_open: found loaded component
> simulator
> [computer01:04570] mca: base: components_open: found loaded component pbs
> [computer01:04570] mca: base: components_open: component pbs open function
> successful
> [computer01:04570] mca: base: components_open: found loaded component slurm
> [computer01:04570] mca: base: components_open: component slurm open
> function successful
> [computer01:04570] mca:base:select: Auto-selecting ras components
> [computer01:04570] mca:base:select:( ras) Querying component [simulator]
> [computer01:04570] mca:base:select:( ras) Querying component [pbs]
> [computer01:04570] mca:base:select:( ras) Querying component [slurm]
> [computer01:04570] mca:base:select:( ras) No component selected!
>
> ====================== ALLOCATED NODES
> ======================
> [10/1444]
> computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
> Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
> aliases: 192.168.180.48
> 192.168.60.203: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
> Flags: SLOTS_GIVEN
> aliases: NONE
> =================================================================
>
> ====================== ALLOCATED NODES ======================
> computer01: slots=1 max_slots=0 slots_inuse=0 state=UP
> Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
> aliases: 192.168.180.48
> hepslustretest03: slots=1 max_slots=0 slots_inuse=0 state=UP
> Flags: DAEMON_LAUNCHED:LOCATION_VERIFIED:SLOTS_GIVEN
> aliases: 192.168.60.203,172.17.180.203,172.168.10.23,172.168.10.143
> =================================================================
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
> –mca
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
> 1. Hostfile, via "slots=N" clauses (N defaults to number of
> processor cores if not provided)
> 2. The --host command line parameter, via a ":N" suffix on the
> hostname (N defaults to 1 if not provided)
> 3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
> 4. If none of a hostfile, the --host command line parameter, or an
> RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> --------------------------------------------------------------------------
>
>
>
> On 2022/11/13 23:42, Jeff Squyres (jsquyres) wrote:
>
> Interesting. It says:
>
> [computer01:106117] AVAILABLE NODES FOR MAPPING:
> [computer01:106117] node: computer01 daemon: 0 slots_available: 1
>
> This is why it tells you you're out of slots: you're asking for 2, but it
> only found 1. This means it's not seeing your hostfile somehow.
>
> I should have asked you to run with *2* variables last time -- can you
> re-run with "mpirun --mca rmaps_base_verbose 100 --mca ras_base_verbose 100
> ..."?
>
> Turning on the RAS verbosity should show us what the hostfile component is
> doing.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> ------------------------------
> *From:* 龙龙 <mrlong...@gmail.com>
> *Sent:* Sunday, November 13, 2022 3:13 AM
> *To:* Jeff Squyres (jsquyres) <jsquy...@cisco.com>;
> Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI devel] There are not enough slots available in the
> system to satisfy the 2, slots that were requested by the application
>
>
> *(py3.9) ➜ /share mpirun –version*
>
> mpirun (Open MPI) 5.0.0rc9
>
> Report bugs to https://www.open-mpi.org/community/help/
>
> *(py3.9) ➜ /share cat hosts*
>
> 192.168.180.48 slots=1
> 192.168.60.203 slots=1
>
> *(py3.9) ➜ /share mpirun -n 2 -machinefile hosts –mca rmaps_base_verbose 100 which mpirun*
>
> [computer01:106117] mca: base: component_find: searching NULL for rmaps
> components
> [computer01:106117] mca: base: find_dyn_components: checking NULL for
> rmaps components
> [computer01:106117] pmix:mca: base: components_register: registering
> framework rmaps components
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component ppr
> [computer01:106117] pmix:mca: base: components_register: component ppr
> register function successful
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component rank_file
> [computer01:106117] pmix:mca: base: components_register: component
> rank_file has no register or open function
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component round_robin
> [computer01:106117] pmix:mca: base: components_register: component
> round_robin register function successful
> [computer01:106117] pmix:mca: base: components_register: found loaded
> component seq
> [computer01:106117] pmix:mca: base: components_register: component seq
> register function successful
> [computer01:106117] mca: base: components_open: opening rmaps components
> [computer01:106117] mca: base: components_open: found loaded component ppr
> [computer01:106117] mca: base: components_open: component ppr open
> function successful
> [computer01:106117] mca: base: components_open: found loaded component
> rank_file
> [computer01:106117] mca: base: components_open: found loaded component
> round_robin
> [computer01:106117] mca: base: components_open: component round_robin open
> function successful
> [computer01:106117] mca: base: components_open: found loaded component seq
> [computer01:106117] mca: base: components_open: component seq open
> function successful
> [computer01:106117] mca:rmaps:select: checking available component ppr
> [computer01:106117] mca:rmaps:select: Querying component [ppr]
> [computer01:106117] mca:rmaps:select: checking available component
> rank_file
> [computer01:106117] mca:rmaps:select: Querying component [rank_file]
> [computer01:106117] mca:rmaps:select: checking available component
> round_robin
> [computer01:106117] mca:rmaps:select: Querying component [round_robin]
> [computer01:106117] mca:rmaps:select: checking available component seq
> [computer01:106117] mca:rmaps:select: Querying component [seq]
> [computer01:106117] [prterun-computer01-106117@0,0]: Final mapper
> priorities
> [computer01:106117] Mapper: ppr Priority: 90
> [computer01:106117] Mapper: seq Priority: 60
> [computer01:106117] Mapper: round_robin Priority: 10
> [computer01:106117] Mapper: rank_file Priority: 0
> [computer01:106117] mca:rmaps: mapping job prterun-computer01-106117@1
>
> [computer01:106117] mca:rmaps: setting mapping policies for job
> prterun-computer01-106117@1 inherit TRUE hwtcpus FALSE [9/1957]
> [computer01:106117] mca:rmaps[358] mapping not given - using bycore
> [computer01:106117] setdefaultbinding[365] binding not given - using bycore
> [computer01:106117] mca:rmaps:ppr: job prterun-computer01-106117@1 not
> using ppr mapper PPR NULL policy PPR NOTSET
> [computer01:106117] mca:rmaps:seq: job prterun-computer01-106117@1 not
> using seq mapper
> [computer01:106117] mca:rmaps:rr: mapping job prterun-computer01-106117@1
> [computer01:106117] AVAILABLE NODES FOR MAPPING:
> [computer01:106117] node: computer01 daemon: 0 slots_available: 1
> [computer01:106117] mca:rmaps:rr: mapping by Core for job
> prterun-computer01-106117@1 slots 1 num_procs 2
> ------------------------------
>
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
> which
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
> 1. Hostfile, via "slots=N" clauses (N defaults to number of
> processor cores if not provided)
> 2. The --host command line parameter, via a ":N" suffix on the
> hostname (N defaults to 1 if not provided)
> 3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
> 4. If none of a hostfile, the --host command line parameter, or an
> RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
> ------------------------------
> On 2022/11/8 05:46, Jeff Squyres (jsquyres) wrote:
>
> In the future, can you please just mail one of the lists? This particular
> question is probably more of a users type of question (since we're not
> talking about the internals of Open MPI itself), so I'll reply just on the
> users list.
>
> For what it's worth, I'm unable to replicate your error:
>
> $ mpirun --version
>
> mpirun (Open MPI) 5.0.0rc9
>
>
> Report bugs to https://www.open-mpi.org/community/help/
> $ cat hostfile
>
> mpi002 slots=1
>
> mpi005 slots=1
>
> $ mpirun -n 2 --machinefile hostfile hostname
>
> mpi002
>
> mpi005
>
> Can you try running with "--mca rmaps_base_verbose 100" so that we can get
> some debugging output and see why the slots aren't working for you? Show
> the full output, like I did above (e.g., cat the hostfile, and then mpirun
> with the MCA param and all the output). Thanks!
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> ------------------------------
> *From:* devel <devel-boun...@lists.open-mpi.org> on behalf of mrlong via devel
> <de...@lists.open-mpi.org>
> *Sent:* Monday, November 7, 2022 3:37 AM
> *To:* de...@lists.open-mpi.org <de...@lists.open-mpi.org>;
> Open MPI Users <users@lists.open-mpi.org>
> *Cc:* mrlong <mrlong...@gmail.com>
> *Subject:* [OMPI devel] There are not enough slots available in the
> system to satisfy the 2, slots that were requested by the application
>
>
> *Two machines, each with 64 cores. The contents of the hosts file are:*
>
> 192.168.180.48 slots=1
> 192.168.60.203 slots=1
> *Why do you get the following error when running with openmpi 5.0.0rc9?*
>
> (py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile hosts hostname
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2
> slots that were requested by the application:
>
> hostname
>
> Either request fewer procs for your application, or make more slots
> available for use.
>
> A "slot" is the PRRTE term for an allocatable unit where we can
> launch a process. The number of slots available are defined by the
> environment in which PRRTE processes are run:
>
> 1. Hostfile, via "slots=N" clauses (N defaults to number of
> processor cores if not provided)
> 2. The --host command line parameter, via a ":N" suffix on the
> hostname (N defaults to 1 if not provided)
> 3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
> 4. If none of a hostfile, the --host command line parameter, or an
> RM is present, PRRTE defaults to the number of processor cores
>
> In all the above cases, if you want PRRTE to default to the number
> of hardware threads instead of the number of processor cores, use the
> --use-hwthread-cpus option.
>
> Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the
> number of available slots when deciding the number of processes to
> launch.
>
>
>