*(py3.9) ➜ /share mpirun --version*
mpirun (Open MPI) 5.0.0rc9
Report bugs to https://www.open-mpi.org/community/help/

*(py3.9) ➜ /share cat hosts*
192.168.180.48 slots=1
192.168.60.203 slots=1

*(py3.9) ➜ /share mpirun -n 2 -machinefile hosts --mca rmaps_base_verbose 100 which mpirun*
[computer01:106117] mca: base: component_find: searching NULL for rmaps components
[computer01:106117] mca: base: find_dyn_components: checking NULL for rmaps components
[computer01:106117] pmix:mca: base: components_register: registering framework rmaps components
[computer01:106117] pmix:mca: base: components_register: found loaded component ppr
[computer01:106117] pmix:mca: base: components_register: component ppr register function successful
[computer01:106117] pmix:mca: base: components_register: found loaded component rank_file
[computer01:106117] pmix:mca: base: components_register: component rank_file has no register or open function
[computer01:106117] pmix:mca: base: components_register: found loaded component round_robin
[computer01:106117] pmix:mca: base: components_register: component round_robin register function successful
[computer01:106117] pmix:mca: base: components_register: found loaded component seq
[computer01:106117] pmix:mca: base: components_register: component seq register function successful
[computer01:106117] mca: base: components_open: opening rmaps components
[computer01:106117] mca: base: components_open: found loaded component ppr
[computer01:106117] mca: base: components_open: component ppr open function successful
[computer01:106117] mca: base: components_open: found loaded component rank_file
[computer01:106117] mca: base: components_open: found loaded component round_robin
[computer01:106117] mca: base: components_open: component round_robin open function successful
[computer01:106117] mca: base: components_open: found loaded component seq
[computer01:106117] mca: base: components_open: component seq open function successful
[computer01:106117] mca:rmaps:select: checking available component ppr
[computer01:106117] mca:rmaps:select: Querying component [ppr]
[computer01:106117] mca:rmaps:select: checking available component rank_file
[computer01:106117] mca:rmaps:select: Querying component [rank_file]
[computer01:106117] mca:rmaps:select: checking available component round_robin
[computer01:106117] mca:rmaps:select: Querying component [round_robin]
[computer01:106117] mca:rmaps:select: checking available component seq
[computer01:106117] mca:rmaps:select: Querying component [seq]
[computer01:106117] [prterun-computer01-106117@0,0]: Final mapper priorities
[computer01:106117] Mapper: ppr Priority: 90
[computer01:106117] Mapper: seq Priority: 60
[computer01:106117] Mapper: round_robin Priority: 10
[computer01:106117] Mapper: rank_file Priority: 0
[computer01:106117] mca:rmaps: mapping job prterun-computer01-106117@1
[computer01:106117] mca:rmaps: setting mapping policies for job prterun-computer01-106117@1 inherit TRUE hwtcpus FALSE
[computer01:106117] mca:rmaps[358] mapping not given - using bycore
[computer01:106117] setdefaultbinding[365] binding not given - using bycore
[computer01:106117] mca:rmaps:ppr: job prterun-computer01-106117@1 not using ppr mapper PPR NULL policy PPR NOTSET
[computer01:106117] mca:rmaps:seq: job prterun-computer01-106117@1 not using seq mapper
[computer01:106117] mca:rmaps:rr: mapping job prterun-computer01-106117@1
[computer01:106117] AVAILABLE NODES FOR MAPPING:
[computer01:106117] node: computer01 daemon: 0 slots_available: 1
[computer01:106117] mca:rmaps:rr: mapping by Core for job prterun-computer01-106117@1 slots 1 num_procs 2
------------------------------
There are not enough slots available in the system to satisfy the 2 slots that were requested by the application:

which

Either request fewer procs for your application, or make more slots available for use.

A "slot" is the PRRTE term for an allocatable unit where we can launch a process.
The number of slots available are defined by the environment in which PRRTE processes are run:

1. Hostfile, via "slots=N" clauses (N defaults to number of processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number of hardware threads instead of the number of processor cores, use the --use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the number of available slots when deciding the number of processes to launch.
------------------------------

On 2022/11/8 05:46, Jeff Squyres (jsquyres) wrote:

In the future, can you please just mail one of the lists? This particular question is probably more of a users type of question (since we're not talking about the internals of Open MPI itself), so I'll reply just on the users list.

For what it's worth, I'm unable to replicate your error:

$ mpirun --version
mpirun (Open MPI) 5.0.0rc9
Report bugs to https://www.open-mpi.org/community/help/
$ cat hostfile
mpi002 slots=1
mpi005 slots=1
$ mpirun -n 2 --machinefile hostfile hostname
mpi002
mpi005

Can you try running with "--mca rmaps_base_verbose 100" so that we can get some debugging output and see why the slots aren't working for you? Show the full output, like I did above (e.g., cat the hostfile, and then mpirun with the MCA param and all the output).

Thanks!
--
Jeff Squyres
jsquy...@cisco.com
------------------------------
*From:* devel <devel-boun...@lists.open-mpi.org> on behalf of mrlong via devel <de...@lists.open-mpi.org>
*Sent:* Monday, November 7, 2022 3:37 AM
*To:* de...@lists.open-mpi.org <de...@lists.open-mpi.org>; Open MPI Users <users@lists.open-mpi.org>
*Cc:* mrlong <mrlong...@gmail.com>
*Subject:* [OMPI devel] There are not enough slots available in the system to satisfy the 2, slots that were requested by the application

*Two machines, each with 64 cores. The contents of the hosts file are:*

192.168.180.48 slots=1
192.168.60.203 slots=1

*Why do you get the following error when running with openmpi 5.0.0rc9?*

(py3.9) [user@machine01 share]$ mpirun -n 2 --machinefile hosts hostname
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 2 slots that were requested by the application:

hostname

Either request fewer procs for your application, or make more slots available for use.

A "slot" is the PRRTE term for an allocatable unit where we can launch a process. The number of slots available are defined by the environment in which PRRTE processes are run:

1. Hostfile, via "slots=N" clauses (N defaults to number of processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an RM is present, PRRTE defaults to the number of processor cores

In all the above cases, if you want PRRTE to default to the number of hardware threads instead of the number of processor cores, use the --use-hwthread-cpus option.

Alternatively, you can use the --map-by :OVERSUBSCRIBE option to ignore the number of available slots when deciding the number of processes to launch.
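
For reference, the "slots=N" rule from the help text can be checked by hand against the hosts file quoted in this thread. The following is only a rough Python sketch, not how PRRTE itself parses hostfiles: the function names `parse_hostfile` and `enough_slots` are hypothetical, the `default_slots` fallback stands in for "number of processor cores" (which cannot be known for a remote host from here), and summing repeated host entries is an assumption of this sketch.

```python
import os

def parse_hostfile(text, default_slots=None):
    """Return {host: slots} for hostfile text (hypothetical helper).

    Lines look like "192.168.180.48 slots=1". When no slots=N clause is
    present, fall back to default_slots, which here stands in for the
    "number of processor cores" default described in the help message.
    """
    if default_slots is None:
        default_slots = os.cpu_count() or 1
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        host, n = fields[0], default_slots
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "slots" and value.isdigit():
                n = int(value)
        # Assumption of this sketch: repeated hosts accumulate slots.
        slots[host] = slots.get(host, 0) + n
    return slots

def enough_slots(slots, requested):
    """True if the hostfile alone offers at least `requested` slots."""
    return sum(slots.values()) >= requested

# The hosts file exactly as quoted in this thread.
HOSTS = """\
192.168.180.48 slots=1
192.168.60.203 slots=1
"""

if __name__ == "__main__":
    table = parse_hostfile(HOSTS)
    print(table, "total:", sum(table.values()))
    print("enough for -n 2:", enough_slots(table, 2))
```

On paper the hosts file offers two slots, one per machine, which is enough for "-n 2". The rmaps verbose output earlier in the thread lists only computer01 with slots_available: 1 under AVAILABLE NODES FOR MAPPING, so the shortfall seems to come from which nodes the mapper sees rather than from the slot counts written in the file.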