It isn’t an issue, as there is nothing wrong with OMPI. The way you join the allocation is the problem: that second srun creates a job step that has only one slot per node, and we have no choice but to honor that constraint and run within it.

What you should be doing is using salloc to create the allocation. That places you inside the main allocation itself, so we can use all of it.
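For example, something along these lines should do it (a minimal sketch: the node/task counts and partition name are simply taken from your srun command, and I have left out --x11 since X11 forwarding with salloc depends on your site's SLURM setup):

  # create the allocation; salloc starts a shell inside it
  $ salloc --nodes 8 --ntasks-per-node 24 --mem 50G \
           --time=3:00:00 --partition=haswell

  # from that shell, mpirun sees the entire allocation;
  # --display-allocation should report 24 slots on each of the 8 nodes
  $ mpirun --display-allocation -np 4 hostname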
> On Sep 8, 2017, at 1:27 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
> Thanks, now I can reproduce the issue.
>
> Cheers,
>
> Gilles
>
> On 9/8/2017 5:20 PM, Maksym Planeta wrote:
>> I start an interactive allocation, and I just noticed that the problem
>> happens when I join this allocation from another shell.
>>
>> Here is how I join:
>>
>> srun --pty --x11 --jobid=$(squeue -u $USER -o %A | tail -n 1) bash
>>
>> And here is how I create the allocation:
>>
>> srun --pty --nodes 8 --ntasks-per-node 24 --mem 50G --time=3:00:00
>> --partition=haswell --x11 bash
>>
>> On 09/08/2017 09:58 AM, Gilles Gouaillardet wrote:
>>> Maksym,
>>>
>>> Can you please post your sbatch script?
>>>
>>> FWIW, I am unable to reproduce the issue with the latest v2.x from GitHub.
>>>
>>> By any chance, would you be able to test the latest Open MPI 2.1.2rc3?
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 9/8/2017 4:19 PM, Maksym Planeta wrote:
>>>> Indeed mpirun shows slots=1 per node, but I create the allocation with
>>>> --ntasks-per-node 24, so I do have all cores of the node allocated.
>>>>
>>>> When I use srun I can get all the cores.
>>>>
>>>> On 09/07/2017 02:12 PM, r...@open-mpi.org wrote:
>>>>> My best guess is that SLURM has only allocated 2 slots, and we
>>>>> respect the RM regardless of what you say in the hostfile. You can
>>>>> check this by adding --display-allocation to your cmd line. You
>>>>> probably need to tell SLURM to allocate more cpus/node.
>>>>>
>>>>>> On Sep 7, 2017, at 3:33 AM, Maksym Planeta
>>>>>> <mplan...@os.inf.tu-dresden.de> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm trying to tell Open MPI how many processes per node I want to
>>>>>> use, but mpirun seems to ignore the configuration I provide.
>>>>>>
>>>>>> I create the following hostfile:
>>>>>>
>>>>>> $ cat hostfile.16
>>>>>> taurusi6344 slots=16
>>>>>> taurusi6348 slots=16
>>>>>>
>>>>>> And then start the app as follows:
>>>>>>
>>>>>> $ mpirun --display-map -machinefile hostfile.16 -np 2 hostname
>>>>>> Data for JOB [42099,1] offset 0
>>>>>>
>>>>>> ======================== JOB MAP ========================
>>>>>>
>>>>>> Data for node: taurusi6344   Num slots: 1   Max slots: 0   Num procs: 1
>>>>>>         Process OMPI jobid: [42099,1] App: 0 Process rank: 0
>>>>>>         Bound: socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]],
>>>>>>         socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]],
>>>>>>         socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]],
>>>>>>         socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]],
>>>>>>         socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]],
>>>>>>         socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]:
>>>>>>         [B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
>>>>>>
>>>>>> Data for node: taurusi6348   Num slots: 1   Max slots: 0   Num procs: 1
>>>>>>         Process OMPI jobid: [42099,1] App: 0 Process rank: 1
>>>>>>         Bound: socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]],
>>>>>>         socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]],
>>>>>>         socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]],
>>>>>>         socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]],
>>>>>>         socket 0[core 8[hwt 0]], socket 0[core 9[hwt 0]],
>>>>>>         socket 0[core 10[hwt 0]], socket 0[core 11[hwt 0]]:
>>>>>>         [B/B/B/B/B/B/B/B/B/B/B/B][./././././././././././.]
>>>>>>
>>>>>> =============================================================
>>>>>>
>>>>>> taurusi6344
>>>>>> taurusi6348
>>>>>>
>>>>>> If I ask for anything more than 2 processes with -np, I get the
>>>>>> following error message:
>>>>>>
>>>>>> $ mpirun --display-map -machinefile hostfile.16 -np 4 hostname
>>>>>> --------------------------------------------------------------------------
>>>>>> There are not enough slots available in the system to satisfy the 4
>>>>>> slots that were requested by the application:
>>>>>>   hostname
>>>>>>
>>>>>> Either request fewer slots for your application, or make more slots
>>>>>> available for use.
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> The Open MPI version is "mpirun (Open MPI) 2.1.0".
>>>>>>
>>>>>> SLURM is also installed, version "slurm 16.05.7-Bull.1.1-20170512-1252".
>>>>>>
>>>>>> Could you help me make Open MPI respect the slots parameter?
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Maksym Planeta
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users