Hi guys,

I am trying to use a rankfile with Open MPI 4.1 (4.1.0a1), but mpirun crashes with the following message:

[genji608:451977] [[64242,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/odls_base_default_fns.c at line 612
I use the script below. It works fine if I use the last command of the script instead (the commented-out mpirun line that uses --map-by socket:PE=5). I may be doing something wrong. If so, please let me know!

Thanks in advance for the time spent on this problem,
Ludovic

Ludovic HABLOT
HPC Applications Expert
Applications & Performance Group
Big Data & Security
BULL, an ATOS company

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 2
#SBATCH --exclusive

module load intel/2018.5.274
module load openmpi-hpcx/hpcx-2.3.0/4.1.0a1/icc-2018

# Build a rankfile with one rank per allocated node, bound to cores 0-4
rank=0
rm -Rf rankfile
for node in $(nodeset -e $SLURM_NODELIST)
do
    echo rank $rank=$node slot=0-4 >> rankfile
    rank=$((rank + 1))
done
echo "" >> rankfile
cat rankfile

export bin=hostname
mpirun -V
# First command: rankfile-based placement (crashes)
mpirun -rf rankfile -n $SLURM_NTASKS --oversubscribe --report-bindings $bin
# Second command: mapping by socket with 5 PEs per rank (works)
#mpirun -host $(nodeset -S',' -e $SLURM_NODELIST) --map-by socket:PE=5 -n $SLURM_NTASKS --report-bindings --oversubscribe $bin

Output with the first command:

rank 0=genji608 slot=0-4
rank 1=genji609 slot=0-4

mpirun (Open MPI) 4.1.0a1

Report bugs to http://www.open-mpi.org/community/help/
[genji608:451977] [[64242,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/odls_base_default_fns.c at line 612

Output with the second command:

rank 0=genji608 slot=0-4
rank 1=genji609 slot=0-4

mpirun (Open MPI) 4.1.0a1

Report bugs to http://www.open-mpi.org/community/help/
[genji609:444658] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]]: [BB/BB/BB/BB/BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../..]
[genji608:456450] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]]: [BB/BB/BB/BB/BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../..]
genji609
genji608
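For anyone trying to reproduce this, the rankfile-generation loop from the script can be isolated into a small function that runs without Slurm. This is only a sketch under stated assumptions: the function name write_rankfile is hypothetical, the node names are passed as arguments instead of coming from nodeset -e $SLURM_NODELIST, and, unlike the original script, it does not append an empty line at the end (whether that trailing line plays any role in the crash is not established here).

```shell
#!/bin/bash
# Hypothetical helper (not part of Open MPI or Slurm):
#   write_rankfile OUTFILE NODE...
# Writes one "rank N=<node> slot=0-4" line per node, like the loop in the
# job script, but without appending a trailing empty line.
write_rankfile() {
    local out=$1
    shift
    local rank=0
    : > "$out"    # create or truncate the output file
    for node in "$@"; do
        printf 'rank %d=%s slot=0-4\n' "$rank" "$node" >> "$out"
        rank=$((rank + 1))
    done
}

# Inside a Slurm job, the node list could instead come from nodeset (as in
# the original script) or from Slurm itself, e.g.:
#   write_rankfile rankfile $(scontrol show hostnames "$SLURM_NODELIST")
write_rankfile rankfile genji608 genji609
cat rankfile
```

With the two node names from the report, cat rankfile prints the same two rank lines as the job output, minus the blank line that the original echo "" adds.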
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users