Hi guys,

I am trying to use a rankfile with Open MPI 4.1, but mpirun crashes with the 
following message:

[genji608:451977] [[64242,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/odls_base_default_fns.c at line 612

I use the script below. Everything works if I use the last (commented-out) 
mpirun command in the script instead of the rankfile one.

I may be doing something wrong. If so, please let me know!

Thanks in advance for the time spent on this problem,

Ludovic
Ludovic HABLOT
HPC Applications Expert
Applications & Performance Group
Big Data & Security
BULL, an ATOS company

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 2
#SBATCH --exclusive

module load intel/2018.5.274
module load openmpi-hpcx/hpcx-2.3.0/4.1.0a1/icc-2018

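# Build a rankfile with one entry per allocated node, each bound to slots 0-4
rank=0
rm -Rf rankfile
for node in $(nodeset -e "$SLURM_NODELIST")
do
    echo "rank $rank=$node slot=0-4" >> rankfile
    rank=$((rank + 1))
done
# Append an empty line at the end of the rankfile
echo "" >> rankfile
cat rankfile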

export bin=hostname

mpirun -V
mpirun -rf rankfile -n $SLURM_NTASKS --oversubscribe --report-bindings $bin
#mpirun -host $(nodeset -S',' -e $SLURM_NODELIST) --map-by socket:PE=5 -n $SLURM_NTASKS --report-bindings --oversubscribe $bin
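
To make the rankfile format I expect explicit, the generation loop above could also be written as a small standalone function. This is only a sketch: the name write_rankfile is hypothetical, the node names are assumed to be passed as arguments, and unlike the script above it does not append a trailing empty line. I have not checked whether it changes the behaviour of the crash.

```shell
#!/bin/bash
# write_rankfile is a hypothetical helper name, not part of Open MPI.
# Usage: write_rankfile <output-file> <host> [<host> ...]
# Writes one "rank N=host slot=0-4" line per host, numbering ranks from 0.
write_rankfile() {
    local outfile=$1
    shift
    local rank=0
    local node
    : > "$outfile"    # create the file, or truncate it if it already exists
    for node in "$@"; do
        echo "rank $rank=$node slot=0-4" >> "$outfile"
        rank=$((rank + 1))
    done
}
```

Inside the job script it would be called as: write_rankfile rankfile $(nodeset -e "$SLURM_NODELIST")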

Output with the first command

rank 0=genji608 slot=0-4
rank 1=genji609 slot=0-4
mpirun (Open MPI) 4.1.0a1

Report bugs to http://www.open-mpi.org/community/help/
[genji608:451977] [[64242,0],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file base/odls_base_default_fns.c at line 612

Output with the second command

rank 0=genji608 slot=0-4
rank 1=genji609 slot=0-4

mpirun (Open MPI) 4.1.0a1

Report bugs to http://www.open-mpi.org/community/help/
[genji609:444658] MCW rank 1 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]]: [BB/BB/BB/BB/BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../..]
[genji608:456450] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]]: [BB/BB/BB/BB/BB/../../../../../../../../../../../../../../..][../../../../../../../../../../../../../../../../../../../..]
genji609
genji608
