Hi,

I'm focusing on the MPI_Bcast routine, which seems to segfault randomly when using the openib btl. Is there any way to make OpenMPI use an algorithm other than the default one it selects for MPI_Bcast?

Thanks for your help,
Eloi

On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
> Hi,
>
> I'm observing a random segmentation fault during an internode parallel
> computation involving the openib btl and OpenMPI-1.4.2 (the same issue
> can be observed with OpenMPI-1.3.3).
> mpirun (Open MPI) 1.4.2
> Report bugs to http://www.open-mpi.org/community/help/
> [pbn08:02624] *** Process received signal ***
> [pbn08:02624] Signal: Segmentation fault (11)
> [pbn08:02624] Signal code: Address not mapped (1)
> [pbn08:02624] Failing at address: (nil)
> [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
> [pbn08:02624] *** End of error message ***
> sh: line 1: 2624 Segmentation fault
> \/share\/hpc3\/actran_suite\/Actran_11\.0\.rc2\.41872\/RedHatEL\-5\/x86_64\
> /bin\/actranpy_mp
> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Ac
> tran_11.0.rc2.41872'
> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
> '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch' '--mem=3200'
> '--threads=1' '--errorlevel=FATAL' '--t_max=0.1' '--parallel=domain'
>
> If I choose not to use the openib btl (by using --mca btl self,sm,tcp on
> the command line, for instance), I don't encounter any problem and the
> parallel computation runs flawlessly.
>
> I would like to get some help to be able:
> - to diagnose the issue I'm facing with the openib btl
> - understand why this issue is observed only when using the openib btl
> and not when using self,sm,tcp
>
> Any help would be very much appreciated.
>
> The outputs of ompi_info and the configure scripts of OpenMPI are
> enclosed to this email, and some information on the infiniband drivers
> as well.
>
> Here is the command line used when launching a parallel computation
> using infiniband:
> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
> btl openib,sm,self,tcp --display-map --verbose --version --mca
> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
> and the command line used if not using infiniband:
> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
> btl self,sm,tcp --display-map --verbose --version --mca
> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>
> Thanks,
> Eloi