You could first try the algorithms in the basic module, e.g. mpirun -np x --mca coll basic ./mytest
and see whether this makes a difference. I used to sometimes observe a (similar?) problem in the openib btl, triggered by the tuned collective component, in cases where the OFED libraries were installed but no HCA was found on a node. It did work, however, with the basic component.

Thanks
Edgar

On 7/15/2010 3:08 AM, Eloi Gaudry wrote:
> Hi Rolf,
>
> Unfortunately, I couldn't get rid of that annoying segmentation fault when
> selecting another bcast algorithm.
> I'm now going to replace MPI_Bcast with a naive implementation (using
> MPI_Send and MPI_Recv) and see if that helps.
>
> Regards,
> Éloi
>
>
> On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:
>> Hi Rolf,
>>
>> Thanks for your input. You're right, I missed the
>> coll_tuned_use_dynamic_rules option.
>>
>> I'll check whether the segmentation fault disappears when using the basic
>> linear bcast algorithm with the proper command line you provided.
>>
>> Regards,
>> Eloi
>>
>> On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:
>>> Hi Eloi:
>>> To select the different bcast algorithms, you need to add an extra MCA
>>> parameter that tells the library to use dynamic selection:
>>> --mca coll_tuned_use_dynamic_rules 1
>>>
>>> One way to make sure you are typing this in correctly is to use it with
>>> ompi_info. Do the following:
>>> ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll
>>>
>>> You should see lots of output with all the different algorithms that can
>>> be selected for the various collectives.
>>> Therefore, you need this:
>>>
>>> --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1
>>>
>>> Rolf
>>>
>>> On 07/13/10 11:28, Eloi Gaudry wrote:
>>>> Hi,
>>>>
>>>> I've found that "--mca coll_tuned_bcast_algorithm 1" allowed me to
>>>> switch to the basic linear algorithm. However, whatever algorithm is
>>>> used, the segmentation fault remains.
>>>>
>>>> Could anyone give some advice on ways to diagnose the issue I'm
>>>> facing?
>>>>
>>>> Regards,
>>>> Eloi
>>>>
>>>> On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
>>>>> Hi,
>>>>>
>>>>> I'm focusing on the MPI_Bcast routine, which seems to randomly
>>>>> segfault when using the openib btl. I'd like to know if there is any
>>>>> way to make Open MPI switch to a different algorithm than the one
>>>>> selected by default for MPI_Bcast.
>>>>>
>>>>> Thanks for your help,
>>>>> Eloi
>>>>>
>>>>> On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm observing a random segmentation fault during an internode
>>>>>> parallel computation involving the openib btl and OpenMPI-1.4.2 (the
>>>>>> same issue can be observed with OpenMPI-1.3.3).
>>>>>>
>>>>>> mpirun (Open MPI) 1.4.2
>>>>>> Report bugs to http://www.open-mpi.org/community/help/
>>>>>> [pbn08:02624] *** Process received signal ***
>>>>>> [pbn08:02624] Signal: Segmentation fault (11)
>>>>>> [pbn08:02624] Signal code: Address not mapped (1)
>>>>>> [pbn08:02624] Failing at address: (nil)
>>>>>> [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
>>>>>> [pbn08:02624] *** End of error message ***
>>>>>> sh: line 1: 2624 Segmentation fault
>>>>>>
>>>>>> /share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/bin/actranpy_mp
>>>>>> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872'
>>>>>> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
>>>>>> '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch'
>>>>>> '--mem=3200' '--threads=1' '--errorlevel=FATAL' '--t_max=0.1'
>>>>>> '--parallel=domain'
>>>>>>
>>>>>> If I choose not to use the openib btl (by using --mca btl self,sm,tcp
>>>>>> on the command line, for instance), I don't encounter any problem and
>>>>>> the parallel computation runs flawlessly.
>>>>>>
>>>>>> I would like to get some help to be able:
>>>>>> - to diagnose the issue I'm facing with the openib btl
>>>>>> - to understand why this issue is observed only when using the openib
>>>>>> btl and not when using self,sm,tcp
>>>>>>
>>>>>> Any help would be very much appreciated.
>>>>>>
>>>>>> The outputs of ompi_info and of the OpenMPI configure script are
>>>>>> attached to this email, along with some information on the InfiniBand
>>>>>> drivers.
>>>>>>
>>>>>> Here is the command line used when launching a parallel computation
>>>>>> using InfiniBand:
>>>>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
>>>>>> btl openib,sm,self,tcp --display-map --verbose --version --mca
>>>>>> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>>>>>>
>>>>>> and the command line used if not using InfiniBand:
>>>>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
>>>>>> btl self,sm,tcp --display-map --verbose --version --mca
>>>>>> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>>>>>>
>>>>>> Thanks,
>>>>>> Eloi
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
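For reference, a minimal sketch of the naive MPI_Bcast replacement Eloi mentions above (the root looping over MPI_Send, every other rank posting a matching MPI_Recv) could look like the code below. The function name naive_bcast and the message tag are illustrative only, not taken from his application; the point is simply that such a routine bypasses the tuned collective component entirely, which may help confirm whether the crash is specific to it.

    #include <mpi.h>

    /* Linear broadcast: the root sends the buffer to every other rank
     * with MPI_Send; all other ranks post a matching MPI_Recv. */
    static int naive_bcast(void *buf, int count, MPI_Datatype type,
                           int root, MPI_Comm comm)
    {
        int rank, size, peer;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        if (rank == root) {
            for (peer = 0; peer < size; peer++) {
                if (peer != root) {
                    MPI_Send(buf, count, type, peer, 4242 /* arbitrary tag */, comm);
                }
            }
        } else {
            MPI_Recv(buf, count, type, root, 4242, comm, MPI_STATUS_IGNORE);
        }
        return MPI_SUCCESS;
    }

Since it takes the same buffer/count/datatype/root/communicator arguments as MPI_Bcast, swapping it in for a test run should be mechanical.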