Hi Eloi:
To select the different bcast algorithms, you need to add an extra MCA
parameter that tells the library to use dynamic selection.
--mca coll_tuned_use_dynamic_rules 1
One way to make sure you are typing this in correctly is to use it with
ompi_info. Do the following:
ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll
You should see lots of output with all the different algorithms that can
be selected for the various collectives.
Therefore, you need this:
--mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1
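For example, folded into the mpirun line from your earlier mail, that
would look something like (everything else unchanged):
path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1 [...]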
Rolf
On 07/13/10 11:28, Eloi Gaudry wrote:
Hi,
I've found that "--mca coll_tuned_bcast_algorithm 1" allows switching to the
basic linear algorithm.
Anyway, whatever algorithm is used, the segmentation fault remains.
Could anyone give me some advice on how to diagnose the issue I'm facing?
Regards,
Eloi
On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
Hi,
I'm focusing on the MPI_Bcast routine, which seems to randomly segfault when
using the openib btl. I'd like to know if there is any way to make Open MPI
switch to a different algorithm than the one selected by default for
MPI_Bcast.
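In case it helps with narrowing things down, below is a minimal sketch of
the kind of standalone MPI_Bcast stress test I would try over the openib
btl; the message sizes and iteration counts are arbitrary assumptions, not
taken from your application:

/* bcast_test.c - minimal MPI_Bcast stress test (illustrative sketch only) */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, bytes, i, iter;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Sweep a range of message sizes (arbitrary choices). */
    for (bytes = 1; bytes <= (1 << 24); bytes <<= 1) {
        buf = malloc(bytes);
        if (rank == 0)
            for (i = 0; i < bytes; i++)
                buf[i] = (char)(i & 0xff);
        /* Repeat each size to catch intermittent failures. */
        for (iter = 0; iter < 100; iter++)
            MPI_Bcast(buf, bytes, MPI_BYTE, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("bcast of %d bytes OK\n", bytes);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with the same mpirun options as the
application (--mca btl openib,sm,self,tcp), this would at least show
whether MPI_Bcast alone triggers the segfault, independent of the
application code.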
Thanks for your help,
Eloi
On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
Hi,
I'm observing a random segmentation fault during an internode parallel
computation involving the openib btl and OpenMPI-1.4.2 (the same issue
can be observed with OpenMPI-1.3.3).
mpirun (Open MPI) 1.4.2
Report bugs to http://www.open-mpi.org/community/help/
[pbn08:02624] *** Process received signal ***
[pbn08:02624] Signal: Segmentation fault (11)
[pbn08:02624] Signal code: Address not mapped (1)
[pbn08:02624] Failing at address: (nil)
[pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
[pbn08:02624] *** End of error message ***
sh: line 1: 2624 Segmentation fault
/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/bin/actranpy_mp
'--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872'
'--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
'--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch' '--mem=3200'
'--threads=1' '--errorlevel=FATAL' '--t_max=0.1' '--parallel=domain'
If I choose not to use the openib btl (by using --mca btl self,sm,tcp on
the command line, for instance), I don't encounter any problem and the
parallel computation runs flawlessly.
I would like to get some help to be able to:
- diagnose the issue I'm facing with the openib btl
- understand why this issue is observed only when using the openib btl
and not when using self,sm,tcp
Any help would be very much appreciated.
The output of ompi_info and of the OpenMPI configure script is attached
to this email, along with some information on the InfiniBand drivers.
Here is the command line used when launching a parallel computation
using infiniband:
path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
btl openib,sm,self,tcp --display-map --verbose --version --mca
mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
and the command line used if not using infiniband:
path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
btl self,sm,tcp --display-map --verbose --version --mca
mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
Thanks,
Eloi