Hi Jeff,

The patch works fine in preliminary tests with SKaMPI.

Thanks very much!
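
For readers of the archive, here is a minimal sketch of the pattern the one-line patch appears to address, assuming the crash came from com_rules[(TYPE)] being left uninitialized when an algorithm is forced. The struct and function names below are hypothetical stand-ins, not the real Open MPI types or code.

```c
#include <stddef.h>

#define N_COLLECTIVES 2   /* hypothetical number of collective types */

/* Hypothetical stand-ins for the per-communicator tuned data. */
typedef struct { int algorithm; } forced_choice_t;
typedef struct { int n_rules; } rule_set_t;

typedef struct {
    forced_choice_t user_forced[N_COLLECTIVES];
    rule_set_t     *com_rules[N_COLLECTIVES];
} coll_data_t;

/* Set up one collective type. The pointer is reset to NULL before any
 * early exit, mirroring the line added by the patch, so that the
 * "forced algorithm" path never leaves it holding a stale value. */
void init_collective(coll_data_t *data, int type,
                     int forced_algorithm, rule_set_t *file_rules)
{
    data->user_forced[type].algorithm = forced_algorithm;
    data->com_rules[type] = NULL;
    if (0 != forced_algorithm) {
        return;                      /* forced: no file rules attached */
    }
    data->com_rules[type] = file_rules;
}

/* A later lookup that is only safe when com_rules[type] is either NULL
 * or a valid pointer. */
int rule_count(const coll_data_t *data, int type)
{
    return (NULL != data->com_rules[type]) ? data->com_rules[type]->n_rules : 0;
}
```

Without the reset, a lookup such as rule_count() could follow whatever value happened to be in the slot, which would be consistent with a segmentation fault inside MPI_Bcast.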

2010/7/7 Jeff Squyres <jsquy...@cisco.com>

> I do believe that this is a bug.  I *think* that the included patch will
> fix it for you, but George is on vacation until tomorrow (and I don't know
> how long it'll take him to slog through his backlog :-( ).
>
> Can you try the following patch and see if it fixes it for you?
>
> Index: ompi/mca/coll/tuned/coll_tuned_module.c
> ===================================================================
> --- ompi/mca/coll/tuned/coll_tuned_module.c     (revision 23360)
> +++ ompi/mca/coll/tuned/coll_tuned_module.c     (working copy)
> @@ -165,6 +165,7 @@
>     {                                                                   \
>         int need_dynamic_decision = 0;                                  \
>         ompi_coll_tuned_forced_getvalues( (TYPE), &((DATA)->user_forced[(TYPE)]) ); \
> +        (DATA)->com_rules[(TYPE)] = NULL;                               \
>         if( 0 != (DATA)->user_forced[(TYPE)].algorithm ) {              \
>             need_dynamic_decision = 1;                                  \
>             EXECUTE;                                                    \
>
>
>
>
>
> On Jul 4, 2010, at 8:12 AM, Gabriele Fatigati wrote:
>
> > Dear Open MPI users,
> >
> > I'm trying to use collective dynamic rules with Open MPI 1.4.2:
> >
> > export OMPI_MCA_coll_tuned_use_dynamic_rules=1
> > export OMPI_MCA_coll_tuned_bcast_algorithm=1
> >
> > My goal is to test Bcast performance with the SKaMPI benchmark while
> > changing the dynamic rules, but at runtime I get the following error:
> >
> >
> > [node003:05871] *** Process received signal ***
> > [node003:05871] Signal: Segmentation fault (11)
> > [node003:05871] Signal code: Address not mapped (1)
> > [node003:05871] Failing at address: 0xcc
> > [node003:05872] *** Process received signal ***
> > [node003:05872] Signal: Segmentation fault (11)
> > [node003:05872] Signal code: Address not mapped (1)
> > [node003:05872] Failing at address: 0xcc
> > [node003:05871] [ 0] /lib64/libpthread.so.0 [0x3be160e4c0]
> > [node003:05871] [ 1] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2accf7210145]
> > [node003:05871] [ 2] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2accf720ef16]
> > [node003:05871] [ 3] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2accf721fec9]
> > [node003:05871] [ 4] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0(MPI_Bcast+0x171) [0x2accf71b81e1]
> > [node003:05871] [ 5] ./skampi [0x409566]
> > [node003:05871] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3be0e1d974]
> > [node003:05871] [ 7] ./skampi [0x404e19]
> > [node003:05871] *** End of error message ***
> > [node003:05872] [ 0] /lib64/libpthread.so.0 [0x3be160e4c0]
> > [node003:05872] [ 1] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2b1959eb3145]
> > [node003:05872] [ 2] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2b1959eb1f16]
> > [node003:05872] [ 3] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2b1959ec2ec9]
> > [node003:05872] [ 4] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0(MPI_Bcast+0x171) [0x2b1959e5b1e1]
> > [node003:05872] [ 5] ./skampi [0x409566]
> > [node003:05872] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3be0e1d974]
> > [node003:05872] [ 7] ./skampi [0x404e19]
> > [node003:05872] *** End of error message ***
> >
> > --------------------------------------------------------------------------
> > mpirun noticed that process rank 9 with PID 5872 on node node003ib0 exited on signal 11 (Segmentation fault).
> > --------------------------------------------------------------------------
> >
> >
> > The same happens with the other Bcast algorithms. With dynamic rules
> > disabled, it works well. Am I perhaps using a wrong parameter setup?
> >
> > Thanks in advance.
> >
> >
> >
> >
> >
> > --
> > Ing. Gabriele Fatigati
> >
> > Parallel programmer
> >
> > CINECA Systems & Tecnologies Department
> >
> > Supercomputing Group
> >
> > Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> >
> > www.cineca.it                    Tel:   +39 051 6171722
> >
> > g.fatigati [AT] cineca.it
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>


-- 
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it                    Tel:   +39 051 6171722

g.fatigati [AT] cineca.it
