Greetings!!!

I am observing a crash in an MPI_Allreduce() call from my actual application.
After debugging, I found that for MPI_Allreduce() with MPI_DOUBLE_PRECISION,
the following code in op.h calls through a NULL function pointer:

if (0 != (op->o_flags & OMPI_OP_FLAGS_INTRINSIC)) {
       op->o_func.intrinsic.fns[ompi_op_ddt_map[dtype->id]](source, target,
                                                            &count, &dtype,
                                                            op->o_func.intrinsic.modules[ompi_op_ddt_map[dtype->id]]);

where o_func.intrinsic.fns[27] is NULL (points to 0).
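
To illustrate the failure mode, here is a minimal standalone sketch of a
function-pointer table with a check for an empty slot before dispatch. It is
not Open MPI code: reduce_fn, op_table_t, NUM_SLOTS, sum_double and
safe_reduce are hypothetical names made up for this example, and the table
size is an assumption. Only the idea of an unset slot (such as index 27
above) comes from my debugging session.

```c
#include <stddef.h>

#define NUM_SLOTS 32   /* hypothetical table size, not Open MPI's value */

/* Hypothetical reduction kernel signature: combine src into dst. */
typedef void (*reduce_fn)(const void *src, void *dst, int count);

/* Hypothetical operation: one kernel slot per supported datatype. */
typedef struct {
    reduce_fn fns[NUM_SLOTS];
} op_table_t;

/* Example kernel: element-wise sum of doubles. */
static void sum_double(const void *src, void *dst, int count)
{
    const double *s = (const double *)src;
    double *d = (double *)dst;
    for (int i = 0; i < count; ++i) {
        d[i] += s[i];
    }
}

/*
 * Dispatch through the table. Returns 0 on success, or -1 when the slot
 * is out of range or was never filled in, instead of jumping to address 0.
 */
static int safe_reduce(const op_table_t *op, int slot,
                       const void *src, void *dst, int count)
{
    if (slot < 0 || slot >= NUM_SLOTS || NULL == op->fns[slot]) {
        return -1;
    }
    op->fns[slot](src, dst, count);
    return 0;
}
```

With a table in which only one slot is set to sum_double, safe_reduce()
on that slot performs the sum, while safe_reduce() on slot 27 returns -1
rather than crashing. In my case the real table has no such check, so the
call goes straight through the NULL pointer.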

On further debugging, I found that the call comes from
mca_coll_basic_reduce_lin_intra(); see the trace below:

>      libmpid.dll!ompi_op_reduce(ompi_op_t * op, void * source, void * target, int count, ompi_datatype_t * dtype)  Line 500  C++
       libmpid.dll!mca_coll_basic_reduce_lin_intra(void * sbuf, void * rbuf, int count, ompi_datatype_t * dtype, ompi_op_t * op, int root, ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)  Line 249  C++
       libmpid.dll!mca_coll_sync_reduce(void * sbuf, void * rbuf, int count, ompi_datatype_t * dtype, ompi_op_t * op, int root, ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)  Line 45 + 0xd4 bytes  C++
       libmpid.dll!mca_coll_basic_allreduce_intra(void * sbuf, void * rbuf, int count, ompi_datatype_t * dtype, ompi_op_t * op, ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)  Line 57 + 0x58 bytes  C++
       libmpid.dll!MPI_Allreduce(void * sendbuf, void * recvbuf, int count, ompi_datatype_t * datatype, ompi_op_t * op, ompi_communicator_t * comm)  Line 107 + 0x5c bytes  C++
       libmpi_f77d.dll!mpi_allreduce_f(char * sendbuf, char * recvbuf, int * count, int * datatype, int * op, int * comm, int * ierr)  Line 79 + 0x34 bytes  C++
       libmpi_f77d.dll!MPI_ALLREDUCE(char * sendbuf, char * recvbuf, int * count, int * datatype, int * op, int * comm, int * ierr)  Line 53 + 0x67 bytes  C++


To reproduce this problem I wrote the attached test program. It works fine,
but I observed a completely different call stack (see attached images).

For information, I am running my application with the following command:
c:/openmpi/bin/orterun -mca mca_component_show_load_errors 0 --prefix
... -x ... -x ...  --machinefile ... -np 2 myApplication

And the test program with the following command:
c:/openmpi/bin/mpirun mar_f_dp.exe


Please let me know the criteria by which "coll_reduce" is set to point to
mca_coll_basic_allreduce_intra() or mca_coll_self_allreduce_intra();
this would help me debug my application further.

Thank you in advance.
-Hiral
