Just to summarize for the list.  With Jeff's prodding I got it generating core
files against the debug (and mem-debug) build of Open MPI, and below is the kind
of stack trace I'm getting from gdb.  The trace looks slightly different with an
alternative implementation that doesn't use MPI_IN_PLACE, but it is nearly the
same.  The array being summed is not large, only 3776 doubles.  (A minimal sketch
of the call pattern is included after the trace.)


#0  0x0000003160a32495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003160a33bfd in abort () at abort.c:121
#2  0x0000000002a3903e in for__issue_diagnostic ()
#3  0x0000000002a3ff66 in for__signal_handler ()
#4  <signal handler called>
#5  0x00002b67a4217029 in mca_btl_vader_check_fboxes () at btl_vader_fbox.h:208
#6  0x00002b67a421962e in mca_btl_vader_component_progress () at btl_vader_component.c:724
#7  0x00002b67934fd311 in opal_progress () at runtime/opal_progress.c:229
#8  0x00002b6792e2f0df in ompi_request_wait_completion (req=0xe863600) at ../ompi/request/request.h:415
#9  0x00002b6792e2f122 in ompi_request_default_wait (req_ptr=0x7ffebdbb8c20, status=0x0) at request/req_wait.c:42
#10 0x00002b6792ed7d5a in ompi_coll_base_allreduce_intra_ring (sbuf=0x1, rbuf=0xeb79ca0, count=3776, dtype=0x2b679317dd40, op=0x2b6793192380, comm=0xe14c9c0, module=0xe14f8b0) at base/coll_base_allreduce.c:460
#11 0x00002b67a6ccb3e2 in ompi_coll_tuned_allreduce_intra_dec_fixed (sbuf=0x1, rbuf=0xeb79ca0, count=3776, dtype=0x2b679317dd40, op=0x2b6793192380, comm=0xe14c9c0, module=0xe14f8b0) at coll_tuned_decision_fixed.c:74
#12 0x00002b6792e4d9b0 in PMPI_Allreduce (sendbuf=0x1, recvbuf=0xeb79ca0, count=3776, datatype=0x2b679317dd40, op=0x2b6793192380, comm=0xe14c9c0) at pallreduce.c:113
#13 0x00002b6792bb6287 in ompi_allreduce_f (sendbuf=0x1 <Address 0x1 out of bounds>, recvbuf=0xeb79ca0 "\310,&AYI\257\276\031\372\214\223\270-y>\207\066\226\003W\f\240\276\334'}\225\376\336\277>\227§\231", count=0x7ffebdbbc4d4, datatype=0x2b48f5c, op=0x2b48f60, comm=0x5a0ae60, ierr=0x7ffebdbb8f60) at pallreduce_f.c:87
#14 0x000000000042991b in m_sumb_d (comm=..., vec=..., n=Cannot access memory at address 0x928) at mpi.F:870
#15 m_sum_d (comm=..., vec=..., n=Cannot access memory at address 0x928) at mpi.F:3184
#16 0x0000000001b22b83 in david::eddav (hamiltonian=..., p=Cannot access memory at address 0x1) at davidson.F:779
#17 0x0000000001c6ef0e in elmin (hamiltonian=..., kineden=Cannot access memory at address 0x19) at electron.F:424
#18 0x0000000002a108b2 in electronic_optimization () at main.F:4783
#19 0x00000000029ec5d3 in vamp () at main.F:2800
#20 0x00000000004100de in main ()
#21 0x0000003160a1ed1d in __libc_start_main (main=0x4100b0 <main>, argc=1, ubp_av=0x7ffebdbc5e38, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7ffebdbc5e28) at libc-start.c:226
#22 0x000000000040ffe9 in _start ()
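
For reference, here is a minimal C sketch (illustrative only, not the actual
routine in mpi.F) of the in-place reduction pattern that shows up in frames
#12/#13.  The sendbuf=0x1 there is consistent with MPI_IN_PLACE, which Open MPI
defines as the sentinel address 1, and count=3776 matches the array size
mentioned above.

/*
 * Minimal sketch of an in-place MPI_Allreduce of 3776 doubles.
 * Illustrative only; buffer and names are not taken from the real code.
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int count = 3776;                       /* size reported in the trace */
    double *vec = calloc(count, sizeof(double));  /* buffer is summed in place */

    /* Each rank contributes vec and receives the element-wise sum back in vec. */
    MPI_Allreduce(MPI_IN_PLACE, vec, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    free(vec);
    MPI_Finalize();
    return 0;
}

If it helps to reproduce, building this with mpicc and running several ranks on
a single node should go through the same shared-memory (vader) progress path
that appears in frames #5/#6.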
