> On Jul 16, 2018, at 8:34 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> 
> wrote:
> 
>> On Jul 14, 2018, at 1:31 AM, Nathan Hjelm via users 
>> <users@lists.open-mpi.org> wrote:
>> 
>> Please give master a try. This looks like another signature of running out 
>> of space for shared memory buffers.
> 
> Sorry, I wasn’t explicit on this point - I’m already using master, 
> specifically
> openmpi-master-201807120327-34bc777.tar.gz

And a bit more data on the stack traces, since the problem is 
non-deterministic.  I’ve run 30 sets of 10 iterations of the code, and 8 
crashed.  In every case the final part of the stack trace was:
Program terminated with signal 6, Aborted.
#0  0x0000003f5a432495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
#0  0x0000003f5a432495 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003f5a433bfd in abort () at abort.c:121
#2  0x0000000002a3985e in for__issue_diagnostic ()
#3  0x0000000002a40786 in for__signal_handler ()
#4  <signal handler called>
#5  0x00002ae37088f029 in mca_btl_vader_check_fboxes () at btl_vader_fbox.h:208
#6  0x00002ae37089162e in mca_btl_vader_component_progress () at btl_vader_component.c:724
#7  0x00002ae35fa41311 in opal_progress () at runtime/opal_progress.c:229
#8  0x00002ae3724a11b7 in ompi_request_wait_completion (req=0xd2a4700) at ../../../../ompi/request/request.h:415
with some variation in the routines that lead to this point.  In all cases the 
MPI call was some all-to-all routine: all but one were “ompi_allreduce_f”, and 
one was “ompi_alltoallv_z”.

I can of course post all 8 stack traces if that’s useful.

                                                                        Noam

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
