Hello,

I have this piece of code:

MPI_Comm icomm;
INFO << "Accepting connection on " << portName;
MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);

and sometimes (in roughly 1 of 5 runs) I get:

[helium:33883] [[32673,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 406
[helium:33883] *** An error occurred in MPI_Comm_accept
[helium:33883] *** reported by process [2141257729,0]
[helium:33883] *** on communicator MPI_COMM_SELF
[helium:33883] *** MPI_ERR_UNKNOWN: unknown error
[helium:33883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[helium:33883] ***    and potentially your MPI job)
[helium:33883] [0] func:/usr/lib/libopen-pal.so.13(opal_backtrace_buffer+0x33) [0x7fc1ad0ac6e3]
[helium:33883] [1] func:/usr/lib/libmpi.so.12(ompi_mpi_abort+0x365) [0x7fc1af4955e5]
[helium:33883] [2] func:/usr/lib/libmpi.so.12(ompi_mpi_errors_are_fatal_comm_handler+0xe2) [0x7fc1af487e72]
[helium:33883] [3] func:/usr/lib/libmpi.so.12(ompi_errhandler_invoke+0x145) [0x7fc1af4874b5]
[helium:33883] [4] func:/usr/lib/libmpi.so.12(MPI_Comm_accept+0x262) [0x7fc1af4a90e2]
[helium:33883] [5] func:./mpiports() [0x41e43d]
[helium:33883] [6] func:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1ad7a1830]
[helium:33883] [7] func:./mpiports() [0x41b249]


Before that, I check the length of portName:

      DEBUG << "COMM ACCEPT portName.size() = " << portName.size();
      DEBUG << "MPI_MAX_PORT_NAME = " << MPI_MAX_PORT_NAME;

Both print 1024.

I am completely puzzled as to how I can get a buffer issue, unless something is
wrong with the std::string portName.

Any clues?

Launch command: mpirun -n 4 -mca opal_abort_print_stack 1 
Open MPI 1.10.2 on Ubuntu 16.

Thanks,
Florian