Hello,
I have this piece of code:
MPI_Comm icomm;
INFO << "Accepting connection on " << portName;
MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);
and sometimes (roughly 1 in 5 runs) I get:
[helium:33883] [[32673,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 406
[helium:33883] *** An error occurred in MPI_Comm_accept
[helium:33883] *** reported by process [2141257729,0]
[helium:33883] *** on communicator MPI_COMM_SELF
[helium:33883] *** MPI_ERR_UNKNOWN: unknown error
[helium:33883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[helium:33883] *** and potentially your MPI job)
[helium:33883] [0] func:/usr/lib/libopen-pal.so.13(opal_backtrace_buffer+0x33) [0x7fc1ad0ac6e3]
[helium:33883] [1] func:/usr/lib/libmpi.so.12(ompi_mpi_abort+0x365) [0x7fc1af4955e5]
[helium:33883] [2] func:/usr/lib/libmpi.so.12(ompi_mpi_errors_are_fatal_comm_handler+0xe2) [0x7fc1af487e72]
[helium:33883] [3] func:/usr/lib/libmpi.so.12(ompi_errhandler_invoke+0x145) [0x7fc1af4874b5]
[helium:33883] [4] func:/usr/lib/libmpi.so.12(MPI_Comm_accept+0x262) [0x7fc1af4a90e2]
[helium:33883] [5] func:./mpiports() [0x41e43d]
[helium:33883] [6] func:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1ad7a1830]
[helium:33883] [7] func:./mpiports() [0x41b249]
Before that, I check the length of portName:
DEBUG << "COMM ACCEPT portName.size() = " << portName.size();
DEBUG << "MPI_MAX_PORT_NAME = " << MPI_MAX_PORT_NAME;
Both print 1024.
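To make clearer what the size check above does and does not show, here is a minimal sketch that compiles without MPI. It assumes (this is not shown in the code above) that portName was built from a whole fixed-size buffer, which is one way size() could equal MPI_MAX_PORT_NAME. The constant kMaxPortName and the helpers effectiveLength, trimAtNul and fromWholeBuffer are hypothetical names for illustration only, and nothing here is claimed to be the cause of the error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for MPI_MAX_PORT_NAME so the sketch builds without MPI.
// 1024 is the value printed by the DEBUG lines above.
constexpr std::size_t kMaxPortName = 1024;

// Length of the name up to (not including) the first NUL byte.
// This is what a consumer of portName.c_str() would see.
std::size_t effectiveLength(const std::string& portName) {
    return std::strlen(portName.c_str());
}

// Copy of portName cut at the first NUL byte.
std::string trimAtNul(const std::string& portName) {
    return std::string(portName.c_str());
}

// Hypothetical construction: copy a name into a zero-filled fixed-size
// buffer, then build the std::string from the WHOLE buffer, so the
// padding bytes become part of the string.
std::string fromWholeBuffer(const char* name) {
    assert(name != nullptr);
    char buffer[kMaxPortName] = {};                 // zero-filled
    std::strncpy(buffer, name, kMaxPortName - 1);   // keep a trailing NUL
    return std::string(buffer, kMaxPortName);       // size() == kMaxPortName
}
```

Under that assumption, size() reports the buffer size rather than the length of the actual name, so logging effectiveLength(portName) next to portName.size() would show whether the string carries padding.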
I am completely puzzled as to how I can get a buffer issue, unless something is
faulty with the std::string portName.
Any clues?
Launch command: mpirun -n 4 -mca opal_abort_print_stack 1
Open MPI 1.10.2 on Ubuntu 16.
Thanks,
Florian