Hello, I have this piece of code:

    MPI_Comm icomm;
    INFO << "Accepting connection on " << portName;
    MPI_Comm_accept(portName.c_str(), MPI_INFO_NULL, 0, MPI_COMM_SELF, &icomm);

and sometimes (in roughly 1 of 5 runs) I get:

[helium:33883] [[32673,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 406
[helium:33883] *** An error occurred in MPI_Comm_accept
[helium:33883] *** reported by process [2141257729,0]
[helium:33883] *** on communicator MPI_COMM_SELF
[helium:33883] *** MPI_ERR_UNKNOWN: unknown error
[helium:33883] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[helium:33883] *** and potentially your MPI job)
[helium:33883] [0] func:/usr/lib/libopen-pal.so.13(opal_backtrace_buffer+0x33) [0x7fc1ad0ac6e3]
[helium:33883] [1] func:/usr/lib/libmpi.so.12(ompi_mpi_abort+0x365) [0x7fc1af4955e5]
[helium:33883] [2] func:/usr/lib/libmpi.so.12(ompi_mpi_errors_are_fatal_comm_handler+0xe2) [0x7fc1af487e72]
[helium:33883] [3] func:/usr/lib/libmpi.so.12(ompi_errhandler_invoke+0x145) [0x7fc1af4874b5]
[helium:33883] [4] func:/usr/lib/libmpi.so.12(MPI_Comm_accept+0x262) [0x7fc1af4a90e2]
[helium:33883] [5] func:./mpiports() [0x41e43d]
[helium:33883] [6] func:/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1ad7a1830]
[helium:33883] [7] func:./mpiports() [0x41b249]

Before the call I log the length of portName:

    DEBUG << "COMM ACCEPT portName.size() = " << portName.size();
    DEBUG << "MPI_MAX_PORT_NAME = " << MPI_MAX_PORT_NAME;

Both print 1024. I am completely puzzled as to how I can get a buffer issue here, unless something is faulty with the std::string portName. Any clues?

Launch command: mpirun -n 4 -mca opal_abort_print_stack 1

OpenMPI 1.10.2 on Ubuntu 16.

Thanks,
Florian