Looking deeper, I believe we may have a race condition in the Open MPI code. Sadly, that error message is actually irrelevant, but it causes the code to abort.
It can also be triggered by race conditions in the app, but ultimately it is something we need to clean up.

On Jun 27, 2011, at 9:29 AM, Rodrigo Oliveira wrote:

> Hi there.
>
> I am developing a server/client application using Open MPI 1.5.3. At one point
> in the server code I open a port to receive connections from a client. After
> that, I call the function MPI_Comm_accept, and on the client side I call
> MPI_Comm_connect. Sometimes I get an ORTE_ERROR_LOG, as shown below.
>
> before accept in host hydra9 port name =
> 4108386304.0;tcp://150.164.3.204:48761;tcp://192.168.63.9:48761+4108386305.0tcp://150.164.3.204:49211;tcp://192.168.63.9:49211:300
>
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_allgather.c at line 220
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 116
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 608
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file dpm_orte.c at line 379
> MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code
>
> after accept in host hydra9 error code = 17
>
> MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code
>
> The mpi_errno is 17 and I could not find a clear explanation of this
> error. It occurs sporadically. Sometimes the application works, sometimes
> it does not.
>
> Any ideas?
>
> Thanks
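Because the failure in the quoted report is sporadic, one possible stopgap on the application side is to retry the failing call a bounded number of times. What follows is only a minimal sketch in C, not something proposed in this thread: retry_call, attempt_fn, flaky_op, and flaky_state are hypothetical names, and flaky_op merely simulates an operation that returns error code 17 a few times before succeeding. Whether Open MPI 1.5.3 can actually recover after a failed MPI_Comm_accept is an assumption that would need to be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: returns 0 on success (MPI_SUCCESS is 0),
   or a nonzero error code on failure. */
typedef int (*attempt_fn)(void *ctx);

/* Call fn up to max_attempts times and stop at the first success.
   Returns 0 on success, the last error code if every attempt failed,
   or -1 if max_attempts is not positive. */
static int retry_call(attempt_fn fn, void *ctx, int max_attempts)
{
    int rc = -1;
    assert(fn != NULL);
    for (int i = 0; i < max_attempts; ++i) {
        rc = fn(ctx);
        if (rc == 0)
            return 0;
        fprintf(stderr, "attempt %d failed with code %d\n", i + 1, rc);
        /* A real application would probably pause here before retrying. */
    }
    return rc;
}

/* Hypothetical stand-in for a flaky operation: it fails with code 17
   (the value reported in the quoted message) failures_left times,
   then succeeds. calls counts how often it was invoked. */
struct flaky_state {
    int failures_left;
    int calls;
};

static int flaky_op(void *ctx)
{
    struct flaky_state *st = ctx;
    st->calls++;
    if (st->failures_left > 0) {
        st->failures_left--;
        return 17;
    }
    return 0;
}
```

In a real server, the callback would wrap MPI_Comm_accept instead of flaky_op, with the communicator's error handler set to MPI_ERRORS_RETURN so that a failure comes back as a return code rather than aborting the process. This only papers over the symptom; the underlying race would still need to be fixed.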