How are you passing the port info between the server and client? You're hitting 
a race condition between the two sides.
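
If the port name is handed over through a shared file (an assumption; your post does not say how it is passed), the client may read the file before the server has finished writing it, or pick up a stale name left by an earlier run. Below is a minimal sketch of one way to close that window, using plain POSIX calls. The helper names publish_port and wait_for_port are hypothetical, and the string passed in is assumed to be whatever MPI_Open_port returned on the server.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Server side (hypothetical helper): publish the port name atomically.
 * The name is written to a temporary file that is then rename()d into
 * place, so a reader sees either no file or the complete name. */
static int publish_port(const char *path, const char *port_name)
{
    char tmp[4096];
    assert(path != NULL && port_name != NULL);

    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || n >= (int)sizeof tmp)
        return -1;

    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(port_name, f) >= 0;
    ok = (fclose(f) == 0) && ok;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Client side (hypothetical helper): poll once per second until the file
 * exists, then read the name into buf. Returns 0 on success, -1 if the
 * file never appeared within max_tries attempts. */
static int wait_for_port(const char *path, char *buf, size_t buflen,
                         int max_tries)
{
    assert(path != NULL && buf != NULL && buflen > 0);
    for (int i = 0; i < max_tries; i++) {
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            char *got = fgets(buf, (int)buflen, f);
            fclose(f);
            if (got != NULL) {
                buf[strcspn(buf, "\n")] = '\0'; /* drop a trailing newline */
                return 0;
            }
        }
        sleep(1);
    }
    return -1;
}
```

In this sketch the server would delete any old file at startup, call MPI_Open_port, call publish_port, and only then enter MPI_Comm_accept. The client would call wait_for_port with a buffer of at least MPI_MAX_PORT_NAME bytes before calling MPI_Comm_connect. MPI_Publish_name and MPI_Lookup_name are the MPI-level alternative to a file, if a name server is available in your setup.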

On Jun 27, 2011, at 9:29 AM, Rodrigo Oliveira wrote:

> Hi there.
> I am developing a server/client application using Open MPI 1.5.3. At one
> point in the server code I open a port to receive connections from a client.
> After that, I call MPI_Comm_accept on the server side and MPI_Comm_connect
> on the client side. Sometimes I get an ORTE_ERROR_LOG, as shown below.
> before accept in host hydra9 port name = 4108386304.0;tcp://150.164.3.204:48761;tcp://192.168.63.9:48761+4108386305.0tcp://150.164.3.204:49211;tcp://192.168.63.9:49211:300
>                                              
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file 
> base/grpcomm_base_allgather.c at line 220              
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file 
> base/grpcomm_base_modex.c at line 116                  
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file 
> grpcomm_bad_module.c at line 608                       
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file dpm_orte.c at 
> line 379                                 
> MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code   
>                                         
> after accept in host hydra9 error code = 17                                   
>                                         
> MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code
> The mpi_errno is 17 and I could not find a clear explanation of this
> error. It occurs sporadically: sometimes the application works, sometimes
> it does not.
> 
> Any ideas?
> 
> Thanks
