I'm testing a couple of applications with OpenMPI v1.2b on more than 1000 processors, and I'm getting TCP errors. The same apps ran fine on fewer processors.
The errors differ from run to run. Here's one:

[blade90][0,1,223][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
[blade82][0,1,203][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113

I've also appended the output from a second type of error, seen on another trial run. I only have a single interface, and I understand that I'm pushing the capacity of the single gigE. But I'd like to know what these errors signify.

Thanks,
Todd

-----
[blade6][0,1,10][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade309:12625] mca_btl_tcp_frag_send: writev failed with errno=104
[blade309:12625] mca_btl_tcp_frag_send: writev failed with errno=104
[blade5][0,1,9][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade134:12179] mca_btl_tcp_frag_send: writev failed with errno=104
[blade3][0,1,4][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade484][0,1,1060][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade146][0,1,400][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking]
[blade157][0,1,444][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking]
[blade212][0,1,532][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade134:12182] mca_btl_tcp_frag_send: writev failed with errno=104
recv() failed with errno=104
recv() failed with errno=104
[blade146][0,1,402][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade157][0,1,446][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade4][0,1,6][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade485][0,1,1062][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade214][0,1,534][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade146][0,1,403][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking]
[blade4][0,1,7][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade486][0,1,1063][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade157][0,1,447][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking]
[blade215][0,1,535][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
recv() failed with errno=104
recv() failed with errno=104
[blade146][0,1,401][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade157][0,1,445][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade3][0,1,5][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade485][0,1,1061][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade213][0,1,533][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade62][0,1,124][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking]
[blade71:12423] mca_btl_tcp_frag_send: writev failed with errno=104
[blade132][0,1,344][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking]
[blade389][0,1,872][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
recv() failed with errno=104
[blade132][0,1,347][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade390][0,1,873][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
recv() failed with errno=104
[blade62][0,1,125][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade62][0,1,127][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade72:12411] mca_btl_tcp_frag_send: writev failed with errno=104
[blade132][0,1,345][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade391][0,1,875][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade390][0,1,874][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade62][0,1,126][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
[blade132][0,1,346][../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:415:mca_btl_tcp_endpoint_recv_blocking] recv() failed with errno=104
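
As a first step toward reading the log output above, the numeric errno values can be translated into their symbolic names on one of the compute nodes. The following is only a minimal sketch: the helper name describe_errno is made up for illustration, and it assumes the blades run Linux, where 113 is EHOSTUNREACH ("No route to host") and 104 is ECONNRESET ("Connection reset by peer"). errno numbering is platform-specific, so the script should be run on the same OS that produced the messages.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, system message) for a numeric errno value.

    describe_errno is a hypothetical helper for this example, not part of
    Open MPI. The lookup reflects the platform the script runs on.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 113 and 104 are the errno values that appear in the log output.
    for code in (113, 104):
        name, message = describe_errno(code)
        print(f"errno={code}: {name} ({message})")
```

Read this way, the connect() failures would mean that a peer host could not be reached at connection time, and the recv()/writev failures would mean that an already established connection was reset by the other side. The second kind is often a follow-on symptom of a peer process having exited, though the log alone does not show which process failed first.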