Hi Devel list, crossposting as this is getting weird...
Alfonso wrote a client/server pair using MPI_Publish_name / MPI_Lookup_name. It runs fine on both MPICH2 and LAM-MPI but fails on Open MPI. It doesn't fail gracefully (i.e. by returning an error code); it aborts execution and the client quits. The server keeps running after the client crashes. The server also uses 100% of the CPU while running, which doesn't happen with LAM.

The code is here: http://www.systemcall.com.br/rengolin/open-mpi/

Open MPI version: 1.1.1

Compiling:

  mpiCC -o server server.c
  mpiCC -o client client.c
- or -
  mpiCC -o client client.c -DUSE_LOOKUP

Running & Output:

-- Server --
sbornia$ mpiexec server foo
server Process Rank 0 ,TOT processes 1 on sbornia
Server foo available at 0.1.0:2000

-- Client without USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 171
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 145
[sbornia:06246] *** An error occurred in MPI_Comm_connect
[sbornia:06246] *** on communicator MPI_COMM_WORLD
[sbornia:06246] *** MPI_ERR_UNKNOWN: unknown error
[sbornia:06246] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06243] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

-- Client with USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06232] *** An error occurred in MPI_Lookup_name
[sbornia:06232] *** on communicator MPI_COMM_WORLD
[sbornia:06232] *** MPI_ERR_NAME: invalid name argument
[sbornia:06232] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06229] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104

(OS error code 104: Connection reset by peer)

What are we doing wrong?

Thanks in advance!
--renato