Hi Devel list, crossposting as this is getting weird...

Alfonso wrote a client/server using MPI_Publish_name / MPI_Lookup_name.
It runs fine on both MPICH2 and LAM-MPI but fails on Open MPI. It's
not a simple failure (i.e. returning an error code): the client aborts
execution and quits. The server continues to run after the
client's crash.

The server also uses 100% of the CPU while running, which doesn't happen with LAM.

The code is here:
http://www.systemcall.com.br/rengolin/open-mpi/

Open MPI version: 1.1.1

Compiling:
mpiCC -o server server.c
mpiCC -o client client.c
- or -
mpiCC -o client client.c -DUSE_LOOKUP

Running & Output:
-- Server --
sbornia$ mpiexec server foo
server Process Rank 0 ,TOT processes 1 on sbornia
Server foo available at 0.1.0:2000


-- Client without USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 171
[sbornia:06246] [0,1,0] ORTE_ERROR_LOG: Pack data mismatch in file dss/dss_unpack.c at line 145
[sbornia:06246] *** An error occurred in MPI_Comm_connect
[sbornia:06246] *** on communicator MPI_COMM_WORLD
[sbornia:06246] *** MPI_ERR_UNKNOWN: unknown error
[sbornia:06246] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06243] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104


-- Client with USE_LOOKUP --
sbornia$ mpiexec client foo
Rank Client Process 0 ,TOT processes 1 on sbornia
[sbornia:06232] *** An error occurred in MPI_Lookup_name
[sbornia:06232] *** on communicator MPI_COMM_WORLD
[sbornia:06232] *** MPI_ERR_NAME: invalid name argument
[sbornia:06232] *** MPI_ERRORS_ARE_FATAL (goodbye)
[sbornia:06229] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv: readv failed with errno=104


OS error code 104:  Connection reset by peer
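For anyone decoding the errno on another system: 104 is ECONNRESET on Linux, but the number differs between platforms, so it is safer to compare against the symbolic constant. A small sketch, with `is_conn_reset` and `print_errno` as hypothetical helper names:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if err is this platform's "connection reset by peer" code. */
int is_conn_reset(int err)
{
    return err == ECONNRESET;
}

/* Print an errno value together with the C library's description of it. */
void print_errno(int err)
{
    printf("errno %d: %s\n", err, strerror(err));
}
```

Calling `print_errno(ECONNRESET)` prints the local number and message for the same condition that readv reported above.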

What are we doing wrong?

Thanks in advance!
--renato
