Hi,
I have just compiled openmpi-4.0.1 with --with-sge so that it works with the
Univa Grid Engine installation on our cluster.
I tried the basic hello world C program, in which each worker prints its rank
and the world size to stdout and then exits.
This works fine, and I have run it on 64 nodes with no issues.
I then moved on to a more complex test program, in which each worker computes
its share of the sum from 1 to N and sends its partial sum to rank 0, which
collects all the results using MPI_Reduce().
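The work split described above can be sketched in plain C. This is a minimal serial sketch under stated assumptions, not the test program from this post: the helper names rank_partial_sum() and simulated_reduce_sum() are hypothetical, and the block decomposition (each rank owns a contiguous chunk of 1..N, with the first N % size ranks taking one extra term) is one common choice, assumed here. In the real MPI program each rank would compute only its own partial sum, and MPI_Reduce() with MPI_SUM would combine them on rank 0; here a plain loop stands in for that reduction so the arithmetic can be checked without an MPI installation.

```c
#include <assert.h>

/* Hypothetical helper: sum of the contiguous chunk of 1..n owned by `rank`
 * when the range is split as evenly as possible over `size` ranks.
 * The first (n % size) ranks each take one extra term. */
long long rank_partial_sum(long long n, int rank, int size)
{
    assert(n >= 0 && size > 0 && rank >= 0 && rank < size);

    long long chunk = n / size;               /* base number of terms per rank */
    long long rem   = n % size;               /* ranks below this get one extra */
    long long extra = rank < rem ? rank : rem; /* extra terms owned by lower ranks */

    long long start = (long long)rank * chunk + extra + 1;
    long long count = chunk + (rank < rem ? 1 : 0);
    long long end   = start + count - 1;

    /* Arithmetic series start..end; count * (start + end) is always even. */
    return count * (start + end) / 2;
}

/* Serial stand-in for MPI_Reduce(..., MPI_SUM, 0, ...): add up every
 * rank's partial sum, as rank 0 would receive it. */
long long simulated_reduce_sum(long long n, int size)
{
    long long total = 0;
    for (int rank = 0; rank < size; ++rank)
        total += rank_partial_sum(n, rank, size);
    return total;
}
```

Whatever the number of ranks, the combined result should equal N(N+1)/2; for example, simulated_reduce_sum(100, 4) gives 5050.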
Now that the workers in the program communicate with each other, it fails to
run.
I get errors such as the following:
  WARNING: Open MPI accepted a TCP connection from what appears to be a
  another Open MPI process but cannot find a corresponding process
  entry for that peer.
  This attempted connection will be ignored; your MPI job may or may not
  continue properly.
    Local host: node-hp0409
    PID: 58849
I have searched for this error, but as far as I can tell nothing relevant to
this issue turns up.
Does anyone have an idea what might be going on, and how it could be fixed?
Our nodes are running Scientific Linux release 7.2.
Regards,
Emyr James
___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users