Galen Shipman wrote:
We have found a potential issue with BPROC that may affect Open MPI.
Open MPI by default uses PTYs for I/O forwarding; if PTYs aren't
set up on the compute nodes, Open MPI will revert to using pipes.
Today we found a potential issue with PTYs and BPROC. A
simple reader/writer using PTYs causes the writer to hang in
uninterruptible sleep. The consistency of the process table between the
head node and the back end nodes is also affected; that is, "bps"
shows no writer process, while "bpsh NODE ps aux" shows the writer
process in uninterruptible sleep.
Since Open MPI uses PTYs by default on BPROC this results in ORTED or
MPI processes being orphaned on compute nodes. The workaround for
this issue is to configure Open MPI with --disable-pty-support and
rebuild.
The mpirun manual says that standard input is redirected from /dev/null,
and that standard output of remote nodes will be attached to the node
that invoked mpirun. If this is all caused by some buglet with BPROC
I/O forwarding, perhaps it would help if the slave nodes were invoked
with the equivalent of "bpsh -N"? I wonder if some people see the
problem and others don't depending on stdout (or its absence) from
different applications?
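
For anyone who wants to check whether a given node shows the PTY behavior at all, here is a minimal reader/writer sketch. It is not the test Galen used (that was not posted); the function name pty_roundtrip, the "ping" payload, and the 5 second timeout are my own assumptions. A forked child writes to the slave side of a PTY while the parent reads from the master and then polls for the child, rather than blocking on it, since a hung writer is exactly what we are looking for.

```python
import os
import pty
import select
import time


def pty_roundtrip(payload=b"ping", timeout=5.0):
    """Return (bytes read from the PTY master, True if the writer exited in time).

    Hypothetical helper for illustration only; not part of Open MPI or BPROC.
    """
    master_fd, slave_fd = pty.openpty()
    pid = os.fork()
    if pid == 0:
        # Child: the writer. Write to the slave side, then exit immediately.
        os.close(master_fd)
        try:
            os.write(slave_fd, payload)
        finally:
            os._exit(0)

    # Parent: the reader. The slave fd stays open here until the read is
    # done, so the master does not report a hangup when the child exits.
    deadline = time.monotonic() + timeout
    data = b""
    while len(data) < len(payload):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        ready, _, _ = select.select([master_fd], [], [], remaining)
        if not ready:
            break
        data += os.read(master_fd, 1024)

    # Poll for the writer instead of blocking in waitpid, because a writer
    # that never exits is the failure being looked for.
    exited = False
    while True:
        done_pid, _ = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            exited = True
            break
        if time.monotonic() >= deadline:
            break
        time.sleep(0.05)

    os.close(slave_fd)
    os.close(master_fd)
    return data, exited


if __name__ == "__main__":
    data, exited = pty_roundtrip()
    print("read:", data, "writer exited:", exited)
```

The payload deliberately has no newline, so the PTY's default output processing does not rewrite it. If the script is run on a compute node (for example through bpsh) and reports that the writer did not exit, that would be consistent with the hang described above, though it would not by itself show that BPROC is the cause.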