You can eliminate the "[n17:30019] odls_bproc: openpty failed, using pipes instead" message by configuring OMPI with the --disable-pty-support flag; there is a bug in BProc that causes the openpty call to fail.
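
For example, something along these lines at configure time (the install prefix here is just a placeholder; keep whatever other options you normally build with):

        ./configure --prefix=/opt/openmpi-1.2.1 --disable-pty-support   # prefix is just an example
        make all install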

-david
--
David Gunter
HPC-4: HPC Environments: Parallel Tools Team
Los Alamos National Laboratory


On Apr 26, 2007, at 2:06 PM, Daniel Gruner wrote:

Hi

I have been testing OpenMPI 1.2, and now 1.2.1, on several BProc-
based clusters, and I have found some problems/issues.  All my
clusters have standard ethernet interconnects, either 100Base/T or
Gigabit, on standard switches.

The clusters are all running Clustermatic 5 (BProc 4.x), and range
from 32-bit Athlon, to 32-bit Xeon, to 64-bit Opteron.  In all cases
the same problems occur, identically.  I attach here the results
from "ompi_info --all" and the config.log, for my latest build on
an Opteron cluster, using the Pathscale compilers.  I had exactly
the same problems when using the vanilla GNU compilers.

Now for a description of the problem:

When running an MPI code (cpi.c, from the standard MPI examples, also
attached), using the mpirun defaults (e.g. -byslot), with a single
process:

        sonoma:dgruner{134}> mpirun -n 1 ./cpip
        [n17:30019] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        pi is approximately 3.1415926544231341, Error is 0.0000000008333410
        wall clock time = 0.000199
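
For reference, cpi.c is essentially the standard MPI pi example; a rough sketch of what it does is below (the attached version may differ in minor details, such as how the number of intervals is chosen):

        #include <stdio.h>
        #include <math.h>
        #include "mpi.h"

        /* Sketch of the standard cpi example: each rank integrates
         * 4/(1+x^2) over part of [0,1] and rank 0 sums the pieces. */
        int main(int argc, char *argv[])
        {
            int n = 10000;   /* interval count; the attached version may set this differently */
            int myid, numprocs, i, namelen;
            double PI25DT = 3.141592653589793238462643;
            double mypi, pi, h, sum, x, start = 0.0;
            char name[MPI_MAX_PROCESSOR_NAME];

            MPI_Init(&argc, &argv);
            MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
            MPI_Comm_rank(MPI_COMM_WORLD, &myid);
            MPI_Get_processor_name(name, &namelen);
            printf("Process %d on %s\n", myid, name);

            if (myid == 0)
                start = MPI_Wtime();
            MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

            /* midpoint-rule integration, strided across the ranks */
            h = 1.0 / (double) n;
            sum = 0.0;
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * ((double) i - 0.5);
                sum += 4.0 / (1.0 + x * x);
            }
            mypi = h * sum;

            MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (myid == 0) {
                printf("pi is approximately %.16f, Error is %.16f\n",
                       pi, fabs(pi - PI25DT));
                printf("wall clock time = %f\n", MPI_Wtime() - start);
            }
            MPI_Finalize();
            return 0;
        }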

However, if one tries to run more than one process, this bombs:

        sonoma:dgruner{134}> mpirun -n 2 ./cpip
        .
        .
        .
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        .
        . ad infinitum

If one uses the option "-bynode", things work:

        sonoma:dgruner{145}> mpirun -bynode -n 2 ./cpip
        [n17:30055] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        Process 1 on n21
        pi is approximately 3.1415926544231318, Error is 0.0000000008333387
        wall clock time = 0.010375


Note that there is always the message about "openpty failed, using pipes instead".
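
My guess is that the message comes from a fallback of roughly this shape in the launcher's I/O setup (a sketch only, not the actual odls_bproc source): try to allocate a pseudo-terminal for the remote process's stdio, and fall back to a plain pipe if that fails.

        #include <stdio.h>
        #include <pty.h>      /* openpty(); link with -lutil */
        #include <unistd.h>   /* pipe() */

        /* Sketch of an openpty-with-pipe-fallback for a child's stdout:
         * not the actual odls_bproc code, just the pattern the message
         * implies.  The function name is hypothetical. */
        static int setup_stdout(int *parent_end, int *child_end)
        {
            int amaster, aslave, fds[2];

            if (openpty(&amaster, &aslave, NULL, NULL, NULL) == 0) {
                *parent_end = amaster;  /* daemon reads the pty master  */
                *child_end  = aslave;   /* app writes to the pty slave  */
                return 0;
            }

            /* openpty failed (e.g. no pty devices on the node) */
            fprintf(stderr, "odls_bproc: openpty failed, using pipes instead\n");
            if (pipe(fds) != 0)
                return -1;
            *parent_end = fds[0];       /* read end for the daemon       */
            *child_end  = fds[1];       /* write end for the application */
            return 0;
        }

Presumably this only changes how the processes' stdout/stderr get forwarded (a pipe is block-buffered rather than line-buffered), which is part of why I wonder below whether it matters.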

If I run more processes (on my 3-node cluster, with 2 cpus per node), the
openpty message appears repeatedly for the first node:

        sonoma:dgruner{146}> mpirun -bynode -n 6 ./cpip
        [n17:30061] odls_bproc: openpty failed, using pipes instead
        [n17:30061] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        Process 2 on n49
        Process 1 on n21
        Process 5 on n49
        Process 3 on n17
        Process 4 on n21
        pi is approximately 3.1415926544231239, Error is 0.0000000008333307
        wall clock time = 0.050332


Should I worry about the openpty failure?  I suspect that communications
may be slower when it falls back to pipes.  Using the -byslot option
always fails, so this looks like a bug.  The same thing happens with
every code I have tried, both simple and complex.

Thanks for your attention to this.
Regards,
Daniel
--

Dr. Daniel Gruner                        dgru...@chem.utoronto.ca
Dept. of Chemistry                       daniel.gru...@utoronto.ca
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key
<cpi.c.gz>
<config.log.gz>
<ompiinfo.gz>
