Thanks to both you and David Gunter.  I disabled pty support and
it now works.  

There is still the issue of the mpirun default being "-byslot", which
triggers the endless "OOB: Connection to HNP lost" failures shown below.
Only with "-bynode" do things work properly.
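
For reference, the combination that now works for me looks like this
(other configure arguments omitted; the process count is just an
example):

    ./configure --disable-pty-support ...
    make all install

    mpirun -bynode -n 2 ./cpip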

Daniel

On Thu, Apr 26, 2007 at 02:28:33PM -0600, gshipman wrote:
> There is a known issue on BProc 4 w.r.t. pty support. Open MPI by  
> default will try to use ptys for I/O forwarding but will revert to  
> pipes if ptys are not available.
> 
> You can "safely" ignore the pty warnings, or you may want to rerun  
> configure and add:
> --disable-pty-support
> 
> I say "safely" because my understanding is that some I/O data may be  
> lost if pipes are used during abnormal termination.
> 
> Alternatively, you might try getting pty support working; you would  
> need to configure ptys on the backend nodes.
> You can then use the following code to test whether it is working  
> correctly.  If this fails (it does on our BProc 4 cluster), you  
> shouldn't use ptys on BProc.
> 
> 
> #include <pty.h>
> #include <utmp.h>
> #include <stdio.h>
> #include <string.h>
> #include <errno.h>
> 
> int
> main(int argc, char *argv[])
> {
>    int amaster, aslave;
> 
>    /* Attempt to allocate a master/slave pseudo-terminal pair. */
>    if (openpty(&amaster, &aslave, NULL, NULL, NULL) < 0) {
>      printf("openpty() failed with errno = %d, %s\n",
>             errno, strerror(errno));
>    } else {
>      printf("openpty() succeeded\n");
>    }
> 
>    return 0;
> }
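> 
> To build the test, note that openpty() lives in libutil on Linux, so  
> something like this should work (the file name is just an example):
> 
>    gcc -o ptytest ptytest.c -lutil
>    ./ptytest
> 
> To exercise the backend nodes, you could run the same binary through  
> bpsh, e.g. "bpsh 17 ./ptytest" (the node number is just an example).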
> 
> 
> 
> 
> 
> 
> On Apr 26, 2007, at 2:06 PM, Daniel Gruner wrote:
> 
> > Hi
> >
> > I have been testing Open MPI 1.2, and now 1.2.1, on several BProc-
> > based clusters, and I have found some problems/issues.  All my
> > clusters have standard Ethernet interconnects, either 100Base-T or
> > Gigabit, on standard switches.
> >
> > The clusters are all running Clustermatic 5 (BProc 4.x), and range
> > from 32-bit Athlon, to 32-bit Xeon, to 64-bit Opteron.  In all cases
> > the same problems occur, identically.  I attach here the results
> > from "ompi_info --all" and the config.log, for my latest build on
> > an Opteron cluster, using the Pathscale compilers.  I had exactly
> > the same problems when using the vanilla GNU compilers.
> >
> > Now for a description of the problem:
> >
> > When running an MPI code (cpi.c, from the standard MPI examples, also
> > attached), using the mpirun defaults (i.e. -byslot), with a single
> > process:
> >
> >     sonoma:dgruner{134}> mpirun -n 1 ./cpip
> >     [n17:30019] odls_bproc: openpty failed, using pipes instead
> >     Process 0 on n17
> >     pi is approximately 3.1415926544231341, Error is 0.0000000008333410
> >     wall clock time = 0.000199
> >
> > However, if one tries to run more than one process, this bombs:
> >
> >     sonoma:dgruner{134}> mpirun -n 2 ./cpip
> >     .
> >     .
> >     .
> >     [n21:30029] OOB: Connection to HNP lost
> >     [n21:30029] OOB: Connection to HNP lost
> >     [n21:30029] OOB: Connection to HNP lost
> >     [n21:30029] OOB: Connection to HNP lost
> >     [n21:30029] OOB: Connection to HNP lost
> >     [n21:30029] OOB: Connection to HNP lost
> >     .
> >     . ad infinitum
> >
> > If one uses the option "-bynode", things work:
> >
> >     sonoma:dgruner{145}> mpirun -bynode -n 2 ./cpip
> >     [n17:30055] odls_bproc: openpty failed, using pipes instead
> >     Process 0 on n17
> >     Process 1 on n21
> >     pi is approximately 3.1415926544231318, Error is 0.0000000008333387
> >     wall clock time = 0.010375
> >
> >
> > Note that there is always the message about "openpty failed, using  
> > pipes instead".
> >
> > If I run more processes (on my 3-node cluster, with 2 cpus per  
> > node), the
> > openpty message appears repeatedly for the first node:
> >
> >     sonoma:dgruner{146}> mpirun -bynode -n 6 ./cpip
> >     [n17:30061] odls_bproc: openpty failed, using pipes instead
> >     [n17:30061] odls_bproc: openpty failed, using pipes instead
> >     Process 0 on n17
> >     Process 2 on n49
> >     Process 1 on n21
> >     Process 5 on n49
> >     Process 3 on n17
> >     Process 4 on n21
> >     pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> >     wall clock time = 0.050332
> >
> >
> > Should I worry about the openpty failure?  I suspect that  
> > communications
> > may be slower this way.  Using the -byslot option always fails, so  
> > this
> > is a bug.  The same occurs for all the codes that I have tried,  
> > both simple
> > and complex.
> >
> > Thanks for your attention to this.
> > Regards,
> > Daniel
> > -- 
> >
> > Dr. Daniel Gruner                        dgru...@chem.utoronto.ca
> > Dept. of Chemistry                       daniel.gru...@utoronto.ca
> > University of Toronto                    phone:  (416)-978-8689
> > 80 St. George Street                     fax:    (416)-978-5325
> > Toronto, ON  M5S 3H6, Canada             finger for PGP public key
> > <cpi.c.gz>
> > <config.log.gz>
> > <ompiinfo.gz>
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 

Dr. Daniel Gruner                        dgru...@chem.utoronto.ca
Dept. of Chemistry                       daniel.gru...@utoronto.ca
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key
