Thanks to both you and David Gunter. I disabled pty support and it now works.
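For the archives, the rebuild amounted to something like the following
(our usual configure options elided; adjust for your own tree):

    ./configure --disable-pty-support [...other options as before...]
    make all install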
There is still the issue of the mpirun default being "-byslot", which
causes all kinds of trouble. Only by using "-bynode" do things work
properly.

Daniel

On Thu, Apr 26, 2007 at 02:28:33PM -0600, gshipman wrote:
> There is a known issue on BProc 4 w.r.t. pty support. Open MPI by
> default will try to use ptys for I/O forwarding, but will revert to
> pipes if ptys are not available.
>
> You can "safely" ignore the pty warnings, or you may want to rerun
> configure and add:
>
>     --disable-pty-support
>
> I say "safely" because my understanding is that some I/O data may be
> lost if pipes are used during abnormal termination.
>
> Alternatively, you might try getting pty support working; you need to
> configure ptys on the backend nodes. You can then use the following
> code to test whether they work correctly. If it fails (it does on our
> BProc 4 cluster), you shouldn't use ptys on BProc.
>
> #include <pty.h>     /* openpty(); link with -lutil on Linux */
> #include <utmp.h>
> #include <stdio.h>
> #include <string.h>
> #include <errno.h>
>
> int
> main(int argc, char *argv[])
> {
>     int amaster, aslave;
>
>     /* try to allocate a master/slave pty pair */
>     if (openpty(&amaster, &aslave, NULL, NULL, NULL) < 0) {
>         printf("openpty() failed with errno = %d, %s\n",
>                errno, strerror(errno));
>     } else {
>         printf("openpty() succeeded\n");
>     }
>
>     return 0;
> }
>
> On Apr 26, 2007, at 2:06 PM, Daniel Gruner wrote:
>
> > Hi
> >
> > I have been testing Open MPI 1.2, and now 1.2.1, on several
> > BProc-based clusters, and I have found some problems/issues. All my
> > clusters have standard ethernet interconnects, either 100Base-T or
> > Gigabit, on standard switches.
> >
> > The clusters are all running Clustermatic 5 (BProc 4.x), and range
> > from 32-bit Athlon, to 32-bit Xeon, to 64-bit Opteron. In all cases
> > the same problems occur, identically. I attach here the results
> > from "ompi_info --all" and the config.log for my latest build, on
> > an Opteron cluster using the Pathscale compilers. I had exactly the
> > same problems when using the vanilla GNU compilers.
> >
> > Now for a description of the problem:
> >
> > When running an MPI code (cpi.c, from the standard MPI examples,
> > also attached) using the mpirun default (-byslot), with a single
> > process:
> >
> > sonoma:dgruner{134}> mpirun -n 1 ./cpip
> > [n17:30019] odls_bproc: openpty failed, using pipes instead
> > Process 0 on n17
> > pi is approximately 3.1415926544231341, Error is 0.0000000008333410
> > wall clock time = 0.000199
> >
> > However, if one tries to run more than one process, this bombs:
> >
> > sonoma:dgruner{134}> mpirun -n 2 ./cpip
> > .
> > .
> > .
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > .
> > . ad infinitum
> >
> > If one uses the option "-bynode", things work:
> >
> > sonoma:dgruner{145}> mpirun -bynode -n 2 ./cpip
> > [n17:30055] odls_bproc: openpty failed, using pipes instead
> > Process 0 on n17
> > Process 1 on n21
> > pi is approximately 3.1415926544231318, Error is 0.0000000008333387
> > wall clock time = 0.010375
> >
> > Note that there is always the message about "openpty failed, using
> > pipes instead".
> > If I run more processes (on my 3-node cluster, with 2 cpus per
> > node), the openpty message appears repeatedly for the first node:
> >
> > sonoma:dgruner{146}> mpirun -bynode -n 6 ./cpip
> > [n17:30061] odls_bproc: openpty failed, using pipes instead
> > [n17:30061] odls_bproc: openpty failed, using pipes instead
> > Process 0 on n17
> > Process 2 on n49
> > Process 1 on n21
> > Process 5 on n49
> > Process 3 on n17
> > Process 4 on n21
> > pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> > wall clock time = 0.050332
> >
> > Should I worry about the openpty failure? I suspect that
> > communications may be slower this way. Using the -byslot option
> > always fails, so this is a bug. The same occurs for all the codes
> > that I have tried, both simple and complex.
> >
> > Thanks for your attention to this.
> > Regards,
> > Daniel
> >
> > <cpi.c.gz>
> > <config.log.gz>
> > <ompiinfo.gz>

--
Dr. Daniel Gruner                       dgru...@chem.utoronto.ca
Dept. of Chemistry                      daniel.gru...@utoronto.ca
University of Toronto                   phone: (416)-978-8689
80 St. George Street                    fax:   (416)-978-5325
Toronto, ON M5S 3H6, Canada             finger for PGP public key
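P.S. For anyone reading this in the archives without the attachments:
cpi.c is the classic pi-by-midpoint-rule MPI example. A minimal sketch
in the same spirit is below; it is illustrative only and not
necessarily identical to the attached file. Build it with mpicc and
run it with mpirun as shown above.

#include <mpi.h>
#include <stdio.h>
#include <math.h>

int
main(int argc, char *argv[])
{
    int n = 10000;              /* number of integration intervals */
    int rank, size, i, namelen;
    double h, sum, x, mypi, pi, t0;
    double PI25DT = 3.141592653589793238462643;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);
    printf("Process %d on %s\n", rank, name);

    t0 = MPI_Wtime();
    h = 1.0 / (double) n;
    sum = 0.0;
    /* each rank sums every size-th midpoint of 4/(1+x^2) on [0,1] */
    for (i = rank + 1; i <= n; i += size) {
        x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* combine the partial sums on rank 0 */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - PI25DT));
        printf("wall clock time = %f\n", MPI_Wtime() - t0);
    }

    MPI_Finalize();
    return 0;
}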