Hi Todd

Just as a thought - you could try not using --debug-daemons or -d, and instead
setting "-mca pls_rsh_num_concurrent 50" or some such small number. This will
tell the system to launch 50 ssh calls at a time, waiting for each group to
complete before launching the next. You can't use it with --debug-daemons, as
that option prevents the ssh calls from "closing" so that you can get the
output from the daemons. You can still launch as big a job as you like - we'll
just do it 50 ssh calls at a time.
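For example, something like the following (the 800 tasks and the cpi program
are just taken from your earlier mails, and "myhosts" is a placeholder for
whatever hostfile you normally use):

  mpirun -np 800 -hostfile myhosts -mca pls_rsh_num_concurrent 50 ./cpi

That should launch the job in batches of 50 ssh sessions rather than opening
them all at once.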
If we are truly overwhelming the slapd, then this should alleviate the
problem. Let me know if you get to try it...

Ralph


On 2/6/07 4:05 PM, "Heywood, Todd" <heyw...@cshl.edu> wrote:

> Hi Ralph,
>
> It looks that way. I created a user local to each node, with local
> authentication via /etc/passwd and /etc/shadow, and OpenMPI scales up just
> fine for that.
>
> I know this is an OpenMPI list, but does anyone know how common or uncommon
> LDAP-based clusters are? I would have thought this issue would have arisen
> elsewhere, but Googling MPI+LDAP (and similar) doesn't turn up much.
>
> I'd certainly be willing to test any patch.
>
> Thanks,
>
> Todd
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph H Castain
> Sent: Tuesday, February 06, 2007 9:54 AM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
>
> It sounds to me like we are probably overwhelming your slapd - your test
> would seem to indicate that slowing down the slapd makes us fail even with
> smaller jobs, which tends to support that idea. We frankly haven't
> encountered that before since our rsh tests have all been done using
> non-LDAP authentication (basically, we ask that you set up rsh to
> auto-authenticate on each node).
>
> It sounds like we need to add an ability to slow down so that the daemon
> doesn't "fail" due to authentication timeout and/or slapd rejection due to
> the queue being full. This may take a little time to fix due to other
> priorities, and will almost certainly have to be released in a subsequent
> 1.2.x version.
>
> Meantime, I'll let you know when I get something to test - would you be
> willing to give it a shot if I provide a patch? I don't have access to an
> LDAP-based system.
>
> Ralph
>
> On 2/6/07 7:44 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
>
> > Hi Ralph,
> >
> > Thanks for the reply. This is a tough one. It is OpenLDAP. I had thought
> > that I might be hitting a file descriptor limit for slapd (the LDAP
> > daemon), which ulimit -n does not affect (you have to rebuild LDAP with
> > a different FD_SETSIZE variable). However, I simply turned on more
> > expressive logging to /var/log/slapd, and that resulted in smaller jobs
> > (which successfully ran before) hanging. Go figure.
> >
> > It appears that the daemons are up and running (from ps), and everything
> > hangs in MPI_Init. Ctrl-C gives:
> >
> > [blade1:04524] ERROR: A daemon on node blade26 failed to start as expected.
> > [blade1:04524] ERROR: There may be more information available from
> > [blade1:04524] ERROR: the remote shell (see above).
> > [blade1:04524] ERROR: The daemon exited unexpectedly with status 255.
> >
> > I'm interested in any suggestions, semi-fixes, etc. which might help get
> > to the bottom of this. Right now I'd like to know whether the daemons are
> > indeed all up and running, or if there are some that are not (causing
> > MPI_Init to hang).
> >
> > Thanks,
> >
> > Todd
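One other thought on the file descriptor theory above: I'm no LDAP expert, but
a quick sanity check while a job is hung would be to count how many
descriptors slapd actually has open and compare that against FD_SETSIZE
(typically 1024 on Linux/glibc). Something along these lines should work on
the node running slapd, assuming the daemon process is named "slapd":

  ls /proc/`pidof slapd`/fd | wc -l

If that count is sitting near 1024 while things are hung, the descriptor limit
looks like a real suspect; if it is nowhere close, we can probably rule it out.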
> >
> > -----Original Message-----
> > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> > Behalf Of Ralph H Castain
> > Sent: Tuesday, February 06, 2007 8:52 AM
> > To: Open MPI Users <us...@open-mpi.org>
> > Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
> >
> > Well, I can't say for sure about LDAP. I did a quick search and found two
> > things:
> >
> > 1. there are limits imposed in LDAP that may apply to your situation, and
> > 2. that statement varies tremendously depending upon the specific LDAP
> >    implementation you are using.
> >
> > I would suggest you see which LDAP you are using and contact the
> > respective organization to ask if they do have such a limit, and if so,
> > how to adjust it. It sounds like maybe we are hitting the LDAP server
> > with too many requests too rapidly. Usually, the issue is not starting
> > fast enough, so this is a new one!
> >
> > We don't currently check to see if everything started up okay, so that
> > is why the processes might hang - we hope to fix that soon. I'll have to
> > see if there is something we can do to help alleviate such problems - it
> > might not be in time for the 1.2 release, but perhaps it will make a
> > subsequent "fix" or, if you are willing/interested, I could provide it
> > to you as a "patch" you could use until a later official release.
> >
> > Meantime, you might try upgrading to 1.2b3 or even a nightly release
> > from the trunk. There are known problems with 1.2b2 (which is why there
> > is a b3 and soon to be an rc1), though I don't think that will be the
> > problem here. At the least, the nightly trunk has a much better response
> > to ctrl-c in it.
> >
> > Ralph
> >
> > On 2/5/07 9:50 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
> >
> > > Hi Ralph,
> > >
> > > Thanks for the reply. The OpenMPI version is 1.2b2 (because I would
> > > like to integrate it with SGE).
> > >
> > > Here is what is happening:
> > >
> > > (1) When I run with --debug-daemons (but WITHOUT -d), I get "Daemon
> > > [0,0,27] checking in as pid 7620 on host blade28" (for example)
> > > messages for most but not all of the daemons that should be started
> > > up, and then it hangs. I also notice "reconnecting to LDAP server"
> > > messages in various /var/log/secure files, and cannot log in while
> > > things are hung (with "su: pam_ldap: ldap_result Can't contact LDAP
> > > server" in /var/log/messages). So apparently LDAP hits some limit on
> > > opening ssh sessions, and I'm not sure how to address this.
> > >
> > > (2) When I run with --debug-daemons AND the debug option -d, all
> > > daemons start up and check in, albeit slowly (debug must slow things
> > > down so LDAP can handle all the requests??). Then, apparently, the cpi
> > > process is started for each task but it then hangs:
> > >
> > > [blade1:23816] spawn: in job_state_callback(jobid = 1, state = 0x4)
> > > [blade1:23816] Info: Setting up debugger process table for applications
> > >   MPIR_being_debugged = 0
> > >   MPIR_debug_gate = 0
> > >   MPIR_debug_state = 1
> > >   MPIR_acquired_pre_main = 0
> > >   MPIR_i_am_starter = 0
> > >   MPIR_proctable_size = 800
> > >   MPIR_proctable:
> > >     (i, host, exe, pid) = (0, blade1, /home4/itstaff/heywood/ompi/cpi, 24193)
> > >     ...
> > >     (i, host, exe, pid) = (799, blade213, /home4/itstaff/heywood/ompi/cpi, 4762)
> > >
> > > A "ps" on the head node shows 200 open ssh sessions, and 4 cpi
> > > processes doing nothing. A ^C gives this:
> > >
> > > mpirun: killing job...
> > >
> > > --------------------------------------------------------------------------
> > > WARNING: A process refused to die!
> > >
> > > Host: blade1
> > > PID:  24193
> > >
> > > This process may still be running and/or consuming resources.
> > > --------------------------------------------------------------------------
> > >
> > > Still got a ways to go, but any ideas/suggestions are welcome!
> > >
> > > Thanks,
> > >
> > > Todd
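One more quick diagnostic that might help narrow this down: our launcher
starts one "orted" daemon per node, so counting the ssh sessions on the head
node and checking for orted on a compute node tells you directly whether every
daemon made it up. Something like the following, with the usual bracket trick
to keep grep itself out of the count:

  # on the head node: how many ssh sessions mpirun has open
  ps -ef | grep '[s]sh ' | wc -l

  # on any compute node: is the Open MPI daemon (orted) actually running there
  ps -ef | grep '[o]rted'

If the daemon is missing on some nodes, then those daemons really did fail to
start and the hang in MPI_Init would follow from that.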
> > >
> > > -----Original Message-----
> > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> > > On Behalf Of Ralph Castain
> > > Sent: Friday, February 02, 2007 5:20 PM
> > > To: Open MPI Users
> > > Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
> > >
> > > Hi Todd
> > >
> > > To help us provide advice, could you tell us what version of OpenMPI
> > > you are using?
> > >
> > > Meantime, try adding "-mca pls_rsh_num_concurrent 200" to your mpirun
> > > command line. You can up the number of concurrent daemons we launch to
> > > anything your system will support - basically, we limit the number
> > > only because some systems have limits on the number of ssh calls we
> > > can have active at any one time. Because we hold stdio open when
> > > running with --debug-daemons, the number of concurrent daemons must
> > > match or exceed the number of nodes you are trying to launch on.
> > >
> > > I have a "fix" in the works that will help relieve some of that
> > > restriction, but that won't come out until a later release.
> > >
> > > Hopefully, that will allow you to obtain more debug info about
> > > why/where things are hanging.
> > >
> > > Ralph
> > >
> > > On 2/2/07 11:41 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
> > >
> > > > I have OpenMPI running fine for a small/medium number of tasks
> > > > (simple hello or cpi program). But when I try 700 or 800 tasks, it
> > > > hangs, apparently on startup. I think this might be related to LDAP,
> > > > since if I try to log into my account while the job is hung, I get
> > > > told my username doesn't exist.
> > > >
> > > > However, I tried adding debug to the mpirun, and got the same
> > > > sequence of output as for successful smaller runs, until it hung
> > > > again. So I added -debug-daemons and got this (with an exit, i.e. no
> > > > hanging):
> > > >
> > > > ...
> > > > [blade1:31733] [0,0,0] wrote setup file
> > > > --------------------------------------------------------------------------
> > > > The rsh launcher has been given a number of 128 concurrent daemons to
> > > > launch and is in a debug-daemons option. However, the total number of
> > > > daemons to launch (200) is greater than this value. This is a
> > > > scenario that will cause the system to deadlock.
> > > >
> > > > To avoid deadlock, either increase the number of concurrent daemons,
> > > > or remove the debug-daemons flag.
> > > > --------------------------------------------------------------------------
> > > > [blade1:31733] [0,0,0] ORTE_ERROR_LOG: Fatal in file
> > > > ../../../../../orte/mca/rmgr/urm/rmgr_urm.c at line 455
> > > > [blade1:31733] mpirun: spawn failed with errno=-6
> > > > [blade1:31733] sess_dir_finalize: proc session dir not empty - leaving
> > > >
> > > > Any ideas or suggestions appreciated.
> > > >
> > > > Todd Heywood

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users