Hi Todd

Just as a thought - you could try not using --debug-daemons or -d, and instead
setting "-mca pls_rsh_num_concurrent 50" or some such small number. This will
tell the system to launch 50 ssh calls at a time, waiting for each group to
complete before launching the next. You can't use it with --debug-daemons, as
that option prevents the ssh calls from "closing" so that you can get the
output from the daemons. You can still launch as big a job as you like - we'll
just do it 50 ssh calls at a time.
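For example, something like the following (the 800 tasks and the cpi program
are just taken from your earlier mails, and "myhosts" is a placeholder for
whatever hostfile you normally use):

  mpirun -np 800 -hostfile myhosts -mca pls_rsh_num_concurrent 50 ./cpi

That should launch the job in batches of 50 ssh sessions rather than opening
them all at once.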
If we are truly overwhelming the slapd, then this should alleviate the
problem. Let me know if you get to try it...

Ralph


On 2/6/07 4:05 PM, "Heywood, Todd" <heyw...@cshl.edu> wrote:

> Hi Ralph,
>
> It looks that way. I created a user local to each node, with local
> authentication via /etc/passwd and /etc/shadow, and OpenMPI scales up just
> fine for that.
>
> I know this is an OpenMPI list, but does anyone know how common or uncommon
> LDAP-based clusters are? I would have thought this issue would have arisen
> elsewhere, but Googling MPI+LDAP (and similar) doesn't turn up much.
>
> I'd certainly be willing to test any patch.
>
> Thanks,
>
> Todd
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> Behalf Of Ralph H Castain
> Sent: Tuesday, February 06, 2007 9:54 AM
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
>
> It sounds to me like we are probably overwhelming your slapd - your test
> would seem to indicate that slowing down the slapd makes us fail even with
> smaller jobs, which tends to support that idea. We frankly haven't
> encountered that before since our rsh tests have all been done using
> non-LDAP authentication (basically, we ask that you set up rsh to
> auto-authenticate on each node).
>
> It sounds like we need to add an ability to slow down so that the daemon
> doesn't "fail" due to authentication timeout and/or slapd rejection due to
> the queue being full. This may take a little time to fix due to other
> priorities, and will almost certainly have to be released in a subsequent
> 1.2.x version.
>
> Meantime, I'll let you know when I get something to test - would you be
> willing to give it a shot if I provide a patch? I don't have access to an
> LDAP-based system.
>
> Ralph
>
> On 2/6/07 7:44 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
>
> > Hi Ralph,
> >
> > Thanks for the reply. This is a tough one. It is OpenLDAP. I had thought
> > that I might be hitting a file descriptor limit for slapd (the LDAP
> > daemon), which ulimit -n does not affect (you have to rebuild LDAP with
> > a different FD_SETSIZE variable). However, I simply turned on more
> > expressive logging to /var/log/slapd, and that resulted in smaller jobs
> > (which successfully ran before) hanging. Go figure.
> >
> > It appears that the daemons are up and running (from ps), and everything
> > hangs in MPI_Init. Ctrl-C gives:
> >
> > [blade1:04524] ERROR: A daemon on node blade26 failed to start as expected.
> > [blade1:04524] ERROR: There may be more information available from
> > [blade1:04524] ERROR: the remote shell (see above).
> > [blade1:04524] ERROR: The daemon exited unexpectedly with status 255.
> >
> > I'm interested in any suggestions, semi-fixes, etc. which might help get
> > to the bottom of this. Right now I'd like to know whether the daemons are
> > indeed all up and running, or if there are some that are not (causing
> > MPI_Init to hang).
> >
> > Thanks,
> >
> > Todd
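One other thought on the file descriptor theory above: I'm no LDAP expert, but
a quick sanity check while a job is hung would be to count how many
descriptors slapd actually has open and compare that against FD_SETSIZE
(typically 1024 on Linux/glibc). Something along these lines should work on
the node running slapd, assuming the daemon process is named "slapd":

  ls /proc/`pidof slapd`/fd | wc -l

If that count is sitting near 1024 while things are hung, the descriptor limit
looks like a real suspect; if it is nowhere close, we can probably rule it out.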
> >
> > -----Original Message-----
> > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> > Behalf Of Ralph H Castain
> > Sent: Tuesday, February 06, 2007 8:52 AM
> > To: Open MPI Users <us...@open-mpi.org>
> > Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
> >
> > Well, I can't say for sure about LDAP. I did a quick search and found two
> > things:
> >
> > 1. there are limits imposed in LDAP that may apply to your situation, and
> > 2. that statement varies tremendously depending upon the specific LDAP
> >    implementation you are using.
> >
> > I would suggest you see which LDAP you are using and contact the
> > respective organization to ask if they do have such a limit, and if so,
> > how to adjust it. It sounds like maybe we are hitting the LDAP server
> > with too many requests too rapidly. Usually, the issue is not starting
> > fast enough, so this is a new one!
> >
> > We don't currently check to see if everything started up okay, so that
> > is why the processes might hang - we hope to fix that soon. I'll have to
> > see if there is something we can do to help alleviate such problems - it
> > might not be in time for the 1.2 release, but perhaps it will make a
> > subsequent "fix" or, if you are willing/interested, I could provide it
> > to you as a "patch" you could use until a later official release.
> >
> > Meantime, you might try upgrading to 1.2b3 or even a nightly release
> > from the trunk. There are known problems with 1.2b2 (which is why there
> > is a b3 and soon to be an rc1), though I don't think that will be the
> > problem here. At the least, the nightly trunk has a much better response
> > to ctrl-c in it.
> >
> > Ralph
> >
> > On 2/5/07 9:50 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
> >
> > > Hi Ralph,
> > >
> > > Thanks for the reply. The OpenMPI version is 1.2b2 (because I would
> > > like to integrate it with SGE).
> > >
> > > Here is what is happening:
> > >
> > > (1) When I run with --debug-daemons (but WITHOUT -d), I get "Daemon
> > > [0,0,27] checking in as pid 7620 on host blade28" (for example)
> > > messages for most but not all of the daemons that should be started
> > > up, and then it hangs. I also notice "reconnecting to LDAP server"
> > > messages in various /var/log/secure files, and cannot log in while
> > > things are hung (with "su: pam_ldap: ldap_result Can't contact LDAP
> > > server" in /var/log/messages). So apparently LDAP hits some limit on
> > > opening ssh sessions, and I'm not sure how to address this.
> > >
> > > (2) When I run with --debug-daemons AND the debug option -d, all
> > > daemons start up and check in, albeit slowly (debug must slow things
> > > down so LDAP can handle all the requests??). Then, apparently, the cpi
> > > process is started for each task but it then hangs:
> > >
> > > [blade1:23816] spawn: in job_state_callback(jobid = 1, state = 0x4)
> > > [blade1:23816] Info: Setting up debugger process table for applications
> > >   MPIR_being_debugged = 0
> > >   MPIR_debug_gate = 0
> > >   MPIR_debug_state = 1
> > >   MPIR_acquired_pre_main = 0
> > >   MPIR_i_am_starter = 0
> > >   MPIR_proctable_size = 800
> > >   MPIR_proctable:
> > >     (i, host, exe, pid) = (0, blade1, /home4/itstaff/heywood/ompi/cpi, 24193)
> > >     ...
> > >     (i, host, exe, pid) = (799, blade213, /home4/itstaff/heywood/ompi/cpi, 4762)
> > >
> > > A "ps" on the head node shows 200 open ssh sessions, and 4 cpi
> > > processes doing nothing. A ^C gives this:
> > >
> > > mpirun: killing job...
> > >
> > > --------------------------------------------------------------------------
> > > WARNING: A process refused to die!
> > >
> > > Host: blade1
> > > PID:  24193
> > >
> > > This process may still be running and/or consuming resources.
> > > --------------------------------------------------------------------------
> > >
> > > Still got a ways to go, but any ideas/suggestions are welcome!
> > >
> > > Thanks,
> > >
> > > Todd
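One more quick diagnostic that might help narrow this down: our launcher
starts one "orted" daemon per node, so counting the ssh sessions on the head
node and checking for orted on a compute node tells you directly whether every
daemon made it up. Something like the following, with the usual bracket trick
to keep grep itself out of the count:

  # on the head node: how many ssh sessions mpirun has open
  ps -ef | grep '[s]sh ' | wc -l

  # on any compute node: is the Open MPI daemon (orted) actually running there
  ps -ef | grep '[o]rted'

If the daemon is missing on some nodes, then those daemons really did fail to
start and the hang in MPI_Init would follow from that.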
> > >
> > > -----Original Message-----
> > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> > > On Behalf Of Ralph Castain
> > > Sent: Friday, February 02, 2007 5:20 PM
> > > To: Open MPI Users
> > > Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
> > >
> > > Hi Todd
> > >
> > > To help us provide advice, could you tell us what version of OpenMPI
> > > you are using?
> > >
> > > Meantime, try adding "-mca pls_rsh_num_concurrent 200" to your mpirun
> > > command line. You can up the number of concurrent daemons we launch to
> > > anything your system will support - basically, we limit the number
> > > only because some systems have limits on the number of ssh calls we
> > > can have active at any one time. Because we hold stdio open when
> > > running with --debug-daemons, the number of concurrent daemons must
> > > match or exceed the number of nodes you are trying to launch on.
> > >
> > > I have a "fix" in the works that will help relieve some of that
> > > restriction, but that won't come out until a later release.
> > >
> > > Hopefully, that will allow you to obtain more debug info about
> > > why/where things are hanging.
> > >
> > > Ralph
> > >
> > > On 2/2/07 11:41 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
> > >
> > > > I have OpenMPI running fine for a small/medium number of tasks
> > > > (simple hello or cpi program). But when I try 700 or 800 tasks, it
> > > > hangs, apparently on startup. I think this might be related to LDAP,
> > > > since if I try to log into my account while the job is hung, I get
> > > > told my username doesn't exist.
> > > >
> > > > However, I tried adding debug to the mpirun, and got the same
> > > > sequence of output as for successful smaller runs, until it hung
> > > > again. So I added -debug-daemons and got this (with an exit, i.e. no
> > > > hanging):
> > > >
> > > > ...
> > > > [blade1:31733] [0,0,0] wrote setup file
> > > > --------------------------------------------------------------------------
> > > > The rsh launcher has been given a number of 128 concurrent daemons to
> > > > launch and is in a debug-daemons option. However, the total number of
> > > > daemons to launch (200) is greater than this value. This is a
> > > > scenario that will cause the system to deadlock.
> > > >
> > > > To avoid deadlock, either increase the number of concurrent daemons,
> > > > or remove the debug-daemons flag.
> > > > --------------------------------------------------------------------------
> > > > [blade1:31733] [0,0,0] ORTE_ERROR_LOG: Fatal in file
> > > > ../../../../../orte/mca/rmgr/urm/rmgr_urm.c at line 455
> > > > [blade1:31733] mpirun: spawn failed with errno=-6
> > > > [blade1:31733] sess_dir_finalize: proc session dir not empty - leaving
> > > >
> > > > Any ideas or suggestions appreciated.
> > > >
> > > > Todd Heywood

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users