Hi Todd,

I truly appreciate your patience. If the rate was the same with that switch set, that would indicate to me that we aren't having trouble getting through the slapd - the problem probably isn't how hard we are driving it, but rather the total number of connections being created. Basically, we need to establish one ssh connection per node to launch the orteds (the application processes are just fork/exec'd by the orteds, so they shouldn't touch the slapd at all).

The issue may have to do with limits on the total number of LDAP authentication connections allowed for one user. I believe that is settable, but I will have to look it up and/or ask a few friends who might know. I have not seen an LDAP-based cluster before (though authentication onto the head node of a cluster is frequently handled that way), but that doesn't mean someone hasn't done it.
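In the meantime, a rough way to put numbers on the connection-count theory (assuming your slapd listens on the standard port 389; adjust the port if not) is to watch the established connections on the head node while a job is launching:

    # count established connections to the LDAP port, once per second
    while true; do
        netstat -tan | grep ':389 ' | grep -c ESTABLISHED
        sleep 1
    done

If that count climbs toward the number of nodes in the job and then stalls, that fits the picture of a connection ceiling rather than a rate problem.

On the server side, I have not verified these against your OpenLDAP version, so treat them purely as a starting point rather than a recommendation, but the slapd.conf knobs I would look at first are the thread and pending-connection limits, along the lines of:

    # slapd.conf (global section) - illustrative values only
    threads                 32
    conn_max_pending        100
    conn_max_pending_auth   1000
    idletimeout             30

The other hard ceiling worth checking is the file descriptor limit the slapd process itself runs under (the FD_SETSIZE issue you mention below), since every open authentication connection consumes a descriptor.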
Again, appreciate the patience.

Ralph


On 2/7/07 10:28 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:

> Hi Ralph,
>
> Unfortunately, adding "-mca pls_rsh_num_concurrent 50" to mpirun (with just -np and -hostfile) has no effect. The number of established connections for slapd grows to the same number at the same rate as without it.
>
> BTW, I upgraded from 1.2b2 to 1.2b3.
>
> Thanks,
> Todd
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Tuesday, February 06, 2007 6:48 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
>
> Hi Todd
>
> Just as a thought - you could try not using --debug-daemons or -d, and instead setting "-mca pls_rsh_num_concurrent 50" or some such small number. This will tell the system to launch 50 ssh calls at a time, waiting for each group to complete before launching the next. You can't use it with --debug-daemons, as that option prevents the ssh calls from "closing" so that you can get the output from the daemons.
>
> You can still launch as big a job as you like - we'll just do it 50 ssh calls at a time. If we are truly overwhelming the slapd, then this should alleviate the problem.
>
> Let me know if you get to try it...
> Ralph
>
> On 2/6/07 4:05 PM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
>
> > Hi Ralph,
> >
> > It looks that way. I created a user local to each node, with local authentication via /etc/passwd and /etc/shadow, and OpenMPI scales up just fine for that.
> >
> > I know this is an OpenMPI list, but does anyone know how common or uncommon LDAP-based clusters are? I would have thought this issue would have arisen elsewhere, but Googling MPI+LDAP (and similar) doesn't turn up much.
> >
> > I'd certainly be willing to test any patch.
> >
> > Thanks,
> > Todd
> >
> > -----Original Message-----
> > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
> > Sent: Tuesday, February 06, 2007 9:54 AM
> > To: Open MPI Users <us...@open-mpi.org>
> > Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
> >
> > It sounds to me like we are probably overwhelming your slapd - your test would seem to indicate that slowing down the slapd makes us fail even with smaller jobs, which tends to support that idea. We frankly haven't encountered that before, since our rsh tests have all been done using non-LDAP authentication (basically, we ask that you set up rsh to auto-authenticate on each node).
> >
> > It sounds like we need to add an ability to slow down so that the daemon doesn't "fail" due to authentication timeout and/or slapd rejection due to the queue being full. This may take a little time to fix due to other priorities, and will almost certainly have to be released in a subsequent 1.2.x version. Meantime, I'll let you know when I get something to test - would you be willing to give it a shot if I provide a patch? I don't have access to an LDAP-based system.
> >
> > Ralph
> >
> > On 2/6/07 7:44 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
> >
> > > Hi Ralph,
> > >
> > > Thanks for the reply. This is a tough one. It is OpenLDAP.
> > > I had thought that I might be hitting a file descriptor limit for slapd (the LDAP daemon), which ulimit -n does not affect (you have to rebuild LDAP with a different FD_SETSIZE variable). However, I simply turned on more expressive logging to /var/log/slapd, and that resulted in smaller jobs (which successfully ran before) hanging. Go figure.
> > >
> > > It appears that daemons are up and running (from ps), and everything hangs in MPI_Init. Ctrl-C gives:
> > >
> > > [blade1:04524] ERROR: A daemon on node blade26 failed to start as expected.
> > > [blade1:04524] ERROR: There may be more information available from
> > > [blade1:04524] ERROR: the remote shell (see above).
> > > [blade1:04524] ERROR: The daemon exited unexpectedly with status 255.
> > >
> > > I'm interested in any suggestions, semi-fixes, etc. which might help get to the bottom of this. Right now: whether the daemons are indeed up and running, or if there are some that are not (causing MPI_Init to hang).
> > >
> > > Thanks,
> > > Todd
> > >
> > > -----Original Message-----
> > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph H Castain
> > > Sent: Tuesday, February 06, 2007 8:52 AM
> > > To: Open MPI Users <us...@open-mpi.org>
> > > Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
> > >
> > > Well, I can't say for sure about LDAP. I did a quick search and found two things: 1. there are limits imposed in LDAP that may apply to your situation, and 2. that statement varies tremendously depending upon the specific LDAP implementation you are using. I would suggest you see which LDAP you are using and contact the respective organization to ask if they do have such a limit, and if so, how to adjust it. It sounds like maybe we are hitting the LDAP server with too many requests too rapidly. Usually, the issue is not starting fast enough, so this is a new one!
> > >
> > > We don't currently check to see if everything started up okay, so that is why the processes might hang - we hope to fix that soon. I'll have to see if there is something we can do to help alleviate such problems - it might not be in time for the 1.2 release, but perhaps it will make a subsequent "fix" release, or, if you are willing/interested, I could provide it to you as a "patch" you could use until a later official release.
> > >
> > > Meantime, you might try upgrading to 1.2b3 or even a nightly release from the trunk. There are known problems with 1.2b2 (which is why there is a b3 and soon to be an rc1), though I don't think that will be the problem here. At the least, the nightly trunk has a much better response to ctrl-c in it.
> > >
> > > Ralph
> > >
> > > On 2/5/07 9:50 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
> > >
> > > > Hi Ralph,
> > > >
> > > > Thanks for the reply. The OpenMPI version is 1.2b2 (because I would like to integrate it with SGE).
> > > >
> > > > Here is what is happening:
> > > >
> > > > (1) When I run with --debug-daemons (but WITHOUT -d), I get "Daemon [0,0,27] checking in as pid 7620 on host blade28" (for example) messages for most but not all of the daemons that should be started up, and then it hangs. I also notice "reconnecting to LDAP server" messages in various /var/log/secure files, and I cannot log in while things are hung (with "su: pam_ldap: ldap_result Can't contact LDAP server" in /var/log/messages). So apparently LDAP hits some limit to opening ssh sessions, and I'm not sure how to address this.
> > > > (2) When I run with --debug-daemons AND the debug option -d, all daemons start up and check in, albeit slowly (debug must slow things down so LDAP can handle all the requests??). Then, apparently, the cpi process is started for each task, but it then hangs:
> > > >
> > > > [blade1:23816] spawn: in job_state_callback(jobid = 1, state = 0x4)
> > > > [blade1:23816] Info: Setting up debugger process table for applications
> > > >   MPIR_being_debugged = 0
> > > >   MPIR_debug_gate = 0
> > > >   MPIR_debug_state = 1
> > > >   MPIR_acquired_pre_main = 0
> > > >   MPIR_i_am_starter = 0
> > > >   MPIR_proctable_size = 800
> > > >   MPIR_proctable:
> > > >     (i, host, exe, pid) = (0, blade1, /home4/itstaff/heywood/ompi/cpi, 24193)
> > > >     ...
> > > >     (i, host, exe, pid) = (799, blade213, /home4/itstaff/heywood/ompi/cpi, 4762)
> > > >
> > > > A "ps" on the head node shows 200 open ssh sessions, and 4 cpi processes doing nothing. A ^C gives this:
> > > >
> > > > mpirun: killing job...
> > > > --------------------------------------------------------------------------
> > > > WARNING: A process refused to die!
> > > > Host: blade1
> > > > PID: 24193
> > > > This process may still be running and/or consuming resources.
> > > >
> > > > Still got a ways to go, but any ideas/suggestions are welcome!
> > > >
> > > > Thanks,
> > > > Todd
> > > >
> > > > From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> > > > Sent: Friday, February 02, 2007 5:20 PM
> > > > To: Open MPI Users
> > > > Subject: Re: [OMPI users] large jobs hang on startup (deadlock?)
> > > >
> > > > Hi Todd
> > > >
> > > > To help us provide advice, could you tell us what version of OpenMPI you are using?
> > > >
> > > > Meantime, try adding "-mca pls_rsh_num_concurrent 200" to your mpirun command line. You can up the number of concurrent daemons we launch to anything your system will support - basically, we limit the number only because some systems have limits on the number of ssh calls we can have active at any one time. Because we hold stdio open when running with --debug-daemons, the number of concurrent daemons must match or exceed the number of nodes you are trying to launch on.
> > > >
> > > > I have a "fix" in the works that will help relieve some of that restriction, but that won't come out until a later release.
> > > >
> > > > Hopefully, that will allow you to obtain more debug info about why/where things are hanging.
> > > >
> > > > Ralph
> > > >
> > > > On 2/2/07 11:41 AM, "Heywood, Todd" <heyw...@cshl.edu> wrote:
> > > >
> > > > > I have OpenMPI running fine for a small/medium number of tasks (simple hello or cpi program). But when I try 700 or 800 tasks, it hangs, apparently on startup. I think this might be related to LDAP, since if I try to log into my account while the job is hung, I get told my username doesn't exist. However, I tried adding debug to the mpirun, and got the same sequence of output as for successful smaller runs, until it hung again. So I added -debug-daemons and got this (with an exit, i.e. no hanging):
> > > > >
> > > > > ...
> > > > > [blade1:31733] [0,0,0] wrote setup file
> > > > > --------------------------------------------------------------------------
> > > > > The rsh launcher has been given a number of 128 concurrent daemons to
> > > > > launch and is in a debug-daemons option. However, the total number of
> > > > > daemons to launch (200) is greater than this value. This is a scenario
> > > > > that will cause the system to deadlock.
> > > > > To avoid deadlock, either increase the number of concurrent daemons, or
> > > > > remove the debug-daemons flag.
> > > > > --------------------------------------------------------------------------
> > > > > [blade1:31733] [0,0,0] ORTE_ERROR_LOG: Fatal in file
> > > > > ../../../../../orte/mca/rmgr/urm/rmgr_urm.c at line 455
> > > > > [blade1:31733] mpirun: spawn failed with errno=-6
> > > > > [blade1:31733] sess_dir_finalize: proc session dir not empty - leaving
> > > > >
> > > > > Any ideas or suggestions appreciated.
> > > > >
> > > > > Todd Heywood

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users