Here is the stack trace I forgot to include, as promised. It keeps changing at
the lower levels (around opal_progress), but never moves above that.

[yccho@nyx0817 ~]$ padb -Ormgr=orte --all --stack-trace --tree --all 
Stack trace(s) for thread: 1
-----------------
[0-63] (64 processes)
-----------------
main() at ?:?
  Loci::makeQuery(Loci::rule_db const&, Loci::fact_db&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)() at ?:?
    Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
      Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
        Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
          Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
            Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
              Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
                Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
                  Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
                    Loci::execute_rule::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
                      streamUns::HypreSolveUnit::compute(Loci::sequence const&)() at ?:?
                        hypre_BoomerAMGSetup() at ?:?
                          hypre_BoomerAMGBuildInterp() at ?:?
                            -----------------
                            [0,2-3,5-16,18-19,21-24,27-34,36-63] (57 processes)
                            -----------------
                            hypre_ParCSRMatrixExtractBExt() at ?:?
                              hypre_ParCSRMatrixExtractBExt_Arrays() at ?:?
                                hypre_ParCSRCommHandleDestroy() at ?:?
                                  PMPI_Waitall() at ?:?
                                    -----------------
                                    [0,2-3,5,7-16,18-19,21-24,27-34,36-63] (56 processes)
                                    -----------------
                                    ompi_request_default_wait_all() at ?:?
                                      opal_progress() at ?:?
                                    -----------------
                                    [6] (1 processes)
                                    -----------------
                                    ompi_mtl_psm_progress() at ?:?
                            -----------------
                            [1,4,17,20,25-26,35] (7 processes)
                            -----------------
                            hypre_ParCSRCommHandleDestroy() at ?:?
                              PMPI_Waitall() at ?:?
                                ompi_request_default_wait_all() at ?:?
                                  opal_progress() at ?:?
Stack trace(s) for thread: 2
-----------------
[0-63] (64 processes)
-----------------
start_thread() at ?:?
  ips_ptl_pollintr() at ptl_rcvthread.c:324
    poll() at ?:?
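
To see exactly which ranks sit in each branch of the thread 1 tree, the rank sets padb prints can be expanded into explicit lists. The following is only a minimal sketch: expand_ranks and the variable names are hypothetical helpers of mine, not part of padb, and the two rank sets are copied by hand from the trace above.

```python
def expand_ranks(spec):
    """Expand a padb-style rank set such as "[1,4,25-26]" into a sorted list."""
    ranks = set()
    for part in spec.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "25-26" is inclusive at both ends.
            lo, hi = part.split("-")
            ranks.update(range(int(lo), int(hi) + 1))
        else:
            ranks.add(int(part))
    return sorted(ranks)


# Ranks that reach PMPI_Waitall() through hypre_ParCSRMatrixExtractBExt().
via_extract_bext = expand_ranks("[0,2-3,5-16,18-19,21-24,27-34,36-63]")
# Ranks that call hypre_ParCSRCommHandleDestroy() directly from
# hypre_BoomerAMGBuildInterp().
direct_comm_destroy = expand_ranks("[1,4,17,20,25-26,35]")

print(len(via_extract_bext), len(direct_comm_destroy))
print(direct_comm_destroy)
# Check that the two branches together cover ranks 0-63 with no overlap.
print(sorted(via_extract_bext + direct_comm_destroy) == list(range(64)))
```

The counts should agree with the "(57 processes)" and "(7 processes)" lines padb reports, which is a quick check that nothing was mistyped when copying the sets.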


Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985



On Mar 21, 2012, at 11:14 AM, Brock Palen wrote:

> I have a user's code that sometimes appears to hang in MPI_Waitall();
> stack trace from padb below.  It is on QLogic IB using the PSM MTL.
> Without knowing which requests go to which rank, how can I check that this
> code didn't just get itself into a deadlock?  Is there a way to get a
> readable list of every rank's posted sends, and then query a waiting
> MPI_Waitall() in a running job to see which sends/recvs it is waiting on?
> 
> Thanks!
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
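
To make the deadlock question above concrete: if every rank's outstanding sends and receives could be dumped as (source, dest, tag) tuples, the check would reduce to finding entries with no partner on the other side. The sketch below assumes such a dump already exists. The unmatched function and the sample data are hypothetical, it does not show how to obtain the lists from a running job, and it ignores communicators and wildcards such as MPI_ANY_SOURCE and MPI_ANY_TAG.

```python
from collections import Counter


def unmatched(sends, recvs):
    """Return (sends with no matching receive, receives with no matching send).

    sends: iterable of (source, dest, tag) tuples posted by `source`
    recvs: iterable of (source, dest, tag) tuples posted by `dest`
    """
    s, r = Counter(sends), Counter(recvs)
    # Counter subtraction keeps only positive counts, i.e. the leftovers.
    return sorted((s - r).elements()), sorted((r - s).elements())


# Made-up data for a 3-rank job, for illustration only.
sends = [(0, 1, 7), (1, 2, 7), (2, 0, 7)]
# Rank 0 posted a receive from rank 1, but the message for it came from rank 2.
recvs = [(0, 1, 7), (1, 2, 7), (1, 0, 7)]

orphan_sends, orphan_recvs = unmatched(sends, recvs)
print("sends nobody is receiving:", orphan_sends)
print("receives nobody is sending:", orphan_recvs)
```

Any rank that appears in the leftover receives is a candidate for being stuck in MPI_Waitall(), which is the information the padb trace alone does not give.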

