Here is the stack trace I forgot, as promised. It keeps changing at the lowest levels (opal_progress and below), but never moves above that.
[yccho@nyx0817 ~]$ padb -Ormgr=orte --all --stack-trace --tree --all
Stack trace(s) for thread: 1
-----------------
[0-63] (64 processes)
-----------------
main() at ?:?
Loci::makeQuery(Loci::rule_db const&, Loci::fact_db&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_loop::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_list::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
Loci::execute_rule::execute(Loci::fact_db&, Loci::sched_db&)() at ?:?
streamUns::HypreSolveUnit::compute(Loci::sequence const&)() at ?:?
hypre_BoomerAMGSetup() at ?:?
hypre_BoomerAMGBuildInterp() at ?:?
-----------------
[0,2-3,5-16,18-19,21-24,27-34,36-63] (57 processes)
-----------------
hypre_ParCSRMatrixExtractBExt() at ?:?
hypre_ParCSRMatrixExtractBExt_Arrays() at ?:?
hypre_ParCSRCommHandleDestroy() at ?:?
PMPI_Waitall() at ?:?
-----------------
[0,2-3,5,7-16,18-19,21-24,27-34,36-63] (56 processes)
-----------------
ompi_request_default_wait_all() at ?:?
opal_progress() at ?:?
-----------------
[6] (1 processes)
-----------------
ompi_mtl_psm_progress() at ?:?
-----------------
[1,4,17,20,25-26,35] (7 processes)
-----------------
hypre_ParCSRCommHandleDestroy() at ?:?
PMPI_Waitall() at ?:?
ompi_request_default_wait_all() at ?:?
opal_progress() at ?:?
Stack trace(s) for thread: 2
-----------------
[0-63] (64 processes)
-----------------
start_thread() at ?:?
ips_ptl_pollintr() at ptl_rcvthread.c:324
poll() at ?:?
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
bro...@umich.edu
(734)936-1985

On Mar 21, 2012, at 11:14 AM, Brock Palen wrote:

> I have a user's code that appears to be hanging sometimes on MPI_Waitall();
> the stack trace from padb is below. It is on QLogic IB using the psm mtl.
> Without knowing which requests go to which rank, how can I check that this
> code didn't just get itself into a deadlock? Is there a way to get a
> readable list of every rank's posted sends? And then to query a waiting
> MPI_Waitall() of a running job to see which sends/recvs it is waiting on?
>
> Thanks!