Re: [OMPI users] core from today

2007-11-13 Thread Terry Dontje
Marcin, A couple of questions: What OS are you running on? Did you run this job oversubscribed, that is, with more processes than there are CPUs? I've found with oversubscribed jobs that the recursive calls to opal_progress by the SM BTL that the yield within opal_progress (intending to give up the c

[OMPI users] core from today

2007-11-13 Thread Marcin Skoczylas
OpenMPI 1.2.4

mpirun noticed that job rank 0 with PID 19021 on node pc801 exited on signal 15 (Terminated).
11 additional processes aborted (not shown)

(gdb) bt
#0  0x411b776c in mca_pml_ob1_recv_frag_match () from /usr/local/openmpi//lib/openmpi/mca_pml_ob1.so
#1  0x411ce010 in mca_btl_sm_co