I'm a tad confused - this trace would appear to indicate that mpirun is failing, yes? Not your application?
The reason it works for local procs is that tm_init isn't called for that case - mpirun just fork/exec's the procs directly. When remote nodes are required, mpirun must connect to Torque. This is done with a call to:

    ret = tm_init(NULL, &tm_root);

My guess is that something changed in PBS Pro 10.2 to that API. Can you check the tm header file and see? I have no access to PBS any more, so I'll have to rely on your eyes to see a diff.

Thanks
Ralph

On Feb 12, 2010, at 8:50 AM, Repsher, Stephen J wrote:

> Hello,
>
> I'm having problems running Open MPI jobs under PBS Pro 10.2. I've
> configured and built Open MPI 1.4.1 with the Intel 11.1 compiler on Linux
> with --with-tm support, and the build runs fine. I've also built with static
> libraries per the FAQ suggestion since libpbs is static. However, my test
> application keeps failing with a segmentation fault, but ONLY when trying to
> select more than 1 node. Running on a single node within PBS works fine.
> Also, running outside of PBS via ssh runs fine as well, even across multiple
> nodes. OpenIB support is also enabled, but that doesn't seem to affect the
> error, because I've also tried running with the --mca btl tcp,self flag and
> it still doesn't work.
> Here is the error I'm getting:
>
> [n34:26892] *** Process received signal ***
> [n34:26892] Signal: Segmentation fault (11)
> [n34:26892] Signal code: Address not mapped (1)
> [n34:26892] Failing at address: 0x3f
> [n34:26892] [ 0] /lib64/libpthread.so.0 [0x7fc0309d6a90]
> [n34:26892] [ 1] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun(discui_+0x84) [0x476a50]
> [n34:26892] [ 2] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun(diswsi+0xc3) [0x474063]
> [n34:26892] [ 3] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x471d0c]
> [n34:26892] [ 4] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun(tm_init+0x1fe) [0x471ff8]
> [n34:26892] [ 5] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x43f580]
> [n34:26892] [ 6] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x413921]
> [n34:26892] [ 7] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x412b78]
> [n34:26892] [ 8] /lib64/libc.so.6(__libc_start_main+0xe6) [0x7fc03068d586]
> [n34:26892] [ 9] /part0/apps/MPI/intel/openmpi-1.4.1/bin/pbs_mpirun [0x412ac9]
> [n34:26892] *** End of error message ***
> Segmentation fault
>
> (NOTE: pbs_mpirun = orterun, mpirun, etc.)
>
> Has anyone else seen errors like this within PBS?
>
> ============================================
> Steve Repsher
> Boeing Defense, Space, & Security - Rotorcraft Aerodynamics/CFD
> Phone: (610) 591-1510
> Fax: (610) 591-6263
> stephen.j.reps...@boeing.com
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
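The tm_init call Ralph points at can also be exercised outside of mpirun. Below is a minimal probe sketch of the Torque/PBS TM entry point; the include/library paths are assumptions (adjust to wherever your PBS installation puts tm.h and libpbs), and it must be run from inside a PBS job so the library can contact the local MOM. If this probe segfaults the same way under PBS Pro 10.2 on a multi-node allocation, the problem is in the PBS library, not Open MPI.

```c
/* tm_probe.c - minimal sketch exercising the same TM call mpirun makes.
 * Assumed build line (paths are guesses for your installation):
 *   cc tm_probe.c -I/opt/pbs/include -L/opt/pbs/lib -lpbs -o tm_probe
 * Must run inside a PBS job; tm_init will fail (or worse) otherwise. */
#include <stdio.h>
#include <tm.h>   /* struct tm_roots, tm_init(), tm_finalize(), TM_SUCCESS */

int main(void)
{
    struct tm_roots tm_root;

    /* The same call Open MPI's tm PLM makes to attach to the job. */
    int ret = tm_init(NULL, &tm_root);
    if (ret != TM_SUCCESS) {
        fprintf(stderr, "tm_init failed with code %d\n", ret);
        return 1;
    }

    printf("tm_init OK: %d node(s) in this job\n", tm_root.tm_nnodes);
    tm_finalize();
    return 0;
}
```

Separately, diffing the tm_init prototype and the struct tm_roots layout in PBS Pro 10.2's tm.h against the tm.h that Open MPI 1.4.1 was compiled against would show whether the API/ABI changed, which is what Ralph is asking you to check.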