That error points to something wrong with the PBS connection - it is tm_init that is crashing. I note that your 1.4.2 configure line points --with-tm at a different location than the 1.4.1 one (/share/apps/pbs/default rather than /share/apps/pbs/10.1.0.91350) - was that intentional? There could be something wrong with that PBS build.
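A few quick checks on what the 1.4.2 build actually picked up (the paths below are just the ones from your mail, adjust as needed):

/share/apps/openmpi-intel/1.4.2/bin/ompi_info | grep tm
readlink -f /share/apps/pbs/default
ldd /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so

The first should list the plm and ras tm components, the second shows which PBS version the "default" symlink really resolves to, and the last shows whether the tm component pulled in a shared PBS library at run time (on some PBS/Torque installs the TM code is linked statically, so ldd showing nothing PBS-related is not by itself a problem).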
On Jun 10, 2010, at 8:44 AM, Richard Walsh wrote:
>
> All,
>
> I am upgrading from 1.4.1 to 1.4.2 on both a cluster with IB and one without.
> I have no problem on the GE cluster without IB, which requires no special
> configure options for the IB. 1.4.2 works perfectly there with both the
> latest Intel and PGI compilers.
>
> On the IB system 1.4.1 has worked fine with the following configure line:
>
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm
> --with-openib --prefix=/share/apps/openmpi-intel/1.4.1
> --with-tm=/share/apps/pbs/10.1.0.91350
>
> I have now built 1.4.2 with the almost identical:
>
> $ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm
> --with-openib --prefix=/share/apps/openmpi-intel/1.4.2
> --with-tm=/share/apps/pbs/default
>
> When I run a basic MPI test program with:
>
> /share/apps/openmpi-intel/1.4.2/bin/mpirun -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
>
> which defaults to using the IB switch, or with:
>
> /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
>
> which forces the use of GE, I get the same error:
>
> [compute-0-3:22515] *** Process received signal ***
> [compute-0-3:22515] Signal: Segmentation fault (11)
> [compute-0-3:22515] Signal code: Address not mapped (1)
> [compute-0-3:22515] Failing at address: 0x3f
> [compute-0-3:22515] [ 0] /lib64/libpthread.so.0 [0x3639e0e7c0]
> [compute-0-3:22515] [ 1] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(discui_+0x84) [0x2b7b546dd3d0]
> [compute-0-3:22515] [ 2] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(diswsi+0xc3) [0x2b7b546da9e3]
> [compute-0-3:22515] [ 3] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d868c]
> [compute-0-3:22515] [ 4] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(tm_init+0x1fe) [0x2b7b546d8978]
> [compute-0-3:22515] [ 5] /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d791c]
> [compute-0-3:22515] [ 6] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x404c27]
> [compute-0-3:22515] [ 7] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403e38]
> [compute-0-3:22515] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x363961d994]
> [compute-0-3:22515] [ 9] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403d69]
> [compute-0-3:22515] *** End of error message ***
> /var/spool/PBS/mom_priv/jobs/9909.bob.csi.cuny.edu.SC: line 42: 22515 Segmentation fault /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
>
> When compiling with the PGI compiler suite I get the same result,
> although the traceback gives less detail. I notice postings suggesting
> that if I disable the memory manager I might be able to get around
> this problem, but that would mean a performance hit on this IB system.
>
> Have others seen this? Suggestions?
>
> Thanks,
>
> Richard Walsh
> CUNY HPC Center
>
> Richard Walsh
> Parallel Applications and Systems Manager
> CUNY HPC Center, Staten Island, NY
> 718-982-3319
> 612-382-4620
>
> Mighty the Wizard
> Who found me at sunrise
> Sleeping, and woke me
> And learn'd me Magic!
>
> Think green before you print this email.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
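If you want to take Open MPI out of the picture entirely, a small standalone TM test built against the same PBS tree should tell you whether tm_init itself is at fault. Here is a rough sketch (the file name, include/lib paths, and library name below are only guesses - PBS Pro usually links with -lpbs, Torque with -ltorque - and it has to run from inside a PBS job so there is a MOM to connect to):

/* tm_check.c - minimal sanity check of the PBS Task Management (TM) API.
 * Calls tm_init() the same way mpirun's tm launcher does and reports the result. */
#include <stdio.h>
#include <tm.h>

int main(void)
{
    struct tm_roots roots;              /* filled in by tm_init() on success */
    int rc = tm_init(NULL, &roots);     /* connect to the mother-superior MOM */

    if (rc != TM_SUCCESS) {
        fprintf(stderr, "tm_init failed with code %d\n", rc);
        return 1;
    }
    printf("tm_init OK: my task id %d, parent %d\n",
           (int) roots.tm_me, (int) roots.tm_parent);
    tm_finalize();                      /* drop the TM connection */
    return 0;
}

Compile it with something like

icc -I/share/apps/pbs/default/include tm_check.c -L/share/apps/pbs/default/lib -lpbs -o tm_check

(adjust to wherever your PBS headers and libraries actually live), then qsub a one-node job that just runs ./tm_check. If that segfaults the same way, the problem is in the PBS installation rather than in Open MPI 1.4.2.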