That error would indicate something wrong with the PBS connection - it is
tm_init that is crashing. I note that your --with-tm points to a different
location than the one you used for 1.4.1 - was that intentional? There could
be something wrong with that PBS build.
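
As a quick sanity check, you could see where that "default" actually points
and whether the TM headers/libs are present (paths taken from your configure
lines; the include/lib layout is an assumption about your PBS install):

  ls -ld /share/apps/pbs/default
  ls /share/apps/pbs/default/include/tm.h /share/apps/pbs/default/lib/

If it doesn't resolve to the same PBS as /share/apps/pbs/10.1.0.91350, try
rebuilding 1.4.2 with --with-tm pointing at that explicit path - if that
fixes it, the newer PBS build is the likely culprit.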

On Jun 10, 2010, at 8:44 AM, Richard Walsh wrote:

> 
> All,
> 
> I am upgrading from 1.4.1 to 1.4.2 on both a cluster with IB and one
> without. I have no problem on the GE cluster, which, having no IB, requires
> no special configure options. 1.4.2 works perfectly there with both the
> latest Intel and PGI compilers.
> 
> On the IB system 1.4.1 has worked fine with the following configure line:
> 
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm 
> --with-openib --prefix=/share/apps/openmpi-intel/1.4.1 
> --with-tm=/share/apps/pbs/10.1.0.91350
> 
> I have now built 1.4.2 with an almost identical line:
> 
> $ ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm 
> --with-openib --prefix=/share/apps/openmpi-intel/1.4.2 
> --with-tm=/share/apps/pbs/default
> 
> When I run a basic MPI test program with:
> 
> /share/apps/openmpi-intel/1.4.2/bin/mpirun -np 16 -machinefile $PBS_NODEFILE 
> ./hello_mpi.exe
> 
> which defaults to using the IB interconnect, or with:
> 
> /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl tcp,self -np 16 
> -machinefile $PBS_NODEFILE ./hello_mpi.exe
> 
> which forces the use of GE, I get the same error:
> 
> [compute-0-3:22515] *** Process received signal ***
> [compute-0-3:22515] Signal: Segmentation fault (11)
> [compute-0-3:22515] Signal code: Address not mapped (1)
> [compute-0-3:22515] Failing at address: 0x3f
> [compute-0-3:22515] [ 0] /lib64/libpthread.so.0 [0x3639e0e7c0]
> [compute-0-3:22515] [ 1] 
> /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(discui_+0x84) 
> [0x2b7b546dd3d0]
> [compute-0-3:22515] [ 2] 
> /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(diswsi+0xc3) 
> [0x2b7b546da9e3]
> [compute-0-3:22515] [ 3] 
> /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d868c]
> [compute-0-3:22515] [ 4] 
> /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so(tm_init+0x1fe) 
> [0x2b7b546d8978]
> [compute-0-3:22515] [ 5] 
> /share/apps/openmpi-intel/1.4.2/lib/openmpi/mca_plm_tm.so [0x2b7b546d791c]
> [compute-0-3:22515] [ 6] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x404c27]
> [compute-0-3:22515] [ 7] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403e38]
> [compute-0-3:22515] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) 
> [0x363961d994]
> [compute-0-3:22515] [ 9] /share/apps/openmpi-intel/1.4.2/bin/mpirun [0x403d69]
> [compute-0-3:22515] *** End of error message ***
> /var/spool/PBS/mom_priv/jobs/9909.bob.csi.cuny.edu.SC: line 42: 22515 
> Segmentation fault      /share/apps/openmpi-intel/1.4.2/bin/mpirun -mca btl 
> tcp,self -np 16 -machinefile $PBS_NODEFILE ./hello_mpi.exe
> 
> When compiling with the PGI compiler suite I get the same result,
> although the traceback gives less detail. I have seen postings suggesting
> that if I disable the memory manager I might be able to get around this
> problem, but that would result in a performance hit on this IB system.
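> 
> If it comes to that, I assume the workaround would be a configure line
> like the following (my existing options plus --without-memory-manager;
> untested on my end):
> 
> ./configure CC=icc CXX=icpc F77=ifort FC=ifort --enable-openib-ibcm \
>     --with-openib --without-memory-manager \
>     --prefix=/share/apps/openmpi-intel/1.4.2 \
>     --with-tm=/share/apps/pbs/default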
> 
> Have others seen this?  Suggestions?
> 
> Thanks,
> 
> Richard Walsh
> CUNY HPC Center
> 
>   Richard Walsh
>   Parallel Applications and Systems Manager
>   CUNY HPC Center, Staten Island, NY
>   718-982-3319
>   612-382-4620
> 
>   Mighty the Wizard
>   Who found me at sunrise
>   Sleeping, and woke me
>   And learn'd me Magic!
> 

