I'm having a problem running Open MPI under PBS Pro 10.4.  I tried both 1.4.3 
and 1.5.3, and both behave the same way.  Everything runs fine if I bypass PBS 
and launch directly on the nodes.  A job under PBS also works fine on a single 
node, but as soon as it spans nodes, mpirun segfaults with the following:

[a4ou-n501:07366] *** Process received signal ***
[a4ou-n501:07366] Signal: Segmentation fault (11)
[a4ou-n501:07366] Signal code: Address not mapped (1)
[a4ou-n501:07366] Failing at address: 0x3f
[a4ou-n501:07366] [ 0] /lib64/libpthread.so.0 [0x3f2b20eb10]
[a4ou-n501:07366] [ 1] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(discui_+0x84) [0x2affa453765c]
[a4ou-n501:07366] [ 2] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(diswsi+0xc3) [0x2affa4534c6f]
[a4ou-n501:07366] [ 3] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0 [0x2affa453290c]
[a4ou-n501:07366] [ 4] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(tm_init+0x1fe) [0x2affa4532bf8]
[a4ou-n501:07366] [ 5] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0 [0x2affa452691c]
[a4ou-n501:07366] [ 6] mpirun [0x404c17]
[a4ou-n501:07366] [ 7] mpirun [0x403e28]
[a4ou-n501:07366] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3f2a61d994]
[a4ou-n501:07366] [ 9] mpirun [0x403d59]
[a4ou-n501:07366] *** End of error message ***
Segmentation fault
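
In case it helps anyone compare traces, here is a small Python sketch that 
pulls the named frames out of a backtrace like the one above.  The regular 
expression, the SAMPLE excerpt, and the function name named_frames are my own 
assumptions about the line format; none of this is provided by Open MPI or PBS.

```python
import re

# Assumed frame layout: "[host:pid] [ N] object(symbol+0xOFF) [0xADDR]".
# The "(symbol+0xOFF)" part is optional, and \s+ also tolerates frames
# that a mail client wrapped onto a second line.
FRAME_RE = re.compile(
    r"\[\s*(?P<index>\d+)\]\s+"                       # frame number, e.g. "[ 4]"
    r"(?P<object>\S+?)"                               # binary or shared library
    r"(?:\((?P<symbol>[^+()]+)\+0x[0-9a-fA-F]+\))?"   # optional symbol+offset
    r"\s+\[0x[0-9a-fA-F]+\]"                          # return address
)

# A few frames copied from the trace above (excerpt, not the full trace).
SAMPLE = """\
[a4ou-n501:07366] [ 0] /lib64/libpthread.so.0 [0x3f2b20eb10]
[a4ou-n501:07366] [ 1] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(discui_+0x84) [0x2affa453765c]
[a4ou-n501:07366] [ 2] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(diswsi+0xc3) [0x2affa4534c6f]
[a4ou-n501:07366] [ 4] /opt/ompi/1.4.3/intel/lib/libopen-rte.so.0(tm_init+0x1fe) [0x2affa4532bf8]
[a4ou-n501:07366] [ 6] mpirun [0x404c17]
"""


def named_frames(trace):
    """Return (index, object, symbol) for each frame that carries a symbol."""
    frames = []
    for match in FRAME_RE.finditer(trace):
        if match.group("symbol"):
            frames.append(
                (int(match.group("index")),
                 match.group("object"),
                 match.group("symbol"))
            )
    return frames


if __name__ == "__main__":
    for index, obj, symbol in named_frames(SAMPLE):
        print(index, symbol, obj)
```

On the excerpt, the only named frames are discui_, diswsi, and tm_init, all 
inside libopen-rte.so.0, so the fault is reached from tm_init while mpirun is 
starting up.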

I searched the archives and found a similar issue from last year:

http://www.open-mpi.org/community/lists/users/2010/02/12084.php

The last update I saw was that someone was going to contact Altair and ask 
them to look into why tm_init was failing.  Does anyone have an update on 
this, and has anyone been able to run successfully with recent versions of 
PBS Pro?  I've also contacted our rep at Altair, but he hasn't responded yet.

Thanks, Justin.

Justin Wood
Systems Engineer
FNMOC | SAIC
7 Grace Hopper, Stop 1
Monterey, CA
justin.g.wood....@navy.mil
justin.g.w...@saic.com
office: 831.656.4671
mobile: 831.869.1576

