Hi all:

I'm having trouble getting Torque/Maui working with Open MPI.

I'm currently getting hard failures when MPI_Send() is called. When I run
the job without qsub (i.e. without Torque/Maui), it runs fine, so I think
it's something that qsub/Torque/Maui is doing. Here's the error:

libibverbs: Fatal: couldn't open sysfs class 'infiniband_verbs'.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host localhost was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Signal:8 info.si_errno:0(Success) si_code:1(FPE_INTDIV)
Failing at addr:0x40cc2d
[0] func:/usr/lib64/openmpi/libopal.so.0 [0x3ecfb21dc5]
[1] func:/lib64/tls/libpthread.so.0 [0x3ed040c4f0]
[2] func:repdig_mpi(sendSeeds+0x3d) [0x40cc2d]
[3] func:repdig_mpi(main+0x3b6) [0x40c026]
[4] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3ecfd1c3fb]
[5] func:repdig_mpi [0x4030ea]
*** End of error message ***

I don't really know where to begin looking. According to the stack trace,
the failure occurs in frame [2] (sendSeeds), but that function is just a
basic MPI_Send(), and it works when not using Torque.
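For reference: signal 8 with si_code FPE_INTDIV is an integer division by zero, and the failing address (0x40cc2d) matches frame [2], sendSeeds+0x3d inside the repdig_mpi binary itself rather than inside an MPI library. One hypothetical way a seed-distribution routine could hit this only under Torque is dividing by the number of worker ranks (MPI_Comm_size() minus one) when the job was started with a single process. The sketch below is a guess, not the actual sendSeeds: the function name seeds_per_worker, the master/worker layout (rank 0 as master) and its parameters are all assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not the original sendSeeds): number of seeds each
 * worker rank receives, assuming rank 0 is the master and ranks
 * 1..nprocs-1 are workers. Returns -1 instead of performing an integer
 * division by zero, which Linux reports as SIGFPE / FPE_INTDIV. */
int seeds_per_worker(int total_seeds, int nprocs)
{
    assert(total_seeds >= 0);
    int workers = nprocs - 1;  /* rank 0 is assumed to be the master */
    if (workers <= 0) {
        fprintf(stderr, "need at least 2 MPI processes, got %d\n", nprocs);
        return -1;
    }
    return total_seeds / workers;
}
```

If something like this is the cause, it's worth checking how many processes the job actually gets under qsub. If Open MPI was built with Torque (tm) support and mpirun is given no -np, it starts one process per slot Torque allocated, so a job submitted without an explicit nodes=...:ppn=... request may end up with only one. Printing the value returned by MPI_Comm_size() at startup would confirm or rule this out.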

My system (installed from Rocks 4.3) does not have InfiniBand. I think I
just figured out how to disable it, but in any case the same warning
shows up when the job runs outside Torque, and there it completes
successfully.

Thoughts/suggestions?

Thanks!
--Jim
