Hi all: I'm having trouble getting torque/maui working with OpenMPI.
Currently, I am getting hard failures when an MPI_Send is called. When run without qsub (no torque/maui), the MPI job runs fine, so it's something that qsub/torque/maui is doing (I think). Here's the error:

libibverbs: Fatal: couldn't open sysfs class 'infiniband_verbs'.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host localhost was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Signal:8 info.si_errno:0(Success) si_code:1(FPE_INTDIV)
Failing at addr:0x40cc2d
[0] func:/usr/lib64/openmpi/libopal.so.0 [0x3ecfb21dc5]
[1] func:/lib64/tls/libpthread.so.0 [0x3ed040c4f0]
[2] func:repdig_mpi(sendSeeds+0x3d) [0x40cc2d]
[3] func:repdig_mpi(main+0x3b6) [0x40c026]
[4] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3ecfd1c3fb]
[5] func:repdig_mpi [0x4030ea]
*** End of error message ***

I don't really know where to begin looking. I know from the stack trace that the actual problem occurs in frame [2] (sendSeeds), but that is a basic MPI_Send(), and it works when not using torque.

My system (installed from Rocks 4.3) does not have InfiniBand; I think I just figured out how to disable it. In any case, the same warning shows up when not running through torque, and the job runs successfully.

Thoughts/suggestions?

Thanks!
--Jim