If you do not have IB hardware, you might want to permanently disable
IB support. You can do this by setting an MCA parameter or by simply
removing the $prefix/lib/openmpi/mca_btl_openib.* files. Either will
suppress the warning that you're seeing.
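
A minimal sketch of the MCA parameter route, assuming a default install layout: the line below goes in $prefix/etc/openmpi-mca-params.conf (system-wide) or in $HOME/.openmpi/mca-params.conf (per user). It asks Open MPI to exclude the openib BTL component.

```
# Exclude the openib BTL so Open MPI never tries to open IB devices
btl = ^openib
```

The same setting can be given for a single run on the command line, e.g. "mpirun --mca btl ^openib ...".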

As for your problem with MPI_SEND, do you know that your program is
correct? I.e., it's a little odd that you're failing directly in
sendSeeds (with FPE_INTDIV, an integer divide error), not in an MPI
function. Are you getting a core dump that you can examine, or can you
attach a debugger to see where exactly it is failing?
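
One thing that might be worth checking, purely as a guess since I have not seen sendSeeds: an integer divide error that only shows up under a batch system can come from dividing by a process count that is smaller than the program expects (for example, if the job ends up running as a single process). The sketch below guards such a division. The name seeds_per_worker and the "rank 0 only distributes" layout are made up for illustration; in a real program nprocs would come from MPI_Comm_size().

```c
#include <stdio.h>

/* Hypothetical helper: how many seeds each worker rank receives.
 * Assumes rank 0 only distributes work, so workers = nprocs - 1.
 * Returns -1 instead of dividing when there are no worker ranks. */
int seeds_per_worker(int total_seeds, int nprocs)
{
    int workers = nprocs - 1;

    if (workers <= 0) {
        /* Dividing here would raise SIGFPE (FPE_INTDIV) on x86. */
        fprintf(stderr, "need at least 2 processes, got %d\n", nprocs);
        return -1;
    }
    return total_seeds / workers;
}
```

Calling seeds_per_worker(100, 5) gives each of the 4 workers 25 seeds, while seeds_per_worker(100, 1) reports the problem and returns -1 rather than crashing. Printing the value returned by MPI_Comm_size() at startup, both with and without qsub, would quickly confirm or rule out this theory.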

On Oct 4, 2007, at 8:36 PM, Jim Kusznir wrote:

Hi all:

I'm having trouble getting torque/maui working with OpenMPI.
Currently, I am getting hard failures when an MPI_Send is called.
When run without qsub (no torque/maui), the MPI job runs fine, so it's
something that qsub/torque/maui is doing (I think). Here's the error:

libibverbs: Fatal: couldn't open sysfs class 'infiniband_verbs'.
--------------------------------------------------------------------------
[0,1,0]: OpenIB on host localhost was unable to find any HCAs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Signal:8 info.si_errno:0(Success) si_code:1(FPE_INTDIV)
Failing at addr:0x40cc2d
[0] func:/usr/lib64/openmpi/libopal.so.0 [0x3ecfb21dc5]
[1] func:/lib64/tls/libpthread.so.0 [0x3ed040c4f0]
[2] func:repdig_mpi(sendSeeds+0x3d) [0x40cc2d]
[3] func:repdig_mpi(main+0x3b6) [0x40c026]
[4] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x3ecfd1c3fb]
[5] func:repdig_mpi [0x4030ea]
*** End of error message ***
I don't really know where to begin looking. I know in the stack trace
the actual problem is occurring in #2 (sendSeeds), but that is a basic
MPI_Send(), and works when not using torque.
My system (installed from Rocks 4.3) does not have infiniband; I think
I just figured out how to disable it; in any case, the same warning
shows up when not running it through torque, and the job runs
successfully.
Thoughts/suggestions?
Thanks!
--Jim
--
Jeff Squyres
Cisco Systems