On May 15, 2006, at 9:14 AM, Gurhan Ozen wrote:
Jeff, George, Brian, thanks for your input on this.
I did "kind of" get openib working. The two boxes were running
different kernel revisions; getting them onto the very same kernel
revision and recompiling open-mpi against that kernel got the
hello_world program running over the openib stack.
However, most MPI_* functions, such as MPI_Isend() and MPI_Barrier(),
are not working. For each of them, I get the same error:
[hostname:11992] *** An error occurred in MPI_Isend
[hostname:11992] *** on communicator MPI_COMM_WORLD
[hostname:11992] *** MPI_ERR_INTERN: internal error
[hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)
[hostname:11998] *** An error occurred in MPI_Barrier
[hostname:11998] *** on communicator MPI_COMM_WORLD
[hostname:11998] *** MPI_ERR_INTERN: internal error
[hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)
[hostname:01916] *** An error occurred in MPI_Send
[hostname:01916] *** on communicator MPI_COMM_WORLD
[hostname:01916] *** MPI_ERR_INTERN: internal error
[hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)
This is not just happening over the network, but also locally. I am
inclined to think that I am missing some compilation flags or
something similar. I have tried this with the openmpi-1.1a4 version as
well, but kept getting the same errors.
Questions of the day:
1- Does anyone know why I might be getting these errors?
This generally means that there was no BTL (byte transfer layer
component) available to move data between nodes. So I think you still
have some issues with your network setup. Unfortunately, I'm not able
to help here. George asked for some debugging information that would
be most helpful to us -- you might want to try getting that data with
your current setup.
2- I couldn't find any "free" debuggers for debugging open-mpi
programs. Does anyone know of any? Are there any tricks for using gdb,
at least to debug locally running MPI programs?
The simple, dirty trick is to set up X11 forwarding with ssh and run:
mpirun -np X -d xterm -e gdb <myapp>
You'll get a bunch of xterms open and can debug that way. It's
simple, it's cheap, but it definitely doesn't scale.
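
If X11 forwarding isn't an option, another common trick is to have each
process print its PID and spin until you attach gdb to it by hand. The
following is only a minimal sketch of that idea, not something shipped
with Open MPI: the helper name wait_for_debugger() and the environment
variable WAIT_FOR_GDB are both hypothetical, chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Open MPI): if the made-up environment
 * variable WAIT_FOR_GDB is set, print this process's PID and host, then
 * spin until a debugger changes `attached`. Returns 0 immediately if the
 * variable is unset, 1 once released by the debugger.
 */
int wait_for_debugger(void)
{
    /* volatile so the compiler re-reads it after gdb modifies it */
    volatile int attached = 0;
    char host[256];

    if (getenv("WAIT_FOR_GDB") == NULL)
        return 0;

    if (gethostname(host, sizeof(host)) != 0) {
        host[0] = '?';
        host[1] = '\0';
    }
    host[sizeof(host) - 1] = '\0';

    fprintf(stderr, "PID %d on %s waiting for gdb attach\n",
            (int)getpid(), host);

    /* In gdb: move up to this frame, then "set var attached = 1" */
    while (!attached)
        sleep(5);

    return 1;
}
```

You would call this early in main() (for example right after
MPI_Init()), run the job with the variable set, then on each node run
"gdb -p <pid>" with the printed PID, go up to the wait_for_debugger
frame, do "set var attached = 1", and continue. Like the xterm trick,
this doesn't scale, but it needs no X11.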
Brian
--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/