On May 15, 2006, at 9:14 AM, Gurhan Ozen wrote:

Jeff, George, Brian: thanks for your input on this.

I did "kind of" get openib working. Different kernel revisions were
running on the two boxes; getting both onto the very same kernel
revision and recompiling Open MPI against that kernel got my
hello_world program running over the openib stack.

However, most MPI_* functions, such as MPI_Isend(), MPI_Barrier() and
MPI_Send(), are not working. For each of them, I get the same error:

[hostname:11992] *** An error occurred in MPI_Isend
[hostname:11992] *** on communicator MPI_COMM_WORLD
[hostname:11992] *** MPI_ERR_INTERN: internal error
[hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:11998] *** An error occurred in MPI_Barrier
[hostname:11998] *** on communicator MPI_COMM_WORLD
[hostname:11998] *** MPI_ERR_INTERN: internal error
[hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:01916] *** An error occurred in MPI_Send
[hostname:01916] *** on communicator MPI_COMM_WORLD
[hostname:01916] *** MPI_ERR_INTERN: internal error
[hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)

This is not just happening over the network, but also locally. I am
inclined to think that I am missing some compilation flags or the like.
I have tried this with openmpi-1.1a4 as well, but kept getting the
same errors.

Questions of the day:
1- Does anyone know why I might be getting these errors?

This generally means that there was no BTL (Open MPI's byte transfer layer) available to move data between nodes. So I think you still have some issues with your network setup. Unfortunately, I'm not able to help there, but George asked for some debugging information that would be most helpful to us; you might want to try collecting that data with your current setup.

2- I couldn't find any "free" debuggers for debugging Open MPI
programs; does anyone know of any? Are there any tricks for using gdb,
at least to debug locally running MPI programs?

The simple, dirty trick is to set up X11 forwarding with ssh and run:

  mpirun -np X -d xterm -e gdb <myapp>

You'll get one xterm per process, each running gdb on your application, and can debug that way. It's simple, it's cheap, but it definitely doesn't scale.

Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

