Either Gmail or the OMPI users list is borked; I am resending this
since it hasn't shown up on the list after 2 days.

Thanks,
gurhan

---------- Forwarded message ----------
From: Gurhan Ozen <gurhan.o...@gmail.com>
Date: May 15, 2006 9:14 AM
Subject: Re: [OMPI users] Open MPI and OpenIB
To: Open MPI Users <us...@open-mpi.org>


Jeff, George, Brian, thanks for your input on this.

I did "kind of" get openib working. The two boxes were running
different kernel revisions; getting them onto the very same kernel
revision and recompiling Open MPI against it got the hello_world
program running over the openib stack.
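
For reference, the test is essentially the canonical hello_world; a
minimal sketch of what I am running (the real program may differ
slightly):

  /* Minimal MPI hello_world sketch. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("Hello from rank %d of %d\n", rank, size);
      MPI_Finalize();
      return 0;
  }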

However, most MPI_* functions, such as MPI_Isend() and MPI_Barrier(),
are not working. For each one of them, I get the same error:

[hostname:11992] *** An error occurred in MPI_Isend
[hostname:11992] *** on communicator MPI_COMM_WORLD
[hostname:11992] *** MPI_ERR_INTERN: internal error
[hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:11998] *** An error occurred in MPI_Barrier
[hostname:11998] *** on communicator MPI_COMM_WORLD
[hostname:11998] *** MPI_ERR_INTERN: internal error
[hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:01916] *** An error occurred in MPI_Send
[hostname:01916] *** on communicator MPI_COMM_WORLD
[hostname:01916] *** MPI_ERR_INTERN: internal error
[hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)
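
To rule out my own code, the failing test is no more complicated than
this (a rough sketch; the actual test differs only in details):

  /* Sketch of the failing pattern: one Isend/Recv pair plus a barrier. */
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, buf = 42;
      MPI_Request req;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);  /* aborts here */
          MPI_Wait(&req, &status);
      } else if (rank == 1) {
          MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
      }

      MPI_Barrier(MPI_COMM_WORLD);  /* also aborts */
      MPI_Finalize();
      return 0;
  }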

This is not just happening over the network, but also locally. I am
inclined to think that I am missing some compilation flags or
something. I have tried this with the openmpi-1.1a4 version as well,
but kept getting the same errors.
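
In case it helps with diagnosis, this is roughly how I have been
checking what got built and running the test (the "openib" component
name and the verbosity level are my guesses, so correct me if they
are wrong):

  # List the BTL components that were actually built into this install:
  ompi_info | grep btl

  # Run with extra BTL verbosity to see which transport gets selected:
  mpirun -np 2 --mca btl openib,self --mca btl_base_verbose 30 ./hello_world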

Questions of the day:
1- Does anyone know why I might be getting these errors?
2- I couldn't find any "free" debuggers for debugging Open MPI
programs; does anyone know of any? Are there any tricks to use gdb,
at least to debug locally running MPI programs? (The only trick I
have come across so far is sketched below.)
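
The one gdb trick I have come across so far (not sure it is the
recommended way) is to park each rank in a loop, print its PID, then
attach from another shell with "gdb -p <pid>" and clear the flag with
"set var wait_for_debugger = 0":

  #include <mpi.h>
  #include <stdio.h>
  #include <unistd.h>

  volatile int wait_for_debugger = 1;

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Print the PID so a debugger can be attached from another shell. */
      printf("rank %d has pid %d\n", rank, (int) getpid());
      fflush(stdout);
      while (wait_for_debugger)
          sleep(1);

      /* ... rest of the program ... */
      MPI_Finalize();
      return 0;
  }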

Thanks again,
Gurhan

On 5/12/06, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> -----Original Message-----
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Gurhan Ozen
> Sent: Thursday, May 11, 2006 4:11 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Open MPI and OpenIB
>
> At any rate though, with --mca btl ib,self it looks like the traffic
> goes over the ethernet device. I couldn't find any documentation on
> the "self" argument of mca; does it mean to explore alternatives if
> the desired btl (in this case ib) doesn't work?

Note that Open MPI still does use TCP for "setup" information; a bunch
of data is passed around via mpirun and MPI_INIT for all the processes
to find each other, etc.  Similar control messages get passed around
during MPI_FINALIZE as well.

This is likely the TCP traffic that you are seeing.  However, rest
assured that the btl MCA parameter will unequivocally set the network
that MPI traffic will use.
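
For example, something along these lines (double-check the component
name against your ompi_info output; "openib" is what I am assuming
here) restricts MPI point-to-point traffic to the OpenIB transport,
while startup/shutdown control messages may still flow over TCP:

  # "self" is the loopback component used when a process sends to itself;
  # all other MPI traffic is forced onto the openib BTL.
  mpirun -np 2 --host node1,node2 --mca btl openib,self ./my_mpi_app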

I've updated the on-line FAQ with regards to the "self" BTL module.

And finally, a man page is available for mpirun in the [not yet
released] Open MPI 1.1 (see
http://svn.open-mpi.org/svn/ompi/trunk/orte/tools/orterun/orterun.1).
It should be pretty much the same for 1.0.  One notable difference is
that I just recently added a -nolocal option (not yet on the trunk,
but it likely will be in the not-distant future) that does not exist
in 1.0.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems


