Either gmail or the ompi users list is borked; I am resending this since it hasn't shown up on the list yet after two days.
Thanks,
gurhan

---------- Forwarded message ----------
From: Gurhan Ozen <gurhan.o...@gmail.com>
Date: May 15, 2006 9:14 AM
Subject: Re: [OMPI users] Open MPI and OpenIB
To: Open MPI Users <us...@open-mpi.org>

Jeff, George, Brian, thanks for your input on this. I did "kind of" get openib working. Different kernel revisions were running on the two boxes; once I got them onto the very same kernel revision and recompiled Open MPI against it, the hello_world program ran over the openib stack.

However, most MPI_* functions, such as MPI_Isend() and MPI_Barrier(), are not working. For each one of them I get the same error:

[hostname:11992] *** An error occurred in MPI_Isend
[hostname:11992] *** on communicator MPI_COMM_WORLD
[hostname:11992] *** MPI_ERR_INTERN: internal error
[hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:11998] *** An error occurred in MPI_Barrier
[hostname:11998] *** on communicator MPI_COMM_WORLD
[hostname:11998] *** MPI_ERR_INTERN: internal error
[hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:01916] *** An error occurred in MPI_Send
[hostname:01916] *** on communicator MPI_COMM_WORLD
[hostname:01916] *** MPI_ERR_INTERN: internal error
[hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)

This is not just happening over the network, but also locally. I am inclined to think that I am missing some compilation flags or the like. I have tried this with the openmpi-1.1a4 version as well, but kept getting the same errors.

Questions of the day:
1. Does anyone know why I might be getting these errors?
2. I couldn't find any "free" debuggers for debugging Open MPI programs; does anyone know of any? Are there any tricks to use gdb, at least to debug locally running MPI programs? (See the sketch below.)

Thanks again,
Gurhan
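(For reference, a minimal test along these lines exercises the calls in question, and it includes the usual spin-until-attach trick for getting gdb onto a locally running MPI process. It is an illustrative sketch only; the file name and the WAIT_FOR_GDB environment variable are placeholders added here, not anything taken from the original programs in this thread.)

/* isend_barrier_test.c -- illustrative sketch, not the original program.
 * Exercises MPI_Isend/MPI_Irecv/MPI_Wait and MPI_Barrier between two ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, size, value = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Optional gdb hook (placeholder env var for this sketch): if
     * WAIT_FOR_GDB is set, print the pid and spin until a debugger
     * attaches and sets "go" to a nonzero value. */
    if (getenv("WAIT_FOR_GDB") != NULL) {
        volatile int go = 0;
        printf("rank %d: pid %d waiting for gdb attach\n", rank, (int)getpid());
        fflush(stdout);
        while (go == 0)
            sleep(5);   /* in gdb: set var go = 1, then continue */
    }

    /* Rank 0 posts a nonblocking send to rank 1; rank 1 posts the
     * matching nonblocking receive; both wait for completion. */
    if (size >= 2 && rank == 0) {
        value = 42;
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
    } else if (size >= 2 && rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    printf("rank %d of %d: done (value=%d)\n", rank, size, value);

    MPI_Finalize();
    return 0;
}

To build, run, and attach (assuming a local run, so the launched processes simply inherit the environment):

  mpicc -g isend_barrier_test.c -o isend_barrier_test
  WAIT_FOR_GDB=1 mpirun -np 2 ./isend_barrier_test
  gdb -p <pid printed by the rank you want to inspect>
  (gdb) set var go = 1
  (gdb) continue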
On 5/12/06, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> -----Original Message-----
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Gurhan Ozen
> Sent: Thursday, May 11, 2006 4:11 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Open MPI and OpenIB
>
> At any rate though, with --mca btl ib,self it looks like the traffic goes
> over the ethernet device. I couldn't find any documentation on the "self"
> argument of mca; does it mean to explore alternatives if the desired
> btl (in this case ib) doesn't work?

Note that Open MPI still does use TCP for "setup" information; a bunch of data is passed around via mpirun and MPI_INIT so that all the processes can find each other, etc. Similar control messages get passed around during MPI_FINALIZE as well. This is likely the TCP traffic that you are seeing. However, rest assured that the btl MCA parameter will unequivocally set the network that MPI traffic will use.

I've updated the on-line FAQ with regard to the "self" BTL module.

And finally, a man page is available for mpirun in the [not yet released] Open MPI 1.1 (see http://svn.open-mpi.org/svn/ompi/trunk/orte/tools/orterun/orterun.1). It should be pretty much the same for 1.0. One notable difference is that I just recently added a -nolocal option (not yet on the trunk, but likely will be in the not-distant future) that does not exist in 1.0.

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
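(As a usage sketch of the btl parameter described above: in the 1.x series the InfiniBand component is typically named "openib" rather than "ib", and "self" is the loopback component a process uses to send messages to itself. The program name here is the illustrative test from earlier in this message, not anything from the original thread.)

  mpirun --mca btl openib,self -np 2 ./isend_barrier_test
  mpirun --mca btl tcp,self -np 2 ./isend_barrier_test

The first line restricts MPI point-to-point traffic to InfiniBand plus self; the second is a comparison run over TCP. In both cases some out-of-band setup traffic will still appear on the ethernet device during MPI_INIT and MPI_FINALIZE, as described above.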