It might also be worth checking that UD (Unreliable Datagram) is enabled on your IB installation, as we depend on it to wire up IB connections.
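Some context on the assertion in the quoted report. Judging from its text, a debug build stamps each constructed object with the sentinel `(0xdeafbeed << 32) + 0xdeafbeed` and checks that value again when the object is destructed. A failure in `udcm_module_finalize` therefore suggests that `cm_recv_msg_queue` was destructed without ever having been constructed, or was destructed twice. That would fit the "mishandled error case" theory. The sketch below is a simplified, hypothetical model of that check, not Open MPI code: `demo_object_t`, `demo_construct`, `demo_destruct` and `demo_destruct_strict` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same sentinel expression that appears in the failed assertion. */
#define DEMO_MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

/* Hypothetical stand-in for a debug-build object header. */
typedef struct {
    uint64_t magic_id;
} demo_object_t;

/* Construction stamps the sentinel into the header. */
void demo_construct(demo_object_t *obj) {
    obj->magic_id = DEMO_MAGIC_ID;
}

/* Non-aborting check: returns false exactly where the debug-build
 * assertion would fire (object never constructed, or already
 * destructed). On success it clears the sentinel, so a second
 * destruct of the same object is also caught. */
bool demo_destruct(demo_object_t *obj) {
    if (obj->magic_id != DEMO_MAGIC_ID) {
        return false;
    }
    obj->magic_id = 0;
    return true;
}

/* Aborting variant, closer to what the report shows: a mismatch
 * triggers assert() (compiled out when NDEBUG is defined). */
void demo_destruct_strict(demo_object_t *obj) {
    assert(obj->magic_id == DEMO_MAGIC_ID);
    obj->magic_id = 0;
}
```

In this model, calling `demo_destruct` on an object that never went through `demo_construct`, or calling it a second time, returns `false`. With `demo_destruct_strict`, the same cases abort with an assertion much like the one in the report.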
> On Dec 28, 2014, at 12:18 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Where does the error occur?
> MPI_Init?
> MPI_Finalize?
> In between?
>
> In the first case, the bug is likely a mishandled error case,
> which means OpenMPI is unlikely to be the root cause of the crash.
>
> Did you check that InfiniBand is up and running on your cluster?
>
> Cheers,
>
> Gilles
>
> Saliya Ekanayake <esal...@gmail.com> wrote:
> It's been a while on this, but we are still having trouble getting OpenMPI to
> work with InfiniBand on this cluster. We tried the latest 1.8.4 as well, but
> it's still the same.
>
> To recap, we get the following error when MPI initializes (in the simple
> Hello world C example) with InfiniBand. Everything works fine if we
> explicitly turn off openib with --mca btl ^openib
>
> This is the error I got after debugging with gdb as you suggested.
>
> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize:
> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *)
> (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>
> Thank you,
> Saliya
>
> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> Thank you Jeff, I'll try this and let you know.
>
> Saliya
>
> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
> I am sorry for the delay; I've been caught up in SC deadlines. :-(
>
> I don't see anything blatantly wrong in this output.
>
> Two things:
>
> 1. Can you try a nightly v1.8.4 snapshot tarball? This will check to see if
> whatever the bug is has been fixed for the upcoming release:
>
> http://www.open-mpi.org/nightly/v1.8/
>
> 2. Build Open MPI with the --enable-debug option (note that this adds a
> slight-but-noticeable performance penalty). When you run, it should dump a
> core file.
> Load that core file in a debugger and see where it is failing
> (i.e., file and line in the OMPI source).
>
> We don't usually have to resort to asking users to perform #2, but there's no
> additional information to give a clue as to what is happening. :-(
>
> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> > Hi Jeff,
> >
> > You are probably busy, but just checking if you had a chance to look at
> > this.
> >
> > Thanks,
> > Saliya
> >
> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
> > Hi Jeff,
> >
> > I've attached a tar file with information.
> >
> > Thank you,
> > Saliya
> >
> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> > Looks like it's failing in the openib BTL setup.
> >
> > Can you send the info listed here?
> >
> > http://www.open-mpi.org/community/help/
> >
> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am using OpenMPI 1.8.1 on a Linux cluster that we recently set up. It
> > > builds fine, but when I try to run even the simplest hello.c program
> > > it causes a segfault. Any suggestions on how to correct this?
> > >
> > > The steps I took and the error message are below.
> > >
> > > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info is attached.
> > > 2. cd to examples directory and mpicc hello_c.c
> > > 3. mpirun -np 2 ./a.out
> > > 4. Error text is attached.
> > >
> > > Please let me know if you need more info.
> > >
> > > Thank you,
> > > Saliya
> > >
> > > --
> > > Saliya Ekanayake esal...@gmail.com
> > > Cell 812-391-4914 Home 812-961-6383
> > > http://saliya.org
> > > <ompi_info.txt><error.txt>
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25668.php
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25672.php
> >
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25717.php
>
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/11/25723.php
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
>
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26074.php