Where does the error occur? In MPI_Init? MPI_Finalize? In between? In the first case, the bug is likely a mishandled error case, which means OpenMPI is unlikely to be the root cause of the crash.
Did you check that InfiniBand is up and running on your cluster?

Cheers,

Gilles

Saliya Ekanayake <esal...@gmail.com> wrote:
>It's been a while on this, but we are still having trouble getting OpenMPI to
>work with Infiniband on this cluster. We tried with latest 1.8.4 as well, but
>it's still the same.
>
>To recap, we get the following error when MPI initializes (in the simple Hello
>world C example) with Infiniband. Everything works fine if we explicitly turn
>off openib with --mca btl ^openib
>
>This is the error I got after debugging with gdb as you suggested.
>
>hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize:
>Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *)
>(&m->cm_recv_msg_queue))->obj_magic_id' failed.
>
>Thank you,
>
>Saliya
>
>On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
>Thank you Jeff, I'll try this and let you know.
>
>Saliya
>
>On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>
>I am sorry for the delay; I've been caught up in SC deadlines. :-(
>
>I don't see anything blatantly wrong in this output.
>
>Two things:
>
>1. Can you try a nightly v1.8.4 snapshot tarball? This will check to see if
>whatever the bug is has been fixed for the upcoming release:
>
> http://www.open-mpi.org/nightly/v1.8/
>
>2. Build Open MPI with the --enable-debug option (note that this adds a
>slight-but-noticeable performance penalty). When you run, it should dump a
>core file. Load that core file in a debugger and see where it is failing
>(i.e., file and line in the OMPI source).
>
>We don't usually have to resort to asking users to perform #2, but there's no
>additional information to give a clue as to what is happening. :-(
>
>On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
>> Hi Jeff,
>>
>> You are probably busy, but just checking if you had a chance to look at this.
>>
>> Thanks,
>> Saliya
>>
>> On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> Hi Jeff,
>>
>> I've attached a tar file with information.
>>
>> Thank you,
>> Saliya
>>
>> On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> Looks like it's failing in the openib BTL setup.
>>
>> Can you send the info listed here?
>>
>> http://www.open-mpi.org/community/help/
>>
>> On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I am using OpenMPI 1.8.1 in a Linux cluster that we recently setup. It
>> > builds fine, but when I try to run even the simplest hello.c program it'll
>> > cause a segfault. Any suggestions on how to correct this?
>> >
>> > The steps I did and error message are below.
>> >
>> > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info is attached.
>> > 2. cd to examples directory and mpicc hello_c.c
>> > 3. mpirun -np 2 ./a.out
>> > 4. Error text is attached.
>> >
>> > Please let me know if you need more info.
>> >
>> > Thank you,
>> > Saliya
>> >
>> > --
>> > Saliya Ekanayake esal...@gmail.com
>> > Cell 812-391-4914 Home 812-961-6383
>> > http://saliya.org
>> > <ompi_info.txt><error.txt>
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post:
>> > http://www.open-mpi.org/community/lists/users/2014/11/25668.php
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>--
>Saliya Ekanayake
>Ph.D. Candidate | Research Assistant
>School of Informatics and Computing | Digital Science Center
>Indiana University, Bloomington
>Cell 812-391-4914
>http://saliya.org