So you are saying the test worked, but you are still encountering an error when executing an MPI job? Or are you saying things now work?
> On Dec 28, 2014, at 5:58 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> Thank you Ralph. This produced the warning on memory limits similar to [1]
> and setting ulimit -l unlimited worked.
>
> [1] http://lists.openfabrics.org/pipermail/general/2007-June/036941.html
>
> Saliya
>
> On Sun, Dec 28, 2014 at 5:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Have the admin try running the ibv_ud_pingpong test - that will exercise the
> portion of the system under discussion.
>
>
>> On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>> What I heard from the administrator is that,
>>
>> "The tests that work are the simple utilities ib_read_lat and ib_read_bw
>> that measure latency and bandwidth between two nodes. They are part of
>> the "perftest" repo package."
>>
>> On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" <esal...@gmail.com> wrote:
>> This happens at MPI_Init. I've attached the full error message.
>>
>> The sys admin mentioned Infiniband utility tests ran OK. I'll contact him
>> for more details and let you know.
>>
>> Thank you,
>> Saliya
>>
>> On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>> Where does the error occur?
>> MPI_Init?
>> MPI_Finalize?
>> In between?
>>
>> In the first case, the bug is likely a mishandled error case,
>> which means OpenMPI is unlikely the root cause of the crash.
>>
>> Did you check that Infiniband is up and running on your cluster?
>>
>> Cheers,
>>
>> Gilles
>>
>> Saliya Ekanayake <esal...@gmail.com> wrote:
>> It's been a while on this, but we are still having trouble getting OpenMPI
>> to work with Infiniband on this cluster. We tried with latest 1.8.4 as well,
>> but it's still the same.
>>
>> To recap, we get the following error when MPI initializes (in the simple
>> Hello world C example) with Infiniband. Everything works fine if we
>> explicitly turn off openib with --mca btl ^openib
>>
>> This is the error I got after debugging with gdb as you suggested.
>>
>> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize:
>> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *)
>> (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>>
>> Thank you,
>> Saliya
>>
>> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> Thank you Jeff, I'll try this and let you know.
>>
>> Saliya
>> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>> I am sorry for the delay; I've been caught up in SC deadlines. :-(
>>
>> I don't see anything blatantly wrong in this output.
>>
>> Two things:
>>
>> 1. Can you try a nightly v1.8.4 snapshot tarball? This will check to see if
>> whatever the bug is has been fixed for the upcoming release:
>>
>> http://www.open-mpi.org/nightly/v1.8/
>>
>> 2. Build Open MPI with the --enable-debug option (note that this adds a
>> slight-but-noticeable performance penalty). When you run, it should dump a
>> core file. Load that core file in a debugger and see where it is failing
>> (i.e., file and line in the OMPI source).
>>
>> We don't usually have to resort to asking users to perform #2, but there's
>> no additional information to give a clue as to what is happening. :-(
>>
>>
>>
>> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>> > Hi Jeff,
>> >
>> > You are probably busy, but just checking if you had a chance to look at
>> > this.
>> >
>> > Thanks,
>> > Saliya
>> >
>> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> > Hi Jeff,
>> >
>> > I've attached a tar file with information.
>> >
>> > Thank you,
>> > Saliya
>> >
>> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres)
>> > <jsquy...@cisco.com> wrote:
>> > Looks like it's failing in the openib BTL setup.
>> >
>> > Can you send the info listed here?
>> >
>> > http://www.open-mpi.org/community/help/
>> >
>> >
>> >
>> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am using OpenMPI 1.8.1 in a Linux cluster that we recently set up. It
>> > > builds fine, but when I try to run even the simplest hello.c program
>> > > it'll cause a segfault. Any suggestions on how to correct this?
>> > >
>> > > The steps I did and error message are below.
>> > >
>> > > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info is attached.
>> > > 2. cd to examples directory and mpicc hello_c.c
>> > > 3. mpirun -np 2 ./a.out
>> > > 4. Error text is attached.
>> > >
>> > > Please let me know if you need more info.
>> > >
>> > > Thank you,
>> > > Saliya
>> > >
>> > >
>> > > --
>> > > Saliya Ekanayake esal...@gmail.com
>> > > Cell 812-391-4914 Home 812-961-6383
>> > > http://saliya.org
>> > > <ompi_info.txt><error.txt>
>> > > _______________________________________________
>> > > users mailing list
>> > > us...@open-mpi.org
>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > Link to this post:
>> > > http://www.open-mpi.org/community/lists/users/2014/11/25668.php
>> >
>> >
>> > --
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> > For corporate legal information go to:
>> > http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>>
>> --
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> Cell 812-391-4914
>> http://saliya.org