Sorry for the confusion: I meant that it works now. Running the ibv_ud_pingpong test produced a warning about memory registration limits, which we fixed by setting the locked-memory limit to unlimited (ulimit -l unlimited). After that, the Open MPI sample program ran correctly too.
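For anyone who hits the same warning, the limit that ulimit -l reports can also be checked programmatically on each node. The following is only a minimal sketch, assuming a Linux node and Python's standard resource module; the helper names format_limit and memlock_report are illustrative and are not part of Open MPI or the OpenFabrics tools.

```python
import resource


def format_limit(value):
    """Render an rlimit value the way `ulimit -l` does (KiB or 'unlimited')."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports bytes, while `ulimit -l` reports units of 1024 bytes.
    return f"{value // 1024} KiB"


def memlock_report():
    """Return the (soft, hard) locked-memory limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_report()
    print(f"max locked memory: soft={soft}, hard={hard}")
    if soft != "unlimited":
        # Hypothetical hint only; the right fix depends on site policy.
        print("Locked memory is limited; memory registration may fail. "
              "Consider 'ulimit -l unlimited' where permitted.")
```

Running this in the same environment that launches the MPI processes (not just in a login shell) shows the limit the job will actually see, since the two can differ.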
Thank you,
Saliya

On Sun, Dec 28, 2014 at 11:18 PM, Ralph Castain <r...@open-mpi.org> wrote:

> So you are saying the test worked, but you are still encountering an error
> when executing an MPI job? Or are you saying things now work?
>
> On Dec 28, 2014, at 5:58 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> Thank you Ralph. This produced the warning on memory limits similar to [1]
> and setting ulimit -l unlimited worked.
>
> [1] http://lists.openfabrics.org/pipermail/general/2007-June/036941.html
>
> Saliya
>
> On Sun, Dec 28, 2014 at 5:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Have the admin try running the ibv_ud_pingpong test - that will exercise
>> the portion of the system under discussion.
>>
>> On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>> What I heard from the administrator is that,
>>
>> "The tests that work are the simple utilities ib_read_lat and ib_read_bw
>> that measure latency and bandwidth between two nodes. They are part of
>> the "perftest" repo package."
>>
>> On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" <esal...@gmail.com> wrote:
>>
>>> This happens at MPI_Init. I've attached the full error message.
>>>
>>> The sys admin mentioned Infiniband utility tests ran OK. I'll contact
>>> him for more details and let you know.
>>>
>>> Thank you,
>>> Saliya
>>>
>>> On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet
>>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Where does the error occur?
>>>> MPI_Init?
>>>> MPI_Finalize?
>>>> In between?
>>>>
>>>> In the first case, the bug is likely a mishandled error case,
>>>> which means OpenMPI is unlikely to be the root cause of the crash.
>>>>
>>>> Did you check that Infiniband is up and running on your cluster?
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> Saliya Ekanayake <esal...@gmail.com> wrote:
>>>> It's been a while on this, but we are still having trouble getting
>>>> OpenMPI to work with Infiniband on this cluster. We tried with the
>>>> latest 1.8.4 as well, but it's still the same.
>>>>
>>>> To recap, we get the following error when MPI initializes (in the
>>>> simple Hello world C example) with Infiniband. Everything works fine
>>>> if we explicitly turn off openib with --mca btl ^openib
>>>>
>>>> This is the error I got after debugging with gdb as you suggested.
>>>>
>>>> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize:
>>>> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *)
>>>> (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>>>>
>>>> Thank you,
>>>> Saliya
>>>>
>>>> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you Jeff, I'll try this and let you know.
>>>>> Saliya
>>>>>
>>>>> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
>>>>> wrote:
>>>>>
>>>>>> I am sorry for the delay; I've been caught up in SC deadlines. :-(
>>>>>>
>>>>>> I don't see anything blatantly wrong in this output.
>>>>>>
>>>>>> Two things:
>>>>>>
>>>>>> 1. Can you try a nightly v1.8.4 snapshot tarball? This will check to
>>>>>> see if whatever the bug is has been fixed for the upcoming release:
>>>>>>
>>>>>> http://www.open-mpi.org/nightly/v1.8/
>>>>>>
>>>>>> 2. Build Open MPI with the --enable-debug option (note that this adds
>>>>>> a slight-but-noticeable performance penalty). When you run, it should
>>>>>> dump a core file. Load that core file in a debugger and see where it
>>>>>> is failing (i.e., file and line in the OMPI source).
>>>>>>
>>>>>> We don't usually have to resort to asking users to perform #2, but
>>>>>> there's no additional information to give a clue as to what is
>>>>>> happening. :-(
>>>>>>
>>>>>> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi Jeff,
>>>>>> >
>>>>>> > You are probably busy, but just checking if you had a chance to
>>>>>> > look at this.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Saliya
>>>>>> >
>>>>>> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com>
>>>>>> > wrote:
>>>>>> > Hi Jeff,
>>>>>> >
>>>>>> > I've attached a tar file with information.
>>>>>> >
>>>>>> > Thank you,
>>>>>> > Saliya
>>>>>> >
>>>>>> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres)
>>>>>> > <jsquy...@cisco.com> wrote:
>>>>>> > Looks like it's failing in the openib BTL setup.
>>>>>> >
>>>>>> > Can you send the info listed here?
>>>>>> >
>>>>>> > http://www.open-mpi.org/community/help/
>>>>>> >
>>>>>> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I am using OpenMPI 1.8.1 on a Linux cluster that we recently
>>>>>> > > set up. It builds fine, but when I try to run even the simplest
>>>>>> > > hello.c program it causes a segfault. Any suggestions on how to
>>>>>> > > correct this?
>>>>>> > >
>>>>>> > > The steps I did and error message are below.
>>>>>> > >
>>>>>> > > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info is attached.
>>>>>> > > 2. cd to examples directory and mpicc hello_c.c
>>>>>> > > 3. mpirun -np 2 ./a.out
>>>>>> > > 4. Error text is attached.
>>>>>> > >
>>>>>> > > Please let me know if you need more info.
>>>>>> > >
>>>>>> > > Thank you,
>>>>>> > > Saliya
>>>>>> > >
>>>>>> > > --
>>>>>> > > Saliya Ekanayake esal...@gmail.com
>>>>>> > > Cell 812-391-4914 Home 812-961-6383
>>>>>> > > http://saliya.org
>>>>>> > >
>>>>>> > > <ompi_info.txt><error.txt>
>>>>>> > > _______________________________________________
>>>>>> > > users mailing list
>>>>>> > > us...@open-mpi.org
>>>>>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>> > > Link to this post:
>>>>>> > > http://www.open-mpi.org/community/lists/users/2014/11/25668.php
>>>>>> >
>>>>>> > --
>>>>>> > Jeff Squyres
>>>>>> > jsquy...@cisco.com
>>>>>> > For corporate legal information go to:
>>>>>> > http://www.cisco.com/web/about/doing_business/legal/cri/

--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org