So you are saying the test worked, but you are still encountering an error when 
executing an MPI job? Or are you saying things now work?
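
One way to tell the two cases apart is to look at the locked-memory limit that the launched processes actually see, since a limit raised in an interactive shell may not carry over to processes started by mpirun. A minimal sketch, assuming bash; memlock_status is a hypothetical helper name, not part of Open MPI:

```shell
#!/bin/bash
# Classify a locked-memory limit value as reported by `ulimit -l`.
# Prints "ok" when the limit is "unlimited", "limited" otherwise.
# (memlock_status is a hypothetical helper used only for illustration.)
memlock_status() {
    case "$1" in
        unlimited) echo "ok" ;;
        *)         echo "limited" ;;
    esac
}

# Report the limit seen by the current shell.
limit=$(ulimit -l)
echo "max locked memory: $limit ($(memlock_status "$limit"))"
```

To see what the MPI processes themselves get, the same query can be run under mpirun, for example:

    mpirun -np 2 bash -c 'ulimit -l'

If that prints a number rather than "unlimited", the raised limit is probably not reaching the launched processes.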


> On Dec 28, 2014, at 5:58 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
> 
> Thank you Ralph. This produced a warning about memory limits similar to [1], 
> and setting ulimit -l unlimited worked.
> 
> [1] http://lists.openfabrics.org/pipermail/general/2007-June/036941.html
> 
> Saliya
> 
> On Sun, Dec 28, 2014 at 5:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Have the admin try running the ibv_ud_pingpong test - that will exercise the 
> portion of the system under discussion.
> 
> 
>> On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> 
>> This is what I heard from the administrator:
>> 
>> "The tests that work are the simple utilities ib_read_lat and ib_read_bw
>> that measure latency and bandwidth between two nodes. They are part of
>> the "perftest" repo package."
>> 
>> On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" <esal...@gmail.com> wrote:
>> This happens at MPI_Init. I've attached the full error message.
>> 
>> The sys admin mentioned Infiniband utility tests ran OK. I'll contact him 
>> for more details and let you know.
>> 
>> Thank you,
>> Saliya
>> 
>> On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>> Where does the error occur?
>> MPI_Init?
>> MPI_Finalize?
>> In between?
>> 
>> In the first case, the bug is likely a mishandled error case,
>> which means OpenMPI is unlikely to be the root cause of the crash.
>> 
>> Did you check that InfiniBand is up and running on your cluster?
>> 
>> Cheers,
>> 
>> Gilles 
>> 
>> Saliya Ekanayake <esal...@gmail.com> wrote:
>> It's been a while on this, but we are still having trouble getting OpenMPI 
>> to work with Infiniband on this cluster. We tried the latest 1.8.4 as well, 
>> but the result is still the same.
>> 
>> To recap, we get the following error when MPI initializes (in the simple 
>> Hello world C example) with Infiniband. Everything works fine if we 
>> explicitly turn off openib with --mca btl ^openib
>> 
>> This is the error I got after debugging with gdb as you suggested.
>> 
>> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize: 
>> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) 
>> (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>> 
>> Thank you,
>> Saliya
>> 
>> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> Thank you Jeff, I'll try this and let you know. 
>> 
>> Saliya 
>> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com> wrote:
>> I am sorry for the delay; I've been caught up in SC deadlines.  :-(
>> 
>> I don't see anything blatantly wrong in this output.
>> 
>> Two things:
>> 
>> 1. Can you try a nightly v1.8.4 snapshot tarball?  This will check to see if 
>> whatever the bug is has been fixed for the upcoming release:
>> 
>>     http://www.open-mpi.org/nightly/v1.8/
>> 
>> 2. Build Open MPI with the --enable-debug option (note that this adds a 
>> slight-but-noticeable performance penalty).  When you run, it should dump a 
>> core file.  Load that core file in a debugger and see where it is failing 
>> (i.e., file and line in the OMPI source).
>> 
>> We don't usually have to resort to asking users to perform #2, but there's 
>> no additional information to give a clue as to what is happening.  :-(
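
For reference, step 2 above can be sketched as a short shell session. This is only a sketch, assuming bash, an Open MPI source tree that is already unpacked, and core files named core or core.<pid> written to the working directory (the actual name and location depend on the system's core_pattern setting). The install prefix and the helper name latest_core are placeholders, not anything defined by Open MPI:

```shell
#!/bin/bash
# Print the most recently modified core file in a directory, if any.
# Assumes core files are named "core" or "core.<pid>" (system dependent).
# (latest_core is a hypothetical helper used only for illustration.)
latest_core() {
    ls -t "$1"/core* 2>/dev/null | head -n 1
}

# Outline of the workflow (left as comments; the prefix is a placeholder):
#   ./configure --prefix="$HOME/ompi-debug" --enable-debug
#   make all install
#   ulimit -c unlimited               # allow core dumps in this shell
#   mpirun -np 2 ./a.out              # reproduce the crash
#   gdb ./a.out "$(latest_core .)"    # then type "bt" for the backtrace
```

The test program has to be rebuilt with the mpicc from the debug installation, otherwise the backtrace may not show file and line information for the Open MPI frames.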
>> 
>> 
>> 
>> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> 
>> > Hi Jeff,
>> >
>> > You are probably busy, but just checking if you had a chance to look at 
>> > this.
>> >
>> > Thanks,
>> > Saliya
>> >
>> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> > Hi Jeff,
>> >
>> > I've attached a tar file with information.
>> >
>> > Thank you,
>> > Saliya
>> >
>> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>> > Looks like it's failing in the openib BTL setup.
>> >
>> > Can you send the info listed here?
>> >
>> >     http://www.open-mpi.org/community/help/
>> >
>> >
>> >
>> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am using OpenMPI 1.8.1 on a Linux cluster that we recently set up. It 
>> > > builds fine, but when I try to run even the simplest hello.c program 
>> > > it segfaults. Any suggestions on how to correct this?
>> > >
>> > > The steps I did and error message are below.
>> > >
>> > > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info is attached.
>> > > 2. cd to examples directory and mpicc hello_c.c
>> > > 3. mpirun -np 2 ./a.out
>> > > 4. Error text is attached.
>> > >
>> > > Please let me know if you need more info.
>> > >
>> > > Thank you,
>> > > Saliya
>> > >
>> > >
>> > > --
>> > > Saliya Ekanayake esal...@gmail.com
>> > > Cell 812-391-4914 Home 812-961-6383
>> > > http://saliya.org
>> > > <ompi_info.txt><error.txt>
>> > > _______________________________________________
>> > > users mailing list
>> > > us...@open-mpi.org
>> > > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > > Link to this post: 
>> > > http://www.open-mpi.org/community/lists/users/2014/11/25668.php
>> >
>> >
>> > --
>> > Jeff Squyres
>> > jsquy...@cisco.com
>> > For corporate legal information go to: 
>> > http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
>> 
>> -- 
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> Cell 812-391-4914
>> http://saliya.org
> 
