Here is what I heard from the administrator:

"The tests that work are the simple utilities ib_read_lat and ib_read_bw,
which measure latency and bandwidth between two nodes. They are part of
the "perftest" package."
On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" <esal...@gmail.com> wrote:

> This happens at MPI_Init. I've attached the full error message.
>
> The sys admin mentioned Infiniband utility tests ran OK. I'll contact him
> for more details and let you know.
>
> Thank you,
> Saliya
>
> On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Where does the error occur?
>> MPI_Init?
>> MPI_Finalize?
>> In between?
>>
>> In the first case, the crash is most likely a mishandled error path:
>> something underneath Open MPI is failing and the error is not being
>> reported cleanly, so Open MPI itself is unlikely to be the root cause.
>>
>> Did you check that InfiniBand is up and running on your cluster?
>>
>> Cheers,
>>
>> Gilles
>>
>> Mail from Saliya Ekanayake <esal...@gmail.com>:
>> It's been a while on this, but we are still having trouble getting
>> Open MPI to work with InfiniBand on this cluster. We tried the latest
>> 1.8.4 as well, but the result is the same.
>>
>> To recap, we get the following error at MPI_Init (running the simple
>> Hello World C example) when InfiniBand is used. Everything works fine if
>> we explicitly turn off the openib BTL with --mca btl ^openib.
>>
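>> For reference, the exact invocations are roughly the following (run from
>> the examples directory after compiling hello_c.c with mpicc; nothing else
>> about the setup is special):
>>
>>     # fails at MPI_Init while the openib BTL is active
>>     mpirun -np 2 ./a.out
>>
>>     # runs cleanly with the openib BTL excluded
>>     mpirun --mca btl ^openib -np 2 ./a.out
>>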
>> This is the error I got after debugging with gdb as you suggested.
>>
>> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize:
>> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *)
>> (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>>
>> Thank you,
>> Saliya
>>
>> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com>
>> wrote:
>>
>>> Thank you Jeff, I'll try this and  let you know.
>>>
>>> Saliya
>>> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
>>> wrote:
>>>
>>>> I am sorry for the delay; I've been caught up in SC deadlines.  :-(
>>>>
>>>> I don't see anything blatantly wrong in this output.
>>>>
>>>> Two things:
>>>>
>>>> 1. Can you try a nightly v1.8.4 snapshot tarball?  This will tell us
>>>> whether the bug has already been fixed for the upcoming release:
>>>>
>>>>     http://www.open-mpi.org/nightly/v1.8/
>>>>
>>>> 2. Build Open MPI with the --enable-debug option (note that this adds a
>>>> slight-but-noticeable performance penalty).  When you run, it should dump
>>>> a core file.  Load that core file in a debugger and see where it is
>>>> failing (i.e., file and line in the OMPI source); a rough sketch of these
>>>> steps is below.
>>>>
>>>> We don't usually have to resort to asking users to perform #2, but there
>>>> is no other information here that gives a clue as to what is happening.
>>>> :-(
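>>>>
>>>> For #2, a minimal sketch of what I mean (the install prefix, process
>>>> count, and core file name below are placeholders; adjust for your
>>>> system):
>>>>
>>>>     # rebuild with debugging enabled
>>>>     ./configure --prefix=$HOME/ompi-debug --enable-debug
>>>>     make -j 8 install
>>>>
>>>>     # allow core dumps, then reproduce the crash
>>>>     ulimit -c unlimited
>>>>     mpirun -np 2 ./a.out
>>>>
>>>>     # open the core file in gdb and get a backtrace
>>>>     gdb ./a.out core
>>>>     (gdb) bt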
>>>>
>>>>
>>>>
>>>> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com>
>>>> wrote:
>>>>
>>>> > Hi Jeff,
>>>> >
>>>> > You are probably busy, but just checking if you had a chance to look
>>>> at this.
>>>> >
>>>> > Thanks,
>>>> > Saliya
>>>> >
>>>> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com>
>>>> wrote:
>>>> > Hi Jeff,
>>>> >
>>>> > I've attached a tar file with information.
>>>> >
>>>> > Thank you,
>>>> > Saliya
>>>> >
>>>> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) <
>>>> jsquy...@cisco.com> wrote:
>>>> > Looks like it's failing in the openib BTL setup.
>>>> >
>>>> > Can you send the info listed here?
>>>> >
>>>> >     http://www.open-mpi.org/community/help/
>>>> >
>>>> >
>>>> >
>>>> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com>
>>>> wrote:
>>>> >
>>>> > > Hi,
>>>> > >
>>>> > > I am using Open MPI 1.8.1 on a Linux cluster that we recently set
>>>> > > up. It builds fine, but when I try to run even the simplest hello.c
>>>> > > program, it segfaults. Any suggestions on how to correct this?
>>>> > >
>>>> > > The steps I did and error message are below.
>>>> > >
>>>> > > 1. Built Open MPI 1.8.1 on the cluster. The ompi_info output is attached.
>>>> > > 2. cd to the examples directory and run mpicc hello_c.c
>>>> > > 3. mpirun -np 2 ./a.out
>>>> > > 4. Error text is attached.
>>>> > >
>>>> > > Please let me know if you need more info.
>>>> > >
>>>> > > Thank you,
>>>> > > Saliya
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Saliya Ekanayake esal...@gmail.com
>>>> > > Cell 812-391-4914 Home 812-961-6383
>>>> > > http://saliya.org
>>>> > >
>>>> > > <ompi_info.txt><error.txt>
>>>> >
>>>> >
>>>> > --
>>>> > Jeff Squyres
>>>> > jsquy...@cisco.com
>>>> > For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Saliya Ekanayake esal...@gmail.com
>>>> > Cell 812-391-4914 Home 812-961-6383
>>>> > http://saliya.org
>>>> >
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>> For corporate legal information go to:
>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>
>>
>>
>> --
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>> Cell 812-391-4914
>> http://saliya.org
>>
>>
>
>
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
>
