I meant it works now, sorry for the confusion.

Running the test revealed a warning about memory registration, which we
fixed by setting "ulimit -l unlimited". After that, the OMPI sample ran
fine as well.
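
For the record, here is the exact fix (run on each compute node before
launching) and a persistent variant; the limits.conf entry assumes PAM
limits are in use, so adjust for your site:

    # per-session fix, run before mpirun:
    ulimit -l unlimited

    # persistent variant: add to /etc/security/limits.conf on each node
    # <domain>  <type>  <item>    <value>
    *           soft    memlock   unlimited
    *           hard    memlock   unlimited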

Thank you,
saliya



On Sun, Dec 28, 2014 at 11:18 PM, Ralph Castain <r...@open-mpi.org> wrote:

> So you are saying the test worked, but you are still encountering an error
> when executing an MPI job? Or are you saying things now work?
>
>
> On Dec 28, 2014, at 5:58 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> Thank you Ralph. This produced a warning about memory limits similar to [1],
> and setting "ulimit -l unlimited" fixed it.
>
> [1] http://lists.openfabrics.org/pipermail/general/2007-June/036941.html
>
> Saliya
>
> On Sun, Dec 28, 2014 at 5:57 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Have the admin try running the ibv_ud_pingpong test - that will exercise
>> the portion of the system under discussion.
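>>
>> In case it helps, the usual invocation (from the libibverbs examples;
>> "node1" is just an illustrative hostname) is:
>>
>>     # on the first node, start the server side:
>>     ibv_ud_pingpong
>>
>>     # on the second node, point the client at the server:
>>     ibv_ud_pingpong node1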
>>
>>
>> On Dec 28, 2014, at 2:31 PM, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>> What I heard from the administrator is:
>>
>> "The tests that work are the simple utilities ib_read_lat and ib_read_bw
>> that measures latency and bandwith between two nodes. They are part of
>> the "perftest" repo package."
>> On Dec 28, 2014 10:20 AM, "Saliya Ekanayake" <esal...@gmail.com> wrote:
>>
>>> This happens at MPI_Init. I've attached the full error message.
>>>
>>> The sysadmin mentioned the InfiniBand utility tests ran OK. I'll contact
>>> him for more details and let you know.
>>>
>>> Thank you,
>>> Saliya
>>>
>>> On Sun, Dec 28, 2014 at 3:18 AM, Gilles Gouaillardet <
>>> gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Where does the error occur?
>>>> MPI_Init?
>>>> MPI_Finalize?
>>>> In between?
>>>>
>>>> In the first case, the crash is likely a mishandled error path,
>>>> which means Open MPI itself is unlikely to be the root cause.
>>>>
>>>> Did you check that InfiniBand is up and running on your cluster?
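>>>>
>>>> One quick check, with the standard verbs utilities, is that every
>>>> node reports its port as active:
>>>>
>>>>     # should show "State: Active" / PORT_ACTIVE on every node
>>>>     ibstat
>>>>     ibv_devinfo | grep state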
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> Saliya Ekanayake <esal...@gmail.com> wrote:
>>>> It's been a while on this, but we are still having trouble getting
>>>> OpenMPI to work with InfiniBand on this cluster. We tried the latest
>>>> 1.8.4 as well, but it's still the same.
>>>>
>>>> To recap, we get the following error when MPI initializes (in the
>>>> simple hello world C example) with InfiniBand. Everything works fine if we
>>>> explicitly turn off openib with --mca btl ^openib:
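>>>>
>>>>     # fails at MPI_Init:
>>>>     mpirun -np 2 ./a.out
>>>>
>>>>     # works (openib BTL excluded, so tcp/sm/self are used instead):
>>>>     mpirun --mca btl ^openib -np 2 ./a.out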
>>>>
>>>> This is the error I got after debugging with gdb as you suggested.
>>>>
>>>> hello_c: connect/btl_openib_connect_udcm.c:736: udcm_module_finalize:
>>>> Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *)
>>>> (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>>>>
>>>> Thank you,
>>>> Saliya
>>>>
>>>> On Mon, Nov 10, 2014 at 10:01 AM, Saliya Ekanayake <esal...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thank you, Jeff. I'll try this and let you know.
>>>>> Saliya
>>>>> On Nov 10, 2014 6:42 AM, "Jeff Squyres (jsquyres)" <jsquy...@cisco.com>
>>>>> wrote:
>>>>>
>>>>>> I am sorry for the delay; I've been caught up in SC deadlines.  :-(
>>>>>>
>>>>>> I don't see anything blatantly wrong in this output.
>>>>>>
>>>>>> Two things:
>>>>>>
>>>>>> 1. Can you try a nightly v1.8.4 snapshot tarball?  This will check
>>>>>> whether the bug has already been fixed for the upcoming release:
>>>>>>
>>>>>>     http://www.open-mpi.org/nightly/v1.8/
>>>>>>
>>>>>> 2. Build Open MPI with the --enable-debug option (note that this adds
>>>>>> a slight-but-noticeable performance penalty).  When you run, it should
>>>>>> dump a core file.  Load that core file in a debugger and see where it
>>>>>> is failing (i.e., file and line in the OMPI source).
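>>>>>>
>>>>>> Concretely, something like this (install prefix and core file name
>>>>>> are illustrative; core file naming varies by system):
>>>>>>
>>>>>>     # rebuild with debug symbols and internal checks
>>>>>>     ./configure --prefix=$HOME/ompi-debug --enable-debug
>>>>>>     make -j4 install
>>>>>>
>>>>>>     # allow core dumps, then reproduce the crash
>>>>>>     ulimit -c unlimited
>>>>>>     mpirun -np 2 ./a.out
>>>>>>
>>>>>>     # open the core and get a backtrace (file and line)
>>>>>>     gdb ./a.out core
>>>>>>     (gdb) bt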
>>>>>>
>>>>>> We don't usually have to resort to asking users to perform #2, but
>>>>>> there's no additional information to give a clue as to what is happening.
>>>>>> :-(
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Nov 9, 2014, at 11:43 AM, Saliya Ekanayake <esal...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> > Hi Jeff,
>>>>>> >
>>>>>> > You are probably busy, but just checking if you had a chance to
>>>>>> > look at this.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Saliya
>>>>>> >
>>>>>> > On Thu, Nov 6, 2014 at 9:19 AM, Saliya Ekanayake <esal...@gmail.com>
>>>>>> > wrote:
>>>>>> > Hi Jeff,
>>>>>> >
>>>>>> > I've attached a tar file with information.
>>>>>> >
>>>>>> > Thank you,
>>>>>> > Saliya
>>>>>> >
>>>>>> > On Tue, Nov 4, 2014 at 4:18 PM, Jeff Squyres (jsquyres) <
>>>>>> > jsquy...@cisco.com> wrote:
>>>>>> > Looks like it's failing in the openib BTL setup.
>>>>>> >
>>>>>> > Can you send the info listed here?
>>>>>> >
>>>>>> >     http://www.open-mpi.org/community/help/
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > On Nov 4, 2014, at 1:10 PM, Saliya Ekanayake <esal...@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> > > Hi,
>>>>>> > >
>>>>>> > > I am using OpenMPI 1.8.1 on a Linux cluster that we recently set
>>>>>> > > up. It builds fine, but when I try to run even the simplest
>>>>>> > > hello_c.c program, it segfaults. Any suggestions on how to correct this?
>>>>>> > >
>>>>>> > > The steps I did and error message are below.
>>>>>> > >
>>>>>> > > 1. Built OpenMPI 1.8.1 on the cluster. The ompi_info output is attached.
>>>>>> > > 2. cd to the examples directory and run mpicc hello_c.c
>>>>>> > > 3. mpirun -np 2 ./a.out
>>>>>> > > 4. Error text is attached.
>>>>>> > >
>>>>>> > > Please let me know if you need more info.
>>>>>> > >
>>>>>> > > Thank you,
>>>>>> > > Saliya
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > Saliya Ekanayake esal...@gmail.com
>>>>>> > > Cell 812-391-4914 Home 812-961-6383
>>>>>> > > http://saliya.org
>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Jeff Squyres
>>>>>> > jsquy...@cisco.com
>>>>>> > For corporate legal information go to:
>>>>>> > http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Saliya Ekanayake esal...@gmail.com
>>>>>> > Cell 812-391-4914 Home 812-961-6383
>>>>>> > http://saliya.org
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> > Saliya Ekanayake esal...@gmail.com
>>>>>> > Cell 812-391-4914 Home 812-961-6383
>>>>>> > http://saliya.org
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> jsquy...@cisco.com
>>>>>> For corporate legal information go to:
>>>>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Saliya Ekanayake
>>>> Ph.D. Candidate | Research Assistant
>>>> School of Informatics and Computing | Digital Science Center
>>>> Indiana University, Bloomington
>>>> Cell 812-391-4914
>>>> http://saliya.org
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Saliya Ekanayake
>>> Ph.D. Candidate | Research Assistant
>>> School of Informatics and Computing | Digital Science Center
>>> Indiana University, Bloomington
>>> Cell 812-391-4914
>>> http://saliya.org
>>>
>>
>>
>>
>>
>
>
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
> Cell 812-391-4914
> http://saliya.org
>
>
>
>



-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org
