Re: [OMPI users] mpi send/recv pair hangin

2018-04-09 Thread George Bosilca
Noam,

I have a few questions for you. According to your original email you are
using OMPI 3.0.1 (but the hang can also be reproduced with 3.0.0). Also,
according to your stack trace, I assume it is x86_64, compiled with icc.

Is your application multithreaded? How did you initialize MPI (which
level of threading)? Can you send us the opal_config.h file, please?

Thanks,
  George.




On Sun, Apr 8, 2018 at 8:30 PM, George Bosilca  wrote:

> Right, it has nothing to do with the tag. The sequence number is an
> internal counter that helps OMPI deliver messages in the order MPI
> requires (FIFO ordering per communicator per peer).
>
> Thanks for offering your help to debug this issue. We'll need to figure
> out how this can happen, and we will get back to you for further debugging.
>
>   George.
>
>
>
> On Sun, Apr 8, 2018 at 6:00 PM, Noam Bernstein <
> noam.bernst...@nrl.navy.mil> wrote:
>
>> On Apr 8, 2018, at 3:58 PM, George Bosilca  wrote:
>>
>> Noam,
>>
>> Thanks for your output, it highlights an unusual outcome. It shows that a
>> process (29662) has pending messages from other processes that are
>> tagged with a past sequence number, something that should not have
>> happened. The only way to get that is if somehow we screwed up the sending
>> part and pushed the same sequence number twice ...
>>
>> More digging is required.
>>
>>
>> OK - these sequence numbers are unrelated to the send/recv tags, right?
>> I’m happy to do any further debugging.  I can’t share code, since we do
>> have access but it’s not open source, but I’d be happy to test out anything
>> you can suggest.
>>
>> thanks,
>> Noam
>>
>>
>> Noam Bernstein, Ph.D.
>> Center for Materials Physics and Technology
>> U.S. Naval Research Laboratory
>> T +1 202 404 8628  F +1 202 404 7546
>> https://www.nrl.navy.mil
>>
>>
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://lists.open-mpi.org/mailman/listinfo/users
>>
>
>

Re: [OMPI users] mpi send/recv pair hangin

2018-04-09 Thread Noam Bernstein
On Apr 9, 2018, at 6:36 PM, George Bosilca  wrote:

> Noam,
>
> I have a few questions for you. According to your original email you are
> using OMPI 3.0.1 (but the hang can also be reproduced with 3.0.0).

Correct.

> Also, according to your stack trace, I assume it is x86_64, compiled
> with icc.

x86_64, yes, but gcc + ifort.  I can test with gcc+gfortran if that’s
helpful.

> Is your application multithreaded? How did you initialize MPI (which
> level of threading)? Can you send us the opal_config.h file, please?

No, no multithreading, at least not intentionally.  I can run with
OMP_NUM_THREADS explicitly set to 1 if you’d like to exclude that as a
possibility.  opal_config.h is attached, from ./opal/include/opal_config.h
in the build directory.

	Noam






Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil



opal_config.h
Description: Binary data