Nathan and Gilles,

Thanks a lot for your suggestions!

This problem is related to an strace-based I/O analysis tool [1] that I am
working on. The strace command therefore runs inside a wrapper that handles
different platforms and writes a separate log file for each process.

Based on your suggestions:

- Switching off the tcp btl did not change the behaviour.

- Switching off -f makes no difference either.

- I had not thought about /dev/null so far. Thanks!
  However, neither /dev/null nor putting the log file onto another file
  system changed the behaviour.

After waiting much longer and getting lucky today, I got to a point where MPI
finalized. Looking at the individual processes, one of them runs into a
timeout (30 s) in fcntl, and the polls seem to target the related file
descriptor in this case.
I found the following 13-year-old problem description,
"Fcntl F_SETLKW hangs on Linux NFS mount" [2], which is exactly the behaviour
that I see.
Sometimes I trigger the timeout, but mostly I run into the infinite polling.
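
For anyone who wants to check the locking behaviour outside of MPI, a
minimal sketch in Python could look like the following. It requests a
blocking write lock (F_SETLKW underneath) and gives up after a timeout.
The function name, the use of SIGALRM and the default timeout are my own
assumptions and are not taken from Open MPI or ROMIO. If the process is
blocked uninterruptibly in the kernel, the alarm may not fire in time.

```python
import fcntl
import os
import signal


class LockTimeout(Exception):
    """Raised by the alarm handler when the lock request takes too long."""


def _on_alarm(signum, frame):
    # Raising here makes Python abandon the interrupted fcntl call.
    raise LockTimeout()


def try_blocking_lock(path, timeout_s=5):
    """Return True if a blocking write lock on `path` is granted in time.

    Hypothetical helper for illustration; must run in the main thread
    because it installs a signal handler.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)
    try:
        # LOCK_EX without LOCK_NB is a blocking request (F_SETLKW).
        fcntl.lockf(fd, fcntl.LOCK_EX)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except LockTimeout:
        return False
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
        os.close(fd)
```

Calling try_blocking_lock() once on a file on the NFS mount and once on a
local file should show whether the lock request itself is what stalls.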

So I am no longer sure that this is an MPI issue
or that there is anything to improve on the MPI side.

[1] https://github.com/cniethammer/strace-analyzer
[2] https://linux-nfs.vger.kernel.narkive.com/l14j0vTR/fcntl-f-setlkw-hangs-on-linux-nfs-mount

Best
Christoph 

----- Original Message -----
From: "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Sunday, 31 March 2019 5:42:49
Subject: Re: [OMPI users] Using strace with Open MPI on Cray

Christoph,

I do not know how to fix this, but here are some suggestions/thoughts

- do you need the -f flag? if not, just remove it
- what if you mpirun strace -o /dev/null ... ?
- if the former works, then you might want to redirect the strace
output to a local file (mpirun wrapper.sh, in which wrapper.sh sets
the output file based on $PMIX_RANK or $$, and then exec strace ...)
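
A rough sketch of such a wrapper, written here in Python rather than as a
shell script, might look like this. The log directory, the file naming and
the function name are only assumptions for illustration; PMIX_RANK is the
variable mentioned above, with the process ID as a fallback.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper: one strace log file per MPI rank."""
import os
import sys


def build_strace_cmd(program_argv, env=None, log_dir="/tmp"):
    """Build the strace command line for one process.

    log_dir is an assumed node-local directory; adjust as needed.
    """
    env = os.environ if env is None else env
    # Prefer the rank if the launcher exports it, else use the PID.
    tag = env.get("PMIX_RANK") or str(os.getpid())
    log_file = os.path.join(log_dir, "strace.%s.log" % tag)
    return ["strace", "-o", log_file] + list(program_argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = build_strace_cmd(sys.argv[1:])
    # Replace this process with strace, like `exec strace ...` in a shell.
    os.execvp(cmd[0], cmd)
```

Saved as an executable file (say wrapper.py, a made-up name), it would be
started as: mpirun -np 2 ./wrapper.py ./hello-world mpi-io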

Cheers,

Gilles

On Sat, Mar 30, 2019 at 6:29 PM Christoph Niethammer <nietham...@hlrs.de> wrote:
>
> Hello,
>
> I was trying to investigate some processes with strace under Open MPI.
> However, I have some issues when MPI I/O functionality is included that
> writes data to an NFS file system.
>
> mpirun -np 2 strace -f ./hello-world mpi-io
>
> does not return and strace is stuck reporting infinite "poll" calls.
> However, the program works fine without strace.
>
> I tried with Open MPI 3.x and 4.0.1 switching between ompi and romio on 
> different operating systems (CentOS 7.6, SLES 12).
>
> I'd appreciate any hints which help me to understand what is going on.
>
> Best
> Christoph
>
> --
>
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
>
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users