Hello all,

Here is a short update: it seems that the ompio io module does not support the 
external32 and internal data representations.
However, for me this problem is only triggered on NFS file systems.
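
To check up front whether a given directory is on NFS, a small helper along these lines might do. This is only a sketch: it assumes GNU coreutils stat (the -f -c %T options), and the function names fs_type and is_nfs are made up for illustration.

```shell
# Print the file system type of the mount that holds the given path.
# Assumption: GNU coreutils stat, which supports "-f -c %T".
fs_type() {
    stat -f -c %T "$1"
}

# Return success (0) if the path is on an NFS mount, failure (1) otherwise.
is_nfs() {
    case "$(fs_type "$1")" in
        nfs*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

For example, `is_nfs /path/to/output && echo "on NFS"` before starting the run.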

In my case, trying to use external32 led to all processes writing with the 
default file view, and thus to overlapping writes in the file.

I opened a ticket for this under https://github.com/open-mpi/ompi/issues/6558

Best
Christoph

----- Original Message -----
From: "niethammer" <nietham...@hlrs.de>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Sunday, March 31, 2019 12:21:13
Subject: Re: [OMPI users] Using strace with Open MPI on Cray

Nathan and Gilles,

Thanks a lot for your suggestions!

This problem is related to an strace-based I/O analysis tool [1] that I am 
working on.
The strace command therefore sits inside a wrapper for different platforms 
and writes its output to log files for all processes.

Based on your suggestions:

- Switching off the tcp btl did not change the behaviour.

- Switching off -f makes no difference either.

- I had not thought about /dev/null so far. Thanks!
  However, neither /dev/null nor putting the log file onto another file 
  system changed the behaviour.

After waiting much longer and getting lucky today, I got to a point where MPI 
finalized. Looking at the individual processes, one of them runs into a 
timeout (30 sec) with fcntl, and the polls seem to target the related file 
descriptor in this case.
I found the following 13-year-old problem description,
"Fcntl F_SETLKW hangs on Linux NFS mount" [2], which is exactly the behaviour 
that I see.
Sometimes I trigger the timeout, but mostly I run into the infinite polling.

So I am no longer sure that this is an MPI issue
or that there is anything to improve on the MPI side.

[1] https://github.com/cniethammer/strace-analyzer
[2] https://linux-nfs.vger.kernel.narkive.com/l14j0vTR/fcntl-f-setlkw-hangs-on-linux-nfs-mount

Best
Christoph 

----- Original Message -----
From: "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Sunday, March 31, 2019 5:42:49
Subject: Re: [OMPI users] Using strace with Open MPI on Cray

Christoph,

I do not know how to fix this, but here are some suggestions/thoughts

- do you need the -f flag? if not, just remove it
- what if you mpirun strace -o /dev/null ... ?
- if the former works, then you might want to redirect the strace
output to a local file (mpirun wrapper.sh, in which wrapper.sh sets
the output file based on $PMIX_RANK or $$, and then exec strace ...)
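
A minimal sketch of such a wrapper, assuming a POSIX shell and that the launcher exports PMIX_RANK to each process. The STRACE_LOG_DIR variable, the /tmp default and the log file name pattern are made up for illustration:

```shell
#!/bin/sh
# wrapper.sh: run one MPI rank under strace with its own log file.

# Build the log file name from the rank exported by the launcher,
# falling back to the shell's PID when PMIX_RANK is not set.
strace_log_path() {
    # $1: log directory (assumed to be node-local, e.g. /tmp)
    printf '%s/strace.%s.log\n' "$1" "${PMIX_RANK:-$$}"
}

# Replace this shell with strace only when a command was given,
# so the file can also be sourced without side effects.
if [ "$#" -gt 0 ]; then
    # STRACE_LOG_DIR is a hypothetical override for the log directory.
    exec strace -f -o "$(strace_log_path "${STRACE_LOG_DIR:-/tmp}")" "$@"
fi
```

It would be started as something like `mpirun -np 2 ./wrapper.sh ./hello-world mpi-io`, leaving one log file per rank in the chosen directory.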

Cheers,

Gilles

On Sat, Mar 30, 2019 at 6:29 PM Christoph Niethammer <nietham...@hlrs.de> wrote:
>
> Hello,
>
> I was trying to investigate some processes with strace under Open MPI.
> However, I run into issues when the program uses MPI I/O functionality to 
> write data to an NFS file system.
>
> mpirun -np 2 strace -f ./hello-world mpi-io
>
> does not return and strace is stuck reporting infinite "poll" calls.
> However, the program works fine without strace.
>
> I tried with Open MPI 3.x and 4.0.1, switching between ompio and romio, on 
> different operating systems (CentOS 7.6, SLES 12).
>
> I'd appreciate any hints which help me to understand what is going on.
>
> Best
> Christoph
>
> --
>
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
>
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
