Hello all,

Here is a short update: It seems that the ompio I/O module does not support the external32 and internal data representations. However, for me this problem is only triggered on NFS file systems.
In my case, trying to use external32 led to all processes writing with the default file view, and thus to overlapping writes in the file.

I opened a ticket for this: https://github.com/open-mpi/ompi/issues/6558

Best
Christoph

----- Original Message -----
From: "niethammer" <nietham...@hlrs.de>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Sunday, March 31, 2019 12:21:13
Subject: Re: [OMPI users] Using strace with Open MPI on Cray

Nathan and Gilles,

Thanks a lot for your suggestions!
This problem is related to an strace-based I/O analysis tool [1] that I am working on, so the strace command sits inside a wrapper for different platforms and writes to log files for all processes.

Based on your suggestions:
- Switching off the tcp btl did not change the behaviour.
- Switching off -f makes no difference either.
- I had not thought about /dev/null so far. Thanks! However, neither /dev/null nor putting the log file on another file system changed the behaviour.

Waiting much longer and getting lucky today, I got to a point where MPI finalized. Looking at the individual processes, one of them runs into a timeout (30 sec) in fcntl, and the polls seem to target the related file descriptor in this case.

I found the following 13-year-old problem description, "Fcntl F_SETLKW hangs on Linux NFS mount" [2], which is exactly the behaviour that I see. Sometimes I trigger the timeout, but mostly I run into the infinite polling.

So I am no longer sure that this is an MPI issue or that there is anything to improve on the MPI side.
[1] https://github.com/cniethammer/strace-analyzer
[2] https://linux-nfs.vger.kernel.narkive.com/l14j0vTR/fcntl-f-setlkw-hangs-on-linux-nfs-mount

Best
Christoph

----- Original Message -----
From: "Gilles Gouaillardet" <gilles.gouaillar...@gmail.com>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Sunday, March 31, 2019 5:42:49
Subject: Re: [OMPI users] Using strace with Open MPI on Cray

Christoph,

I do not know how to fix this, but here are some suggestions/thoughts:
- Do you need the -f flag? If not, just remove it.
- What if you mpirun strace -o /dev/null ... ?
- If the former works, then you might want to redirect the strace output to a local file (mpirun wrapper.sh, in which wrapper.sh sets the output file based on $PMIX_RANK or $$ and then exec strace ...).

Cheers,

Gilles

On Sat, Mar 30, 2019 at 6:29 PM Christoph Niethammer <nietham...@hlrs.de> wrote:
>
> Hello,
>
> I was trying to investigate some processes with strace under Open MPI.
> However, I have some issues when MPI I/O functionality is included, writing
> data to an NFS file system.
>
> mpirun -np 2 strace -f ./hello-world mpi-io
>
> does not return, and strace is stuck reporting infinite "poll" calls.
> However, the program works fine without strace.
>
> I tried Open MPI 3.x and 4.0.1, switching between ompio and romio, on
> different operating systems (CentOS 7.6, SLES 12).
>
> I'd appreciate any hints which help me to understand what is going on.
>
> Best
> Christoph
>
> --
>
> Christoph Niethammer
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstrasse 19
> 70569 Stuttgart
>
> Tel: ++49(0)711-685-87203
> email: nietham...@hlrs.de
> http://www.hlrs.de/people/niethammer
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
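
For readers following the thread: the wrapper.sh that Gilles describes in his reply could look roughly like the sketch below. It is only an illustration, not the wrapper Christoph actually uses. The function name trace_log_name and the STRACE_LOG_DIR variable are made up for this example, and /tmp is assumed to be node-local storage. Note also that, per Christoph's follow-up, moving the log file off NFS did not cure the hang in his case, so this shows how to keep per-process strace logs apart rather than a fix.

```shell
#!/bin/sh
# wrapper.sh: hypothetical per-process strace wrapper for use under mpirun.

# Build a log file name that is unique per process.
# PMIX_RANK is the variable named in the thread; it is assumed to be set
# by the launcher for each process. If it is unset, fall back to the
# PID of this shell ($$).
# STRACE_LOG_DIR is an invented variable for this sketch; it should point
# at a node-local directory rather than an NFS mount.
trace_log_name() {
    printf '%s/strace.%s.log\n' "${STRACE_LOG_DIR:-/tmp}" "${PMIX_RANK:-pid$$}"
}

# Only start tracing when a command was passed in, so that the file can
# also be sourced without side effects.
if [ "$#" -gt 0 ]; then
    # Replace this shell with strace; drop -f if child processes
    # do not need to be followed.
    exec strace -f -o "$(trace_log_name)" "$@"
fi
```

It would be launched along the lines of: mpirun -np 2 ./wrapper.sh ./hello-world mpi-io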