[OMPI users] Slow collective MPI File IO

2020-04-06 Thread Dong-In Kang via users
Hi, I am running an MPI program where N processes write to a single file on a single shared-memory machine. I’m using Open MPI v4.0.2. Each MPI process writes a 1 MB chunk of data 1K times, sequentially. There is no overlap in the file between any two MPI processes. I ran the program…
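[A minimal sketch of the pattern described above, with hypothetical file name and buffer contents (not the poster's actual code): each rank issues independent MPI_File_write_at calls into its own non-overlapping region of a shared file.]

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK_SIZE (1 << 20)   /* 1 MB per write          */
    #define NUM_CHUNKS 1024        /* 1K writes per process   */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        char *buf = malloc(CHUNK_SIZE);
        memset(buf, rank, CHUNK_SIZE);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Each rank owns a contiguous, non-overlapping region of the file. */
        MPI_Offset base = (MPI_Offset)rank * NUM_CHUNKS * CHUNK_SIZE;
        for (int i = 0; i < NUM_CHUNKS; i++) {
            MPI_Offset off = base + (MPI_Offset)i * CHUNK_SIZE;
            MPI_File_write_at(fh, off, buf, CHUNK_SIZE, MPI_BYTE,
                              MPI_STATUS_IGNORE);
        }

        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }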

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Gabriel, Edgar via users
Hi, a couple of comments. First, if you use MPI_File_write_at, this is usually not considered collective I/O, even if it is executed by multiple processes; MPI_File_write_at_all would be collective I/O. Second, MPI I/O cannot do ‘magic’: it is bound by the hardware that you are providing. If already a…
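[For reference, the collective counterpart Edgar names differs only in the call; a sketch, assuming the same per-rank offsets and buffer as in the first sketch above:]

    /* Collective version: every rank that opened the file must reach this
     * call in the same iteration, which lets the MPI library coordinate
     * and possibly aggregate the writes. */
    MPI_File_write_at_all(fh, off, buf, CHUNK_SIZE, MPI_BYTE,
                          MPI_STATUS_IGNORE);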

[OMPI users] Clean termination after receiving multiple SIGINT

2020-04-06 Thread Kreutzer, Moritz via users
Hi, we are invoking mpirun from within a script which installs some signal handlers. Now, if we abort an Open MPI run with CTRL+C, the system sends SIGINT to the entire process group. Hence, the mpirun process receives a SIGINT from the system with si_code=SI_KERNEL. Additionally, our own signal…

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Dong-In Kang via users
Thank you, Edgar, for the information. I also tried MPI_File_write_at_all(), but it usually makes the performance worse. My program is very simple: each MPI process writes a consecutive portion of a file, with no interleaving among the MPI processes. I think in this case I can use MPI_File_write_at(). I…

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Collin Strassburger via users
Hello, just a quick comment on this: is your code written in C/C++ or Fortran? Fortran has issues with writing at a decent speed regardless of the MPI setup and as such should be avoided for file IO (yet I still occasionally see it implemented). Collin

Re: [OMPI users] Clean termination after receiving multiple SIGINT

2020-04-06 Thread Ralph Castain via users
Currently, mpirun takes that second SIGINT to mean "you seem to be stuck trying to cleanly abort - just die", which means mpirun exits immediately without doing any cleanup. The individual procs all commit suicide when they see their daemons go away, which is why you don't get zombies left behind…

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Dong-In Kang via users
Hi Collin, It is written in C. So, I think it is OK. Thank you, David

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Gabriel, Edgar via users
The one test that would give you a good idea of the upper bound for your scenario would be to write a benchmark where each process writes to a separate file, and look at the overall bandwidth achieved across all processes. The MPI I/O performance will be less than or equal to the bandwidth achieved…

Re: [OMPI users] Clean termination after receiving multiple SIGINT

2020-04-06 Thread Kreutzer, Moritz via users
Thanks for the explanation, Ralph! I guess the reason we need to pass the signal down is to achieve correct behavior when a signal does not come via CTRL+C, but in case someone kills our top-level script (which eventually calls mpirun) using “kill $PID” or similar, in which case we would have to…

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Dong-In Kang via users
Yes, I agree with you. I think I did the test using one file per MPI process: each MPI process opens a file whose name is suffixed with its rank, using MPI_File_open(MPI_COMM_SELF, ...). It showed a few times better performance (with np=4 or 8 on my workstation) than a single MPI process (with np…
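[A sketch of the one-file-per-process upper-bound test David describes, reusing rank, buf, CHUNK_SIZE and NUM_CHUNKS from the first sketch; the file-name scheme is a placeholder and snprintf needs <stdio.h>.]

    /* Each rank opens its own file; no coordination between processes,
     * so this approximates the raw per-process file-system bandwidth. */
    char fname[64];
    snprintf(fname, sizeof(fname), "out.%d.dat", rank);

    MPI_File fh;
    MPI_File_open(MPI_COMM_SELF, fname,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    for (int i = 0; i < NUM_CHUNKS; i++)
        MPI_File_write_at(fh, (MPI_Offset)i * CHUNK_SIZE, buf,
                          CHUNK_SIZE, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);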

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Gilles GOUAILLARDET via users
Collin, do you have any data to back up your claim? As long as MPI-IO is used to perform file I/O, the Fortran bindings overhead should be hardly noticeable. Cheers, Gilles

Re: [OMPI users] Clean termination after receiving multiple SIGINT

2020-04-06 Thread Ralph Castain via users
I don't know that it is officially documented anywhere - it does get printed out when the first CTRL-C arrives. On the plus side, it has been 5 seconds (as opposed to some other value) since the beginning of OMPI, so it is pretty safe to rely on it. I wonder if you could get around this problem…

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Gilles Gouaillardet via users
David, I suggest you rely on well-established benchmarks such as IOR or iozone. As already pointed out by Edgar, you first need to make sure you are not benchmarking your (memory) cache, by comparing the bandwidth you measure vs. the performance you can expect from your hardware. As a side note…

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Collin Strassburger via users
Gilles, I just checked the write implementation of the Fortran codes with which I have noticed the issue; while they are compiled with MPI, they are not using MPI-IO. Thank you for pointing out the important distinction! Thanks, Collin

Re: [OMPI users] Slow collective MPI File IO

2020-04-06 Thread Benson Muite via users
If possible, consider changing to a non-blocking write using MPI_FILE_WRITE_ALL_BEGIN, so that work can continue while the file is being written to disk. You may need to make a copy of the data being written if the space will be used for another purpose while the data is being written…
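[A sketch of the split-collective pattern Benson suggests, shown here in its explicit-offset form (MPI_File_write_at_all_begin/_end) to match the offsets used earlier in the thread; variables are reused from the first sketch, and the buffer must not be modified between the two calls.]

    /* Start the collective write; control returns while the I/O may
     * still be in progress. */
    MPI_File_write_at_all_begin(fh, off, buf, CHUNK_SIZE, MPI_BYTE);

    /* ... overlap useful computation here (on data other than buf) ... */

    /* Complete the split-collective write. */
    MPI_File_write_at_all_end(fh, buf, MPI_STATUS_IGNORE);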

Re: [OMPI users] Clean termination after receiving multiple SIGINT

2020-04-06 Thread Kreutzer, Moritz via users
We were thinking of doing the same (putting mpirun into its own process group). When doing that, we have to make sure to propagate _all_ relevant signals our top-level wrapper receives to mpirun. That’s also a viable option, I guess. Now we have some choices on the table and have to decide which…
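[A rough POSIX sketch of the "own process group" idea discussed here, not Open MPI specific and with a placeholder mpirun command line: the wrapper launches mpirun in a new process group, so the terminal's CTRL+C no longer reaches it directly, and explicitly forwards the signals the wrapper itself receives.]

    #include <errno.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static pid_t child = -1;

    static void forward(int sig)
    {
        if (child > 0)
            kill(-child, sig);   /* forward to mpirun's whole process group */
    }

    int main(void)
    {
        child = fork();
        if (child < 0)
            return 1;
        if (child == 0) {
            setpgid(0, 0);       /* put mpirun into its own process group */
            execlp("mpirun", "mpirun", "-np", "4", "./my_app", (char *)NULL);
            _exit(127);          /* exec failed */
        }
        setpgid(child, child);   /* set it from the parent too, to avoid a race */

        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = forward;
        sigaction(SIGINT, &sa, NULL);
        sigaction(SIGTERM, &sa, NULL);

        int status = 0;
        while (waitpid(child, &status, 0) < 0 && errno == EINTR)
            ;                    /* retry if interrupted by a forwarded signal */
        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
    }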