Yes, I agree with you. I believe I already ran that test, using one file per MPI process: each process opens a file whose name is suffixed with its rank, via MPI_File_open(MPI_COMM_SELF, ...). That showed several times better performance (with np = 4 or 8 on my workstation) than a single MPI process (np = 1) can achieve. As I mentioned before: "As for the local disk, at least 2 times faster than a single MPI process can achieve. As for the ramdisk, at least 5 times faster. Lustre, I know, is at least 7-8 times or more faster depending on the configuration." However, when a single file is shared by multiple MPI processes (np > 1), the sum of the write speeds of all MPI processes is at most the performance of a single-process run (np = 1).
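For reference, the per-process-file variant looks roughly like this (a sketch only: the file-name prefix, sizes, and omitted error checks are illustrative, not the exact benchmark code):

    /* One file per process, opened with MPI_COMM_SELF, so every rank
     * writes independently. Sizes and names are illustrative. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK   (1 << 20)   /* 1 MiB per write */
    #define NCHUNKS 1000        /* 1000 writes -> ~1 GiB per process */

    int main(int argc, char **argv)
    {
        int rank;
        char fname[64];
        char *buf;
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* file name followed by the rank, e.g. testfile.3 */
        snprintf(fname, sizeof(fname), "testfile.%d", rank);
        buf = malloc(CHUNK);
        memset(buf, rank & 0xff, CHUNK);

        MPI_File_open(MPI_COMM_SELF, fname,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);
        for (int i = 0; i < NCHUNKS; i++)
            MPI_File_write(fh, buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        free(buf);
        MPI_Finalize();
        return 0;
    }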
I expected simple MPI file I/O to scale, at least for a small number of processes, but I don't see that at all now. I ran it on a shared-memory machine with tens of cores and saw the same results. Any idea?

David

On Mon, Apr 6, 2020 at 10:47 AM Gabriel, Edgar <egabr...@central.uh.edu> wrote:

> The one test that would give you a good idea of the upper bound for your scenario would be to write a benchmark where each process writes to a separate file, and look at the overall bandwidth achieved across all processes. The MPI I/O performance will be less than or equal to the bandwidth achieved in this scenario, as long as the number of processes is moderate.
>
> Thanks
> Edgar
>
> From: Dong-In Kang <dik...@gmail.com>
> Sent: Monday, April 6, 2020 9:34 AM
> To: Collin Strassburger <cstrassbur...@bihrle.com>
> Cc: Open MPI Users <users@lists.open-mpi.org>; Gabriel, Edgar <egabr...@central.uh.edu>
> Subject: Re: [OMPI users] Slow collective MPI File IO
>
> Hi Collin,
>
> It is written in C. So, I think it is OK.
>
> Thank you,
> David
>
> On Mon, Apr 6, 2020 at 10:19 AM Collin Strassburger <cstrassbur...@bihrle.com> wrote:
>
> Hello,
>
> Just a quick comment on this; is your code written in C/C++ or Fortran? Fortran has issues writing at a decent speed regardless of the MPI setup, and as such should be avoided for file I/O (yet I still occasionally see it implemented).
>
> Collin
>
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Dong-In Kang via users
> Sent: Monday, April 6, 2020 10:02 AM
> To: Gabriel, Edgar <egabr...@central.uh.edu>
> Cc: Dong-In Kang <dik...@gmail.com>; Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] Slow collective MPI File IO
>
> Thank you Edgar for the information.
>
> I also tried MPI_File_write_at_all(), but it usually makes the performance worse.
>
> My program is very simple. Each MPI process writes a consecutive portion of a file, with no interleaving among the MPI processes. I think in this case I can use MPI_File_write_at().
>
> I tested the maximum bandwidth of the target devices, and it is at least a few times higher than what a single process can achieve. I tested it using the same program, but opening individual files with MPI_COMM_SELF. I tested 32MB and 512MB chunks; there are performance differences between the two, but neither makes multi-process file I/O exceed the performance of single-process file I/O. As for the local disk, it is at least 2 times faster than a single MPI process can achieve; for the ramdisk, at least 5 times faster; for Lustre, I know it is at least 7-8 times or more, depending on the configuration.
>
> About the caching effect: that would apply to MPI_File_read(). I can see very high bandwidth with MPI_File_read(), which I believe comes from caches in RAM. But as for MPI_File_write(), I don't think it is affected by caching. And I create a new file for each test and remove it at the end of the test.
>
> I may be making a very simple mistake, but I don't know what it is. I have seen a few reports on the internet that MPI file I/O can achieve a multiple of single-process file I/O speed when a faster file system such as Lustre is used.
>
> I started this experiment because I couldn't get a speedup on the Lustre file system. I then moved the experiment to a ramdisk and a local disk, because that removes the issue of Lustre configuration.
>
> Any comments are welcome.
>
> David
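For concreteness, the shared-file pattern described in the message above is essentially the following fragment (reusing rank, buf, CHUNK, and NCHUNKS from the earlier sketch; the file name is again illustrative, not from the actual benchmark):

    /* Single shared file: rank r writes NCHUNKS consecutive chunks
     * starting at byte offset r * NCHUNKS * CHUNK, so the per-rank
     * regions are contiguous and never overlap. */
    MPI_File fh;
    MPI_Offset my_base = (MPI_Offset)rank * NCHUNKS * CHUNK;

    MPI_File_open(MPI_COMM_WORLD, "shared_testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    for (int i = 0; i < NCHUNKS; i++)
        MPI_File_write_at(fh, my_base + (MPI_Offset)i * CHUNK,
                          buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);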
> On Mon, Apr 6, 2020 at 9:03 AM Gabriel, Edgar <egabr...@central.uh.edu> wrote:
>
> Hi,
>
> A couple of comments. First, if you use MPI_File_write_at, this is usually not considered collective I/O, even if executed by multiple processes. MPI_File_write_at_all would be collective I/O.
>
> Second, MPI I/O cannot do 'magic', but is bound by the hardware that you provide. If a single process is already able to saturate the bandwidth of your file system and hardware, you will not see performance improvements from multiple processes (with some minor exceptions due to caching effects, but only for smaller problem sizes; the larger the amount of data you try to write, the smaller the caching effects become in file I/O). So the first question you have to answer is: what is the sustained bandwidth of your hardware, and are you able to saturate it already with a single process? If you are using a single hard drive (or even 2 or 3 hard drives in a RAID 0 configuration), this is almost certainly the case.
>
> Lastly, the configuration parameters of your tests also play a major role. As a general rule, the larger the amount of data you provide per file I/O call, the better the performance will be. 1MB of data per call is probably on the smaller side. The ompio implementation of MPI I/O internally breaks large individual I/O operations (e.g. MPI_File_write_at) into chunks of 512MB for performance reasons. Large collective I/O operations (e.g. MPI_File_write_at_all) are broken into chunks of 32MB. This gives you some hints on the quantities of data you would have to use for performance reasons.
>
> Along the same lines, one final comment. You say you did 1000 writes of 1MB each. For a single process that is about 1GB of data. Depending on how much main memory your PC has, this amount of data can still be cached on modern systems, and you might have an unrealistically high bandwidth value for the 1-process case that you are comparing against (it depends a bit on what your benchmark does, and whether you force flushing the data to disk inside your measurement loop).
>
> Hope this gives you some pointers on where to start to look.
>
> Thanks
> Edgar
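Edgar's last point about caching can be checked by forcing the flush inside the timed region. A minimal sketch of such a harness, continuing the fragments above (the MPI_File_sync() call is a suggested addition for an honest measurement, not something from the original benchmark described below):

    /* Time the writes between two barriers, flushing before the second
     * barrier so cached-but-unwritten data does not inflate the number. */
    int nprocs;
    double t0, t1, mibps;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();

    for (int i = 0; i < NCHUNKS; i++)
        MPI_File_write_at(fh, my_base + (MPI_Offset)i * CHUNK,
                          buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_sync(fh);   /* force the data out to the storage device */

    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    /* aggregate bandwidth across all ranks */
    mibps = ((double)nprocs * NCHUNKS * CHUNK) / (t1 - t0) / (1024.0 * 1024.0);
    if (rank == 0)
        printf("aggregate write bandwidth: %.1f MiB/s\n", mibps);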
> From: users <users-boun...@lists.open-mpi.org> On Behalf Of Dong-In Kang via users
> Sent: Monday, April 6, 2020 7:14 AM
> To: users@lists.open-mpi.org
> Cc: Dong-In Kang <dik...@gmail.com>
> Subject: [OMPI users] Slow collective MPI File IO
>
> Hi,
>
> I am running an MPI program where N processes write to a single file on a single shared-memory machine. I'm using Open MPI v4.0.2. Each MPI process writes a 1MB chunk of data 1K times, sequentially. There is no overlap in the file between any two MPI processes. I ran the program for np = {1, 2, 4, 8}. I am seeing that the speed of the collective write to a file for np = {2, 4, 8} never exceeds the speed for np = {1}. I did the experiment with a few different file systems {local disk, ramdisk, Lustre}. For all of them, I see similar results: the speed of a collective write to a single shared file never exceeds the speed of the single-MPI-process case. Any tips or suggestions?
>
> I used the MPI_File_write_at() routine with the proper offset for each MPI process. (I also tried the MPI_File_write_at_all() routine, which makes the performance worse as np gets bigger.) Before writing, MPI_Barrier() is called. The start time is taken right after MPI_Barrier() using MPI_Wtime(); the end time is taken right after another MPI_Barrier(). The speed of the collective write is calculated as (total amount of data written to the file) / (time between the first MPI_Barrier() and the second MPI_Barrier()).
>
> Any idea how to increase the speed?
>
> Thanks,
> David
>
> --
> =========
> Jesus is My Lord!

--
=========
Jesus is My Lord!