Ben,

I would regard the serialization as an implementation issue, not a standards issue, so performing the operations the way the benchmark does is still a valid approach.
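Just to make sure we mean the same thing, a minimal sketch of that pattern -- several MPI_Accumulate calls to different targets, completed by a single flush -- could look roughly like the following. The window 'win', the function name, and the lock_all/unlock_all framing are placeholders for illustration only, not taken from your benchmark:

    #include <mpi.h>
    #include <vector>

    // Sketch: issue one MPI_Accumulate per target, then complete them all
    // with a single MPI_Win_flush_all. Whether the accumulates actually
    // overlap on the wire is up to the implementation; the standard only
    // requires that they have completed after the flush.
    void update_all_targets(MPI_Win win, const std::vector<double>& contrib)
    {
        int nranks;
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        MPI_Win_lock_all(MPI_MODE_NOCHECK, win);   // passive-target epoch
        for (int target = 0; target < nranks; ++target) {
            // Add one double at displacement 0 in each target's window.
            MPI_Accumulate(&contrib[target], 1, MPI_DOUBLE,
                           target, 0, 1, MPI_DOUBLE, MPI_SUM, win);
        }
        MPI_Win_flush_all(win);    // block until remote completion everywhere
        MPI_Win_unlock_all(win);
    }

Nothing in the standard forces the requests themselves to be cheap, though, which is where the implementation question below comes in.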
As far as I know, Nathan Hjelm did a major overhaul of the RMA handling in Open-MPI 2.x, so my first suggestion would be to update your installation to the latest Open-MPI and check the outcome. That said, I think I saw similar issues with a local installation of Open-MPI 2.0.1 that I wanted to talk to Nathan about. I still have to investigate this further, as I currently cannot rule out a user/configuration error on my part.

The general problem here is that in passive-target synchronization the target cannot easily 'help' in getting things done. If your operation needs anything that the NIC cannot do on its own via DMA, you will need to get the target involved somehow. Off the cuff, I can think of three ways of handling such situations (there might be more, but I am not an implementor):

(1) Have a separate progress thread running on the target to handle RMA
    operations transparently.

(2) Have the target react to interrupts issued by the NIC to handle
    incoming communication.

(3) Have the RMA engine check pending requests every time MPI is called.

I think Open-MPI 1.x was using approach (3), but Nathan should correct me if I am wrong. Version 2.x should offload as much as possible to the NIC, but may still need target intervention on some operations.

@Nathan: Do you have any suggestions on tuning for the Open-MPI implementation?
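As a stop-gap related to (1): some applications emulate a progress thread themselves, by dedicating one thread that keeps calling into MPI so that an implementation which only progresses RMA inside MPI calls gets a chance to service incoming operations. This is only a rough sketch of that idea, not Open-MPI-specific advice; it assumes MPI was initialized with MPI_THREAD_MULTIPLE, and whether it helps at all depends entirely on the implementation:

    #include <mpi.h>
    #include <atomic>
    #include <thread>

    // Sketch of a user-level "progress thread": it repeatedly calls into
    // MPI (here via MPI_Iprobe) so that implementations which only make
    // progress inside MPI calls get a chance to service incoming RMA work.
    std::atomic<bool> keep_polling{true};

    void progress_loop()
    {
        int flag;
        while (keep_polling.load()) {
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                       &flag, MPI_STATUS_IGNORE);
        }
    }

    int main(int argc, char** argv)
    {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        // (a real code would check 'provided' here)

        std::thread progress(progress_loop);   // runs alongside the RMA code

        /* ... passive-target RMA work goes here ... */

        keep_polling.store(false);
        progress.join();
        MPI_Finalize();
        return 0;
    }

Whether something like this actually makes a difference for Open-MPI is exactly the kind of thing Nathan would know better than I do.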
Cheers,
Marc-Andre

On 04.05.2017 21:27, Benjamin Brock wrote:
> Is there any way to issue simultaneous MPI_Accumulate() requests to
> different targets, then? I need to update a distributed array, and
> this serializes all of the communication.
>
> Ben
>
> On Thu, May 4, 2017 at 5:53 AM, Marc-André Hermanns
> <m.a.herma...@fz-juelich.de> wrote:
>
> Dear Benjamin,
>
> as far as I understand the MPI standard, RMA operations are
> non-blocking in the sense that you need to complete them with a
> separate call (flush/unlock/...).
>
> I cannot find the place in the standard right now, but I think an
> implementation is allowed to either buffer RMA requests or block
> until the RMA operation can be initiated, and the user should not
> assume either. I have seen both behaviors across implementations in
> the past.
>
> For your second question, yes, flush is supposed to block until
> remote completion of the operation.
>
> That said, I seem to recall that Open-MPI 1.x did not support
> asynchronous target-side progress for passive-target synchronization
> (which is used in your benchmark example), so the behavior you
> observed is to some extent expected.
>
> Cheers,
> Marc-Andre
>
> On 04.05.2017 01:25, Benjamin Brock wrote:
> > MPI_Accumulate() is meant to be non-blocking, and MPI will block
> > until completion when an MPI_Win_flush() is called, correct?
> >
> > In this (https://hastebin.com/raw/iwakacadey) microbenchmark,
> > MPI_Accumulate() seems to be blocking for me in OpenMPI 1.10.6.
> >
> > I'm seeing timings like
> >
> > [brock@nid00622 junk]$ mpirun -n 4 ./junk
> > Write: 0.499229 rq, 0.000018 fl; Read: 0.463764 rq, 0.000035 fl
> > Write: 0.464914 rq, 0.000012 fl; Read: 0.419703 rq, 0.000024 fl
> > Write: 0.499686 rq, 0.000014 fl; Read: 0.422557 rq, 0.000023 fl
> > Write: 0.437960 rq, 0.000015 fl; Read: 0.396530 rq, 0.000023 fl
> >
> > Meaning up to half a second is being spent issuing requests, but
> > almost no time is spent in flushes. The time spent in requests
> > scales with the size of the messages, but the time spent in flushes
> > stays the same.
> >
> > I'm compiling this with mpicxx acc.cpp -o acc -std=gnu++11 -O3.
> >
> > Any suggestions? Am I using this incorrectly?
> >
> > Ben

--
Marc-Andre Hermanns
Jülich Aachen Research Alliance,
High Performance Computing (JARA-HPC)
Jülich Supercomputing Centre (JSC)

Wilhelm-Johnen-Str.
52425 Jülich
Germany

Phone: +49 2461 61 2509 | +49 241 80 24381
Fax: +49 2461 80 6 99753
www.jara.org/jara-hpc
email: m.a.herma...@fz-juelich.de