Hi!

On Tue, 24 Oct 2017 11:55:27 +0200, Jakub Jelinek <ja...@redhat.com> wrote:
> The following patch implements coalescing of transfers (only those that are
> copied into the freshly allocated device buffer) into one or multiple larger
> transfers. The patch doesn't coalesce > 32KB transfers or transfers where
> the gap is 4KB or more. I guess it would be not too hard to do similar
> coalescing for the dev2host transfers that are from a single device mapping,
> though probably far less important than the more common host2dev transfers.
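To make the rule quoted above concrete, here is a minimal standalone sketch of how such coalescing could work. It is NOT the patch's code: the names `struct xfer`, `struct chunk`, `coalesce`, `fill_staging`, `COALESCE_MAX_LEN` and `COALESCE_MAX_GAP` are all hypothetical. It assumes the host-to-device copies target a freshly allocated device buffer and arrive sorted by device offset without overlapping. Individual copies over 32KB are left alone, and a gap of 4KB or more starts a new chunk. Each chunk is then filled in a host staging buffer so that it can go over as one host-to-device transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical thresholds, taken from the description quoted above.  */
#define COALESCE_MAX_LEN (32 * 1024) /* larger copies are sent on their own */
#define COALESCE_MAX_GAP (4 * 1024)  /* a gap this large starts a new chunk */

/* Hypothetical host-to-device copy: LEN bytes at device offset START.  */
struct xfer { size_t start, len; };

/* Hypothetical coalesced region [start, end) of the device buffer.  */
struct chunk { size_t start, end; };

/* Merge XFERS (N entries, sorted by START, non-overlapping) into CHUNKS,
   which must have room for N entries.  Returns the number of chunks.
   Zero-length and oversized copies are skipped; they would be issued
   individually.  */
static size_t
coalesce (const struct xfer *xfers, size_t n, struct chunk *chunks)
{
  size_t cnt = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t start = xfers[i].start, len = xfers[i].len;
      if (len == 0 || len > COALESCE_MAX_LEN)
        continue;
      if (cnt > 0 && start < chunks[cnt - 1].end + COALESCE_MAX_GAP)
        {
          /* Small gap: extend the current chunk over it.  */
          chunks[cnt - 1].end = start + len;
          continue;
        }
      /* First copy, or gap too large: open a new chunk.  */
      chunks[cnt].start = start;
      chunks[cnt].end = start + len;
      cnt++;
    }
  return cnt;
}

/* Copy every transfer lying inside CHUNK from its host address HOST[i]
   into STAGING, which mirrors device bytes [chunk->start, chunk->end).
   Gap bytes are left untouched.  The caller would then issue a single
   host-to-device copy of STAGING.  */
static void
fill_staging (const struct xfer *xfers, const void *const *host, size_t n,
              const struct chunk *chunk, unsigned char *staging)
{
  for (size_t i = 0; i < n; i++)
    if (xfers[i].len != 0 && xfers[i].start >= chunk->start
        && xfers[i].start + xfers[i].len <= chunk->end)
      {
        assert (host[i] != NULL);
        memcpy (staging + (xfers[i].start - chunk->start), host[i],
                xfers[i].len);
      }
}
```

For example, copies at offsets 0 (100 bytes) and 200 (50 bytes) merge into one chunk [0, 250). A 10-byte copy at offset 5000 gets its own chunk, because the gap is 4KB or more. A 40000-byte copy is skipped entirely. Overwriting the gap bytes on the device is presumably harmless only because the destination buffer was just allocated and no other mapping lives in those gaps. That would be why the quoted patch restricts coalescing to that case. This sketch does not cap a chunk's total size, and it still routes a chunk holding a single copy through the staging buffer, which a real implementation would probably avoid.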
I too wondered about device-to-host copies. (..., and in the OpenACC context, how that would interact with 'async'...)

And then, I wondered about 'OpenMP target enter data' directives: if one of those creates/copies multiple objects, wouldn't that likewise benefit from the coalescing optimization? There is the (implementation?) problem, though, that 'GOMP_target_enter_exit_data' calls 'gomp_map_vars' separately for each mapping. Is that just because of the special 'GOMP_MAP_STRUCT' handling? (Could we easily coalesce the "ranges" of mappings between such interrupters?)

And then, could we go as far as using the coalescing optimization even for 'update'/'exit data' directives, and/or potentially for all host-to-device and device-to-host copies in general, whenever we can determine that the device addresses are adjacent to each other? Or would figuring that out cost more than just launching individual transfers?

Just an idea that I had...


Grüße
 Thomas