tra added a comment.

In D78655#2108047 <https://reviews.llvm.org/D78655#2108047>, @pfultz2 wrote:

> > Could you give an example to demonstrate current use and how it will break?
>
> Here is a place where it would break:
>
> https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/src/targets/gpu/device/include/migraphx/gpu/device/multi_index.hpp#L129
>
> This change was already included in the fork of LLVM shipped in the ROCm 3.5 
> and 3.6 releases, which is why this code compiles there. It also compiles 
> with the hcc-based HIP compilers that previous ROCm versions used. It would 
> be best if this change could be upstreamed, so we don't have to carry these 
> extra changes in a fork.


It may be OK to require updated software in order to switch to a new compiler. 
E.g. it would be unreasonable to expect clang to compile all existing HCC code. 
Nor did we promise to compile all existing CUDA code back when clang was at the 
point where HIP is now -- a new compiler emerging in an ecosystem with existing 
code that compiles and works fine with the incumbent compiler, but needs some 
tweaks to compile and work with clang. There will be some back and forth before 
we reach the equilibrium where most things compile and work.

You may need to make some portability tweaks to your code to make it work with 
upstream and internal clang versions as well as hcc. This is roughly what's 
been done to existing CUDA code -- pretty much all the major libraries that use 
CUDA (TensorFlow, Thrust, CUTLASS, CUB, PyTorch) needed minor tweaks to become 
portable to clang.

Now, back to the specifics of your example. I'm still not 100% sure I 
understand what the problem is. Can you boil the use case down to an example on 
godbolt? Not just the lambda itself, but also the way it's intended to be used. 
It does not need to compile; I just need it in order to understand your use 
case and the problem.
I can imagine that passing a lambda type as a template parameter would make it 
hard to predict or control where and how it will finally be instantiated or 
used, but it would be great to have a practical example.

> Part of the motivation for this change was that it wasn't always clear in 
> code where the `__device__` attribute is needed with lambdas. It also makes 
> clang more consistent with `constexpr` lambdas and the hcc-based HIP 
> compiler. Including capturing lambdas will make this simpler and easier to 
> understand.
> 
> If there are concerns about making it the default for capturing lambdas, can 
> we at least have a flag to enable this for capturing lambdas?

I've just pointed out that the assumption that having a capture implies having 
an enclosing function is invalid. We've already decided to proceed with 
promotion of all lambdas in general, so it's mostly a matter of taking care of 
implementation details.
Dealing with capturing lambdas in a separate patch is one option. IMO it makes 
sense in general, as capturing lambdas do have their own distinct quirks, while 
promotion of non-capturing lambdas is relatively uncontroversial.
If Sam decides to incorporate support for capturing lambdas in this patch, we 
could still do it by restricting capturing-lambda promotion to lambdas within 
function scope only; i.e., lambdas created at global scope would still be host.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78655/new/

https://reviews.llvm.org/D78655



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits