tra added inline comments.

================
Comment at: clang/lib/Sema/SemaDecl.cpp:7247-7250
+  // CUDA/HIP: Function-scope static variables in device or global functions
+  // have implicit device or constant attribute. Function-scope static
+  // variables in host device functions have implicit device or constant
+  // attribute in device compilation only.
----------------
yaxunl wrote:
> tra wrote:
> > This is somewhat confusing. I guess the issue is that we're conflating all 
> > the functionality implied by the `__device__` attribute and the `accessible 
> > on device` which is a subset of it. For the static vars in D functions you 
> > only need for it to be accessible on device, IMO. For HD functions, you do 
> > need the full `__device__` functionality, with host shadow and runtime 
> > registration.
> > 
> > While adding implicit `__device__` works for statics in the device-only 
> > functions, it's a bit of an overkill. It also gives us a somewhat different 
> > AST between host/device compilations.
> > 
> > Perhaps we can handle statics in device-only functions w/o adding implicit 
> > `__device__`. Can we check the parent of the variable instead when we check 
> > whether we're allowed to reference the variable? 
> Before we consider a function scope static variable without explicit device 
> attribute, let's consider the difference between a static variable with 
> explicit device attribute and a global device variable. They are both emitted 
> in device compilation and have shadow variables in host compilation. The only 
> difference is the linkage. A global device variable is supposed to be visible 
> to other compilation units, whereas a static device variable is supposed to 
> be visible to the same compilation unit only. A function scope static 
> variable with device attribute has similar traits: It needs to be emitted in 
> device compilation, and it needs a shadow variable in host compilation in 
> case it needs to be accessed in host code. The only difference is that it is 
> only visible inside the function.
> 
> Now let's consider a static var without device attribute in a device 
> function. From the sema and codegen point of view, it should be no different 
> from a function scope static var with device attribute. Adding an implicit 
> device attribute would simplify its handling.
> 
> Now let's consider a static var without device attribute in a host device 
> function. The following code is valid for both nvcc and cuda-clang:
> 
> ```
> int __device__ __host__ func(int x) {
>   static int a = 1;
>   return a + x;
> }
> ```
> This requires that the static variable be directly accessible in both device 
> and host compilation: in device compilation, the static var behaves like a 
> static var with an explicit device attribute, whereas in host compilation it 
> behaves like a normal host static var. By adding an implicit device 
> attribute, we can clearly distinguish these situations and reuse the sema 
> and codegen logic of the device attribute.
> A function scope static variable with device attribute has similar traits: It 
> needs to be emitted in device compilation, and it needs a shadow variable in 
> host compilation in case it needs to be accessed in host code. 

This is the part I don't agree with. Can you give me an example of how a local 
variable in a `__device__` function can be accessed from the host code?

One can't refer to local static vars from outside of the function, and even if 
the function returns their address, that address makes no sense on the host 
side, because there's no reverse `device address to host shadow` mapping. I do 
not think we need a host shadow or registration for device-side local statics. 
What am I missing?
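For illustration, here's a hypothetical CUDA sketch (function and variable names are made up) of the scenario above, showing why the leaked address is useless on the host:

```cuda
// Hypothetical sketch: a device-only function with a local static.
__device__ int *get_counter_addr() {
  static int counter = 0;  // lives in device global memory
  return &counter;         // a raw device-side address
}

__global__ void leak_addr(int **out) { *out = get_counter_addr(); }

int main() {
  int **d_out;
  cudaMalloc(&d_out, sizeof(int *));
  leak_addr<<<1, 1>>>(d_out);

  int *addr = nullptr;
  cudaMemcpy(&addr, d_out, sizeof(int *), cudaMemcpyDeviceToHost);
  // `addr` is a device address. There is no reverse device-address ->
  // host-shadow mapping, so the host cannot dereference it, and a host
  // shadow/registration for `counter` would not help here.
  cudaFree(d_out);
  return 0;
}
```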

> Now let's consider a static var without device attribute in a device 
> function. From the sema and codegen point of view, it should be no different 
> from a function scope static var with device attribute. Adding an implicit 
> device attribute would simplify its handling.

I agree that it makes things simpler. What I'm saying is that the simple 
solution comes with an overhead that's not needed. 

> 
> ```
> int __device__ __host__ func(int x) {
>   static int a = 1;
>   return a + x;
> }
> ```
> 
> This requires that the static variable be directly accessible in both device 
> and host compilation: in device compilation, the static var behaves like a 
> static var with an explicit device attribute, whereas in host compilation it 
> behaves like a normal host static var.

I'm not sure I follow your reasoning. `directly accessible in both device and 
host compilation` would need an equivalent of the `__managed__` attribute. 
Regular `__device__` variables only allow the variable to have an address on 
the host side, which can then be translated into the device-side address by 
the runtime. The variable itself is only directly accessible from the device.
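For reference, a minimal sketch of what regular `__device__` variable semantics give the host (standard CUDA runtime calls; the variable name is made up):

```cuda
#include <cstdio>

__device__ int dev_var = 42;

int main() {
  // The host-side `dev_var` symbol is only a handle (shadow); the runtime
  // translates it to the device-side address. Data must be moved explicitly:
  int host_copy = 0;
  cudaMemcpyFromSymbol(&host_copy, dev_var, sizeof(int));

  void *dev_addr = nullptr;
  cudaGetSymbolAddress(&dev_addr, dev_var);  // device address, for API use only

  // Direct host access, e.g. `dev_var = 1;`, is not valid: the variable is
  // only directly accessible from device code.
  printf("host copy = %d\n", host_copy);
  return 0;
}
```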

> By adding implicit device attribute, we can clearly distinguish these 
> situations and reuse the sema and codegen logic of device attribute.

While this approach does remove the shadow+registration overhead, it does not 
give host and device access to the same variable, and it creates more 
divergence between the host and device ASTs, which I'd prefer to avoid if 
possible.

To summarize, we appear to agree on what we want in the end -- a variable 
accessible on its respective side only, without the overhead of the shadow and 
registration. What we disagree on is how to implement it.
Your approach is to add the `__device__` attribute during device-side 
compilation only, which allows reusing the parts of the functionality that 
happen to do the right thing in each individual compilation, at the price of 
AST divergence. I think that AST divergence should be avoided and that we 
should have a uniform way of handling local static vars on both sides.

Also, we'll need to figure out and document how static vars are expected to 
work in HD functions. Should they be implicitly `__managed__`? That would be 
the most intuitively sensible thing, but it's not going to work with CUDA as we 
don't support `__managed__` yet.

We could explicitly say that both host and device have their own instance of 
the local static variable. That's sort of how it works in practice now, but it 
deviates from what a user would expect from a static var. It's probably a more 
natural fit for the CUDA/HIP programming model in general. E.g. consider that 
we may be running on more than one GPU. In order for one static var to work 
for all GPUs and the host, it would have to live on the host and then be 
memory-mapped on each device. I'm not sure if `__managed__` can handle that in 
principle for CUDA. The each-side-carries-its-own approach is more consistent 
-- that's how we treat global variables anyway.
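A sketch of what the each-side-carries-its-own semantics would mean in practice (hypothetical example, assuming the per-side-instance model under discussion):

```cuda
#include <cstdio>

// Under the each-side-carries-its-own model, this HD function compiles to
// two distinct `n` objects: one in host data, one in device global memory.
__host__ __device__ int bump() {
  static int n = 0;
  return ++n;
}

__global__ void kernel() { bump(); bump(); }  // touches the device instance

int main() {
  kernel<<<1, 1>>>();
  cudaDeviceSynchronize();
  // Under per-side-instance semantics this would print 1: the host instance
  // is unaffected by the kernel's increments of the device instance.
  printf("%d\n", bump());
  return 0;
}
```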





CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95560/new/

https://reviews.llvm.org/D95560
