jdoerfert added a comment.

In D60907#1483615 <https://reviews.llvm.org/D60907#1483615>, @hfinkel wrote:

> In D60907#1479370 <https://reviews.llvm.org/D60907#1479370>, @gtbercea wrote:
>
> > In D60907#1479142 <https://reviews.llvm.org/D60907#1479142>, @hfinkel wrote:
> >
> > > In D60907#1479118 <https://reviews.llvm.org/D60907#1479118>, @gtbercea 
> > > wrote:
> > >
> > > > Ping @hfinkel @tra
> > >
> > >
> > > The last two comments in D47849 <https://reviews.llvm.org/D47849> 
> > > indicated exploration of a different approach, and one which still seems 
> > > superior to this one. Can you please comment on why you're now pursuing 
> > > this approach instead?
> >
> >
> > ...
> >
> > Hal, as far as I can tell, this solution is similar to yours but with a 
> > slightly different implementation. If there are particular aspects about 
> > this patch you would like to discuss/give feedback on please let me know.
>
>
> The solution I suggested had the advantages of:
>
> 1. Being able to directly reuse the code in 
> `__clang_cuda_device_functions.h`. With this solution, on the other hand, we 
> need to implement a wrapper function for every math function, and whenever 
> `__clang_cuda_device_functions.h` is updated, the OpenMP wrapper needs to be 
> updated as well.


I'd even go so far as to argue that `__clang_cuda_device_functions.h` should 
include the internal math.h wrapper to get all math functions. See also the 
next comment.

> 2. Providing access to wrappers for other CUDA intrinsics in a natural way 
> (e.g., `rnorm3d`) [it looks a bit nicer to provide a host version of 
> `rnorm3d` than `__nv_rnorm3d` in user code].

@hfinkel 
I don't see why you want to mix CUDA intrinsics with the math.h overloads. I 
added a rough outline of how I imagine the internal math.h header would look 
as a comment in D47849 <https://reviews.llvm.org/D47849>. Could you elaborate 
on how that differs from what you have in mind, and where the other 
intrinsics come in?

> 3. Being similar to the "declare variant" functionality from OpenMP 5, and 
> thus, I suspect, closer to the solution we'll eventually be able to apply in 
> a standard way to all targets.

I can see this.

>> This solution follows Alexey's suggestions. It allows math calls to be 
>> optimized where applicable (e.g., pow(x, 2) => x * x), which was one of the 
>> issues with the previous solution I implemented.
> 
> So we're also missing that optimization for CUDA code when compiling with 
> Clang? Isn't this also something that, regardless, should be fixed?

Maybe through general built-in recognition, lowering the calls into 
target-specific implementations/intrinsics again late in the pipeline?


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60907/new/

https://reviews.llvm.org/D60907


