tra accepted this revision. tra added inline comments.
================ Comment at: clang/lib/Headers/__clang_cuda_cmath.h:85 +// (note that we do not create implicit base functions here). To avoid +// this clash we add a new trait to some of them that is always true +// (this is LLVM after all ;)). It will only influence the mangled name ---------------- jdoerfert wrote: > tra wrote: > > jdoerfert wrote: > > > tra wrote: > > > > jdoerfert wrote: > > > > > tra wrote: > > > > > > If you just want to disable some existing declarations that get in > > > > > > the way, one way to do it would be to redeclare them with an > > > > > > `__arrtibute__((enable_if(false)))` > > > > > > > > > > > > Having overloads with different return types will be observable. > > > > > We need both overloads as we don't know what return type the system > > > > > uses. I modeled the test below this way, that is we don't know if > > > > > `isnan` has a `bool` or `int` return type. > > > > > > > > > > > Having overloads with different return types will be observable. > > > > > > > > > > Unsure what observable effect you expect, the variants are there, > > > > > yes, but they have different names (wrt the base function and the > > > > > other variant function). The variant without a base function is > > > > > simply an unused internal function. Could you elaborate what problem > > > > > you expect? > > > > What will be the result of `sizeof(isinf(1.0f))` ? I would expect it to > > > > be the same on host and on the device. > > > > I'm not quite sure what the pragma would do, so it's possible I'm > > > > barking at the wrong tree here. > > > > > > > So, I actually had to run this to verify what I suspected would happen: > > > > > > `sizeof(isinf(1.0f))` is in the AST usually: > > > ``` > > > | `-UnaryExprOrTypeTraitExpr 0x5586a27bd590 <col:14, col:37> > > > 'unsigned long' sizeof > > > > > > > > > | `-ParenExpr 0x5586a27bd570 <col:20, col:37> 'bool' > > > | `-CallExpr 0x5586a27bd548 <col:21, col:36> 'bool' > > > | |-ImplicitCastExpr 0x5586a27bd530 <col:21, col:26> > > > 'bool (*)(float)' <FunctionToPointerDecay> > > > | | `-DeclRefExpr 0x5586a27bd500 <col:21, col:26> 'bool > > > (float)' lvalue Function 0x5586a276f9f0 'isinf' 'bool (float)' > > > non_odr_use_unevaluated > > > | `-FloatingLiteral 0x5586a27bd290 <col:32> 'float' > > > 1.000000e+00 > > > ``` > > > If `isinf` has an applicable variant, it will be picked up: > > > ``` > > > | `-UnaryExprOrTypeTraitExpr 0x55f9ac949a20 <col:14, col:37> > > > 'unsigned long' sizeof > > > > > > > > > | `-ParenExpr 0x55f9ac949a00 <col:20, col:37> 'bool' > > > | `-PseudoObjectExpr 0x55f9ac9499e0 <col:21, col:36> 'bool' > > > | |-CallExpr 0x55f9ac949978 <col:21, col:36> 'bool' > > > | | |-ImplicitCastExpr 0x55f9ac949960 <col:21, col:26> > > > 'bool (*)(float)' <FunctionToPointerDecay> > > > | | | `-DeclRefExpr 0x55f9ac949930 <col:21, col:26> 'bool > > > (float)' lvalue Function 0x55f9ac7cd1d0 'isinf' 'bool (float)' > > > non_odr_use_unevaluated > > > | | `-FloatingLiteral 0x55f9ac9496c0 <col:32> 'float' > > > 1.000000e+00 > > > | `-CallExpr 0x55f9ac9499b8 > > > </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20, > > > > > > //data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:36> > > > 'bool' > > > | |-ImplicitCastExpr 0x55f9ac9499a0 > > > </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20> > > > 'bool (*)(float) __attribute__((nothrow))' <FunctionToPointerDecay> > > > | | `-DeclRefExpr 0x55f9ac939790 <col:20> 'bool (float) > > > __attribute__((nothrow))' Function 0x55f9ac939690 > > > 'isinf[implementation={extension(disable_implicit_base, match_any, > > > allow_templates)}, device={arch(nvptx, nvptx64)}]' 'bool (float@ > > > | `-FloatingLiteral 0x55f9ac9496c0 > > > <//data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:32> > > > 'float' 1.000000e+00 > > > ``` > > > That is the behavior I expected, as it happens for any base function call > > > with an applicable variant. > > > > > > This patch doesn't change any of this. We have two specialization that do > > > only differ in their return type but each will only be a variant of a > > > base function with that return type. In any context, when we have a call > > > to the original base function, then we try to specialize. Since only the > > > `bool` return *or* the `int` return specializations are variants of the > > > base function, we might replace the base call with a call, but consistent > > > on host and device. I hope this makes some sense, I don't think I did a > > > good job explaining. > > It sounds like openmp's 'variant' is more of an 'overlay' rather than a > > CUDA-style target overload that I was thinking of (and overloads don't > > allow different return types at all). If I understand you correctly, the > > code below allows (literally?) matching host-side function signatures. > > Because the functions returning bool and functions returning int can't > > coexist on the host, there will be no conflicts on device side either. > > > > Is that in the ballpark of what's happening? If I'm still off, could you > > point me to more info about how "pragma omp declare variant" works? > > > > > > It sounds like openmp's 'variant' is more of an 'overlay' rather than a > > CUDA-style target overload that I was thinking of (and overloads don't > > allow different return types at all). > > [...] > > Is that in the ballpark of what's happening? > > Yep. Basically, you can provide N specialization for a function and calls to > that function are replaced by calls to a matching specialization. We also > only do this for direct calls, that is `&foo` will always give you the > address of the base version, which may or may not be desirable but is > certainly different from overloading. I also had to completely give up on my > overloading based implementation of declare variant :(, but the new one works > really well ;) > > > If I understand you correctly, the code below allows (literally?) matching > > host-side function signatures. Because the functions returning bool and > > functions returning int can't coexist on the host, there will be no > > conflicts on device side either. > > Exactly, with the caveat mentioned here in the TODO: We mangle the variants > to avoid conflicts with the base function. Since this mangling is only based > on the context selector and the function name, two variants that only differ > in their return type would clash. To avoid this I added a "no-op" context > selector trait here that will ensure the names are different in the > "overlay/variant" space. > > > > how "pragma omp declare variant" works? > > So, this is an extension to the context selector as allowed by the standard. > The latest public version is > https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf, `declare variant` > is on page 56, Section 2.3.5. OpenMP 5.1 (Nov 2020) will have various > clarifications but the principles are the same. Note that there is `declare > variant` and the `begin/end` version which behave slightly different. I > implemented all of math and complex support with the begin/end version and I > believe it to be far superior anyway ;) > > Thank you for the details. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D85879/new/ https://reviews.llvm.org/D85879 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits