See https://reviews.llvm.org/D25403 for adding constexpr to the new libc++ functions.
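For readers following along, the idea behind that review is the one raised at the end of the quoted mail below: under clang's CUDA compilation, a constexpr function can be treated as implicitly __host__ __device__, so a constexpr classification helper becomes callable from device code without libc++ knowing anything about CUDA.  A rough sketch of the shape of such a helper (illustrative only; the actual D25403 change and libc++'s real __libcpp_isnan may differ in detail):

    // Sketch only: roughly the shape of a constexpr classification helper.
    inline constexpr bool __libcpp_isnan(float __x)  { return __builtin_isnan(__x); }
    inline constexpr bool __libcpp_isnan(double __x) { return __builtin_isnan(__x); }

    // Usable in constant expressions, and (per the discussion below) implicitly
    // __host__ __device__ when clang compiles CUDA code.
    static_assert(!__libcpp_isnan(1.0), "an ordinary double is not a NaN");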
On Sat, Oct 8, 2016 at 8:56 PM, Justin Lebar <jle...@google.com> wrote:
>>> The fix is not as simple as changing our implementation of e.g.
>>> std::isnan to call __builtin_isnanf, because we also would want to fix
>>> ::isnanf,
>>
>> No, if I understand what you're saying, you specifically wouldn't.
>
> I understand how this is feasible on the CPU, because ::isnan is a
> library function that can never be inlined.  But on the GPU, these
> library functions are (at the moment) always declared inline.  That
> seems to complicate this idea.
>
> Right now ::isnan(x) is going to call __nv_isnan(x), which computes
> abs(x).  If we pass -ffast-math, the compiler will be able to assume
> that abs(x) is not nan.  I guess you're saying that we would need to
> special-case __nv_isnan so that -ffast-math is always off
> (essentially).  But what if it gets inlined?
>
> It looks like libstdc++'s std::isnan calls __builtin_isnan (same for
> its std::isinf), and its ::isnan is an alias for std::isnan.  So
> libstdc++'s isnan is going to return false with -ffast-math (or anyway
> it will do the same thing as the builtin functions, which, as I
> understand it, is what you're proposing libc++'s isnan *not* do).
>
>> This is important for use cases where, for example, even though the user
>> might want fast math, they still need to check their inputs for NaNs.
>
> Since this isn't going to work with libstdc++, and it relies on not
> doing anything that the compiler might construe as "arithmetic" on the
> value, this seems pretty dicey to me.  One could instead compile a
> separate TU without -ffast-math and do all their validation there.
> I'd have a lot more confidence in that working today, continuing to
> work tomorrow, and being portable across compilers and standard
> libraries.
>
> I don't mean to relitigate https://reviews.llvm.org/D18639, but I am
> not convinced that libc++'s isnan should have a path that returns true
> with -ffast-math, given that
>
>  * libstdc++'s isnan will always return false with -ffast-math,
>  * it's at best complicated for us to make this work if you can inline
>    the body of isnan (as we can on the GPU),
>  * it's at best complicated for users to write "correct" C++ that
>    calls isnan with -ffast-math, especially if they want their code to
>    continue to work in the future in the face of changing compilers
>    (-ffast-math is not specified anywhere, so who knows what it means), and
>  * there's a relatively simple workaround (use a separate TU) that
>    sidesteps all these problems.
>
> I'm not saying we should go in and change libc++'s CPU implementation
> of isnan to call the builtin.  I'll leave that up to people who care
> about CPU code.  But at least on the GPU, it still makes sense to me
> to fix the problem you originally identified by making
> std::/::isnan/isinf always return false/true with -ffast-math, which
> I think we should be able to do with the intrinsic upgrade I
> originally suggested.
>
> On a separate note: can we make __libcpp_isnan and __libcpp_isinf
> constexpr?  This would make them implicitly host+device functions,
> solving the problem on the GPU.  Otherwise I may have to reimplement
> these functions in a header, and that's lame -- although I am clearly
> not above that. :)
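A rough sketch of the separate-TU workaround mentioned above (file and function names are hypothetical; this assumes the validation TU is not merged back via LTO or cross-TU inlining):

    // check_nan.h -- interface visible to code built with -ffast-math.
    bool input_has_nan(const float *data, int n);

    // check_nan.cpp -- built WITHOUT -ffast-math, so the compiler may not
    // assume NaNs are absent and the check below survives optimization.
    #include <cmath>
    bool input_has_nan(const float *data, int n) {
      for (int i = 0; i < n; ++i)
        if (std::isnan(data[i]))
          return true;
      return false;
    }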
>
> On Sat, Oct 8, 2016 at 6:50 PM, Hal Finkel <hfin...@anl.gov> wrote:
>> ----- Original Message -----
>>> From: "Justin Lebar" <jle...@google.com>
>>> To: "Hal Finkel" <hfin...@anl.gov>
>>> Cc: "Clang Commits" <cfe-commits@lists.llvm.org>
>>> Sent: Saturday, October 8, 2016 6:16:12 PM
>>> Subject: Re: r283680 - [CUDA] Support <complex> and std::min/max on the device.
>>>
>>> Hal,
>>>
>>> On NVPTX, these functions eventually get resolved to function calls in
>>> libdevice, e.g. __nv_isinff and __nv_isnanf.
>>>
>>> llvm does not do a good job understanding the body of e.g.
>>> __nvvm_isnanf, because it uses nvptx-specific intrinsic functions,
>>> notably @llvm.nvvm.fabs.f.  These are opaque to the LLVM optimizer.
>>>
>>> The fix is not as simple as changing our implementation of e.g.
>>> std::isnan to call __builtin_isnanf, because we also would want to fix
>>> ::isnanf,
>>
>> No, if I understand what you're saying, you specifically wouldn't.  We had a
>> discussion about this on the review thread(s) that led to r283051, and while
>> we want to elide the checks inside the mathematical functions, we don't want
>> to replace isnan itself with something that will get optimized away.  We want
>> to keep the ability for the user to explicitly check for NaNs, etc. even if
>> we don't want those checks to appear inside of mathematical operations.  This
>> is important for use cases where, for example, even though the user might
>> want fast math, they still need to check their inputs for NaNs.
>>
>> -Hal
>>
>>> but we can't override that implementation without some major
>>> surgery on the nvptx headers.
>>>
>>> David Majnemer and I talked about one way to fix this, namely by using
>>> IR intrinsic upgrades to replace the opaque nvptx intrinsics with LLVM
>>> intrinsics.  LLVM would then be able to understand these intrinsics
>>> and optimize them.  We would reap benefits not just for std::isnan,
>>> but also e.g. constant-folding calls like std::abs that also
>>> eventually end up in libnvvm.
>>>
>>> I did the first half of this work, by adding lowerings for the various
>>> LLVM intrinsics to the NVPTX backend [1].  But David is now busy with
>>> other things and hasn't been able to help with the second half, namely
>>> using IR upgrades to replace the nvptx target-specific intrinsics with
>>> generalized LLVM intrinsics.  Perhaps this is something you'd be able
>>> to help with?
>>>
>>> In any case, using builtins here without fixing std::isnan and ::isnan
>>> feels to me to be the wrong solution.  It seems to me that we should
>>> be able to rely on std::isnan and friends being fast, and if they're
>>> not, we should fix that.  Using builtins here would be "cheating" to
>>> make our implementation faster than user code.
>>>
>>> I'll note, separately, that on x86, clang does not seem to
>>> constant-fold std::isinf or __builtin_isinff to false with -ffast-math
>>> -ffinite-math-only.  GCC can do it.  Clang gets std::isnan right.
>>> https://godbolt.org/g/vZB55a
>>>
>>> By the way, the changes you made to libc++ unfortunately break this
>>> patch with libc++, because e.g. __libcpp_isnan is not a device
>>> function.  I'll have to think about how to fix that -- I may send you
>>> a patch.
>>>
>>> Regards,
>>> -Justin
>>>
>>> [1] https://reviews.llvm.org/D24300
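A tiny repro of the constant-folding question above (essentially what the godbolt link examines; illustrative only).  Whether each function reduces to a constant 'false' depends on the compiler version and standard library in use:

    // Build with, e.g.:
    //   clang++ -O2 -ffast-math -ffinite-math-only -S -emit-llvm repro.cpp
    // and inspect whether the bodies below fold to 'return false'.
    #include <cmath>

    bool f_isinf(float x) { return std::isinf(x); }
    bool f_isnan(float x) { return std::isnan(x); }
    bool b_isinf(float x) { return __builtin_isinff(x); }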
>>>
>>> On Sat, Oct 8, 2016 at 3:36 PM, Hal Finkel <hfin...@anl.gov> wrote:
>>> > Hi Justin,
>>> >
>>> > This is neat!
>>> >
>>> > I see a bunch of uses of std::isinf, etc. here.  It tends to be
>>> > important that, when using -ffast-math (or -ffinite-math-only)
>>> > these checks get optimized away.  Can you please check that they
>>> > do?  If not, you might mirror what I've done in r283051 for libc++,
>>> > which is similar to what libstdc++ ends up doing, so that we use
>>> > __builtin_isnan/isinf/isfinite.
>>> >
>>> > Thanks again,
>>> > Hal
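To make the request concrete, a minimal example of the kind of check in question (illustrative, not from the patch): when the classification is routed through the compiler builtins, as r283051 does, the optimizer is permitted to drop the branch entirely under -ffinite-math-only, which is the behavior being asked about.

    // Sketch only.  Under -ffast-math / -ffinite-math-only the compiler may
    // assume the checks below are false and reduce this to 'return x * s;'.
    float safe_scale(float x, float s) {
      if (__builtin_isnan(x) || __builtin_isinf(x))
        return 0.0f;
      return x * s;
    }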
>>> >
>>> > ----- Original Message -----
>>> >> From: "Justin Lebar via cfe-commits" <cfe-commits@lists.llvm.org>
>>> >> To: cfe-commits@lists.llvm.org
>>> >> Sent: Saturday, October 8, 2016 5:16:13 PM
>>> >> Subject: r283680 - [CUDA] Support <complex> and std::min/max on the device.
>>> >>
>>> >> Author: jlebar
>>> >> Date: Sat Oct 8 17:16:12 2016
>>> >> New Revision: 283680
>>> >>
>>> >> URL: http://llvm.org/viewvc/llvm-project?rev=283680&view=rev
>>> >> Log:
>>> >> [CUDA] Support <complex> and std::min/max on the device.
>>> >>
>>> >> Summary:
>>> >> We do this by wrapping <complex> and <algorithm>.
>>> >>
>>> >> Tests are in the test-suite.
>>> >>
>>> >> Reviewers: tra
>>> >>
>>> >> Subscribers: jhen, beanz, cfe-commits, mgorny
>>> >>
>>> >> Differential Revision: https://reviews.llvm.org/D24979
>>> >>
>>> >> Added:
>>> >>     cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h
>>> >>     cfe/trunk/lib/Headers/cuda_wrappers/
>>> >>     cfe/trunk/lib/Headers/cuda_wrappers/algorithm
>>> >>     cfe/trunk/lib/Headers/cuda_wrappers/complex
>>> >> Modified:
>>> >>     cfe/trunk/lib/Driver/ToolChains.cpp
>>> >>     cfe/trunk/lib/Headers/CMakeLists.txt
>>> >>     cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
>>> >>
>>> >> Modified: cfe/trunk/lib/Driver/ToolChains.cpp
>>> >> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChains.cpp?rev=283680&r1=283679&r2=283680&view=diff
>>> >> ==============================================================================
>>> >> --- cfe/trunk/lib/Driver/ToolChains.cpp (original)
>>> >> +++ cfe/trunk/lib/Driver/ToolChains.cpp Sat Oct 8 17:16:12 2016
>>> >> @@ -4694,6 +4694,15 @@ void Linux::AddClangCXXStdlibIncludeArgs
>>> >>
>>> >>  void Linux::AddCudaIncludeArgs(const ArgList &DriverArgs,
>>> >>                                 ArgStringList &CC1Args) const {
>>> >> +  if (!DriverArgs.hasArg(options::OPT_nobuiltininc)) {
>>> >> +    // Add cuda_wrappers/* to our system include path.  This lets us wrap
>>> >> +    // standard library headers.
>>> >> +    SmallString<128> P(getDriver().ResourceDir);
>>> >> +    llvm::sys::path::append(P, "include");
>>> >> +    llvm::sys::path::append(P, "cuda_wrappers");
>>> >> +    addSystemInclude(DriverArgs, CC1Args, P);
>>> >> +  }
>>> >> +
>>> >>    if (DriverArgs.hasArg(options::OPT_nocudainc))
>>> >>      return;
>>> >>
>>> >>
>>> >> Modified: cfe/trunk/lib/Headers/CMakeLists.txt
>>> >> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/CMakeLists.txt?rev=283680&r1=283679&r2=283680&view=diff
>>> >> ==============================================================================
>>> >> --- cfe/trunk/lib/Headers/CMakeLists.txt (original)
>>> >> +++ cfe/trunk/lib/Headers/CMakeLists.txt Sat Oct 8 17:16:12 2016
>>> >> @@ -24,10 +24,13 @@ set(files
>>> >>    bmiintrin.h
>>> >>    __clang_cuda_builtin_vars.h
>>> >>    __clang_cuda_cmath.h
>>> >> +  __clang_cuda_complex_builtins.h
>>> >>    __clang_cuda_intrinsics.h
>>> >>    __clang_cuda_math_forward_declares.h
>>> >>    __clang_cuda_runtime_wrapper.h
>>> >>    cpuid.h
>>> >> +  cuda_wrappers/algorithm
>>> >> +  cuda_wrappers/complex
>>> >>    clflushoptintrin.h
>>> >>    emmintrin.h
>>> >>    f16cintrin.h
>>> >>
>>> >> Added: cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h
>>> >> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h?rev=283680&view=auto
>>> >> ==============================================================================
>>> >> --- cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h (added)
>>> >> +++ cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h Sat Oct 8 17:16:12 2016
>>> >> @@ -0,0 +1,203 @@
>>> >> +/*===-- __clang_cuda_complex_builtins - CUDA impls of runtime complex fns ---===
>>> >> + *
>>> >> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>>> >> + * of this software and associated documentation files (the "Software"), to deal
>>> >> + * in the Software without restriction, including without limitation the rights
>>> >> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>>> >> + * copies of the Software, and to permit persons to whom the Software is
>>> >> + * furnished to do so, subject to the following conditions:
>>> >> + *
>>> >> + * The above copyright notice and this permission notice shall be included in
>>> >> + * all copies or substantial portions of the Software.
>>> >> + *
>>> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>>> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>>> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>>> >> + * THE SOFTWARE.
>>> >> + *
>>> >> + *===-----------------------------------------------------------------------===
>>> >> + */
>>> >> +
>>> >> +#ifndef __CLANG_CUDA_COMPLEX_BUILTINS
>>> >> +#define __CLANG_CUDA_COMPLEX_BUILTINS
>>> >> +
>>> >> +// This header defines __muldc3, __mulsc3, __divdc3, and __divsc3.  These are
>>> >> +// libgcc functions that clang assumes are available when compiling c99 complex
>>> >> +// operations.  (These implementations come from libc++, and have been modified
>>> >> +// to work with CUDA.)
>>> >> +
>>> >> +extern "C" inline __device__ double _Complex __muldc3(double __a, double __b,
>>> >> +                                                      double __c, double __d) {
>>> >> +  double __ac = __a * __c;
>>> >> +  double __bd = __b * __d;
>>> >> +  double __ad = __a * __d;
>>> >> +  double __bc = __b * __c;
>>> >> +  double _Complex z;
>>> >> +  __real__(z) = __ac - __bd;
>>> >> +  __imag__(z) = __ad + __bc;
>>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>>> >> +    int __recalc = 0;
>>> >> +    if (std::isinf(__a) || std::isinf(__b)) {
>>> >> +      __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
>>> >> +      __b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
>>> >> +      if (std::isnan(__c))
>>> >> +        __c = std::copysign(0, __c);
>>> >> +      if (std::isnan(__d))
>>> >> +        __d = std::copysign(0, __d);
>>> >> +      __recalc = 1;
>>> >> +    }
>>> >> +    if (std::isinf(__c) || std::isinf(__d)) {
>>> >> +      __c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
>>> >> +      __d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
>>> >> +      if (std::isnan(__a))
>>> >> +        __a = std::copysign(0, __a);
>>> >> +      if (std::isnan(__b))
>>> >> +        __b = std::copysign(0, __b);
>>> >> +      __recalc = 1;
>>> >> +    }
>>> >> +    if (!__recalc && (std::isinf(__ac) || std::isinf(__bd) ||
>>> >> +                      std::isinf(__ad) || std::isinf(__bc))) {
>>> >> +      if (std::isnan(__a))
>>> >> +        __a = std::copysign(0, __a);
>>> >> +      if (std::isnan(__b))
>>> >> +        __b = std::copysign(0, __b);
>>> >> +      if (std::isnan(__c))
>>> >> +        __c = std::copysign(0, __c);
>>> >> +      if (std::isnan(__d))
>>> >> +        __d = std::copysign(0, __d);
>>> >> +      __recalc = 1;
>>> >> +    }
>>> >> +    if (__recalc) {
>>> >> +      // Can't use std::numeric_limits<double>::infinity() -- that doesn't have
>>> >> +      // a device overload (and isn't constexpr before C++11, naturally).
>>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c - __b * __d);
>>> >> +      __imag__(z) = __builtin_huge_valf() * (__a * __d + __b * __c);
>>> >> +    }
>>> >> +  }
>>> >> +  return z;
>>> >> +}
>>> >> +
>>> >> +extern "C" inline __device__ float _Complex __mulsc3(float __a, float __b,
>>> >> +                                                     float __c, float __d) {
>>> >> +  float __ac = __a * __c;
>>> >> +  float __bd = __b * __d;
>>> >> +  float __ad = __a * __d;
>>> >> +  float __bc = __b * __c;
>>> >> +  float _Complex z;
>>> >> +  __real__(z) = __ac - __bd;
>>> >> +  __imag__(z) = __ad + __bc;
>>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>>> >> +    int __recalc = 0;
>>> >> +    if (std::isinf(__a) || std::isinf(__b)) {
>>> >> +      __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
>>> >> +      __b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
>>> >> +      if (std::isnan(__c))
>>> >> +        __c = std::copysign(0, __c);
>>> >> +      if (std::isnan(__d))
>>> >> +        __d = std::copysign(0, __d);
>>> >> +      __recalc = 1;
>>> >> +    }
>>> >> +    if (std::isinf(__c) || std::isinf(__d)) {
>>> >> +      __c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
>>> >> +      __d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
>>> >> +      if (std::isnan(__a))
>>> >> +        __a = std::copysign(0, __a);
>>> >> +      if (std::isnan(__b))
>>> >> +        __b = std::copysign(0, __b);
>>> >> +      __recalc = 1;
>>> >> +    }
>>> >> +    if (!__recalc && (std::isinf(__ac) || std::isinf(__bd) ||
>>> >> +                      std::isinf(__ad) || std::isinf(__bc))) {
>>> >> +      if (std::isnan(__a))
>>> >> +        __a = std::copysign(0, __a);
>>> >> +      if (std::isnan(__b))
>>> >> +        __b = std::copysign(0, __b);
>>> >> +      if (std::isnan(__c))
>>> >> +        __c = std::copysign(0, __c);
>>> >> +      if (std::isnan(__d))
>>> >> +        __d = std::copysign(0, __d);
>>> >> +      __recalc = 1;
>>> >> +    }
>>> >> +    if (__recalc) {
>>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c - __b * __d);
>>> >> +      __imag__(z) = __builtin_huge_valf() * (__a * __d + __b * __c);
>>> >> +    }
>>> >> +  }
>>> >> +  return z;
>>> >> +}
>>> >> +
>>> >> +extern "C" inline __device__ double _Complex __divdc3(double __a, double __b,
>>> >> +                                                      double __c, double __d) {
>>> >> +  int __ilogbw = 0;
>>> >> +  // Can't use std::max, because that's defined in <algorithm>, and we don't
>>> >> +  // want to pull that in for every compile.  The CUDA headers define
>>> >> +  // ::max(float, float) and ::max(double, double), which is sufficient for us.
>>> >> +  double __logbw = std::logb(max(std::abs(__c), std::abs(__d)));
>>> >> +  if (std::isfinite(__logbw)) {
>>> >> +    __ilogbw = (int)__logbw;
>>> >> +    __c = std::scalbn(__c, -__ilogbw);
>>> >> +    __d = std::scalbn(__d, -__ilogbw);
>>> >> +  }
>>> >> +  double __denom = __c * __c + __d * __d;
>>> >> +  double _Complex z;
>>> >> +  __real__(z) = std::scalbn((__a * __c + __b * __d) / __denom, -__ilogbw);
>>> >> +  __imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom, -__ilogbw);
>>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>>> >> +    if ((__denom == 0.0) && (!std::isnan(__a) || !std::isnan(__b))) {
>>> >> +      __real__(z) = std::copysign(__builtin_huge_valf(), __c) * __a;
>>> >> +      __imag__(z) = std::copysign(__builtin_huge_valf(), __c) * __b;
>>> >> +    } else if ((std::isinf(__a) || std::isinf(__b)) && std::isfinite(__c) &&
>>> >> +               std::isfinite(__d)) {
>>> >> +      __a = std::copysign(std::isinf(__a) ? 1.0 : 0.0, __a);
>>> >> +      __b = std::copysign(std::isinf(__b) ? 1.0 : 0.0, __b);
>>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c + __b * __d);
>>> >> +      __imag__(z) = __builtin_huge_valf() * (__b * __c - __a * __d);
>>> >> +    } else if (std::isinf(__logbw) && __logbw > 0.0 && std::isfinite(__a) &&
>>> >> +               std::isfinite(__b)) {
>>> >> +      __c = std::copysign(std::isinf(__c) ? 1.0 : 0.0, __c);
>>> >> +      __d = std::copysign(std::isinf(__d) ? 1.0 : 0.0, __d);
>>> >> +      __real__(z) = 0.0 * (__a * __c + __b * __d);
>>> >> +      __imag__(z) = 0.0 * (__b * __c - __a * __d);
>>> >> +    }
>>> >> +  }
>>> >> +  return z;
>>> >> +}
>>> >> +
>>> >> +extern "C" inline __device__ float _Complex __divsc3(float __a, float __b,
>>> >> +                                                     float __c, float __d) {
>>> >> +  int __ilogbw = 0;
>>> >> +  float __logbw = std::logb(max(std::abs(__c), std::abs(__d)));
>>> >> +  if (std::isfinite(__logbw)) {
>>> >> +    __ilogbw = (int)__logbw;
>>> >> +    __c = std::scalbn(__c, -__ilogbw);
>>> >> +    __d = std::scalbn(__d, -__ilogbw);
>>> >> +  }
>>> >> +  float __denom = __c * __c + __d * __d;
>>> >> +  float _Complex z;
>>> >> +  __real__(z) = std::scalbn((__a * __c + __b * __d) / __denom, -__ilogbw);
>>> >> +  __imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom, -__ilogbw);
>>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>>> >> +    if ((__denom == 0) && (!std::isnan(__a) || !std::isnan(__b))) {
>>> >> +      __real__(z) = std::copysign(__builtin_huge_valf(), __c) * __a;
>>> >> +      __imag__(z) = std::copysign(__builtin_huge_valf(), __c) * __b;
>>> >> +    } else if ((std::isinf(__a) || std::isinf(__b)) && std::isfinite(__c) &&
>>> >> +               std::isfinite(__d)) {
>>> >> +      __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
>>> >> +      __b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
>>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c + __b * __d);
>>> >> +      __imag__(z) = __builtin_huge_valf() * (__b * __c - __a * __d);
>>> >> +    } else if (std::isinf(__logbw) && __logbw > 0 && std::isfinite(__a) &&
>>> >> +               std::isfinite(__b)) {
>>> >> +      __c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
>>> >> +      __d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
>>> >> +      __real__(z) = 0 * (__a * __c + __b * __d);
>>> >> +      __imag__(z) = 0 * (__b * __c - __a * __d);
>>> >> +    }
>>> >> +  }
>>> >> +  return z;
>>> >> +}
>>> >> +
>>> >> +#endif // __CLANG_CUDA_COMPLEX_BUILTINS
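For context (illustrative, not part of the patch): these entry points are what clang's codegen reaches for when lowering C99 _Complex arithmetic, which is why device-side definitions are needed at all.

    // Illustrative only: a complex multiply and divide of this kind may be
    // lowered to calls to __mulsc3 (on the multiply's NaN slow path) and
    // __divsc3, depending on the compiler's complex-lowering mode.
    extern "C" float _Complex mul_then_div(float _Complex x, float _Complex y) {
      return (x * y) / y;
    }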
>>> >>
>>> >> Modified: cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
>>> >> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h?rev=283680&r1=283679&r2=283680&view=diff
>>> >> ==============================================================================
>>> >> --- cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h (original)
>>> >> +++ cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h Sat Oct 8 17:16:12 2016
>>> >> @@ -312,6 +312,7 @@ __device__ inline __cuda_builtin_gridDim
>>> >>
>>> >>  #include <__clang_cuda_cmath.h>
>>> >>  #include <__clang_cuda_intrinsics.h>
>>> >> +#include <__clang_cuda_complex_builtins.h>
>>> >>
>>> >>  // curand_mtgp32_kernel helpfully redeclares blockDim and threadIdx in host
>>> >>  // mode, giving them their "proper" types of dim3 and uint3.  This is
>>> >>
>>> >> Added: cfe/trunk/lib/Headers/cuda_wrappers/algorithm
>>> >> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/cuda_wrappers/algorithm?rev=283680&view=auto
>>> >> ==============================================================================
>>> >> --- cfe/trunk/lib/Headers/cuda_wrappers/algorithm (added)
>>> >> +++ cfe/trunk/lib/Headers/cuda_wrappers/algorithm Sat Oct 8 17:16:12 2016
>>> >> @@ -0,0 +1,96 @@
>>> >> +/*===---- complex - CUDA wrapper for <algorithm> ----------------------------===
>>> >> + *
>>> >> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>>> >> + * of this software and associated documentation files (the "Software"), to deal
>>> >> + * in the Software without restriction, including without limitation the rights
>>> >> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>>> >> + * copies of the Software, and to permit persons to whom the Software is
>>> >> + * furnished to do so, subject to the following conditions:
>>> >> + *
>>> >> + * The above copyright notice and this permission notice shall be included in
>>> >> + * all copies or substantial portions of the Software.
>>> >> + *
>>> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>>> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>>> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>>> >> + * THE SOFTWARE.
>>> >> + *
>>> >> + *===-----------------------------------------------------------------------===
>>> >> + */
>>> >> +
>>> >> +#ifndef __CLANG_CUDA_WRAPPERS_ALGORITHM
>>> >> +#define __CLANG_CUDA_WRAPPERS_ALGORITHM
>>> >> +
>>> >> +// This header defines __device__ overloads of std::min/max, but only if we're
>>> >> +// <= C++11.  In C++14, these functions are constexpr, and so are implicitly
>>> >> +// __host__ __device__.
>>> >> +//
>>> >> +// We don't support the initializer_list overloads because
>>> >> +// initializer_list::begin() and end() are not __host__ __device__ functions.
>>> >> +//
>>> >> +// When compiling in C++14 mode, we could force std::min/max to have different
>>> >> +// implementations for host and device, by declaring the device overloads
>>> >> +// before the constexpr overloads appear.  We choose not to do this because
>>> >> +
>>> >> +//  a) why write our own implementation when we can use one from the standard
>>> >> +//     library? and
>>> >> +//  b) libstdc++ is evil and declares min/max inside a header that is included
>>> >> +//     *before* we include <algorithm>.  So we'd have to unconditionally
>>> >> +//     declare our __device__ overloads of min/max, but that would pollute
>>> >> +//     things for people who choose not to include <algorithm>.
>>> >> +
>>> >> +#include_next <algorithm>
>>> >> +
>>> >> +#if __cplusplus <= 201103L
>>> >> +
>>> >> +// We need to define these overloads in exactly the namespace our standard
>>> >> +// library uses (including the right inline namespace), otherwise they won't be
>>> >> +// picked up by other functions in the standard library (e.g. functions in
>>> >> +// <complex>).  Thus the ugliness below.
>>> >> +#ifdef _LIBCPP_BEGIN_NAMESPACE_STD
>>> >> +_LIBCPP_BEGIN_NAMESPACE_STD
>>> >> +#else
>>> >> +namespace std {
>>> >> +#ifdef _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> >> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> >> +#endif
>>> >> +#endif
>>> >> +
>>> >> +template <class __T, class __Cmp>
>>> >> +inline __device__ const __T &
>>> >> +max(const __T &__a, const __T &__b, __Cmp __cmp) {
>>> >> +  return __cmp(__a, __b) ? __b : __a;
>>> >> +}
>>> >> +
>>> >> +template <class __T>
>>> >> +inline __device__ const __T &
>>> >> +max(const __T &__a, const __T &__b) {
>>> >> +  return __a < __b ? __b : __a;
>>> >> +}
>>> >> +
>>> >> +template <class __T, class __Cmp>
>>> >> +inline __device__ const __T &
>>> >> +min(const __T &__a, const __T &__b, __Cmp __cmp) {
>>> >> +  return __cmp(__b, __a) ? __b : __a;
>>> >> +}
>>> >> +
>>> >> +template <class __T>
>>> >> +inline __device__ const __T &
>>> >> +min(const __T &__a, const __T &__b) {
>>> >> +  return __a < __b ? __b : __a;
>>> >> +}
>>> >> +
>>> >> +#ifdef _LIBCPP_END_NAMESPACE_STD
>>> >> +_LIBCPP_END_NAMESPACE_STD
>>> >> +#else
>>> >> +#ifdef _GLIBCXX_BEGIN_NAMESPACE_VERSION
>>> >> +_GLIBCXX_END_NAMESPACE_VERSION
>>> >> +#endif
>>> >> +} // namespace std
>>> >> +#endif
>>> >> +
>>> >> +#endif // __cplusplus <= 201103L
>>> >> +#endif // __CLANG_CUDA_WRAPPERS_ALGORITHM
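A quick illustration of what this wrapper enables (hypothetical example, not from the patch): in C++11 mode, device code can now call std::min/std::max directly.

    // Hypothetical usage; assumes CUDA compilation with these wrappers on the
    // include path.
    #include <algorithm>

    __global__ void clamp_kernel(float *v, int n, float lo, float hi) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
        v[i] = std::min(std::max(v[i], lo), hi);  // resolves to the __device__ overloads
    }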
>>> >>
>>> >> Added: cfe/trunk/lib/Headers/cuda_wrappers/complex
>>> >> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/cuda_wrappers/complex?rev=283680&view=auto
>>> >> ==============================================================================
>>> >> --- cfe/trunk/lib/Headers/cuda_wrappers/complex (added)
>>> >> +++ cfe/trunk/lib/Headers/cuda_wrappers/complex Sat Oct 8 17:16:12 2016
>>> >> @@ -0,0 +1,79 @@
>>> >> +/*===---- complex - CUDA wrapper for <complex> ------------------------------===
>>> >> + *
>>> >> + * Permission is hereby granted, free of charge, to any person obtaining a copy
>>> >> + * of this software and associated documentation files (the "Software"), to deal
>>> >> + * in the Software without restriction, including without limitation the rights
>>> >> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>>> >> + * copies of the Software, and to permit persons to whom the Software is
>>> >> + * furnished to do so, subject to the following conditions:
>>> >> + *
>>> >> + * The above copyright notice and this permission notice shall be included in
>>> >> + * all copies or substantial portions of the Software.
>>> >> + *
>>> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
>>> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>>> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>>> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>>> >> + * THE SOFTWARE.
>>> >> + *
>>> >> + *===-----------------------------------------------------------------------===
>>> >> + */
>>> >> +
>>> >> +#pragma once
>>> >> +
>>> >> +// Wrapper around <complex> that forces its functions to be __host__
>>> >> +// __device__.
>>> >> +
>>> >> +// First, include host-only headers we think are likely to be included by
>>> >> +// <complex>, so that the pragma below only applies to <complex> itself.
>>> >> +#if __cplusplus >= 201103L
>>> >> +#include <type_traits>
>>> >> +#endif
>>> >> +#include <stdexcept>
>>> >> +#include <cmath>
>>> >> +#include <sstream>
>>> >> +
>>> >> +// Next, include our <algorithm> wrapper, to ensure that device overloads of
>>> >> +// std::min/max are available.
>>> >> +#include <algorithm>
>>> >> +
>>> >> +#pragma clang force_cuda_host_device begin
>>> >> +
>>> >> +// When compiling for device, ask libstdc++ to use its own implements of
>>> >> +// complex functions, rather than calling builtins (which resolve to library
>>> >> +// functions that don't exist when compiling CUDA device code).
>>> >> +//
>>> >> +// This is a little dicey, because it causes libstdc++ to define a different
>>> >> +// set of overloads on host and device.
>>> >> +//
>>> >> +//   // Present only when compiling for host.
>>> >> +//   __host__ __device__ void complex<float> sin(const complex<float>& x) {
>>> >> +//     return __builtin_csinf(x);
>>> >> +//   }
>>> >> +//
>>> >> +//   // Present when compiling for host and for device.
>>> >> +//   template <typename T>
>>> >> +//   void __host__ __device__ complex<T> sin(const complex<T>& x) {
>>> >> +//     return complex<T>(sin(x.real()) * cosh(x.imag()),
>>> >> +//                       cos(x.real()), sinh(x.imag()));
>>> >> +//   }
>>> >> +//
>>> >> +// This is safe because when compiling for device, all function calls in
>>> >> +// __host__ code to sin() will still resolve to *something*, even if they don't
>>> >> +// resolve to the same function as they resolve to when compiling for host.  We
>>> >> +// don't care that they don't resolve to the right function because we won't
>>> >> +// codegen this host code when compiling for device.
>>> >> +
>>> >> +#pragma push_macro("_GLIBCXX_USE_C99_COMPLEX")
>>> >> +#pragma push_macro("_GLIBCXX_USE_C99_COMPLEX_TR1")
>>> >> +#define _GLIBCXX_USE_C99_COMPLEX 0
>>> >> +#define _GLIBCXX_USE_C99_COMPLEX_TR1 0
>>> >> +
>>> >> +#include_next <complex>
>>> >> +
>>> >> +#pragma pop_macro("_GLIBCXX_USE_C99_COMPLEX_TR1")
>>> >> +#pragma pop_macro("_GLIBCXX_USE_C99_COMPLEX")
>>> >> +
>>> >> +#pragma clang force_cuda_host_device end
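And a similarly hypothetical sketch of what the <complex> wrapper plus __clang_cuda_complex_builtins.h makes possible in device code (not part of the commit):

    // Hypothetical usage; assumes the cuda_wrappers headers are on the include path.
    #include <complex>

    __global__ void rotate(std::complex<float> *data, int n, float c, float s) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
        data[i] *= std::complex<float>(c, s);  // complex multiply; may reach __mulsc3
    }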
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> cfe-commits mailing list
>>> >> cfe-commits@lists.llvm.org
>>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>>> >>
>>> >
>>> > --
>>> > Hal Finkel
>>> > Lead, Compiler Technology and Programming Languages
>>> > Leadership Computing Facility
>>> > Argonne National Laboratory
>>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits