[clang] [Clang] __has_builtin should return false for aux triple builtins (PR #121839)

2025-01-27 Thread Artem Belevich via cfe-commits
Artem-B wrote: I think the conceptual problem here is that `__has_builtin()` conflates "builtin exists" with "builtin is usable on the target". For C++ they are the same. For CUDA they are not. Builtins in the aux-triple do exist (as in, the compiler does see them when it constructs the AST), but we can't
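A minimal CUDA sketch of the failure mode under discussion (hypothetical code, not from the patch; `__builtin_ia32_pause` stands in for any host-only builtin):

```
// Before the fix, device-side CUDA compilation could report true for
// host (aux-triple) builtins: they exist in the AST, so the guard passes,
// but the guarded call cannot be code-generated for NVPTX.
__device__ void spin_hint() {
#if __has_builtin(__builtin_ia32_pause)
  __builtin_ia32_pause(); // x86 builtin: visible in the AST, unusable here
#endif
}
```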

[clang] [StrTable] Mechanically convert NVPTX builtins to use TableGen (PR #122873)

2025-01-27 Thread Artem Belevich via cfe-commits
@@ -104,9 +104,39 @@ class PrototypeParser { void ParseType(StringRef T) { T = T.trim(); + +auto ConsumeAddrSpace = [&]() -> std::optional<unsigned> { + T = T.trim(); + if (!T.consume_back(">")) +return std::nullopt; + + auto Open = T.find_last_of('<');

[clang] [StrTable] Mechanically convert NVPTX builtins to use TableGen (PR #122873)

2025-01-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM overall with a couple of nits. I like the direction of the change. Tablegen is probably a better way to handle NVPTX quirks than a preprocessor. We may finally be able to replace the string-based constraints that are growing a bit to

[clang] [StrTable] Mechanically convert NVPTX builtins to use TableGen (PR #122873)

2025-01-27 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,1078 @@ +//===--- BuiltinsNVPTX.td - NVPTX Builtin function defs -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: A

[clang] [StrTable] Mechanically convert NVPTX builtins to use TableGen (PR #122873)

2025-01-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/122873

[clang] Revert "[Clang] __has_builtin should return false for aux triple builtins (#121839)" (PR #124626)

2025-01-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/124626

[clang] [CUDA] Make target intrinsics work with ptx 8.7 (PR #124818)

2025-01-28 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B created https://github.com/llvm/llvm-project/pull/124818 Fixes build break with CUDA-12.8 introduced in #123398 From ebb322550e72a4c7c58d990340701248d6f61c95 Mon Sep 17 00:00:00 2001 From: Artem Belevich Date: Tue, 28 Jan 2025 10:45:16 -0800 Subject: [PATCH] [CUDA]

[clang] [clang][X86] Support __attribute__((model("small"/"large"))) (PR #124834)

2025-01-28 Thread Artem Belevich via cfe-commits
@@ -5,7 +5,7 @@ // RUN: %clang_cc1 -triple riscv64 -verify=expected,riscv64 -fsyntax-only %s // RUN: %clang_cc1 -triple x86_64 -verify=expected,x86_64 -fsyntax-only %s Artem-B wrote: It would be great to check nvptx64 as well -- we want to make sure this does

[clang] Add clang atomic control options and attribute (PR #114841)

2025-01-28 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,19 @@ +//===--- AtomicOptions.def - Atomic Options database -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Ap

[clang] [StrTable] Mechanically convert NVPTX builtins to use TableGen (PR #122873)

2025-01-28 Thread Artem Belevich via cfe-commits
@@ -104,9 +104,39 @@ class PrototypeParser { void ParseType(StringRef T) { T = T.trim(); + +auto ConsumeAddrSpace = [&]() -> std::optional<unsigned> { + T = T.trim(); + if (!T.consume_back(">")) +return std::nullopt; + + auto Open = T.find_last_of('<');

[clang] [StrTable] Mechanically convert NVPTX builtins to use TableGen (PR #122873)

2025-01-28 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,1078 @@ +//===--- BuiltinsNVPTX.td - NVPTX Builtin function defs -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: A

[clang] [CUDA] Make target intrinsics work with ptx 8.7 (PR #124818)

2025-01-28 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/124818

[clang] [clang][X86] Support __attribute__((model("small"/"large"))) (PR #124834)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -1,64 +1,40 @@ -// RUN: %clang_cc1 -triple aarch64 -verify=expected,aarch64 -fsyntax-only %s +// RUN: %clang_cc1 -triple aarch64 -verify=expected,unsupported -fsyntax-only %s // RUN: %clang_cc1 -triple loongarch64 -verify=expected,loongarch64 -fsyntax-only %s -// RUN: %clang

[clang] Add clang atomic control options and attribute (PR #114841)

2025-01-29 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/114841

[clang] Add clang atomic control options and attribute (PR #114841)

2025-01-29 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B commented: Mostly drive-by comments. I don't have a strong opinion on either the attributes themselves or how the values get parsed. One thing that's missing is the documentation for the attributes. Their format and meaning are far from obvious and should be documented

[clang] Add clang atomic control options and attribute (PR #114841)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -305,6 +305,13 @@ def err_drv_invalid_int_value : Error<"invalid integral value '%1' in '%0'">; def err_drv_invalid_value_with_suggestion : Error< "invalid value '%1' in '%0', expected one of: %2">; def err_drv_alignment_not_power_of_two : Error<"alignment is not a powe

[clang] Add clang atomic control options and attribute (PR #114841)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -240,3 +240,49 @@ LLVM_DUMP_METHOD void FPOptionsOverride::dump() { #include "clang/Basic/FPOptions.def" llvm::errs() << "\n"; } + +AtomicOptionsOverride +AtomicOptions::getChangesSlow(const AtomicOptions &Base) const { + AtomicOptions::storage_type OverrideMask = 0; +#de

[clang] Add clang atomic control options and attribute (PR #114841)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -2355,6 +2355,14 @@ def fsymbol_partition_EQ : Joined<["-"], "fsymbol-partition=">, Group<f_Group>, Visibility<[ClangOption, CC1Option]>, MarshallingInfoString<CodeGenOpts<"SymbolPartition">>; +def fatomic_EQ : CommaJoined<["-"], "fatomic=">, Group<f_Group>, + Visibility<[ClangOption, CC1Option]>, + HelpText<"Speci

[clang] [clang][X86] Support __attribute__((model("small"/"large"))) (PR #124834)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -5,7 +5,7 @@ // RUN: %clang_cc1 -triple riscv64 -verify=expected,riscv64 -fsyntax-only %s // RUN: %clang_cc1 -triple x86_64 -verify=expected,x86_64 -fsyntax-only %s Artem-B wrote: You could do it here, too, as the test does not actually need extra CUDA SDK

[clang] [clang][X86] Support __attribute__((model("small"/"large"))) (PR #124834)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -1,64 +1,40 @@ -// RUN: %clang_cc1 -triple aarch64 -verify=expected,aarch64 -fsyntax-only %s +// RUN: %clang_cc1 -triple aarch64 -verify=expected,unsupported -fsyntax-only %s // RUN: %clang_cc1 -triple loongarch64 -verify=expected,loongarch64 -fsyntax-only %s -// RUN: %clang

[clang] Add clang atomic control options and attribute (PR #114841)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,19 @@ +//===--- AtomicOptions.def - Atomic Options database -*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Ap

[clang] [clang][X86] Support __attribute__((model("small"/"large"))) (PR #124834)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -1,64 +1,40 @@ -// RUN: %clang_cc1 -triple aarch64 -verify=expected,aarch64 -fsyntax-only %s +// RUN: %clang_cc1 -triple aarch64 -verify=expected,unsupported -fsyntax-only %s // RUN: %clang_cc1 -triple loongarch64 -verify=expected,loongarch64 -fsyntax-only %s -// RUN: %clang

[clang] [clang][X86] Support __attribute__((model("small"/"large"))) (PR #124834)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -5,7 +5,7 @@ // RUN: %clang_cc1 -triple riscv64 -verify=expected,riscv64 -fsyntax-only %s // RUN: %clang_cc1 -triple x86_64 -verify=expected,x86_64 -fsyntax-only %s Artem-B wrote: nvptx does not *use* the code models, but CUDA compilation will *see* those a
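To illustrate that point, a hypothetical test-style snippet (not from the patch): in CUDA mode the same translation unit is parsed once per triple, so an attribute aimed at the x86_64 host is also seen by the nvptx64 device pass and must be either tolerated or cleanly diagnosed there.

```
// Hypothetical example. Both the host (x86_64) and device (nvptx64) parses
// see this declaration, even though nvptx itself does not use code models.
int __attribute__((model("large"))) big_global;
```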

[clang] Reland "[HIP] Use original file path for CUID" (#108771) (PR #111885)

2025-01-29 Thread Artem Belevich via cfe-commits
@@ -1,13 +1,15 @@ // Check CUID generated by hash. // The same CUID is generated for the same file with the same options. +// RUN: cd %S + // RUN: %clang -### -x hip --target=x86_64-unknown-linux-gnu --no-offload-new-driver \ // RUN: --offload-arch=gfx906 -c -nogpuinc -no

[clang] [CUDA][HIP] improve error message for missing cmath (PR #122155)

2025-01-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/122155

[clang] [CUDA][HIP] Fix overriding of constexpr virtual function (PR #121986)

2025-01-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/121986

[clang] [CUDA][HIP] Fix overriding of constexpr virtual function (PR #121986)

2025-01-08 Thread Artem Belevich via cfe-commits
@@ -1309,6 +1309,13 @@ Sema::CheckOverload(Scope *S, FunctionDecl *New, const LookupResult &Old, return Ovl_Overload; } +template static bool hasExplicitAttr(const FunctionDecl *D) { + assert(D && "function delc should not be null"); Artem-B wrote: `delc

[clang] [CUDA][HIP] Fix overriding of constexpr virtual function (PR #121986)

2025-01-08 Thread Artem Belevich via cfe-commits
@@ -1595,8 +1606,21 @@ static bool IsOverloadOrOverrideImpl(Sema &SemaRef, FunctionDecl *New, // Allow overloading of functions with same signature and different CUDA // target attributes. -if (NewTarget != OldTarget) +if (NewTarget != OldTarg

[clang] [CUDA] Move CUDA to new driver by default (PR #122312)

2025-01-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: One thing that's missing is the release note. It should have an entry about the change, a pointer to the details, instructions on how to revert to the old driver, and possibly a set of instructions for common use cases. E.g. GPU-linking a library (i.e. linking RDC-compiled

[clang] [CUDA] Move CUDA to new driver by default (PR #122312)

2025-01-09 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/122312

[clang] [Clang] Add __has_target_builtin macro (PR #126324)

2025-02-14 Thread Artem Belevich via cfe-commits
Artem-B wrote: > why there's a __has_builtin that's different from __can_use_builtin (or > whatever we name it), and I don't know that any of us have an answer for that my $.02: IMO it's a side effect of heterogeneous compilation, where the compiler has to parse source code for multiple targets (an
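A sketch of how the split could look in user code, assuming the proposed macro lands with the usual feature-detection idiom (hypothetical usage, not from the patch):

```
__device__ int lane() {
#ifdef __has_target_builtin
#if __has_target_builtin(__nvvm_read_ptx_sreg_laneid)
  // Usable on the target currently being compiled, not merely visible.
  return __nvvm_read_ptx_sreg_laneid();
#endif
#endif
  return 0; // fallback when the builtin is unavailable on this target
}
```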

[clang] [CUDA] Add support for sm101 and sm120 target architectures (PR #127187)

2025-02-14 Thread Artem Belevich via cfe-commits
@@ -21,9 +21,17 @@ class SM<string version, list<SMFeatures> newer_list> : SMFeatures { !strconcat(f, "|", newer.Features)); } +let Features = "sm_120a" in def SM_120a : SMFeatures; + +def SM_120 : SM<"120", [SM_120a]>; + +let Features = "sm_101a" in def SM_101a : SMFeatures; + +def S

[clang] [CUDA] Add support for sm101 and sm120 target architectures (PR #127187)

2025-02-14 Thread Artem Belevich via cfe-commits
@@ -300,6 +306,10 @@ void NVPTXTargetInfo::getTargetDefines(const LangOptions &Opts, Builder.defineMacro("__CUDA_ARCH_FEAT_SM90_ALL", "1"); if (GPU == OffloadArch::SM_100a) Builder.defineMacro("__CUDA_ARCH_FEAT_SM100_ALL", "1"); +if (GPU == OffloadArch::SM_

[clang] [CUDA] Add support for sm101 target architecture (Tegra Blackwell) (PR #127187)

2025-02-14 Thread Artem Belevich via cfe-commits
@@ -21,6 +21,10 @@ class SM<string version, list<SMFeatures> newer_list> : SMFeatures { !strconcat(f, "|", newer.Features)); } +let Features = "sm_101a" in def SM_101a : SMFeatures; + +def SM_101 : SM<"101", [SM_101a]>; Artem-B wrote: This needs some changes. First,

[clang] [Clang] Add __has_target_builtin macro (PR #126324)

2025-02-11 Thread Artem Belevich via cfe-commits
@@ -96,6 +101,37 @@ the header file to conditionally make a function constexpr whenever the constant evaluation of the corresponding builtin (for example, ``std::fmax`` calls ``__builtin_fmax``) is supported in Clang. +``__has_target_builtin`` + +

[clang] [llvm] [NVPTX] Auto-Upgrade llvm.nvvm.atomic.load.{inc,dec}.32 (PR #134111)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/134111

[clang] [llvm] [PGO][Offload] Disable PGO on NVPTX (PR #133522)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/133522

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. Nice. Now we're missing the last two steps: - that ptxas accepts the inline asm instructions we generate - that those instructions actually do what they are intended to do. Can you manually verify that the test file actually compiles to a G

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-05 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/132883

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-25 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/132881

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-25 Thread Artem Belevich via cfe-commits
@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC { def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask), "redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;", - [(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-26 Thread Artem Belevich via cfe-commits
Artem-B wrote: @AustinSchuh would you like me to merge the change for you, once the checks are done? https://github.com/llvm/llvm-project/pull/132881

[clang] [Clang][ARM] Only try to redefine builtins for non-CUDA (PR #128222)

2025-03-26 Thread Artem Belevich via cfe-commits
@@ -27,6 +27,8 @@ extern "C" { #endif +#if !defined(__CUDA_ARCH__) + Artem-B wrote: @rnk Reid, would you happen to have an idea what's up with these builtins on Windows? https://github.com/llvm/llvm-project/pull/128222

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-26 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > @AustinSchuh would you like me to merge the change for you, once the checks > > are done? > > That would be wonderful. I don't know how to merge it (happy to learn, but I > suspect I won't do it more than a couple of times) https://llvm.org/docs/DeveloperPolicy.html#obtaini

[clang] [llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

2025-03-26 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B closed https://github.com/llvm/llvm-project/pull/132881

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-03-26 Thread Artem Belevich via cfe-commits
Artem-B wrote: > > LGTM in principle, but it could use some tests. The change is surprisingly > > nicely compact. Thank you for filling in one of the long-standing gaps in > > clang's cuda support story. > > I might need some hints on where to start. How would you go about testing > this, or

[clang] [Clang] Add 'Joseph Huber' as offloading driver maintainer (PR #133296)

2025-03-27 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/133296

[clang] cuda clang: Add support for CUDA surfaces (PR #132883)

2025-04-09 Thread Artem Belevich via cfe-commits
Artem-B wrote: @AustinSchuh One thing I've missed during review is that the test clang/test/CodeGen/nvptx-surface.cu should probably go into clang/test/CodeGenCUDA. This would also obviate the need for #134459. Can you send the patch to move the test to the right location? https://github.com

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B requested changes to this pull request. Hold on a sec. https://github.com/llvm/llvm-project/pull/134758

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
@@ -2,6 +2,170 @@ // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s #include "Inputs/cuda.h" +struct char1 { Artem-B wrote: These type declarations should go into Inputs/cuda.h https://github.com/llvm/

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
@@ -2,6 +2,170 @@ // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s #include "Inputs/cuda.h" +struct char1 { Artem-B wrote: Those are actually *useful* failures and expose real issues in those tests. -

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. Thank you! https://github.com/llvm/llvm-project/pull/134758

[clang] [Clang][AMDGPU] Accept builtins in lambda declarations (PR #135027)

2025-04-09 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,24 @@ +// REQUIRES: amdgpu-registered-target Artem-B wrote: I've just checked using the experimental target `csky-unknown-elf`, which is not enabled by default, and clang indeed errors out if we attempt to generate code, but works OK with `-fsyntax-only`.

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/134416

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM for the code. Tests could use a bit more polishing. https://github.com/llvm/llvm-project/pull/134416

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -39,186 +39,201 @@ #include "llvm/Transforms/Scalar.h" #include "llvm/Transforms/Utils/BasicBlockUtils.h" #include "llvm/Transforms/Utils/Local.h" -#include #define NVVM_REFLECT_FUNCTION "__nvvm_reflect" #define NVVM_REFLECT_OCL_FUNCTION "__nvvm_reflect_ocl" +// Argument

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,26 @@ +; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with Artem-B wrote: The test is functionally fine, but it also makes me stop and think "what exactly are we doing here and why?". Two points: -

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,26 @@ +; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with +; the appropriate command line values. + +declare i32 @__nvvm_reflect(ptr) +@ftz = private unnamed_addr addrspace(1) constant [11 x i8] c"__CUDA_FTZ\00" +@ar

[clang] [compiler-rt] [llvm] [PGO][Offload] Allow PGO flags to be used on GPU targets (PR #94268)

2025-03-28 Thread Artem Belevich via cfe-commits
Artem-B wrote: @jhuber6 @jdoerfert I propose reverting the change, unless it can be quickly fixed forward so it does not affect CUDA/NVPTX. https://github.com/llvm/llvm-project/pull/94268

[clang] [compiler-rt] [llvm] [PGO][Offload] Allow PGO flags to be used on GPU targets (PR #94268)

2025-03-28 Thread Artem Belevich via cfe-commits
Artem-B wrote: The crash is blocking our compiler updates. If nothing depends on this change yet, it would be great to revert the patch and re-land it once it's fixed. https://github.com/llvm/llvm-project/pull/94268

[clang] [llvm] [PGO][Offload] Disable PGO on NVPTX (PR #133522)

2025-03-28 Thread Artem Belevich via cfe-commits
@@ -6397,7 +6397,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA, Args.AddLastArg(CmdArgs, options::OPT_fconvergent_functions, options::OPT_fno_convergent_functions); - addPGOAndCoverageFlags(TC, C, JA, Output, Args, SanitizeArgs, CmdArg

[clang] [llvm] [PGO][Offload] Disable PGO on NVPTX (PR #133522)

2025-03-28 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM with a comment nit. https://github.com/llvm/llvm-project/pull/133522

[clang] [compiler-rt] [llvm] [PGO][Offload] Allow PGO flags to be used on GPU targets (PR #94268)

2025-03-28 Thread Artem Belevich via cfe-commits
Artem-B wrote: This is breaking CUDA/NVPTX. Enabling PGO results in the compiler generating PGO-related data that references itself, and NVPTX can't compile that. E.g. we see data like this which includes a reference to itself: ``` @__profd__ZN12cuda_helpers13memcmp_kernelEPjS0_mPb = protected g

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
@@ -2,6 +2,170 @@ // RUN: %clang_cc1 -triple nvptx64-unknown-unknown -fcuda-is-device -O3 -o - %s -emit-llvm | FileCheck %s #include "Inputs/cuda.h" +struct char1 { Artem-B wrote: See above. propagate-attributes.cu just needs to apply `extern "C"` to the fu

[clang] cuda clang: Move nvptx-surface.cu test to CodeGenCUDA (PR #134758)

2025-04-08 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/134758

[clang] [llvm] [clang][NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-11 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM in general, with an intrinsic naming nit. https://github.com/llvm/llvm-project/pull/134345

[clang] [llvm] [clang][NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-11 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/134345

[clang] [llvm] [clang][NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-11 Thread Artem Belevich via cfe-commits
@@ -703,6 +703,41 @@ let hasSideEffects = false in { defm CVT_to_tf32_rz_satf : CVT_TO_TF32<"rz.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rn_relu_satf : CVT_TO_TF32<"rn.relu.satfinite", [hasPTX<86>, hasSM<100>]>; defm CVT_to_tf32_rz_relu_satf : CVT_TO_TF

[clang] [llvm] [mlir] [NVPTX] Add support for Distributed Shared Memory address space. (PR #135444)

2025-04-11 Thread Artem Belevich via cfe-commits
Artem-B wrote: I wish PTX would be a bit more consistent about naming things. Documentation calls it distributed shared memory (and it is distributed, and is shared), but the PTX instructions, compiler builtins and intrinsics use shared::cluster (as opposed to regular shared AKA shared::cta).

[clang] Move CodeGen cuda.h to Inputs from include (PR #134706)

2025-04-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: Fixes test break introduced by #134459 https://github.com/llvm/llvm-project/pull/134706

[clang] Move CodeGen cuda.h to Inputs from include (PR #134706)

2025-04-07 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/134706

[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

2025-04-07 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B unassigned https://github.com/llvm/llvm-project/pull/134244

[clang] [llvm] [Clang][NVVM] Support `-f[no-]cuda-prec-sqrt` and propagate precision flag to `NVVMReflect` (PR #134244)

2025-04-07 Thread Artem Belevich via cfe-commits
Artem-B wrote: @AlexMaclean who authored #89417 and possibly other NVIDIA folks may have some thoughts on this. In general, making it per-function attribute makes sense on LLVM level. We will also need to reconcile it with the https://github.com/llvm/llvm-project/blob/10bef367a5643bc41d0172b0

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-17 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,35 @@ +// libstdc++ uses the non-constexpr function std::__glibcxx_assert_fail() +// to trigger compilation errors when the __glibcxx_assert(cond) macro +// is used in a constexpr context. +// Compilation fails when using code from the libstdc++ (such as std::array) on
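A hedged sketch of the approach named in the PR title: give the libstdc++ hook a `__device__` counterpart so that `__glibcxx_assert(cond)` has something to call from device code. The signature below mirrors current libstdc++; the actual patch may differ.

```
namespace std {
// Device-side stand-in for libstdc++'s assertion-failure hook. There is no
// stderr/abort machinery on the device, so trap instead of reporting.
__device__ __attribute__((noreturn)) inline void
__glibcxx_assert_fail(const char *file, int line, const char *function,
                      const char *condition) noexcept {
  __builtin_trap();
}
} // namespace std
```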

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-17 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/136133

[clang] [llvm] [mlir] [NVPTX] Add support for Shared Cluster Memory address space. (PR #135444)

2025-04-16 Thread Artem Belevich via cfe-commits
@@ -982,8 +982,9 @@ void NVPTXDAGToDAGISel::SelectAddrSpaceCast(SDNode *N) { case ADDRESS_SPACE_SHARED: Opc = TM.is64Bit() ? NVPTX::cvta_shared_64 : NVPTX::cvta_shared; break; -case ADDRESS_SPACE_DSHARED: - Opc = TM.is64Bit() ? NVPTX::cvta_dshared_64 :

[clang] [clang][ARM][AArch64] Define intrinsics guarded by __has_builtin on all platforms (PR #128222)

2025-04-17 Thread Artem Belevich via cfe-commits
@@ -36,6 +36,28 @@ typedef __SIZE_TYPE__ size_t; #include +#ifdef __ARM_ACLE +// arm_acle.h needs some stdint types, but -ffreestanding prevents us from Artem-B wrote: Shouldn't that be fixed in arm_acle.h itself so it includes the headers with the types i

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-18 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/136133

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM w/ a nit. https://github.com/llvm/llvm-project/pull/136645

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
@@ -1100,3 +1101,49 @@ std::string SemaCUDA::getConfigureFuncName() const { // Legacy CUDA kernel configuration call return "cudaConfigureCall"; } + +// Record any local constexpr variables that are passed one way on the host +// and another on the device. +void SemaCUDA::r
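A hedged reconstruction of the situation that comment describes (hypothetical code, not from the patch): a local constexpr variable may be constant-folded on one side of the compilation and ODR-used on the other, so the capture has to be recorded conservatively.

```
__host__ __device__ int f() {
  constexpr int v = 42;
  // Host and device semantic analysis may disagree on whether the lambda
  // ODR-uses v (and therefore must capture it) or can fold it away.
  auto l = [=] { return v; };
  return l();
}
```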

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
@@ -1100,3 +1101,49 @@ std::string SemaCUDA::getConfigureFuncName() const { // Legacy CUDA kernel configuration call return "cudaConfigureCall"; } + +// Record any local constexpr variables that are passed one way on the host +// and another on the device. +void SemaCUDA::r

[clang] [CUDA][HIP] capture possible ODR-used var (PR #136645)

2025-04-22 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/136645

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-30 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,35 @@ +// libstdc++ uses the non-constexpr function std::__glibcxx_assert_fail() +// to trigger compilation errors when the __glibcxx_assert(cond) macro +// is used in a constexpr context. +// Compilation fails when using code from the libstdc++ (such as std::array) on

[clang] [CUDA][HIP] Add a __device__ version of std::__glibcxx_assert_fail() (PR #136133)

2025-04-30 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B commented: LGTM in principle. Now the question is -- how do we test it? There are multiple libstdc++ library versions in the wild and we must not break any of them. We do have some testing on CUDA test bots (which I've just discovered to be silently broken for a whil

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B commented: Almost there. Few more test nits. https://github.com/llvm/llvm-project/pull/134416

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1,26 +1,53 @@ -; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with -; the appropriate command line values. +; Test the NVVM reflect pass functionality: verifying that reflect calls are replaced with +; appropriate values b

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1,26 +1,53 @@ -; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with -; the appropriate command line values. +; Test the NVVM reflect pass functionality: verifying that reflect calls are replaced with +; appropriate values b

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/134416

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1,26 +1,53 @@ -; Verify that when passing in command-line options to NVVMReflect, that reflect calls are replaced with -; the appropriate command line values. +; Test the NVVM reflect pass functionality: verifying that reflect calls are replaced with +; appropriate values b

[clang] [llvm] [NVPTX] Add builtins and intrinsics for conversions of new FP types (PR #134345)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -596,6 +605,28 @@ def __nvvm_e4m3x2_to_f16x2_rn_relu : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>(sh def __nvvm_e5m2x2_to_f16x2_rn : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>(short)", SM_89, PTX81>; def __nvvm_e5m2x2_to_f16x2_rn_relu : NVPTXBuiltinSMAndPTX<"_Vector<2, __fp16>

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-16 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B edited https://github.com/llvm/llvm-project/pull/135644

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-16 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. https://github.com/llvm/llvm-project/pull/135644

[clang] [clang][Sema] Don't warn for implicit uses of builtins in system headers (PR #138205)

2025-05-02 Thread Artem Belevich via cfe-commits
Artem-B wrote: Something does not add up here. AFAICT, using builtins w/o explicitly declaring them is something that's done all the time. https://godbolt.org/z/ha47W53dh In that sense, we should not need to filter out diagnostics coming only from system headers. There should not
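The point is easy to demonstrate with a minimal example in the spirit of the godbolt link (hypothetical snippet):

```
// No declaration and no #include required: the compiler recognizes its own
// builtins implicitly, so a blanket warning would fire all over real code.
__host__ __device__ int count_bits(unsigned x) {
  return __builtin_popcount(x);
}
```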

[clang] [clang][Sema] Don't warn for implicit uses of builtins in system headers (PR #138205)

2025-05-02 Thread Artem Belevich via cfe-commits
Artem-B wrote: OK. This makes sense. > sorry this change is so drawn out :) What matters is that you're making progress, and I appreciate your work on getting this issue sorted out the right way. https://github.com/llvm/llvm-project/pull/138205

[clang] [clang][Sema] Don't warn for implicit uses of builtins in system headers (PR #138205)

2025-05-02 Thread Artem Belevich via cfe-commits
@@ -2376,9 +2376,14 @@ NamedDecl *Sema::LazilyCreateBuiltin(IdentifierInfo *II, unsigned ID, return nullptr; } + // Warn for implicit uses of header dependent libraries, + // except in system headers. if (!ForRedeclaration && (Context.BuiltinInfo.isPredefine

[clang] [CUDA][HIP] Fix implicit attribute of builtin (PR #138162)

2025-05-01 Thread Artem Belevich via cfe-commits
@@ -0,0 +1,23 @@ +// expected-no-diagnostics + +// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -aux-triple amdgcn-amd-amdhsa -fsyntax-only -verify -xhip %s +// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fsyntax-only -fcuda-is-device -verify -xhip %s + +#include "Inputs/cuda

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Artem Belevich via cfe-commits
@@ -411,6 +412,13 @@ static Instruction *convertNvvmIntrinsicToLlvm(InstCombiner &IC, } return nullptr; } + case SPC_Fabs: { +if (!II->getType()->isDoubleTy()) + return nullptr; +auto *Fabs = Intrinsic::getOrInsertDeclaration( +II->getModule(),

[clang] [llvm] [NVPTX] Cleanup and document nvvm.fabs intrinsics, adding f16 support (PR #135644)

2025-04-15 Thread Artem Belevich via cfe-commits
@@ -1034,6 +1034,10 @@ Value *CodeGenFunction::EmitNVPTXBuiltinExpr(unsigned BuiltinID, case NVPTX::BI__nvvm_fmin_xorsign_abs_f16x2: return MakeHalfType(Intrinsic::nvvm_fmin_xorsign_abs_f16x2, BuiltinID, E, *this); + case NVPTX::BI__nvvm_abs_bf16

[clang] [Clang] add option --offload-jobs=N (PR #135229)

2025-04-10 Thread Artem Belevich via cfe-commits
@@ -1233,6 +1233,10 @@ def offload_compression_level_EQ : Joined<["--"], "offload-compression-level=">, Flags<[HelpHidden]>, HelpText<"Compression level for offload device binaries (HIP only)">; +def offload_jobs_EQ : Joined<["--"], "offload-jobs=">, + HelpText<"Set the

[clang] [llvm] [NVPTX] Improve NVVMReflect Efficiency (PR #134416)

2025-04-10 Thread Artem Belevich via cfe-commits
https://github.com/Artem-B approved this pull request. LGTM. https://github.com/llvm/llvm-project/pull/134416
