[PATCH] D130224: [Clang][Attribute] Introduce maybe_undef attribute for function arguments which accepts undef values

2022-07-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D130224#3668243 , @aaron.ballman wrote: > I'm still not seeing the issue fully. My understanding of the situation > (which may be wrong) is that Clang lowers to LLVM IR and adds `noundef` > markings at call sites that this pa

[PATCH] D130224: [Clang][Attribute] Introduce maybe_undef attribute for function arguments which accepts undef values

2022-07-25 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D130224#3677224 , @aaron.ballman wrote: > However, what I think I'm hearing from this thread is that there are > alternative approaches that have been thought about but not tried, we're not > certain how feasible those approa

[PATCH] D123878: [AMDGPU] Add remarks to output some resource usage

2022-05-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D123878#3506500 , @afanfa wrote: > If possible, I would like to keep some kind of delimiter. I like the idea of > having it at the beginning and at the end of the section. The best option > would be to convince clang to print

[PATCH] D125719: [Attribute] Add attribute NeverOptimizeNone

2022-05-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/CodeGen/CodeGenModule.cpp:1934 !F->hasFnAttribute(llvm::Attribute::AlwaysInline)) { B.addAttribute(llvm::Attribute::OptimizeNone); I still think adding optnone to everything at -O0 is a bad idea in

[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-08-22 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D62739/new/ https://reviews.llvm.org/D62739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-08-26 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 217301. arsenm added a comment. Make lower bound 1, although this is a behavior change CHANGES SINCE LAST ACTION https://reviews.llvm.org/D62739/new/ https://reviews.llvm.org/D62739 Files: lib/CodeGen/TargetInfo.cpp test/CodeGenOpenCL/amdgpu-attrs.cl

[PATCH] D62739: AMDGPU: Always emit amdgpu-flat-work-group-size

2019-08-27 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. r370101 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D62739/new/ https://reviews.llvm.org/D62739 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mai

[PATCH] D67048: [AMDGPU] Set default flat work group size to (1, 256) for HIP

2019-09-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D67048/new/ https://reviews.llvm.org/D67048 ___ cfe-commits mailing list cfe-commits@l

[PATCH] D66198: AMDGPU: Add builtins for is_local/is_private

2019-09-04 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. r371010 CHANGES SINCE LAST ACTION https://reviews.llvm.org/D66198/new/ https://reviews.llvm.org/D66198 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mai

[PATCH] D36118: Fix creating bitcasts with wrong address space

2017-07-31 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision. Herald added subscribers: Anastasia, tpr, wdng. In a future commit AMDGPU will start passing aggregates directly to more functions, triggering asserts in test/CodeGenOpenCL/addr-space-struct-arg.cl https://reviews.llvm.org/D36118 Files: lib/CodeGen/CGCall.cpp I

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision. Herald added subscribers: t-tye, tpr, dstuttard, nhaehnle, wdng, kzhuravl. This is an improvement over always using byval for structs. This will use registers until ~16 are used, and then switch back to byval. This needs more work, since I'm not sure it ever

[PATCH] D36118: Fix creating bitcasts with wrong address space

2017-08-01 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. r309741 https://reviews.llvm.org/D36118 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-07 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. ping https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7386 + bool isHomogeneousAggregateBaseType(QualType Ty) const override; + bool isHomogeneousAggregateSmallEnough(const Type *Base, + uint64_t Members) const override; --

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 110272. arsenm added a comment. Fix assert when estimating array registers https://reviews.llvm.org/D36171 Files: lib/CodeGen/TargetInfo.cpp test/CodeGenOpenCL/addr-space-struct-arg.cl test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl test/CodeGenOpenCL

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7386 + bool isHomogeneousAggregateBaseType(QualType Ty) const override; + bool isHomogeneousAggregateSmallEnough(const Type *Base, + uint64_t Members) const override; --

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-08 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: lib/CodeGen/TargetInfo.cpp:7571 + + // XXX: Should this be i64 instead, and should the limit increase? + llvm::Type *I32Ty = llvm::Type::getInt32Ty(getVMContext()); b-sumner wrote: > arsenm wrote: > > b-sumner w

[PATCH] D36171: AMDGPU: Use direct struct returns

2017-08-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. r310527 https://reviews.llvm.org/D36171 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36771: AMDGPU: add missing amdgcn processors and tests

2017-08-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. We should also be defining __devicename__ macros. I opened a bug for this last week https://reviews.llvm.org/D36771 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-c

[PATCH] D71726: Let clang atomic builtins fetch add/sub support floating point types

2020-01-02 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/CodeGen/CGAtomic.cpp:597-598 case AtomicExpr::AO__atomic_add_fetch: -PostOp = llvm::Instruction::Add; +PostOp = E->getValueType()->isFloatingType() ? llvm::Instruction::FAdd +

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-02 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69878/new/ https://reviews.llvm.org/D69878 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done. arsenm added a comment. In D69878#1801508 , @cameron.mcinally wrote: > This is looking pretty good to me, but I'm ignoring some of the target > specific code that I'm not familiar with. > > Is `denormal-fp-math` i

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D69878#1805804 , @arsenm wrote: > In D69878#1801508 , > @cameron.mcinally wrote: > > > This is looking pretty good to me, but I'm ignoring some of the target > > specific code that I'm n

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGenHIP/printf.cpp:18 +} + +// CHECK: [[BEGIN:%.*]] = call i64 @__ockl_printf_begin(i64 0) This could use a lot more testcases. Can you add some half, float, and double as well as pointers (including diff

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done. arsenm added inline comments. Comment at: llvm/docs/LangRef.rst:1837 + are present, this overrides ``"denormal-fp-math"``. Not all targets + support separately setting the denormal mode per type. + andrew.w.kaylor wrot

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done. arsenm added inline comments. Comment at: llvm/docs/LangRef.rst:1837 + are present, this overrides ``"denormal-fp-math"``. Not all targets + support separately setting the denormal mode per type. + andrew.w.kaylor wrot

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-10 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm marked an inline comment as done. arsenm added inline comments. Comment at: llvm/docs/LangRef.rst:1837 + are present, this overrides ``"denormal-fp-math"``. Not all targets + support separately setting the denormal mode per type. + andrew.w.kaylor wrot

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 237665. arsenm added a comment. Mention support in langref CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69878/new/ https://reviews.llvm.org/D69878 Files: clang/include/clang/Basic/CodeGenOptions.def clang/include/clang/Basic/CodeGenOptions.h

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGenHIP/printf.cpp:18 +} + +// CHECK: [[BEGIN:%.*]] = call i64 @__ockl_printf_begin(i64 0) sameerds wrote: > sameerds wrote: > > arsenm wrote: > > > This could use a lot more testcases. Can you add some ha

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Mostly looks fine, except vectors are supposed to work Comment at: llvm/lib/Transforms/Utils/AMDGPUEmitPrintf.cpp:52 + } else if (Ty->getTypeID() == Type::DoubleTyID) { +return Builder.CreateBitCast(Arg, Int64Ty); + } else if (auto PtrTy = dyn_cast

[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. LGTM Comment at: llvm/lib/Target/AMDGPU/AMDGPU.td:487 +def FeatureMfma1Insts : SubtargetFeature<"mfma1-insts", + "HasMfma1Insts", We already have the HasMAIInsts feature Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACT

[PATCH] D71365: expand printf when compiling HIP to AMDGPU

2020-01-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D71365/new/ https://reviews.llvm.org/D71365 ___

[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision. arsenm added a comment. This revision now requires changes to proceed. Having two subtarget features for the same feature is an issue Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72723/new/ https://review

[PATCH] D72723: Built-in functions for AMDGPU MFMA instructions.

2020-01-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. LGTM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D72723/new/ https://reviews.llvm.org/D72723 ___ cfe-commits mailing list cfe-commits@l

[PATCH] D69878: Consoldiate internal denormal flushing controls

2020-01-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Herald added a subscriber: kerbowa. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69878/new/ https://reviews.llvm.org/D69878 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin

[PATCH] D85607: CfgTraits: add CfgInstructionRef

2020-09-15 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. Herald added a subscriber: tatianashp. LGTM Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D85607/new/ https://reviews.llvm.org/D85607 __

[PATCH] D78979: OpenCL: Include builtin header by default

2020-09-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 292275. arsenm added a comment. Herald added subscribers: wenlei, dang. Rebase CHANGES SINCE LAST ACTION https://reviews.llvm.org/D78979/new/ https://reviews.llvm.org/D78979 Files: clang/include/clang/Driver/Options.td clang/lib/Driver/ToolChains/Clan

[PATCH] D86020: [MemCpyOptimizer] Optimize passing byref function arguments down the stack

2020-09-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D86020#103 , @atrosinenko wrote: > @rjmccall > > Maybe I overestimated similarity of `byval` and recently introduced > `byref`... Looks like some aliasing restrictions are not mentioned in LLVM > Language Reference

[PATCH] D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling

2020-09-21 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. I assume this makes 1f4e7463b5e3ff654c84371527767830e51db10d redundant? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D87972/new/ https://reviews.llvm.org

[PATCH] D88370: Emit predefined macro for wavefront size for amdgcn

2020-09-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/Driver/amdgpu-macros.cl:351 + +// GFX600-DAG: #define __amdgcn_wavefront_size 64 +// GFX601-DAG: #define __amdgcn_wavefront_size 64 Macros should be all caps CHANGES SINCE LAST ACTION https://reviews.llvm.o

[PATCH] D88370: Emit predefined macro for wavefront size for amdgcn

2020-09-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/Basic/Targets/AMDGPU.h:419 + +if (llvm::find(Features, "+wavefrontsize64") != Features.end()) + WavefrontSize = 64; Why is this not redundant with the features check? CHANGES SINCE LAST ACTION https

[PATCH] D86229: [X86] Enable constexpr on POPCNT intrinsics (PR31446)

2020-08-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGen/popcnt-builtins.cpp:3 +// RUN: %clang_cc1 -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s + + Missing required register target? Repository: rG LLVM Github Mo

[PATCH] D86229: [X86] Enable constexpr on POPCNT intrinsics (PR31446)

2020-08-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGen/popcnt-builtins.cpp:3 +// RUN: %clang_cc1 -std=c++11 -ffreestanding %s -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s + + RKSimon wrote: > craig.topper wrote: > > RKSimon wrote: > > > arsen

[PATCH] D86154: AMDGPU: Add llvm.amdgcn.{read,readfirst,write}lane2 intrinsics with type overloads

2020-08-27 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D86154#2229292 , @nhaehnle wrote: > In D86154#2224272 , @arsenm wrote: > >> In D86154#2224270 , @nhaehnle wrote: >> >>> Note that part of my motiva

[PATCH] D91840: OpaquePtr: Require byval on x86_intrcc parameter 0

2020-12-14 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm closed this revision. arsenm added a comment. ef4da3c2ba8a812a695361d786e3de8a8b2cd482 +2e0e03c6a089da39039ec3f464f7cee5df86646b CHANGES SINCE LAST

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-11-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. This should use byref, but I don't think this should come at the cost of the promotion. I would still like to see this promotion occur for the in-memory byref type Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D89980/new/ h

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-11-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D89980#2371952 , @hliao wrote: > In D89980#2371850 , @arsenm wrote: > >> This should use byref, but I don't think this should come at the cost of the >> promotion. I would still like to s

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-11-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D89980#2372102 , @hliao wrote: > In D89980#2371966 , @arsenm wrote: > >> In D89980#2371952 , @hliao wrote: >> >>> In D89980#2371850

[PATCH] D89980: [hip] Remove the coercion on aggregate kernel arguments.

2020-11-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu:55 struct S { int *x; Should also have a case with a single element struct, which will be treated like a scalar Repository: rG LLVM Github Monorepo CHAN

[PATCH] D91121: [InferAddrSpace] Teach to handle assumed address space.

2020-11-13 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/include/llvm/CodeGen/BasicTTIImpl.h:227 + Optional getAssumedAddrSpace(const Value *V) const { +return getTLI()->getTargetMachine().getAssumedAddrSpace(V); hliao wrote: > hliao wrote: > > arsenm wrote: > > > W

[PATCH] D91519: [AST][Mach0] Fix unused-variable warnings

2020-11-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: lld/MachO/SymbolTable.cpp:137 // error message. -if (auto *defined = dyn_cast(s)) +if (dyn_cast(s)) error("found defined symbol with illegal name " + DSOHandle::name); Should use isa Repository: rG

[PATCH] D91121: [InferAddrSpace] Teach to handle assumed address space.

2020-11-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. LGTM with nits Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:533 + if (!LD) +return -1; + hliao wrote: > tra wrote: > > Is there a suitable

[PATCH] D91121: [InferAddrSpace] Teach to handle assumed address space.

2020-11-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Transforms/Scalar/InferAddressSpaces.cpp:483 +// Skip values with an assumed address space. +if (TTI->getAssumedAddrSpace(TopVal) == UninitializedAddressSpace) { + for (Value *PtrOperand : getPointerOperands(*TopVal,

[PATCH] D91428: Add support for multiple program address spaces

2020-11-17 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. While the IR definitely should support mixing functions with different address spaces, I don't see the point of encoding multiple of these in the datalayout Comment at: llvm/include/llvm/IR/DataLayout.h:125-129 + /// Vector of address spaces that can c

[PATCH] D91840: OpaquePtr: Require byval on x86_intrcc parameter 0

2020-11-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision. arsenm added reviewers: rnk, aaboud, craig.topper. Herald added subscribers: dexonsmith, pengfei, hiraditya. Herald added a project: LLVM. arsenm requested review of this revision. Herald added a subscriber: wdng. Currently the backend special cases x86_intrcc and tre

[PATCH] D91840: OpaquePtr: Require byval on x86_intrcc parameter 0

2020-11-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/test/Bitcode/compatibility-6.0.ll:439 ; CHECK: declare hhvm_ccc void @f.hhvm_ccc() -declare cc83 void @f.cc83() +declare cc83 void @f.cc83(i8* byval(i8)) ; CHECK: declare x86_intrcc void @f.cc83() craig.topper wrot

[PATCH] D88370: Emit predefined macro for wavefront size for amdgcn

2020-09-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/Driver/amdgpu-macros.cl:380-381 +// RUN: -mno-wavefrontsize64 %s 2>&1 | FileCheck --check-prefixes=WAVE64 %s +// RUN: %clang -E -dM -target amdgcn -mcpu=gfx1010 -mwavefrontsize64 \ +// RUN: -mno-wavefrontsize64 %s 2>&1 | Fi

[PATCH] D88370: Emit predefined macro for wavefront size for amdgcn

2020-09-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:394-395 + // Get the last argument of -mwavefrontsize64 or -mno-wavefrontsize64. + for (auto WaveArg : Args.filtered_reverse(options::OPT_mwavefrontsize64, +

[PATCH] D88370: Emit predefined macro for wavefront size for amdgcn

2020-09-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:394-395 + // Get the last argument of -mwavefrontsize64 or -mno-wavefrontsize64. + for (auto WaveArg : Args.filtered_reverse(options::OPT_mwavefrontsize64, +

[PATCH] D17993: [CodeGen] Apply 'nonnull' to 'this' pointer arguments.

2020-10-12 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp:1082 +bool isGFX9Plus(const MCSubtargetInfo &STI) { + return isGFX9(STI) || isGFX10(STI); +} How are these changes related? CHANGES SINCE LAST ACTION https://reviews.ll

[PATCH] D91840: OpaquePtr: Require byval on x86_intrcc parameter 0

2020-11-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm updated this revision to Diff 308354. arsenm added a comment. Fix bitcode upgrade after rebase CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91840/new/ https://reviews.llvm.org/D91840 Files: clang/lib/CodeGen/TargetInfo.cpp clang/test/CodeGen/attr-x86-interrupt.c clang/tes

[PATCH] D92628: [HIP] Fix bug in driver about wavefront size

2020-12-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm accepted this revision. arsenm added a comment. This revision is now accepted and ready to land. LGTM although I thought there were existing test files for this CHANGES SINCE LAST ACTION https://reviews.llvm.org/D92628/new/ https://reviews.llvm.org/D92628 _

[PATCH] D89582: clang/AMDGPU: Apply workgroup related attributes to all functions

2020-10-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm created this revision. arsenm added reviewers: yaxunl, rampitec. Herald added subscribers: kerbowa, t-tye, tpr, dstuttard, nhaehnle, jvesely, kzhuravl. arsenm requested review of this revision. Herald added a subscriber: wdng. When the default flat work group size is 256, it should also ap

[PATCH] D89582: clang/AMDGPU: Apply workgroup related attributes to all functions

2020-10-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D89582#2335574 , @yaxunl wrote: > What if a device function is called by kernels with different work group > sizes, will caller's work group size override callee's work group size? It's user error to call a function with a larg

[PATCH] D89582: clang/AMDGPU: Apply workgroup related attributes to all functions

2020-10-16 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D89582#2335671 , @rampitec wrote: > In D89582#2335619 , @arsenm wrote: > >> In D89582#2335574 , @yaxunl wrote: >> >>> What if a device function is

[PATCH] D83088: Introduce CfgTraits abstraction

2020-10-22 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D83088#2346322 , @mehdi_amini wrote: > In D83088#2345540 , @nhaehnle wrote: > >> David, I don't think this is appropriate here. Let's take the discussion to >> llvm-dev. > > Seems like Da

[PATCH] D83088: Introduce CfgTraits abstraction

2020-10-23 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D83088#2348641 , @mehdi_amini wrote: > In D83088#2347111 , @arsenm wrote: > >> In D83088#2346322 , @mehdi_amini >> wrote: >> >>> In D83088#2345540

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-10-27 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision. arsenm added inline comments. This revision now requires changes to proceed. Comment at: clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu:30 -} - -// HOST: define void @_Z22__device_stub__kernel2Ri(i32* nonnull align 4 dereferen

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-10-27 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu:19 +// COMMON-LABEL: define amdgpu_kernel void @_Z7kernel1Pi(i32*{{.*}} %x) +// OPT: [[VAL:%.*]] = load i32, i32* %x, align 4 // OPT: [[INC:%.*]] = add nsw i32 [[VAL]], 1 -

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-10-27 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu:19 +// COMMON-LABEL: define amdgpu_kernel void @_Z7kernel1Pi(i32*{{.*}} %x) +// OPT: [[VAL:%.*]] = load i32, i32* %x, align 4 // OPT: [[INC:%.*]] = add nsw i32 [[VAL]], 1 -

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-10-27 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu:19 +// COMMON-LABEL: define amdgpu_kernel void @_Z7kernel1Pi(i32*{{.*}} %x) +// OPT: [[VAL:%.*]] = load i32, i32* %x, align 4 // OPT: [[INC:%.*]] = add nsw i32 [[VAL]], 1 -

[PATCH] D90251: [AMDGPU] Add __builtin_amdgcn_grid_size

2020-10-27 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. What's the point of this? The reason for the other case was because there was no other way to attach the range metadata. The invariant load here is redundant since with AMDGPU AA the load from constant will be treated as invariant anyway Repository: rG LLVM Github M

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-10-28 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D89980#2357208 , @hliao wrote: > Besides the unpromotable `alloca` issue due to indirect accesses, such > coercion to GLOBAL pointer directly is not safe as, in HIP/CUDA, both > CONSTANT and GLOBAL pointers would be passed as t

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-11-02 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm requested changes to this revision. arsenm added a comment. This revision now requires changes to proceed. I think this is a dead end approach. I don't see the connection to the original problem you are trying to solve. Can you send me an IR testcase that this is supposed to help? Repos

[PATCH] D89980: [hip] Remove kernel argument coercion.

2020-11-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D89980#2371270 , @hliao wrote: > In D89980#2368506 , @arsenm wrote: > >> I think this is a dead end approach. I don't see the connection to the >> original problem you are trying to solve

[PATCH] D78979: OpenCL: Include builtin header by default

2021-02-02 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D78979#2536590 , @Anastasia wrote: > @arsenm I would like to see if we can finalize this patch soon. Do you think > you will have a capacity for this in the next weeks? Otherwise I would be > happy to help too. I don't think I

[PATCH] D109707: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

2021-09-17 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. In D109707#3004869 , @gandhi21299 wrote: > Internal linkage detection works great for our purposes but it causes a > failure in llvm/test/CodeGen/AMDGPU/inline-calls.ll due to `@func_alias` > unable to be casted into a `Function

[PATCH] D110257: [CFE][Codegen] Do not break the contiguity of static allocas.

2021-09-22 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. I do think it's cleaner/more canonical IR to cluster these at the top of the block, but I don't understand this comment: > otherwise, inliner's attempt to move static allocas from callee to caller > will fail, The inliner successfully moves allocas to the caller's entry

[PATCH] D110257: [CFE][Codegen] Do not break the contiguity of static allocas.

2021-09-22 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/CodeGen/CGExpr.cpp:102-106 + auto *EBB = AllocaInsertPt->getParent(); + auto Iter = AllocaInsertPt->getIterator(); + if (Iter != EBB->end()) +++Iter; + Builder.SetInsertPoint(EBB, Iter); -

[PATCH] D109707: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

2021-09-23 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp:65 - return cast(Op.getGlobal()); + return dyn_cast(Op.getGlobal()); } I think this is not the right place for this. If we can determine the callee function, we

[PATCH] D109707: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols

2021-09-23 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp:65 - return cast(Op.getGlobal()); + return dyn_cast(Op.getGlobal()); } gandhi21299 wrote: > arsenm wrote: > > I think this is not the right place for this. If we

[PATCH] D110257: [CFE][Codegen] Do not break the contiguity of static allocas.

2021-09-23 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/CodeGen/CGExpr.cpp:103 +if (!ArraySize) { + auto *EBB = AllocaInsertPt->getParent(); + auto Iter = AllocaInsertPt->getIterator(); Why is there a special AllocaInsertPt iterator in the first place?

[PATCH] D57835: Fix -ftime-report with -x ir

2021-02-17 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. ping CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57835/new/ https://reviews.llvm.org/D57835 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-19 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199 + MCRegister RepReg; + for (MCRegister R : *MRI->getRegClass(Reg)) { +if (!MRI->isReserved(R)) { + RepReg = R; + break; +} + } This is a problem

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199 + MCRegister RepReg; + for (MCRegister R : *MRI->getRegClass(Reg)) { +if (!MRI->isReserved(R)) { + RepReg = R; + break; +} + } rampitec wrote: >

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-20 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199 + MCRegister RepReg; + for (MCRegister R : *MRI->getRegClass(Reg)) { +if (!MRI->isReserved(R)) { + RepReg = R; + break; +} + } rampitec wrote: >

[PATCH] D96906: [AMDGPU] gfx90a support

2021-02-23 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199 + MCRegister RepReg; + for (MCRegister R : *MRI->getRegClass(Reg)) { +if (!MRI->isReserved(R)) { + RepReg = R; + break; +} + } rampitec wrote: >

[PATCH] D61112: AMDGPU: Enable _Float16

2021-02-24 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: test/CodeGenCXX/amdgpu-float16.cpp:7 + _Float16 x, y, z; + // CHECK: v_add_f16_e64 + // NOF16: v_add_f32_e64 I thought clang tests were supposed to stop at the IR CHANGES SINCE LAST ACTION https://reviews.llvm.org/

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-07-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Missing sema test for rejecting the builtins on targets that don't support this Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:201 +TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1fi", "t", "gfx90a-insts") +TARGET_BUILTIN(__builti

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-07-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:201 +TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1fi", "t", "gfx90a-insts") +TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_2f16, "hh*1hi", "t", "gfx90a-insts") +TARGET_BU

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-07-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Needs an IR only test too Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12086 + TargetLowering::AtomicExpansionKind Kind, bool UnsafeFlag) { + ORE = new OptimizationRemarkEmitter(RMW->getFunction()); + if (Kind == TargetLoweri

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-07-30 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12090 + OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); + Remark << "An FP atomic instruction was expanded into a CAS loop."; + return Remark;

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/test/CodeGenOpenCL/fp-atomics-optremarks-gfx90a.cl:25 +// GFX90A-HW-LABEL: test_atomic_add +// GFX90A-HW: global_atomic_add_f64 +float test_atomic_add(global atomic_double *d, double a) { Should check the operands

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. Should still have an IR only remark test Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106891/new/ https://reviews.llvm.org/D106891 ___ cfe-commits mailing list cfe-commits@lists.

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212 + case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: { +Intrinsic::ID IID; +llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); rampitec wrote: > gandhi21299 wr

[PATCH] D106909: [clang] Add clang builtins support for gfx90a

2021-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212 + case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: { +Intrinsic::ID IID; +llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); rampitec wrote: > arsenm wrote:

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:597 + OptimizationRemark Remark(DEBUG_TYPE, "Passed", RMW->getFunction()); + Remark << "A hardware instruction was generated"; + return Remark; This is too strong of a

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks when an FP atomic instruction is converted into a CAS loop or unsafe hardware instruction for GFX90A

2021-08-03 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. If you insist on making this a target remark (which I don't think it should be, and don't think it should be too specific about why the expansion happened), you have to pass through the ORE to the callback Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-06 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added a comment. I think getting too specific as to why in the messages may confuse people. Especially "might not update memory" is not super helpful. It sounds like the instruction is just entirely unreliable Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:12212

[PATCH] D106891: [AMDGPU] [Remarks] Emit optimization remarks for FP atomics

2021-08-09 Thread Matt Arsenault via Phabricator via cfe-commits
arsenm added inline comments. Comment at: llvm/lib/CodeGen/AtomicExpandPass.cpp:626-632 + if (AI->isFloatingPointOperation()) +emitAtomicExpansionRemarks( +AI, Kind, +Remark << "A hardware CAS loop generated: if the memory is " +

<    1   2   3   4   5   6   7   8   9   10   >