[llvm-bugs] [Bug 123175] Mul reassociation in instcombine does not maintain NSW (for example impacting alias analysis negatively when canonicalizing GEP)
Issue 123175 Summary Mul reassociation in instcombine does not maintain NSW (for example impacting alias analysis negatively when canonicalizing GEP) Labels llvm:instcombine Assignees Reporter bjope

Consider IR like this (also on godbolt: https://godbolt.org/z/rW8K8zfnc):

```
; RUN: opt -passes='aa-eval,instcombine,aa-eval' -print-all-alias-modref-info

target datalayout = "p:16:16:16:16"

define i16 @foo1(i16 %x) {
  %a = mul nsw nuw i16 %x, 2
  %b = mul nsw nuw i16 %a, 3
  ret i16 %b
}

define i16 @foo2(i16 %x) {
  %a = mul nsw nuw i16 %x, 3
  %b = mul nsw nuw i16 %a, 2
  ret i16 %b
}

define ptr @foo3(i16 noundef %x, ptr noundef %p) {
  %cmp = icmp sgt i16 %x, 0
  call void @llvm.assume(i1 %cmp)
  %a = mul nsw nuw i16 %x, 3
  %idxprom = sext i16 %a to i64
  %b = getelementptr inbounds i16, ptr %p, i64 %idxprom
  store i16 2, ptr %b
  store i16 1, ptr %p
  ret ptr %b
}
```

It seems a bit inconsistent that instcombine for foo1 is able to keep "nsw nuw" on the simplified mul

```
%b = mul nuw nsw i16 %x, 6
```

while for foo2 nsw is dropped

```
%b = mul nuw i16 %x, 6
```

and for foo3 both nuw and nsw are dropped on the mul

```
%b.idx = mul i16 %x, 6
%b = getelementptr inbounds i8, ptr %p, i16 %b.idx
```

The foo3 example also shows that dropping "nsw" on the mul may impact alias analysis, as it is no longer able to derive NoAlias after instcombine.

I think one problem here is that InstCombinerImpl::SimplifyAssociativeOrCommutative only deals with Add/Sub when using the maintainNoSignedWrap helper.

Here (https://alive2.llvm.org/ce/z/RqG2pz) is an alive2 proof showing that we should at least be able to keep nsw on the second mul when doing "(A mul B) mul C" ==> "A mul (B mul C)", as long as the associated Mul operations are both "nsw nuw":

```
define i8 @src(i8 %a, i8 %b, i8 %c) {
  %x = mul nsw nuw i8 %a, %b
  %y = mul nsw nuw i8 %x, %c
  ret i8 %y
}

define i8 @tgt(i8 %a, i8 %b, i8 %c) {
  %x = mul i8 %c, %b
  %y = mul nsw nuw i8 %x, %a
  ret i8 %y
}
```

Maybe there are more situations where "nsw" can be kept given that (B mul C) simplifies, e.g. when all involved values are known to be non-negative?

PS. When using the reassociate pass instead of instcombine the result is even worse, since it drops both "nuw nsw" even for foo1.

___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
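The refinement claimed by the alive2 proof for issue 123175 can also be checked by brute force over all i8 triples. The following is only a sketch that models poison as "the multiply wraps"; the helper names (`asSigned`, `mulNoWrap`, `countCounterexamples`) are made up for illustration and are not LLVM APIs.

```cpp
#include <cstdint>

// Two's-complement signed value of an 8-bit pattern (hypothetical helper).
constexpr int asSigned(unsigned v) { return v < 128 ? (int)v : (int)v - 256; }

// True when multiplying the 8-bit patterns a and b wraps neither as an
// unsigned value (nuw) nor as a signed value (nsw).
constexpr bool mulNoWrap(unsigned a, unsigned b) {
  int s = asSigned(a) * asSigned(b);
  return a * b < 256 && s >= -128 && s <= 127;
}

// Counts triples where @src is well defined but @tgt would either be
// poison (its flagged mul wraps) or produce a different value.
long countCounterexamples() {
  long bad = 0;
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      if (!mulNoWrap(a, b)) continue;      // %x in @src is poison
      unsigned x = (a * b) & 0xFF;
      for (unsigned c = 0; c < 256; ++c) {
        if (!mulNoWrap(x, c)) continue;    // %y in @src is poison
        unsigned y = (x * c) & 0xFF;
        unsigned t = (c * b) & 0xFF;       // flag-free mul in @tgt may wrap
        if (!mulNoWrap(t, a) || ((t * a) & 0xFF) != y) ++bad;
      }
    }
  return bad;
}
```

`countCounterexamples()` returning zero agrees with the proof: the outer mul in `@tgt` can carry "nsw nuw" while the inner one carries no flags.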
[llvm-bugs] [Bug 123179] [clang-format] Macro formatting regression 19.1.6 vs 19.1.7
Issue 123179 Summary [clang-format] Macro formatting regression 19.1.6 vs 19.1.7 Labels clang-format Assignees Reporter chegoryu For the following code: ```cpp template void write_to(Writer& writer, const FieldHeader& field_header) { #define WRITE_MESSAGE(type) \ { \ case FieldType::type: { \ writer.value(#type); \ writer.key("Message").start_object(); \ write_to(writer, cast_to(field_header)); \ writer.finish_object(); \ return; \ } \ } } ``` ``` Ubuntu clang-format version 19.1.7 (++20250114103238+cd708029e0b2-1~exp1~20250114103342.77) ``` Produces ```cpp template void write_to(Writer& writer, const FieldHeader& field_header) { #define WRITE_MESSAGE(type) \ {case FieldType::type: {writer.value(#type); \ writer.key("Message").start_object(); \ write_to(writer, cast_to(field_header)); \ writer.finish_object(); \ return; \ } \ } } ``` But ``` Ubuntu clang-format version 19.1.6 (++20241217110052+657e03f8625c-1~exp1~20241217110110.73) ``` Does not change the file `clang-format-19 file.cpp --dump-config`: https://pastebin.com/7vebyX6j ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 123189] [MASM] SIGSEGV in `checkForValidSection` in MasmParser
Issue 123189 Summary [MASM] SIGSEGV in `checkForValidSection` in MasmParser Labels new issue Assignees Reporter MisterDA I'm trying to cross-compile the OCaml compiler with a Debian host, targeting `x86_64-pc-windows` with `clang-cl`. I'm running into a segfault from `llvm-ml` (the MASM assembler), a drop-in replacement for Microsoft's `ml64`. I hit the issue with LLVM 18 and LLVM 20 (ea14bdb0356cdda727ac032470f6a0a2102d1281 as the time of writing). Here is a reproducer, as a Dockerfile (build with `docker build --platform linux/amd64 .`), and the backtrace: ```Dockerfile # syntax=docker/dockerfile:1 FROM debian:experimental ARG LLVM_VERSION=20 ENV DEBUGINFOD_URLS="https://debuginfod.debian.net" RUN cat <<'EOF' > /etc/apt/sources.list.d/debug.list deb http://deb.debian.org/debian-debug/ experimental-debug main EOF RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \ --mount=type=cache,target=/var/lib/apt,sharing=locked \ apt update && DEBIAN_FRONTEND=noninteractive apt upgrade -y && \ DEBIAN_FRONTEND=noninteractive apt-get --no-install-recommends install -y \ clang-$LLVM_VERSION clang-$LLVM_VERSION-dbgsym \ clang-tools-$LLVM_VERSION clang-tools-$LLVM_VERSION-dbgsym \ lld-$LLVM_VERSION lld-$LLVM_VERSION-dbgsym \ llvm-$LLVM_VERSION llvm-$LLVM_VERSION-dbgsym \ lldb-$LLVM_VERSION lldb-$LLVM_VERSION-dbgsym \ make gdb ADD --keep-git-dir --link https://github.com/ocaml/ocaml.git /root/ocaml WORKDIR /root/ocaml ENV LLVM_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-$LLVM_VERSION RUN clang-cl-20 -nologo -EP -TC runtime/caml/domain_state.tbl > runtime/domain_state.inc # llvm-ml-20 -m64 dislikes parentheses on macro calls RUN sed -e 's/(//g' -e 's/)//g' -i runtime/domain_state.inc # llvm-ml-20 doesn't understand NEAR RUN sed -E -e 's/(EXTRN.*):.*NEAR/\1:PROC/g' -i runtime/amd64nt.asm RUN llvm-ml-20 -m64 -nologo -Iruntime -c 
-Foruntime/amd64nt.obj runtime/amd64nt.asm ``` ``` PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: llvm-ml-20 -m64 -nologo -Iruntime -c -Foruntime/amd64nt.obj runtime/amd64nt.asm #0 0x77fa117a llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:723:13 #1 0x77f9ed14 llvm::sys::RunSignalHandlers() build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Signals.cpp:106:18 #2 0x77fa182b SignalHandler build-llvm/tools/clang/stage2-bins/llvm/lib/Support/Unix/Signals.inc:413:1 #3 0x76ac0da0 (/lib/x86_64-linux-gnu/libc.so.6+0x3fda0) #4 0x798923a2 checkForValidSection build-llvm/tools/clang/stage2-bins/llvm/lib/MC/MCParser/MasmParser.cpp:1457:31 #5 0x79895133 parseStatement build-llvm/tools/clang/stage2-bins/llvm/lib/MC/MCParser/MasmParser.cpp:0:7 #6 0x7988d2a5 Run build-llvm/tools/clang/stage2-bins/llvm/lib/MC/MCParser/MasmParser.cpp:0:0 #7 0xd0f0 AssembleInput build-llvm/tools/clang/stage2-bins/llvm/tools/llvm-ml/llvm-ml.cpp:186:13 #8 0xbc9a llvm_ml_main build-llvm/tools/clang/stage2-bins/llvm/tools/llvm-ml/llvm-ml.cpp:0:11 #9 0xe45a main build-llvm/tools/clang/stage2-bins/build-llvm/tools/clang/stage2-bins/tools/llvm-ml/llvm-ml-driver.cpp:17:10 #10 0x76aaad68 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3 #11 0x76aaae25 call_init ./csu/../csu/libc-start.c:128:20 #12 0x76aaae25 __libc_start_main ./csu/../csu/libc-start.c:347:5 #13 0x9d71 (/usr/lib/llvm-20/bin/llvm-ml+0x5d71) Segmentation fault ``` https://github.com/llvm/llvm-project/blob/628976c8345e235d4f71a0715f1990ad8b5bbcf7/llvm/lib/MC/MCParser/MasmParser.cpp#L1456-L1463 Presumably `getStreamer()` returns a `nullptr`. It's possibly similar to #97635, I'll ping the participants: @sivan-shani @MaskRay. Thanks for any help! ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 123195] PointerIntPair.h:172:17: error: static assertion failed due to requirement
Issue 123195 Summary PointerIntPair.h:172:17: error: static assertion failed due to requirement Labels build-problem, mlir Assignees Reporter sylvestre Recent regression on linux: https://llvm-jenkins.debian.net/job/llvm-toolchain-bookworm-binaries/architecture=i386,distribution=bookworm,label=i386/1114/consoleFull``` In file included from /build/source/mlir/lib/Bytecode/Writer/BytecodeWriter.cpp:9: In file included from /build/source/mlir/include/mlir/Bytecode/BytecodeWriter.h:16: In file included from /build/source/mlir/include/mlir/IR/AsmState.h:18: In file included from /build/source/mlir/include/mlir/IR/OperationSupport.h:17: In file included from /build/source/mlir/include/mlir/IR/Attributes.h:12: In file included from /build/source/mlir/include/mlir/IR/AttributeSupport.h:17: In file included from /build/source/mlir/include/mlir/IR/StorageUniquerSupport.h:21: In file included from /build/source/llvm/include/llvm/ADT/FunctionExtras.h:35: /build/source/llvm/include/llvm/ADT/PointerIntPair.h:172:17: error: static assertion failed due to requirement '3U <= PointerUnionUIntTraits::NumLowBitsAvailable': PointerIntPair with integer size too large for pointer 172 | static_assert(IntBits <= PtrTraits::NumLowBitsAvailable, | ^ /build/source/llvm/include/llvm/ADT/PointerIntPair.h:111:13: note: in instantiation of template class 'llvm::PointerIntPairInfo>' requested here 111 | Value = Info::updateInt(Info::updatePointer(0, PtrVal), ``` ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
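For context on the static assertion in issue 123195: a pointer aligned to 2^k bytes has k low bits that are always zero, and those are the bits a pointer/int pair can borrow. The sketch below only illustrates that arithmetic. It assumes, without confirmation from the build log, that the pointee involved is 4-byte aligned on i386; `lowBitsAvailable` and `fitsIntBits` are hypothetical helpers, not the LLVM traits.

```cpp
#include <cstddef>

// Number of always-zero low bits in a pointer aligned to `align` bytes.
// `align` is assumed to be a power of two.
constexpr unsigned lowBitsAvailable(std::size_t align) {
  unsigned bits = 0;
  while (align > 1) {
    align >>= 1;
    ++bits;
  }
  return bits;
}

// Mirrors the shape of the failing check: IntBits <= NumLowBitsAvailable.
constexpr bool fitsIntBits(unsigned intBits, std::size_t align) {
  return intBits <= lowBitsAvailable(align);
}
```

Under that assumption, `fitsIntBits(3, 4)` is false while `fitsIntBits(3, 8)` is true, which would match a failure that only shows up on a 32-bit target.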
[llvm-bugs] [Bug 123198] False positive in bugprone-string-constructor
Issue 123198 Summary False positive in bugprone-string-constructor Labels new issue Assignees Reporter JVApen

    #include <string>

    void f(std::string str)
    {
      // Find the substring "FAMILY:" (copied from old code so still using C-style Char pointers)
      const char *ptr = str.c_str();
      std::string copy(ptr, 0, str.size()/2);
    }

On Compiler Explorer: https://compiler-explorer.com/z/1o86onvGG

The result:

    [:7:19: warning: constructor creating an empty string [bugprone-string-constructor]]
    7 | std::string copy(ptr, 0, str.size()/2);
      | ^
    1 warning generated.

In this example, the following constructor of std::string should be called:

    template< class StringViewLike >
    basic_string( const StringViewLike& t, size_type pos, size_type count, const Allocator& alloc = Allocator() );

Since we provide a valid pointer and a valid non-0 count, the string isn't empty by default. As such, the warning should not be given.
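The reporter's point in issue 123198 can be checked at run time: the three-argument call copies `count` characters starting at `pos`, so the result is not empty. A minimal sketch follows; `firstHalf` is a name made up for this illustration.

```cpp
#include <string>

// Same call shape as in the report: (const char*, pos, count).
// Returns the first half of the input rather than an empty string.
std::string firstHalf(const std::string &str) {
  const char *ptr = str.c_str();
  return std::string(ptr, 0, str.size() / 2);
}
```

For the input "FAMILY:x" (eight characters) this yields "FAMI".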
[llvm-bugs] [Bug 123203] [Clang][OpenCL] Compiler crash on __builtin_assume_aligned in OpenCL
Issue 123203 Summary [Clang][OpenCL] Compiler crash on __builtin_assume_aligned in OpenCL Labels clang, OpenCL, crash Assignees Reporter ritter-x2a Using the return value of `__builtin_assume_aligned` in OpenCL hits an assertion in clang. I don't have a strong opinion on whether the builtin should be supported in OpenCL since it's not part of the Khronos spec, but it shouldn't hit an assertion. Observed with a RelWithDebInfo trunk build, on Ubuntu 22.04. Reproducer: ```c void f(__global int *g) { __global int *ag = __builtin_assume_aligned(g, 16); } ``` When compiling this via `clang -c test.cl`, clang first reports unexpected diagnostics (I don't see how `bool`s are involved) and then hits an assertion: ``` test.cl:2:17: error: incompatible integer to pointer conversion initializing '__global int *__private' with an _expression_ of type 'bool' [-Wint-conversion] 2 | __global int *ag = __builtin_assume_aligned(g, 16); | ^ ~~ clang: /home/faritter/playground/llvm/llvm-project/llvm/lib/IR/Instructions.cpp:2974: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::InsertPosition): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: ./build/bin/clang -c test.cl 1. parser at end of file 2. test.cl:1:6: LLVM IR generation of declaration 'f' 3. test.cl:1:6: Generating code for declaration 'f' #0 0x56e464acc7ff llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/faritter/playground/llvm/llvm-project/llvm/lib/Support/Unix/Signals.inc:802:3 [...] 
#13 0x56e4636a9852 llvm::IRBuilderBase::CreateCast(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, llvm::Twine const&, llvm::MDNode*, llvm::FMFSource) /home/faritter/playground/llvm/llvm-project/llvm/include/llvm/IR/IRBuilder.h:2193:0 #14 0x56e4636a9852 llvm::IRBuilderBase::CreateIntCast(llvm::Value*, llvm::Type*, bool, llvm::Twine const&) /home/faritter/playground/llvm/llvm-project/llvm/include/llvm/IR/IRBuilder.h:2231:0 #15 0x56e464e2f677 (anonymous namespace)::ScalarExprEmitter::VisitCastExpr(clang::CastExpr*) /home/faritter/playground/llvm/llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:2574:44 #16 0x56e464e2c104 Visit /home/faritter/playground/llvm/llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:449:3 #17 0x56e464e2c104 clang::CodeGen::CodeGenFunction::EmitScalarExpr(clang::Expr const*, bool) /home/faritter/playground/llvm/llvm-project/clang/lib/CodeGen/CGExprScalar.cpp:5591:13 [...] ``` The full backtrace, preprocessed source, and run script are attached: [backtrace.txt](https://github.com/user-attachments/files/18440429/backtrace.txt) [test-ff43fa.cl.txt](https://github.com/user-attachments/files/18440433/test-ff43fa.cl.txt) [test-ff43fa.sh.txt](https://github.com/user-attachments/files/18440434/test-ff43fa.sh.txt) ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 123224] [libc++] regression: new/delete symbol overrides broken on macOS
Issue 123224 Summary [libc++] regression: new/delete symbol overrides broken on macOS Labels libc++ Assignees Reporter tycho This is referring to commit https://github.com/llvm/llvm-project/commit/841895543edcf98bd16027c6b85fe7c6419a4566. In a shared library which statically links libc++ (ANGLE's libEGL in this case), the symbols for `new` and `new[]` are, as of the above commit, suddenly exposed as global, but the corresponding `delete` and `delete[]` operators are not. Before the above commit: ``` $ nm -g -C --defined-only contrib/angle/angle/out/macOS-Debug-arm64/libEGL.dylib | grep -e new -e delete ``` After: ``` $ nm -g -C --defined-only contrib/angle/angle/out/macOS-Debug-arm64/libEGL.dylib | grep -e new -e delete 001e9460 T operator new[](unsigned long) 001e9740 T operator new[](unsigned long, std::align_val_t) 001e9328 T operator new(unsigned long) 001e95e0 T operator new(unsigned long, std::align_val_t) ``` This causes applications like mine with custom allocators (mimalloc in this case) to provide the implementations for the operator `new` symbols, but not the `delete` symbols, which inevitably causes a crash within the shared library when it tries to free memory. ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 123212] [AMDGPU][GISel] Missing (or not running) combine for `sra workitem.id.xx, 31`
Issue 123212 Summary [AMDGPU][GISel] Missing (or not running) combine for `sra workitem.id.xx, 31` Labels Assignees Reporter qcolombet In the AMDGPU backend, GISel ends up with additional instructions because we are missing some simplification that could take advantage of the range of the `workitem.id.xx` values. I am somewhat surprised because I see that the AMDGPU backend implements the `TargetLowering::computeKnownBitsForTargetInstr` method and has some logic to propagate the known bits for these intrinsics. Bottom line, I haven't dug into why the simplification doesn't happen, that may be an easy fix. Anyhow, the issue at hand is that `sra workitem.id.xx, 31` could be simplified in `shl workitem.id.xx, 31` and then further simplified in a plain `0`. # To Reproduce # Download the attached reproducer or copy/paste the LLVM IR at the end of this issue. [repro.ll.txt](https://github.com/user-attachments/files/18441382/repro.ll.txt) Then run: ```bash llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-hmcsa -global-isel=<0|1> reduced.ll -o - ``` # Result # With GISel we have a `sra` and `xor` in the final assembly, whereas they could be eliminated. 
With GISel: ```asm s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) v_and_b32_e32 v2, 0x3ff, v31 v_ashrrev_i32_e32 v3, 31, v2 v_xor_b32_e32 v2, v3, v2 flat_store_dword v[0:1], v2 s_waitcnt vmcnt(0) lgkmcnt(0) s_setpc_b64 s[30:31] ``` With SDISel: ```asm s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) v_and_b32_e32 v2, 0x3ff, v31 flat_store_dword v[0:1], v2 s_waitcnt vmcnt(0) lgkmcnt(0) s_setpc_b64 s[30:31] ``` # Note # Input LLVM IR: ```llvm target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9" target triple = "amdgcn-amd-amdhsa" declare noundef i32 @llvm.amdgcn.workgroup.id.x() define dso_local void @foo.bb.split(ptr %out) { newFuncRoot: %i = tail call i32 @llvm.amdgcn.workitem.id.x() %.lobit = ashr i32 %i, 31 %i32 = xor i32 %.lobit, %i store i32 %i32, ptr %out ret void } ``` ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
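The missing fold in issue 123212 rests on a simple fact: a value masked to 10 bits is non-negative, so an arithmetic shift right by 31 gives 0 and the following xor is a no-op. The sketch below checks this for every id in range. It assumes the id is below 1024, as the `v_and_b32 0x3ff` in the listings suggests; the function names are hypothetical.

```cpp
#include <cstdint>

// Mirrors the IR: %.lobit = ashr i32 %i, 31 ; %i32 = xor i32 %.lobit, %i
std::int32_t signMaskXor(std::int32_t id) {
  std::int32_t lobit = id >> 31;  // 0 for every non-negative id
  return lobit ^ id;
}

// True when the ashr/xor pair is the identity for all 10-bit ids.
bool foldHoldsForAllIds() {
  for (std::int32_t id = 0; id <= 0x3ff; ++id)
    if (signMaskXor(id) != id) return false;
  return true;
}
```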
[llvm-bugs] [Bug 123208] AMDGPU silently converts incorrect physical register asm constraint to virtual register
Issue 123208 Summary AMDGPU silently converts incorrect physical register asm constraint to virtual register Labels backend:AMDGPU, accepts-invalid Assignees Reporter arsenm

```
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx940 < %s

define void @invalid_sgpr(<2 x i32> inreg %arg0) {
  call void asm sideeffect "; use $0", "{s[1:2]}"(<2 x i32> %arg0)
  ret void
}
```

s[1:2] is not a valid SGPR reference, as 64-bit SGPRs require even alignment. This is silently accepted, and appears to be treated as a virtual register constraint. In -stop-after=finalize-isel, I see:

```
%10:sreg_64 = COPY %11
INLINEASM &"; use $0", 1 /* sideeffect attdialect */, 3997705 /* reguse:SReg_64 */, %10
```
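The alignment rule stated in issue 123208 can be written down as a small predicate. This is a sketch of the rule as described in the report (a 64-bit SGPR pair must start at an even register index), not the backend's actual validation code; `isAlignedSGPR64` is a hypothetical name.

```cpp
// Checks a constraint of the form s[lo:hi] for a 64-bit SGPR pair:
// the pair must span two consecutive registers and start at an even index.
constexpr bool isAlignedSGPR64(unsigned lo, unsigned hi) {
  return hi == lo + 1 && lo % 2 == 0;
}
```

With this predicate `s[1:2]` is rejected while `s[0:1]` and `s[2:3]` are accepted, which is the diagnostic behaviour the report expects.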
[llvm-bugs] [Bug 123214] [flang] The Fortran test cases for hdf5-1.10.6 cannot be built with Flang
Issue 123214 Summary [flang] The Fortran test cases for hdf5-1.10.6 cannot be built with Flang Labels flang Assignees Reporter pawosm-arm Although the hdf5 library can be built and installed when tests are explicitly disabled (with `--disable-tests` passed to the `configure` script), this is not an optimal situation. I'm configuring hdf5-1.10.6 as such: ``` $ CC=mpicc CXX=mpic++ FC=mpifort F77=mpifort F90=mpifort ./configure --enable-shared --enable-static --enable-parallel --disable-cxx --enable-fortran --enable-hl --prefix=/some/prefix $ sed -i -e 's#wl=""#wl="-Wl,"#g' libtool $ sed -i -e 's#pic_flag=""#pic_flag=" -fPIC -DPIC"#g' libtool ``` (the `sed` lines are here to make it able to build shared libs, it's a known flang issue) Unfortunately, this will fail when building Fortran tests as such: ``` mpifort -I. -I../../../fortran/test -I../../src -I../../fortran/src -I../../fortran/src -I../../fortran/src -c -o tH5T.o ../../../fortran/test/tH5T.F90 error: Semantic errors in ../../../fortran/test/tH5T.F90 ./../../../fortran/test/tH5T.F90:283:6: error: No specific subroutine of generic 'h5dwrite_f' matches the actual arguments CALL h5dwrite_f(dset_id, dt4_id, real_member, data_dims, error, xfer_prp = plist_id) ./../../../fortran/test/tH5T.F90:541:6: error: No specific subroutine of generic 'h5dread_f' matches the actual arguments CALL h5dread_f(dset_id, dt4_id, real_member_out, data_dims, error) ^^ ./../../../fortran/test/tH5T.F90:544:9: error: Cannot use intrinsic function 'verify' as a subroutine CALL VERIFY("h5dread_f:Wrong double precision data is read back", real_member_out(i), real_member(i), total_error) ^^ ./../../../fortran/test/tH5T.F90:544:9: error: No specific subroutine of generic 'verify' matches the actual arguments CALL VERIFY("h5dread_f:Wrong double precision data is read back", real_member_out(i), real_member(i), total_error) ^^ ``` ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 123201] [LV][EVL] Support interleaved accesses for EVL tail folding.
Issue 123201 Summary [LV][EVL] Support interleaved accesses for EVL tail folding. Labels vectorizers Assignees Mel-Chen Reporter Mel-Chen The motivation for this issue is to provide better support for RVV unit-strided segment load/store. The following scenarios need to be supported: * Interleaved load (vp.load + interleave) * Interleaved load with tail gaps (Requires scalar epilogue to run the last iteration) * Fully interleaved store (deinterleave + vp.store) * Interleaved store with gaps (This can not emit unit-strided segment store. We can only emit a wide masked store for that) Due to the high complexity of `VPInterleaveRecipe::execute()`, creating a new recipe or converting it into `VPWidenIntrinsicRecipe` does not seem like a wise approach. A tentative approach I have in mind is to first split `VPInterleaveRecipe` into `VPWidenLoad + VPDeinterleave` and `VPInterleave + VPWidenStore`. During the EVL lowering phase, we would only need to transform `VPWidenLoad/VPWidenStore` into `VPWidenLoadEVL/VPWidenStoreEVL`. For now, the focus will be on supporting factor 2 (`interleave2/deinterleave2`) as the initial target, with support for factors 3 to 8 planned after test results are stable. Related IAP support: #120490 . ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 123227] Clang static analyzer false positive suppression does not suppress an issue report
Issue 123227 Summary Clang static analyzer false positive suppression does not suppress an issue report Labels clang, false-positive Assignees Reporter SergeySatskiy

We use the clang static analyzer for our C++ code as part of a workflow. Sometimes there are false positives, and I have trouble suppressing them. Here is an example of the code:

```c++
template inline typename CParam::TValueType CParam::Get(void) const
{
    if ( !m_ValueSet ) {
        // The lock prevents multiple initializations with the default value
        // in Get(), but does not prevent Set() from modifying the value
        // while another thread is reading it.
        CMutexGuard guard(s_GetLock());
        if ( !m_ValueSet ) {
            m_Value = GetThreadDefault();
            if (GetState() >= eState_Config) {
                // All sources checked or the value is set by user.
                m_ValueSet = true;
            }
        }
    }
    return m_Value;
}
```

An issue is reported for the ```return m_Value;``` line as follows: "Undefined or garbage value returned to caller". The developer of the code investigated this case, and it seems that the false positive arises because the multithreaded nature of the code is not taken into consideration. That is understandable, so I tried to suppress the issue report. Following the documentation, I tried multiple options (added before the ```return ...``` line):

- ```__attribute__((suppress))```
- ```[[clang::suppress]]```
- ```[[gsl::suppress("lifetime")]]```
- ```[[gsl::suppress("bounds")]]```

None of these options suppressed the issue report. Am I doing something wrong, or is there an issue with the clang analyzer such that the suppress attribute is not taken into consideration?

Note: the code is compiled with the ```-std=gnu++17``` option. I tried the ```-std=c++17``` option as well, with the same outcome.
[llvm-bugs] [Bug 123296] Clang crash when trying to evaluate a constexpr with `auto` type, variadic template before type is fully defined
Issue 123296 Summary Clang crash when trying to evaluate a constexpr with `auto` type, variadic template before type is fully defined Labels clang Assignees Reporter bricknerb

Example:

```c++
struct MyClass {
  template static constexpr auto foo() { return 1;}
  static constexpr auto my_value = foo();
};
```

We get an "Unexpected undeduced type!" crash.

Compiler Explorer: https://godbolt.org/z/7rhT7qEeE

I believe this should be an error that says the function `foo()` isn't defined yet because the class isn't fully defined, but not a crash.
[llvm-bugs] [Bug 123278] Missed optimization between `range` parameter metadata and `assume`s
Issue 123278 Summary Missed optimization between `range` parameter metadata and `assume`s Labels new issue Assignees Reporter scottmcm

Now that `range` parameter metadata exists (🎉) I'm trying to remove some of the `assume`s that `rustc` outputs which should no longer be necessary. That works for all but one of our tests in https://github.com/rust-lang/rust/blob/master/tests/codegen/transmute-optimized.rs. After optimizations, we still end up with

```llvm
define noundef range(i8 1, 4) i8 @ordering_transmute_onetwothree(i8 noundef returned range(i8 -1, 2) %x) unnamed_addr #2 {
start:
  %0 = icmp ne i8 %x, 0
  tail call void @llvm.assume(i1 %0)
  %1 = icmp ult i8 %x, 4
  tail call void @llvm.assume(i1 %1)
  ret i8 %x
}
```

That input range is `[-1, 2)` and those assumes are a range `[1, 4)`, so it ought to simplify to just `ret i8 1`, but it doesn't.

Alive2 proof that it would be legal: -- and legal even without the `range` on the return value.

Maybe this is somehow related to the `x uge 1` being turned into `x ne 0`, and thus it not noticing there's a range? Or maybe it's something about the wrap-around?

SEO: rust transmute range bounds

---

As an aside, I'd love to emit these as `range` [assume operand bundles](https://llvm.org/docs/LangRef.html#assume-operand-bundles) instead of `icmp`s, but AFAICT those don't exist yet, so I'm stuck with the `icmp`s for now 🙁
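The reasoning in issue 123278 is easy to replay by enumeration: take every value allowed by `range(i8 -1, 2)` and keep those that satisfy both assumes. The sketch below does exactly that; `feasibleValues` is a name invented for this illustration.

```cpp
#include <cstdint>
#include <vector>

// Values of %x allowed by range(i8 -1, 2) that also satisfy
// `icmp ne i8 %x, 0` and `icmp ult i8 %x, 4`.
std::vector<int> feasibleValues() {
  std::vector<int> out;
  for (int x = -1; x < 2; ++x) {                      // range(i8 -1, 2)
    std::uint8_t u = static_cast<std::uint8_t>(x);    // unsigned view for ult
    if (x != 0 && u < 4) out.push_back(x);
  }
  return out;
}
```

Only the value 1 survives (-1 is 255 when compared unsigned, and 0 is excluded by the first assume), which is why `ret i8 1` would be a valid simplification.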
[llvm-bugs] [Bug 123291] Conversion of Affine Loops to GPU Dialect Fails with 'Invalid Dimension or Symbol Identifier'
Issue 123291 Summary Conversion of Affine Loops to GPU Dialect Fails with 'Invalid Dimension or Symbol Identifier' Labels new issue Assignees Reporter lhw414 ## Description While attempting to convert an affine loop-based MLIR to the GPU dialect using the `convert-affine-for-to-gpu` pass, I encounter the following error: ```plaintext sample.mlir:8:14: error: 'affine.load' op index must be a valid dimension or symbol identifier %0 = affine.load %arg0[%arg1, %arg2] : memref<6x12xf32> ^ sample.mlir:8:14: note: see current operation: %8 = "affine.load"(%arg0, %7, %arg13) <{map = affine_map<(d0, d1) -> (d0, d1)>}> : (memref<6x12xf32>, index, index) -> f32 ``` The mlir-opt command used to reproduce this issue is: ```bash ../build/bin/mlir-opt sample.mlir -o sample_output.mlir \ -pass-pipeline="builtin.module(func.func(convert-affine-for-to-gpu{gpu-block-dims=1 gpu-thread-dims=0}))" ``` Here is the original input MLIR: ```mlir module { memref.global "private" @global_seed : memref = dense<0> func.func @main(%arg0: memref<6x12xf32>) -> memref<6x12xf32> { %cst = arith.constant 0.00e+00 : f32 %alloc = memref.alloc() {alignment = 64 : i64} : memref<6x12xf32> affine.for %arg1 = 0 to 6 { affine.for %arg2 = 0 to 12 { %0 = affine.load %arg0[%arg1, %arg2] : memref<6x12xf32> } } return %alloc : memref<6x12xf32> } } ``` The resulting intermediate IR dump after the failure is as follows: ```mlir // -// IR Dump After ConvertAffineForToGPU Failed (convert-affine-for-to-gpu) //- "func.func"() <{function_type = (memref<6x12xf32>) -> memref<6x12xf32>, sym_name = "main"}> ({ ^bb0(%arg0: memref<6x12xf32>): %0 = "arith.constant"() <{value = 0.00e+00 : f32}> : () -> f32 %1 = "memref.alloc"() <{alignment = 64 : i64, operandSegmentSizes = array}> : () -> memref<6x12xf32> %2 = "arith.constant"() <{value = 0 : index}> : () -> index %3 = "arith.constant"() <{value = 6 : index}> : () -> index %4 = "arith.subi"(%3, %2) <{overflowFlags = #arith.overflow}> : (index, index) -> index %5 = 
"arith.constant"() <{value = 1 : index}> : () -> index %6 = "arith.constant"() <{value = 1 : index}> : () -> index "gpu.launch"(%4, %6, %6, %6, %6, %6) <{operandSegmentSizes = array}> ({ ^bb0(%arg1: index, %arg2: index, %arg3: index, %arg4: index, %arg5: index, %arg6: index, %arg7: index, %arg8: index, %arg9: index, %arg10: index, %arg11: index, %arg12: index): %7 = "arith.addi"(%2, %arg1) <{overflowFlags = #arith.overflow}> : (index, index) -> index "affine.for"() <{lowerBoundMap = affine_map<() -> (0)>, operandSegmentSizes = array, step = 1 : index, upperBoundMap = affine_map<() -> (12)>}> ({ ^bb0(%arg13: index): %8 = "affine.load"(%arg0, %7, %arg13) <{map = affine_map<(d0, d1) -> (d0, d1)>}> : (memref<6x12xf32>, index, index) -> f32 "affine.yield"() : () -> () }) : () -> () "gpu.terminator"() : () -> () }) {workgroup_attributions = 0 : i64} : (index, index, index, index, index, index) -> () "func.return"(%1) : (memref<6x12xf32>) -> () }) : () -> () ``` ## Questions 1. Is there an issue with the original MLIR input? - Are there any preconditions or required passes that I missed before applying convert-affine-for-to-gpu? 2. Could this be a problem in the convert-affine-for-to-gpu implementation? - The error suggests that the indices for affine.load are not considered valid dimensions or symbols. However, %arg1 and %arg2 are induction variables of affine.for loops, which are typically valid. 3. What additional passes should be applied before convert-affine-for-to-gpu? - For instance, should I run canonicalize, lower-affine, or similar passes to simplify the IR and ensure compatibility? 4. Are there any reference backends or examples for converting MLIR with linalg or affine dialects to the GPU dialect? - I am particularly interested in examples or documentation that describe the process and highlight best practices for such conversions. ___ llvm-bugs mailing list llvm-bugs@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 123246] [mlir] -remove-dead-values crashes on scf.if for empty region
Issue 123246 Summary [mlir] -remove-dead-values crashes on scf.if for empty region Labels mlir, crash Assignees Reporter python3kgae

Reproduce with: mlir-opt -remove-dead-values a.mlir

a.mlir:

```
func.func @nested_if(%cond0: i1, %cond1: i1, %cond2: i1, %p: memref<1xf32>) {
  %cst = arith.constant 1.00e+00 : f32
  scf.if %cond0 {
  } else {
    scf.if %cond1 {
    } else {
      scf.if %cond2 {
        affine.store %cst, %p[0] : memref<1xf32>
      }
    }
  }
  return
}
```

It crashes in cleanRegionBranchOp when accessing region.front() or region.back(). This can be worked around by adding

```
if (region.empty())
  continue;
```

before all of these accesses. There is an assert at one of the accesses:

```
assert(!region.empty() && "expected a non-empty region in an op "
       "implementing `RegionBranchOpInterface`");
```

But it seems to be OK for scf.if to return an empty region from getElseRegion.
[llvm-bugs] [Bug 123263] `@llvm.minimumnum.f32` returns sNaN instead of qNaN on x86_64
Issue 123263 Summary `@llvm.minimumnum.f32` returns sNaN instead of qNaN on x86_64 Labels new issue Assignees Reporter sunfishcode

LLVM's documentation for `@llvm.minimumnum.f32` [says](https://llvm.org/docs/LangRef.html#llvm-minimumnum-intrinsic) "If both operands are NaNs (including sNaN), returns qNaN". However, on x86_64, it actually returns sNaN.

Specifically, with this test.c:

```c
#include <stdio.h>
#include <string.h>

float f32_minnumber(float x, float y);

int main() {
  float f = __builtin_nansf("");
  float g = f32_minnumber(f, f);
  float h = g + 0;
  unsigned uf, ug, uh;
  memcpy(&uf, &f, sizeof(f));
  memcpy(&ug, &g, sizeof(f));
  memcpy(&uh, &h, sizeof(f));
  printf("%x\n%x\n%x\n", uf, ug, uh);
  return 0;
}
```

and this minnumber.ll:

```llvmir
target triple = "x86_64-pc-linux-gnu"

define float @f32_minnumber(float %x, float %y) {
  %t = call float @llvm.minimumnum.f32(float %x, float %y)
  ret float %t
}

define double @f64_minnumber(double %x, double %y) {
  %t = call double @llvm.minimumnum.f32(double %x, double %y)
  ret double %t
}

define float @f32_maxnumber(float %x, float %y) {
  %t = call float @llvm.maximumnum.f32(float %x, float %y)
  ret float %t
}

define double @f64_maxnumber(double %x, double %y) {
  %t = call double @llvm.maximumnum.f32(double %x, double %y)
  ret double %t
}
```

Compiling for x86_64 gets this output:

```console
$ clang test.c minnumber.ll
$ ./a.out
7fa0
7fa0
7fe0
$
```

This shows that the operands of the `f32.minimumnum` are sNaN and the result is incorrectly also sNaN. IEEE 754-2019 says of its corresponding `minimumNumber` operation "If both operands are NaNs, a quiet NaN is returned".

I have not tested similar variants for f64, maximumnum, or other architectures.
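To read the bit patterns in issue 123263: in the IEEE 754 binary32 encoding used on x86_64, a NaN has an all-ones exponent and a non-zero significand, and the most significant significand bit (0x00400000) marks it as quiet. The sketch below classifies patterns on that basis. It assumes the printed values correspond to the full patterns 0x7fa00000 and 0x7fe00000, which the flattened console output does not confirm; the helper names are hypothetical.

```cpp
#include <cstdint>

// NaN: exponent bits all set and a non-zero significand.
constexpr bool isNaNBits(std::uint32_t b) {
  return (b & 0x7f800000u) == 0x7f800000u && (b & 0x007fffffu) != 0;
}

// Quiet NaN: the top significand bit is set.
constexpr bool isQuietNaNBits(std::uint32_t b) {
  return isNaNBits(b) && (b & 0x00400000u) != 0;
}

// Signaling NaN: a NaN whose top significand bit is clear.
constexpr bool isSignalingNaNBits(std::uint32_t b) {
  return isNaNBits(b) && (b & 0x00400000u) == 0;
}
```

Under that assumption, the first two printed values classify as signaling and only the result of `g + 0` classifies as quiet, matching the report's reading.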
[llvm-bugs] [Bug 123239] Unnecessarily large constant created from reordering add and shift
Issue 123239
Summary Unnecessarily large constant created from reordering add and shift
Labels new issue
Assignees
Reporter dzaima

https://godbolt.org/z/xoKf6bnTb

The code:
```c
#include <stdint.h>
#include <stdbool.h>

bool foo(uint64_t x) {
    uint16_t tag = x>>48;
    return tag>=0b1111111111110010 && tag<=0b1111111111110100;
}
```
with `-O3` as of clang 19 (and still in trunk) compiles to:
```asm
foo:
        movabs  rax, 3940649673949184
        add     rax, rdi
        shr     rax, 48
        cmp     eax, 3
        setb    al
        ret
```
whereas 18.0 did this, which is strictly better (i.e. it is the exact same set of instructions, just in a different order and without the movabs):
```asm
foo:
        shr     rdi, 48
        add     edi, -65522
        cmp     edi, 3
        setb    al
        ret
```
[llvm-bugs] [Bug 123249] [clang] Compiler crash with "echo 'a; b() { __atomic_test_and_set(a, b); }' | ./clang -cc1 -emit-llvm -o -"
Issue 123249
Summary [clang] Compiler crash with "echo 'a; b() { __atomic_test_and_set(a, b); }' | ./clang -cc1 -emit-llvm -o -"
Labels clang
Assignees
Reporter thurstond

Using clang built from today's source:
```
commit a98df676140c9b3e44f6e094df40d49f53e9a89c (HEAD -> main, upstream/main, upstream/HEAD)
Date: Thu Jan 16 14:00:42 2025 -0800
```
and running this command:
```
$ echo 'a; b() { __atomic_test_and_set(a, b); }' | ./clang -cc1 -emit-llvm -o -
```
crashes the compiler:
```
<stdin>:1:1: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
    1 | a; b() { __atomic_test_and_set(a, b); }
      | ^
      | int
<stdin>:1:4: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
    1 | a; b() { __atomic_test_and_set(a, b); }
      |    ^
      |    int
<stdin>:1:32: error: incompatible integer to pointer conversion passing 'int' to parameter of type 'volatile void *' [-Wint-conversion]
    1 | a; b() { __atomic_test_and_set(a, b); }
      |                                ^
<stdin>:1:35: error: incompatible pointer to integer conversion passing 'int ()' to parameter of type 'int' [-Wint-conversion]
    1 | a; b() { __atomic_test_and_set(a, b); }
      |                                   ^
clang: /usr/local/google/home/thurston/llvm-projectG/clang/include/clang/AST/Type.h:8810: const T *clang::Type::castAs() const [T = clang::PointerType]: Assertion `isa<T>(CanonicalType)' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /usr/local/google/home/thurston/llvm-projectG/build/bin/clang -cc1 -emit-llvm -o -
1. <stdin> parser at end of file
2. <stdin>:1:4: LLVM IR generation of declaration 'b'
3. <stdin>:1:4: Generating code for declaration 'b'
#0 0x5608f980d9a1 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/thurston/llvm-projectG/llvm/lib/Support/Unix/Signals.inc:798:11
#1 0x5608f980de9b PrintStackTraceSignalHandler(void*) /usr/local/google/home/thurston/llvm-projectG/llvm/lib/Support/Unix/Signals.inc:874:1
#2 0x5608f980be96 llvm::sys::RunSignalHandlers() /usr/local/google/home/thurston/llvm-projectG/llvm/lib/Support/Signals.cpp:105:5
#3 0x5608f980e635 SignalHandler(int) /usr/local/google/home/thurston/llvm-projectG/llvm/lib/Support/Unix/Signals.inc:415:1
#4 0x7f0859056590 (/lib/x86_64-linux-gnu/libc.so.6+0x3f590)
#5 0x7f08590a53ac __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
#6 0x7f08590564f2 raise ./signal/../sysdeps/posix/raise.c:27:6
#7 0x7f085903f4ed abort ./stdlib/abort.c:81:7
#8 0x7f085903f415 _nl_load_domain ./intl/loadmsgcat.c:1177:9
#9 0x7f085904f012 (/lib/x86_64-linux-gnu/libc.so.6+0x38012)
#10 0x5608f9df5bb3 clang::PointerType const* clang::Type::castAs() const /usr/local/google/home/thurston/llvm-projectG/clang/include/clang/AST/Type.h:0:3
#11 0x5608fa372b7f clang::CodeGen::CodeGenFunction::EmitBuiltinExpr(clang::GlobalDecl, unsigned int, clang::CallExpr const*, clang::CodeGen::ReturnValueSlot) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGBuiltin.cpp:5135:16
#12 0x5608f9de34d2 clang::CodeGen::CodeGenFunction::EmitCallExpr(clang::CallExpr const*, clang::CodeGen::ReturnValueSlot, llvm::CallBase**) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGExpr.cpp:5607:12
#13 0x5608f9e97918 (anonymous namespace)::ScalarExprEmitter::VisitCallExpr(clang::CallExpr const*) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGExprScalar.cpp:627:36
#14 0x5608f9e8e871 clang::StmtVisitorBase::Visit(clang::Stmt*) /usr/local/google/home/thurston/llvm-projectG/build/tools/clang/include/clang/AST/StmtNodes.inc:614:1
#15 0x5608f9e83435 (anonymous namespace)::ScalarExprEmitter::Visit(clang::Expr*) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGExprScalar.cpp:448:52
#16 0x5608f9e8326a clang::CodeGen::CodeGenFunction::EmitScalarExpr(clang::Expr const*, bool) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGExprScalar.cpp:5590:3
#17 0x5608f9dc03e9 clang::CodeGen::CodeGenFunction::EmitAnyExpr(clang::Expr const*, clang::CodeGen::AggValueSlot, bool) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGExpr.cpp:242:24
#18 0x5608f9dc0299 clang::CodeGen::CodeGenFunction::EmitIgnoredExpr(clang::Expr const*) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGExpr.cpp:217:5
#19 0x5608f9fc7540 clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const*, llvm::ArrayRef) /usr/local/google/home/thurston/llvm-projectG/clang/lib/CodeGen/CGStmt.cpp:129:5
#20 0x5608f9fd1901 clang::CodeGen::CodeGenFunction::EmitCompoundStmtWitho
```
[llvm-bugs] [Bug 123248] [RISCV64] ld.lld: error: relaxation not converged
Issue 123248
Summary [RISCV64] ld.lld: error: relaxation not converged
Labels lld
Assignees
Reporter appujee

Steps to repro
```
clone aosp-main-with-phones
$ source build/envsetup.sh
$ lunch aosp_cf_riscv64_phone-trunk_staging-userdebug
$ m m net_test_stack
FAILED: out/soong/.intermediates/packages/modules/Bluetooth/system/stack/net_test_stack/android_riscv64_cfi/unstripped/net_test_stack64
prebuilts/clang/host/linux-x86/clang-r536225/bin/clang++ out/soong/.intermediates/bionic/libc/crtbegin_dynamic/android_riscv64/crtbegin_dynamic.o @out/soong/.intermediates/packages/modules/Bluetooth/system/stack/net_test_stack/android_riscv64_cfi/unstripped/net_test_stack64.rsp out/soong/.intermediates/bionic/libc/crtend_android/android_riscv64/crtend_android.o -o out/soong/.intermediates/packages/modules/Bluetooth/system/stack/net_test_stack/android_riscv64_cfi/unstripped/net_test_stack64 -target riscv64-linux-android1 -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now -Wl,--build-id=md5 -Wl,--fatal-warnings -Wl,--no-undefined-version -Wl,--exclude-libs,libgcc.a -Wl,--exclude-libs,libgcc_stripped.a -Wl,--exclude-libs,libunwind_llvm.a -Wl,--exclude-libs,libunwind.a -fuse-ld=lld -Wl,--icf=safe -Wl,--no-demangle -Wl,--compress-debug-sections=zstd -Wl,--pack-dyn-relocs=android+relr -Wl,--no-undefined -march=rv64gcv_zba_zbb_zbs -Wl,-mllvm -Wl,-jump-is-expensive=false -Wl,-z,max-page-size=4096 -pie -nostdlib -Bdynamic -Wl,--gc-sections -Wl,-z,nocopyreloc -Wl,-rpath,\$ORIGIN -flto -fsanitize-cfi-cross-dso -fsanitize=cfi -Wl,-plugin-opt,O1 -fsanitize=bounds,cfi -fno-sanitize-link-runtime -Wl,--exclude-libs=libclang_rt.builtins-riscv64-android.a -Wl,--exclude-libs=libclang_rt.ubsan_minimal-riscv64-android.a -Wl,-dynamic-linker,/system/bin/linker64
```
[llvm-bugs] [Bug 123241] [Clang] Missing AddressSpaceCast on CXX PointerToMember global.
Issue 123241
Summary [Clang] Missing AddressSpaceCast on CXX PointerToMember global.
Labels clang
Assignees
Reporter jhuber6

The following code crashes when run on an NVIDIA or AMD GPU due to a missing address space cast: https://godbolt.org/z/3vx6avrT6
```c++
struct S {
  int x;
};

[[clang::loader_uninitialized]] S [[clang::address_space(3)]] s;

int &lookup(int S::*in) { return s.*in; }
```
The generated IR accesses the global `s` but does not emit an address space cast to the generic address space. The cast is not emitted because it is missing from the AST; normally it would be applied prior to the `ReturnStmt`.
[llvm-bugs] [Bug 123231] Erroneous "use of infinity" warning
Issue 123231
Summary Erroneous "use of infinity" warning
Labels clang:diagnostics
Assignees
Reporter ahatanak

clang incorrectly emits a warning when a function named `infinity` is called.

$ cat test.cpp
```
double infinity() { return 0; }

int main() {
  return infinity();
}
```

$ clang++ -ffast-math test.cpp -c
test.cpp:4:11: warning: use of infinity is undefined behavior due to the currently enabled floating-point options [-Wnan-infinity-disabled]
    4 | return infinity();
[llvm-bugs] [Bug 123262] [clang++][aarch64] help optimize __builtin_mul_overflow performance
Issue 123262
Summary [clang++][aarch64] help optimize __builtin_mul_overflow performance
Labels clang
Assignees
Reporter eric-yq

Hi team,

I have sample code that runs about 10 times slower when compiled with clang++ than with g++. The main performance issue is in `__builtin_mul_overflow` under clang++. Can you give some suggestions? I do not want to use both g++ and clang++ in my CI/CD pipeline.

Compile commands and `Time taken` comparison: `( 0.22 seconds vs. 0.02 seconds. )`
```console
# Ubuntu 24.04, g++ 13.3 and clang++ 18.1.3
# Server:AWS c7g.xlarge(AWS Graviton3, Neoverse-V1)

# g++ -std=c++17 -O3 -march=armv8-a+crc testint.cpp -o testint-g++
# ./testint-g++
Time taken for 1000 iterations: 0.0208047 seconds
Sum of results: 9747553088193654009

# clang++ -std=c++17 -O3 -march=armv8-a+crc testint.cpp -o testint-clang++ --rtlib=compiler-rt
# ./testint-clang++
Time taken for 1000 iterations: 0.226598 seconds ( 0.22 seconds vs. 0.02 seconds. )
Sum of results: 18269431752893742105
```
Sample code: testint.cpp
```c
#include <iostream>
#include <chrono>
#include <random>
#include <limits>
#include <vector>
#include <cstdint>
#include <utility>

// Define a 128-bit integer type (if the compiler supports it)
using int128_t = __int128;

// Function under benchmark
inline bool int128_mul_overflow(int128_t a, int128_t b, volatile int128_t* c) {
    return __builtin_mul_overflow(a, b, c);
}

// Generate a random 128-bit integer
int128_t generate_random_int128() {
    static std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<uint64_t> dist(0, std::numeric_limits<uint64_t>::max());
    // Generate two 64-bit integers and combine them into one 128-bit integer
    int128_t high = static_cast<int128_t>(dist(rng));
    int128_t low = static_cast<int128_t>(dist(rng));
    return (high << 64) | low;
}

// Generate random data and store it in a vector
std::vector<std::pair<int128_t, int128_t>> generate_random_data(int count) {
    std::vector<std::pair<int128_t, int128_t>> data;
    data.reserve(count);
    for (int i = 0; i < count; ++i) {
        int128_t a = generate_random_int128();
        int128_t b = generate_random_int128();
        data.emplace_back(a, b);
    }
    return data;
}

// Benchmark function
void benchmark_int128_mul_overflow(const std::vector<std::pair<int128_t, int128_t>>& data) {
    int128_t c = 0;
    int128_t sum = 0; // accumulates the results
    auto start = std::chrono::high_resolution_clock::now();
    for (const auto& pair : data) {
        if (int128_mul_overflow(pair.first, pair.second, &c)) {
            sum += c; // accumulate the result to prevent it being optimized away
        }
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    std::cout << "Time taken for " << data.size() << " iterations: " << duration.count() << " seconds\n";
    std::cout << "Sum of results: " << static_cast<uint64_t>(sum) << "\n"; // print the accumulated sum
}

int main() {
    int iterations = 1000; // adjust the number of iterations as needed
    auto data = generate_random_data(iterations);
    benchmark_int128_mul_overflow(data);
    return 0;
}
```