[llvm-bugs] [Bug 119999] [mlir] Inconsistent output when executing MLIR program with `affine-parallelize` and `--affine-super-vectorize`
Issue 119999
Summary: [mlir] Inconsistent output when executing MLIR program with `affine-parallelize` and `--affine-super-vectorize`
Labels: mlir
Assignees:
Reporter: Emilyaxe

git version: ff939b06a5
system: `Ubuntu 18.04.6 LTS`

## Description:
I am seeing inconsistent results when executing the same MLIR program with and without `affine-parallelize` and `--affine-super-vectorize`. The output becomes correct when either of these two options is removed, so I'm unsure which optimization contains the bug.

## Steps to Reproduce:
### 1. **MLIR Program (tosa.mlir)**:
```
module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %0 = "tosa.const"() <{value = dense<[0, 2, 1]> : tensor<3xi32>}> : () -> tensor<3xi32>
    %1 = "tosa.const"() <{value = dense<-12> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %2 = "tosa.const"() <{value = dense<1676> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %3 = "tosa.const"() <{value = dense<-10> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %4 = tosa.abs %2 : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %5 = tosa.clamp %4 {max_fp = 1.60e+01 : f32, max_int = 16 : i64, min_fp = 0.00e+00 : f32, min_int = 0 : i64} : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %6 = tosa.arithmetic_right_shift %2, %5 {round = true} : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %7 = tosa.minimum %6, %1 : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %8 = tosa.transpose %3, %0 : (tensor<1x4x21xi32>, tensor<3xi32>) -> tensor<1x21x4xi32>
    %9 = tosa.matmul %7, %8 : (tensor<1x4x21xi32>, tensor<1x21x4xi32>) -> tensor<1x4x4xi32>
    %cast = tensor.cast %9 : tensor<1x4x4xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}
```
### 2. **Command to Run without `affine-parallelize` and `--affine-super-vectorize`:**
```
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" |
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt --linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" -one-shot-bufferize="bufferize-function-boundaries" -convert-arith-to-llvm -convert-linalg-to-affine-loops -convert-vector-to-scf -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf -convert-vector-to-llvm -convert-math-to-llvm -convert-arith-to-llvm -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts |
timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```
### 3. **Output without `affine-parallelize` and `--affine-super-vectorize`:**
```
[[[2520, 2520, 2520, 2520],
  [2520, 2520, 2520, 2520],
  [2520, 2520, 2520, 2520],
  [2520, 2520, 2520, 2520]]]
```
### 4. **Command to Run with `affine-parallelize` and `--affine-super-vectorize`:**
```
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" |
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt --linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" -one-shot-bufferize="bufferize-function-boundaries" -convert-arith-to-llvm -convert-linalg-to-affine-loops --affine-parallelize -convert-vector-to-scf -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf --affine-super-vectorize="virtual-vector-size=128 test-fastest-varying=0 vectorize-reductions=true" -convert-vector-to-llvm -convert-math-to-llvm -convert-arith-to-llvm -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts |
timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
```
### 5. **Output with `affine-parallelize` and `-
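As a cross-check that the output without the two passes is the expected one, the TOSA ops in tosa.mlir can be evaluated by hand: every `tosa.const` is a splat, so each elementwise op reduces to a single scalar computation. The C++ below is a hypothetical scalar model, not MLIR's lowering. It assumes TOSA's `arithmetic_right_shift` with `round = true` adds back the last bit shifted out, and the helper names are illustrative only.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical scalar model of the TOSA ops in tosa.mlir (not the MLIR lowering).
// All constants are splats, so each elementwise op is modeled on one value.

// tosa.arithmetic_right_shift: shift right; with round = true, add the last bit shifted out.
int32_t arithmetic_right_shift(int32_t value, int32_t shift, bool round) {
  assert(shift >= 0 && shift < 32);  // valid shift range for 32-bit values
  int32_t result = value >> shift;
  if (round && shift > 0 && ((value >> (shift - 1)) & 1) != 0) {
    result += 1;
  }
  return result;
}

constexpr int kM = 4;   // rows of %7 (tensor<1x4x21xi32>)
constexpr int kK = 21;  // shared (reduction) dimension
constexpr int kN = 4;   // columns of %8 after the transpose (tensor<1x21x4xi32>)

using Matrix4x4 = std::array<std::array<int32_t, kN>, kM>;

Matrix4x4 expected_matmul() {
  const int32_t c1 = -12, c2 = 1676, c3 = -10;              // splat values of %1, %2, %3
  const int32_t v4 = std::abs(c2);                           // %4 = tosa.abs      -> 1676
  const int32_t v5 = std::clamp<int32_t>(v4, 0, 16);         // %5 = tosa.clamp    -> 16
  const int32_t v6 = arithmetic_right_shift(c2, v5, true);   // %6: 1676 >> 16     -> 0
  const int32_t v7 = std::min(v6, c1);                       // %7 = tosa.minimum  -> -12

  // %8 = tosa.transpose of a splat tensor is still a splat of c3.
  Matrix4x4 result{};
  for (int i = 0; i < kM; ++i) {
    for (int j = 0; j < kN; ++j) {
      int32_t acc = 0;
      for (int k = 0; k < kK; ++k) {
        acc += v7 * c3;  // %9 = tosa.matmul: sum over k of A[i][k] * B[k][j]
      }
      result[i][j] = acc;
    }
  }
  return result;
}
```

Each entry works out to 21 * (-12) * (-10) = 2520, which matches the output obtained without `--affine-parallelize` and `--affine-super-vectorize`.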
[llvm-bugs] [Bug 119995] libc: `MSVC_DEBUG_INFORMATION_FORMAT` value 'Embedded' not known for this ASM compiler.
Issue 119995
Summary: libc: `MSVC_DEBUG_INFORMATION_FORMAT` value 'Embedded' not known for this ASM compiler.
Labels: libc
Assignees: SchrodingerZhu
Reporter: petrhosek

In #119806 we're seeing the following error:
```
CMake Error in D:/a/llvm-project/llvm-project/libc/fuzzing/__support/CMakeLists.txt:
  MSVC_DEBUG_INFORMATION_FORMAT value 'Embedded' not known for this ASM compiler.
```
This is due to `-DCMAKE_MSVC_DEBUG_INFORMATION_FORMAT=Embedded` being set in https://github.com/llvm/llvm-project/blob/8c681a929b8684f5a4ad2ebd4e3e4f20036a9595/.github/workflows/libc-overlay-tests.yml#L76-L91

___
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
[llvm-bugs] [Bug 119963] Clang doesn't accept udiv with 2 operands when assembling for Cortex-M3
Issue 119963
Summary: Clang doesn't accept udiv with 2 operands when assembling for Cortex-M3
Labels: clang
Assignees:
Reporter: mateusz-banaszek

When using clang to assemble for ARM Cortex-M3, it doesn't accept a `udiv` instruction with 2 operands, for example:
```asm
.syntax unified

    udiv r10, r5
```
An attempt to assemble that using clang v19.1.5:
```
clang -nodefaultlibs -mcpu=cortex-m3 --target=armv7m-none-eabi ./example.s
```
results in an error:
```
./example.s:3:15: error: too few operands for instruction
    udiv r10, r5
              ^
```
However, the *Arm v7-M Architecture Reference Manual* says that the assembler syntax is `UDIV<c> {<Rd>,} <Rn>, <Rm>`, where *"`<Rd>` Specifies the destination register. If `<Rd>` is omitted, this register is the same as `<Rn>`."* As a result, I expect clang to assemble that into:
```
fbba faf5    udiv r10, r10, r5
```
GCC assembles it correctly: https://godbolt.org/z/oh8TnW99T.
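The expected bytes can be checked against the T1 encoding of UDIV in the Armv7-M ARM: first halfword `1111 1011 1011 Rn`, second halfword `1111 Rd 1111 Rm`. The helper below is a hypothetical sketch, not LLVM's MC layer; it applies the manual's rule that an omitted destination register is the same as `Rn`, modeled with `std::optional`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical encoder for the Armv7-M UDIV T1 encoding (32-bit Thumb):
//   hw1 = 1111 1011 1011 Rn
//   hw2 = 1111 Rd   1111 Rm
// An omitted Rd defaults to Rn, as the Armv7-M ARM assembler syntax specifies.
std::array<uint16_t, 2> encode_udiv_t1(std::optional<unsigned> rd, unsigned rn, unsigned rm) {
  const unsigned dest = rd.value_or(rn);
  assert(dest < 16 && rn < 16 && rm < 16);  // register numbers are 4-bit fields
  const uint16_t hw1 = static_cast<uint16_t>(0xFBB0u | rn);
  const uint16_t hw2 = static_cast<uint16_t>(0xF0F0u | (dest << 8) | rm);
  return {hw1, hw2};
}
```

With this rule, `udiv r10, r5` (no `Rd`) and `udiv r10, r10, r5` both encode to the halfwords `0xFBBA 0xFAF5`, i.e. the `fbba faf5` the report expects.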
[llvm-bugs] [Bug 119967] [packaging] libclc-20-dev from apt.llvm.org doesn't ship clcfunc.h
Issue 119967
Summary: [packaging] libclc-20-dev from apt.llvm.org doesn't ship clcfunc.h
Labels: new issue
Assignees:
Reporter: TheRealCuran

Trying to use `rusticl` from Mesa is, at the moment, not fully possible, since `libclc-20-dev` from apt.llvm.org is missing `clcfunc.h` (`libclc/clc/include/clc/clcfunc.h`).

Currently the following files are installed to /usr/include/clc by libclc-20-dev:
/usr/include/clc/as_type.h
/usr/include/clc/async/async_work_group_copy.h
/usr/include/clc/async/async_work_group_strided_copy.h
/usr/include/clc/async/prefetch.h
/usr/include/clc/async/wait_group_events.h
/usr/include/clc/atomic/atomic_add.h
/usr/include/clc/atomic/atomic_and.h
/usr/include/clc/atomic/atomic_cmpxchg.h
/usr/include/clc/atomic/atomic_dec.h
/usr/include/clc/atomic/atomic_inc.h
/usr/include/clc/atomic/atomic_max.h
/usr/include/clc/atomic/atomic_min.h
/usr/include/clc/atomic/atomic_or.h
/usr/include/clc/atomic/atomic_sub.h
/usr/include/clc/atomic/atomic_xchg.h
/usr/include/clc/atomic/atomic_xor.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_add.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_add.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_cmpxchg.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_dec.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_inc.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_sub.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_xchg.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_and.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_max.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_min.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_or.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_xor.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_add.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h
/usr/include/clc/clc.h
/usr/include/clc/clcmacros.h
/usr/include/clc/common/degrees.h
/usr/include/clc/common/mix.h
/usr/include/clc/common/radians.h
/usr/include/clc/common/sign.h
/usr/include/clc/common/smoothstep.h
/usr/include/clc/common/step.h
/usr/include/clc/convert.h
/usr/include/clc/explicit_fence/explicit_memory_fence.h
/usr/include/clc/float/definitions.h
/usr/include/clc/geometric/cross.h
/usr/include/clc/geometric/distance.h
/usr/include/clc/geometric/dot.h
/usr/include/clc/geometric/fast_distance.h
/usr/include/clc/geometric/fast_length.h
/usr/include/clc/geometric/fast_normalize.h
/usr/include/clc/geometric/length.h
/usr/include/clc/geometric/normalize.h
/usr/include/clc/image/image.h
/usr/include/clc/image/image_defines.h
/usr/include/clc/integer/abs.h
/usr/include/clc/integer/abs_diff.h
/usr/include/clc/integer/add_sat.h
/usr/include/clc/integer/clz.h
/usr/include/clc/integer/definitions.h
/usr/include/clc/integer/hadd.h
/usr/include/clc/integer/mad24.h
/usr/include/clc/integer/mad_hi.h
/usr/include/clc/integer/mad_sat.h
/usr/include/clc/integer/mul24.h
/usr/include/clc/integer/mul_hi.h
/usr/include/clc/integer/popcount.h
/usr/include/clc/integer/rhadd.h
/usr/include/clc/integer/rotate.h
/usr/include/clc/integer/sub_sat.h
/usr/include/clc/integer/upsample.h
/usr/include/clc/math/acos.h
/usr/include/clc/math/acosh.h
/usr/include/clc/math/acospi.h
/usr/include/clc/math/asin.h
/usr/include/clc/math/asinh.h
/usr/include/clc/math/asinpi.h
/usr/include/clc/math/atan.h
/usr/include/clc/math/atan2.h
/usr/include/clc/math/atan2pi.h
/usr/include/clc/math/atanh.h
/usr/include/clc/math/atanpi.h
/usr/include/clc/math/cbrt.h
/usr/include/clc/math/ceil.h
/usr/include/clc/math/copysign.h
/usr/include/clc/math/cos.h
/usr/include/clc/math/co
[llvm-bugs] [Bug 119972] Issues with udf when assembling for Cortex-M3
Issue 119972
Summary: Issues with udf when assembling for Cortex-M3
Labels: new issue
Assignees:
Reporter: mateusz-banaszek

When using clang to assemble for ARM Cortex-M3, I've come across 2 issues with the `udf` (Permanently Undefined) instruction. To illustrate:
```asm
.syntax unified

@ Issue 1
    udf #256

@ Issue 2
    it eq
    udfeq #20
```
An attempt to assemble that using clang v19.1.5:
```
clang -nodefaultlibs -mcpu=cortex-m3 --target=armv7m-none-eabi ./example.s
```
results in 2 errors:
```
./example.s:4:9: error: operand must be an immediate in the range [0,255]
    udf #256
        ^
./example.s:8:1: error: instruction 'udf' is not predicable, but condition code specified
    udfeq #20
^
```
However, I don't see a reason for that. The *Arm v7-M Architecture Reference Manual* shows 2 instruction encodings; T2 accepts `imm16`, and both accept a condition:
> Encoding T1: `UDF<c> #<imm8>`
> Encoding T2: `UDF<c>.W #<imm16>`

Surprisingly, the [`udf-thumb-2-diagnostics.s`](https://github.com/llvm/llvm-project/blob/main/llvm/test/MC/ARM/udf-thumb-2-diagnostics.s) test verifies that clang does generate these errors. However, even after reviewing the [`27351f2`](https://github.com/llvm/llvm-project/commit/27351f2022c56b830f91d7f526775693fd9043e9) commit which implemented `udf`, I still don't see a reason for that.

I expect the snippet to be assembled to:
```
f7f0 a100    udf.w #256
bf08         it eq
de14         udfeq #20
```
So for Issue 1, I expect clang to choose the wide encoding by itself (as it does for, e.g., `adds r0, r1, #1024` vs. `adds r0, r1, #2`), whereas for Issue 2, I expect clang to assemble it, since the instruction accepts a condition and is predicable (*Arm v7-M Architecture Reference Manual*: *"`UNDEFINED` Indicates an instruction that generates an Undefined Instruction exception."*). GCC assembles it correctly: https://godbolt.org/z/Y8q7KbzKG.

Or is there something I don't understand?
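For reference, the two UDF encodings cited in the report can be sketched as follows, assuming the Armv7-M ARM bit layouts: T1 is the 16-bit halfword `1101 1110 imm8`, and T2 is `1111 0111 1111 imm4` followed by `1010 imm12`. The encoder is hypothetical, not LLVM's; it only mirrors the report's expectation that an immediate above 255 selects the wide form, and it does not model IT-block predication.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical UDF encoder (not LLVM's): returns the Thumb halfwords in order.
std::vector<uint16_t> encode_udf(uint32_t imm) {
  assert(imm <= 0xFFFF);  // T2 takes at most a 16-bit immediate
  if (imm <= 0xFF) {
    // Encoding T1 (16-bit): 1101 1110 imm8
    return {static_cast<uint16_t>(0xDE00u | imm)};
  }
  // Encoding T2 (32-bit, UDF.W): 1111 0111 1111 imm4 | 1010 imm12
  const uint16_t hw1 = static_cast<uint16_t>(0xF7F0u | (imm >> 12));
  const uint16_t hw2 = static_cast<uint16_t>(0xA000u | (imm & 0xFFFu));
  return {hw1, hw2};
}
```

Under these layouts, `#20` yields `0xDE14` and `#256` yields `0xF7F0 0xA100`, the same bytes as the `de14` and `f7f0 a100` the report expects.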
[llvm-bugs] [Bug 119979] `Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp` fails on 32-bit systems
Issue 119979
Summary: `Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp` fails on 32-bit systems
Labels:
Assignees:
Reporter: mgorny

The test added in #119719 is failing on 32-bit systems (e.g. x86):
```
FAIL: Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp (18443 of 21238)
TEST 'Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp' FAILED
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /var/tmp/portage/llvm-core/clang-20.0.0./work/x/y/clang-abi_x86_32.x86/bin/clang --driver-mode=cl -fms-compatibility -Xclang -ast-dump -fsyntax-only -- /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp | /usr/lib/llvm/20/bin/FileCheck /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp
+ /var/tmp/portage/llvm-core/clang-20.0.0./work/x/y/clang-abi_x86_32.x86/bin/clang --driver-mode=cl -fms-compatibility -Xclang -ast-dump -fsyntax-only -- /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp
+ /usr/lib/llvm/20/bin/FileCheck /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp
/var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp:21:12: error: CHECK: expected string not found in input
// CHECK: CXXMethodDecl {{.*}} foo 'int ()' delete
^
:34:61: note: scanning from here
| `-NoBuiltinAttr 0x57422580 <> Implicit fabsf
^
:44:4: note: possible intended match here
| |-CXXMethodDecl 0x57470738 col:9 foo 'int () __attribute__((thiscall))' delete implicit-inline
^

Input file:
Check file: /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp

-dump-input=help explains the following input dump.
Input was:
<<
 1: TranslationUnitDecl 0x57421708 <>
 2: |-CXXRecordDecl 0x57421b60 <> implicit struct _GUID
 3: | `-TypeVisibilityAttr 0x57421be0 <> Implicit Default
 4: |-TypedefDecl 0x57421ec0 <> implicit __NSConstantString '__NSConstantString_tag'
 5: | `-RecordType 0x57421ce0 '__NSConstantString_tag'
 6: | `-CXXRecord 0x57421c88 '__NSConstantString_tag'
 7: |-CXXRecordDecl 0x57421ef0 <> implicit class type_info
 8: | `-TypeVisibilityAttr 0x57421f70 <> Implicit Default
 9: |-TypedefDecl 0x57421fc8 <> implicit size_t 'unsigned int'
10: | `-BuiltinType 0x574217f0 'unsigned int'
11: |-TypedefDecl 0x57421c58 <> implicit __builtin_va_list 'char *'
12: | `-PointerType 0x57421c20 'char *'
13: | `-BuiltinType 0x57421770 'char'
14: |-LinkageSpecDecl 0x57422010 col:8 C
15: | `-FunctionDecl 0x57422140 col:35 fabsf 'float (float) __attribute__((cdecl))':'float (float)' inline
16: | |-ParmVarDecl 0x57422060 col:49 _X 'float'
17: | |-BuiltinAttr 0x57422208 <> Implicit 556
18: | |-NoThrowAttr 0x57422248 Implicit
19: | `-ConstAttr 0x57422268 Implicit
20: |-FunctionDecl 0x57422350 prev 0x57422140 line:6:26 fabsf 'float (float) __attribute__((cdecl))':'float (float)' inline
21: | |-ParmVarDecl 0x574222a0 col:40 _X 'float'
22: | |-CompoundStmt 0x574224bc
23: | | `-ReturnStmt 0x574224b0
24: | | `-ImplicitCastExpr 0x574224a0 'float'
25: | | `-IntegerLiteral 0x57422480 'int' 0
26: | |-BuiltinAttr 0x574223e0 <> Inherited Implicit 556
27: | |-NoThrowAttr 0x57422400 Inherited Implicit
28: | |-ConstAttr 0x57422420 Inherited Implicit
29: | `-NoBuiltinAttr 0x57422440 <> Implicit fabsf
30: |-FunctionDecl 0x57422518 line:13:5 bar 'int ()'
31: | |-CompoundStmt 0x57422608
32: | | `-ReturnStmt 0x574225fc
33: | | `-IntegerLiteral 0x574225e0 'int' 0
34: | `-NoBuiltinAttr 0x57422580 <> Implicit fabsf
check:21'0 X error: no match found
35: |-CXXRecordDecl 0x57422620 line:19:8 struct A definition
check:21'0
36: | |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_user_declared_ctor has_constexpr_non_copy_move_ctor can_const_default_init
check:21'0 ~~~
37: | | |-DefaultConstructor exists trivial constexpr defaulted_is_constexpr
check:21'0 ~~~
```
[llvm-bugs] [Bug 119956] Misaligned LOAD segment on llvm-strip output
Issue 119956
Summary: Misaligned LOAD segment on llvm-strip output
Labels: new issue
Assignees:
Reporter: tzik

I hit a misaligned LOAD segment in llvm-strip output. The second column (Offset) in the output below should end with 000 to be aligned correctly.
```
- LOAD 0x00c000 0xc000 0xc000 0x001488 0x001489 RW 0x1000
+ LOAD 0x00c002 0xc000 0xc000 0x001488 0x001489 RW 0x1000
```
#56738 may be a related issue, but my case didn't use llvm-bolt, and my llvm-strip was new enough to contain [a suggested fix there](https://github.com/llvm/llvm-project/issues/56738#issuecomment-2449258980). (I'm not sure whether this is an llvm-strip issue or a mold issue, though.)

Here is a repro case:
```
#!/bin/bash
set -eu
cd "$(dirname "$0")"
if [ ! -e abseil-cpp ]; then
  git clone --depth=1 -b 20240722.0 https://github.com/abseil/abseil-cpp.git
fi
rm -rf build
mkdir -p build
cd build
clang++ --version
ld.mold --version
llvm-strip --version
echo '__attribute__((weak)) void foo() {}' > dummy.cc
clang -fPIC -o dummy.o -c dummy.cc
clang -fuse-ld=mold -shared -o 00.so dummy.o
so_list=({01..22}.so)
for so_file in "${so_list[@]}"; do
  cp 00.so "${so_file}"
done
clang++ -Draw_hash_set_EXPORTS -I../abseil-cpp -DNDEBUG -fPIC -o foo.o -c ../abseil-cpp/absl/container/internal/raw_hash_set.cc
clang++ -fuse-ld=mold -shared -Wl,-soname,foo.so -o foo.so foo.o -Wl,-rpath,'$ORIGIN' "${so_list[@]}"
llvm-readelf -W --segments foo.so | grep LOAD > before_strip
llvm-strip foo.so
llvm-readelf -W --segments foo.so | grep LOAD > after_strip
diff -u0 before_strip after_strip
```
and its output on my env was:
```
clang version 19.1.5 (https://github.com/llvm/llvm-project.git ab4b5a2db582958af1ee308a790cfdb42bd24720)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /home/tzik/work/llvm/out/bin
mold 1.0.3 (compatible with GNU ld)
llvm-strip, compatible with GNU strip
LLVM (http://llvm.org/):
  LLVM version 19.1.5
  Optimized build.
--- before_strip 2024-12-14 14:48:09.957991233 +0900
+++ after_strip 2024-12-14 14:48:09.967991256 +0900
@@ -3 +3 @@
- LOAD 0x00c000 0xc000 0xc000 0x001488 0x001489 RW 0x1000
+ LOAD 0x00c002 0xc000 0xc000 0x001488 0x001489 RW 0x1000
```
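The alignment rule at stake is the ELF program-header requirement that, for loadable segments, `p_offset` and `p_vaddr` be congruent modulo `p_align` (values 0 and 1 mean no constraint). A minimal check, using a hypothetical helper name, applied to the two LOAD lines from the diff:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: ELF requires p_offset % p_align == p_vaddr % p_align
// for loadable segments; p_align of 0 or 1 means no alignment is required.
bool load_segment_congruent(uint64_t p_offset, uint64_t p_vaddr, uint64_t p_align) {
  if (p_align <= 1) {
    return true;
  }
  assert((p_align & (p_align - 1)) == 0);  // p_align must be a power of two
  return p_offset % p_align == p_vaddr % p_align;
}
```

The segment before stripping passes (offset and address both leave remainder 0x0 modulo 0x1000), while the stripped one fails: offset 0x00c002 leaves remainder 0x2 against the address's 0x0.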
[llvm-bugs] [Bug 119959] [llvm] cmpxchg16b uses pointer from overwritten rbx
Issue 119959
Summary: [llvm] cmpxchg16b uses pointer from overwritten rbx
Labels: new issue
Assignees:
Reporter: vasama

Reduced IR:
```ll
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "amd64-pc-windows-msvc19.41.34123"

%struct.anon = type { [2 x %"struct.(anonymous namespace)::mt_shared_object"], %"class.vsm::atomic_intrusive_ptr", [48 x i8], %"struct.std::atomic.3", [56 x i8], %"struct.std::atomic_flag", [60 x i8] }
%"struct.(anonymous namespace)::mt_shared_object" = type { %"class.vsm::detail::basic_intrusive_refcount", %"struct.std::atomic_flag", ptr, [40 x i8] }
%"class.vsm::detail::basic_intrusive_refcount" = type { %"struct.vsm::detail::intrusive_refcount_base" }
%"struct.vsm::detail::intrusive_refcount_base" = type { %"class.vsm::atomic" }
%"class.vsm::atomic" = type { i64 }
%"class.vsm::atomic_intrusive_ptr" = type { %"class.vsm::atomic.2" }
%"class.vsm::atomic.2" = type { %"struct.vsm::atomic_intrusive_ptr<(anonymous namespace)::mt_shared_object>::atom" }
%"struct.vsm::atomic_intrusive_ptr<(anonymous namespace)::mt_shared_object>::atom" = type { ptr, i64 }
%"struct.std::atomic.3" = type { %"struct.std::_Atomic_integral_facade.4" }
%"struct.std::_Atomic_integral_facade.4" = type { %"struct.std::_Atomic_integral.5" }
%"struct.std::_Atomic_integral.5" = type { %"struct.std::_Atomic_storage.6" }
%"struct.std::_Atomic_storage.6" = type { %"struct.std::_Atomic_padded.7" }
%"struct.std::_Atomic_padded.7" = type { i64 }
%"struct.std::atomic_flag" = type { %"struct.std::atomic" }
%"struct.std::atomic" = type { %"struct.std::_Atomic_integral_facade" }
%"struct.std::_Atomic_integral_facade" = type { %"struct.std::_Atomic_integral" }
%"struct.std::_Atomic_integral" = type { %"struct.std::_Atomic_storage" }
%"struct.std::_Atomic_storage" = type { %"struct.std::_Atomic_padded" }
%"struct.std::_Atomic_padded" = type { i32 }

define fastcc void @"?test_case@?A0x7E1854EA@@YAXXZ"() #0 personality ptr @__CxxFrameHandler3 {
  %1 = alloca [0 x [0 x %struct.anon]], i32 0, align 64
  %2 = cmpxchg ptr %1, i128 0, i128 0 monotonic monotonic, align 16
  invoke void @"?_Throw_Cpp_error@std@@YAXH@Z"(i32 0)
          to label %3 unwind label %4

3:                                                ; preds = %0
  unreachable

4:                                                ; preds = %0
  %5 = cleanuppad within none []
  ret void
}

declare i32 @__CxxFrameHandler3(...)

declare void @"?_Throw_Cpp_error@std@@YAXH@Z"()

; uselistorder directives
uselistorder i32 0, { 1, 0 }

attributes #0 = { "target-cpu"="nehalem" }
```
Here is the resulting object code (`clang-19 -cc1 -emit-obj -triple "amd64-pc-windows-msvc19.41.34123" -O3 reduced.ll -o - | llvm-objdump-19 -M intel -d -`):
```asm
:
       0: 55                        push    rbp
       1: 53                        push    rbx
       2: 48 83 ec 68               sub     rsp, 0x68
       6: 48 8d 6c 24 60            lea     rbp, [rsp + 0x60]
       b: 48 83 e4 c0               and     rsp, -0x40
       f: 48 89 e3                  mov     rbx, rsp
      12: 48 89 6b 58               mov     qword ptr [rbx + 0x58], rbp
      16: 48 c7 45 00 fe ff ff ff   mov     qword ptr [rbp], -0x2
      1e: 49 89 d8                  mov     r8, rbx
      21: 45 31 c9                  xor     r9d, r9d
      24: 31 c0                     xor     eax, eax
      26: 31 d2                     xor     edx, edx
      28: 31 c9                     xor     ecx, ecx
      2a: 4c 89 cb                  mov     rbx, r9
      2d: f0                        lock
      2e: 48 0f c7 4b 40            cmpxchg16b      xmmword ptr [rbx + 0x40]
      33: 4c 89 c3                  mov     rbx, r8
      36: 31 c9                     xor     ecx, ecx
      38: e8 00 00 00 00            call    0x3d
      3d: cc                        int3
      3e: 66 90                     nop
```
Note `mov rbx, r9` followed by `cmpxchg16b xmmword ptr [rbx + 0x40]`: `rbx` is used as the address base right after being overwritten for the purposes of `cmpxchg16b`, which uses it as an input register.

The original unreduced input produces slightly different object code but has the same problem:
```asm
7FF786689459  lea         r8,[rbx+100h]
7FF786689460  mov         rax,qword ptr [rbx+140h]
7FF786689467  mov         rdx,qword ptr [rbx+148h]
7FF78668946E  nop
7FF786689470  mov         r9,rbx
7FF786689473  xor         ecx,ecx
7FF786689475  mov         rbx,r8
7FF786689478  lock cmpxchg16b oword ptr [rbx+140h]
```
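Why this is wrong follows from the instruction's implicit operands: CMPXCHG16B compares RDX:RAX with the 16-byte memory operand and, on a match, stores RCX:RBX there; otherwise it loads the memory value into RDX:RAX. The C++ below is a hypothetical model of that data flow only (registers as plain fields, no atomicity, no flags register). It shows that RBX carries the low half of the new value, so any address held in RBX has to live in another register by the time RBX is loaded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register-file model holding only the registers CMPXCHG16B uses implicitly.
struct Regs {
  uint64_t rax, rbx, rcx, rdx;
};

// Models the data flow of `lock cmpxchg16b [mem]`:
// compares RDX:RAX with the 16-byte value at mem (mem[0] = low qword, mem[1] = high qword).
// On a match it stores RCX:RBX to mem and returns true (ZF = 1);
// otherwise it loads mem into RDX:RAX and returns false (ZF = 0).
bool cmpxchg16b(Regs& r, uint64_t mem[2]) {
  assert(mem != nullptr);
  if (mem[0] == r.rax && mem[1] == r.rdx) {
    mem[0] = r.rbx;  // RBX supplies the low half of the new value
    mem[1] = r.rcx;  // RCX supplies the high half
    return true;
  }
  r.rax = mem[0];
  r.rdx = mem[1];
  return false;
}
```

In the reduced disassembly, `mov rbx, r9` sets RBX to zero (r9d was cleared just before), so `[rbx + 0x40]` addresses 0x40 rather than the stack slot whose base was saved in r8.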
[llvm-bugs] [Bug 119960] [openmp] Problem with OpenMP user-declared reduction
Issue 119960
Summary: [openmp] Problem with OpenMP user-declared reduction
Labels:
Assignees:
Reporter: CDK6182CHR

## Brief description
Recently, I encountered a problem while using an OpenMP user-declared reduction (`#pragma omp declare reduction`) for Eigen arrays or vectors. I found that the reduction may fail (producing incorrect results that can differ from run to run) when I use the `clang` compiler with the LLVM OpenMP library. The same problem does not occur with the GCC compiler.

I wonder whether I did something wrong (for example, an incorrect configuration of LLVM or an incorrect declaration of the OpenMP reduction), or whether there is an issue within LLVM OpenMP. I would appreciate any suggestions about this problem. Thanks in advance.

## Version and environment
LLVM version 18.1.8, built with the GCC 14.1.0 compiler, using the CMake configuration command:
```shell
CC=gcc CXX=g++ cmake ../llvm -DLLVM_ENABLE_PROJECTS="clang;openmp;flang;compiler-rt" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/18.1.8
```
The GCC toolchain selected by the `clang` compiler (explicitly specified with `clang++ --gcc-toolchain=/xxx/gcc14.1.0`) is GCC 14.1.0. The OS that runs the test is CentOS 7. The latest Eigen (3.4.0) is used.

## Minimal example
Please see the attached ZIP file for this example, including source and CMake. The code is also listed here for convenience:
```C++
#include
#include
#include
#pragma omp declare reduction(redEigen: Eigen::Vector2i: omp_out += omp_in) initializer(omp_priv=omp_orig)
void test_reduction() {
  Eigen::Vector2i res;
  res.setZero();
  constexpr int num = 200;
  #pragma omp parallel for reduction(redEigen:res)
  for (int i=0; i
```