[llvm-bugs] [Bug 119999] [mlir] Inconsistent output when executing MLIR program with `affine-parallelize` and `--affine-super-vectorize`

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119999
Summary: [mlir] Inconsistent output when executing MLIR program with `affine-parallelize` and `--affine-super-vectorize`
Labels: mlir
Assignees: (none)
Reporter: Emilyaxe

git version: ff939b06a5

system: `Ubuntu 18.04.6 LTS`

## Description:
I get inconsistent results when running the same MLIR program with and without `affine-parallelize` and `--affine-super-vectorize`.
The output is correct when either of these two options is removed, so I'm unsure which pass contains the bug.


## Steps to Reproduce:


### 1. **MLIR Program (tosa.mlir)**:
```mlir
module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %0 = "tosa.const"() <{value = dense<[0, 2, 1]> : tensor<3xi32>}> : () -> tensor<3xi32>
    %1 = "tosa.const"() <{value = dense<-12> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %2 = "tosa.const"() <{value = dense<1676> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %3 = "tosa.const"() <{value = dense<-10> : tensor<1x4x21xi32>}> : () -> tensor<1x4x21xi32>
    %4 = tosa.abs %2 : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %5 = tosa.clamp %4 {max_fp = 1.60e+01 : f32, max_int = 16 : i64, min_fp = 0.00e+00 : f32, min_int = 0 : i64} : (tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %6 = tosa.arithmetic_right_shift %2, %5 {round = true} : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %7 = tosa.minimum %6, %1 : (tensor<1x4x21xi32>, tensor<1x4x21xi32>) -> tensor<1x4x21xi32>
    %8 = tosa.transpose %3, %0 : (tensor<1x4x21xi32>, tensor<3xi32>) -> tensor<1x21x4xi32>
    %9 = tosa.matmul %7, %8 : (tensor<1x4x21xi32>, tensor<1x21x4xi32>) -> tensor<1x4x4xi32>
    %cast = tensor.cast %9 : tensor<1x4x4xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}
```

### 2. **Command to run without `affine-parallelize` and `--affine-super-vectorize`:**

``` 
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" | /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt --linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" -one-shot-bufferize="bufferize-function-boundaries" -convert-arith-to-llvm -convert-linalg-to-affine-loops -convert-vector-to-scf -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf -convert-vector-to-llvm -convert-math-to-llvm -convert-arith-to-llvm -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

``` 

### 3. **Output without `affine-parallelize` and `--affine-super-vectorize`:**

``` 
[[[2520, 2520, 2520, 2520],
  [2520, 2520, 2520, 2520],
  [2520, 2520, 2520, 2520],
  [2520, 2520, 2520, 2520]]]

``` 
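For reference, the expected value can be worked out by hand. The Python sketch below re-evaluates the program on plain integers. The rounding rule used for `arithmetic_right_shift` (add 1 when `round` is set and the last bit shifted out is 1) is my reading of the TOSA specification and is an assumption, not MLIR's actual lowering; the helper names are made up for illustration. It gives 2520 for every element, consistent with the output above without the two passes:

```python
def arithmetic_right_shift(value: int, shift: int, round_: bool) -> int:
    """TOSA-style arithmetic right shift (assumed semantics, see lead-in)."""
    result = value >> shift  # Python's >> on ints is an arithmetic (sign-preserving) shift
    if round_ and shift > 0 and (value >> (shift - 1)) & 1:
        result += 1          # round: add back the last bit shifted out
    return result


def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, x))


# Splat constants from tosa.mlir: %1, %2, %3.
c1, c2, c3 = -12, 1676, -10

v5 = clamp(abs(c2), 0, 16)                 # %4 = tosa.abs, %5 = tosa.clamp -> 16
v6 = arithmetic_right_shift(c2, v5, True)  # %6: 1676 >> 16 with rounding  -> 0
v7 = min(v6, c1)                           # %7 = tosa.minimum              -> -12

# %7 is 1x4x21 and %8 = transpose(%3, [0, 2, 1]) is 1x21x4; drop the batch dim.
M, K, N = 4, 21, 4
lhs = [[v7] * K for _ in range(M)]
rhs = [[c3] * N for _ in range(K)]

# %9 = tosa.matmul: plain row-by-column dot products.
result = [[sum(lhs[i][k] * rhs[k][j] for k in range(K)) for j in range(N)]
          for i in range(M)]
for row in result:
    print(row)  # [2520, 2520, 2520, 2520]
```

Each entry is a dot product of 21 terms of (-12) × (-10) = 120, i.e. 21 × 120 = 2520.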

### 4. **Command to run with `affine-parallelize` and `--affine-super-vectorize`:**

``` 
/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt tosa.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named,tosa-to-linalg))" | /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt --linalg-generalize-named-ops -tosa-to-arith -convert-math-to-llvm --test-linalg-elementwise-fusion-patterns="fuse-generic-ops-control" -one-shot-bufferize="bufferize-function-boundaries" -convert-arith-to-llvm -convert-linalg-to-affine-loops --affine-parallelize -convert-vector-to-scf -convert-arith-to-llvm --affine-loop-coalescing -convert-vector-to-scf --affine-super-vectorize="virtual-vector-size=128 test-fastest-varying=0 vectorize-reductions=true" -convert-vector-to-llvm -convert-math-to-llvm -convert-arith-to-llvm -lower-affine -convert-scf-to-cf -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so
``` 

### 5. **Output with  `affine-parallelize` and `-

[llvm-bugs] [Bug 119995] libc: `MSVC_DEBUG_INFORMATION_FORMAT` value 'Embedded' not known for this ASM compiler.

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119995
Summary: libc: `MSVC_DEBUG_INFORMATION_FORMAT` value 'Embedded' not known for this ASM compiler.
Labels: libc
Assignees: SchrodingerZhu
Reporter: petrhosek

In #119806 we're seeing the following error:
```
CMake Error in D:/a/llvm-project/llvm-project/libc/fuzzing/__support/CMakeLists.txt:
 MSVC_DEBUG_INFORMATION_FORMAT value 'Embedded' not known for this ASM
 compiler.
```
This is due to `-DCMAKE_MSVC_DEBUG_INFORMATION_FORMAT=Embedded` being set in https://github.com/llvm/llvm-project/blob/8c681a929b8684f5a4ad2ebd4e3e4f20036a9595/.github/workflows/libc-overlay-tests.yml#L76-L91


___
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs


[llvm-bugs] [Bug 119963] Clang doesn't accept udiv with 2 operands when assembling for Cortex-M3

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119963
Summary: Clang doesn't accept udiv with 2 operands when assembling for Cortex-M3
Labels: clang
Assignees: (none)
Reporter: mateusz-banaszek

When assembling for Arm Cortex-M3, clang doesn't accept a `udiv` instruction with two operands, for example:

```asm
.syntax unified

udiv r10, r5
```

Assembling it with clang v19.1.5:

```
clang -nodefaultlibs -mcpu=cortex-m3 --target=armv7m-none-eabi ./example.s
```

results in an error:

```
./example.s:3:15: error: too few operands for instruction
udiv   r10, r5
  ^
```

However, the *Armv7-M Architecture Reference Manual* gives the assembler syntax as `UDIV<c><q> {<Rd>,} <Rn>, <Rm>`, where *"`<Rd>` Specifies the destination register. If `<Rd>` is omitted, this register is the same as `<Rn>`."* I therefore expect clang to assemble it into:
```
fbba faf5   udiv r10, r10, r5
```
GCC assembles it correctly: https://godbolt.org/z/oh8TnW99T.
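The expansion the reporter expects can be sketched in Python. The helper names (`parse_reg`, `encode_udiv_t1`) are hypothetical and this is not LLVM's implementation: it only fills in `<Rd>` from `<Rn>` when `<Rd>` is omitted and packs the Thumb-2 UDIV encoding T1 (`1111 1011 1011 Rn | 1111 Rd 1111 Rm`), without the manual's UNPREDICTABLE-register checks. It reproduces the `fbba faf5` halfwords shown above:

```python
REG_ALIASES = {"sp": 13, "lr": 14, "pc": 15}


def parse_reg(name: str) -> int:
    """Map 'r0'..'r15' (or sp/lr/pc) to a register number."""
    name = name.strip().lower()
    if name in REG_ALIASES:
        return REG_ALIASES[name]
    if name.startswith("r") and name[1:].isdigit() and int(name[1:]) <= 15:
        return int(name[1:])
    raise ValueError(f"unknown register: {name!r}")


def encode_udiv_t1(operands: str) -> tuple:
    """Encode 'UDIV {Rd,} Rn, Rm' as the two halfwords of Thumb-2 encoding T1."""
    regs = [parse_reg(r) for r in operands.split(",")]
    if len(regs) == 2:
        rn, rm = regs
        rd = rn  # <Rd> omitted: the destination is the same as <Rn>
    elif len(regs) == 3:
        rd, rn, rm = regs
    else:
        raise ValueError("UDIV takes two or three register operands")
    hw1 = 0xFBB0 | rn              # 1111 1011 1011 Rn
    hw2 = 0xF0F0 | (rd << 8) | rm  # 1111 Rd 1111 Rm
    return hw1, hw2


hw1, hw2 = encode_udiv_t1("r10, r5")
print(f"{hw1:04x} {hw2:04x}")  # fbba faf5
```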



[llvm-bugs] [Bug 119967] [packaging] libclc-20-dev from apt.llvm.org doesn't ship clcfunc.h

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119967
Summary: [packaging] libclc-20-dev from apt.llvm.org doesn't ship clcfunc.h
Labels: new issue
Assignees: (none)
Reporter: TheRealCuran

Using `rusticl` from Mesa is currently not fully possible, because `libclc-20-dev` from apt.llvm.org is missing `clcfunc.h` (`libclc/clc/include/clc/clcfunc.h`).


Currently, libclc-20-dev installs the following files to /usr/include/clc:

/usr/include/clc/as_type.h
/usr/include/clc/async/async_work_group_copy.h
/usr/include/clc/async/async_work_group_strided_copy.h
/usr/include/clc/async/prefetch.h
/usr/include/clc/async/wait_group_events.h
/usr/include/clc/atomic/atomic_add.h
/usr/include/clc/atomic/atomic_and.h
/usr/include/clc/atomic/atomic_cmpxchg.h
/usr/include/clc/atomic/atomic_dec.h
/usr/include/clc/atomic/atomic_inc.h
/usr/include/clc/atomic/atomic_max.h
/usr/include/clc/atomic/atomic_min.h
/usr/include/clc/atomic/atomic_or.h
/usr/include/clc/atomic/atomic_sub.h
/usr/include/clc/atomic/atomic_xchg.h
/usr/include/clc/atomic/atomic_xor.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_add.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_cmpxchg.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_dec.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_inc.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_sub.h
/usr/include/clc/cl_khr_global_int32_base_atomics/atom_xchg.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_and.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_max.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_min.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_or.h
/usr/include/clc/cl_khr_global_int32_extended_atomics/atom_xor.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_add.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_cmpxchg.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_dec.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_inc.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_sub.h
/usr/include/clc/cl_khr_int64_base_atomics/atom_xchg.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_and.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_max.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_min.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_or.h
/usr/include/clc/cl_khr_int64_extended_atomics/atom_xor.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_add.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_cmpxchg.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_dec.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_inc.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_sub.h
/usr/include/clc/cl_khr_local_int32_base_atomics/atom_xchg.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_and.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_max.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_min.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_or.h
/usr/include/clc/cl_khr_local_int32_extended_atomics/atom_xor.h
/usr/include/clc/clc.h
/usr/include/clc/clcmacros.h
/usr/include/clc/common/degrees.h
/usr/include/clc/common/mix.h
/usr/include/clc/common/radians.h
/usr/include/clc/common/sign.h
/usr/include/clc/common/smoothstep.h
/usr/include/clc/common/step.h
/usr/include/clc/convert.h
/usr/include/clc/explicit_fence/explicit_memory_fence.h
/usr/include/clc/float/definitions.h
/usr/include/clc/geometric/cross.h
/usr/include/clc/geometric/distance.h
/usr/include/clc/geometric/dot.h
/usr/include/clc/geometric/fast_distance.h
/usr/include/clc/geometric/fast_length.h
/usr/include/clc/geometric/fast_normalize.h
/usr/include/clc/geometric/length.h
/usr/include/clc/geometric/normalize.h
/usr/include/clc/image/image.h
/usr/include/clc/image/image_defines.h
/usr/include/clc/integer/abs.h
/usr/include/clc/integer/abs_diff.h
/usr/include/clc/integer/add_sat.h
/usr/include/clc/integer/clz.h
/usr/include/clc/integer/definitions.h
/usr/include/clc/integer/hadd.h
/usr/include/clc/integer/mad24.h
/usr/include/clc/integer/mad_hi.h
/usr/include/clc/integer/mad_sat.h
/usr/include/clc/integer/mul24.h
/usr/include/clc/integer/mul_hi.h
/usr/include/clc/integer/popcount.h
/usr/include/clc/integer/rhadd.h
/usr/include/clc/integer/rotate.h
/usr/include/clc/integer/sub_sat.h
/usr/include/clc/integer/upsample.h
/usr/include/clc/math/acos.h
/usr/include/clc/math/acosh.h
/usr/include/clc/math/acospi.h
/usr/include/clc/math/asin.h
/usr/include/clc/math/asinh.h
/usr/include/clc/math/asinpi.h
/usr/include/clc/math/atan.h
/usr/include/clc/math/atan2.h
/usr/include/clc/math/atan2pi.h
/usr/include/clc/math/atanh.h
/usr/include/clc/math/atanpi.h
/usr/include/clc/math/cbrt.h
/usr/include/clc/math/ceil.h
/usr/include/clc/math/copysign.h
/usr/include/clc/math/cos.h
/usr/include/clc/math/co

[llvm-bugs] [Bug 119972] Issues with udf when assembling for Cortex-M3

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119972
Summary: Issues with udf when assembling for Cortex-M3
Labels: new issue
Assignees: (none)
Reporter: mateusz-banaszek

When assembling for Arm Cortex-M3 with clang, I've come across two issues with the `udf` (Permanently Undefined) instruction. To illustrate:

```asm
.syntax unified

@ Issue 1
udf #256

@ Issue 2
it eq
udfeq   #20
```

Assembling it with clang v19.1.5:

```
clang -nodefaultlibs -mcpu=cortex-m3 --target=armv7m-none-eabi ./example.s
```

results in 2 errors:

```
./example.s:4:9: error: operand must be an immediate in the range [0,255]
udf #256
^
./example.s:8:1: error: instruction 'udf' is not predicable, but condition code specified
udfeq #20
^
```

However, I don't see a reason for that. The *Armv7-M Architecture Reference Manual* lists two encodings: T2 accepts an `imm16`, and both accept a condition:

> Encoding T1: `UDF<c> #<imm8>`
> Encoding T2: `UDF<c>.W #<imm16>`

Surprisingly, the [`udf-thumb-2-diagnostics.s`](https://github.com/llvm/llvm-project/blob/main/llvm/test/MC/ARM/udf-thumb-2-diagnostics.s) test verifies that clang does generate these errors. However, even after reviewing the [`27351f2`](https://github.com/llvm/llvm-project/commit/27351f2022c56b830f91d7f526775693fd9043e9) commit, which implemented `udf`, I still don't see a reason for that. I expect the snippet to be assembled to:

```
f7f0 a100   udf.w #256
bf08   it eq
de14   udfeq #20
```

In other words, for Issue 1 I expect clang to select the wide encoding on its own (as it does for, e.g., `adds r0, r1, #1024` vs. `adds r0, r1, #2`), and for Issue 2 I expect clang to assemble the instruction, since it accepts a condition and is predicable (*Armv7-M Architecture Reference Manual*: *"`UNDEFINED` Indicates an instruction that generates an Undefined Instruction exception."*). GCC assembles it correctly: https://godbolt.org/z/Y8q7KbzKG.

Or is there something I don't understand?
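The encoding choice expected for Issue 1 can be sketched as follows: use the 16-bit T1 form (`1101 1110 imm8`) when the immediate fits in 8 bits, otherwise fall back to the 32-bit T2 form (`1111 0111 1111 imm4 | 1010 imm12`). The helper `encode_udf` is hypothetical, not LLVM's code, and it ignores condition codes and IT blocks (Issue 2). It reproduces the `de14` and `f7f0 a100` encodings listed above:

```python
def encode_udf(imm: int, wide: bool = False) -> list:
    """Return the Thumb halfwords for 'UDF #imm', picking T1 when the immediate fits."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("UDF immediate must fit in 16 bits")
    if not wide and imm <= 0xFF:
        return [0xDE00 | imm]  # T1: 1101 1110 imm8
    # T2: 1111 0111 1111 imm4 | 1010 imm12
    return [0xF7F0 | (imm >> 12), 0xA000 | (imm & 0xFFF)]


for imm in (20, 256):
    print(" ".join(f"{hw:04x}" for hw in encode_udf(imm)))
# de14
# f7f0 a100
```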



[llvm-bugs] [Bug 119979] `Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp` fails on 32-bit systems

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119979
Summary: `Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp` fails on 32-bit systems
Labels: (none)
Assignees: (none)
Reporter: mgorny

The test added in #119719 is failing on 32-bit systems (e.g. x86):

```
FAIL: Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp (18443 of 21238)
 TEST 'Clang :: SemaCXX/msvc-pragma-function-no-builtin-attr.cpp' FAILED 
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: /var/tmp/portage/llvm-core/clang-20.0.0./work/x/y/clang-abi_x86_32.x86/bin/clang --driver-mode=cl -fms-compatibility -Xclang -ast-dump -fsyntax-only -- /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp | /usr/lib/llvm/20/bin/FileCheck /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp
+ /var/tmp/portage/llvm-core/clang-20.0.0./work/x/y/clang-abi_x86_32.x86/bin/clang --driver-mode=cl -fms-compatibility -Xclang -ast-dump -fsyntax-only -- /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp
+ /usr/lib/llvm/20/bin/FileCheck /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp
/var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp:21:12: error: CHECK: expected string not found in input
 // CHECK: CXXMethodDecl {{.*}} foo 'int ()' delete
   ^
:34:61: note: scanning from here
| `-NoBuiltinAttr 0x57422580 <> Implicit fabsf
 ^
:44:4: note: possible intended match here
| |-CXXMethodDecl 0x57470738  col:9 foo 'int () __attribute__((thiscall))' delete implicit-inline
 ^

Input file: 
Check file: /var/tmp/portage/llvm-core/clang-20.0.0./work/clang/test/SemaCXX/msvc-pragma-function-no-builtin-attr.cpp

-dump-input=help explains the following input dump.

Input was:
<<
     1: TranslationUnitDecl 0x57421708 <>
     2: |-CXXRecordDecl 0x57421b60 <>  implicit struct _GUID
     3: | `-TypeVisibilityAttr 0x57421be0 <> Implicit Default
     4: |-TypedefDecl 0x57421ec0 <>  implicit __NSConstantString '__NSConstantString_tag'
     5: | `-RecordType 0x57421ce0 '__NSConstantString_tag'
     6: | `-CXXRecord 0x57421c88 '__NSConstantString_tag'
     7: |-CXXRecordDecl 0x57421ef0 <>  implicit class type_info
     8: | `-TypeVisibilityAttr 0x57421f70 <> Implicit Default
     9: |-TypedefDecl 0x57421fc8 <>  implicit size_t 'unsigned int'
    10: | `-BuiltinType 0x574217f0 'unsigned int'
    11: |-TypedefDecl 0x57421c58 <>  implicit __builtin_va_list 'char *'
    12: | `-PointerType 0x57421c20 'char *'
    13: | `-BuiltinType 0x57421770 'char'
    14: |-LinkageSpecDecl 0x57422010  col:8 C
    15: | `-FunctionDecl 0x57422140  col:35 fabsf 'float (float) __attribute__((cdecl))':'float (float)' inline
    16: | |-ParmVarDecl 0x57422060  col:49 _X 'float'
    17: | |-BuiltinAttr 0x57422208 <> Implicit 556
    18: | |-NoThrowAttr 0x57422248  Implicit
    19: | `-ConstAttr 0x57422268  Implicit
    20: |-FunctionDecl 0x57422350 prev 0x57422140  line:6:26 fabsf 'float (float) __attribute__((cdecl))':'float (float)' inline
    21: | |-ParmVarDecl 0x574222a0  col:40 _X 'float'
    22: | |-CompoundStmt 0x574224bc
    23: | | `-ReturnStmt 0x574224b0
    24: | | `-ImplicitCastExpr 0x574224a0  'float'
    25: | | `-IntegerLiteral 0x57422480  'int' 0
    26: | |-BuiltinAttr 0x574223e0 <> Inherited Implicit 556
    27: | |-NoThrowAttr 0x57422400  Inherited Implicit
    28: | |-ConstAttr 0x57422420  Inherited Implicit
    29: | `-NoBuiltinAttr 0x57422440 <> Implicit fabsf
    30: |-FunctionDecl 0x57422518  line:13:5 bar 'int ()'
    31: | |-CompoundStmt 0x57422608
    32: | | `-ReturnStmt 0x574225fc
    33: | | `-IntegerLiteral 0x574225e0  'int' 0
    34: | `-NoBuiltinAttr 0x57422580 <> Implicit fabsf
check:21'0 X error: no match found
    35: |-CXXRecordDecl 0x57422620  line:19:8 struct A definition
check:21'0
    36: | |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_user_declared_ctor has_constexpr_non_copy_move_ctor can_const_default_init
check:21'0 ~~~
    37: | | |-DefaultConstructor exists trivial constexpr defaulted_is_constexpr
check:21'0 ~~~

[llvm-bugs] [Bug 119956] Misaligned LOAD segment on llvm-strip output

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119956
Summary: Misaligned LOAD segment on llvm-strip output
Labels: new issue
Assignees: (none)
Reporter: tzik

I hit a misaligned LOAD segment in llvm-strip output.
In the output below, the second column (the segment's file offset) should end with 000 to be aligned correctly.
```
-  LOAD   0x00c000 0xc000 0xc000 0x001488 0x001489 RW  0x1000
+  LOAD   0x00c002 0xc000 0xc000 0x001488 0x001489 RW  0x1000
```
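The System V gABI requires a loadable segment's `p_offset` to be congruent to its `p_vaddr` modulo `p_align`, which is why the offset should end in 000 here. A small Python sketch checks the two LOAD lines from the diff above; the helper name `load_segment_is_congruent` is made up for illustration, and it assumes the `llvm-readelf -W --segments` column order Type, Offset, VirtAddr, PhysAddr, FileSiz, MemSiz, Flg, Align:

```python
def load_segment_is_congruent(line: str) -> bool:
    """Check p_offset % p_align == p_vaddr % p_align for one readelf LOAD line."""
    fields = line.split()
    # Columns: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align.
    # Flg may contain spaces (e.g. "R E"), so take Align from the end.
    offset = int(fields[1], 16)
    vaddr = int(fields[2], 16)
    align = int(fields[-1], 16)
    if align <= 1:
        return True  # p_align of 0 or 1 means no alignment is required
    return offset % align == vaddr % align


before = "  LOAD   0x00c000 0xc000 0xc000 0x001488 0x001489 RW  0x1000"
after = "  LOAD   0x00c002 0xc000 0xc000 0x001488 0x001489 RW  0x1000"
for label, line in (("before", before), ("after", after)):
    print(label, load_segment_is_congruent(line))
# before True
# after False
```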

#56738 may be a related issue, but my case didn't use llvm-bolt, and my llvm-strip was new enough to contain [a suggested fix there](https://github.com/llvm/llvm-project/issues/56738#issuecomment-2449258980).
(I'm not sure whether this is an llvm-strip issue or a mold issue, though.)

Here is a repro case:
```
#!/bin/bash
set -eu

cd "$(dirname "$0")"

if [ ! -e abseil-cpp ]; then
  git clone --depth=1 -b 20240722.0 https://github.com/abseil/abseil-cpp.git
fi

rm -rf build
mkdir -p build
cd build

clang++ --version
ld.mold --version
llvm-strip --version

echo '__attribute__((weak)) void foo() {}' > dummy.cc
clang -fPIC -o dummy.o -c dummy.cc
clang -fuse-ld=mold -shared -o 00.so dummy.o
so_list=({01..22}.so)
for so_file in "${so_list[@]}"; do
  cp 00.so "${so_file}"
done

clang++ -Draw_hash_set_EXPORTS -I../abseil-cpp -DNDEBUG -fPIC -o foo.o -c ../abseil-cpp/absl/container/internal/raw_hash_set.cc
clang++ -fuse-ld=mold -shared -Wl,-soname,foo.so -o foo.so foo.o -Wl,-rpath,'$ORIGIN' "${so_list[@]}"

llvm-readelf -W --segments foo.so | grep LOAD > before_strip
llvm-strip foo.so
llvm-readelf -W --segments foo.so | grep LOAD > after_strip

diff -u0 before_strip after_strip
```

and its output in my environment was:
```
clang version 19.1.5 (https://github.com/llvm/llvm-project.git ab4b5a2db582958af1ee308a790cfdb42bd24720)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /home/tzik/work/llvm/out/bin
mold 1.0.3 (compatible with GNU ld)
llvm-strip, compatible with GNU strip
LLVM (http://llvm.org/):
  LLVM version 19.1.5
  Optimized build.
--- before_strip	2024-12-14 14:48:09.957991233 +0900
+++ after_strip	2024-12-14 14:48:09.967991256 +0900
@@ -3 +3 @@
-  LOAD   0x00c000 0xc000 0xc000 0x001488 0x001489 RW  0x1000
+  LOAD   0x00c002 0xc000 0xc000 0x001488 0x001489 RW  0x1000
```



[llvm-bugs] [Bug 119959] [llvm] cmpxchg16b uses pointer from overwritten rbx

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119959
Summary: [llvm] cmpxchg16b uses pointer from overwritten rbx
Labels: new issue
Assignees: (none)
Reporter: vasama

Reduced IR:

```ll
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "amd64-pc-windows-msvc19.41.34123"

%struct.anon = type { [2 x %"struct.(anonymous namespace)::mt_shared_object"], %"class.vsm::atomic_intrusive_ptr", [48 x i8], %"struct.std::atomic.3", [56 x i8], %"struct.std::atomic_flag", [60 x i8] }
%"struct.(anonymous namespace)::mt_shared_object" = type { %"class.vsm::detail::basic_intrusive_refcount", %"struct.std::atomic_flag", ptr, [40 x i8] }
%"class.vsm::detail::basic_intrusive_refcount" = type { %"struct.vsm::detail::intrusive_refcount_base" }
%"struct.vsm::detail::intrusive_refcount_base" = type { %"class.vsm::atomic" }
%"class.vsm::atomic" = type { i64 }
%"class.vsm::atomic_intrusive_ptr" = type { %"class.vsm::atomic.2" }
%"class.vsm::atomic.2" = type { %"struct.vsm::atomic_intrusive_ptr<(anonymous namespace)::mt_shared_object>::atom" }
%"struct.vsm::atomic_intrusive_ptr<(anonymous namespace)::mt_shared_object>::atom" = type { ptr, i64 }
%"struct.std::atomic.3" = type { %"struct.std::_Atomic_integral_facade.4" }
%"struct.std::_Atomic_integral_facade.4" = type { %"struct.std::_Atomic_integral.5" }
%"struct.std::_Atomic_integral.5" = type { %"struct.std::_Atomic_storage.6" }
%"struct.std::_Atomic_storage.6" = type { %"struct.std::_Atomic_padded.7" }
%"struct.std::_Atomic_padded.7" = type { i64 }
%"struct.std::atomic_flag" = type { %"struct.std::atomic" }
%"struct.std::atomic" = type { %"struct.std::_Atomic_integral_facade" }
%"struct.std::_Atomic_integral_facade" = type { %"struct.std::_Atomic_integral" }
%"struct.std::_Atomic_integral" = type { %"struct.std::_Atomic_storage" }
%"struct.std::_Atomic_storage" = type { %"struct.std::_Atomic_padded" }
%"struct.std::_Atomic_padded" = type { i32 }

define fastcc void @"?test_case@?A0x7E1854EA@@YAXXZ"() #0 personality ptr @__CxxFrameHandler3 {
  %1 = alloca [0 x [0 x %struct.anon]], i32 0, align 64
  %2 = cmpxchg ptr %1, i128 0, i128 0 monotonic monotonic, align 16
  invoke void @"?_Throw_Cpp_error@std@@YAXH@Z"(i32 0)
          to label %3 unwind label %4

3:                                                ; preds = %0
  unreachable

4:                                                ; preds = %0
  %5 = cleanuppad within none []
  ret void
}

declare i32 @__CxxFrameHandler3(...)

declare void @"?_Throw_Cpp_error@std@@YAXH@Z"()

; uselistorder directives
uselistorder i32 0, { 1, 0 }

attributes #0 = { "target-cpu"="nehalem" }
```

Here is the resulting object code:
(`clang-19 -cc1 -emit-obj -triple "amd64-pc-windows-msvc19.41.34123" -O3 reduced.ll -o - | llvm-objdump-19 -M intel -d -`)
```asm
 :
   0: 55                            push    rbp
   1: 53                            push    rbx
   2: 48 83 ec 68                   sub     rsp, 0x68
   6: 48 8d 6c 24 60                lea     rbp, [rsp + 0x60]
   b: 48 83 e4 c0                   and     rsp, -0x40
   f: 48 89 e3                      mov     rbx, rsp
  12: 48 89 6b 58                   mov     qword ptr [rbx + 0x58], rbp
  16: 48 c7 45 00 fe ff ff ff       mov     qword ptr [rbp], -0x2
  1e: 49 89 d8                      mov     r8, rbx
  21: 45 31 c9                      xor     r9d, r9d
  24: 31 c0                         xor     eax, eax
  26: 31 d2                         xor     edx, edx
  28: 31 c9                         xor     ecx, ecx
  2a: 4c 89 cb                      mov     rbx, r9
  2d: f0                            lock
  2e: 48 0f c7 4b 40                cmpxchg16b      xmmword ptr [rbx + 0x40]
  33: 4c 89 c3                      mov     rbx, r8
  36: 31 c9                         xor     ecx, ecx
  38: e8 00 00 00 00                call    0x3d
  3d: cc                            int3
  3e: 66 90                         nop
```

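Note `mov rbx, r9` followed by `cmpxchg16b xmmword ptr [rbx + 0x40]`: `cmpxchg16b` takes the replacement value in `rcx:rbx` as an implicit input, so `rbx` is overwritten with part of that value and is then still used as the base of the memory operand. The compare-exchange therefore goes through the wrong pointer.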
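To make the consequence concrete, here is a toy Python trace of just the registers involved; it is not an x86 emulator. The starting stack address `FRAME` is a made-up value standing in for whatever `mov rbx, rsp` left in `rbx`. The trace shows the effective address of the `cmpxchg16b` operand collapsing to `0x40`, while the frame-relative address the code presumably intended is still available in `r8`:

```python
# Toy trace of the few registers involved; this is not an x86 emulator.
FRAME = 0x7FFF0000  # hypothetical 64-byte-aligned stack address left in rbx by `mov rbx, rsp`

regs = {"rbx": FRAME}
regs["r8"] = regs["rbx"]   # 1e: mov r8, rbx  -> frame address saved in r8
regs["r9"] = 0             # 21: xor r9d, r9d -> low half of the new value (zero-extended)
regs["rbx"] = regs["r9"]   # 2a: mov rbx, r9  -> rbx now holds the new value's low half

effective = regs["rbx"] + 0x40  # 2e: address actually used by cmpxchg16b [rbx + 0x40]
intended = regs["r8"] + 0x40    # frame-relative address the code presumably meant

print(hex(effective), hex(intended))  # 0x40 0x7fff0040
```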

The original unreduced input produces slightly different object code but has the same problem:
```asm
7FF786689459  lea         r8,[rbx+100h]
7FF786689460  mov         rax,qword ptr [rbx+140h]
7FF786689467  mov         rdx,qword ptr [rbx+148h]
7FF78668946E  nop
7FF786689470  mov         r9,rbx
7FF786689473  xor         ecx,ecx
7FF786689475  mov         rbx,r8
7FF786689478  lock cmpxchg16b oword ptr [rbx+140h]
```




[llvm-bugs] [Bug 119960] [openmp] Problem with OpenMP user-declared reduction

2024-12-14 LLVM Bugs via llvm-bugs

Issue: 119960
Summary: [openmp] Problem with OpenMP user-declared reduction
Labels: (none)
Assignees: (none)
Reporter: CDK6182CHR

## Brief description

Recently, I encountered a problem while using an OpenMP user-declared reduction (`#pragma omp declare reduction`) for Eigen arrays or vectors. The reduction may fail (producing incorrect results that can differ from run to run) when I use the `clang` compiler with the LLVM OpenMP library. The same problem does not occur with the GCC compiler.

I wonder whether I did something wrong (for example, an incorrect configuration of LLVM or an incorrect declaration of the OpenMP reduction), or whether there is an issue in LLVM OpenMP. I would appreciate any suggestions about this problem. Thanks in advance.

## Version and environment

LLVM 18.1.8, built with the GCC 14.1.0 compiler using the following CMake configuration command:

```shell
CC=gcc CXX=g++ cmake ../llvm -DLLVM_ENABLE_PROJECTS="clang;openmp;flang;compiler-rt" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/llvm/18.1.8
```

The GCC toolchain selected by the `clang` compiler (explicitly specified with `clang++ --gcc-toolchain=/xxx/gcc14.1.0`) is GCC 14.1.0.

The OS that runs the test is CentOS 7.

The latest Eigen release (3.4.0) is used.

## Minimal example

Please see the attached ZIP file for this example, including source and CMake. The code is also listed here for convenience:

```C++
#include 
#include 
#include 


#pragma omp declare reduction(redEigen: Eigen::Vector2i: omp_out += omp_in) initializer(omp_priv=omp_orig)

void test_reduction()
{
    Eigen::Vector2i res;
    res.setZero();

    constexpr int num = 200;

#pragma omp parallel for reduction(redEigen:res)
    for (int i=0; i
