[llvm-bugs] [Bug 129688] [Flang] position of `-L$LLVM_DIR/lib`

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129688
- Summary: [Flang] position of `-L$LLVM_DIR/lib`
- Labels: flang:driver
- Assignees: (none)
- Reporter: kawashima-fj

The Flang driver puts `-L$LLVM_DIR/lib -lflang_rt.runtime` after OS library directory `-L` options when linking.

```console
$ flang -### test.f90 |& tail -1 | tr ' ' '\n' | grep '^"-[Ll]'
"-L/home/foo/llvm/bin/../lib/aarch64-unknown-linux-gnu"
"-L/usr/lib/gcc/aarch64-linux-gnu/13"
"-L/lib/aarch64-linux-gnu"
"-L/usr/lib/aarch64-linux-gnu"
"-L/lib"
"-L/usr/lib"
"-L/home/foo/llvm/lib"
"-lflang_rt.runtime"
"-lm"
"-lgcc"
"-lgcc_s"
"-lc"
"-lgcc"
"-lgcc_s"
```

If you install Flang (and Flang-RT) manually in an environment where an OS-provided Flang package is already installed, this ordering causes a problem.

When user-installed Flang is invoked, `libflang_rt.runtime` in the user-installed directory should be linked. However, OS-provided `libflang_rt.runtime` in `/lib` (or `/lib/TRIPLE` or `/lib64`, depending on OS) is linked instead.

https://github.com/llvm/llvm-project/issues/100403 and https://github.com/open-mpi/ompi/issues/13116 suffer from this problem.

Should we prioritize `-L$LLVM_DIR/lib` over `-L/lib`?
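
To make the ordering effect concrete, here is a small sketch of first-match library resolution. It is only a model of how a linker walks its `-L` list in order, not Flang or linker code; `resolveLibrary` and the `exists` callback are hypothetical names introduced for illustration.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Toy model (not linker code): return the first directory in `searchDirs`
// for which `exists(dir, lib)` is true, i.e. the first match wins.
std::optional<std::string> resolveLibrary(
    const std::vector<std::string>& searchDirs, const std::string& lib,
    const std::function<bool(const std::string&, const std::string&)>& exists) {
  for (const auto& dir : searchDirs) {
    if (exists(dir, lib)) return dir;
  }
  return std::nullopt;  // library not found in any search directory
}
```

With `/usr/lib` listed before `/home/foo/llvm/lib` and the library present in both, this model picks `/usr/lib`, which mirrors the behavior described above.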



___
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs


[llvm-bugs] [Bug 129723] Using std::hash through `<typeindex>` does not compile for builtin arithmetic types

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129723
- Summary: Using std::hash through `<typeindex>` does not compile for builtin arithmetic types
- Labels: new issue
- Assignees: (none)
- Reporter: johnfranklinrickard

As far as I am aware (hope I am not mistaken), this should be valid C++ code.

```
#include <typeindex>
std::size_t test(int i) { return std::hash<int>()(i); }
```

From the C++ draft http://wg21.link/N4950, in `[unord.hash]` paragraph 2:

> Each header that declares the template hash provides enabled specializations of hash for nullptr_t and all
cv-unqualified arithmetic, enumeration, and pointer types

And `[type.index.synopsis]` mentions that the header `<typeindex>` also declares the template `std::hash`.

In my opinion the MSVC STL gets this right, and the code compiles there.
Both libc++ and libstdc++ currently reject this code snippet with `implicit instantiation of undefined template 'std::hash<int>'`.

[Godbolt link](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYTStJg1DIApACYAQuYukl9ZATwDKjdAGFUtAK4sGIAKwAbKSuADJ4DJgAcj4ARpjEIGakAA6oCoRODB7evnppGY4C4ZExLPGJXLaY9kUMQgRMxAQ5Pn5B1bVZDU0EJdFxCUm2jc2teVUKo30RA%2BVDXACUtqhexMjsHOYAzBHI3lgA1CbbbgQAnimYEViqJ9gmGgCCU%2BggIBkAXpgA%2BgSHBEwUwgEX%2BeEWxwA7BZDsRMAQ1gxDq93ggmAoECc3KD7hBFiDFicYSZIQARDjLWicfy8PwcLSkVCcNzWazI1brTDHMzbHikAiaCnLADWSQAHAA6SEATkkgWl20kGjFGn82zF%2Bk4klpgsZnF4ChAGn5guWcFgMEQKFQLBSdAS5EoaFt9sSADdkCkUj83VxpT8DICpj9VIFpDRaIDiIaILFdbEIk1zpw%2BQnmMRzgB5WLaTAOFO8Z1sQSZhi0ZP03hYWJeYBuMS0Q3cKuYFiGYDiSukfBwhx4N1A3WYVR5ryAgvkQQ1XW0PCxYhJjxYXUEYh4FgTgfEWLpTCk1vt2dGU18AzABQANTwmAA7pmrnS%2BfxBCIxOwpDJBIoVOou7oqgYx6mJY1j6HOhqQMsqApHUTa8KgW5rlgEF4p0eZ1C4DDuJ4bRJMkYSzGUFQgBqBSZAI4x%2BGYyRkXU/REUMGp2Oh3TTJReFoX2Ag9M09GDIkTFsTheTUSMvR8fMAnLAoHIbBIlLUjqXZMhwhyhpIhwsAoHqHL60oSoGQL/BAuCECQ3K8osvACpWiwiiAkiQhKYpcGYkjqoEYqymY/gflSHDaqQG7%2BMadIMipBpGiatmkOaVqrAQKRjo6EDOna9DEFErCbKoYqBAAtGGhzAMgyCHBAq5eAwwpWSE%2BBEEhejPsIojiB%2BzXfmour/qQN4LikBYKRwNKkGF8GcJmY5Jf8qBUGpRVaTpekGUwQbGR4LoZRZSzWaaywIJgTBYIkqH%2BYFwWhbqEW2FFNlaHZpCijyErbNs0rSpCPLStRkgqlw/iahw2xKeF%2BrRfdsWWnFEBIGlropXDGUgOezApBiqAEHwdBRjGcZdmmSYTgTGbZrm%2BbNqQRaMAQpblrq1a1vWtCNhOWBtkYnYMj2LEDk2DLDqO44U6C05drO86LhgmwMqu66bgkO5KPu7MdhEoAxVQZ6Xted4PhOzWvm14ayJ1v4MroySAcYrKWGBsQoVBMFZHBjKIXgyHwNJNQsc4ECuOxVQEaU/H5Ok5HZMJfhVLRWQScRkze1x9RCbkUecXUPEzMHkl6FMvQB2JvGESHSwrGscml/5I1jXqqm5QVRUlWVFXEFVNXlaZDXbbVd1Co9EiSBKQSqvKXBipCKqSG5wRnbwF2jVdYOGsavdmlDSAJdNCM2ulCRZWwnD14VGlN%2BVlXVbVmD1SQ7tNbIhvvsbX5KF1f7DH1TADc2Q3V4vHCTYlMchxZpqTysfYqpUz6twvuVDau9iAWTMD3PapADpHSGKdLUc8AiXWUkvW6KCnpmAlNKLgGhSFcD9JCN6bkNT%2BWBgvPBHBdoxWhuva0m0HQUFSjvV0KArZ/GgcKLGkYEi43jImDMRNJFZhzOhCcVMSxlgrFzTANY6wNibHyNmh5pZVjwL2RwvMhwjmQGOTYfIRb%2BQZOLBcGYlx6P5GuDcFMtyKz3AeDmasTyayYOeK8t57yMH1vfVqj9PzyBfmbHQIBtj6HbCgG2NhxYO0ZE7AQTZ8qvHFicUkKSCDoGAlYSw8E3Ye0gunLImFsKpz0EHOY8dUhhzqAXGOxRi45wTl0biKdcJdJ9sncSHTGl5zGJHXO0w44LGkrJd8P8Qb
jTrmAoq%2Bx2xQLbhCEy18EE7B2uDPuopJD%2BAlGPMw0px6BDcpQsUNzAbnRwYw0GzCborxPNDDeU1krcMRnvbKh9lkaVWUYdZF9eBXzMo1KoBswkDwiabbqsTer9UGoDX%2BTCAHTWAXNI%2BKyrYgvbhAOBroLLbGQbZfah1jqUCGvckKjzFmRVeeS/ubkSH%2BEkFPbYGgNA8lVJCDQcT6ELNriw%2B6Q0zDCuuqvZYW4MjOEkEAA%3D%3D%3D)

I only checked it for `<typeindex>`, but this could affect more headers where `std::hash` is supposed to be defined as mentioned in [cppreference](https://en.cppreference.com/w/cpp/utility/hash).

Under the assumption that this is correct C++, would it be possible to fix these includes for the LLVM standard library?
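
Until the headers are changed, a workaround sketch is to include `<functional>` as well, since that header defines the `std::hash` specializations for arithmetic types. This only sidesteps the issue; it does not answer whether `<typeindex>` alone should be enough.

```cpp
#include <cstddef>
#include <functional>  // workaround: provides std::hash for arithmetic types
#include <typeindex>

// Same function as in the report, now compiling because <functional> is included.
std::size_t test(int i) { return std::hash<int>()(i); }
```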




[llvm-bugs] [Bug 129726] [Flang] Error due to order of specific procedures in generic interface

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129726
- Summary: [Flang] Error due to order of specific procedures in generic interface
- Labels: flang
- Assignees: (none)
- Reporter: ivan-pi

Given the following module, simply changing the order of the specific procedures in a type-bound procedure declaration results in an error:

```fortran
module collisions
implicit none

type :: Spaceship
contains
#ifdef WORKING
 procedure, private, pass(x) :: collide_x => collide_ss, collide_sa  ! Works
#else
procedure, private, pass(x) :: collide_x => collide_sa, collide_ss  ! Breaks
#endif
procedure, private, pass(y) :: collide_y => collide_as
generic :: collide => collide_x, collide_y
end type

type :: Asteroid
end type

contains

subroutine collide_as(x,y)
 class(Asteroid) :: x
class(Spaceship) :: y
print *, "a/s"
end subroutine
subroutine collide_sa(x,y)
class(Spaceship) :: x
 class(Asteroid) :: y
print *, "s/a"
end subroutine
subroutine collide_ss(x,y)
class(Spaceship) :: x
class(Spaceship) :: y
 print *, "s/s"
end subroutine

end module
```

```
$ flang-new -c c4.F90 
error: Semantic errors in c4.F90
./c4.F90:12:16: error: Generic 'collide' may not have specific procedures 'collide_x' and 'collide_y' as their interfaces are not distinguishable
  generic :: collide => collide_x, collide_y
 ^^^
./c4.F90:25:12: Procedure 'collide_x' of type 'spaceship' is bound to 'collide_sa'
  subroutine collide_sa(x,y)
 ^^
./c4.F90:20:12: Procedure 'collide_y' of type 'spaceship' is bound to 'collide_as'
  subroutine collide_as(x,y)
 ^^
$ flang-new -c -DWORKING c4.F90 
$ flang-new --version
Homebrew flang-new version 19.1.4
Target: x86_64-apple-darwin23.6.0
Thread model: posix
InstalledDir: /usr/local/Cellar/flang/19.1.4/libexec
Configuration file: /usr/local/Cellar/flang/19.1.4/libexec/flang.cfg
Configuration file: /usr/local/etc/clang/x86_64-apple-darwin23.cfg
```




[llvm-bugs] [Bug 129698] [LLVM] Kaleidoscope compilation fails for chapter 4

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129698
- Summary: [LLVM] Kaleidoscope compilation fails for chapter 4
- Labels: new issue
- Assignees: (none)
- Reporter: nots1dd

Upon compiling Chapter 4 of Kaleidoscope this error pops up:

```bash
untoy.cpp:646:35: error: no member named 'toPtr' in 'llvm::orc::ExecutorSymbolDef'
  646 |       double (*FP)() = ExprSymbol.toPtr<double (*)()>();
      |                        ~~~~~~~~~~ ^
untoy.cpp:646:50: error: expected expression
  646 |       double (*FP)() = ExprSymbol.toPtr<double (*)()>();
      |                                                  ^
untoy.cpp:646:55: error: expected expression
  646 |       double (*FP)() = ExprSymbol.toPtr<double (*)()>();
      |                                                       ^
3 errors generated.
```

The main issue lies in this code snippet:

```cpp
  // Search the JIT for the __anon_expr symbol.
  auto ExprSymbol = ExitOnErr(TheJIT->lookup("__anon_expr"));

  /* ExecutorSymbolDef does not have this method (API change)! */
  double (*FP)() = ExprSymbol.toPtr<double (*)()>();
  fprintf(stderr, "Evaluated to %f\n", FP());
```

The error persists in later chapters until Chapter 8, where the focus shifts to compiling to object code.

This seems to be caused by the tutorial using an outdated ORC API. I already have a possible fix.
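
The report does not show the proposed fix. One plausible direction, assuming the current ORC API where `ExecutorSymbolDef` exposes `getAddress()` and the returned `ExecutorAddr` provides `toPtr<T>()`, is sketched below. The two structs are hypothetical stand-ins that only mimic the shape of the ORC types so the snippet compiles without LLVM; `answer`, `makeSymbol` and `callThroughSymbol` are likewise made up for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-ins mimicking the shape of the ORC types.
// They are NOT the real llvm::orc classes.
struct ExecutorAddr {
  std::uint64_t Addr = 0;
  template <typename T> T toPtr() const {
    return reinterpret_cast<T>(static_cast<std::uintptr_t>(Addr));
  }
};

struct ExecutorSymbolDef {
  ExecutorAddr Address;
  const ExecutorAddr& getAddress() const { return Address; }
};

// Stand-in for the JIT-compiled anonymous expression.
double answer() { return 42.0; }

// Wraps a function pointer the way a JIT lookup result would carry an address.
ExecutorSymbolDef makeSymbol(double (*fn)()) {
  return ExecutorSymbolDef{ExecutorAddr{
      static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(fn))}};
}

double callThroughSymbol(const ExecutorSymbolDef& ExprSymbol) {
  // Go through getAddress() first, then convert to a function pointer.
  double (*FP)() = ExprSymbol.getAddress().toPtr<double (*)()>();
  return FP();
}
```

In the tutorial source, the corresponding change would be limited to the line that converts the looked-up symbol into a function pointer.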




[llvm-bugs] [Bug 129701] [ASAN] `new-delete-type-mismatch` with allocation bigger than the object

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129701
- Summary: [ASAN] `new-delete-type-mismatch` with allocation bigger than the object
- Labels: compiler-rt:asan, false-positive
- Assignees: (none)
- Reporter: firewave

This has been reduced from code in https://github.com/mamedev/mame/blob/master/src/osd/modules/file/posixfile.cpp.

```cpp
#include <memory>

struct entry
{
    const char * name;
};

static std::unique_ptr<entry> osd_stat()
{
    entry *result = reinterpret_cast<entry *>(::operator new(sizeof(*result) + 1));

    return std::unique_ptr<entry>(result);
}

int main()
{
    auto f = osd_stat();
}
```
https://godbolt.org/z/G8Kfz945c

```
==1==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x50200010 in thread T0:
 object passed to delete has wrong type:
  size of the allocated type:   9 bytes;
  size of the deallocated type: 8 bytes.
#0 0x5b65dbf6a542 in operator delete(void*, unsigned long) /root/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:155:3
#1 0x5b65dbf6c19b in std::default_delete<entry>::operator()(entry*) const /opt/compiler-explorer/gcc-14.2.0/lib/gcc/x86_64-linux-gnu/14.2.0/../../../../include/c++/14.2.0/bits/unique_ptr.h:93:2
#2 0x5b65dbf6bebf in std::unique_ptr<entry, std::default_delete<entry>>::~unique_ptr() /opt/compiler-explorer/gcc-14.2.0/lib/gcc/x86_64-linux-gnu/14.2.0/../../../../include/c++/14.2.0/bits/unique_ptr.h:398:4
 #3 0x5b65dbf6bda3 in main /app/example.cpp:18:1
#4 0x7750ada29d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
#5 0x7750ada29e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)
#6 0x5b65dbe8b354 in _start (/app/output.s+0x2c354)

0x50200010 is located 0 bytes inside of 9-byte region [0x50200010,0x50200019)
allocated by thread T0 here:
#0 0x5b65dbf698dd in operator new(unsigned long) /root/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:86:3
#1 0x5b65dbf6be20 in osd_stat() /app/example.cpp:10:47
#2 0x5b65dbf6bd9a in main /app/example.cpp:17:14
#3 0x7750ada29d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: 490fef8403240c91833978d494d39e537409b92e)

SUMMARY: AddressSanitizer: new-delete-type-mismatch /app/example.cpp:18:1 in main
```
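
Independent of whether the report counts as a false positive, one way to keep the over-allocation while avoiding the sized `operator delete` call is a custom deleter. This is a minimal sketch, not the MAME code; `raw_storage_delete`, `entry_ptr` and `make_entry` are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <new>

struct entry
{
    const char * name;
};

// Hypothetical deleter: destroys the object, then frees the raw storage with
// the unsized ::operator delete, so no object size is passed to the allocator.
struct raw_storage_delete
{
    void operator()(entry *p) const noexcept
    {
        p->~entry();
        ::operator delete(p);
    }
};

using entry_ptr = std::unique_ptr<entry, raw_storage_delete>;

// Allocates sizeof(entry) + extra_bytes and constructs an entry at the front.
entry_ptr make_entry(std::size_t extra_bytes)
{
    void *raw = ::operator new(sizeof(entry) + extra_bytes);
    return entry_ptr(::new (raw) entry{nullptr});
}
```

The allocation and deallocation now both go through the raw `operator new` / `operator delete` pair, so there is no size for the sanitizer to compare.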




[llvm-bugs] [Bug 129705] s390x: widening multiplication does not optimize

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129705
- Summary: s390x: widening multiplication does not optimize
- Labels: new issue
- Assignees: (none)
- Reporter: folkertdev

This LLVM IR

https://godbolt.org/z/cx8adPc9f

```llvm
define range(i32 0, -131070) <4 x i32> @manual_mule(<8 x i16> %a, <8 x i16> %b) unnamed_addr {
start:
  %0 = shufflevector <8 x i16> %a, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %1 = zext <4 x i16> %0 to <4 x i32>
  %2 = shufflevector <8 x i16> %b, <8 x i16> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  %3 = zext <4 x i16> %2 to <4 x i32>
  %4 = mul nuw <4 x i32> %3, %1
  ret <4 x i32> %4
}
```

does not optimize to the expected output of `vec_mule`, a single `vmleh` instruction. The same is true for the other multiplication flavors (low, high, odd).
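
For reference, the computation the IR is meant to express (an unsigned "multiply even" on 16-bit lanes) can be written as a scalar loop. This is only a reference model in portable C++, not s390x intrinsics; `mule_reference` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar reference: multiply the even-indexed 16-bit lanes of a and b,
// widening each product to 32 bits.
std::array<std::uint32_t, 4> mule_reference(
    const std::array<std::uint16_t, 8>& a,
    const std::array<std::uint16_t, 8>& b) {
  std::array<std::uint32_t, 4> out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = static_cast<std::uint32_t>(a[2 * i]) *
             static_cast<std::uint32_t>(b[2 * i]);
  }
  return out;
}
```

The largest possible product is 65535 * 65535 = 4294836225, which agrees with the `range` attribute on the IR function.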






[llvm-bugs] [Bug 129693] [Clang] Build fails when forward declared static template function used in std::visit

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129693
- Summary: [Clang] Build fails when forward declared static template function used in std::visit
- Labels: clang
- Assignees: (none)
- Reporter: deaklajos

Code:
```cpp
#include <variant>

template <typename T>
static T funcT(const T t);

int main()
{
    std::variant<int, double> v{1};

    return std::visit([&](auto& a) -> int
    {
        return funcT<>(a);
    }, v);
}

template <typename T>
static T funcT(const T t) {
    return t;
}
```
Output:

```
<source>:4:10: warning: function 'funcT' has internal linkage but is not defined [-Wundefined-internal]
    4 | static T funcT(const T t);
      |          ^
<source>:12:16: note: used here
   12 |         return funcT<>(a);
      |                ^
<source>:4:10: warning: function 'funcT' has internal linkage but is not defined [-Wundefined-internal]
    4 | static T funcT(const T t);
      |          ^
<source>:12:16: note: used here
   12 |         return funcT<>(a);
      |                ^
2 warnings generated.
ASM generation compiler returned: 0
<source>:4:10: warning: function 'funcT' has internal linkage but is not defined [-Wundefined-internal]
    4 | static T funcT(const T t);
      |          ^
<source>:12:16: note: used here
   12 |         return funcT<>(a);
      |                ^
<source>:4:10: warning: function 'funcT' has internal linkage but is not defined [-Wundefined-internal]
    4 | static T funcT(const T t);
      |          ^
<source>:12:16: note: used here
   12 |         return funcT<>(a);
      |                ^
2 warnings generated.
/opt/compiler-explorer/gcc-14.2.0/lib/gcc/x86_64-linux-gnu/14.2.0/../../../../x86_64-linux-gnu/bin/ld: /tmp/example-8f2a9c.o: in function `int main::$_0::operator()<int>(int&) const':
<source>:12:(.text+0x257): undefined reference to `int funcT<int>(int)'
/opt/compiler-explorer/gcc-14.2.0/lib/gcc/x86_64-linux-gnu/14.2.0/../../../../x86_64-linux-gnu/bin/ld: /tmp/example-8f2a9c.o: in function `int main::$_0::operator()<double>(double&) const':
<source>:12:(.text+0x309): undefined reference to `double funcT<double>(double)'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
Execution build compiler returned: 1
```

Demo: https://godbolt.org/z/d18dvc56E
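
As a possible workaround while the bug is open, defining the template before its first use avoids relying on the forward declaration. This is a sketch of that reordering only; `visit_first` is a hypothetical name standing in for `main`.

```cpp
#include <variant>

// Definition placed before the first use, so no forward declaration is needed.
template <typename T>
static T funcT(const T t) {
    return t;
}

int visit_first()
{
    std::variant<int, double> v{1};

    return std::visit([&](auto& a) -> int
    {
        return funcT<>(a);
    }, v);
}
```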




[llvm-bugs] [Bug 129707] error: 'vector.insert' op expected position attribute rank + source rank to match dest vector rank

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129707
- Summary: error: 'vector.insert' op expected position attribute rank + source rank to match dest vector rank
- Labels: new issue
- Assignees: (none)
- Reporter: CXiaorong

My goal is to insert two vector<32xf32> into vector<64xf32>, but the verification keeps reporting errors. I don't know what the specific reason is. I hope to get your reply.




[llvm-bugs] [Bug 129730] Value of `__cpp_constexpr` is incorrect for `-std=c++20`

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129730
- Summary: Value of `__cpp_constexpr` is incorrect for `-std=c++20`
- Labels: new issue
- Assignees: (none)
- Reporter: elbeno

When compiling with `-std=c++20`, clang defines `__cpp_constexpr` to be `201907L`, and it should be `202002L`.

https://godbolt.org/z/G6v7EvEfe

Clearly clang has support for https://wg21.link/P1330.

So the value of `__cpp_constexpr` in C++20 should be 202002L as given by https://wg21.link/p2493 and https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations#__cpp_constexpr
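
A quick way to inspect the value locally, as a minimal sketch (the helper names are made up for illustration):

```cpp
// Returns the value the compiler reports for __cpp_constexpr, or 0 if the
// macro is not defined at all.
long reported_cpp_constexpr() {
#ifdef __cpp_constexpr
  return __cpp_constexpr;
#else
  return 0L;
#endif
}

// True when the reported value is at least the one this report expects for C++20.
bool has_expected_cxx20_value() { return reported_cpp_constexpr() >= 202002L; }
```

Compiled with `-std=c++20`, `has_expected_cxx20_value()` would return false for a compiler that reports `201907L`, which is the behavior described above.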





[llvm-bugs] [Bug 129764] `-fzero-call-used-regs` should not trigger before tail-calls

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129764
- Summary: `-fzero-call-used-regs` should not trigger before tail-calls
- Labels: new issue
- Assignees: (none)
- Reporter: nelhage

I believe that `-fzero-call-used-regs` should be modified to not clear registers prior to a tail call. Here's my reasoning:

With the landing of `clang::musttail`, there's been a bit of a trend towards using indirect tail calls to implement efficient interpreters and parsers; see [the original post about protobuf][proto-tc], and [CPython's recent new interpreter][python-tc]. This pattern is, in part, an alternative to using computed gotos to implement dispatch within a single large interpreter function.

In both cases (computed gotos, and indirect tail calls), the opcode/parser definition generates fairly similar code, ending with an indirect call through a dispatch table. Depending on compiler choices, this turns into (on x86) something like `jmpq *%REG` or `jmpq *(%REG1, %REG2, 8)`.

With `-fzero-call-used-regs` enabled, clang/LLVM currently emit call-used-clearing `xor`s prior to the indirect tail-call, but not prior to a computed goto, even one that produces near-identical machine code ([example on godbolt](https://godbolt.org/z/dxh754E49), showing the stylized core of an interpreter loop).

Such interpreter loops tend to be extreme hot spots. On CPython, I've measured the cost of `-fzero-call-used-regs=used-gpr` on **only** the opcode functions at about 2% on [the pyperformance suite](https://github.com/python/pyperformance/), when using the tail-call interpreter. It seems surprising and "unfair" to impose this cost on the tail-call style but not the computed goto style of interpreter, when, again, they emit very similar machine code containing similar indirect jumps (and potential JOP gadgets).

Also, GCC's implementation behaves in the way I describe, eliding the clearing for tail calls. See a [godbolt example](https://godbolt.org/z/3KTYzWoWb) -- if you remove the `clang::musttail` and add `-fno-optimize-sibling-calls` to the GCC options, the `xor`s will reappear.
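
For readers unfamiliar with the pattern, here is a stylized sketch of a tail-call dispatched interpreter. It is not taken from protobuf or CPython; the opcodes, `VM`, `kDispatch` and `run` are all made up, and the `MUSTTAIL` macro falls back to a plain call on compilers other than clang.

```cpp
#include <cstdint>
#include <vector>

#if defined(__clang__)
#define MUSTTAIL [[clang::musttail]]
#else
#define MUSTTAIL  // assumption: other compilers may or may not emit a tail call
#endif

struct VM { std::int64_t acc = 0; };
using Handler = std::int64_t (*)(VM&, const std::uint8_t*);
extern const Handler kDispatch[3];

enum Op : std::uint8_t { OpInc = 0, OpDouble = 1, OpHalt = 2 };

// Each handler does its work, advances pc, then jumps to the next handler
// through the dispatch table. This indirect tail call is the hot path.
std::int64_t opInc(VM& vm, const std::uint8_t* pc) {
  vm.acc += 1;
  ++pc;
  MUSTTAIL return kDispatch[*pc](vm, pc);
}

std::int64_t opDouble(VM& vm, const std::uint8_t* pc) {
  vm.acc *= 2;
  ++pc;
  MUSTTAIL return kDispatch[*pc](vm, pc);
}

std::int64_t opHalt(VM& vm, const std::uint8_t*) { return vm.acc; }

const Handler kDispatch[3] = {opInc, opDouble, opHalt};

// Runs a program that must be non-empty and end with OpHalt.
std::int64_t run(const std::vector<std::uint8_t>& code) {
  VM vm;
  return kDispatch[code[0]](vm, code.data());
}
```

Every handler ends in the same indirect jump, so any per-call register clearing is paid once per executed opcode.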

[proto-tc]: https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html
[python-tc]: https://github.com/python/cpython/pull/128718




[llvm-bugs] [Bug 129740] Assertion `(!R2 || (Kind <= REX2 || Kind == EVEX)) && "invalid setting"

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129740
- Summary: Assertion `(!R2 || (Kind <= REX2 || Kind == EVEX)) && "invalid setting"
- Labels: new issue
- Assignees: (none)
- Reporter: ashermancinelli

```
> clang++ -march=znver4 -v -O3 -c reduced.ll
llvm/lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp:173: 
void {anonymous}::X86OpcodePrefixHelper::setR2(unsigned int):
Assertion `(!R2 || (Kind <= REX2 || Kind == EVEX)) && "invalid setting"' failed.
```

```
;; reduced.ll
; ModuleID = ''
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@extfloat1 = external global float
@extfloat2 = external global [220 x [250 x float]]

define void @foo(ptr %0, i32 %1, i64 %2, float %3, float %4, ptr %5, i64 %6, i1 %7, ptr %8) {
  %10 = alloca [0 x [0 x [0 x float]]], i32 0, align 4
  %11 = alloca float, i64 %2, align 4
  %12 = alloca float, i64 %2, align 4
  call void @bar(ptr %10)
  br label %13

13: ; preds = %40, %9
  br label %14

14: ; preds = %35, %13
  %.027 = phi float [ 0.00e+00, %13 ], [ %.1, %35 ]
  %15 = phi i32 [ %1, %13 ], [ %19, %35 ]
  %16 = phi i64 [ %2, %13 ], [ %39, %35 ]
  %17 = icmp sgt i64 %16, 0
  br i1 %17, label %18, label %40

18:   ; preds = %14
  %19 = add i32 %15, 1
  %20 = sext i32 %15 to i64
  %21 = getelementptr float, ptr %11, i64 %20
  %22 = load float, ptr %21, align 4
 %23 = sext i32 %19 to i64
  %24 = getelementptr float, ptr %11, i64 %23
 store float %22, ptr %24, align 4
  call void @baz(ptr %24, ptr %10, ptr null)
  %25 = load float, ptr %10, align 4
  %26 = getelementptr float, ptr %5, i64 %23
  %27 = load float, ptr %26, align 4
  %28 = getelementptr float, ptr null, i64 %23
  store float 0.00e+00, ptr %28, align 4
  br i1 %7, label %29, label %35

29: ; preds = %18
  %30 = fadd float %3, %27
  %31 = fmul float %25, %30
 %32 = fdiv arcp float %31, %3
  %33 = fmul float %32, %4
  %34 = fadd reassoc float %.027, %33
  br label %35

35: ; preds = %29, %18
  %.1 = phi float [ %34, %29 ], [ %.027, %18 ]
  %36 = getelementptr float, ptr %8, i64 %23
  %37 = getelementptr float, ptr %12, i64 %6
  call void @qux(ptr %0, ptr %36, ptr %37)
  %38 = load float, ptr %12, align 4
  store float %38, ptr %11, align 4
  %39 = add i64 %16, -1
  br label %14

40: ; preds = %14
  %41 = fcmp ogt float %.027, 0.00e+00
  br i1 %41, label %42, label %13

42: ; preds = %40
  ret void
}

declare void @bar(ptr)

define void @baz(ptr %0, ptr %1, ptr %extfloat2) {
  %3 = load float, ptr null, align 4
  %4 = call float @llvm.trunc.f32(float %3)
  %5 = fsub float 0.00e+00, %4
  %6 = load float, ptr %0, align 4
  %7 = load float, ptr @extfloat1, align 4
  %8 = fmul float %6, %7
  %9 = fptosi float %6 to i32
  %10 = add i32 %9, 1
  %11 = load float, ptr @extfloat2, align 4
 %12 = load float, ptr getelementptr (i8, ptr @extfloat2, i64 -4), align 4
 %13 = sext i32 %10 to i64
  %14 = getelementptr float, ptr %extfloat2, i64 %13
  %15 = getelementptr i8, ptr %14, i64 -4
  %16 = load float, ptr %15, align 4
  %17 = fsub float %12, 1.00e+00
  %18 = fmul float %17, %6
 %19 = fmul float %6, %5
  %20 = fadd float %18, %19
  %21 = fadd float %12, %16
  %22 = fsub float %11, %21
  %23 = fadd float %22, 0.00e+00
  %24 = fmul float %23, %8
  %25 = fmul float %24, 0.00e+00
  %26 = fadd float %20, %25
  store float %26, ptr %1, align 4
  ret void
}

define void @qux(ptr %0, ptr %1, ptr %2) {
  %4 = load float, ptr %1, align 4
  %5 = load float, ptr %0, align 4
  %6 = fdiv ninf arcp float %5, %4
  %7 = fptosi float %6 to i32
  %8 = add i32 %7, 1
  %9 = sext i32 %8 to i64
 %10 = getelementptr float, ptr null, i64 %9
  %11 = getelementptr i8, ptr %10, i64 -4
  %12 = load float, ptr %11, align 4
  %13 = fneg float %12
 %14 = fmul reassoc nsz float %5, %13
  %15 = fdiv ninf arcp contract float %4, %14
  %16 = call float @llvm.exp.f32(float %15)
  %17 = fmul float %16, 0.00e+00
  store float %17, ptr %2, align 4
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare float @llvm.trunc.f32(float) #0

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare float @llvm.exp.f32(float) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```




[llvm-bugs] [Bug 129778] Lambdas as non‐type template parameters cause link errors (.rodata._ZTAXtl3$_0EE defined in discarded section)

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129778
- Summary: Lambdas as non‐type template parameters cause link errors (.rodata._ZTAXtl3$_0EE defined in discarded section)
- Labels: new issue
- Assignees: (none)
- Reporter: ryanofsky

I am encountering a linker error with Clang 19.1.4 when using lambdas as non‐type template parameters across multiple translation units. The minimal example below compiles without issues using GCC and also _does not_ trigger the link error in Clang when optimizations (`-O`) are enabled or if I change the parameter from `const auto&` to `auto` (i.e., pass by value rather than by const reference). 

Error:

```c++
`.rodata._ZTAXtl3$_0EE' referenced in section `.text' of pass_b.o: defined in discarded section `.rodata._ZTAXtl3$_0EE[_ZTAXtl3$_0EE]' of pass_b.o 
```

(`_ZTAXtl3$_0EE` demangles to `template parameter object for $_0{}`.)

Command to reproduce:

```bash
clang++ -std=c++20 pass_a.cpp pass_b.cpp
```

Example code:

__pass.h__
```c++
#ifndef PASS_H
#define PASS_H

void PassArg(const auto& arg)
{
}

template <auto object>
void PassTemplate()
{
    PassArg(object);
}

#endif
```

__pass_a.cpp__
```c++
#include "pass.h"

constexpr auto fn_a = []{};

void pass_a()
{
    PassTemplate<fn_a>();
}

int main(int, char**)
{
return 0;
}
```

__pass_b.cpp__
```c++
#include "pass.h"

constexpr auto fn_b = []{};

void pass_b()
{
    PassTemplate<fn_b>();
}
```




[llvm-bugs] [Bug 129783] [clang][BoundsSafety] Extend `-Wvla-potential-size-confusion` for struct fields and bounds annotations

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129783
- Summary: [clang][BoundsSafety] Extend `-Wvla-potential-size-confusion` for struct fields and bounds annotations
- Labels: clang:frontend, TBAA, clang:bounds-safety
- Assignees: rapidsna
- Reporter: rapidsna

https://github.com/llvm/llvm-project/pull/129772

`-Wvla-potential-size-confusion` diagnoses when `n` references the file scope variable and not the parameter.

```
int n;
void func(int array[n], int n);
```

We may want to extend it to diagnose on situations mentioned in the PR:

- Diagnosing a similar situation in structures. e.g.,

```C
int n;
struct S {
  int n;
  int array[sizeof(n)]; // Refers to outer n, not member n
};
```

- Diagnosing with constant-size arrays (requires tracking the expression for the constant-size array in the `QualType`), e.g.,

```C
constexpr int n = 12;
void func(int array[n], int n);
```

- Potentially, also diagnosing any ambiguous situations involving bounds annotations, like the one below (with or without the `-fexperimental-late-parse-attributes` flag):

```
constexpr int n;
struct foo {
  int * ptr __counted_by(n);
  int n;
};
```




[llvm-bugs] [Bug 129716] Suboptimal codegen for vptr load

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129716
- Summary: Suboptimal codegen for vptr load
- Labels: new issue
- Assignees: (none)
- Reporter: apolukhin

Consider the example:
```cpp
#include <new>

struct Empty{};

template <class T>
struct UnionOptional {
    UnionOptional() = default;

    const T* set() {
        return ::new (&data_.payload) T();
    }

    void clear() {
        data_.payload.~T();
        data_.e = {};
    }

    union {
        Empty e{};
        T payload;
    } data_;
};

struct A {
    virtual void foo() const;
};

struct B : A {
    void foo() const override;
};

void sample_union() {
    UnionOptional<B> value;
    value.set()->foo();
    value.clear();
    value.set()->foo();
}
```

With -O2 or -O3 clang-19 generates the following assembly:
```
sample_union():
        push    r14
        push    rbx
        push    rax
        mov     r14, qword ptr [rip + vtable for B@GOTPCREL]
        add     r14, 16
        mov     qword ptr [rsp], r14
        mov     rbx, rsp
        mov     rdi, rbx
        call    B::foo() const@PLT
        mov     qword ptr [rsp], r14
        mov     rdi, rbx
        call    B::foo() const@PLT
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
```
However, better assembly with fewer instructions and less register clobbering is possible:
```
sample_union():
        sub     rsp, 24
        mov     QWORD PTR [rsp+8], OFFSET FLAT:vtable for B+16
        lea     rdi, [rsp+8]
        call    B::foo() const
        lea     rdi, [rsp+8]
        mov     QWORD PTR [rsp+8], OFFSET FLAT:vtable for B+16
        call    B::foo() const
        add     rsp, 24
        ret
```
Godbolt playground: https://godbolt.org/z/T5PzMfz1W




[llvm-bugs] [Bug 129757] [DirectX] Re-evaluate pass ordering for producing correct DXIL module flags

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129757
- Summary: [DirectX] Re-evaluate pass ordering for producing correct DXIL module flags
- Labels: new issue
- Assignees: (none)
- Reporter: Icohedron

According to #120119, the DXIL Shader Flags pass needs to be executed before the DXIL Op Lowering pass in order to simplify its implementation by being able to work directly with DirectX target intrinsics. However, this dependency creates a challenge, as the shader flag analysis is based on instructions that may not exist after the lowering pass.

This issue was discovered with the implementation of the Int64Ops Shader Flags Analysis and the resulting DXIL failing validation by `dxv` due to mismatched flags (https://github.com/llvm/llvm-project/pull/129089#issuecomment-2695570866). The Shader Flags Analysis currently enables the Int64Ops shader flag in the presence of `extractelement` instructions introduced by the Scalarizer pass. These `extractelement` instructions are subsequently removed by the DXIL Op Lowering pass.

Potential Solutions:

1. Perform Shader Flag Analysis before Scalarization: This would ensure that the `extractelement` instructions are not yet introduced, thereby avoiding the need to account for their removal later. But it may impact the implementation of current and/or future Shader Flag Analyses.

2. Split the Shader Flag Analysis into two stages: one before the DXIL Op Lowering Pass and one after. This would also require moving the DXIL Translate Metadata pass to follow after the later Shader Flag Analysis. Shader Flag Analyses that benefit from the DirectX target intrinsics could be performed before DXIL Op Lowering, and the Shader Flag Analyses that don't benefit from that should be performed after DXIL Op Lowering.

3. Complicate the logic for the Int64Ops Shader Flag Analysis: Detect when an instruction that returns an i64 value or uses i64 operands will be removed by a subsequent DXIL Op Lowering pass. This would probably be very ugly.
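
The mismatch can be illustrated with a toy model. This is not LLVM or DirectX backend code; `ToyInst` and both functions are hypothetical and only show why a flag computed before lowering can disagree with what the final module contains.

```cpp
#include <vector>

// Toy model, not LLVM code: each entry stands for one IR instruction.
struct ToyInst {
  bool UsesI64;            // has an i64 result or operand
  bool RemovedByLowering;  // would be deleted by the op-lowering step
};

// Flag as computed when the analysis runs before lowering.
bool int64OpsBeforeLowering(const std::vector<ToyInst>& Insts) {
  for (const ToyInst& I : Insts) {
    if (I.UsesI64) return true;
  }
  return false;
}

// Flag as computed over the instructions that survive lowering.
bool int64OpsAfterLowering(const std::vector<ToyInst>& Insts) {
  for (const ToyInst& I : Insts) {
    if (!I.RemovedByLowering && I.UsesI64) return true;
  }
  return false;
}
```

When the only i64 user is an instruction that lowering removes, the two functions disagree, which is the kind of flag mismatch the validator rejects.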




[llvm-bugs] [Bug 129745] Several C++ EH tests fail with "terminating due to uncaught exception of type int"

2025-03-04 Thread LLVM Bugs via llvm-bugs


- Issue: 129745
- Summary: Several C++ EH tests fail with "terminating due to uncaught exception of type int"
- Labels: backend:Hexagon
- Assignees: androm3da
- Reporter: androm3da

Some tests from the llvm-test-suite fail as shown below. @quic-akaryaki investigated and found that building the shared libraries with `eld` instead of `lld` avoids the failure, which seems to be due to the absence of the `PT_GNU_EH_FRAME` program header.

```
******************** TEST 'test-suite :: SingleSource/Regression/C++/EH/Regression-C++-ctor_dtor_count.test' FAILED ********************

/local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/tools/timeit --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH/Output/Regression-C++-ctor_dtor_count.test.out --redirect-input /dev/null --chdir /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH --summary /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH/Output/Regression-C++-ctor_dtor_count.test.time /local/mnt/workspace/upstream/toolchain_for_hexagon/clang+llvm-21.0.0-cross-hexagon-unknown-linux-musl/x86_64-linux-gnu/bin/qemu_wrapper.sh /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH/Regression-C++-ctor_dtor_count
/local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/tools/fpcmp /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH/Output/Regression-C++-ctor_dtor_count.test.out /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH/ctor_dtor_count.reference_output

+ /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/tools/fpcmp /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH/Output/Regression-C++-ctor_dtor_count.test.out /local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/SingleSource/Regression/C++/EH/ctor_dtor_count.reference_output
/local/mnt/workspace/upstream/toolchain_for_hexagon/obj_test-suite_target-hexagon-v79-O2/tools/fpcmp: Comparison failed, textual difference between 'l' and 'D'

Input 1:
libc++abi: terminating due to uncaught exception of type int
exit 134

Input 2:
Deriv ok!

```

test failures:
```
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-class_hierarchy.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-ctor_dtor_count-2.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-ctor_dtor_count.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-exception_spec_test.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-function_try_block.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-inlined_cleanup.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-recursive-throw.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-simple_rethrow.test
 test-suite :: SingleSource/Regression/C++/EH/Regression-C++-simple_throw.test
  test-suite :: SingleSource/Regression/C++/EH/Regression-C++-throw_rethrow_test.test
```




[llvm-bugs] [Bug 129749] [DirectX] Update DXContainer binary format documentation to describe Root Descriptors representation

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129749




Summary

[DirectX] Update DXContainer binary format documentation to describe Root Descriptors representation




  Labels
  
new issue
  



  Assignees
  
joaosaffran
  



  Reporter
  
  joaosaffran
  




Update https://github.com/llvm/llvm-project/blob/main/llvm/docs/DirectX/DXContainer.rst file to detail the expected binary representation of Root Signature Root Descriptor parameters.




[llvm-bugs] [Bug 129748] Missed optimizations with -fstack-protector-strong

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129748




Summary

Missed optimizations with -fstack-protector-strong




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  travisdowns
  




With -fstack-protector-strong, stack canary checks are inserted even when addresses of local variables (or function parameters, etc.) escape the function (or escape the fully inlined function). However, the optimization seems poor in this area: the stack checks are inserted even when the only writes to the stack on the hot path are provably-safe compiler spills.

For example, consider this case:

```
[[noreturn]] [[gnu::cold]]
void rare_function(const int& x, const int& y);

int hot_function(int x, int y) {
if (x < y) [[unlikely]] {
 rare_function(x, y);
}
return x + y;
}
```

This is a reduced test case from a rich assert mechanism and this pattern is very common: we call `rare_function` very infrequently (in the case of assertions, at most once per-process invocation). `x` and `y` are passed in registers, so the hot path could simply be a comparison, jump to slow path and then return value. Instead we get:

```
hot_function(int, int):
  sub rsp, 24
  mov rax, qword ptr fs:[40]
  mov qword ptr [rsp + 16], rax
  mov dword ptr [rsp + 12], edi
  mov dword ptr [rsp + 8], esi
  cmp edi, esi
  jl .LBB0_1
  mov rax, qword ptr fs:[40]
  cmp rax, qword ptr [rsp + 16]
  jne .LBB0_5
  add esi, edi
  mov eax, esi
  add rsp, 24
  ret
.LBB0_1:
  mov rax, qword ptr fs:[40]
  cmp rax, qword ptr [rsp + 16]
  jne .LBB0_5
  lea rdi, [rsp + 12]
  lea rsi, [rsp + 8]
  call rare_function(int const&, int const&)@PLT
.LBB0_5:
  call __stack_chk_fail@PLT
```

Note that on the hot path we store the stack cookie, spill the register variables, then do the comparison with registers and immediately load + compare the cookie. But there are no user-controlled or dangerous writes to the stack here at all, only spills to known slots, which are statically known to be disjoint from the cookie location.

If instead `rare_function` takes its parameters by value, the whole function reduces to:

```
hot_function(int, int):
  cmp edi, esi
  jl .LBB0_2
  add esi, edi
  mov eax, esi
  ret
.LBB0_2:
 push rax
  call rare_function(int, int)@PLT
```

The hot path of the by-reference function could look the same!




[llvm-bugs] [Bug 129746] `-isystem` does not suppress warnings from macros

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129746




Summary

`-isystem` does not suppress warnings from macros




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  elbeno
  




Given a header `provoke_warning.hpp`:
```cpp
enum struct E { A, B, C };

namespace N {
 using enum E;
}

#define PROVOKE_WARNING using enum E;
```

And a source file `main.cpp`:
```cpp
#include <provoke_warning.hpp>

struct S {
  PROVOKE_WARNING;
};
```

When compiled with:
`clang -std=c++17 -isystem . -c main.cpp`

We get:
```console
main.cpp:4:3: warning: using enum declaration is a C++20 extension [-Wc++20-extensions]
4 | PROVOKE_WARNING;
  |   ^
./provoke_warning.hpp:7:31: note: expanded from macro 'PROVOKE_WARNING'
7 | #define PROVOKE_WARNING using enum E;
 |   ^ 
```

Notice that the warning we _would_ get from the `using enum` inside the namespace is not emitted (because of `-isystem`). However the use of the macro still gives a warning despite the fact that the macro came from a header included under `-isystem`.

--

I see why this could be difficult to remedy, and I already see the counterarguments: "it's a macro! It's in your code, not the library code!" So it's understandable why this happens, but that doesn't make it correct or expected.

This comes up in real code: https://github.com/catchorg/Catch2/issues/2910





[llvm-bugs] [Bug 129750] Missed optimization: eager spills mess up hot path

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129750




Summary

Missed optimization: eager spills mess up hot path




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  travisdowns
  




Consider the following function:

```
[[noreturn]] [[gnu::cold]]
void cold_function(const int& x, const int& y);

int hot_function(int x, int y) {
if (x < y) [[unlikely]] {
cold_function(x, y);
}
return x + y;
}
```

In clang++ this generates the following code at -O3:

```
hot_function(int, int):
  push rax
  mov dword ptr [rsp + 4], edi
  mov dword ptr [rsp], esi
  cmp edi, esi
  jl .LBB0_2
  add esi, edi
  mov eax, esi
  pop rcx
  ret
.LBB0_2:
  lea rdi, [rsp + 4]
  mov rsi, rsp
  call cold_function(int const&, int const&)@PLT
```

However, the spilling of the in-register variables and the alignment of the stack frame (`push rax`) could both be deferred to the cold branch instead:

```
hot_function(int, int):
  cmp edi, esi
  jl .LBB0_2
  add esi, edi
  mov eax, esi
  ret
.LBB0_2:
  push rax
  mov dword ptr [rsp + 4], edi
  mov dword ptr [rsp], esi
  lea rdi, [rsp + 4]
  mov rsi, rsp
 call cold_function(int const&, int const&)@PLT
```

This cuts the hot path almost in half and avoids an expensive store-forwarding stall: `pop rcx` reads the qword at `[rsp]`, which was written immediately before in two dword halves during the spill. This causes an expensive (~10 cycle) stall on all modern big cores I'm aware of.

https://godbolt.org/z/nTvnj4r1r




[llvm-bugs] [Bug 129747] c++23 std::ranges::copy_n advances InputIterator one more time than necessary

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129747




Summary

c++23 std::ranges::copy_n advances InputIterator one more time than necessary




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  Be3y4uu-K0T
  




This is a repeat of an old error, now in `copy_n` from ranges. [(old resolved bug)](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50119)

Test program ([godbolt](https://godbolt.org/z/h6cbjqPT7)):
```c++
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <vector>

int main()
{
    std::istringstream s("1 2 3 4 5");
    std::vector<int> v;
    std::ranges::copy_n(std::istream_iterator<int>(s), 2, std::back_inserter(v));
    std::ranges::copy_n(std::istream_iterator<int>(s), 2, std::back_inserter(v));

    std::ranges::copy(v, std::ostream_iterator<int>(std::cout, " "));
    std::cout << '\n';
}
```

Run:
`clang++ -std=c++23 index.cc -o index && ./index`

Actual output:
`1 2 4 5`

Expected output:
`1 2 3 4`
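
The difference between the two outputs can be reproduced with a toy model of a single-pass stream iterator. This is only an illustrative sketch under the assumption that the extra element is lost to one increment too many; `IstreamIterator`, `copy_n`, `run`, and the `extra_increment` flag are hypothetical names and do not reflect libc++'s actual implementation.

```python
class IstreamIterator:
    """Toy model of std::istream_iterator: it reads one value on
    construction and one more on every increment."""

    def __init__(self, stream):
        self.stream = stream              # shared iterator over the tokens
        self.value = next(stream, None)   # construction consumes a token

    def deref(self):
        return self.value

    def increment(self):
        self.value = next(self.stream, None)


def copy_n(it, n, out, extra_increment):
    """Copy n values from `it` into `out`."""
    for i in range(n):
        out.append(it.deref())
        # Incrementing only between elements (n - 1 times) leaves the next
        # token in the stream; incrementing after the last element as well
        # consumes and discards it.
        if i + 1 < n or extra_increment:
            it.increment()


def run(extra_increment):
    stream = iter([1, 2, 3, 4, 5])
    out = []
    copy_n(IstreamIterator(stream), 2, out, extra_increment)
    copy_n(IstreamIterator(stream), 2, out, extra_increment)
    return out
```

In this model `run(False)` yields `[1, 2, 3, 4]` (the expected output), while `run(True)` yields `[1, 2, 4, 5]` (the actual output reported above).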

Environment:
```
% clang++ -v
Homebrew clang version 18.1.8
Target: arm64-apple-darwin24.3.0
Thread model: posix
InstalledDir: /opt/homebrew/bin
```




[llvm-bugs] [Bug 129808] Increased memory consumption in ParentMapContext after clang-19

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129808




Summary

Increased memory consumption in ParentMapContext after clang-19




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  michael-jabbour-sonarsource
  




The memory increase can be observed when enabling any check that uses `ASTContext::getParents` in `clang-tidy`. By plotting the memory consumption when analyzing a sample file, I got the following chart, which shows a roughly 10x increase in memory consumption in clang-tidy-19 compared to clang-tidy-18 when the number of elements in the array is large enough:

![Image](https://github.com/user-attachments/assets/ef531e57-289a-4b27-a04a-f63d7bbd7ae1)

Here is a script that generates the above plot:

```python
import matplotlib.pyplot as plt
import subprocess

def generate_cpp_file(file_name, num_elements):
  elements = [100] * num_elements
  elements_str = ', '.join(map(hex, elements))
  with open(file_name, 'w') as f:
    f.write(f"""
const char large_array[] = {{
  {elements_str}
}};
    """)


def measure_memory_consumption_with(clang_tidy_bin, num_elements):
  file_name = f'file_{num_elements}.cpp'
  generate_cpp_file(file_name, num_elements)
  process = subprocess.run(['/usr/bin/time', '-f', '%M', clang_tidy_bin, '-checks=readability-magic-numbers', file_name, '--', '-std=c++17'], check=True, capture_output=True)
  memory_kb = int(process.stderr)
  return memory_kb // 1024


def plot_memory_consumption():
  num_elements = [10 ** 1, 10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6, 3 * 10 ** 6, 5 * 10 ** 6, 8 * 10 ** 6]

  memory_consumption_17 = [measure_memory_consumption_with('clang-tidy-17', n) for n in num_elements]
  memory_consumption_18 = [measure_memory_consumption_with('clang-tidy-18', n) for n in num_elements]
  memory_consumption_19 = [measure_memory_consumption_with('clang-tidy-19', n) for n in num_elements]
  plt.plot(num_elements, memory_consumption_17, label='clang-tidy-17')
  plt.plot(num_elements, memory_consumption_18, label='clang-tidy-18')
  plt.plot(num_elements, memory_consumption_19, label='clang-tidy-19')
  plt.xlabel('Number of elements')
  plt.ylabel('Memory consumption (MB)')
  plt.legend()
  plt.show()


if __name__ == '__main__':
  plot_memory_consumption()
```
The `clang-tidy` binaries used in this test were installed from apt.llvm.org on an Ubuntu machine.




[llvm-bugs] [Bug 129829] UNREACHABLE executed at /root/build/tools/clang/include/clang/Sema/AttrSpellingListIndex.inc:14!

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129829




Summary

UNREACHABLE executed at /root/build/tools/clang/include/clang/Sema/AttrSpellingListIndex.inc:14!




  Labels
  
clang
  



  Assignees
  
  



  Reporter
  
  bi6c
  




Compiler Explorer: https://godbolt.org/z/x3hMvbd3a

```console
Ignored/unknown shouldn't get here
UNREACHABLE executed at /root/build/tools/clang/include/clang/Sema/AttrSpellingListIndex.inc:14!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/clang -gdwarf-4 -g -o /app/output.s -mllvm --x86-asm-syntax=intel -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics 
1.	:5:68: current parser token ';'
 #0 0x03e5b938 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x3e5b938)
 #1 0x03e595f4 llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x3e595f4)
 #2 0x03da5f28 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x7a77b4042520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x7a77b40969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #5 0x7a77b4042476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x7a77b40287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x03db188a (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x3db188a)
 #8 0x07c5f33c clang::AttributeCommonInfo::calculateAttributeSpellingListIndex() const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x7c5f33c)
 #9 0x0743454b clang::AsmLabelAttr::getSpelling() const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x743454b)
#10 0x0735c620 clang::FormatASTNodeDiagnosticArgument(clang::DiagnosticsEngine::ArgumentKind, long, llvm::StringRef, llvm::StringRef, llvm::ArrayRef>, llvm::SmallVectorImpl&, void*, llvm::ArrayRef) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x735c620)
#11 0x04090d45 clang::Diagnostic::FormatDiagnostic(char const*, char const*, llvm::SmallVectorImpl&) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x4090d45)
#12 0x04b4302f clang::TextDiagnosticPrinter::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x4b4302f)
#13 0x0409ef1e clang::DiagnosticIDs::EmitDiag(clang::DiagnosticsEngine&, clang::DiagnosticBuilder const&, clang::DiagnosticIDs::Level) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x409ef1e)
#14 0x0409f458 clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&, clang::DiagnosticBuilder const&) const (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x409f458)
#15 0x040900af clang::DiagnosticsEngine::EmitDiagnostic(clang::DiagnosticBuilder const&, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x40900af)
#16 0x0659c164 clang::Sema::EmitDiagnostic(unsigned int, clang::DiagnosticBuilder const&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x659c164)
#17 0x06614d48 clang::SemaBase::ImmediateDiagBuilder::~ImmediateDiagBuilder() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x6614d48)
#18 0x06589d03 clang::SemaBase::SemaDiagnosticBuilder::~SemaDiagnosticBuilder() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x6589d03)
#19 0x067881bd checkNonMultiVersionCompatAttributes(clang::Sema&, clang::FunctionDecl const*, clang::FunctionDecl const*, clang::MultiVersionKind)::'lambda'(clang::Sema&, clang::Attr const*)::operator()(clang::Sema&, clang::Attr const*) const SemaDecl.cpp:0:0
#20 0x067884d1 checkNonMultiVersionCompatAttributes(clang::Sema&, clang::FunctionDecl const*, clang::FunctionDecl const*, clang::MultiVersionKind) SemaDecl.cpp:0:0
#21 0x06788c3f CheckMultiVersionAdditionalRules(clang::Sema&, clang::FunctionDecl const*, clang::FunctionDecl const*, bool, clang::MultiVersionKind) SemaDecl.cpp:0:0
#22 0x067c905d CheckMultiVersionFunction(clang::Sema&, clang::FunctionDecl*, bool&, clang::NamedDecl*&, clang::LookupResult&) SemaDecl.cpp:0:0
#23 0x067cae74 clang::Sema::CheckFunctionDeclaration(clang::Scope*, clang::FunctionDecl*, clang::LookupResult&, bool, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x67cae74)
#24 0x067d13b3 clang::Sema::ActOnFunctionDeclarator(clang::Scope*, clang::Declarator&, clang::DeclContext*, clang::TypeSourceInfo*, clang::LookupResult&, llvm::MutableArrayRef, bool&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x67d13b3)
#25 0

[llvm-bugs] [Bug 129843] LLVM 20 miscompiles `@llvm.ctpop.i128` for `aarch64_be`

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129843




Summary

LLVM 20 miscompiles `@llvm.ctpop.i128` for `aarch64_be`




  Labels
  
backend:AArch64,
regression,
miscompilation
  



  Assignees
  
  



  Reporter
  
  alexrp
  




Consider this Zig program:

```zig
pub fn main() void {
var x: u128 = 0b00011000110001111111100101010001;
_ = &x;
 @import("std").process.exit(@popCount(x));
}
```

Running it with `qemu-aarch64_be` will produce `24` with LLVM 19, but `0` with LLVM 20.

Isolating the `@llvm.ctpop.i128` a bit:

```llvm
; ModuleID = 'BitcodeBuffer'
source_filename = "repro"
target datalayout = "E-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
target triple = "aarch64_be-unknown-linux4.19.0-unknown"

@builtin.zig_backend = internal unnamed_addr constant i64 2, align 8
@start.simplified_logic = internal unnamed_addr constant i1 false, align 1
@builtin.output_mode = internal unnamed_addr constant i2 -2, align 1

; Function Attrs: nosanitize_coverage nounwind skipprofile
define dso_local i32 @repro() #0 {
  %1 = alloca [16 x i8], align 16
  store i128 71803349708323153, ptr %1, align 16
  %2 = load i128, ptr %1, align 16
  %3 = call i128 @llvm.ctpop.i128(i128 %2)
  %4 = trunc i128 %3 to i8
  %5 = zext i8 %4 to i32
  ret i32 %5
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i128 @llvm.ctpop.i128(i128) #1

attributes #0 = { nosanitize_coverage nounwind skipprofile "frame-pointer"="all" "target-cpu"="generic" "target-features"="+enable-select-opt,+ete,+fp-armv8,+fuse-adrp-add,+fuse-aes,+neon,+trbe,+use-postra-scheduler,-addr-lsl-slow-14,-aes,-aggressive-fma,-alternate-sextload-cvt-f32-pattern,-altnzcv,-alu-lsl-fast,-am,-amvs,-arith-bcc-fusion,-arith-cbz-fusion,-ascend-store-address,-avoid-ldapur,-balance-fp-ops,-bf16,-brbe,-bti,-call-saved-x10,-call-saved-x11,-call-saved-x12,-call-saved-x13,-call-saved-x14,-call-saved-x15,-call-saved-x18,-call-saved-x8,-call-saved-x9,-ccdp,-ccidx,-ccpp,-chk,-clrbhb,-cmp-bcc-fusion,-cmpbr,-complxnum,-CONTEXTIDREL2,-cpa,-crc,-crypto,-cssc,-d128,-disable-latency-sched-heuristic,-disable-ldp,-disable-stp,-dit,-dotprod,-ecv,-el2vmsa,-el3,-exynos-cheap-as-move,-f32mm,-f64mm,-f8f16mm,-f8f32mm,-faminmax,-fgt,-fix-cortex-a53-835769,-flagm,-fmv,-force-32bit-jump-tables,-fp16fml,-fp8,-fp8dot2,-fp8dot4,-fp8fma,-fpac,-fprcvt,-fptoint,-fujitsu-monaka,-fullfp16,-fuse-address,-fuse-addsub-2reg-const1,-fuse-arith-logic,-fuse-crypto-eor,-fuse-csel,-fuse-literals,-gcs,-harden-sls-blr,-harden-sls-nocomdat,-harden-sls-retbr,-hbc,-hcx,-i8mm,-ite,-jsconv,-ldp-aligned-only,-lor,-ls64,-lse,-lse128,-lse2,-lsfe,-lsui,-lut,-mec,-mops,-mpam,-mte,-nmi,-no-bti-at-return-twice,-no-neg-immediates,-no-sve-fp-ld1r,-no-zcz-fp,-nv,-occmo,-outline-atomics,-pan,-pan-rwv,-pauth,-pauth-lr,-pcdphint,-perfmon,-pops,-predictable-select-expensive,-predres,-prfm-slc-target,-rand,-ras,-rasv2,-rcpc,-rcpc3,-rcpc-immo,-rdm,-reserve-lr-for-ra,-reserve-x1,-reserve-x10,-reserve-x11,-reserve-x12,-reserve-x13,-reserve-x14,-reserve-x15,-reserve-x18,-reserve-x2,-reserve-x20,-reserve-x21,-reserve-x22,-reserve-x23,-reserve-x24,-reserve-x25,-reserve-x26,-reserve-x27,-reserve-x28,-reserve-x3,-reserve-x4,-reserve-x5,-reserve-x6,-reserve-x7,-reserve-x9,-rme,-sb,-sel2,-sha2,-sha3,-slow-misaligned-128store,-slow-paired-128,-slow-strqro-store,-sm4,-sme,-sme2,-sme2p1,-sme2p2,-sme-b16b16,-sme-f16f16,-s
me-f64f64,-sme-f8f16,-sme-f8f32,-sme-fa64,-sme-i16i64,-sme-lutv2,-sme-mop4,-sme-tmop,-spe,-spe-eef,-specres2,-specrestrict,-ssbs,-ssve-aes,-ssve-bitperm,-ssve-fp8dot2,-ssve-fp8dot4,-ssve-fp8fma,-store-pair-suppress,-stp-aligned-only,-strict-align,-sve,-sve2,-sve2-aes,-sve2-bitperm,-sve2-sha3,-sve2-sm4,-sve2p1,-sve2p2,-sve-aes,-sve-aes2,-sve-b16b16,-sve-bfscale,-sve-bitperm,-sve-f16f32mm,-tagged-globals,-the,-tlb-rmi,-tlbiw,-tme,-tpidr-el1,-tpidr-el2,-tpidr-el3,-tpidrro-el0,-tracev8.4,-uaops,-use-experimental-zeroing-pseudos,-use-fixed-over-scalable-if-equal-cost,-use-reciprocal-square-root,-v8.1a,-v8.2a,-v8.3a,-v8.4a,-v8.5a,-v8.6a,-v8.7a,-v8.8a,-v8.9a,-v8a,-v8r,-v9.1a,-v9.2a,-v9.3a,-v9.4a,-v9.5a,-v9.6a,-v9a,-vh,-wfxt,-xs,-zcm,-zcz,-zcz-fp-workaround,-zcz-gp" }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

!llvm.module.flags = !{}
```
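
As a cross-check that does not depend on LLVM, the result `@repro` should return can be computed by hand from the constant stored in the IR above. This is a sketch only: `popcount` and `repro_result` are hypothetical helper names, not LLVM APIs, and the masking mirrors the `trunc` to `i8` followed by `zext` to `i32`.

```python
def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


VALUE = 71803349708323153  # the i128 constant stored by @repro


def repro_result(value):
    count = popcount(value & ((1 << 128) - 1))  # models @llvm.ctpop.i128
    return count & 0xFF                         # trunc to i8, then zext to i32


print(repro_result(VALUE))  # 24
```

The value 24 agrees with what the LLVM 19 build produces, so the `0` from LLVM 20 is the incorrect result.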

Compiling this with `llc repro.ll -O0` with LLVM 19 and 20 yields this codegen diff:

```diff
--- repro.19.s  2025-03-05 08:29:31.485173087 +0100
+++ repro.20.s  2025-03-05 08:29:34.672295525 +0100
@@ -1,5 +1,5 @@
- .text
.file   "repro"
+   .text
.globl  repro // -- Begin function repro
.p2align2
 .type   repro,@function
@@ -16,15 +16,16 @@
mov x8, xzr
 str x8, [sp]
ldr x8, [sp, #8]
-   ldr d1, [sp]
- 

[llvm-bugs] [Bug 129796] [DirectX] Update Root Signature Binary Representation docs to describe Descriptor tables.

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129796




Summary

[DirectX] Update Root Signature Binary Representation docs to describe Descriptor tables.




  Labels
  
new issue
  



  Assignees
  
joaosaffran
  



  Reporter
  
  joaosaffran
  




Update https://github.com/llvm/llvm-project/blob/main/llvm/docs/DirectX/DXContainer.rst file to detail the expected binary representation of Root Signature Root Descriptor tables parameters.




[llvm-bugs] [Bug 129803] [libc++] `std::variant` introduces padding if a variant member contains a variant

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129803




Summary

[libc++] `std::variant` introduces padding if a variant member contains a variant




  Labels
  
libc++
  



  Assignees
  
  



  Reporter
  
  zygoloid
  




[Testcase](https://godbolt.org/z/r7WEoEzTh):
```c++
#include <variant>

struct A {
  int x;
};

struct B {
  int y;
  int z;
};
static_assert(sizeof(B) == 8);

static_assert(sizeof(std::variant<A, B>) == 12);

struct C {
  std::variant<A> v;
};
static_assert(sizeof(C) == 8);

static_assert(sizeof(std::variant<C, B>) == 16);
```

`std::variant<C, B>` ought to be only 12 bytes, but is actually 16 bytes. The reason for this is that `std::variant` derives from `__sfinae_ctor_base<...>` and `__sfinae_assign_base<...>`, and those base classes are the *same* for `std::variant<A>` and for `std::variant<C, B>`.

This prevents the variant's first field (the `__union`) from being put at offset 0 within the variant, because that would mean we have two different `__sfinae_ctor_base<...>` subobjects at the same offset within the same object, and the C++ language rules don't permit that struct layout.

The solution is to change `variant` so that it doesn't derive from a class that is, or can be, independent of the `variant`'s template arguments. Perhaps either change the `__sfinae_...` types to use CRTP (even though they don't care what the derived class is), or remove them and rely on getting the special members' properties from the `__impl` type instead.

Of course, fixing this will break `std::variant`'s ABI, so it'd need to be done only in the unstable ABI. :(

`std::optional` appears to use the same implementation strategy, so I would imagine it has the same deficiency (assuming it puts the `T` first, not the `bool`), but I've not checked. And it looks like `std::tuple` may also suffer from the same issue.




[llvm-bugs] [Bug 129805] Failure to spot `popcount` idiom

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129805




Summary

Failure to spot `popcount` idiom




  Labels
  
missed-optimization
  



  Assignees
  
  



  Reporter
  
  Kmeakin
  




LLVM makes a valiant effort at unrolling and vectorizing these loops, but they're really just `popcount`, and it should recognize them as such.

```c++
#include <cstdint>

using u8 = uint8_t;
using u16 = uint16_t;
using u32 = uint32_t;
using u64 = uint64_t;

template <typename T>
auto src(T x) -> u64 {
    u64 count = 0;
    for (u64 i = 0; i < sizeof(T) * 8; i++) {
        if (x & ((u64)1 << i)) {
            count++;
        }
    }
    return count;
}

template <typename T>
auto tgt(T x) -> u64 {
    return __builtin_popcountg(x);
}

extern "C" {
auto src8(u8 x) -> u64 { return src(x); }
auto src16(u16 x) -> u64 { return src(x); }
auto src32(u32 x) -> u64 { return src(x); }
auto src64(u64 x) -> u64 { return src(x); }

auto tgt8(u8 x) -> u64 { return tgt(x); }
auto tgt16(u16 x) -> u64 { return tgt(x); }
auto tgt32(u32 x) -> u64 { return tgt(x); }
auto tgt64(u64 x) -> u64 { return tgt(x); }
}
```
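
That the bit-testing loop and a population count agree can be checked quickly outside the compiler. The sketch below models both in Python with an explicit bit width. The names `src` and `tgt` are reused from the C++ above, the `bits` parameter is an assumption standing in for `sizeof(T) * 8`, and nothing here reflects how LLVM's idiom recognition works.

```python
def src(x, bits):
    """The loop from the report: test each of the low `bits` bits in turn."""
    count = 0
    for i in range(bits):
        if x & (1 << i):
            count += 1
    return count


def tgt(x, bits):
    """Population count of x truncated to `bits` bits."""
    return bin(x & ((1 << bits) - 1)).count("1")


def equivalent_for_all_u8():
    # Exhaustive over every 8-bit input; wider types can only be sampled.
    return all(src(x, 8) == tgt(x, 8) for x in range(256))
```

The 8-bit case can be checked exhaustively with `equivalent_for_all_u8()`; for 16, 32, and 64 bits a handful of sampled values gives the same agreement.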




[llvm-bugs] [Bug 129832] Assertion `!isa(static_cast(this)) || cast(static_cast(this))->isLinkageValid()' failed.

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129832




Summary

Assertion `!isa(static_cast(this)) || cast(static_cast(this))->isLinkageValid()' failed.




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  bi6c
  




Compiler Explorer: https://godbolt.org/z/Tb1YaaPs4

```console
:7:12: warning: #pragma redefine_extname is applicable to external C declarations only; not applied to function 'foo' [-Wpragmas]
7 | static int foo(void);
  | ^
clang: /root/llvm-project/clang/include/clang/AST/Decl.h:5157: void clang::Redeclarable::setPreviousDecl(decl_type*) [with decl_type = clang::FunctionDecl]: Assertion `!isa(static_cast(this)) || cast(static_cast(this))->isLinkageValid()' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /opt/compiler-explorer/clang-assertions-trunk/bin/clang -gdwarf-4 -g -o /app/output.s -mllvm --x86-asm-syntax=intel -fno-verbose-asm -S --gcc-toolchain=/opt/compiler-explorer/gcc-snapshot -fcolor-diagnostics -fno-crash-diagnostics 
1.	:8:21: current parser token ';'
 #0 0x03e53898 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x3e53898)
 #1 0x03e51554 llvm::sys::CleanupOnSignal(unsigned long) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x3e51554)
 #2 0x03d9de88 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x7d7f66c42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520)
 #4 0x7d7f66c969fc pthread_kill (/lib/x86_64-linux-gnu/libc.so.6+0x969fc)
 #5 0x7d7f66c42476 gsignal (/lib/x86_64-linux-gnu/libc.so.6+0x42476)
 #6 0x7d7f66c287f3 abort (/lib/x86_64-linux-gnu/libc.so.6+0x287f3)
 #7 0x7d7f66c2871b (/lib/x86_64-linux-gnu/libc.so.6+0x2871b)
 #8 0x7d7f66c39e96 (/lib/x86_64-linux-gnu/libc.so.6+0x39e96)
 #9 0x073739ac clang::Redeclarable::setPreviousDecl(clang::FunctionDecl*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x73739ac)
#10 0x074ee3d5 clang::FunctionDecl::setPreviousDeclaration(clang::FunctionDecl*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x74ee3d5)
#11 0x067b7420 clang::Sema::CheckFunctionDeclaration(clang::Scope*, clang::FunctionDecl*, clang::LookupResult&, bool, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x67b7420)
#12 0x067bc5f0 clang::Sema::ActOnFunctionDeclarator(clang::Scope*, clang::Declarator&, clang::DeclContext*, clang::TypeSourceInfo*, clang::LookupResult&, llvm::MutableArrayRef, bool&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x67bc5f0)
#13 0x067c1530 clang::Sema::HandleDeclarator(clang::Scope*, clang::Declarator&, llvm::MutableArrayRef) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x67c1530)
#14 0x067c2070 clang::Sema::ActOnDeclarator(clang::Scope*, clang::Declarator&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x67c2070)
#15 0x0642fd1e clang::Parser::ParseDeclarationAfterDeclaratorAndAttributes(clang::Declarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::ForRangeInit*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x642fd1e)
#16 0x0643f8c9 clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::ParsedAttributes&, clang::Parser::ParsedTemplateInfo&, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x643f8c9)
#17 0x063ff73e clang::Parser::ParseDeclOrFunctionDefInternal(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x63ff73e)
#18 0x063ffef9 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x63ffef9)
#19 0x064076d3 clang::Parser::ParseExternalDeclaration(clang::ParsedAttributes&, clang::ParsedAttributes&, clang::ParsingDeclSpec*) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x64076d3)
#20 0x064085ad clang::Parser::ParseTopLevelDecl(clang::OpaquePtr&, clang::Sema::ModuleImportState&) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x64085ad)
#21 0x063faa3a clang::ParseAST(clang::Sema&, bool, bool) (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x63faa3a)
#22 0x04812598 clang::CodeGenAction::ExecuteAction() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x4812598)
#23 0x04ada245 clang::FrontendAction::Execute() (/opt/compiler-explorer/clang-assertions-trunk/bin/clang+0x4ada245)
#24 0x04a5d92e clang::C

[llvm-bugs] [Bug 129838] [libc] str_to_float_comparison_test should be hermetic

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129838




Summary

[libc] str_to_float_comparison_test should be hermetic




  Labels
  
libc
  



  Assignees
  
  



  Reporter
  
  RossComputerGuy
  




The test in question, https://github.com/llvm/llvm-project/blob/main/libc/test/src/__support/str_to_float_comparison_test.cpp, is not hermetic right now. This causes problems for NixOS/nixpkgs where full builds use clang without a libc. Being able to run all tests without needing the host's libc would be very beneficial.

Relevant log:
```
libc> [1145/1151] Building CXX object libc/test/src/__support/CMakeFiles/libc_str_to_float_comparison_test.dir/str_to_float_comparison_test.cpp.o
libc> FAILED: libc/test/src/__support/CMakeFiles/libc_str_to_float_comparison_test.dir/str_to_float_comparison_test.cpp.o
libc> /nix/store/h3wgz6n8bc4n61vv427xl8cz69vcd96c-clang-wrapper-20.1.0-rc3/bin/clang++ -DLIBC_NAMESPACE=__llvm_libc_20_1_0_rc3  -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wno-comment -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=gnu++17 -MD -MT libc/test/src/__support/CMakeFiles/libc_str_to_float_comparison_test.dir/str_to_float_comparison_test.cpp.o -MF libc/test/src/__support/CMakeFiles/libc_str_to_float_comparison_test.dir/str_to_float_comparison_test.cpp.o.d -o libc/test/src/__support/CMakeFiles/libc_str_to_float_comparison_test.dir/str_to_float_comparison_test.cpp.o -c /build/libc-src-20.1.0-rc3/libc/test/src/__support/str_to_float_comparison_test.cpp
libc> In file included from /build/libc-src-20.1.0-rc3/libc/test/src/__support/str_to_float_comparison_test.cpp:11:
libc> In file included from /nix/store/l71wz2r8ki25kzw33jwssg8rh77xfkpr-gcc-14-20241116/include/c++/14-20241116/stdlib.h:36:
libc> In file included from /nix/store/l71wz2r8ki25kzw33jwssg8rh77xfkpr-gcc-14-20241116/include/c++/14-20241116/cstdlib:41:
libc> In file included from /nix/store/l71wz2r8ki25kzw33jwssg8rh77xfkpr-gcc-14-20241116/include/c++/14-20241116/aarch64-unknown-linux-gnu/bits/c++config.h:680:
libc> /nix/store/l71wz2r8ki25kzw33jwssg8rh77xfkpr-gcc-14-20241116/include/c++/14-20241116/aarch64-unknown-linux-gnu/bits/os_defines.h:39:10: fatal error: 'features.h' file not found
libc>    39 | #include <features.h>
libc>       |          ^~~~
libc> 1 error generated.
libc> [1146/1151] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.strfromf.dir/strfromf.cpp.o
libc> [1147/1151] Building CXX object libc/src/stdlib/CMakeFiles/libc.src.stdlib.strfromd.dir/strfromd.cpp.o
libc> [1148/1151] Building CXX object libc/src/stdio/printf_core/CMakeFiles/libc.src.stdio.printf_core.converter.dir/converter.cpp.o
libc> ninja: build stopped: subcommand failed.
```




[llvm-bugs] [Bug 129841] Dead code in MLRegAllocEvictAdvisor.cpp

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129841




Summary

Dead code in MLRegAllocEvictAdvisor.cpp




  Labels
  
mlgo
  



  Assignees
  
  



  Reporter
  
  abhishek-kaushik22
  




In [MLRegAllocEvictAdvisor.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/MLRegAllocEvictAdvisor.cpp#L862-L874), the condition `if (CandidatePos == CandidateVirtRegPos)` is checked twice. When the first check is true the function returns, so the second check is unnecessary.

@boomanaiden154 can you please take a look because this was introduced in 00f692b94f9aa08ede4aaba6f2aafe17857599c4




[llvm-bugs] [Bug 129842] [asan] failure to detect memory leaks

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129842




Summary

[asan] failure to detect memory leaks




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  PikachuHyA
  




reproducer
see https://godbolt.org/z/6YGdnn634

```c++
// main.cc
struct Foo {
  struct Foo *other;
};
int main() {
  auto f1 = new Foo();
  auto f2 = new Foo();
  f1->other = f2;
  f2->other = f1;
  return 0;
}

```

However, the memory leaks in the following variant are detected.

see https://godbolt.org/z/n6jWbYTqY

```c++
struct Foo {
  struct Foo *other;
};
int main() {
  auto f1 = new Foo();
  auto f2 = new Foo();
  f1->other = f2;
  // highlight here
  // f2->other = f1;
  return 0;
}
```


Note: GCC can detect the memory leaks.
With `-fsanitize=leak`, the memory leaks are detected.







[llvm-bugs] [Bug 129845] accepts-invalid with C++23 constexpr-unknown with struct containing reference

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129845




Summary

accepts-invalid with C++23 constexpr-unknown with struct containing reference




  Labels
  
clang:frontend,
c++23,
constexpr
  



  Assignees
  
  



  Reporter
  
  efriedma-quic
  




```
int &ff();
int &x = ff();
struct A { int& x; };
constexpr A g = {x};
const A* gg = &g;
```

Should be rejected, currently accepted.  (And related variations miscompile.)




[llvm-bugs] [Bug 129675] [clang-tidy] bugprone-throw-keyword-missing on default member initializer: "did you mean 'throw shared_ptr'?"

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129675




Summary

[clang-tidy] bugprone-throw-keyword-missing on default member initializer: "did you mean 'throw shared_ptr'?"




  Labels
  
clang-tidy
  



  Assignees
  
  



  Reporter
  
  N-Dekker
  




Using LLVM 19.1.7, I encountered a false positive `bugprone-throw-keyword-missing` from [ITK](https://itk.org)'s [itkExceptionObject.h](https://github.com/InsightSoftwareConsortium/ITK/blob/32a2a6de17ffb7c8319ab38dbe61bd3b7c171f00/Modules/Core/Common/include/itkExceptionObject.h), which can be reproduced as follows:

```cpp
#include <exception>
#include <memory>

class MyException : public std::exception
{
public:
  MyException() = default;
private:
  class NestedData;
  std::shared_ptr<NestedData> m_shared_data{};
};

class Bug : public MyException
{
public:
  Bug()
  {
    // Non-defaulted default constructor.
  }
};

```

Output:

```
warning: suspicious exception object created but not thrown; did you mean 'throw shared_ptr'? [bugprone-throw-keyword-missing]
   10 |   std::shared_ptr<NestedData> m_shared_data{};
  |^
```




[llvm-bugs] [Bug 129676] Clang emits not the smallest code with `-Os` for `(unsigned)x >> C1 == C2`

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129676




Summary

Clang emits not the smallest code with `-Os` for `(unsigned)x >> C1 == C2`




  Labels
  
  



  Assignees
  
  



  Reporter
  
  Explorer09
  




```c
unsigned int pred2_rshift(unsigned int x) {
    return (x >> 11) == 0x1B;
    // 0x1B == (0xD800 >> 11);
}
unsigned int pred2_bitand(unsigned int x) {
    return (x &= ~0x7FF) == 0xD800;
}
```

When tested on Compiler Explorer with x86-64 clang 19.1.0 and the `-Os` option, `pred2_rshift` is compiled to the same code as `pred2_bitand`, which is slightly larger.

I expected the right shift to be used: since I specified `-Os`, I expect the smallest code size.

Note: The example code is part of [a report I filed with GCC](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115529). When checking whether an integer is in a specified range, and the range happens to be aligned to a power of two, all of these comparisons compute the same result:

```c
unsigned int pred2(unsigned int x) {
    return x >= 0xD800 && x <= 0xDFFF;
}
unsigned int pred2_sub(unsigned int x) {
    return (x - 0xD800) <= (0xDFFF - 0xD800);
}
unsigned int pred2_bitand(unsigned int x) {
    return (x &= ~0x7FF) == 0xD800;
}
unsigned int pred2_bitor(unsigned int x) {
    return (x |= 0x7FF) == 0xDFFF;
}
unsigned int pred2_rshift(unsigned int x) {
    return (x >>= 11) == (0xD800 >> 11);
}
unsigned int pred2_div(unsigned int x) {
    return (x / 0x800) == (0xD800 / 0x800);
}
```
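
The claimed equivalence of the six forms can be checked by brute force. The following is a minimal C++ sketch, not part of the original report: it restates the predicates without the in-place assignments, assumes `unsigned int` is 32 bits wide (hence `std::uint32_t`), and the helpers `all_agree`, `count_mismatches`, and `self_check` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Side-effect-free restatements of the six predicates from the report.
// std::uint32_t stands in for `unsigned int` (assumption: 32-bit unsigned).
constexpr bool pred2(std::uint32_t x)        { return x >= 0xD800u && x <= 0xDFFFu; }
constexpr bool pred2_sub(std::uint32_t x)    { return (x - 0xD800u) <= (0xDFFFu - 0xD800u); }
constexpr bool pred2_bitand(std::uint32_t x) { return (x & ~0x7FFu) == 0xD800u; }
constexpr bool pred2_bitor(std::uint32_t x)  { return (x | 0x7FFu) == 0xDFFFu; }
constexpr bool pred2_rshift(std::uint32_t x) { return (x >> 11) == (0xD800u >> 11); }
constexpr bool pred2_div(std::uint32_t x)    { return (x / 0x800u) == (0xD800u / 0x800u); }

// Hypothetical helper: true when all six forms give the same answer for x.
constexpr bool all_agree(std::uint32_t x) {
  const bool r = pred2(x);
  return pred2_sub(x) == r && pred2_bitand(x) == r && pred2_bitor(x) == r &&
         pred2_rshift(x) == r && pred2_div(x) == r;
}

// Hypothetical helper: counts the inputs in [lo, hi] on which the forms disagree.
// The loop tests for the end value inside the body so that hi == UINT32_MAX works.
inline std::uint32_t count_mismatches(std::uint32_t lo, std::uint32_t hi) {
  std::uint32_t mismatches = 0;
  for (std::uint32_t x = lo;; ++x) {
    if (!all_agree(x)) ++mismatches;
    if (x == hi) break;
  }
  return mismatches;
}

// The shift constant used in the first snippet of the report.
static_assert(0x1B == (0xD800 >> 11), "0x1B is 0xD800 shifted right by 11");

// Hypothetical self-check covering the range around the interval of interest.
inline void self_check() { assert(count_mismatches(0u, 0x1FFFFu) == 0u); }
```

This only confirms that the forms are logically interchangeable; it says nothing about which one yields the smallest machine code, which is the actual question of the report.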

While Clang can recognize _all_ of these as equivalent (good job, by the way), it made a strange decision on which code to emit. While I can't tell which one is best for speed (performance), I can figure out which one is the smallest size.




[llvm-bugs] [Bug 129671] Confusing behaviour command line options with default value 'true'

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129671




Summary

Confusing behaviour command line options with default value 'true'




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  JVApen
  




In clang-include-cleaner, there are some options that have a default value 'true'.

For example:
https://github.com/llvm/llvm-project/blob/03505a004ff6909c46d6b8c498a9ffccd47d88a0/clang-tools-extra/include-cleaner/tool/IncludeCleaner.cpp#L100-L105

Running it with `--help` prints the following:

USAGE: clang-include-cleaner.exe [options] <source0> [... <sourceN>]

OPTIONS:
  ...
  --remove  - Allow header removals
  --version - Display the version of this program

This seems to imply that you have to add `--remove` in order to activate the example option.
However, it is enabled by default. If you don't want the removal behavior, you have to add `--remove=false` to the command line. This is nowhere to be found in the help message.

Can the command line output be improved such that default options are somehow indicated and it is easy to see how to disable them? For example ` --remove  - Allow header removals (Default, use --remove=false to disable)`
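
The requested output could be produced by a formatter along these lines. This is only a sketch of the idea in plain C++, not LLVM's command-line library: the `BoolFlag` struct and `format_help_line` function are hypothetical names made up for illustration, and the exact wording of the suffix is taken from the suggestion above.

```cpp
#include <string>

// Hypothetical description of a boolean flag (not LLVM's cl::opt).
struct BoolFlag {
  std::string name;         // flag name without the leading dashes
  std::string description;  // help text
  bool default_value;       // value used when the flag is not given
};

// Formats one help line. Flags that default to true get a note saying so,
// together with the spelling needed to turn them off.
inline std::string format_help_line(const BoolFlag &flag) {
  std::string line = "  --" + flag.name + " - " + flag.description;
  if (flag.default_value)
    line += " (Default, use --" + flag.name + "=false to disable)";
  return line;
}
```

For a flag described as `BoolFlag{"remove", "Allow header removals", true}`, the sketch appends the default note, while a flag whose default is `false` keeps the plain line.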




[llvm-bugs] [Bug 129779] [flang] surprising performance loss with nested type operator overloading

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129779




Summary

[flang] surprising performance loss with nested type operator overloading




  Labels
  
flang
  



  Assignees
  
  



  Reporter
  
  ivan-pi
  




I've attempted to create a performance benchmark which sums an array of numbers, but in different ways to measure the overhead of operator overloading for simple value types:

[abstraction_penalty.F90.txt](https://github.com/user-attachments/files/19077916/abstraction_penalty.F90.txt)

When I run the program, I see the output:

```
$ flang-new -O2 abstraction_penalty.F90 
$ ./a.out
[info] compiler: Homebrew flang version 19.1.4 (https://github.com/Homebrew/homebrew-core/issues)
[info] compiler options: flang-new -O2 abstraction_penalty.F90
[info] using naive sum
[info] number of iterations: 25000

test    absolute    additions   ratio with
number  time (sec)  per second       test0

   0      0.0532   9.400E+02     1.000
   1      0.0498   1.003E+03     0.937
   2      0.0493   1.015E+03     0.926
   3      0.0526   9.511E+02     0.988
   4      0.0595   8.410E+02     1.118
   5      0.0515   9.700E+02     0.969
   6      0.0486   1.029E+03     0.913
   7      0.0485   1.031E+03     0.912
   8      0.0490   1.020E+03     0.922
   9      0.0472   1.059E+03     0.888
  10      0.0483   1.036E+03     0.907
  11      0.0485   1.031E+03     0.912
  12      0.0479   1.044E+03     0.901
  13      0.0481   1.039E+03     0.905
  14      6.7735   7.382E+00   127.336
  15      6.7167   7.444E+00   126.267
  16      0.0467   1.071E+03     0.878
  17      0.0452   1.105E+03     0.850
  18      0.0451   1.108E+03     0.849
  19      0.0452   1.105E+03     0.850
  20      0.0476   1.050E+03     0.895
  21      0.0469   1.066E+03     0.882
  22      0.0467   1.071E+03     0.877
  23      0.0461   1.086E+03     0.866
  24      0.0454   1.101E+03     0.853
  25      0.0452   1.105E+03     0.851
  26      0.0456   1.097E+03     0.857
  27      0.0454   1.102E+03     0.853
  28      6.6540   7.514E+00   125.089
  29      6.5274   7.660E+00   122.709

mean      0.0928   5.386E+02      1.75
```

The slow cases (14, 15, 28, 29) are calling the procedure `test_ddd`, which calls `dsum` for the `type(ddd)`, which is really just a double value but defined in an obscure way:

```fortran
integer, parameter :: dp = c_double

! Double wrapper
type :: dd
    real(dp) :: val
end type

! Double wrapper child with TBP
type, extends(dd) :: ddi
contains
    procedure :: get => get_ddi_val
end type

! Double wrapper wrapper
type :: ddd
    type(dd) :: val
end type
```

The sum procedure looks as follows:
```fortran
pure function ddd_sum(a) result(res)
    type(ddd), intent(in) :: a(:)
    type(ddd) :: res
    real(dp), pointer :: t(:)
#if USE_INTRINSIC_SUM
    res%val%val = sum(a%val%val)
#else
    integer :: i
    res = ddd(dd(0.0_dp))
    do i = 1, size(a)
        res = res + a(i)
    end do
#endif
end function
``` 
where the `+` is the overloaded `operator(+)` defined as,

```fortran
pure function ddd_add(a,b) result(c)
    type(ddd), intent(in) :: a, b
    type(ddd) :: c
    c%val%val = a%val%val + b%val%val
end function
```

If the intrinsic sum (`-DUSE_INTRINSIC_SUM`) is used instead, there are no observable penalties. There are other switches too, namely `-DUSE_INTRINSIC_REDUCE` which displays good performance, and `-DUSE_STRUCTURE_CONSTRUCTOR` which makes the performance even worse (300x slower than the baseline). 




[llvm-bugs] [Bug 129813] Documented option file for each clang-tidy option

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129813




Summary

Documented option file for each clang-tidy option




  Labels
  
clang-tidy
  



  Assignees
  
  



  Reporter
  
  martinlicht
  




The online documentation lists all available checks and their options:
https://clang.llvm.org/extra/clang-tidy/checks/list.html

How about providing a comprehensive list of clang-tidy checks and their options in the form of a config file? 

The default option should be some reasonable standard and the file should include some brief explanation (or link) in the comments for each option. 

Possible Example
```
# https://clang.llvm.org/extra/clang-tidy/checks/bugprone/argument-comment.html
  - key: bugprone-argument-comment.StrictMode
    value: false
```

I would love to have access to such a complete configuration file simply for the sake of playing around. Maybe there is a way to automate the generation of such a file?




[llvm-bugs] [Bug 129815] (When) does Clang respect noinline, and how?

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129815




Summary

(When) does Clang respect noinline, and how?




  Labels
  
clang
  



  Assignees
  
  



  Reporter
  
  higher-performance
  




This issue appears to exist with GCC and MSVC as well, but (in my various attempts) Clang appears to be the least willing to respect noinline.

Consider this code:
```
#include 

#if defined(_MSC_VER)
#define NOINLINE [[msvc::noinline]]
#elif defined(__clang__)
#define NOINLINE [[gnu::noinline]]
#elif defined(__GNUC__)
#define NOINLINE [[gnu::noinline]]
#else
#error unable to prevent inlining
#endif

using R = int;
using P = R*;

NOINLINE R bar1(P arg);
NOINLINE R bar2(P arg) { return arg ? 1 : 0; }
NOINLINE static R bar3(P arg) { return arg ? 1 : 0; }

R foo1(int x) { return (x ? 0 : bar1(&x)); }
R foo2(int x) { return (x ? 0 : bar2(&x)); }
R foo3(int x) { return (x ? 0 : bar3(&x)); }
```
In this code, all the `fooN` are equivalent, and:
- Must contain calls to `barN` because all `barN` are noinline (mandatory)
- Should result in identical codegen (optional, but preferable)

Instead, [what I see](https://godbolt.org/z/G9zjYY1fe) is:
- With the exception of `foo2` & `foo3` on MSVC, no pair of `fooN` result in identical codegen on any compiler
- None of the compilers produce a `call` instruction in `foo3`, implying that `noinline` isn't guaranteeing the generation of a new stack frame
- Clang is the only compiler that **completely elides** any reference to any `barN` (see `foo3`), making the `noinline` function disappear entirely in some cases.

```
Clang               │ GCC                  │ MSVC
━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━
foo1:               │ foo1:                │ foo1:
  push rax          │   sub  rsp, 24       │   mov  [rsp+8], ecx
  mov  [rsp+4], edi │   xor  eax, eax      │   sub  rsp, 40
  xor  eax, eax     │   mov  [rsp+12], edi │   test ecx, ecx
  test edi, edi     │   test edi, edi      │   je   LABEL
  je   LABEL        │   je   LABEL         │   xor  eax, eax
  pop  rcx          │   add  rsp, 24       │   add  rsp, 40
  ret               │   ret                │   ret  0
LABEL:              │ LABEL:               │ LABEL:
  lea  rdi, [rsp+4] │   lea  rdi, [rsp+12] │   lea  rcx, [rsp]
  call bar1@PLT     │   call bar1          │   call bar1
  pop  rcx          │   add  rsp, 24       │   add  rsp, 40
  ret               │   ret                │   ret  0
────────────────────┼──────────────────────┼─────────────────────
foo2:               │ foo2:                │ foo2:
  xor  eax, eax     │   test edi, edi      │   test ecx, ecx
  test edi, edi     │   jne  LABEL         │   je   LABEL
  je   LABEL        │   sub  rsp, 8        │   xor  eax, eax
  ret               │   lea  rdi, [rsp+4]  │   ret  0
LABEL:              │   call bar2          │ LABEL:
  push rax          │   add  rsp, 8        │   lea  rcx, [rsp]
  lea  rdi, [rsp+4] │   ret                │   jmp  bar2
  call bar2         │ LABEL:               │
  add  rsp, 8       │   xor  eax, eax      │
  ret               │   ret                │
────────────────────┼──────────────────────┼─────────────────────
foo3:               │ foo3:                │ foo3:
  xor  eax, eax     │   test edi, edi      │   test ecx, ecx
  test edi, edi     │   jne  LABEL         │   je   LABEL
  sete al           │   lea  rdi, [rsp-4]  │   xor  eax, eax
  ret               │   jmp  bar2          │   ret  0
                    │ LABEL:               │ LABEL:
                    │   xor  eax, eax      │   lea  rcx, [rsp]
                    │   ret                │   jmp  bar3
```

Note that I haven't tried LTO yet, but I imagine that would produce interesting results as well.

This made me wonder:
- What _are_ the precise semantics of `noinline`? i.e. what guarantee(s) can users actually rely on when using `noinline`, if any?
- Are these behaviors bugs, or intended behavior?




[llvm-bugs] [Bug 129812] Early exit optimization of Fortran array expressions

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129812




Summary

Early exit optimization of Fortran array expressions




  Labels
  
new issue
  



  Assignees
  
  



  Reporter
  
  ivan-pi
  




Consider a function for checking if an array is sorted:
```fortran
!
! Check an array of integers is sorted in ascending order
!
logical function is_sorted_scalar(n,a) result(is_sorted)
    integer, intent(in) :: n
    integer, intent(in) :: a(n)
    integer :: i
    !$omp simd simdlen(8) early_exit
    do i = 2, n
        if (a(i) < a(i-1)) then
            is_sorted = .false.
            return
        end if
    end do
    is_sorted = .true.
end function

logical function is_sorted_all(n,a) result(is_sorted)
    integer, intent(in) :: n
    integer, intent(in) :: a(n)
    is_sorted = all(a(2:n) >= a(1:n-1))
end function

program benchmark

    implicit none
    integer, allocatable :: a(:)

    integer :: i, n

    external :: is_sorted_scalar
    external :: is_sorted_all

    logical :: is_sorted_scalar
    logical :: is_sorted_all

    character(len=32) :: str
    integer :: tmp

    tmp = 0
    n = 2

    if (command_argument_count() > 0) then
        call get_command_argument(1,str)
        read(str,*) tmp
        if (tmp > 0) n = tmp
    end if
    print *, "n = ",  n

    allocate(a(n))

    ! Fill ascending numbers
    do i = 1, n
        a(i) = i
    end do

    ! Introduce an unsorted value
    a(100) = 1001
    !a(101) = 1000

    call measure(10,a,is_sorted_scalar,"scalar")
    call measure(10,a,is_sorted_all,   "all")

contains

    impure subroutine measure(nreps,a,func,name)
        integer, intent(in) :: nreps
        integer, intent(in) :: a(:)
        logical :: func
        character(len=*), intent(in) :: name
        integer(8) :: t1, t2, rate
        real(kind(1.0d0)) :: elapsed
        logical :: res

        character(len=12) :: str

        integer :: k
        call system_clock(t1)
        do k = 1, nreps
            res = func(size(a),a)
        end do
        call system_clock(t2,rate)

        elapsed = (t2 - t1)/real(rate,kind(elapsed))

        str = adjustl(name)
        print '(A12,F12.4,L2)', str, elapsed/nreps*1.e6, res

        ! Time is in microseconds

    end subroutine

end program
```

It appears to me that in `is_sorted_all` flang generates a temporary array for the `a(2:n) >= a(1:n-1)` expression, and then performs the `all` reduction. This is fast due to vectorization, but it misses the chance of an early exit.
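
The difference between the two strategies can be sketched in C++ (used here only for illustration, it is not what flang emits). `is_sorted_early_exit` mirrors `is_sorted_scalar`; `is_sorted_with_temporary` models the suspected lowering of `all(a(2:n) >= a(1:n-1))`, which is an assumption drawn from the observation above rather than a confirmed description of flang's output. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Early-exit form: stops at the first out-of-order pair.
inline bool is_sorted_early_exit(const std::vector<int> &a) {
  for (std::size_t i = 1; i < a.size(); ++i)
    if (a[i] < a[i - 1]) return false;
  return true;
}

// Two-pass form: first materialize the comparison mask as a temporary,
// then reduce over the whole mask. Every element is visited twice,
// even when the very first pair is already out of order.
inline bool is_sorted_with_temporary(const std::vector<int> &a) {
  if (a.size() < 2) return true;
  std::vector<char> mask(a.size() - 1);
  for (std::size_t i = 1; i < a.size(); ++i)
    mask[i - 1] = (a[i] >= a[i - 1]) ? 1 : 0;
  bool all_true = true;
  for (char ok : mask)  // full reduction, deliberately without a break
    all_true = all_true && (ok != 0);
  return all_true;
}
```

Both functions return the same result; the cost differs only when an inversion occurs early in a long array, which is the situation the benchmark above sets up.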

The effect is noticeable in the runtime:
```
~/fortran/is_sorted$ make FC=flang-new FFLAGS="-O2 -march=native" standalone
flang-new -O2 -march=native -o standalone standalone.f90
~/fortran/is_sorted$ ./standalone 
 n =  2
scalar            0.0673 F
all               1.7358 F
~/fortran/is_sorted$ make clean
rm -rf *.o benchmark standalone
~/fortran/is_sorted$ make FC=gfortran FFLAGS="-O2 -march=native" standalone
gfortran -O2 -march=native -o standalone standalone.f90
~/fortran/is_sorted$ ./standalone 
 n = 2
scalar            0.0389 F
all               0.0390 F
```

It would be nice if early exit vectorization were also supported (https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690). With x86 SIMD extensions this still has to be done manually it seems: http://0x80.pl/notesen/2018-04-11-simd-is-sorted.html




[llvm-bugs] [Bug 129816] [LLD] Support placing OVERLAY in a specific MEMORY region in linker scripts

2025-03-04 Thread LLVM Bugs via llvm-bugs


Issue

129816




Summary

[LLD] Support placing OVERLAY in a specific MEMORY region in linker scripts




  Labels
  
lld
  



  Assignees
  
mysterymath
  



  Reporter
  
  Prabhuk
  




```
OVERLAY OVERLAY_ADDR : {
.overlay1 { *(overlay1*) }
.overlay { *(overlay2*) }
} > MEM_REGION
```

In LLD, the `> MEM_REGION` syntax for OVERLAY is currently unsupported. This issue tracks adding support for this feature in LLD.

