[Bug tree-optimization/98982] New: Optimizing loop variants of fixed-byte-order functions

2021-02-06 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98982

Bug ID: 98982
   Summary: Optimizing loop variants of fixed-byte-order functions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: j...@jak-linux.org
  Target Milestone: ---

These functions actually generate loops whereas they should just be optimized
to no-ops.

It's the classic byte-read/write idiom for little endian encoded integers, just
written as a loop, to accomodate varying sizeof(T) in the template.

template 
struct little_endian
{
   uint8_t data[sizeof(T)];

   constexpr little_endian() : data{}
   {
   }
   constexpr little_endian(T in)
   {
  for (size_t i = 0; i < sizeof(T); i++)
data[i] = (in >> (8 * i)) & 0xFF;
   }
   constexpr operator T() const
   {
  T res = 0;
  for (size_t i = 0; i < sizeof(T); i++)
res |= static_cast(data[i]) << (8u * i);
  return res;
   }
};


-O3 or -funroll-loops correctly unrolls the encoding (the constructor), but the
operator T() still ends up being

▸   endbr64
▸   movabsq▸$72057594037927935, %rdx
▸   movq▸   %rdi, %rax
▸   shrq▸   $56, %rax
▸   andq▸   %rdi, %rdx
▸   salq▸   $56, %rax
▸   orq▸%rdx, %rax
▸   ret

Seems weird to me.

[Bug tree-optimization/98982] Optimizing loop variants of fixed-byte-order functions

2021-02-06 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98982

--- Comment #1 from Julian Andres Klode  ---
Verified with gcc version 11.0.0 20210203 (experimental) [master revision
e8c87bc07b5:def4749fcba:84110515b93a6709de24240d6658ac207db5129f] (Ubuntu
11-20210203-0ubuntu1)

[Bug tree-optimization/98982] Optimizing loop variants of fixed-byte-order functions

2021-02-06 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98982

--- Comment #2 from Julian Andres Klode  ---
clang works perfectly, fwiw. I was told it has a separate pass dedicated to
just recognizing loop idioms and optimizing them. I have no idea if gcc has
that, but it seems useful.

[Bug middle-end/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #8 from Julian Andres Klode  ---
(In reply to Andrew Pinski from comment #6)
> If you provide the preprocessed source a lot of the time we can do the rest
> of getting a runtime Testcase and reducing it but we need the preprocessed
> source to begin with.

Hey, to me it seems the .ii in the linked tarball is the preprocessed source?


$ g++ -O2 -o https http-http.ii  -lapt-pkg -lssl -lsystemd  -lcrypto -lseccomp
$ rm /tmp/InRelease ; ( cat ../test-input; sleep 60 ) | ./https 

I'll link again:

https://magenta.jak-linux.org/bin/b0752bf2386d09d013a352c3dd8f6004-test-data.tar.gz

I can't attach it here, as it's too big.

[Bug middle-end/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #10 from Julian Andres Klode  ---
Created attachment 60197
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60197&action=edit
Partially minimized by cvise to 48k lines

I made it exit 42 because I could not insert an assert into the preprocessed
code, and attaching the snapshot of 48k lines as I'm not sure how long I have
that instance and it's getting late in my timezone.

[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #14 from Julian Andres Klode  ---
(In reply to Andrew Pinski from comment #11)
> >These may be related (do q31 and v31 overlap?)
> 
> Q31 and V31 are the same register just different views into it. That is q31
> is the full 128bit view while v31.4s is the 32bit x 4 element vector view of
> the register. That is just storing 0 into those 2 locations.

Indeed, I'm not sure that's the affected location, after all the user who
reported it tried it on other arm64 machines and those worked correctly; and I
tried running the code on Graviton and valgrind on Graviton and only the former
got 0, so it's likely not as simple as a simple initialization mishap.

We're calling OpenSSL which is probably doing a whole thing of optimized code
in the https calls, it's possible some register is not in the state we expect
it to be in after OpenSSL returns; but idk.

Lots of unknown.(In reply to Andrew Pinski from comment #12)
> A few things to test out first:
> Is it -fsanitize=address clean?

I can try some more stuff tomorrow!

> 
> Does -fno-lifetime-dse help? Does -fstack-reuse=none help? What about both
> together?

I localized it to the constructor as an attribute:

   RequestState(BaseHttpMethod * const Owner, ServerState * const Server)
__attribute__((optimize("no-lifetime-dse"))):

   RequestState(BaseHttpMethod * const Owner, ServerState * const Server)
__attribute__((optimize("stack-reuse=none"))):

and both of these fix the problem individually as well.

[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #15 from Julian Andres Klode  ---
I have the answer(In reply to Andrew Pinski from comment #12)
> A few things to test out first:
> Is it -fsanitize=address clean?

I managed to recompile it with that option and annoyingly, that also prevents
the bug from happening (just like valgrind) :)

[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64

2025-01-18 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #18 from Julian Andres Klode  ---
(In reply to Sam James from comment #17)
> (In reply to Julian Andres Klode from comment #16)
> 
> I built apt and couldn't reproduce it using this yet with tip of 14 release
> branch and trunk too.

I need to try building gcc-14 branch itself and see for myself, also if I can
still reproduce it on AWS, then I can bisect it to the commit introducing the
issue.

Meanwhile, my initial attempt at cvise on the full .ii was not fruitful as I
mocked that up using if (Encoding != Closes) exit(42) and cvise helpfully
trimmed it down to Encoding != Closes; exit(42) - that is, it always failed.

I started a second attempt where I split out the offending
BaseHttpMethod::Loop() into its own source file and then cvised that and that
reduced it to:
void BaseHttpMethod::Loop() {
  while (1) {
Run();
Server = CreateServerState(URI(Queue->Uri));
Server->Open();
RequestState Req(this, Server.get());
switch (Server->RunHeaders(Req, Queue->Uri));
  }
}


When downloading over https (i.e. Server->Open() makes OpenSSL calls), I get
the error, without OpenSSL calls, I do not get the error, so it sounds like
something messes up (register?) state there.

This boils down to:

isb
// 0 "" 2
// basehttp-loop.cc:12: time(&Date);
#NO_APP
ldr x0, [sp, 16]//, %sfp
// basehttp-loop.cc:11:   : Server() {
str d15, [sp, 48]   // tmp332, MEM  [(void
*)&Req]
// basehttp-loop.cc:12: time(&Date);
bl  time//
// basehttp-loop.cc:13: assert(Encoding == Closes);
ldr w0, [sp, 48]//, Req.Encoding
cmp w0, 1   // Req.Encoding,
bne .L56//,
// /usr/include/aarch64-linux-gnu/bits/stdio2.h:118:   return __printf_chk
(__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
adrpx1, .LC3// tmp267,
mov w0, 2   //,
add x1, x1, :lo12:.LC3  //, tmp267,
bl  __printf_chk//
.LEHE3:
// basehttp-loop.cc:41: __asm__("isb");
#APP
// 41 "basehttp-loop.cc" 1
isb

Now per ARM, `D part of V8-V15 need to be saved`, but the only other mention of
`d15` is 

ldr d15, [x0, #:lo12:.LC4]  // tmp332,

earlier in the function way before we call any of the other functions.

The question I'm not sure is D part of V15 caller-saved or callee-saved?

[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64

2025-01-18 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #20 from Julian Andres Klode  ---
OK sorry folks, further debugging on that hunch on d15 from the minimized code
led me to build libcrypto without assembler code and now apt works correctly;
so my guess here really is that the hand-written OpenSSL asm code is wrong, and
the gcc change really just makes it use the registers that they forgot to save
and restore.

[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64

2025-01-18 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

Julian Andres Klode  changed:

   What|Removed |Added

   See Also||https://github.com/openssl/
   ||openssl/issues/26466

--- Comment #22 from Julian Andres Klode  ---
(In reply to Sam James from comment #21)
> (In reply to Julian Andres Klode from comment #20)
> > OK sorry folks, further debugging on that hunch on d15 from the minimized
> > code led me to build libcrypto without assembler code and now apt works
> > correctly; so my guess here really is that the hand-written OpenSSL asm code
> > is wrong, and the gcc change really just makes it use the registers that
> > they forgot to save and restore.
> 
> No worries. Can you cross-link the bugs when you file one with OpenSSL?
> Thanks.

[Bug c++/118537] New: [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

Bug ID: 118537
   Summary: [14 Regression; 20241102 -> 20241128] wrong
initialization of member variable on Graviton [related
to tree-slp-vectorize]
   Product: gcc
   Version: 14.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: j...@jak-linux.org
  Target Milestone: ---

I did not have time for a full debug and minimal reproducer yet (I had a
visitor all day after the first 1-2 hours debugging it this morning :D), but
maybe this already is useful to someone:

APT in Debian recently received a bug report that it was failing to do https
downloads on Graviton instances were failing. The struct RequestState has a
member defined as

enum {Chunked,Stream,Closes} Encoding = Closes;

Running on Graviton, the constructor - inlined - resulted in the member
initializing with 0 (Chunked) instead of 2 (Closes). Valgrind on Graviton and
other arm64 platforms did not see the same issue.

Debugging this on Ubuntu revealed a regression in a gcc-14 update, particular
in one of the commits (14.2.0-8ubuntu1 was good, 14.2.0-9ubuntu1 with the
following changes is bad):

  * Update to git 20241128 from the gcc-14 branch.
- Fix PR target/117564 (PA), PR target/117525 (PA), PR target/117408 (ARM),
  PR target/117443 (PA), PR target/116999 (AArch64),
  PR target/117045 (AArch64), PR target/116629 (AArch64),
  PR target/116371 (AArch64), PR target/117304 (x86),
  PR target/116371 (AArch64), PR tree-optimization/117398,
  PR tree-optimization/117287, PR middle-end/117354, PR target/117296,
  PR ada/117328, PR ada/113868, PR ada/113036, PR c++/117158,
  PR c++/101463, PR c++/116634, PR fortran/115700, PR fortran/115700,
  PR fortran/115070, PR fortran/115348, PR libstdc++/117520,
  PR libstdc++/117406, PR target/117562 (x86), PR target/117744 (AVR),
  PR other/116603, PR testsuite/109360, PR target/117357 (x86),
  PR target/117659 (AVR), PR middle-end/116997, PR target/117418 (x86),
  PR target/117500 (AVR), PR ada/115917, PR analyzer/115724,
  PR analyzer/111475, PR fortran/117763, PR fortran/84869,
  PR fortran/116388, PR fortran/109345, PR fortran/104819,
  PR fortran/115700, PR fortran/115700, PR fortran/115070,
  PR fortran/115348, PR fortran/79685, PR fortran/84868, PR fortran/100273,
  PR fortran/116530, PR fortran/108889, PR modula2/116378,
  PR modula2/116181, PR modula2/115823, PR modula2/116048,
  PR modula2/115957, PR modula2/115804, PR modula2/115540,
  PR modula2/115536, PR modula2/114529, PR modula2/115057,
  PR modula2/115003, PR preprocessor/117118, PR modula2/115276.

In the attached diff of the instructions - which have been amended with
__asm__("isb sy"); around the call of the constructor to find it we see a
notable difference:

+movi   v31.4s, #0x0
[...]
+stur   q31, [x2, #232]
+stur   q31, [x2, #248]

These may be related (do q31 and v31 overlap?), certainly no vector
instructions were used before which led me to specify
optimize("no-tree-slp-vectorize") on the function the constructor is being
called from; which worked around the issue.

For sake of completeness, A workaround has been shipped in APT 2.9.23 that
simply changes the enum order such that 0 is the correct default value:
https://salsa.debian.org/apt-team/apt/-/commit/98ddccc

[Bug c++/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #1 from Julian Andres Klode  ---
Created attachment 60194
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60194&action=edit
diff between the bad and good cc1plus versions

Attaching the diff between the bad and good cc1plus versions.

I only replaced cc1plus for testing purposes, further changes were not needed.

[Bug middle-end/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #4 from Julian Andres Klode  ---
I only have a huge ton of code so far, I can't minimize it easily, as it
requires actually getting to the point where the constructor is called in the
loop of the https downloader which happens _after_ headers are read!

That said,

https://magenta.jak-linux.org/bin/b0752bf2386d09d013a352c3dd8f6004-test-data.tar.gz

contains the -save-temps output with as minimized arguments as possible, and a
single compilation unit.

$ /usr/bin/c++ -v -save-temps -g -O2
-I/home/ubuntu/apt/obj-aarch64-linux-gnu/include  methods/http.cc -o http 
-lapt-pkg /usr/lib/aarch64-linux-gnu/libseccomp.so 
/usr/lib/aarch64-linux-gnu/libssl.so  /usr/lib/aarch64-linux-gnu/libsystemd.so 
/usr/lib/aarch64-linux-gnu/libcrypto.so
Using built-in specs.   
COLLECT_GCC=/usr/bin/c++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-linux-gnu/14/lto-wrapper   
OFFLOAD_TARGET_NAMES=nvptx-none 
OFFLOAD_TARGET_DEFAULT=1
Target: aarch64-linux-gnu   
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
14.2.0-12ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-14/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr
--with-gcc-major-version-only --program-suffix=-14 --program-prefix=aar
ch64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-ab
i=new --enable-libstdcxx-backtrace --enable-gnu-unique-object
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enab
le-fix-cortex-a53-843419 --disable-werror
--enable-offload-targets=nvptx-none=/build/gcc-14-5QJBTv/gcc-14-14.2.0/debian/tmp-nvptx/usr
--enable-offload-defaulted --without-cuda-driver --enable-checking=release
--build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-g
nu --with-build-config=bootstrap-lto-lean --enable-link-serialization=4 
Thread model: posix 
Supported LTO compression algorithms: zlib zstd 
gcc version 14.2.0 (Ubuntu 14.2.0-12ubuntu1)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-O2' '-I'
'/home/ubuntu/apt/obj-aarch64-linux-gnu/include' '-o' 'http'
'-foffload-options=-l_GCC_m' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'
'-dumpdir' 'http-'  
 /usr/libexec/gcc/aarch64-linux-gnu/14/cc1plus -E -quiet -v -I
/home/ubuntu/apt/obj-aarch64-linux-gnu/include -imultiarch aarch64-linux-gnu
-D_GNU_SOURCE methods/http.cc -mlittle-endian -mabi=lp64
-foffload-options=-l_GCC_m -g -fworking-directory -O2 -fpch-preprocess
-D_FORTIFY_SOURC
E=3 -fasynchronous-unwind-tables -fstack-protector-strong -Wformat
-Wformat-security -fstack-clash-protection -o http-http.ii  
ignoring duplicate directory "/usr/include/aarch64-linux-gnu/c++/14"
ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"   
ignoring nonexistent directory
"/usr/lib/gcc/aarch64-linux-gnu/14/include-fixed/aarch64-linux-gnu" 
ignoring nonexistent directory
"/usr/lib/gcc/aarch64-linux-gnu/14/include-fixed"   
ignoring nonexistent directory
"/usr/lib/gcc/aarch64-linux-gnu/14/../../../../aarch64-linux-gnu/include"   
#include "..." search starts here:
#include <...> search starts here:
 /home/ubuntu/apt/obj-aarch64-linux-gnu/include
 /usr/include/c++/14
 /usr/include/aarch64-linux-gnu/c++/14
 /usr/include/c++/14/backward
 /usr/lib/gcc/aarch64-linux-gnu/14/include
 /usr/local/include
 /usr/include/aarch64-linux-gnu
 /usr/include
End of search list.
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-O2' '-I'
'/home/ubuntu/apt/obj-aarch64-linux-gnu/include' '-o' 'http'
'-foffload-options=-l_GCC_m' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'
'-dumpdir' 'http-'
 /usr/libexec/gcc/aarch64-linux-gnu/14/cc1plus -fpreprocessed http-http.ii
-D_FORTIFY_SOURCE=3 -quiet -dumpdir http- -dumpbase http.cc -dumpbase-ext .cc
-mlittle-endian -mabi=lp64 -g -O2 -version -foffload-options=-l_GCC_m
-fasynchronous-unwind-tables -fstack-protector-strong -Wformat
-Wformat-security -fstack-clash-protection -o http-http.s
GNU C++17 (Ubuntu 14.2.0-12ubuntu1) version 14.2.0 (aarch64-linux-gnu)
compiled by GNU C version 14.2.0, GMP version 6.3.0, MPFR version
4

[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64

2025-01-17 Thread jak--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537

--- Comment #16 from Julian Andres Klode  ---
Created attachment 60200
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60200&action=edit
Reduced to 55k

The test case is now reduced to 55k bytes from 2.9M; attaching it if someone
wants to look at it in the meantime, I'm afk now before I reach 1 am :D