[Bug tree-optimization/98982] New: Optimizing loop variants of fixed-byte-order functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98982 Bug ID: 98982 Summary: Optimizing loop variants of fixed-byte-order functions Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: j...@jak-linux.org Target Milestone: --- These functions actually generate loops whereas they should just be optimized to no-ops. It's the classic byte-read/write idiom for little endian encoded integers, just written as a loop, to accomodate varying sizeof(T) in the template. template struct little_endian { uint8_t data[sizeof(T)]; constexpr little_endian() : data{} { } constexpr little_endian(T in) { for (size_t i = 0; i < sizeof(T); i++) data[i] = (in >> (8 * i)) & 0xFF; } constexpr operator T() const { T res = 0; for (size_t i = 0; i < sizeof(T); i++) res |= static_cast(data[i]) << (8u * i); return res; } }; -O3 or -funroll-loops correctly unrolls the encoding (the constructor), but the operator T() still ends up being ▸ endbr64 ▸ movabsq▸$72057594037927935, %rdx ▸ movq▸ %rdi, %rax ▸ shrq▸ $56, %rax ▸ andq▸ %rdi, %rdx ▸ salq▸ $56, %rax ▸ orq▸%rdx, %rax ▸ ret Seems weird to me.
[Bug tree-optimization/98982] Optimizing loop variants of fixed-byte-order functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98982 --- Comment #1 from Julian Andres Klode --- Verified with gcc version 11.0.0 20210203 (experimental) [master revision e8c87bc07b5:def4749fcba:84110515b93a6709de24240d6658ac207db5129f] (Ubuntu 11-20210203-0ubuntu1)
[Bug tree-optimization/98982] Optimizing loop variants of fixed-byte-order functions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98982 --- Comment #2 from Julian Andres Klode --- clang works perfectly, fwiw. I was told it has a separate pass dedicated to just recognizing loop idioms and optimizing them. I have no idea if gcc has that, but it seems useful.
[Bug middle-end/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #8 from Julian Andres Klode --- (In reply to Andrew Pinski from comment #6) > If you provide the preprocessed source a lot of the time we can do the rest > of getting a runtime Testcase and reducing it but we need the preprocessed > source to begin with. Hey, to me it seems the .ii in the linked tarball is the preprocessed source? $ g++ -O2 -o https http-http.ii -lapt-pkg -lssl -lsystemd -lcrypto -lseccomp $ rm /tmp/InRelease ; ( cat ../test-input; sleep 60 ) | ./https I'll link again: https://magenta.jak-linux.org/bin/b0752bf2386d09d013a352c3dd8f6004-test-data.tar.gz I can't attach it here, as it's too big.
[Bug middle-end/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #10 from Julian Andres Klode --- Created attachment 60197 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60197&action=edit Partially minimized by cvise to 48k lines I made it exit 42 because I could not insert an assert into the preprocessed code, and attaching the snapshot of 48k lines as I'm not sure how long I have that instance and it's getting late in my timezone.
[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #14 from Julian Andres Klode --- (In reply to Andrew Pinski from comment #11) > >These may be related (do q31 and v31 overlap?) > > Q31 and V31 are the same register just different views into it. That is q31 > is the full 128bit view while v31.4s is the 32bit x 4 element vector view of > the register. That is just storing 0 into those 2 locations. Indeed, I'm not sure that's the affected location, after all the user who reported it tried it on other arm64 machines and those worked correctly; and I tried running the code on Graviton and valgrind on Graviton and only the former got 0, so it's likely not as simple as a simple initialization mishap. We're calling OpenSSL which is probably doing a whole thing of optimized code in the https calls, it's possible some register is not in the state we expect it to be in after OpenSSL returns; but idk. Lots of unknown.(In reply to Andrew Pinski from comment #12) > A few things to test out first: > Is it -fsanitize=address clean? I can try some more stuff tomorrow! > > Does -fno-lifetime-dse help? Does -fstack-reuse=none help? What about both > together? I localized it to the constructor as an attribute: RequestState(BaseHttpMethod * const Owner, ServerState * const Server) __attribute__((optimize("no-lifetime-dse"))): RequestState(BaseHttpMethod * const Owner, ServerState * const Server) __attribute__((optimize("stack-reuse=none"))): and both of these fix the problem individually as well.
[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #15 from Julian Andres Klode --- I have the answer(In reply to Andrew Pinski from comment #12) > A few things to test out first: > Is it -fsanitize=address clean? I managed to recompile it with that option and annoyingly, that also prevents the bug from happening (just like valgrind) :)
[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #18 from Julian Andres Klode --- (In reply to Sam James from comment #17) > (In reply to Julian Andres Klode from comment #16) > > I built apt and couldn't reproduce it using this yet with tip of 14 release > branch and trunk too. I need to try building gcc-14 branch itself and see for myself, also if I can still reproduce it on AWS, then I can bisect it to the commit introducing the issue. Meanwhile, my initial attempt at cvise on the full .ii was not fruitful as I mocked that up using if (Encoding != Closes) exit(42) and cvise helpfully trimmed it down to Encoding != Closes; exit(42) - that is, it always failed. I started a second attempt where I split out the offending BaseHttpMethod::Loop() into its own source file and then cvised that and that reduced it to: void BaseHttpMethod::Loop() { while (1) { Run(); Server = CreateServerState(URI(Queue->Uri)); Server->Open(); RequestState Req(this, Server.get()); switch (Server->RunHeaders(Req, Queue->Uri)); } } When downloading over https (i.e. Server->Open() makes OpenSSL calls), I get the error, without OpenSSL calls, I do not get the error, so it sounds like something messes up (register?) state there. This boils down to: isb // 0 "" 2 // basehttp-loop.cc:12: time(&Date); #NO_APP ldr x0, [sp, 16]//, %sfp // basehttp-loop.cc:11: : Server() { str d15, [sp, 48] // tmp332, MEM [(void *)&Req] // basehttp-loop.cc:12: time(&Date); bl time// // basehttp-loop.cc:13: assert(Encoding == Closes); ldr w0, [sp, 48]//, Req.Encoding cmp w0, 1 // Req.Encoding, bne .L56//, // /usr/include/aarch64-linux-gnu/bits/stdio2.h:118: return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); adrpx1, .LC3// tmp267, mov w0, 2 //, add x1, x1, :lo12:.LC3 //, tmp267, bl __printf_chk// .LEHE3: // basehttp-loop.cc:41: __asm__("isb"); #APP // 41 "basehttp-loop.cc" 1 isb Now per ARM, `D part of V8-V15 need to be saved`, but the only other mention of `d15` is ldr d15, [x0, #:lo12:.LC4] // tmp332, earlier in the function way before we call any of the other functions. The question I'm not sure is D part of V15 caller-saved or callee-saved?
[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #20 from Julian Andres Klode --- OK sorry folks, further debugging on that hunch on d15 from the minimized code led me to build libcrypto without assembler code and now apt works correctly; so my guess here really is that the hand-written OpenSSL asm code is wrong, and the gcc change really just makes it use the registers that they forgot to save and restore.
[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 Julian Andres Klode changed: What|Removed |Added See Also||https://github.com/openssl/ ||openssl/issues/26466 --- Comment #22 from Julian Andres Klode --- (In reply to Sam James from comment #21) > (In reply to Julian Andres Klode from comment #20) > > OK sorry folks, further debugging on that hunch on d15 from the minimized > > code led me to build libcrypto without assembler code and now apt works > > correctly; so my guess here really is that the hand-written OpenSSL asm code > > is wrong, and the gcc change really just makes it use the registers that > > they forgot to save and restore. > > No worries. Can you cross-link the bugs when you file one with OpenSSL? > Thanks.
[Bug c++/118537] New: [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 Bug ID: 118537 Summary: [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize] Product: gcc Version: 14.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: j...@jak-linux.org Target Milestone: --- I did not have time for a full debug and minimal reproducer yet (I had a visitor all day after the first 1-2 hours debugging it this morning :D), but maybe this already is useful to someone: APT in Debian recently received a bug report that it was failing to do https downloads on Graviton instances were failing. The struct RequestState has a member defined as enum {Chunked,Stream,Closes} Encoding = Closes; Running on Graviton, the constructor - inlined - resulted in the member initializing with 0 (Chunked) instead of 2 (Closes). Valgrind on Graviton and other arm64 platforms did not see the same issue. Debugging this on Ubuntu revealed a regression in a gcc-14 update, particular in one of the commits (14.2.0-8ubuntu1 was good, 14.2.0-9ubuntu1 with the following changes is bad): * Update to git 20241128 from the gcc-14 branch. - Fix PR target/117564 (PA), PR target/117525 (PA), PR target/117408 (ARM), PR target/117443 (PA), PR target/116999 (AArch64), PR target/117045 (AArch64), PR target/116629 (AArch64), PR target/116371 (AArch64), PR target/117304 (x86), PR target/116371 (AArch64), PR tree-optimization/117398, PR tree-optimization/117287, PR middle-end/117354, PR target/117296, PR ada/117328, PR ada/113868, PR ada/113036, PR c++/117158, PR c++/101463, PR c++/116634, PR fortran/115700, PR fortran/115700, PR fortran/115070, PR fortran/115348, PR libstdc++/117520, PR libstdc++/117406, PR target/117562 (x86), PR target/117744 (AVR), PR other/116603, PR testsuite/109360, PR target/117357 (x86), PR target/117659 (AVR), PR middle-end/116997, PR target/117418 (x86), PR target/117500 (AVR), PR ada/115917, PR analyzer/115724, PR analyzer/111475, PR fortran/117763, PR fortran/84869, PR fortran/116388, PR fortran/109345, PR fortran/104819, PR fortran/115700, PR fortran/115700, PR fortran/115070, PR fortran/115348, PR fortran/79685, PR fortran/84868, PR fortran/100273, PR fortran/116530, PR fortran/108889, PR modula2/116378, PR modula2/116181, PR modula2/115823, PR modula2/116048, PR modula2/115957, PR modula2/115804, PR modula2/115540, PR modula2/115536, PR modula2/114529, PR modula2/115057, PR modula2/115003, PR preprocessor/117118, PR modula2/115276. In the attached diff of the instructions - which have been amended with __asm__("isb sy"); around the call of the constructor to find it we see a notable difference: +movi v31.4s, #0x0 [...] +stur q31, [x2, #232] +stur q31, [x2, #248] These may be related (do q31 and v31 overlap?), certainly no vector instructions were used before which led me to specify optimize("no-tree-slp-vectorize") on the function the constructor is being called from; which worked around the issue. For sake of completeness, A workaround has been shipped in APT 2.9.23 that simply changes the enum order such that 0 is the correct default value: https://salsa.debian.org/apt-team/apt/-/commit/98ddccc
[Bug c++/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #1 from Julian Andres Klode --- Created attachment 60194 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60194&action=edit diff between the bad and good cc1plus versions Attaching the diff between the bad and good cc1plus versions. I only replaced cc1plus for testing purposes, further changes were not needed.
[Bug middle-end/118537] [14 Regression; 20241102 -> 20241128] wrong initialization of member variable on Graviton [related to tree-slp-vectorize]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #4 from Julian Andres Klode --- I only have a huge ton of code so far, I can't minimize it easily, as it requires actually getting to the point where the constructor is called in the loop of the https downloader which happens _after_ headers are read! That said, https://magenta.jak-linux.org/bin/b0752bf2386d09d013a352c3dd8f6004-test-data.tar.gz contains the -save-temps output with as minimized arguments as possible, and a single compilation unit. $ /usr/bin/c++ -v -save-temps -g -O2 -I/home/ubuntu/apt/obj-aarch64-linux-gnu/include methods/http.cc -o http -lapt-pkg /usr/lib/aarch64-linux-gnu/libseccomp.so /usr/lib/aarch64-linux-gnu/libssl.so /usr/lib/aarch64-linux-gnu/libsystemd.so /usr/lib/aarch64-linux-gnu/libcrypto.so Using built-in specs. COLLECT_GCC=/usr/bin/c++ COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-linux-gnu/14/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: aarch64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 14.2.0-12ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-14/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2,rust --prefix=/usr --with-gcc-major-version-only --program-suffix=-14 --program-prefix=aar ch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-ab i=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enab le-fix-cortex-a53-843419 --disable-werror --enable-offload-targets=nvptx-none=/build/gcc-14-5QJBTv/gcc-14-14.2.0/debian/tmp-nvptx/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-g nu --with-build-config=bootstrap-lto-lean --enable-link-serialization=4 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 14.2.0 (Ubuntu 14.2.0-12ubuntu1) COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-O2' '-I' '/home/ubuntu/apt/obj-aarch64-linux-gnu/include' '-o' 'http' '-foffload-options=-l_GCC_m' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64' '-dumpdir' 'http-' /usr/libexec/gcc/aarch64-linux-gnu/14/cc1plus -E -quiet -v -I /home/ubuntu/apt/obj-aarch64-linux-gnu/include -imultiarch aarch64-linux-gnu -D_GNU_SOURCE methods/http.cc -mlittle-endian -mabi=lp64 -foffload-options=-l_GCC_m -g -fworking-directory -O2 -fpch-preprocess -D_FORTIFY_SOURC E=3 -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection -o http-http.ii ignoring duplicate directory "/usr/include/aarch64-linux-gnu/c++/14" ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/14/include-fixed/aarch64-linux-gnu" ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/14/include-fixed" ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/14/../../../../aarch64-linux-gnu/include" #include "..." search starts here: #include <...> search starts here: /home/ubuntu/apt/obj-aarch64-linux-gnu/include /usr/include/c++/14 /usr/include/aarch64-linux-gnu/c++/14 /usr/include/c++/14/backward /usr/lib/gcc/aarch64-linux-gnu/14/include /usr/local/include /usr/include/aarch64-linux-gnu /usr/include End of search list. COLLECT_GCC_OPTIONS='-v' '-save-temps' '-g' '-O2' '-I' '/home/ubuntu/apt/obj-aarch64-linux-gnu/include' '-o' 'http' '-foffload-options=-l_GCC_m' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64' '-dumpdir' 'http-' /usr/libexec/gcc/aarch64-linux-gnu/14/cc1plus -fpreprocessed http-http.ii -D_FORTIFY_SOURCE=3 -quiet -dumpdir http- -dumpbase http.cc -dumpbase-ext .cc -mlittle-endian -mabi=lp64 -g -O2 -version -foffload-options=-l_GCC_m -fasynchronous-unwind-tables -fstack-protector-strong -Wformat -Wformat-security -fstack-clash-protection -o http-http.s GNU C++17 (Ubuntu 14.2.0-12ubuntu1) version 14.2.0 (aarch64-linux-gnu) compiled by GNU C version 14.2.0, GMP version 6.3.0, MPFR version 4
[Bug middle-end/118537] [14/15 Regression] (20241102 -> 20241128) wrong initialization of member variable on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118537 --- Comment #16 from Julian Andres Klode --- Created attachment 60200 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60200&action=edit Reduced to 55k The test case is now reduced to 55k bytes from 2.9M; attaching it if someone wants to look at it in the meantime, I'm afk now before I reach 1 am :D