Re: [PATCH 09/14 v2] lto: Add toplevel assembly heuristics

2025-09-07 Thread Andi Kleen
> Thanks both! For modules the Makefile needs to be adjusted to run final LTO before modpost etc. These were the respective hunks from the old patchkit (may need some tweaks) @@ -154,7 +154,7 @@ is-single-obj-m = $(and $(part-of-module),$(filter $@, $(obj-m)),y) # When a module consists of a si

Re: [PATCH 09/14 v2] lto: Add toplevel assembly heuristics

2025-09-04 Thread Andi Kleen
Sam James writes: > Michal Jires writes: > >> I did handle node->iterate_referring, but forgot cnode->callers. >> >> Only change are contents of the newly separated >> mark_symbol_referenced_from_asm > > Thanks, I'll try the new patch now. > > With the workaround I mentioned earlier, I managed t

Re: Do not auto-enable loop optimizations with autoFDO

2025-09-04 Thread Andi Kleen
Jan Hubicka writes: > With -O2 we automatically enable several loop optimizations with > -fprofile-use. > The rationale is that those optimizations are at -O3 only, mainly since they may > hurt performance or not pay back in code size when used blindly on all loops. > Profile feedback gives us data o

Re: [PATCH 09/14] lto: Add toplevel assembly heuristics

2025-09-03 Thread Andi Kleen
> There are 27 unique toplevel assembly in following files. > That is when building only vmlinux with default settings. > There are probably a few more. Try allyesconfig. The default config is quite small. -Andi

Re: [PATCH] Add default arch/tuning to shift-gf2p8affine test cases

2025-09-01 Thread Andi Kleen
On Fri, Aug 29, 2025 at 09:01:06AM -0700, Andi Kleen wrote: > From: Andi Kleen > > This makes them not fail during test suite runs with overridden arch or > tunings. Committed as obvious now. -Andi

Re: [PATCH] [x86] Fix ICE due to wrong operand is passed to ix86_vgf2p8affine_shift_matrix.

2025-08-31 Thread Andi Kleen
liuhongt writes: > 1) Fix predicate of operands[3] in cond_ since only > const_vec_dup_operand is excepted for masked operations, and pass real > count to ix86_vgf2p8affine_shift_matrix. > > 2) Pass operands[2] instead of operands[1] to > gen_vgf2p8affineqb__mask which excepted the operand to shi

Re: [PATCH 09/14] lto: Add toplevel assembly heuristics

2025-08-29 Thread Andi Kleen
> > Can you point to that discussion? > > I'm not aware of a rejection of the new form in GCC 15, but in previous > discussions, their responses were: > * https://lore.kernel.org/all/87a64qo4th.ffs@tglx/ > * > https://lore.kernel.org/all/y3jj67tz9ta2a...@hirez.programming.kicks-ass.net/ > * > ht

[PATCH] Add default arch/tuning to shift-gf2p8affine test cases

2025-08-29 Thread Andi Kleen
From: Andi Kleen This makes them not fail during test suite runs with overridden arch or tunings. gcc/testsuite/ChangeLog: * gcc.target/i386/shift-gf2p8affine-1.c: Use -march=x86-64 -mtune=generic. * gcc.target/i386/shift-gf2p8affine-2.c: Ditto. * gcc.target

Re: [r16-3364 Regression] FAIL: gcc.target/i386/shift-gf2p8affine-7.c scan-assembler-times vgf2p8affineqb 53 on Linux/x86_64

2025-08-29 Thread Andi Kleen
On Fri, Aug 29, 2025 at 05:19:18AM -0700, H.J. Lu wrote: > On Thu, Aug 28, 2025 at 10:22 PM Andi Kleen wrote: > > > > > > This patch should fix it. Please confirm. > > > > > > diff --git a/gcc/testsuite/gcc.target/i386/shift-gf2p8affine-1.c > > b/gcc/

Re: [r16-3364 Regression] FAIL: gcc.target/i386/shift-gf2p8affine-7.c scan-assembler-times vgf2p8affineqb 53 on Linux/x86_64

2025-08-28 Thread Andi Kleen
This patch should fix it. Please confirm. diff --git a/gcc/testsuite/gcc.target/i386/shift-gf2p8affine-1.c b/gcc/testsuite/gcc.target/i386/shift-gf2p8affine-1.c index e5be3a35538..cb576eb4498 100644 --- a/gcc/testsuite/gcc.target/i386/shift-gf2p8affine-1.c +++ b/gcc/testsuite/gcc.target/i386/s

Re: [PATCH 09/14] lto: Add toplevel assembly heuristics

2025-08-28 Thread Andi Kleen
Jakub Jelinek writes: > On Wed, Aug 27, 2025 at 03:52:11PM +0200, Michal Jires wrote: >> This new pass heuristically detects symbols referenced by toplevel >> assembly to prevent their optimization. >> >> Heuristics is done by comparing identifiers in assembly to known >> symbols. >> >> The pas

Re: FW: [r16-3364 Regression] FAIL: gcc.target/i386/shift-gf2p8affine-7.c scan-assembler-times vgf2p8affineqb 53 on Linux/x86_64

2025-08-28 Thread Andi Kleen
> > > > with GCC configured with > > > > ../../gcc/configure > > --prefix=/export/users3/haochenj/src/gcc-bisect/master/master/r16-3364/usr > > --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld > > --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet > > --without-isl

Re: FW: [r16-3364 Regression] FAIL: gcc.target/i386/shift-gf2p8affine-7.c scan-assembler-times vgf2p8affineqb 53 on Linux/x86_64

2025-08-28 Thread Andi Kleen
On Wed, Aug 27, 2025 at 02:11:44AM +, Jiang, Haochen wrote: > On Linux/x86_64, > > 001cd39749f94ece8276b63f91eb864babb81a5d is the first bad commit > commit 001cd39749f94ece8276b63f91eb864babb81a5d > Author: Andi Kleen > Date: Sun Aug 3 17:35:39 2025 -0700 > &

Re: [PATCH 00/14] lto: Linux LTO toplevel assembly

2025-08-28 Thread Andi Kleen
Michal Jires writes: > These patches allow us to handle toplevel assembly referencing symbols. > Previous linux kernel patches needed to mark any such referenced symbols > manually. Currently needed linux patches are here: > https://gitlab.com/mixal_iirec/linux_gcc_lto_patches > > Thanks for all

[PATCH] Fix an ICE with recent GFNI changes

2025-08-25 Thread Andi Kleen
From: Andi Kleen Make the expand pattern for operand 1 match the final instruction. PR 121658 gcc/ChangeLog: * config/i386/sse.md ("3"): Use register_operand for rotate patterns. gcc/testsuite/ChangeLog: * gcc.target/i386/pr121658.c: New test. ---

[PATCH v4] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-24 Thread Andi Kleen
From: Andi Kleen [v4 version: Exclude for >> 7. Add test cases for 256/128bit and improve tests. Remove some AVX512F checks. Fix mode iterator.] [v3 version: Remove unnecessary _mask pattern. Add extra FAIL case. Remove unnecessary AVX512F check. Fix changelog.] [v2 version: Split

Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-23 Thread Andi Kleen
> > I think for a 512-bit vector, vgf2p8affineqb is better than the > original codegen, but for a 128/256-bit vector, shouldn't vpcmpgtb be > better than vgf2p8affineqb? Yes it's better, but I don't see it in the loop bodies for any of my test cases, only in prologues/epilogues. Okay probably t

[PATCH v3] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-22 Thread Andi Kleen
From: Andi Kleen [v3 version: Remove unnecessary _mask pattern. Add extra FAIL case. Remove unnecessary AVX512F check. Fix changelog.] [v2 version: Split rotate patterns in V16QI and V32/64QI. Add various AVX512F checks. Remove some unnecessary masks. Add untested cond_ pattern (untested

Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-22 Thread Andi Kleen
> > + else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2])) > I don't think we need AVX512F here, and let's exclude >>7 cases here, > so better be. > else if (TARGET_GFNI > && CONST_INT_P (operands[2]) > /* It's just vpcmpgtb against 0. */ > && !

[PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-20 Thread Andi Kleen
From: Andi Kleen [v2 version: Split rotate patterns in V16QI and V32/64QI. Add various AVX512F checks. Remove some unnecessary masks. Add untested cond_ pattern (untested, couldn't trigger it) Clean up some control flow. Use narrower modes. Avoid need for weakening predicate check in expand

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Andi Kleen
> > It might be reasonable to tweak the costs per CPU however, I haven't > > done that. > > > > BTW for rotate the wins are much higher because there are no native > > instructions for it. > For ashl/lshr, the original implementation only takes 2 > instructions(vpsllw/vpsrlw + vpand), and for ashr

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Andi Kleen
> > The latter takes 5 cycles, the former takes 3 cycles. It's pipelined however. > > Do you have any microbenchmark or real workloads to show your > optimization is better? Keep in mind it only uses one port vs two. Yes I ran it on Arrow lake and saw wins on both Pcore and Ecore according to

[PING] [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-11 Thread Andi Kleen
Andi Kleen writes: I wanted to ping https://gcc.gnu.org/pipermail/gcc-patches/2025-August/691624.html > From: Andi Kleen > > The GFNI AVX gf2p8affineqb instruction can be used to implement > vectorized byte shifts or rotates. This patch uses them to implement > shift and rot

[PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-04 Thread Andi Kleen
From: Andi Kleen The GFNI AVX gf2p8affineqb instruction can be used to implement vectorized byte shifts or rotates. This patch uses them to implement shift and rotate patterns to allow the vectorizer to use them. Previously AVX couldn't do rotates (except with XOP) and had to handle 8 bit s

Re: [PATCH] x86: Don't hoist non all 0s/1s vector set outside of loop

2025-08-02 Thread Andi Kleen
"H.J. Lu" writes: > Don't hoist non all 0s/1s vector set outside of the loop to avoid extra > spills. It seems this could be a loss if there are actually enough registers. So you need to make it depend on the register pressure? -Andi

Re: [PATCH] testsuite: Disable musttail tests if target uses SJLJ exceptions

2025-07-11 Thread Andi Kleen
Dimitar Dimitrov writes: > A few tests started failing recently on pru-unknown-elf because it uses > SJLJ implementation for exceptions: > FAIL: g++.dg/ext/musttail3.C -std=c++11 (test for excess errors) > .../gcc/gcc/testsuite/g++.dg/ext/musttail3.C:12:34: error: cannot > tail-call: caller

Re: make autprofiledbootstrap with LTO meaningful

2025-07-11 Thread Andi Kleen
On Fri, Jul 11, 2025 at 12:14:46PM +0200, Jan Hubicka wrote: > Hello, > currently autoprofiled bootstrap produces auto-profiles for cc1 and > cc1plus binaries. Those are used to build respective frontend files. > For backend cc1plus.fda is used. This does not work well with LTO > bootstrap where

Re: [committed] i386: Introduce crc_revsi4 expanders [PR120719]

2025-06-27 Thread Andi Kleen
On Fri, Jun 27, 2025 at 08:11:29AM +0200, Uros Bizjak wrote: > On Fri, Jun 27, 2025 at 7:27 AM Andi Kleen wrote: > > > > Uros Bizjak writes: > > > > > Introduce crc_revsi4 expanders to generate CRC32 instruction when > > > using > > > __

Re: [committed] i386: Introduce crc_revsi4 expanders [PR120719]

2025-06-26 Thread Andi Kleen
Uros Bizjak writes: > Introduce crc_revsi4 expanders to generate CRC32 instruction when using > __builtin_rev_crc32_data* builtins with 0x1EDC6F41 polynomial and -mcrc32. > > PR target/120719 > > gcc/ChangeLog: > > * config/i386/i386.md (crc_revsi4): New expander. > > gcc/testsuite/Change

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-06-06 Thread Andi Kleen
On 2025-06-06 12:42, Jan Hubicka wrote: Hi, also after fixing this issue my bootstrap fails with: Permission error mapping pages. Consider increasing /proc/sys/kernel/perf_event_mlock_kb, or try again with a smaller value of -m/--mmap_pages. (current value: 4294967295,0) Permission error mappin

Re: [AUTOFDO][AARCH64] Add support for profilebootstrap

2025-05-15 Thread Andi Kleen
On Wed, May 14, 2025 at 02:46:15AM +, Kugan Vivekanandarajah wrote: > Adding Eugene and Andi to CC as Sam suggested. > > > On 13 May 2025, at 12:57 am, Richard Sandiford > wrote: > > > > External email: Use caution opening links or attachments > > > > > > Kugan Vivekanandarajah writes: > >>

Re: [PATCH 2/3] x86: Add a pass to fold tail call

2025-05-06 Thread Andi Kleen
On 2025-05-06 09:48, H.J. Lu wrote: On Mon, May 5, 2025 at 9:56 PM Andi Kleen wrote: On Mon, May 05, 2025 at 06:20:40AM -0700, Andi Kleen wrote: > > If the branch edge destination is a basic block with only a direct > > sibcall, change the jcc target to the sibcall target, d

Re: [PATCH 2/3] x86: Add a pass to fold tail call

2025-05-05 Thread Andi Kleen
On Mon, May 05, 2025 at 06:20:40AM -0700, Andi Kleen wrote: > > If the branch edge destination is a basic block with only a direct > > sibcall, change the jcc target to the sibcall target, decrement the > > destination basic block entry label use count and redirect the edge >

Re: [PATCH 2/3] x86: Add a pass to fold tail call

2025-05-05 Thread Andi Kleen
> If the branch edge destination is a basic block with only a direct > sibcall, change the jcc target to the sibcall target, decrement the > destination basic block entry label use count and redirect the edge > to the exit basic block. Call delete_unreachable_blocks to delete > the unreachable bas

[PATCH] Add diffsummary.py to contrib

2025-04-29 Thread Andi Kleen
This adds an automatic downloader for the latest test results from the mailing list archive and supports diffing test_summary to it. Useful if you don't want to run your own baseline. contrib/ChangeLog: * diffsummary.py: New file. --- contrib/diffsummary.py | 104

Re: [PATCH] Add a bootstrap-native build config

2025-04-25 Thread Andi Kleen
On 2025-04-23 10:18, Richard Biener wrote: On Tue, Apr 22, 2025 at 5:43 PM Andi Kleen wrote: On 2025-04-22 13:22, Richard Biener wrote: > On Sat, Apr 12, 2025 at 5:09 PM Andi Kleen wrote: >> >> From: Andi Kleen >> >> ... that uses -march=native -mtune=native to bu

Re: [PATCH] asf: Enable pass at O2 or higher

2025-04-22 Thread Andi Kleen
On Wed, Jan 29, 2025 at 10:33:14AM +0100, Christoph Müllner wrote: > The avoid-store-forwarding pass is disabled by default and therefore > in the risk of bit-rotting. This patch addresses this by enabling > the pass at O2 or higher. > > The assembly patterns in `bitfield-bitint-abi-align16.c` an

Re: [PHASE1 PATCH] Use optimize free lists for alloc_pages

2025-04-22 Thread Andi Kleen
On Tue, Apr 22, 2025 at 01:27:34PM +0200, Richard Biener wrote: > I assume this passed bootstrap & regtest? Yes it did > > This is OK for trunk after we've released GCC 15.1. Thanks. Andi

[PATCH] Add a bootstrap-native build config

2025-04-12 Thread Andi Kleen
From: Andi Kleen ... that uses -march=native -mtune=native to build a compiler optimized for the host. config/ChangeLog: * bootstrap-native.mk: New file. gcc/ChangeLog: * doc/install.texi: Document bootstrap-native. --- config/bootstrap-native.mk | 1 + gcc/doc/install.texi

[PATCH] Add diffsummary.py to contrib

2025-04-11 Thread Andi Kleen
This adds an automatic downloader for the latest test results from the mailing list archive and supports diffing test_summary to it. Useful if you don't want to run your own baseline. contrib/ChangeLog: * diffsummary.py: New file. --- contrib/diffsummary.py | 104

[PHASE1 PATCH] Use optimize free lists for alloc_pages

2025-04-11 Thread Andi Kleen
Right now ggc has a single free list for multiple sizes. In some cases the list can get mixed by orders and then the allocator may spend a lot of time walking the free list to find the right sizes. This patch splits the free list into multiple free lists by order which allows O(1) access in most c
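The idea of splitting one mixed free list into one list per order can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: `struct free_entry`, `struct free_lists`, `NUM_ORDERS` and the three helper functions are hypothetical names, and the real ggc page allocator keeps considerably more state per entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: orders 0..15, where an entry of order N holds 1 << N bytes. */
#define NUM_ORDERS 16

/* Hypothetical stand-in for a cached free page; not GCC's page_entry. */
struct free_entry {
    struct free_entry *next;
    unsigned order;
};

/* One singly linked free list per order instead of a single mixed list. */
struct free_lists {
    struct free_entry *head[NUM_ORDERS];
};

static void free_lists_init(struct free_lists *fl)
{
    for (unsigned i = 0; i < NUM_ORDERS; i++)
        fl->head[i] = NULL;
}

/* Return an entry to the list that matches its own order: O(1). */
static void free_lists_put(struct free_lists *fl, struct free_entry *e)
{
    assert(e->order < NUM_ORDERS);
    e->next = fl->head[e->order];
    fl->head[e->order] = e;
}

/* Take an entry of exactly this order, or NULL if none is cached.
   O(1): no walk past entries of other sizes is needed. */
static struct free_entry *free_lists_get(struct free_lists *fl, unsigned order)
{
    assert(order < NUM_ORDERS);
    struct free_entry *e = fl->head[order];
    if (e != NULL) {
        fl->head[order] = e->next;
        e->next = NULL;
    }
    return e;
}
```

With a single mixed list, a request for one order may have to skip every cached entry of the other orders; indexing by order first removes that walk.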

[PATCH v3] Don't instrument exit edges after musttail

2025-04-05 Thread Andi Kleen
When -fprofile-generate is used musttail often fails because the compiler adds instrumentation after the tail calls. This patch prevents adding extra exit edges after musttail because for a tail call the execution leaves the function and can never come back even on an unwind or exception. This is

[PATCH] PR119482: Avoid mispredictions in bitmap_set_bit

2025-04-01 Thread Andi Kleen
From: Andi Kleen This isn't a regression, but it's a very simple patch with high performance improvement, so perhaps suitable in the current stage. --- bitmap_set_bit checks the original value of the bit to return it to the caller and then only writes the new value back if it cha
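The snippet above is cut off before it describes the actual change, so the following is only a sketch of the general pattern, assuming the fix is to store unconditionally. It works on a single 64-bit word; the function names are hypothetical and GCC's real bitmap_set_bit operates on linked bitmap elements rather than one word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conditional variant: the store happens only if the bit was clear, so the
   compiler typically emits a data-dependent branch around the store. */
static bool set_bit_conditional(uint64_t *word, unsigned bit)
{
    assert(bit < 64);
    uint64_t mask = (uint64_t)1 << bit;
    bool was_clear = (*word & mask) == 0;
    if (was_clear)
        *word |= mask;
    return was_clear;
}

/* Unconditional variant: always store, and derive the return value from the
   old word. There is no branch on the previous bit value to mispredict. */
static bool set_bit_unconditional(uint64_t *word, unsigned bit)
{
    assert(bit < 64);
    uint64_t mask = (uint64_t)1 << bit;
    uint64_t old = *word;
    *word = old | mask;
    return (old & mask) == 0;
}
```

Both variants return true when the bit changed from clear to set. The trade-off in the second is an extra store when the bit was already set, in exchange for not branching on unpredictable data.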

Re: Patch ping [PATCH] tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]

2025-04-01 Thread Andi Kleen
> I'd like to ping the > https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679182.html > patch. > I know it is quite controversial and if clang wouldn't be the first > to implement this I'd certainly not go that way; I am willing to change > the warning option names or move the maybe one from -W

Re: [PATCH] testsuite: Fix up musttail2.C test

2025-03-26 Thread Andi Kleen
> You're right (although I don't remember which targets are > non-external_musttail). Several flavors of ARM and Power at least.

Re: [PATCH] tailc: Only diagnose musttail failures during tailc or musttail passes [PR119376]

2025-03-26 Thread Andi Kleen
Jakub Jelinek writes: > --- gcc/testsuite/g++.dg/opt/musttail2.C.jj 2025-03-24 13:27:44.329204196 > +0100 > +++ gcc/testsuite/g++.dg/opt/musttail2.C 2025-03-24 13:28:08.975867389 > +0100 > @@ -0,0 +1,14 @@ > +// PR ipa/119376 > +// { dg-do compile { target musttail } } I think this need

Re: [PATCH] tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]

2025-03-25 Thread Andi Kleen
> This can be rewritten as > > void foo(int v) > { > { > int a; > capture(&a); > if (condition) > goto tail_position; > // do something with a > } > tail_position: > tailcall(v); > } > > or with 'do { ... if (...) break; ...} while (0)' when one prefers that to > goto

Re: [PATCH] tailc: Don't fail musttail calls if they use or could use local arguments, instead warn [PR119376]

2025-03-25 Thread Andi Kleen
On Tue, Mar 25, 2025 at 07:43:28PM +0300, Alexander Monakov wrote: > Hello, > > FWIW I think Clang made a mistake in bending semantics in a way that is > clearly > misaligned with the general design of C and C++, where a language-native, so > to > speak, solution was available: introduce a scope

Re: [PATCH v3] Don't instrument exit edges after musttail

2025-03-25 Thread Andi Kleen
> 2025-03-25 Jakub Jelinek > Andi Kleen > > PR gcov-profile/118442 > * profile.cc (branch_prob): Ignore EDGE_FAKE edges from musttail calls > to EXIT. > > * c-c++-common/pr118442.c: New test. > > --- gcc/profile.cc.jj 2025-

[PATCH] PR118442: Don't instrument exit edges after musttail

2025-03-22 Thread Andi Kleen
From: Andi Kleen When -fprofile-generate is used musttail often fails because the compiler adds instrumentation after the tail calls. This patch prevents adding extra exit edges after musttail because for a tail call the execution leaves the function and can never come back even on an unwind or

Re: [PATCH v2 2/2] PR119376: Disable clang musttail

2025-03-20 Thread Andi Kleen
On Thu, Mar 20, 2025 at 06:25:26PM +0100, Jakub Jelinek wrote: > On Thu, Mar 20, 2025 at 10:01:02AM -0700, Andi Kleen wrote: > > So it could be as simple as that patch? It solves your test case at least > > for x86. > > Not sure I like this, but if others (e.g. Richi, Josep

Re: [PATCH v2 2/2] PR119376: Disable clang musttail

2025-03-20 Thread Andi Kleen
On Thu, Mar 20, 2025 at 05:28:48PM +0100, Jakub Jelinek wrote: > On Thu, Mar 20, 2025 at 09:19:02AM -0700, Andi Kleen wrote: > > The inlining was just one of the issue, there are some related to > > different semantics of escaped locals. gcc always errors out while > > LLVM

Re: [PATCH v2 2/2] PR119376: Disable clang musttail

2025-03-20 Thread Andi Kleen
On Thu, Mar 20, 2025 at 11:45:33AM -0400, Jason Merrill wrote: > On 3/19/25 9:31 PM, Andi Kleen wrote: > > From: Andi Kleen > > > > There are multiple reports (see PR 119376) now where semantic differences > > in the gcc musttail implementation break existing programs

[PATCH v2 2/2] PR119376: Disable clang musttail

2025-03-19 Thread Andi Kleen
From: Andi Kleen There are multiple reports (see PR 119376) now where semantic differences in the gcc musttail implementation break existing programs written for the clang variant. Even though that can be all hopefully fixed eventually, for the gcc 15 release it seems safer to disable clang

[PATCH v2 1/2] PR118442: Don't instrument exit edges after musttail

2025-03-19 Thread Andi Kleen
From: Andi Kleen When -fprofile-generate is used musttail often fails because the compiler adds instrumentation after the tail calls. This patch prevents adding extra exit edges after musttail because for a tail call the execution leaves the function and can never come back even on an unwind or

Re: [PATCH] PR118442: Don't instrument exit edges after musttail

2025-03-19 Thread Andi Kleen
> This looks wrong to me. Even tail calls can be terminated with exit, > perform longjmp, do other things for which stmt_can_terminate_bb_p > should return true. stmt_can_terminate_bb_p is used in many places, not > just in the predict instrumentation. Okay so the check should be only used for s

Re: [PATCH] PR118442: Don't instrument exit edges after musttail

2025-03-19 Thread Andi Kleen
Andi Kleen writes: > diff --git a/gcc/input.cc b/gcc/input.cc > index fabfbfb6eaa..d3b12037ba8 100644 > --- a/gcc/input.cc > +++ b/gcc/input.cc > @@ -1325,6 +1325,8 @@ dump_line_table_statistics (void) >if (s.num_expanded_macros != 0) > fprintf (stderr, "Av

Re: The COBOL front end, version 3, now in 14 easy pieces (+NIST)

2025-02-24 Thread Andi Kleen
"James K. Lowden" writes: >> Having a minimal harness in GCCs testsuite is critical - I'd expect a >> gcc/testsuite/gcobol.dg/dg.exp supporting execution tests. I assume >> Cobol has a way to exit OK or fatally and this should be >> distinguished as testsuite PASS or FAIL. > > Yes, a COBOL pro

[COMMITTED PATCH] Fix description of file-cache-lines/file-cache-files params

2025-02-18 Thread Andi Kleen
From: Andi Kleen The file-cache-lines / file-cache-files tunables were documented in the wrong section. Fix that. Reported-by: Filip Kastl Committed as obvious. gcc/ChangeLog: * doc/invoke.texi: --- gcc/doc/invoke.texi | 20 ++-- 1 file changed, 10 insertions(+), 10

[COMITTED] Fix file cache tunables documentation

2025-02-04 Thread Andi Kleen
From: Andi Kleen Document new params in invoke.texi. The auto tuning description was on the wrong tunable, move to lines. Committed as obvious. gcc/ChangeLog: * doc/invoke.texi: Document file cache tunables. * params.opt: Move auto tuning description to lines. --- gcc/doc

Re: [PATCH v2 6/7] Enable vectorization for input.cc find_end_of_line function

2025-02-02 Thread Andi Kleen
On Tue, Jan 28, 2025 at 09:50:41AM +0100, Richard Biener wrote: > On Mon, Jan 27, 2025 at 9:59 PM David Malcolm wrote: > > > > On Sat, 2025-01-25 at 23:31 -0800, Andi Kleen wrote: > > > From: Andi Kleen > > > > > > This is the hot function in input.cc &

Re: [PATCH v2 4/7] Add a cache of recent lines

2025-02-02 Thread Andi Kleen
> > If I reading this right, calls to get_next_line lead to insertions into > the ring buffer whilst the buffer is empty or the last line in the ring > buffer cache is m_line_num - 1. > > There are a few places where we update m_line_num, but this caching > code doesn't seem to touch those places

Re: [PATCH v2 7/7] Add a unit test for random access in the file cache

2025-02-02 Thread Andi Kleen
On Sun, Feb 02, 2025 at 09:35:52PM -0800, Andi Kleen wrote: > > Patch 7 is OK otherwise, and I'm taking a look at the rest of the > > patches now; thanks. > > Any comments on the other patches? nm. I see you already commented. somehow i missed that. -Andi

Re: [PATCH v2 7/7] Add a unit test for random access in the file cache

2025-02-02 Thread Andi Kleen
> Patch 7 is OK otherwise, and I'm taking a look at the rest of the > patches now; thanks. Any comments on the other patches? Thanks, -Andi

[PATCH v2 5/7] Size input line cache based on file size

2025-01-25 Thread Andi Kleen
From: Andi Kleen While the input line cache size is now tunable it's better if the compiler auto tunes it. Otherwise large files needing random file access will still have to search many lines to find the right lines. Add support for allocating one line anchor per hundred input lines. This

PR118168: Updated fix

2025-01-25 Thread Andi Kleen
This is a fix for slowness accessing random lines in the source file for diagnostics. In this version I added a unit test as requested by David, and also added an x86 vectorization hint for the hot line search function (with the early break work the vectorizer is powerful enough to handle it now) If

[PATCH v2 7/7] Add a unit test for random access in the file cache

2025-01-25 Thread Andi Kleen
From: Andi Kleen gcc/ChangeLog: * input.cc (check_line): New. (test_replacement): New function to test line caching. (input_cc_tests): Call test_replacement --- gcc/input.cc | 46 ++ 1 file changed, 46 insertions(+) diff

[PATCH v2 6/7] Enable vectorization for input.cc find_end_of_line function

2025-01-25 Thread Andi Kleen
From: Andi Kleen This is the hot function in input.cc. The vectorizer can vectorize it now, but in a generic cpu O2 x86 build it isn't. Add an automatic target clone to handle it for x86 and build that function with O3. The ifdef here is ugly, perhaps gcc should have a more convenient "

[PATCH v2 1/7] Add tunables for input buffer

2025-01-25 Thread Andi Kleen
From: Andi Kleen The input machinery to read the source code independent of the lexer has a range of hard coded maximum array sizes that can impact performance. Make them tunable. input.cc is part of libcommon so it cannot direct access params without a level of indirection. gcc/ChangeLog

[PATCH v2 2/7] Rebalance file_cache input line cache dynamically

2025-01-25 Thread Andi Kleen
From: Andi Kleen The input context file_cache maintains an array of anchors to speed up accessing lines before the previous line. The array has a fixed upper size and the algorithm relies on the linemap reporting the maximum number of lines in the file in advance to compute the position of each

[PATCH v2 3/7] Remove m_total_lines support from input cache

2025-01-25 Thread Andi Kleen
From: Andi Kleen With the new cache maintenance algorithm we don't need the maximum number of lines anymore. Remove all the code for that. gcc/ChangeLog: PR preprocessor/118168 * input.cc (total_lines_num): Remove. (file_cache_slot::evict):

[PATCH v2 4/7] Add a cache of recent lines

2025-01-25 Thread Andi Kleen
From: Andi Kleen For larger files the file_cache line index will be spread out to make the index fit into the fixed buffer, so any access to the non latest line will need some skipping of lines. Most accesses for line are near the latest line because a diagnostic is likely near where the

[PATCH] Describe inline assembler parsing

2025-01-18 Thread Andi Kleen
From: Andi Kleen Correct the description of inline assembler to say that gcc does limited assembler parsing to estimate the length of inline assembler statements, and document that certain assembler primitives can confuse it. gcc/ChangeLog: * doc/extend.texi: Document assembler parsing

[COMMITTED] Fix an incorrect file header comment for the core2 scheduling model

2025-01-15 Thread Andi Kleen
From: Andi Kleen Committed as obvious. gcc/ChangeLog: * config/i386/x86-tune-sched-core.cc: Fix incorrect comment. --- gcc/config/i386/x86-tune-sched-core.cc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/i386/x86-tune-sched-core.cc b/gcc/config/i386

Re: [PATCH] docs: Fix up inline asm documentation

2025-01-15 Thread Andi Kleen
On Wed, Jan 15, 2025 at 10:41:11PM +0100, Jakub Jelinek wrote: > Hi! > > When writing the gcc-15/changes.html patch posted earlier, I've been > wondering where significant part of the Basic asm chapter went and the > problem was the insertion of a new @node in the middle of the Basic Asm > @node,

Re: [PING] [PATCH 1/6] Add tunables for input buffer

2025-01-08 Thread Andi Kleen
On Wed, Jan 08, 2025 at 07:47:27PM -0500, David Malcolm wrote: > On Wed, 2025-01-08 at 07:48 -0800, Andi Kleen wrote: > > > > I wanted to ping this patch series. Thanks. > > > > -Andi > > > > Thanks for the patches, and sorry about not getting back

[PING] [PATCH 1/6] Add tunables for input buffer

2025-01-08 Thread Andi Kleen
I wanted to ping this patch series. Thanks. -Andi

Re: [PATCH] c++: Fix up ICEs on constexpr inline asm strings in templates [PR118277]

2025-01-07 Thread Andi Kleen
On Tue, Jan 07, 2025 at 08:36:29PM +0100, Jakub Jelinek wrote: > Hi! > > The following patch fixes ICEs when the new inline asm syntax > to use C++26 static_assert-like constant expressions in place > of string literals is used in templates. > As finish_asm_stmt doesn't do any checking for > proce

Re: [PATCH] tree-switch-conversion: don't apply switch size limit on jump tables

2025-01-05 Thread Andi Kleen
Mark Wielaard writes: > commit 56946c801a7c ("gimple: Add limit after which slower switchlower > algs are used [PR117091] [PR117352]") introduced a limit on the number > of cases of a switch. It also bails out on finding jump tables if the > switch is too large. This introduces a compile time reg

[PATCH 6/6] Size input line cache based on file size

2024-12-26 Thread Andi Kleen
From: Andi Kleen While the input line cache size is now tunable it's better if the compiler auto tunes it. Otherwise large files needing random file access will still have to search many lines to find the right lines. Add support for allocating one line anchor per hundred input lines. This

[PATCH 3/6] Remove m_total_lines support from input cache

2024-12-26 Thread Andi Kleen
From: Andi Kleen With the new cache maintenance algorithm we don't need the maximum number of lines anymore. Remove all the code for that. gcc/ChangeLog: PR preprocessor/118168 * input.cc (total_lines_num): Remove. (file_cache_slot::evict):

[PATCH 2/6] Rebalance file_cache input line cache dynamically

2024-12-26 Thread Andi Kleen
From: Andi Kleen The input context file_cache maintains an array of anchors to speed up accessing lines before the previous line. The array has a fixed upper size and the algorithm relies on the linemap reporting the maximum number of lines in the file in advance to compute the position of each

[PATCH 5/6] Add a cache of recent lines

2024-12-26 Thread Andi Kleen
From: Andi Kleen For larger files the file_cache line index will be spread out to make the index fit into the fixed buffer, so any access to the non latest line will need some skipping of lines. Most accesses for line are near the latest line because a diagnostic is likely near where the

Fix file_cache for large files

2024-12-26 Thread Andi Kleen
This patch kit fixes scaling issues for the input cache, especially for C, motivated by PR118168. Overall, in number of lines, it is practically neutral: gcc/input.cc | 261 -- gcc/inp

[PATCH 4/6] Move ferror out of hot loop of file cache

2024-12-26 Thread Andi Kleen
From: Andi Kleen glibc ferror is surprisingly expensive. Move it out of the hot loop of finding lines by setting a flag after the actual IO operations. gcc/ChangeLog: PR preprocessor/118168 * input.cc (file_cache_slot::m_error): New field. (file_cache_slot::create
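The pattern described here, checking ferror once per I/O operation and letting the hot loop test a cached flag, might look roughly like this. It is a minimal sketch and not the input.cc code: `struct line_reader` and its functions are hypothetical, and only the idea of an error field (the patch names one `m_error`) comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical reader state; not the real file_cache_slot. */
struct line_reader {
    FILE *fp;
    bool error;       /* cached ferror result, updated only after real I/O */
    char buf[4096];
    size_t len;       /* number of valid bytes in buf */
};

static void reader_init(struct line_reader *r, FILE *fp)
{
    assert(fp != NULL);
    r->fp = fp;
    r->error = false;
    r->len = 0;
}

/* Refill the buffer. This is the only place that calls ferror. */
static bool reader_fill(struct line_reader *r)
{
    r->len = fread(r->buf, 1, sizeof r->buf, r->fp);
    if (ferror(r->fp))
        r->error = true;
    return r->len > 0;
}

/* Hot loop: scans buffered bytes for newlines and consults only the
   cached flag, never ferror itself. */
static size_t reader_count_lines(struct line_reader *r)
{
    size_t lines = 0;
    while (!r->error && reader_fill(r))
        for (size_t i = 0; i < r->len; i++)
            if (r->buf[i] == '\n')
                lines++;
    return lines;
}
```

The per-byte loop never makes a libc call; the library is consulted once per buffer refill, which is where an error can actually arise.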

[PATCH 1/6] Add tunables for input buffer

2024-12-26 Thread Andi Kleen
From: Andi Kleen The input machinery to read the source code independent of the lexer has a range of hard coded maximum array sizes that can impact performance. Make them tunable. input.cc is part of libcommon so it cannot direct access params without a level of indirection. gcc/ChangeLog

Re: The COBOL front end, in 8 notes

2024-12-13 Thread Andi Kleen
"James K. Lowden" writes: > The following 8 patches constitute the 80 files needed to build and > document the COBOL front end. They assume that following exist: > > gcc/cobol/ChangeLog > libgcobol/ChangeLog > > The messages are grouped by files in a more or less logical order, > but gro

Re: [PATCH] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-12-05 Thread Andi Kleen
> > What do you think, Andi and Richi? I myself slightly prefer keeping the DP > > but > > I would be fine with either option. > > I think we can keep both, though I have no strong opinion. Keeping both is fine for me. -Andi

Re: [PATCH] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-12-05 Thread Andi Kleen
> > But yeah, thinking about it some more, 1 seems like a lot. Maybe the > > limit > > could be 1000. That's also big enough. I could try to run the testcase > > set to > > 1000 on my not-so-powerful laptop this time and check that even on that > > machine > > it finishes "fast" (under a

Re: [PATCH] PR117350: Keep assembler name for abstract decls for autofdo

2024-11-26 Thread Andi Kleen
On Tue, Nov 26, 2024 at 04:06:37PM -0800, Andrew Pinski wrote: > On Thu, Oct 31, 2024 at 1:41 PM Andi Kleen wrote: > > > > From: Andi Kleen > > > > autofdo looks up inline stacks and tries to match them with the profile > > data using their symbol name. Mak

Re: [PATCH] gimple: Add limit after which slower switchlower algs are used [PR117091] [PR117352]

2024-11-21 Thread Andi Kleen
On Fri, Nov 15, 2024 at 10:43:57AM +0100, Filip Kastl wrote: > Hi, > > Andi's greedy bit test finding algorithm was reverted. I found a fix for the > problem that caused the revert. I made this patch to reintroduce the greedy > alg into GCC. However I think we should keep the old slow but more

Re: [PATCH] Add a bootstrap-native build config

2024-11-06 Thread Andi Kleen
On Tue, Jul 30, 2024 at 09:40:42AM -0700, Andi Kleen wrote: > From: Andi Kleen > > ... that uses -march=native -mtune=native to build a compiler optimized > for the host. > > config/ChangeLog: > > * bootstrap-native.mk: New file. > > gcc/ChangeLog:

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-11-06 Thread Andi Kleen
On Fri, Nov 01, 2024 at 02:01:18PM -0400, John David Anglin wrote: > This breaks build on hppa64-hp-hpux11.11. This target has clock_gettime > but it doesn't have CLOCK_MONOTONIC. It has CLOCK_REALTIME. I modified > timevar.cc as follows to restore build. Alternative would be to check for CLOCK
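A rough sketch of the kind of fallback being discussed, assuming CLOCK_MONOTONIC is a preprocessor macro where it exists (true for glibc, not checked for HP-UX). The wall_seconds helper is a hypothetical name; this is neither the code in timevar.cc nor the modification proposed in the message.

```c
#define _POSIX_C_SOURCE 199309L  /* ask for the clock_gettime declaration */
#include <time.h>

/* Hypothetical helper: wall-clock seconds as a double.  Uses
   CLOCK_MONOTONIC when the macro is defined and falls back to
   CLOCK_REALTIME otherwise.  */
static double wall_seconds(void)
{
  struct timespec ts;
#ifdef CLOCK_MONOTONIC
  clockid_t id = CLOCK_MONOTONIC;
#else
  clockid_t id = CLOCK_REALTIME;
#endif
  if (clock_gettime(id, &ts) != 0)
    return 0.0;  /* treat a failing clock as "no timing available" */
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}
```

CLOCK_REALTIME can jump when the system time is adjusted, so intervals measured with the fallback are less reliable than with the monotonic clock.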

Re: [PATCH] PR117350: Keep assembler name for abstract decls for autofdo

2024-11-05 Thread Andi Kleen
On Tue, Nov 05, 2024 at 09:47:17AM +0100, Richard Biener wrote: > On Tue, Nov 5, 2024 at 2:02 AM Jason Merrill wrote: > > > > On 10/31/24 4:40 PM, Andi Kleen wrote: > > > From: Andi Kleen > > > > > > autofdo looks up inline stacks and tries to match th

[PATCH] Update gcc-auto-profile / gen_autofdo_event.py

2024-10-31 Thread Andi Kleen
From: Andi Kleen - Fix warnings with newer python versions about bad escapes by making all the python strings raw. - Add a fallback for using the builtin perf event list if the CPU model number is unknown. - Regenerate the shipped gcc-auto-profile with the changes. contrib/ChangeLog

[PATCH] Enable autofdo bootstrap for lto/fortran

2024-10-31 Thread Andi Kleen
From: Andi Kleen When autofdo bootstrap support was originally implemented there were issues with the LTO bootstrap, that is why it wasn't enabled for them. I retested this now and it works on x86_64-linux. Fortran was also missing, not sure why. Also enabled now. gcc/fortran/Chan

[PATCH] PR117350: Keep assembler name for abstract decls for autofdo

2024-10-31 Thread Andi Kleen
From: Andi Kleen autofdo looks up inline stacks and tries to match them with the profile data using their symbol name. Make sure all decls that can be in an inline stack have a valid assembler name. This fixes a bootstrap problem with autoprofiledbootstrap and LTO. 2024-10-30 Jason Merrill

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-31 Thread Andi Kleen
> I'm getting a build failure: > > timevar.cc:163: undefined reference to `clock_gettime' > > Our frozen build tools are intended to produce binaries that work > "everywhere", so they're a few years old, but apparently something didn't > configure correctly. > > I see that libbacktrace configure

Re: [PATCH v3] Remove sys/user time in -ftime-report

2024-10-30 Thread Andi Kleen
On Wed, Oct 23, 2024 at 02:56:51PM +0200, Richard Biener wrote: > On Wed, Oct 9, 2024 at 6:18 PM Andi Kleen wrote: > > > > From: Andi Kleen > > > > Retrieving sys/user time in timevars is quite expensive because it > > always needs a system call. Only getting

Re: [PATCH v3 1/2][RFC] Provide more contexts for -Warray-bounds, -Wstringop-* warning messages due to code movements from compiler transformation [PR109071]

2024-10-30 Thread Andi Kleen
Qing Zhao writes: > Control this with a new option -fdiagnostics-details. It would be useful to be also able to print the inline call stack, maybe with a separate option. In some array bounds cases I looked at the problem was hidden in some inlines and it wasn't trivial to figure it out. I wro
