Re: [PATCH] x86: Get the widest vector mode from MOVE_MAX

2025-06-20 Thread Uros Bizjak
On Thu, Jun 19, 2025 at 1:27 PM H.J. Lu wrote: > > Since MOVE_MAX defines the maximum number of bytes that an instruction > can move quickly between memory and registers, use it to get the widest > vector mode in vector loop when inlining memcpy and memset. > > gcc/ > > PR target/120708 > * config

Re: [PATCH] x86: Fix shrink wrap separate ICE under -fstack-clash-protection [PR120697]

2025-06-19 Thread Uros Bizjak
On Thu, Jun 19, 2025 at 9:37 AM Uros Bizjak wrote: > > On Wed, Jun 18, 2025 at 4:12 PM Cui, Lili wrote: > > > > > > > > > -Original Message- > > > From: Uros Bizjak > > > Sent: Wednesday, June 18, 2025 9:22 PM > > > To

Re: [PATCH v4] x86: Enable *mov_(and|or) only for -Oz

2025-06-19 Thread Uros Bizjak
On Thu, Jun 19, 2025 at 9:01 AM Hongtao Liu wrote: > > On Wed, Jun 18, 2025 at 6:38 PM H.J. Lu wrote: > > > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > > Author: Roger Sayle > > Date: Thu Dec 23 12:33:07 2021 + > > > > x86: PR target/103773: Fix wrong-code with -Oz from pop to

Re: [PATCH] x86: Fix shrink wrap separate ICE under -fstack-clash-protection [PR120697]

2025-06-19 Thread Uros Bizjak
On Wed, Jun 18, 2025 at 4:12 PM Cui, Lili wrote: > > > > > -Original Message- > > From: Uros Bizjak > > Sent: Wednesday, June 18, 2025 9:22 PM > > To: Cui, Lili > > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; > > hongjiu...@intel.com

Re: [PATCH] x86: Fix shrink wrap separate ICE under -fstack-clash-protection [PR120697]

2025-06-18 Thread Uros Bizjak
On Wed, Jun 18, 2025 at 3:11 PM Cui, Lili wrote: > > From: Lili Cui > > Hi Uros, > > An assertion I added in shrink wrap separate V2 reports ICE when > -fstack-clash-protection is enabled. The assertion should not be added here. > > I created a patch to remove 3 assertions and their associated c

Re: [PATCH V3] x86: Enable separate shrink wrapping

2025-06-17 Thread Uros Bizjak
On Tue, Jun 17, 2025 at 4:03 PM Cui, Lili wrote: > > From: Lili Cui > > Hi Uros, > > This is patch v3, the main changes are as follows. > > 1. Added a pro_epilogue_adjust_stack_add_nocc in i386.md to add memory > clobber for lea/mov. > 2. Adjusted some formatting issues. > 3. Added scan-rtl-dump

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-06-15 Thread Uros Bizjak
On Fri, Jun 13, 2025 at 3:15 PM Cui, Lili wrote: > > > On Mon, Apr 21, 2025 at 7:24 AM H.J. Lu wrote: > > > > > > > > On Sun, Apr 20, 2025 at 6:31 PM Jan Hubicka wrote: > > > > > > > > > > > PR target/102294 > > > > > > PR target/119596 > > > > > > * config/i386/x86-tune-costs

[committed] i386: Fix signed integer overflow in ix86_expand_int_movcc, part 2 [PR120604]

2025-06-12 Thread Uros Bizjak
Make sure we can represent the difference between two 64-bit DImode immediate values in 64-bit HOST_WIDE_INT and return false if this is not the case. ix86_expand_int_movcc is used in movcc expander. Expander will FAIL when the function returns false and middle-end will retry expansion with value
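The representability check described above can be sketched in plain C. This is only an illustration, not the code from the patch: diff_fits is a hypothetical helper, int64_t stands in for HOST_WIDE_INT, and the GCC/Clang builtin __builtin_sub_overflow is one possible way to detect the overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): store a - b in *diff and
   return true only when the difference is representable in a signed
   64-bit integer.  Returns false on overflow, mirroring the "return
   false if this is not the case" behaviour described above.  */
static bool
diff_fits (int64_t a, int64_t b, int64_t *diff)
{
  /* __builtin_sub_overflow returns true when the subtraction overflows.  */
  return !__builtin_sub_overflow (a, b, diff);
}
```

A caller would bail out (the expander would FAIL) whenever diff_fits returns false.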

Re: [PATCH V2] x86: Enable separate shrink wrapping

2025-06-12 Thread Uros Bizjak
On Thu, Jun 12, 2025 at 10:58 AM Uros Bizjak wrote: > > On Thu, Jun 12, 2025 at 9:26 AM Cui, Lili wrote: > > > > > > @@ -7753,8 +7762,12 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, > > > > rtx > > > offset, > > > > add_frame_

Re: [PATCH V2] x86: Enable separate shrink wrapping

2025-06-12 Thread Uros Bizjak
On Thu, Jun 12, 2025 at 9:26 AM Cui, Lili wrote: > > > > @@ -7753,8 +7762,12 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, > > > rtx > > offset, > > > add_frame_related_expr = true; > > > } > > > > > > + if (crtl->shrink_wrapped_separate) insn = emit_insn (gen_rtx_SET > > > + (d

[committed] i386: Fix signed integer overflow in ix86_expand_int_movcc [PR120604]

2025-06-11 Thread Uros Bizjak
Patch for PR120553 enabled full 64-bit DImode immediates in ix86_expand_int_movcc. However, the function calculates the difference between two immediate arguments using signed 64-bit HOST_WIDE_INT subtractions that can cause signed integer overflow. Avoid the overflow by casting operands of subtr
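A minimal sketch of the cast-to-unsigned idea mentioned above, assuming int64_t as a stand-in for HOST_WIDE_INT. The helper name wrapping_diff is hypothetical and the real change in i386-expand.cc may look different.

```c
#include <stdint.h>

/* Hypothetical helper: subtract in an unsigned type so that the result
   wraps modulo 2^64 instead of invoking signed-overflow undefined
   behaviour.  */
static uint64_t
wrapping_diff (int64_t a, int64_t b)
{
  return (uint64_t) a - (uint64_t) b;
}
```

For example, wrapping_diff (INT64_MIN, 1) is well defined, whereas the signed expression INT64_MIN - 1 is not.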

Re: [PATCH V2] x86: Enable separate shrink wrapping

2025-06-11 Thread Uros Bizjak
On Wed, Jun 11, 2025 at 5:33 AM Cui, Lili wrote: > > From: Lili Cui > > Hi Uros, > > Thank you very much for providing detailed BKM to reproduce Linux kernel boot > failure. My patch and Matz's patch have this problem. We inserted a SUB > between TEST and JLE, and the SUB changes the value of

Re: [PATCH] i386: Handle ZERO_EXTEND like SIGN_EXTEND in bsr patterns [PR120434]

2025-06-09 Thread Uros Bizjak
On Fri, Jun 6, 2025 at 3:43 PM Jakub Jelinek wrote: > > Hi! > > The just posted second PR120434 patch causes > +FAIL: gcc.target/i386/pr78103-3.c scan-assembler m(leaq|addq|incq)M > +FAIL: gcc.target/i386/pr78103-3.c scan-assembler-not mmovlM+ > +FAIL: gcc.target/i386/pr78103-3.c s

[PATCH] i386: Improve "movcc" expander for DImode immediates [PR120553]

2025-06-05 Thread Uros Bizjak
"movcc" expander uses x86_64_general_operand predicate that limits the range of immediate operands to 32-bit size. The usage of this predicate causes ifcvt to force out-of-range immediates to registers when converting through noce_try_cmove. The testcase: long long foo (long long c) { return c >

Re: [PATCH] i386: Fix vmovvdup's mem attribute

2025-06-04 Thread Uros Bizjak
On Thu, Jun 5, 2025 at 3:29 AM Hu, Lin1 wrote: > > Hi, > > Some vmovvdup pattern's type attribute is sselog1 and then mem attribute is > both. Modify type attribute according to other patterns about vmovvdup. > > Bootstrapped and regtested on x86_64-linux-pc-gnu, OK for trunk? OK. Thanks, Uros.

[PATCH] rtl-optimization: Invalid CSE of inline asm with memory clobber [PR111901]

2025-05-29 Thread Uros Bizjak
The following test: --cut here-- int test (void) { unsigned int sum = 0; for (int i = 0; i < 4; i++) { unsigned int val; asm ("magic %0" : "=r" (val) : : "memory"); sum += val; } return sum; } --cut here-- compiles on x86_64 with -O2 -funroll-all-loops to nonsen

Re: [PATCH] i386, v2: Extend *cmp_minus_1 optimizations also to plus with CONST_INT [PR120360]

2025-05-21 Thread Uros Bizjak
On Wed, May 21, 2025 at 1:20 PM Jakub Jelinek wrote: > > On Wed, May 21, 2025 at 11:48:34AM +0200, Uros Bizjak wrote: > > Please introduce "x86_64_neg_const_int_operand" predicate that will > > allow only const_int operands, and will reject negative endbr (and >

Re: [PATCH] i386: Extend *cmp_minus_1 optimizations also to plus with CONST_INT [PR120360]

2025-05-21 Thread Uros Bizjak
On Wed, May 21, 2025 at 9:44 AM Jakub Jelinek wrote: > > Hi! > > As mentioned by Linus, we can't optimize comparison of otherwise unused > result of plus with CONST_INT second operand, compared against zero. > This can be done using just cmp instruction with negated constant and say > js/jns/je/jn

Re: [PATCH] [testsuite] add missing require vect_early_break_hw for vect-tsvc

2025-05-19 Thread Uros Bizjak
LGTM for the whole series. Thanks, Uros. On Tue, May 20, 2025 at 6:17 AM Alexandre Oliva wrote: > > > Some tsvc tests add vect_early_break options without requiring the > feature to be available. Add the requirements. > > Regstrapped on x86_64-linux-gnu. Also tested with gcc-14 on aarch64-, >

Re: [PATCH] x86: Enable separate shrink wrapping

2025-05-13 Thread Uros Bizjak
On Tue, May 13, 2025 at 8:15 AM Cui, Lili wrote: > > From: Lili Cui > > Hi, > > This patch is to enable separate shrink wrapping for x86. > > Bootstrapped & regtested on x86-64-pc-linux-gnu. > > Ok for trunk? Unfortunately, the patched compiler fails to boot the latest linux kernel. Uros. Uro

Re: [PATCH] x86: Enable separate shrink wrapping

2025-05-13 Thread Uros Bizjak
On Tue, May 13, 2025 at 8:15 AM Cui, Lili wrote: > > From: Lili Cui > > Hi, > > This patch is to enable separate shrink wrapping for x86. > > Bootstrapped & regtested on x86-64-pc-linux-gnu. > > Ok for trunk? > > > This commit implements the target macros (TARGET_SHRINK_WRAP_*) that > enable separ

Re: [PATCH] x86: Remove df_insn_rescan after emit_insn_*

2025-05-11 Thread Uros Bizjak
On Mon, May 12, 2025 at 8:19 AM H.J. Lu wrote: > > Since df_insn_rescan has been called by emit_insn_*, there is no need > to call it after calling emit_insn_*. Remove its unnecessary usages. > > PR target/120228 > * config/i386/i386-features.cc (ix86_place_single_vector_set): > Remove df_insn_re

[pushed]: i386: Do not use explicit operands for MOVS instructions [PR120019]

2025-05-05 Thread Uros Bizjak
Some assemblers do not support MOVS instructions with explicit operands. Emit instruction with implicit operands, but prefix the instruction with a segment override prefix if the memory operand refers to ADDR_SPACE_SEG_FS or ADDR_SPACE_SEG_GS named address space. PR target/120019 gcc/ChangeLo

Re: [PATCH v4] libstdc++: Implement C++26 features (P2546R5)

2025-05-05 Thread Uros Bizjak
On Thu, May 1, 2025 at 12:59 PM Jonathan Wakely wrote: > > This includes the P2810R4 (is_debugger_present is_replaceable) changes, > allowing std::is_debugger_present to be replaced by the program. > > It would be good to provide a macOS definition of is_debugger_present as > per https://developer

Re: [PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

2025-05-02 Thread Uros Bizjak
On Fri, May 2, 2025 at 2:33 AM H.J. Lu wrote: > > On Wed, Apr 30, 2025 at 7:40 PM Uros Bizjak wrote: > > > > On Tue, Apr 29, 2025 at 12:22 PM H.J. Lu wrote: > > > > > > On Tue, Apr 29, 2025 at 5:30 PM Uros Bizjak wrote: > > > > > &

Re: [PATCH RFA] i386: -Wabi false positive with indirect call

2025-05-02 Thread Uros Bizjak
On Thu, May 1, 2025 at 10:46 PM Jason Merrill wrote: > > Tested x86_64-pc-linux-gnu, OK for trunk? > > -- 8< -- > > This warning relies on the TRANSLATION_UNIT_WARN_EMPTY_P flag (set in > cxx_init_decl_processing) to decide whether we want to warn about the GCC 8 > empty class parameter passing fi

Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-05-01 Thread Uros Bizjak
On Thu, May 1, 2025 at 1:21 PM Richard Sandiford wrote: > > Uros Bizjak writes: > > On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu wrote: > >> > >> On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak wrote: > >> > > >> > On Tue, Apr 29, 2025 at 11:40 PM

Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-05-01 Thread Uros Bizjak
On Thu, May 1, 2025 at 9:10 AM H.J. Lu wrote: > > On Thu, May 1, 2025 at 2:56 PM Uros Bizjak wrote: > > > > On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu wrote: > > > > > > On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak wrote: > > > > > &g

Re: [PATCH] x86: Update TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P

2025-05-01 Thread Uros Bizjak
On Thu, May 1, 2025 at 12:49 AM H.J. Lu wrote: > > On Wed, Apr 30, 2025 at 7:48 PM Uros Bizjak wrote: > > > > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu wrote: > > > > > > SMALL_REGISTER_CLASSES was added by > > > > > > commit c98f874233

Re: [PATCH] x86: Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-05-01 Thread Uros Bizjak
On Wed, Apr 30, 2025 at 11:43 PM H.J. Lu wrote: > > On Wed, Apr 30, 2025 at 8:12 PM Uros Bizjak wrote: > > > > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu wrote: > > > > > > SSE_FIRST_REG was added to CLASS_LIKELY_SPILLED_P, which became > > > TARGET_

Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-04-30 Thread Uros Bizjak
On Wed, Apr 30, 2025 at 11:31 PM H.J. Lu wrote: > > On Wed, Apr 30, 2025 at 7:37 PM Uros Bizjak wrote: > > > > On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu wrote: > > > > > > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to > &

Re: [PATCH] x86: Remove SSE_FIRST_REG from ix86_class_likely_spilled_p

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu wrote: > > SSE_FIRST_REG was added to CLASS_LIKELY_SPILLED_P, which became > TARGET_CLASS_LIKELY_SPILLED_P, for > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40470 > > Since RA has been improved and xmm0 is a commonly used register, remove > SSE_FIRST_RE

Re: [PATCH] x86: Update TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu wrote: > > SMALL_REGISTER_CLASSES was added by > > commit c98f874233428d7e6ba83def7842fd703ac0ddf1 > Author: James Van Artsdalen > Date: Sun Feb 9 13:28:48 1992 + > > Initial revision > > which became TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P. It

Re: [PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 12:22 PM H.J. Lu wrote: > > On Tue, Apr 29, 2025 at 5:30 PM Uros Bizjak wrote: > > > > On Tue, Apr 29, 2025 at 9:56 AM H.J. Lu wrote: > > > > > > Don't expand UNSPEC_TLS_LD_BASE to a call so that the RTL local copy >

Re: [PATCH] x86: Remove BREG from ix86_class_likely_spilled_p

2025-04-30 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 11:40 PM H.J. Lu wrote: > > AREG, DREG, CREG and AD_REGS are kept in ix86_class_likely_spilled_p to > avoid the following regressions with > > $ make check RUNTESTFLAGS="--target_board='unix{-m32,}'" > > FAIL: gcc.dg/pr105911.c (internal compiler error: in lra_split_hard_re

[pushed] i386: Disable string insn from non-default AS for Pmode != word_mode [PR111657]

2025-04-29 Thread Uros Bizjak
0x67 prefix is applied before segment register. That is in rep movsq %gs:(%esi), (%edi) the address is %gs + %esi. In case Pmode != word_mode (x32 with a default -maddress-mode=short) instructions should not allow segment override prefixes. Also, remove explicit addr32 prefix from asm templa

Re: [pushed] i386: Allow string instructions from non-default address space [PR111657]

2025-04-29 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 12:41 PM H.J. Lu wrote: > > On Tue, Apr 29, 2025 at 5:52 PM Uros Bizjak wrote: > > > > MOVS instructions allow segment override of their source operand, e.g.: > > > > rep movsq %gs:(%rsi), (%rdi) > > > > where %rsi is th

[pushed] i386: Allow string instructions from non-default address space [PR111657]

2025-04-29 Thread Uros Bizjak
MOVS instructions allow segment override of their source operand, e.g.: rep movsq %gs:(%rsi), (%rdi) where %rsi is the address of the source location (with %gs segment override) and %rdi is the address of the destination location. The testcase improves from (-O2 -mno-sse -mtune=generic):

Re: [PATCH] x86-64: Don't expand UNSPEC_TLS_LD_BASE to a call

2025-04-29 Thread Uros Bizjak
On Tue, Apr 29, 2025 at 9:56 AM H.J. Lu wrote: > > Don't expand UNSPEC_TLS_LD_BASE to a call so that the RTL local copy > propagation pass can eliminate multiple __tls_get_addr calls. __tls_get_addr needs to be called with 16-byte aligned stack, I don't think the compiler will correctly handle re

[pushed] i386: Skip sub-RTXes of memory operand in ix86_update_stack_alignment

2025-04-29 Thread Uros Bizjak
Skip sub-RTXes of the memory operand if stack access register is not mentioned in the operand. gcc/ChangeLog: * config/i386/i386.cc (ix86_update_stack_alignment): Skip sub-RTXes of the memory operand if stack access register is not mentioned in the operand. Bootstrapped and regressio

Re: [PATCH v3] x86: Properly find the maximum stack slot alignment

2025-04-28 Thread Uros Bizjak
On Mon, Apr 28, 2025 at 2:04 AM H.J. Lu wrote: > > On Wed, Apr 23, 2025 at 1:56 PM Uros Bizjak wrote: > > > +static void > > +ix86_find_all_reg_uses_1 (HARD_REG_SET &regset, > > + rtx set, unsigned int regno, > > + auto_bitmap &worklist) > > +{ > &

Re: [PATCH] Refactor msse4 and mno-sse4.

2025-04-24 Thread Uros Bizjak
On Fri, Apr 25, 2025 at 8:14 AM liuhongt wrote: > > This is originally from [1] > > For the command line, or target attribute, the actual operation goes > into ix86_handle_option, and as long as we get it right in this > ix86_handle_option, everything else should be fine. > As for the

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Uros Bizjak
On Thu, Apr 24, 2025 at 8:10 PM Uros Bizjak wrote: > > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote: > > > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > > > or vpandn. > > > Current register_operand/vect

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Uros Bizjak
On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote: > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > > or vpandn. > > Current register_operand/vector_operand could lose some optimization > > opportunity. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-04-22 Thread Uros Bizjak
On Sun, Apr 20, 2025 at 11:26 PM H.J. Lu wrote: > > Don't assume that stack slots can only be accessed by stack or frame > registers. We first find all registers defined by stack or frame > registers. Then check memory accesses by such registers, including > stack and frame registers. > > gcc/ >

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-04-21 Thread Uros Bizjak
On Sun, Apr 20, 2025 at 11:26 PM H.J. Lu wrote: > > Don't assume that stack slots can only be accessed by stack or frame > registers. We first find all registers defined by stack or frame > registers. Then check memory accesses by such registers, including > stack and frame registers. I've been

Re: [PATCH] [x86] Generate 2 FMA instructions in ix86_expand_swdivsf.

2025-04-20 Thread Uros Bizjak
On Mon, Apr 21, 2025 at 5:43 AM liuhongt wrote: > > From: "hongtao.liu" > > When FMA is available, N-R step can be rewritten with > > a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a > > which have 2 fma generated.[1] > > [1] https://bugs.llvm.org/show_bug.cgi?id=21385 > > Bootstrapped and re
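The rewritten Newton-Raphson step quoted above, a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a, can be written out as a small C sketch. Everything here is illustrative: approx_rcp is a made-up stand-in for a hardware reciprocal estimate, swdiv is a hypothetical name, and the two multiply-add expressions are only candidates for FMA contraction, which depends on the target and compiler flags.

```c
/* Stand-in for a hardware reciprocal estimate: deliberately off by a
   relative error of 2^-12 (an assumption made for illustration).  */
static float
approx_rcp (float b)
{
  return (1.0f / b) * (1.0f + 1.0f / 4096.0f);
}

/* One refinement step of a / b, following the formula quoted above.  */
static float
swdiv (float a, float b)
{
  float x0 = approx_rcp (b);   /* rcp(b) */
  float q0 = a * x0;           /* rcp(b) * a, first quotient estimate */
  float e = a - q0 * b;        /* residual; may become one FMA */
  return e * x0 + q0;          /* correction; may become a second FMA */
}
```

With the 2^-12 error assumed above, the refined quotient carries a relative error of roughly 2^-24, which is why a single step is enough for single precision in this sketch.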

Re: PING: [PATCH v2] x86: Add pcmpeq splitters

2025-04-19 Thread Uros Bizjak
On Sat, Apr 19, 2025 at 7:22 AM H.J. Lu wrote: > > On Mon, Dec 2, 2024 at 6:27 AM H.J. Lu wrote: > > > > Add pcmpeq splitters to split > > > > (insn 5 3 7 2 (set (reg:V4SI 100) > > (eq:V4SI (reg:V4SI 98) > > (reg:V4SI 98))) 7910 {*sse2_eqv4si3} > > (expr_list:REG_DEAD (re

Re: [PATCH v5 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-16 Thread Uros Bizjak
On Tue, Apr 15, 2025 at 2:19 PM Ard Biesheuvel wrote: > > On Tue, 15 Apr 2025 at 09:48, Uros Bizjak wrote: > > > > On Thu, Apr 10, 2025 at 2:27 PM Ard Biesheuvel wrote: > > > > > > From: Ard Biesheuvel > > > > > > Commit bde21de1

Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2025 at 2:23 PM H.J. Lu wrote: > > On Tue, Apr 15, 2025 at 12:45 AM Uros Bizjak wrote: > > > > On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu wrote: > > > > > > ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for regist

Re: [PATCH v5 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-15 Thread Uros Bizjak
On Thu, Apr 10, 2025 at 2:27 PM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > __fentry__") updated the logic that emits mcount() / __fentry__() calls > into function prologues when profiling is enabled, to avoid GOT-base

Re: [PATCH v5 2/2] i386: Enable -mnop-mcount for -fpic with PLTs

2025-04-15 Thread Uros Bizjak
On Thu, Apr 10, 2025 at 2:26 PM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > -mnop-mcount can be trivially enabled for -fPIC codegen as long as PLTs > are being used, given that the instruction encodings are identical, only > the target may resolve differently depending on how the linker de

Re: [PATCH] x86: Update gcc.target/i386/apx-interrupt-1.c

2025-04-15 Thread Uros Bizjak
On Tue, Apr 15, 2025 at 1:06 AM H.J. Lu wrote: > > ix86_add_cfa_restore_note omits the REG_CFA_RESTORE REG note for registers > pushed in red-zone. Since > > commit 0a074b8c7e79f9d9359d044f1499b0a9ce9d2801 > Author: H.J. Lu > Date: Sun Apr 13 12:20:42 2025 -0700 > > APX: Don't use red-zone

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-14 Thread Uros Bizjak
On Mon, Apr 14, 2025 at 8:54 AM Hongtao Liu wrote: > > On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > > > Don't use red-zone when there are no caller-saved registers and APX is > > enabled since 128-byte red-zone is too small for 31 GPRs. > > > > gcc/ > > > > PR target/119784 > >

Re: [PATCH] LRA: Backport PR115568 and PR119689 to release branches

2025-04-11 Thread Uros Bizjak
On Fri, Apr 11, 2025 at 6:23 PM Vladimir Makarov wrote: > > > On 4/11/25 2:29 AM, Uros Bizjak wrote: > > Hello! > > > > I would like to backport PR115568 and PR119689 to release branches. > > > > Author: Richard Biener > > Date: Wed Apr 9 14:36:19 2

[PATCH] LRA: Backport PR115568 and PR119689 to release branches

2025-04-10 Thread Uros Bizjak
Hello! I would like to backport PR115568 and PR119689 to release branches. Author: Richard Biener Date: Wed Apr 9 14:36:19 2025 +0200 rtl-optimization/119689 - compare-debug failure with LRA The previous change to fix LRA rematerialization broke compare-debug for i586 bootstrap. Fi

Re: [PATCH v4 1/2] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-09 Thread Uros Bizjak
On Wed, Apr 9, 2025 at 6:21 PM H.J. Lu wrote: > fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name); > diff --git a/gcc/testsuite/gcc.target/i386/pr119386-1.c > b/gcc/testsuite/gcc.target/i386/pr119386-1.c > index 174d00f1e27..56e44c89859 100644 > --- a/gcc/testsuite/gcc.target/i386/pr119

[committed] testsuite/x86: Correctly escape asterisk in scan-assembler

2025-04-09 Thread Uros Bizjak
Asterisk in []* regexp applies to bracket expression. When asterisk is a part of the word, then it needs to be escaped with \\. Also use []+ instead of []* to match elements in bracket expression one or more times. gcc/testsuite/ChangeLog: * gcc.target/i386/pr67215-1.c: Correctly escape

Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-09 Thread Uros Bizjak
On Tue, Apr 8, 2025 at 7:09 PM H.J. Lu wrote: > > Are there any existing test cases I should look at? > > Please see "gcc.target/i386/pr67215-*.c" While looking there, I noticed that the asterisk is not correctly escaped in scan strings. Asterisk in [...]* applies to square brackets, not scan st

Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-09 Thread Uros Bizjak
On Tue, Apr 8, 2025 at 6:59 PM Ard Biesheuvel wrote: > > On Tue, 8 Apr 2025 at 18:44, H.J. Lu wrote: > > > > On Tue, Apr 8, 2025 at 9:39 AM Ard Biesheuvel wrote: > > > > > > On Tue, 8 Apr 2025 at 15:33, H.J. Lu wrote: > > > > > > > > On Tue, Apr 8, 2025 at 3:46 AM Ard Biesheuvel > > > > wrote

Re: [PATCH v3] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-08 Thread Uros Bizjak
On Tue, Apr 8, 2025 at 12:47 PM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > __fentry__") updated the logic that emits mcount() / __fentry__() calls > into function prologues when profiling is enabled, to avoid GOT-base

Re: [PATCH] i386: Prefer PLT indirection for __fentry__ calls under -fPIC

2025-04-06 Thread Uros Bizjak
On Fri, Apr 4, 2025 at 9:00 AM Ard Biesheuvel wrote: > > From: Ard Biesheuvel > > Commit bde21de1205 ("i386: Honour -mdirect-extern-access when calling > __fentry__") updated the logic that emits mcount() / __fentry__() calls > into function prologues when profiling is enabled, to avoid GOT-based

Re: [PATCH] i386: Fix up splitters into vptest [PR119357]

2025-04-04 Thread Uros Bizjak
On Wed, Mar 19, 2025 at 8:56 AM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs, because the splitters into vptest > create an invalid instruction. The operands of all the UNSPEC_PTEST > using instructions use register_operand and vector_operand predicate, > these splitters use vecto

Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr82142?.c etc. on Solaris/x86

2025-04-02 Thread Uros Bizjak
On Wed, Apr 2, 2025 at 11:02 AM Rainer Orth wrote: > > Ping? It's been a week: > > https://gcc.gnu.org/pipermail/gcc-patches/2025-March/679330.html > > > Three tests FAIL on Solaris/x86 in similar ways: > > > > FAIL: gcc.target/i386/pr111673.c check-function-bodies advance > > FAIL: gcc.

Re: [PATCH] APX: add nf counterparts for rotl split pattern [PR 119539]

2025-04-01 Thread Uros Bizjak
On Tue, Apr 1, 2025 at 10:55 AM Hongtao Liu wrote: > > On Tue, Apr 1, 2025 at 4:40 PM Hongyu Wang wrote: > > > > Hi, > > > > For splitter after 3_mask it now splits the pattern > > to *3_mask, causing the splitter doesn't generate > > nf variant. Add corresponding nf counterpart for define_insn_a

Re: [committed] i386: Fix offset calculation in ix86_redzone_clobber

2025-03-28 Thread Uros Bizjak
On Thu, Mar 27, 2025 at 11:24 PM Jakub Jelinek wrote: > > On Thu, Mar 27, 2025 at 09:28:31PM +0100, Uros Bizjak wrote: > > plus_constant expects integer as its third argument, not rtx. > > > > gcc/ChangeLog: > > > > * config/i386/i386.cc (ix86_redzone_clobb

[committed] i386: Fix offset calculation in ix86_redzone_clobber

2025-03-27 Thread Uros Bizjak
plus_constant expects integer as its third argument, not rtx. gcc/ChangeLog: * config/i386/i386.cc (ix86_redzone_clobber): Use integer, not rtx as the third argument of plus_constant. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. Uros. diff --git a/gcc/config/i386/i

Re: [PATCH] i386: Fix up pr55583.c testcase [PR119465]

2025-03-26 Thread Uros Bizjak
On Wed, Mar 26, 2025 at 9:23 AM Jakub Jelinek wrote: > > Hi! > > In r15-4289 H.J. fixed up the pr55583.c testcase to use unsigned long long > or long long instead of unsigned long or long. That change looks correct to > me because the > void test64r () { b = ((u64)b >> n) | (a << (64 - n)); } > e

Re: [PATCH] i386: Fix up combination of -2 r<<= (x & 7) into btr [PR119428]

2025-03-25 Thread Uros Bizjak
On Tue, Mar 25, 2025 at 7:55 AM Jakub Jelinek wrote: > > Hi! > > The following patch is miscompiled from r15-8478 but latently already > since my r11-5756 and r11-6631 changes. > The r11-5756 change was > https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561164.html > which changed the split

Re: [PATCH v2] i386: Verify that argument registers are spilled properly

2025-03-10 Thread Uros Bizjak
On Sun, Mar 9, 2025 at 11:34 PM H.J. Lu wrote: > > While working on a local x86 patch, which passed the GCC testsuite, I got > a compiler error: > > In function ‘paravirt_read_msr’, > inlined from ‘perf_ibs_handle_irq’ at arch/x86/events/amd/ibs.c:1055:2: > ./arch/x86/include/asm/paravirt_type

Re: [PATCH] i386: Verify that argument registers are spilled properly

2025-03-09 Thread Uros Bizjak
On Sun, Mar 9, 2025 at 3:05 PM H.J. Lu wrote: > > RDI, RSI, RDX and RCX registers are used to pass arguments in 64-bit > mode. EAX, EDX and ECX registers are used to pass arguments in 32-bit > mode. Add tests to verify that argument registers are spilled properly. > > PR target/119171 >

Re: [committed] combine: Reverse negative logic in ternary operator

2025-03-03 Thread Uros Bizjak
On Mon, Mar 3, 2025 at 5:44 PM Richard Biener wrote: > > > > > Am 03.03.2025 um 17:08 schrieb Uros Bizjak : > > > > Reverse negative logic in !a ? b : c to become a ? c : b. > > > > No functional changes. > > > > gcc/ChangeLog: > > > &

[committed] combine: Reverse negative logic in ternary operator

2025-03-03 Thread Uros Bizjak
Reverse negative logic in !a ? b : c to become a ? c : b. No functional changes. gcc/ChangeLog: * combine.cc (distribute_notes): Reverse negative logic in ternary operators. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed as an obvious patch. Uros. diff --git
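The transformation is purely cosmetic, as a small C sketch shows. The function names are hypothetical and unrelated to distribute_notes; they only demonstrate that the two spellings select the same operand.

```c
/* Original spelling: negated condition.  */
static int
pick_negated (int a, int b, int c)
{
  return !a ? b : c;
}

/* Reversed spelling: positive condition with the arms swapped.  */
static int
pick_direct (int a, int b, int c)
{
  return a ? c : b;
}
```

Both functions return b when a is zero and c otherwise, so swapping one for the other is not a functional change.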

Re: [PATCH] testsuite: Fix up gcc.target/i386/pr118940.c test [PR118940]

2025-02-28 Thread Uros Bizjak
On Fri, Feb 28, 2025 at 9:40 AM Jakub Jelinek wrote: > > Hi! > > The testcase uses -m32 in dg-options, something we try hard not to do, > if something should be tested only for -m32, it is { target ia32 } test, > if it can be tested for -m64/-mx32 too, just some extra options are > needed for ia32

Re: [PATCH] testsuite: Remove -m32 from another i386/ test

2025-02-28 Thread Uros Bizjak
On Fri, Feb 28, 2025 at 9:42 AM Jakub Jelinek wrote: > > Hi! > > I found another test which uses -m32 in gcc.target/i386/ . Similarly > to the previously posted test, the test ought to be tested during i686-linux > testing or x86_64-linux test with --target_board=unix\{-m32,-m64\} > There is noth

Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-26 Thread Uros Bizjak
On Mon, Feb 24, 2025 at 10:46 AM Richard Biener wrote: > /* Otherwise, if this register is used by I3, then this register > now dies here, so we must put a REG_DEAD note here unless there > is one already. */ > else if (reg_referenced_p (XEXP (note,

[PATCH v2] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-26 Thread Uros Bizjak
The combine pass is trying to combine: Trying 16, 22, 21 -> 23: 16: r104:QI=flags:CCNO>0 22: {r120:QI=r104:QI^0x1;clobber flags:CC;} REG_UNUSED flags:CC 21: r119:QI=flags:CCNO<=0 REG_DEAD flags:CCNO 23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;} REG_DEAD r120:QI

Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-25 Thread Uros Bizjak
On Mon, Feb 24, 2025 at 10:46 AM Richard Biener wrote: > > On Wed, Feb 12, 2025 at 1:16 PM Uros Bizjak wrote: > > > > The combine pass is trying to combine: > > > > Trying 16, 22, 21 -> 23: > >16: r104:QI=flags:CCNO>0 > >22: {r120:QI=r10

[PATCH PING] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-24 Thread Uros Bizjak
I would like to ping for the following patch that fixes P1 regression: gcc/ChangeLog: * combine.cc (distribute_notes) : Remove REG_UNUSED note from i2 when the register is also mentioned in i3. gcc/testsuite/ChangeLog: * gcc.target/i386/pr118739.c: New test. https://gcc.gnu.org/pip

Re: [PATCH v4] x86: Check the stack access register for stack access

2025-02-19 Thread Uros Bizjak
On Thu, Feb 20, 2025 at 3:17 AM H.J. Lu wrote: > > On Thu, Feb 20, 2025 at 5:37 AM H.J. Lu wrote: > > > > On Wed, Feb 19, 2025 at 10:09 PM Uros Bizjak wrote: > > > > > ... > > > > My algorithm keeps a list of registers which can access the s

Re: [PATCH v3] x86: Properly find the maximum stack slot alignment

2025-02-19 Thread Uros Bizjak
On Sat, Feb 15, 2025 at 1:27 AM H.J. Lu wrote: > > On Fri, Feb 14, 2025 at 10:08 PM Richard Biener wrote: > > > > On Fri, 14 Feb 2025, Uros Bizjak wrote: > > > > > On Fri, Feb 14, 2025 at 4:56 AM H.J. Lu wrote: > > > > > > >

Re: [PATCH v2] x86: Check register and GENERAL_REG_P for stack access

2025-02-19 Thread Uros Bizjak
On Wed, Feb 19, 2025 at 2:10 PM H.J. Lu wrote: > > On Wed, Feb 19, 2025 at 8:16 PM Uros Bizjak wrote: > > > > On Wed, Feb 19, 2025 at 12:53 PM H.J. Lu wrote: > > > > > > Since stack can only be accessed by GPR, check GENERAL_REG_P, instead of >

Re: [PATCH] x86: Check GENERAL_REG_P for stack access

2025-02-19 Thread Uros Bizjak
On Wed, Feb 19, 2025 at 12:53 PM H.J. Lu wrote: > > Since stack can only be accessed by GPR, check GENERAL_REG_P, instead of > REG_P, in ix86_find_all_reg_use_1. > > gcc/ > > PR target/118936 > * config/i386/i386.cc (ix86_find_all_reg_use_1): Replace REG_P > with GENERAL_REG_P. > > gcc/testsuite/

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Uros Bizjak
On Tue, Feb 18, 2025 at 8:26 PM Uros Bizjak wrote: > > On Tue, Feb 18, 2025 at 8:23 PM Richard Biener wrote: > > > > > > > > > Am 18.02.2025 um 20:07 schrieb Roman Kagan : > > > > > > On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak w

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Uros Bizjak
On Tue, Feb 18, 2025 at 8:23 PM Richard Biener wrote: > > > > > Am 18.02.2025 um 20:07 schrieb Roman Kagan : > > > > On Tue, Feb 18, 2025 at 07:17:24PM +0100, Uros Bizjak wrote: > >>> On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan wrote: > >>> &g

Re: [PATCH] libgcc: i386/linux-unwind.h: always rely on sys/ucontext.h

2025-02-18 Thread Uros Bizjak
On Mon, Feb 17, 2025 at 6:19 PM Roman Kagan wrote: > > On Thu, Jan 02, 2025 at 04:32:17PM +0100, Roman Kagan wrote: > > When gcc is built for x86_64-linux-musl target, stack unwinding from > > within signal handler stops at the innermost signal frame. The reason > > for this behavior is that the

[committed] i386: Simplify PARALLEL RTX scan in ix86_find_all_reg_use

2025-02-17 Thread Uros Bizjak
UNSPEC and UNSPEC_VOLATILE never store. Remove unnecessary checks and simplify RTX scan in ix86_find_all_reg_use to scan only for SET RTX in the PARALLEL. gcc/ChangeLog: * config/i386/i386.cc (ix86_find_all_reg_use): Scan only for SET RTX in PARALLEL. Bootstrapped and regression tested o

Re: [PATCH v3] x86: Properly find the maximum stack slot alignment

2025-02-17 Thread Uros Bizjak
On Fri, Feb 14, 2025 at 2:11 PM Uros Bizjak wrote: > > On Fri, Feb 14, 2025 at 4:56 AM H.J. Lu wrote: > > > > On Thu, Feb 13, 2025 at 5:17 PM Uros Bizjak wrote: > > > > > > On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu wrote: > > > > > > >

[PATCH] middle-end: Fixup constant integers when expanding __builtin_crc [PR118288]

2025-02-16 Thread Uros Bizjak
Constant integers with MSB set have to be represented as corresponding signed integers. Use gen_int_mode to emit them in the correct way. PR middle-end/118288 gcc/ChangeLog: * builtins.cc (expand_builtin_crc_table_based): Use gen_int_mode to emit constant integers with MSB set. gcc
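
For readers unfamiliar with the representation issue described above, here is a minimal sketch in plain C. It is not GCC code: canonical_const is a hypothetical helper that only mimics the idea of storing a constant sign-extended from the width of its mode, which is the form the patch obtains via gen_int_mode. The 32-bit width and the sample values are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC: keep the low BITS bits of VALUE
   and sign-extend them to 64 bits.  A constant whose most significant
   bit (for that width) is set comes back as a negative number.  */
static int64_t
canonical_const (uint64_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 64);
  if (bits == 64)
    return (int64_t) value;

  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t sign = UINT64_C (1) << (bits - 1);

  value &= mask;
  /* Flip the sign bit, then subtract it again: values with the sign bit
     set end up negative, all others are unchanged.  */
  return (int64_t) (value ^ sign) - (int64_t) sign;
}
```

With this helper, canonical_const (0xFF, 8) yields -1 and canonical_const (0x7F, 8) yields 127, so only constants with the MSB set change their signed value.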

Re: [PATCH] tree-optimizatio/118852 - wrong code with 502.gcc_r

2025-02-14 Thread Uros Bizjak
On Fri, Feb 14, 2025 at 3:10 PM Richard Biener wrote: > > 502.gcc_r when built with -fprofile-generate exposes a SLP discovery > issue where an IV forced live due to early break is not properly > discovered if its latch def is part of a different IVs SSA cycle. > To mitigate this we have to make s

Re: [PATCH v3] x86: Properly find the maximum stack slot alignment

2025-02-14 Thread Uros Bizjak
On Fri, Feb 14, 2025 at 4:56 AM H.J. Lu wrote: > > On Thu, Feb 13, 2025 at 5:17 PM Uros Bizjak wrote: > > > > On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu wrote: > > > > > > Don't assume that stack slots can only be accessed by stack or frame > > >

Re: [PATCH 0/2] x86: Add a pass to fold tail call

2025-02-13 Thread Uros Bizjak
On Thu, Feb 13, 2025 at 1:58 AM H.J. Lu wrote: > > x86 conditional branch (jcc) target can be either a label or a symbol. > Add a pass to fold tail call with jcc by turning: > > jcc .L6 > ... > .L6: > jmp tailcall > > into: > > jcc tailcall > > After basic block

Re: [PATCH v2] x86: Properly find the maximum stack slot alignment

2025-02-13 Thread Uros Bizjak
On Thu, Feb 13, 2025 at 9:31 AM H.J. Lu wrote: > > Don't assume that stack slots can only be accessed by stack or frame > registers. We first find all registers defined by stack or frame > registers. Then check memory accesses by such registers, including > stack and frame registers. > > gcc/ >
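
The two steps quoted above (collect registers defined from the stack or frame register, then check memory accesses through them) can be sketched on a toy model. Everything below is an assumption made for illustration: the toy_insn type, the register numbering and toy_max_stack_alignment are hypothetical and do not correspond to the RTL walk in config/i386/i386.cc. The sketch is also flow-insensitive, so it is more conservative than a real implementation would need to be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy register numbering, chosen arbitrarily for this sketch.  */
enum { NUM_REGS = 16, REG_FP = 6, REG_SP = 7 };

enum toy_kind { TOY_COPY, TOY_STORE };

struct toy_insn
{
  enum toy_kind kind;
  int dest;        /* TOY_COPY: register being defined.  */
  int src;         /* TOY_COPY: source register; TOY_STORE: base register.  */
  unsigned align;  /* TOY_STORE: alignment in bytes the access requires.  */
};

/* Return the largest alignment required by any store whose base register
   is the stack or frame register, or was derived from one of them.  */
static unsigned
toy_max_stack_alignment (const struct toy_insn *insns, size_t n)
{
  bool stack_derived[NUM_REGS] = { false };
  stack_derived[REG_SP] = true;
  stack_derived[REG_FP] = true;

  /* Step 1: propagate "derived from stack/frame" through copies until
     nothing changes.  */
  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < n; i++)
        {
          assert (insns[i].src >= 0 && insns[i].src < NUM_REGS);
          if (insns[i].kind != TOY_COPY)
            continue;
          assert (insns[i].dest >= 0 && insns[i].dest < NUM_REGS);
          if (stack_derived[insns[i].src] && !stack_derived[insns[i].dest])
            {
              stack_derived[insns[i].dest] = true;
              changed = true;
            }
        }
    }

  /* Step 2: inspect memory accesses made through any such register.  */
  unsigned max_align = 0;
  for (size_t i = 0; i < n; i++)
    if (insns[i].kind == TOY_STORE
        && stack_derived[insns[i].src]
        && insns[i].align > max_align)
      max_align = insns[i].align;

  return max_align;
}
```

In this model a store through a register that was merely copied from the stack pointer still counts, which is the case the patch description says must not be overlooked.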

Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 4:16 PM Richard Sandiford wrote: > > Uros Bizjak writes: > > The combine pass is trying to combine: > > > > Trying 16, 22, 21 -> 23: > >16: r104:QI=flags:CCNO>0 > >22: {r120:QI=r104:QI^0x1;clobber flags:CC;} > &

Re: [PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 1:14 PM Uros Bizjak wrote: > > The combine pass is trying to combine: > > Trying 16, 22, 21 -> 23: >16: r104:QI=flags:CCNO>0 >22: {r120:QI=r104:QI^0x1;clobber flags:CC;} > REG_UNUSED flags:CC >21: r119:QI=flags:CCNO<=0 &g

[PATCH] combine: Discard REG_UNUSED note in i2 when register is also referenced in i3 [PR118739]

2025-02-12 Thread Uros Bizjak
The combine pass is trying to combine: Trying 16, 22, 21 -> 23: 16: r104:QI=flags:CCNO>0 22: {r120:QI=r104:QI^0x1;clobber flags:CC;} REG_UNUSED flags:CC 21: r119:QI=flags:CCNO<=0 REG_DEAD flags:CCNO 23: {r110:QI=r119:QI|r120:QI;clobber flags:CC;} REG_DEAD r120:QI

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 11:06 AM H.J. Lu wrote: > > On Wed, Feb 12, 2025 at 5:28 PM Uros Bizjak wrote: > > > > On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu wrote: > > > > > > Don't assume that stack slots can only be accessed by stack or frame > > >

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-12 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu wrote: > > Don't assume that stack slots can only be accessed by stack or frame > registers. We first find all registers defined by stack or frame > registers. Then check memory accesses by such registers, including > stack and frame registers. > > gcc/ >

Re: [PATCH] x86: Properly find the maximum stack slot alignment

2025-02-11 Thread Uros Bizjak
On Wed, Feb 12, 2025 at 6:25 AM H.J. Lu wrote: > > Don't assume that stack slots can only be accessed by stack or frame > registers. We first find all registers defined by stack or frame > registers. Then check memory accesses by such registers, including > stack and frame registers. I wonder i

Re: [PATCH] x86: Correct ASM_OUTPUT_SYMBOL_REF

2025-02-10 Thread Uros Bizjak
On Tue, Feb 11, 2025 at 7:13 AM H.J. Lu wrote: > > x is not a macro argument. It just happens to work as final.cc passes > x for 2nd argument: > > final.cc: ASM_OUTPUT_SYMBOL_REF (file, x); > > PR target/118825 > * config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with > SYM. > - =
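
The kind of bug described above, a macro body that names a variable from the caller's scope instead of its own parameter, can be reproduced in a few lines. This is a sketch with hypothetical macros (COPY_SYMBOL_BUGGY, COPY_SYMBOL_FIXED); it is not the actual ASM_OUTPUT_SYMBOL_REF definition from config/i386/i386.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buggy shape: the body refers to `x`, which is not a macro parameter.
   It compiles only where the caller happens to have a variable named x.  */
#define COPY_SYMBOL_BUGGY(BUF, SYM) strcpy ((BUF), x)

/* Corrected shape: the body uses the SYM parameter.  */
#define COPY_SYMBOL_FIXED(BUF, SYM) strcpy ((BUF), (SYM))

/* Works by accident: the local variable is called x.  */
static const char *
copy_with_buggy_macro (char *buf, const char *x)
{
  assert (buf != NULL && x != NULL);
  COPY_SYMBOL_BUGGY (buf, x);
  return buf;
}

/* Works regardless of what the argument is called.  */
static const char *
copy_with_fixed_macro (char *buf, const char *name)
{
  assert (buf != NULL && name != NULL);
  COPY_SYMBOL_FIXED (buf, name);
  return buf;
}
```

Renaming the parameter of copy_with_buggy_macro to anything other than x makes it fail to compile, which is why the buggy form can go unnoticed while every call site passes a variable with that name.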
