Re: [X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle wrote: > > > This patch avoids long lea instructions for performing x<<2 and x<<3 > by splitting them into shorter sal and move (or xchg) instructions. > Because this increases the number of instructions, but reduces the > total size, it's suitable for -O

Re: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 1:45 PM Roger Sayle wrote: > > Doh! ENOPATCH. > > > -Original Message- > > From: Roger Sayle > > Sent: 05 October 2023 12:44 > > To: 'gcc-patches@gcc.gnu.org' > > Cc: 'Uros Bizjak' > > Subject: [X8
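
The subject of this thread names the add+adc idiom for shifting a doubleword left by one bit. As a rough illustration only (not the patch itself, which is not shown in this snippet), the arithmetic can be modelled in C. The type `dword_t` and all function names below are hypothetical, and 64-bit words are an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word representation of a doubleword value. */
typedef struct { uint64_t lo, hi; } dword_t;

/* Shift left by one bit the way an add/adc pair would:
   lo + lo gives the new low word plus a carry-out (the old top bit of lo),
   hi + hi + carry gives the new high word. */
static dword_t shl1_add_adc(dword_t x)
{
    dword_t r;
    r.lo = x.lo + x.lo;              /* add: wraps modulo 2^64 */
    uint64_t carry = r.lo < x.lo;    /* carry-out of the low addition */
    r.hi = x.hi + x.hi + carry;      /* adc */
    return r;
}

/* Reference version using plain shifts across the two words. */
static dword_t shl1_reference(dword_t x)
{
    dword_t r;
    r.lo = x.lo << 1;
    r.hi = (x.hi << 1) | (x.lo >> 63);
    return r;
}

/* Checks that both formulations agree for one input. */
static void check_shl1(dword_t x)
{
    dword_t a = shl1_add_adc(x);
    dword_t b = shl1_reference(x);
    assert(a.lo == b.lo && a.hi == b.hi);
}
```

Calling `check_shl1` on a value whose low word has its top bit set exercises the carry path.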

[COMMITTED] i386: Improve memory copy from named address space [PR111657]

2023-10-05 Thread Uros Bizjak
The stringop strategy selection algorithm falls back to a libcall strategy when it exhausts its pool of available strategies. The memory area copy function (memcpy) is not available from the system library for non-default address spaces, so the compiler emits the most trivial byte-at-a-time copy l

Re: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

2023-10-09 Thread Uros Bizjak
On Fri, Oct 6, 2023 at 3:59 PM Roger Sayle wrote: > > > Grr! I've done it again. ENOPATCH. > > > -Original Message- > > From: Roger Sayle > > Sent: 06 October 2023 14:58 > > To: 'gcc-patches@gcc.gnu.org' > > Cc: 'Uros Bizja

Re: [PATCH v4] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote: > > When using -mcmodel=medium, large data objects larger than the > -mlarge-data-threshold threshold are placed into large data sections > (.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place > .l* sections into separate outpu

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote: > > On 2023-10-16, Uros Bizjak wrote: > >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote: > >> > >> When using -mcmodel=medium, large data objects larger than the > >> -mlarge-data-threshold thresh

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 9:58 PM Fangrui Song wrote: > > On Mon, Oct 16, 2023 at 12:10 PM Uros Bizjak wrote: > > > > On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote: > > > > > > On 2023-10-16, Uros Bizjak wrote: > > > >On Tue, Aug 1, 2023 at 9:5

Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle wrote: > > > This patch is the backend piece of a solution to PRs 101955 and 106245, > that adds a define_insn_and_split to the i386 backend, to perform sign > extension of a single (least significant) bit using AND $1 then NEG. > > Previously, (x<<31)>>
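
For readers skimming the index, the identity behind this patch, (x<<31)>>31 == -(x&1) for 32-bit x, can be checked with a small C sketch. The function names are hypothetical, and the shift form assumes GCC's behaviour of wrapping unsigned-to-signed conversion and arithmetic right shift of negative values, both of which are implementation-defined in ISO C:

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: move bit 0 into the sign position, then shift back.
   The left shift is done on an unsigned value to avoid signed overflow.
   Assumes (as GCC implements) that the conversion wraps and that >> on a
   negative int32_t is an arithmetic shift. */
static int32_t sext_bit0_shift(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* AND-then-NEG form described in the patch summary. */
static int32_t sext_bit0_and_neg(int32_t x)
{
    return -(x & 1);
}

/* Checks that both forms agree for one input. */
static void check_sext_bit0(int32_t x)
{
    assert(sext_bit0_shift(x) == sext_bit0_and_neg(x));
}
```

Both functions return -1 when the least significant bit of x is set and 0 otherwise.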

Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-18 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 7:54 PM Roger Sayle wrote: > > > Hi Uros, > Thanks for the speedy review. > > > From: Uros Bizjak > > Sent: 17 October 2023 17:38 > > > > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle > > wrote: > > > > > >

Re: [i386] Do not omit the frame pointer at -O0

2018-07-02 Thread Uros Bizjak
On Mon, Jul 2, 2018 at 10:14 AM, Eric Botcazou wrote: > Ping for https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01228.html > > Thanks in advance. LGTM, but please note that the patch was already approved by Jeff on the 22nd of June [1]. [1] https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01466.html Ur

Re: [PATCH] i386; Add indirect_return function attribute

2018-07-03 Thread Uros Bizjak
On Tue, Jul 3, 2018 at 5:32 PM, H.J. Lu wrote: > On Fri, Jun 8, 2018 at 3:27 AM, H.J. Lu wrote: >> On x86, swapcontext may return via indirect branch when shadow stack >> is enabled. To support code instrumentation of control-flow transfers >> with -fcf-protection, add indirect_return function a

[PATCH 18/n, 386]: Fix PR85694, Generation of vectorized AVG (Average) instruction

2018-07-03 Thread Uros Bizjak
Hello! Attached patch implements unsigned HImode and QImode vector average instructions. This is all x86 has to offer... 2018-07-03 Uros Bizjak PR target/85694 * config/i386/sse.md (uavg3_ceil): New expander. (_uavg3): Simplify expander. testsuite/ChangeLog: 2018-07-03 Uros
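
As background for the `_ceil` expander name: the x86 PAVGB/PAVGW instructions compute a rounding-up unsigned average, (a + b + 1) >> 1, without intermediate overflow. The following is only a scalar C sketch of that operation for 8-bit elements; the function names are hypothetical, and whether a given compiler vectorizes such a loop into the average instruction depends on version and flags:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounding-up unsigned average of two bytes. The sum is formed in a wider
   type so that a + b + 1 cannot overflow. */
static uint8_t uavg_ceil_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Element-wise form: the kind of loop a vectorizer may recognize. */
static void uavg_ceil_array(uint8_t *dst, const uint8_t *a,
                            const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = uavg_ceil_u8(a[i], b[i]);
}

/* Sanity check: the average of a value with itself is that value. */
static void check_uavg_identity(uint8_t v)
{
    assert(uavg_ceil_u8(v, v) == v);
}
```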

Re: [PATCH 18/n, 386]: Fix PR85694, Generation of vectorized AVG (Average) instruction

2018-07-04 Thread Uros Bizjak
On Tue, Jul 3, 2018 at 7:38 PM, Uros Bizjak wrote: > Hello! > > Attached patch implements unsigned HImode and QImode vector average > instructions. This is all x86 has to offer... FYI, I have tried the effectiveness of patched gcc with SPEC CPU2006 464.h264 (actually, jm19.0.zip so

Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-12 Thread Uros Bizjak
On Thu, Jul 12, 2018 at 9:57 PM, H.J. Lu wrote: > r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations > which are enabled by PROCESSOR_HASWELL. As a result, -mtune=skylake > generates slower code on Skylake than before. The same also applies > to Cannonlake and Icelak tun

Re: [PATCH] x86: Tune Skylake, Cannonlake and Icelake as Haswell

2018-07-13 Thread Uros Bizjak
On Fri, Jul 13, 2018 at 3:12 PM, H.J. Lu wrote: > On Fri, Jul 13, 2018 at 08:53:02AM +0200, Uros Bizjak wrote: > > On Thu, Jul 12, 2018 at 9:57 PM, H.J. Lu wrote: > > > > > r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations > > > which are

[PATCH, middle-end]: Fix PR86511, traps are generated for non-trapping compares

2018-07-13 Thread Uros Bizjak
le LT does. The solution is to avoid the above expansion for compares that would change their trappiness and emit jumps around 2018-07-13 Uros Bizjak PR target/86511 * expmed.c (emit_store_flag): Do not emit setcc followed by a conditional move when trapping comparison was split

Re: [patch, x86] Improve memcpy/memset strategy for Skylake.

2018-07-18 Thread Uros Bizjak
On Thu, Jul 19, 2018 at 7:00 AM, Koval, Julia wrote: > Hi, > This patch improves memset/memcpy strategy for Skylake. Ok for trunk? Is this patch based on some benchmark data? Uros. > * gcc/config/i386/x86-tune-costs.h (skylake_memcpy, > skylake_memcpy): Replace rep_prefix with u

Re: [patch, x86] Improve memcpy/memset strategy for Skylake.

2018-07-18 Thread Uros Bizjak
On Thu, Jul 19, 2018 at 8:20 AM, Koval, Julia wrote: > Yes, it gives small improvements(~2%) on 557.xz on O2 and on > 548.exchange(~2.5%) and 500.perlbench(~1%) on Ofast in rate mode. > >> -Original Message- >> From: Uros Bizjak [mailto:ubiz...@gmail.com] >> Se

Re: [PATCH 10/11] x86 - add speculation_barrier pattern

2018-07-28 Thread Uros Bizjak
On Fri, Jul 27, 2018 at 11:37 AM, Richard Earnshaw wrote: > > This patch adds a speculation barrier for x86, based on my > understanding of the required mitigation for that CPU, which is to use > an lfence instruction. > > This patch needs some review by an x86 expert and if adjustments are > need

[PATCH, testsuite]: Fix PR 86153, test case g++.dg/pr83239.C fails

2018-08-01 Thread Uros Bizjak
ingop-overflow=] In function 'void test_if(std::vector&, int) [with T = long int]': cc1plus: warning: 'void* __builtin_memset(void*, int, long unsigned int)' specified size 18446744073709551600 exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=] 2018-0

Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-19 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle wrote: > > > This patch contains clean-ups of the widening multiplication patterns in > i386.md, and provides variants of the existing highpart multiplication > peephole2 transformations (that tidy up register allocation after > reload), and thereby fixe

Re: [PATCH] [x86] Remove unused mmx_pinsrw.

2023-10-20 Thread Uros Bizjak
On Fri, Oct 20, 2023 at 8:54 AM liuhongt wrote: > > While working on enabling more 32/64-bit vectorization for _Float16, > I noticed there's 1 redundant define_expand; the patch removes the expander. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: >

Re: [PATCH] i386: Avoid paradoxical subreg dests in vector zero_extend

2023-10-24 Thread Uros Bizjak
On Tue, Oct 24, 2023 at 12:08 PM Richard Sandiford wrote: > > For the V2HI -> V2SI zero extension in: > > typedef unsigned short v2hi __attribute__((vector_size(4))); > typedef unsigned int v2si __attribute__((vector_size(8))); > v2si f (v2hi x) { return (v2si) {x[0], x[1]}; } > > ix86_expan

Re: [x86 PATCH] Fine tune STV register conversion costs for -Os.

2023-10-24 Thread Uros Bizjak
On Mon, Oct 23, 2023 at 4:47 PM Roger Sayle wrote: > > > The eagle-eyed may have spotted that my recent testcases for DImode shifts > on x86_64 included -mno-stv in the dg-options. This is because the > Scalar-To-Vector (STV) pass currently transforms these shifts to use > SSE vector operations,

[committed] i386: Narrow test instructions with immediate operands [PR111698]

2023-10-25 Thread Uros Bizjak
i386: Narrow test instructions with immediate operands [PR111698] Narrow test instructions with immediate operand that test memory location for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2. Reject targets where reading (possibly unaligned) part of memory location after
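
The narrowing described here relies on little-endian byte order: a 32-bit mask whose set bits all lie in bits 16..23 selects exactly the byte at offset 2. A minimal C sketch of that equivalence follows, assuming a little-endian target (as x86 is) and taking 0x00aa0000 as the 32-bit mask; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Wide form: load 32 bits and test them against a mask confined to
   bits 16..23. */
static int test_wide(const unsigned char *mem)
{
    uint32_t v;
    memcpy(&v, mem, sizeof v);       /* alignment-safe 32-bit load */
    return (v & 0x00aa0000u) != 0;
}

/* Narrow form: test only the byte at offset 2 (little-endian assumption). */
static int test_narrow(const unsigned char *mem)
{
    return (mem[2] & 0xaa) != 0;
}

/* Checks that both forms agree on one 4-byte buffer. */
static void check_narrowing(const unsigned char *mem)
{
    assert(test_wide(mem) == test_narrow(mem));
}
```

The narrow form reads one byte instead of four, which is why the commit message goes on to discuss targets where a partial read after a wider store is undesirable.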

Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-26 Thread Uros Bizjak
sses before reload check both predicates and > constraints. > > My original patch fixes PR 110511, using the same peephole2 idiom as already > used elsewhere in i386.md. Ok for mainline? Thanks for the explanation. The patch is OK. > > -Original Message- > > From: U

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-26 Thread Uros Bizjak
On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote: > > Hi all: > This patch enables -march/-mtune=yongfeng, costs and tunings are set > according to the characteristics of the processor. We add a new md file to > describe yongfeng processor. > > Bootstrapped /regtested X86_64. > > Ok for

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-27 Thread Uros Bizjak
On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote: > > On 2023/10/26 17:34, Uros Bizjak wrote: > > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote: > >> > >> Hi all: > >> This patch enables -march/-mtune=yongfeng, costs and tunings are set > >> a

Re: [PATCH] Testsuite, i386: Fix test by passing -march

2023-10-30 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 12:53 PM FX Coudert wrote: > > Hi, > > The newly introduced test gcc.target/i386/pr111698.c currently fails on > Darwin, where the default arch is core2. > Andrew suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112287 to > pass a recent value to -march, and I ca

Re: [PATCH] [x86_64]: Zhaoxin yongfeng enablement

2023-10-30 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 10:08 AM Mayshao-oc wrote: > > >On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote: > >> > >> On 2023/10/26 17:34, Uros Bizjak wrote: > >> > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote: > >> >> > >> >>

[PUSHED] i386: Improve stack protector patterns and peephole2s

2023-11-01 Thread Uros Bizjak
Improve stack protector patterns and peephole2s to substitute stack protector scratch register clear with unrelated subsequent register initialization in several ways: a. Explicitly generate scratch register as named pseudo. This allows optimizers to eventually reuse the zero value in the registe

Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Uros Bizjak
On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle wrote: > > > This patch is a follow-up to my previous PR target/110551 patch, this > time to address the additional move after mulx, seen on TARGET_BMI2 > architectures (such as -march=haswell). The complication here is > that the flexible multiple-set

Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation using peephole2.

2023-11-01 Thread Uros Bizjak
On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle wrote: > > > Hi Uros, > > > From: Uros Bizjak > > Sent: 01 November 2023 10:05 > > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation > > using peephole2. > > > > On Mon

[COMMITTED] i386: Move stack protector patterns above mov $0 -> xor peephole

2023-11-02 Thread Uros Bizjak
Move stack protector patterns above mov $0,%reg -> xor %reg,%reg so the latter won't interfere with stack protector peephole2s. gcc/ChangeLog: * config/i386/i386.md: Move stack protector patterns above mov $0,%reg -> xor %reg,%reg peephole2 pattern. Bootstrapped and regression tested on

[RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
The patch generalizes address register class handling to allow multiple address register classes. For APX EGPR targets, some instructions can't be encoded with REX2 prefix, so it is necessary to limit address register class to avoid REX2 registers. The same situation happens for instructions with

Re: [RFC, RFA PATCH] i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
023, Fri, Nov 3, 20:50, wrote: > > > > On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak wrote: > > > > > > The patch generalizes address register class handling to allow multiple > > > address register classes. For APX EGPR targets, some instructions can't > >

[COMMITTED]: i386: Handle multiple address register classes

2023-11-03 Thread Uros Bizjak
The patch generalizes address register class handling to allow multiple register classes. For APX EGPR targets, some instructions do not support GPR32 registers, so it is necessary to limit address register set to avoid them. The same situation happens for instructions with high registers, where

[committed] i386: Add LEGACY_INDEX_REG register class.

2023-11-05 Thread Uros Bizjak
Also rename LEGACY_REGS to LEGACY_GENERAL_REGS. gcc/ChangeLog: * config/i386/i386.h (enum reg_class): Add LEGACY_INDEX_REGS. Rename LEGACY_REGS to LEGACY_GENERAL_REGS. (REG_CLASS_NAMES): Ditto. (REG_CLASS_CONTENTS): Ditto. * config/i386/constraints.md ("R"): Update for rename.

[committed] i386: Use "addr" attribute to limit address regclass to non-REX regs

2023-11-06 Thread Uros Bizjak
Use "addr" attribute with "gpr8" value to limit address register class to non-REX registers in instructions with high registers, where REX registers can not be used in the address. gcc/ChangeLog: * config/i386/constraints.md (Bc): Remove constraint. (Bn): Rewrite to use x86_extended_reg_m

Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-24 Thread Uros Bizjak
On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu wrote: > > Update patch: > Add m constraint to define_insn (sse_1_round *sse_1_round when under sse4 but not avx512f. It looks to me that the original insn is incompletely defined. It should use nonimmediate_operand, "m" constraint and pointer size mod

Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-25 Thread Uros Bizjak
On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu wrote: > > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu wrote: > > > > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak wrote: > > > > > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu wrote: > > > >

Re: [PATCH target/89071] Fix false dependence of scalar operations vrcp/vsqrt/vrsqrt/vrndscale

2019-10-25 Thread Uros Bizjak
On Fri, Oct 25, 2019 at 9:13 PM Hongtao Liu wrote: > > Update patch. > > On Fri, Oct 25, 2019 at 4:01 PM Uros Bizjak wrote: > > > > On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu wrote: > > > > > > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu wrote: >

Re: [PATCH] Adjust predicates and constraints of scalar insns

2019-10-25 Thread Uros Bizjak
On Fri, Oct 25, 2019 at 9:20 PM Hongtao Liu wrote: > > > Looking into sse.md, there is a lot of inconsistencies in existing *vm > > patterns w.r.t. operand constraints. Unfortunately, these were copied > > into proposed patterns. One example is existing > > > > (define_insn "_vmsqrt2" > > [(set

Re: [PATCH] Remove redudant iptr when operand already has a scalar mode.

2019-10-26 Thread Uros Bizjak
On Sat, Oct 26, 2019 at 3:27 PM Hongtao Liu wrote: > > > BTW: Please also note that there is no need to use or operand > > mode override in scalar insn templates for intel asm dialect when > > operand already has a scalar mode. > https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01868.html > > This p

[PATCH, i386]: Remove a couple of operand modifiers

2019-10-28 Thread Uros Bizjak
These are not needed for scalar operands. 2019-10-28 Uroš Bizjak * config/i386/sse.md (sse_cvtss2si_2): Remove %k operand modifier. (*vec_extractv2df_1_sse): Remove %q operand modifier. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline. Uros.

[PATCH, i386]: Fix REDUC_SSE_SMINMAX_MODE mode iterator

2019-10-28 Thread Uros Bizjak
2019-10-28 Uroš Bizjak PR target/92225 * config/i386/sse.md (REDUC_SSE_SMINMAX_MODE): Use TARGET_SSE4_2 condition for V2DImode. testsuite/ChangeLog: 2019-10-28 Uroš Bizjak PR target/92225 * gcc.target/i386/pr92225.c: New test. Bootstrapped and regression tested on x86

Re: [PATCH] Unbreak -masm=intel (PR target/92258)

2019-10-28 Thread Uros Bizjak
On Mon, Oct 28, 2019 at 11:02 PM Jakub Jelinek wrote: > > Hi! > > On Sat, Oct 26, 2019 at 09:27:12PM +0800, Hongtao Liu wrote: > > > BTW: Please also note that there is no need to use or operand > > > mode override in scalar insn templates for intel asm dialect when > > > operand already has a sc

Re: [PATCH] Don't mention MMX in -msse etc. option descriptions

2019-10-30 Thread Uros Bizjak
On Tue, Oct 29, 2019 at 8:57 AM Jakub Jelinek wrote: > > Hi! > > While working on the OpenMP isa patch, I've noticed most of the x86 ISA > options imply in the help text that it enables MMX, when they do not. > We now have the mmx in sse2, so to some extent most of the built-in functions > are ena

Re: [PATCH] Enable VPOPCNTDQ for icelake-{client,server} and tigerlake.

2019-11-13 Thread Uros Bizjak
On Wed, Nov 13, 2019 at 4:25 PM Martin Liška wrote: > > Hi. > > The patch adds a missing feature for PTA_ICELAKE_CLIENT and > inherited CPUs. One can see that: > https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > R

[PATCH, i386]: Fix PR93254, -msse generates sse2 instructions

2020-01-14 Thread Uros Bizjak
2020-01-14 Uroš Bizjak PR target/93254 * config/i386/i386.md (*movsf_internal): Require SSE2 ISA for alternatives 9 and 10. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Committed to mainline, will be backported to gcc-9 branch. Uros. diff --git a/gcc/

Re: [PATCH, i386]: Fix PR93254, -msse generates sse2 instructions

2020-01-15 Thread Uros Bizjak
Also, we can remove existing SSE2 ISA requirements for alternatives 14 and 15. Remove invalid SSE2 ISA requirements in *movsf_internal. 2020-01-15 Uroš Bizjak * config/i386/i386.md (*movsf_internal): Do not require SSE2 ISA for alternatives 14 and 15. Patch was bootstrapped and regre

Re: [PATCH] i386: Fix up -fdollars-in-identifiers with identifiers starting with $ in -masm=att (PR target/91298)

2020-01-18 Thread Uros Bizjak
On Sat, Jan 18, 2020 at 1:30 PM Jakub Jelinek wrote: > > Hi! > > In AT&T syntax leading $ is special, so if we have identifiers that start > with dollar, we usually fail to assemble it (or assemble incorrectly). > As mentioned in the PR, what works is wrapping the identifiers inside of > parens, l

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-19 Thread Uros Bizjak
On Sun, Jan 19, 2020 at 2:58 PM H.J. Lu wrote: > > To add x32 support to -mtls-dialect=gnu2, we need to replace DI with > P in GNU2 TLS patterns. Since thread pointer is in ptr_mode, PLUS in > GNU2 TLS address computation must be done in ptr_mode to support > -maddress-mode=long. Also drop the "

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-19 Thread Uros Bizjak
On Sun, Jan 19, 2020 at 6:43 PM Uros Bizjak wrote: > > On Sun, Jan 19, 2020 at 2:58 PM H.J. Lu wrote: > > > > To add x32 support to -mtls-dialect=gnu2, we need to replace DI with > > P in GNU2 TLS patterns. Since thread pointer is in ptr_mode, PLUS in > > GNU2 T

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-19 Thread Uros Bizjak
On Sun, Jan 19, 2020 at 7:07 PM H.J. Lu wrote: > > On Sun, Jan 19, 2020 at 9:48 AM Uros Bizjak wrote: > > > > On Sun, Jan 19, 2020 at 6:43 PM Uros Bizjak wrote: > > > > > > On Sun, Jan 19, 2020 at 2:58 PM H.J. Lu wrote: > > > > > >

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-19 Thread Uros Bizjak
On Sun, Jan 19, 2020 at 9:07 PM H.J. Lu wrote: > > On Sun, Jan 19, 2020 at 12:01 PM Uros Bizjak wrote: > > > > On Sun, Jan 19, 2020 at 7:07 PM H.J. Lu wrote: > > > > > > On Sun, Jan 19, 2020 at 9:48 AM Uros Bizjak wrote: > > > > > >

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-19 Thread Uros Bizjak
On Sun, Jan 19, 2020 at 10:00 PM H.J. Lu wrote: > > On Sun, Jan 19, 2020 at 12:16 PM Uros Bizjak wrote: > > > > On Sun, Jan 19, 2020 at 9:07 PM H.J. Lu wrote: > > > > > > On Sun, Jan 19, 2020 at 12:01 PM Uros Bizjak wrote: > > > > >

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-21 Thread Uros Bizjak
On Mon, Jan 20, 2020 at 10:46 PM H.J. Lu wrote: > > > OK. Let's go with this version, but please investigate if we need to > > > calculate TLS address in ptr_mode instead of Pmode. Due to quite some > > > zero-extension from ptr_mode to Pmode hacks in this area, it looks to > > > me that the whol

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-21 Thread Uros Bizjak
On Tue, Jan 21, 2020 at 9:47 AM Uros Bizjak wrote: > > On Mon, Jan 20, 2020 at 10:46 PM H.J. Lu wrote: > > > > > OK. Let's go with this version, but please investigate if we need to > > > > calculate TLS address in ptr_mode instead of Pmode. Due to quite som

Re: [PATCH] PR target/93319: x32: Add x32 support to -mtls-dialect=gnu2

2020-01-21 Thread Uros Bizjak
On Tue, Jan 21, 2020 at 8:16 PM H.J. Lu wrote: > > On Tue, Jan 21, 2020 at 2:29 AM Uros Bizjak wrote: > > > > On Tue, Jan 21, 2020 at 9:47 AM Uros Bizjak wrote: > > > > > > On Mon, Jan 20, 2020 at 10:46 PM H.J. Lu wrote: > > > > > > >

Re: [PATCH] i386: Use bzhi for x & ((1 << y) - 1) or x & ((1U << y) - 1) [PR93346]

2020-01-23 Thread Uros Bizjak
On Thu, Jan 23, 2020 at 8:56 AM Jakub Jelinek wrote: > > Hi! > > The bzhi patterns are quite complicated because they need to accurately > describe the behavior of the instruction for all input values. > The following patterns are simple and make bzhi recognizable even for > cases where not all in
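
To make the distinction in this snippet concrete: the C expression in the subject line is only defined for shift counts below the operand width, while the instruction has defined behaviour for every index. The sketch below contrasts the two for 32-bit operands. `bzhi_model` is a hypothetical software model written from the documented BZHI behaviour (index taken from the low 8 bits; source returned unchanged when the index is at least the operand size), not GCC code:

```c
#include <assert.h>
#include <stdint.h>

/* The source-level idiom from the subject line. Only valid for y in 0..31,
   since shifting a 32-bit value by 32 or more is undefined in C. */
static uint32_t low_bits_mask(uint32_t x, unsigned y)
{
    return x & ((1u << y) - 1u);
}

/* Hypothetical model of 32-bit bzhi for all index values. */
static uint32_t bzhi_model(uint32_t x, unsigned n)
{
    n &= 0xffu;                      /* only the low 8 bits of the index count */
    if (n >= 32u)
        return x;                    /* nothing to clear */
    return x & ((1u << n) - 1u);
}

/* Checks that the idiom and the model agree wherever the idiom is defined. */
static void check_bzhi_range(uint32_t x)
{
    for (unsigned y = 0; y < 32; y++)
        assert(low_bits_mask(x, y) == bzhi_model(x, y));
}
```

Because the idiom never needs to describe out-of-range indices, a pattern matching it can be much simpler than one describing the full instruction.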

Re: [PATCH] wide-int: i386: Fix ICEs on TImode signed overflow add/sub patterns [PR93376]

2020-01-23 Thread Uros Bizjak
On Thu, Jan 23, 2020 at 10:33 AM Jakub Jelinek wrote: > > On Thu, Jan 23, 2020 at 09:14:42AM +, Richard Sandiford wrote: > > > The other patch is something suggested by Richard S., avoid using OImode > > > for this and instead use a partial int mode that is smaller. This is > > > still > > >

Re: [PATCH] wide-int: i386: Fix ICEs on TImode signed overflow add/sub patterns [PR93376]

2020-01-23 Thread Uros Bizjak
On Thu, Jan 23, 2020 at 2:17 PM Jakub Jelinek wrote: > > On Thu, Jan 23, 2020 at 10:38:31AM +0100, Uros Bizjak wrote: > > On Thu, Jan 23, 2020 at 10:33 AM Jakub Jelinek wrote: > > > > > > On Thu, Jan 23, 2020 at 09:14:42AM +, Richard Sandiford wrote: > >

Re: [PATCH] i386: prefer vpermilpd over vpermpd [PR93395]

2020-01-23 Thread Uros Bizjak
On Thu, Jan 23, 2020 at 10:48 PM Jakub Jelinek wrote: > > Hi! > > In Agner Fog's tables, vpermilp[sd] with immediates seem to be > much faster than vpermpd with immediate, for a good reason, > the former only permute something within the lanes and don't do anything > intra-lane, while vpermpd can.

Re: [PATCH] i386: Fix up *avx_vperm_broadcast_v4df [PR93430]

2020-01-26 Thread Uros Bizjak
On Sun, Jan 26, 2020 at 12:55 AM Jakub Jelinek wrote: > > Hi! > > Apparently my recent patch which moved the *avx_vperm_broadcast* and > *vpermil* patterns before vpermpd broke the following testcase, the > define_insn_and_split matched always but the splitter condition only split > it if not -mav

Re: [PATCH] i386: Fix up *{add,sub}v4_doubleword patterns (PR target/93412)

2020-01-26 Thread Uros Bizjak
On Sun, Jan 26, 2020 at 12:59 AM Jakub Jelinek wrote: > > Hi! > > In the *{add,sub}v4_doubleword patterns, we don't really want to see a > VOIDmode last operand, because it then means invalid RTL > (sign_extend:{TI,POI} (const_int ...)) or so, and therefore something we > don't really handle in th

Re: [PATCH] i386: Disable TARGET_SSE_TYPELESS_STORES for TARGET_AVX

2020-01-27 Thread Uros Bizjak
On Mon, Jan 27, 2020 at 7:23 PM H.J. Lu wrote: > > movaps/movups is one byte shorter than movdqa/movdqu. But it isn't the > case for AVX nor AVX512. We should disable TARGET_SSE_TYPELESS_STORES > for TARGET_AVX. > > gcc/ > > PR target/91461 > * config/i386/i386.h (TARGET_SSE_TYPE

Re: [PATCH] i386: Disable TARGET_SSE_TYPELESS_STORES for TARGET_AVX

2020-01-27 Thread Uros Bizjak
On Mon, Jan 27, 2020 at 11:17 PM H.J. Lu wrote: > > On Mon, Jan 27, 2020 at 12:26 PM Uros Bizjak wrote: > > > > On Mon, Jan 27, 2020 at 7:23 PM H.J. Lu wrote: > > > > > > movaps/movups is one byte shorter than movdqa/movdqu. But it isn't the > >

Re: [PATCH] i386: Don't use ix86_tune_ctrl_string in parse_mtune_ctrl_str

2020-01-27 Thread Uros Bizjak
On Mon, Jan 27, 2020 at 3:13 PM H.J. Lu wrote: > > There are > > static void > parse_mtune_ctrl_str (bool dump) > { > if (!ix86_tune_ctrl_string) > return; > > parse_mtune_ctrl_str is only called from set_ix86_tune_features, which > is only called from ix86_function_specific_restore and > ix

Re: [PATCH] i386: Disable TARGET_SSE_TYPELESS_STORES for TARGET_AVX

2020-01-28 Thread Uros Bizjak
On Tue, Jan 28, 2020 at 3:32 PM H.J. Lu wrote: > > On Mon, Jan 27, 2020 at 11:04 PM Uros Bizjak wrote: > > > > On Mon, Jan 27, 2020 at 11:17 PM H.J. Lu wrote: > > > > > > On Mon, Jan 27, 2020 at 12:26 PM Uros Bizjak wrote: > > > > >

Re: [PATCH] i386: Disable TARGET_SSE_TYPELESS_STORES for TARGET_AVX

2020-01-28 Thread Uros Bizjak
On Tue, Jan 28, 2020 at 4:34 PM H.J. Lu wrote: > > You could move > > > > (match_test "TARGET_AVX") > > (const_string "TI") > > > > up to bypass the cases below. > > > > I don't think we can do that. There are 2 cases where we prefer > movaps/movups: > > /* Use packed single precision instru

Re: [PATCH] i386: Prefer TARGET_AVX over TARGET_SSE_TYPELESS_STORES

2020-01-28 Thread Uros Bizjak
On Tue, Jan 28, 2020 at 6:51 PM H.J. Lu wrote: > > On Tue, Jan 28, 2020 at 9:12 AM Uros Bizjak wrote: > > > > On Tue, Jan 28, 2020 at 4:34 PM H.J. Lu wrote: > > > > > > You could move > > > > > > > > (match_test "TARGET_AVX"

Re: [PATCH] i386: Optimize popcnt followed by zero/sign extension [PR91824]

2020-01-29 Thread Uros Bizjak
On Thu, Jan 30, 2020 at 1:18 AM Jakub Jelinek wrote: > > Hi! > > Like any other instruction with 32-bit GPR destination operand in 64-bit > mode, popcntl also clears the upper 32 bits of the register (and other bits > too, it can return only 0 to 32 inclusive). > > During combine, the zero or sign

Re: [PATCH] i386: Optimize {,v}{,p}movmsk{b,ps,pd} followed by sign extension [PR91824]

2020-01-29 Thread Uros Bizjak
On Thu, Jan 30, 2020 at 1:23 AM Jakub Jelinek wrote: > > Hi! > > Some time ago, patterns were added to optimize move mask followed by zero > extension from 32 bits to 64 bits. As the testcase shows, the intrinsics > actually return int, not unsigned int, so it will happen quite often that > one ac

[PATCH, i386]: Fix TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL handling.

2020-01-31 Thread Uros Bizjak
The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD target is only insn size, as advised in e.g. Software Optimization Guide for the AMD Family 15h Processors [1], section 7.1.2, where it is said: --quote-- 7.1.2 Reduce Instruction Size Optimization Reduce the size of instructions when pos

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 10:39 AM Jakub Jelinek wrote: > > Hi! > > As mentioned in the PR, the CLOBBERs in vzeroupper are added there even for > registers that aren't ever live in the function before and break the > prologue/epilogue expansion with ms ABI (normal ABIs are fine, as they > consider al

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek wrote: > > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: > > If it works OK, I'd rather see this functionality implemented as an > > epilogue_completed guarded splitter. In the .md files, there are > > al

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 12:13 PM Uros Bizjak wrote: > > On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek wrote: > > > > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: > > > If it works OK, I'd rather see this functionality implemented as an > > >

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 1:06 PM Richard Sandiford wrote: > > Uros Bizjak writes: > > On Tue, Feb 4, 2020 at 12:13 PM Uros Bizjak wrote: > >> > >> On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek wrote: > >> > > >> > On Tue, Feb 04, 2020 at 11:

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 1:30 PM Jakub Jelinek wrote: > > On Tue, Feb 04, 2020 at 12:24:10PM +0100, Uros Bizjak wrote: > > > A && is missing in the split condition to inherit TARGET_AVX. > > > > Also, you don't need to emit "#" in output templa

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-04 Thread Uros Bizjak
On Tue, Feb 4, 2020 at 2:13 PM Jakub Jelinek wrote: > > On Tue, Feb 04, 2020 at 01:38:51PM +0100, Uros Bizjak wrote: > > As Richard advised, let's put this safety stuff back. Usually, in > > i386.md, these kind of splitters are implemented as two patterns, one > > (

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-05 Thread Uros Bizjak
On Wed, Feb 5, 2020 at 11:05 AM Jakub Jelinek wrote: > > On Tue, Feb 04, 2020 at 02:15:04PM +0100, Uros Bizjak wrote: > > On Tue, Feb 4, 2020 at 2:13 PM Jakub Jelinek wrote: > > > > > > On Tue, Feb 04, 2020 at 01:38:51PM +0100, Uros Bizjak wrote: > > > >

Re: [PATCH] i386: Omit clobbers from vzeroupper until final [PR92190]

2020-02-05 Thread Uros Bizjak
On Wed, Feb 5, 2020 at 12:03 PM Jakub Jelinek wrote: > > On Wed, Feb 05, 2020 at 11:46:51AM +0100, Uros Bizjak wrote: > > I think we should just enable split4 also for -O0. This would also > > allow us to remove the "optimize > 0" check above and allow us to >

Re: [PATCH] x86-64: Pass aggregates with only float/double in GPRs for MS_ABI

2020-02-05 Thread Uros Bizjak
On Wed, Feb 5, 2020 at 6:59 PM H.J. Lu wrote: > > MS_ABI requires passing aggregates with only float/double in integer > registers. Checked gcc outputs against Clang and fixed: > > FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54 > -Wno-unused-variable -Wno-unused-parameter >

[committed] x86: Simplify post epilogue_completed splitters.

2020-02-05 Thread Uros Bizjak
Simplify post epilogue_completed splitters. Now that we have post epilogue_completed split point for all optimization levels, we can simplify post epilogue_completed splitters considerably. If corresponding define_peephole2 pattern fails to allocate a temporary register (or if peephole2 pass isn't

Re: [PATCH] i386: Improve avx* vector concatenation [PR93594]

2020-02-06 Thread Uros Bizjak
On Thu, Feb 6, 2020 at 9:34 AM Jakub Jelinek wrote: > > Hi! > > The following testcase shows that for _mm256_set*_m128i and similar > intrinsics, we sometimes generate bad code. All 4 routines are expressing > the same thing, a 128-bit vector zero padded to 256-bit vector, but only the > 3rd one

[PATCH] Improve splitX passes management

2020-02-06 Thread Uros Bizjak
The names of split_before_sched2 ("split4") and split_before_regstack ("split3") do not reflect their insertion point in the sequence of passes, where split_before_regstack follows split_before_sched2. Reorder the code and rename the passes to reflect the reality. split_before_regstack pass does n

[committed] x86: Emit "#" instead of calling gcc_unreachable for invalid insns.

2020-02-06 Thread Uros Bizjak
Implement standard approach by emitting "#" for insns that have to be split. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. 2020-02-06 Uroš Bizjak * config/i386/i386.md (*pushtf): Emit "#" instead of calling gcc_unreachable in insn output. (*pushxf): Ditto. (*

Re: [committed] x86: Emit "#" instead of calling gcc_unreachable for invalid insns.

2020-02-06 Thread Uros Bizjak
On Thu, Feb 6, 2020 at 6:07 PM Jakub Jelinek wrote: > > On Thu, Feb 06, 2020 at 05:36:43PM +0100, Uros Bizjak wrote: > > 2020-02-06 Uroš Bizjak > > > > * config/i386/i386.md (*pushtf): Emit "#" instead of > > calling gcc_unreachable

[committed] testsuite: Unify gcc.target/i386/memcpy scan strings.

2020-02-06 Thread Uros Bizjak
After -fno-common became the default, we can unify various scan strings between 64bit and 32bit targets. Tested on x86_64-linux-gnu {,-m32}. 2020-02-06 Uroš Bizjak * gcc.target/i386/memcpy-strategy-1.c (dg-final): Unify scan-assembler strings for all targets. * gcc.target/i386/mem

Re: [PATCH] i386: Fix splitters that call extract_insn_cached [PR93611]

2020-02-07 Thread Uros Bizjak
On Fri, Feb 7, 2020 at 8:58 AM Jakub Jelinek wrote: > > Hi! > > The following testcase ICEs. The generated split_insns starts > with recog_data.insn = NULL and then tries to put various operands into > recog_data.operand array and checks various splitter conditions. > The problem is that some ato

Re: [PATCH] i386: Better patch to improve avx* vector concatenation [PR93594]

2020-02-07 Thread Uros Bizjak
On Fri, Feb 7, 2020 at 9:05 AM Jakub Jelinek wrote: > > Hi! > > After thinking some more on this, we can do better; rather than having to > add a new prereload splitter pattern to catch all other cases where it might > be beneficial to fold first part of an UNSPEC_CAST back to the unspec > operand

Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread Uros Bizjak
On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek wrote: > > On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote: > > It does not, I just checked with the master branch of binutils. > ... > > I did a -c test build with an older toolchain, it fails to compile > > (invalid register for .seh_savexmm) wh

Re: [PATCH] Improve splitX passes management

2020-02-08 Thread Uros Bizjak
On Fri, Feb 7, 2020 at 5:41 PM Segher Boessenkool wrote: > > On Thu, Feb 06, 2020 at 12:13:35PM +0100, Uros Bizjak wrote: > > The names of split_before_sched2 ("split4") and split_before_regstack > > ("split3") do not reflect their insertion poin

Re: [PATCH] i386: Make xmm16-xmm31 call used even in ms ABI

2020-02-08 Thread Uros Bizjak
On Sat, Feb 8, 2020 at 11:52 AM Jakub Jelinek wrote: > > On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote: > > I think that the patch should also be backported to gcc-9 branch. The > > change is backward compatible, since the new code will save and > > restore

[committed] testsuite: Fix target selector for pr91333.c

2020-02-09 Thread Uros Bizjak
Fix target selector for pr91333.c * gcc.target/i386/pr91333.c (dg-do): Fix target selector. Tested on x86_64-linux-gnu {,-m32}. Uros. diff --git a/gcc/testsuite/gcc.target/i386/pr91333.c b/gcc/testsuite/gcc.target/i386/pr91333.c index 269491202ae..2bdff871024 100644 --- a/gcc/testsuite/gcc.

Re: [PATCH] i386: Properly pop restore token in signal frame

2020-02-09 Thread Uros Bizjak
On Sat, Feb 8, 2020 at 2:43 PM H.J. Lu wrote: > > Linux CET kernel places a restore token on shadow stack for signal > handler to enhance security. The restore token is 8 byte and aligned > to 8 bytes. It is usually transparent to user programs since kernel > will pop the restore token when sign

Re: [PATCH] i386: Skip ENDBR32 at nested function entry

2020-02-10 Thread Uros Bizjak
On Mon, Feb 10, 2020 at 8:22 PM H.J. Lu wrote: > > Since nested function isn't only called directly, there is ENDBR32 at > function entry and we need to skip it for direct jump in trampoline. Hm, I'm afraid I don't understand this comment. Can you perhaps rephrase it? Uros. > Tested on Linux/x8

Re: [PATCH] i386: Skip ENDBR32 at nested function entry

2020-02-10 Thread Uros Bizjak
On Mon, Feb 10, 2020 at 8:53 PM H.J. Lu wrote: > > On Mon, Feb 10, 2020 at 11:40 AM Uros Bizjak wrote: > > > > On Mon, Feb 10, 2020 at 8:22 PM H.J. Lu wrote: > > > > > > Since nested function isn't only called directly, there is ENDBR32 at > > >

Re: [PATCH] i386: Fix -mavx -mno-mavx2 ICE with VEC_COND_EXPR [PR93637]

2020-02-10 Thread Uros Bizjak
On Mon, Feb 10, 2020 at 3:33 PM Jakub Jelinek wrote: > > Hi! > > As mentioned in the PR, for -mavx -mno-avx2 the backend does support > vcondv4div4df and vcondv8siv8sf optabs (while generally 32-byte vectors > aren't much supported in that case, it is performed using > vandps/vandnps/vorps). The
