On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle wrote:
>
>
> This patch avoids long lea instructions for performing x<<2 and x<<3
> by splitting them into shorter sal and move (or xchg) instructions.
> Because this increases the number of instructions, but reduces the
> total size, it's suitable for -O
On Thu, Oct 5, 2023 at 1:45 PM Roger Sayle wrote:
>
> Doh! ENOPATCH.
>
> > -Original Message-
> > From: Roger Sayle
> > Sent: 05 October 2023 12:44
> > To: 'gcc-patches@gcc.gnu.org'
> > Cc: 'Uros Bizjak'
> > Subject: [X8
The stringop strategy selection algorithm falls back to a libcall strategy
when it exhausts its pool of available strategies. The memory area copy
function (memcpy) is not available from the system library for non-default
address spaces, so the compiler emits the most trivial byte-at-a-time
copy l
On Fri, Oct 6, 2023 at 3:59 PM Roger Sayle wrote:
>
>
> Grr! I've done it again. ENOPATCH.
>
> > -Original Message-
> > From: Roger Sayle
> > Sent: 06 October 2023 14:58
> > To: 'gcc-patches@gcc.gnu.org'
> > Cc: 'Uros Bizja
On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote:
>
> When using -mcmodel=medium, large data objects larger than the
> -mlarge-data-threshold threshold are placed into large data sections
> (.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place
> .l* sections into separate outpu
On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote:
>
> On 2023-10-16, Uros Bizjak wrote:
> >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote:
> >>
> >> When using -mcmodel=medium, large data objects larger than the
> >> -mlarge-data-threshold thresh
On Mon, Oct 16, 2023 at 9:58 PM Fangrui Song wrote:
>
> On Mon, Oct 16, 2023 at 12:10 PM Uros Bizjak wrote:
> >
> > On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote:
> > >
> > > On 2023-10-16, Uros Bizjak wrote:
> > > >On Tue, Aug 1, 2023 at 9:5
On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle wrote:
>
>
> This patch is the backend piece of a solution to PRs 101955 and 106245,
> that adds a define_insn_and_split to the i386 backend, to perform sign
> extension of a single (least significant) bit using AND $1 then NEG.
>
> Previously, (x<<31)>>
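For readers of this snippet, the equivalence behind the "AND $1 then NEG" idiom can be sketched in C. This is only an illustration, not code from the patch: the function names are made up here, and the shift form relies on GCC's behaviour for converting out-of-range values to int32_t and for right-shifting negative values (both implementation-defined in ISO C).

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: move bit 0 into the sign position, then shift it back
   arithmetically so that it fills the whole word.  The left shift is
   done on an unsigned value to avoid signed overflow. */
int32_t sext_bit0_shift(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* AND-then-NEG form: isolate bit 0 (giving 0 or 1), then negate
   (giving 0 or -1). */
int32_t sext_bit0_and_neg(int32_t x)
{
    return -(x & 1);
}
```

Both functions return -1 when the least significant bit of x is set and 0 otherwise, which is why the backend is free to pick whichever sequence is cheaper.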
On Tue, Oct 17, 2023 at 7:54 PM Roger Sayle wrote:
>
>
> Hi Uros,
> Thanks for the speedy review.
>
> > From: Uros Bizjak
> > Sent: 17 October 2023 17:38
> >
> > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle
> > wrote:
> > >
> > >
>
On Mon, Jul 2, 2018 at 10:14 AM, Eric Botcazou wrote:
> Ping for https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01228.html
>
> Thanks in advance.
LGTM, but please note that the patch was already approved by Jeff on
22nd of June [1].
[1] https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01466.html
Ur
On Tue, Jul 3, 2018 at 5:32 PM, H.J. Lu wrote:
> On Fri, Jun 8, 2018 at 3:27 AM, H.J. Lu wrote:
>> On x86, swapcontext may return via indirect branch when shadow stack
>> is enabled. To support code instrumentation of control-flow transfers
>> with -fcf-protection, add indirect_return function a
Hello!
Attached patch implements unsigned HImode and QImode vector average
instructions. This is all x86 has to offer...
2018-07-03 Uros Bizjak
PR target/85694
* config/i386/sse.md (uavg3_ceil): New expander.
(_uavg3): Simplify expander.
testsuite/ChangeLog:
2018-07-03 Uros
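As a reading aid, the rounding-up unsigned average that instructions such as pavgb and pavgw compute per element can be modelled in scalar C. This is a sketch with hypothetical helper names, not the code from the patch; it only shows the arithmetic, which a vectorizer could then map to the hardware instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounding-up average of two unsigned bytes: (a + b + 1) >> 1,
   computed in a wider type so the intermediate sum cannot overflow. */
uint8_t uavg_ceil_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Element-wise form over arrays, the shape of loop that a QImode
   vector average pattern is meant to cover. */
void uavg_ceil_u8_array(uint8_t *dst, const uint8_t *a,
                        const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = uavg_ceil_u8(a[i], b[i]);
}
```

The widening to unsigned matters: adding two bytes directly in an 8-bit type would wrap, whereas the hardware instruction behaves as if the sum were computed with one extra bit.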
On Tue, Jul 3, 2018 at 7:38 PM, Uros Bizjak wrote:
> Hello!
>
> Attached patch implements unsigned HImode and QImode vector average
> instructions. This is all x86 has to offer...
FYI, I have tested the effectiveness of the patched gcc with SPEC CPU2006
464.h264 (actually, jm19.0.zip so
On Thu, Jul 12, 2018 at 9:57 PM, H.J. Lu wrote:
> r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
> which are enabled by PROCESSOR_HASWELL. As a result, -mtune=skylake
> generates slower code on Skylake than before. The same also applies
> to Cannonlake and Icelak tun
On Fri, Jul 13, 2018 at 3:12 PM, H.J. Lu wrote:
> On Fri, Jul 13, 2018 at 08:53:02AM +0200, Uros Bizjak wrote:
> > On Thu, Jul 12, 2018 at 9:57 PM, H.J. Lu wrote:
> >
> > > r259399, which added PROCESSOR_SKYLAKE, disabled many x86 optimizations
> > > which are
le LT does.
The solution is to avoid the above expansion for compares that would
change their trappiness and emit jumps around
2018-07-13 Uros Bizjak
PR target/86511
* expmed.c (emit_store_flag): Do not emit setcc followed by a
conditional move when trapping comparison was split
On Thu, Jul 19, 2018 at 7:00 AM, Koval, Julia wrote:
> Hi,
> This patch improves memset/memcpy strategy for Skylake. Ok for trunk?
Is this patch based on some benchmark data?
Uros.
> * gcc/config/i386/x86-tune-costs.h (skylake_memcpy,
> skylake_memcpy): Replace rep_prefix with u
On Thu, Jul 19, 2018 at 8:20 AM, Koval, Julia wrote:
> Yes, it gives small improvements (~2%) on 557.xz on O2 and on
> 548.exchange (~2.5%) and 500.perlbench (~1%) on Ofast in rate mode.
>
>> -Original Message-
>> From: Uros Bizjak [mailto:ubiz...@gmail.com]
>> Se
On Fri, Jul 27, 2018 at 11:37 AM, Richard Earnshaw
wrote:
>
> This patch adds a speculation barrier for x86, based on my
> understanding of the required mitigation for that CPU, which is to use
> an lfence instruction.
>
> This patch needs some review by an x86 expert and if adjustments are
> need
ingop-overflow=]
In function 'void test_if(std::vector<T>&, int) [with T = long int]':
cc1plus: warning: 'void* __builtin_memset(void*, int, long unsigned
int)' specified size 18446744073709551600 exceeds maximum object size
9223372036854775807 [-Wstringop-overflow=]
2018-0
On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle wrote:
>
>
> This patch contains clean-ups of the widening multiplication patterns in
> i386.md, and provides variants of the existing highpart multiplication
> peephole2 transformations (that tidy up register allocation after
> reload), and thereby fixe
On Fri, Oct 20, 2023 at 8:54 AM liuhongt wrote:
>
> While working on enabling more 32/64-bit vectorization for _Float16,
> I noticed there is one redundant define_expand; the patch removes the expander.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
On Tue, Oct 24, 2023 at 12:08 PM Richard Sandiford
wrote:
>
> For the V2HI -> V2SI zero extension in:
>
> typedef unsigned short v2hi __attribute__((vector_size(4)));
> typedef unsigned int v2si __attribute__((vector_size(8)));
> v2si f (v2hi x) { return (v2si) {x[0], x[1]}; }
>
> ix86_expan
On Mon, Oct 23, 2023 at 4:47 PM Roger Sayle wrote:
>
>
> The eagle-eyed may have spotted that my recent testcases for DImode shifts
> on x86_64 included -mno-stv in the dg-options. This is because the
> Scalar-To-Vector (STV) pass currently transforms these shifts to use
> SSE vector operations,
i386: Narrow test instructions with immediate operands [PR111698]
Narrow test instructions with immediate operand that test memory location
for zero. E.g. testl $0x00aa0000, mem can be converted to testb $0xaa, mem+2.
Reject targets where reading (possibly unaligned) part of memory location
after
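The narrowing can be checked with a small C model. This is a sketch under stated assumptions, not the backend code: the helper names are invented, and the 32-bit load is assembled byte by byte in little-endian order to mimic an x86 memory operand. It uses a mask of 0x00aa0000, whose set bits all fall in the byte at offset 2.

```c
#include <assert.h>
#include <stdint.h>

/* Wide form: load 32 bits (little-endian, as on x86) and test them
   against a mask whose set bits all lie in byte 2. */
int wide_test_is_zero(const unsigned char *mem)
{
    uint32_t v = (uint32_t)mem[0]
               | ((uint32_t)mem[1] << 8)
               | ((uint32_t)mem[2] << 16)
               | ((uint32_t)mem[3] << 24);
    return (v & UINT32_C(0x00aa0000)) == 0;
}

/* Narrow form: test only the byte at offset 2 against the mask
   shifted down by 16 bits. */
int narrow_test_is_zero(const unsigned char *mem)
{
    return (mem[2] & 0xaau) == 0;
}
```

Both forms agree for every memory content, because the mask never looks at bytes 0, 1 or 3. The narrow form reads one byte instead of four, which is the property the "reject targets" clause above is concerned with.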
sses before reload check both predicates and
> constraints.
>
> My original patch fixes PR 110511, using the same peephole2 idiom as already
> used elsewhere in i386.md. Ok for mainline?
Thanks for the explanation. The patch is OK.
> > -Original Message-
> > From: U
On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote:
>
> Hi all:
> This patch enables -march/-mtune=yongfeng, costs and tunings are set
> according to the characteristics of the processor. We add a new md file to
> describe yongfeng processor.
>
> Bootstrapped/regtested on x86_64.
>
> Ok for
On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote:
>
> On 2023/10/26 17:34, Uros Bizjak wrote:
> > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote:
> >>
> >> Hi all:
> >> This patch enables -march/-mtune=yongfeng, costs and tunings are set
> >> a
On Mon, Oct 30, 2023 at 12:53 PM FX Coudert wrote:
>
> Hi,
>
> The newly introduced test gcc.target/i386/pr111698.c currently fails on
> Darwin, where the default arch is core2.
> Andrew suggested in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112287 to
> pass a recent value to -march, and I ca
On Mon, Oct 30, 2023 at 10:08 AM Mayshao-oc wrote:
>
> >On Fri, Oct 27, 2023 at 12:20 PM mayshao wrote:
> >>
> >> On 2023/10/26 17:34, Uros Bizjak wrote:
> >> > On Wed, Oct 25, 2023 at 8:43 AM mayshao wrote:
> >> >>
> >> >>
Improve stack protector patterns and peephole2s to substitute stack
protector scratch register clear with unrelated subsequent register
initialization in several ways:
a. Explicitly generate scratch register as named pseudo. This allows
optimizers to eventually reuse the zero value in the registe
On Mon, Oct 30, 2023 at 6:27 PM Roger Sayle wrote:
>
>
> This patch is a follow-up to my previous PR target/110551 patch, this
> time to address the additional move after mulx, seen on TARGET_BMI2
> architectures (such as -march=haswell). The complication here is
> that the flexible multiple-set
On Wed, Nov 1, 2023 at 1:58 PM Roger Sayle wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak
> > Sent: 01 November 2023 10:05
> > Subject: Re: [x86_64 PATCH] PR target/110551: Tweak mulx register allocation
> > using peephole2.
> >
> > On Mon
Move stack protector patterns above mov $0,%reg -> xor %reg,%reg
so the latter won't interfere with stack protector peephole2s.
gcc/ChangeLog:
* config/i386/i386.md: Move stack protector patterns
above mov $0,%reg -> xor %reg,%reg peephole2 pattern.
Bootstrapped and regression tested on
The patch generalizes address register class handling to allow multiple
address register classes. For APX EGPR targets, some instructions can't be
encoded with REX2 prefix, so it is necessary to limit address register
class to avoid REX2 registers. The same situation happens for instructions
with
023-11-03 (Fri) 20:50 wrote:
> >
> > On Fri, Nov 3, 2023 at 6:34 PM Uros Bizjak wrote:
> > >
> > > The patch generalizes address register class handling to allow multiple
> > > address register classes. For APX EGPR targets, some instructions can't
> >
The patch generalizes address register class handling to allow multiple
register classes. For APX EGPR targets, some instructions do not support
GPR32 registers, so it is necessary to limit address register set to
avoid them. The same situation happens for instructions with high registers,
where
Also rename LEGACY_REGS to LEGACY_GENERAL_REGS.
gcc/ChangeLog:
* config/i386/i386.h (enum reg_class): Add LEGACY_INDEX_REGS.
Rename LEGACY_REGS to LEGACY_GENERAL_REGS.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.
* config/i386/constraints.md ("R"): Update for rename.
Use "addr" attribute with "gpr8" value to limit address register class
to non-REX registers in instructions with high registers, where REX
registers cannot be used in the address.
gcc/ChangeLog:
* config/i386/constraints.md (Bc): Remove constraint.
(Bn): Rewrite to use x86_extended_reg_m
On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu wrote:
>
> Update patch:
> Add m constraint to define_insn (sse_1_round *sse_1_round when under sse4 but not avx512f.
It looks to me that the original insn is incompletely defined. It
should use nonimmediate_operand, "m" constraint and pointer
size mod
On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu wrote:
>
> On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu wrote:
> >
> > On Fri, Oct 25, 2019 at 2:39 AM Uros Bizjak wrote:
> > >
> > > On Wed, Oct 23, 2019 at 7:48 AM Hongtao Liu wrote:
> > > >
>
On Fri, Oct 25, 2019 at 9:13 PM Hongtao Liu wrote:
>
> Update patch.
>
> On Fri, Oct 25, 2019 at 4:01 PM Uros Bizjak wrote:
> >
> > On Fri, Oct 25, 2019 at 7:55 AM Hongtao Liu wrote:
> > >
> > > On Fri, Oct 25, 2019 at 1:23 PM Hongtao Liu wrote:
>
On Fri, Oct 25, 2019 at 9:20 PM Hongtao Liu wrote:
>
> > Looking into sse.md, there are a lot of inconsistencies in existing *vm
> > patterns w.r.t. operand constraints. Unfortunately, these were copied
> > into proposed patterns. One example is existing
> >
> > (define_insn "_vmsqrt2"
> > [(set
On Sat, Oct 26, 2019 at 3:27 PM Hongtao Liu wrote:
>
> > BTW: Please also note that there is no need to use or operand
> > mode override in scalar insn templates for intel asm dialect when
> > operand already has a scalar mode.
> https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01868.html
>
> This p
These are not needed for scalar operands.
2019-10-28 Uroš Bizjak
* config/i386/sse.md (sse_cvtss2si_2):
Remove %k operand modifier.
(*vec_extractv2df_1_sse): Remove %q operand modifier.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Committed to mainline.
Uros.
2019-10-28 Uroš Bizjak
PR target/92225
* config/i386/sse.md (REDUC_SSE_SMINMAX_MODE): Use TARGET_SSE4_2
condition for V2DImode.
testsuite/ChangeLog:
2019-10-28 Uroš Bizjak
PR target/92225
* gcc.target/i386/pr92225.c: New test.
Bootstrapped and regression tested on x86
On Mon, Oct 28, 2019 at 11:02 PM Jakub Jelinek wrote:
>
> Hi!
>
> On Sat, Oct 26, 2019 at 09:27:12PM +0800, Hongtao Liu wrote:
> > > BTW: Please also note that there is no need to use or operand
> > > mode override in scalar insn templates for intel asm dialect when
> > > operand already has a sc
On Tue, Oct 29, 2019 at 8:57 AM Jakub Jelinek wrote:
>
> Hi!
>
> While working on the OpenMP isa patch, I've noticed most of the x86 ISA
> options imply in the help text that they enable MMX, when they do not.
> We now have the mmx in sse2, so to some extent most of the built-in functions
> are ena
On Wed, Nov 13, 2019 at 4:25 PM Martin Liška wrote:
>
> Hi.
>
> The patch adds a missing feature for PTA_ICELAKE_CLIENT and
> inherited CPUs. One can see that:
> https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> R
2020-01-14 Uroš Bizjak
PR target/93254
* config/i386/i386.md (*movsf_internal): Require SSE2 ISA for
alternatives 9 and 10.
Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
Committed to mainline, will be backported to gcc-9 branch.
Uros.
diff --git a/gcc/
Also, we can remove existing SSE2 ISA requirements for alternatives 14 and 15.
Remove invalid SSE2 ISA requirements in *movsf_internal.
2020-01-15 Uroš Bizjak
* config/i386/i386.md (*movsf_internal): Do not require
SSE2 ISA for alternatives 14 and 15.
Patch was bootstrapped and regre
On Sat, Jan 18, 2020 at 1:30 PM Jakub Jelinek wrote:
>
> Hi!
>
> In AT&T syntax leading $ is special, so if we have identifiers that start
> with dollar, we usually fail to assemble it (or assemble incorrectly).
> As mentioned in the PR, what works is wrapping the identifiers inside of
> parens, l
On Sun, Jan 19, 2020 at 2:58 PM H.J. Lu wrote:
>
> To add x32 support to -mtls-dialect=gnu2, we need to replace DI with
> P in GNU2 TLS patterns. Since thread pointer is in ptr_mode, PLUS in
> GNU2 TLS address computation must be done in ptr_mode to support
> -maddress-mode=long. Also drop the "
On Sun, Jan 19, 2020 at 6:43 PM Uros Bizjak wrote:
>
> On Sun, Jan 19, 2020 at 2:58 PM H.J. Lu wrote:
> >
> > To add x32 support to -mtls-dialect=gnu2, we need to replace DI with
> > P in GNU2 TLS patterns. Since thread pointer is in ptr_mode, PLUS in
> > GNU2 T
On Sun, Jan 19, 2020 at 7:07 PM H.J. Lu wrote:
>
> On Sun, Jan 19, 2020 at 9:48 AM Uros Bizjak wrote:
> >
> > On Sun, Jan 19, 2020 at 6:43 PM Uros Bizjak wrote:
> > >
> > > On Sun, Jan 19, 2020 at 2:58 PM H.J. Lu wrote:
> > > >
> > >
On Sun, Jan 19, 2020 at 9:07 PM H.J. Lu wrote:
>
> On Sun, Jan 19, 2020 at 12:01 PM Uros Bizjak wrote:
> >
> > On Sun, Jan 19, 2020 at 7:07 PM H.J. Lu wrote:
> > >
> > > On Sun, Jan 19, 2020 at 9:48 AM Uros Bizjak wrote:
> > > >
> >
On Sun, Jan 19, 2020 at 10:00 PM H.J. Lu wrote:
>
> On Sun, Jan 19, 2020 at 12:16 PM Uros Bizjak wrote:
> >
> > On Sun, Jan 19, 2020 at 9:07 PM H.J. Lu wrote:
> > >
> > > On Sun, Jan 19, 2020 at 12:01 PM Uros Bizjak wrote:
> > > >
> >
On Mon, Jan 20, 2020 at 10:46 PM H.J. Lu wrote:
> > > OK. Let's go with this version, but please investigate if we need to
> > > calculate TLS address in ptr_mode instead of Pmode. Due to quite some
> > > zero-extension from ptr_mode to Pmode hacks in this area, it looks to
> > > me that the whol
On Tue, Jan 21, 2020 at 9:47 AM Uros Bizjak wrote:
>
> On Mon, Jan 20, 2020 at 10:46 PM H.J. Lu wrote:
>
> > > > OK. Let's go with this version, but please investigate if we need to
> > > > calculate TLS address in ptr_mode instead of Pmode. Due to quite som
On Tue, Jan 21, 2020 at 8:16 PM H.J. Lu wrote:
>
> On Tue, Jan 21, 2020 at 2:29 AM Uros Bizjak wrote:
> >
> > On Tue, Jan 21, 2020 at 9:47 AM Uros Bizjak wrote:
> > >
> > > On Mon, Jan 20, 2020 at 10:46 PM H.J. Lu wrote:
> > >
> > > > >
On Thu, Jan 23, 2020 at 8:56 AM Jakub Jelinek wrote:
>
> Hi!
>
> The bzhi patterns are quite complicated because they need to accurately
> describe the behavior of the instruction for all input values.
> The following patterns are simple and make bzhi recognizable even for
> cases where not all in
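To show why a full description of bzhi has to cover every input, here is a scalar C model of the 32-bit instruction as documented in the Intel SDM (only the low 8 bits of the index are used, and an index of 32 or more leaves the source unchanged). The function name is hypothetical and flag results are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 32-bit BZHI: clear all bits of src at positions >= n,
   where n is the low byte of index.  If n >= 32 nothing is cleared. */
uint32_t bzhi32_model(uint32_t src, uint32_t index)
{
    uint32_t n = index & 0xffu;      /* only index[7:0] is used      */
    if (n >= 32)
        return src;                  /* out-of-range: src unchanged  */
    return src & ((UINT32_C(1) << n) - 1);
}
```

The simple expression src & ((1 << n) - 1) matches this model only when n is known to be below 32, which is the kind of restricted case the simpler patterns mentioned above can rely on.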
On Thu, Jan 23, 2020 at 10:33 AM Jakub Jelinek wrote:
>
> On Thu, Jan 23, 2020 at 09:14:42AM +, Richard Sandiford wrote:
> > > The other patch is something suggested by Richard S., avoid using OImode
> > > for this and instead use a partial int mode that is smaller. This is
> > > still
> > >
On Thu, Jan 23, 2020 at 2:17 PM Jakub Jelinek wrote:
>
> On Thu, Jan 23, 2020 at 10:38:31AM +0100, Uros Bizjak wrote:
> > On Thu, Jan 23, 2020 at 10:33 AM Jakub Jelinek wrote:
> > >
> > > On Thu, Jan 23, 2020 at 09:14:42AM +, Richard Sandiford wrote:
> >
On Thu, Jan 23, 2020 at 10:48 PM Jakub Jelinek wrote:
>
> Hi!
>
> In Agner Fog's tables, vpermilp[sd] with immediates seem to be
> much faster than vpermpd with immediate, for a good reason,
> the former only permute something within the lanes and don't do anything
> cross-lane, while vpermpd can.
On Sun, Jan 26, 2020 at 12:55 AM Jakub Jelinek wrote:
>
> Hi!
>
> Apparently my recent patch which moved the *avx_vperm_broadcast* and
> *vpermil* patterns before vpermpd broke the following testcase, the
> define_insn_and_split matched always but the splitter condition only split
> it if not -mav
On Sun, Jan 26, 2020 at 12:59 AM Jakub Jelinek wrote:
>
> Hi!
>
> In the *{add,sub}v4_doubleword patterns, we don't really want to see a
> VOIDmode last operand, because it then means invalid RTL
> (sign_extend:{TI,POI} (const_int ...)) or so, and therefore something we
> don't really handle in th
On Mon, Jan 27, 2020 at 7:23 PM H.J. Lu wrote:
>
> movaps/movups is one byte shorter than movdqa/movdqu. But it isn't the
> case for AVX nor AVX512. We should disable TARGET_SSE_TYPELESS_STORES
> for TARGET_AVX.
>
> gcc/
>
> PR target/91461
> * config/i386/i386.h (TARGET_SSE_TYPE
On Mon, Jan 27, 2020 at 11:17 PM H.J. Lu wrote:
>
> On Mon, Jan 27, 2020 at 12:26 PM Uros Bizjak wrote:
> >
> > On Mon, Jan 27, 2020 at 7:23 PM H.J. Lu wrote:
> > >
> > > movaps/movups is one byte shorter than movdqa/movdqu. But it isn't the
> >
On Mon, Jan 27, 2020 at 3:13 PM H.J. Lu wrote:
>
> There are
>
> static void
> parse_mtune_ctrl_str (bool dump)
> {
> if (!ix86_tune_ctrl_string)
> return;
>
> parse_mtune_ctrl_str is only called from set_ix86_tune_features, which
> is only called from ix86_function_specific_restore and
> ix
On Tue, Jan 28, 2020 at 3:32 PM H.J. Lu wrote:
>
> On Mon, Jan 27, 2020 at 11:04 PM Uros Bizjak wrote:
> >
> > On Mon, Jan 27, 2020 at 11:17 PM H.J. Lu wrote:
> > >
> > > On Mon, Jan 27, 2020 at 12:26 PM Uros Bizjak wrote:
> > > >
> >
On Tue, Jan 28, 2020 at 4:34 PM H.J. Lu wrote:
> > You could move
> >
> > (match_test "TARGET_AVX")
> > (const_string "TI")
> >
> > up to bypass the cases below.
> >
>
> I don't think we can do that. There are 2 cases where we prefer
> movaps/movups:
>
> /* Use packed single precision instru
On Tue, Jan 28, 2020 at 6:51 PM H.J. Lu wrote:
>
> On Tue, Jan 28, 2020 at 9:12 AM Uros Bizjak wrote:
> >
> > On Tue, Jan 28, 2020 at 4:34 PM H.J. Lu wrote:
> >
> > > > You could move
> > > >
> > > > (match_test "TARGET_AVX"
On Thu, Jan 30, 2020 at 1:18 AM Jakub Jelinek wrote:
>
> Hi!
>
> Like any other instruction with 32-bit GPR destination operand in 64-bit
> mode, popcntl also clears the upper 32 bits of the register (and other bits
> too, it can return only 0 to 32 inclusive).
>
> During combine, the zero or sign
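The point about extensions can be illustrated in C. This sketch assumes GCC or Clang (it uses __builtin_popcount) and the function names are invented for the example. Because a 32-bit population count is always in the range 0 to 32, zero extension and sign extension of the result to 64 bits give the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Population count of a 32-bit value, zero-extended to 64 bits. */
uint64_t popcount32_zero_extended(uint32_t x)
{
    unsigned count = (unsigned)__builtin_popcount(x);
    return (uint64_t)count;
}

/* The same count, sign-extended to 64 bits.  The count is never
   negative, so the sign bit being copied is always zero. */
int64_t popcount32_sign_extended(uint32_t x)
{
    int count = __builtin_popcount(x);
    return (int64_t)count;
}
```

Since the two functions always agree, a compiler that knows the instruction already clears the upper half of the destination register needs no separate extension instruction for either form.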
On Thu, Jan 30, 2020 at 1:23 AM Jakub Jelinek wrote:
>
> Hi!
>
> Some time ago, patterns were added to optimize move mask followed by zero
> extension from 32 bits to 64 bits. As the testcase shows, the intrinsics
> actually return int, not unsigned int, so it will happen quite often that
> one ac
The reason for TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL on AMD target is
only insn size, as advised in e.g. Software Optimization Guide for the
AMD Family 15h Processors [1], section 7.1.2, where it is said:
--quote--
7.1.2 Reduce Instruction Size
Optimization
Reduce the size of instructions when pos
On Tue, Feb 4, 2020 at 10:39 AM Jakub Jelinek wrote:
>
> Hi!
>
> As mentioned in the PR, the CLOBBERs in vzeroupper are added there even for
> registers that aren't ever live in the function before and break the
> prologue/epilogue expansion with ms ABI (normal ABIs are fine, as they
> consider al
On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek wrote:
>
> On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> > If it works OK, I'd rather see this functionality implemented as an
> > epilogue_completed guarded splitter. In the .md files, there are
> > al
On Tue, Feb 4, 2020 at 12:13 PM Uros Bizjak wrote:
>
> On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek wrote:
> >
> > On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote:
> > > If it works OK, I'd rather see this functionality implemented as an
> > >
On Tue, Feb 4, 2020 at 1:06 PM Richard Sandiford
wrote:
>
> Uros Bizjak writes:
> > On Tue, Feb 4, 2020 at 12:13 PM Uros Bizjak wrote:
> >>
> >> On Tue, Feb 4, 2020 at 12:05 PM Jakub Jelinek wrote:
> >> >
> >> > On Tue, Feb 04, 2020 at 11:
On Tue, Feb 4, 2020 at 1:30 PM Jakub Jelinek wrote:
>
> On Tue, Feb 04, 2020 at 12:24:10PM +0100, Uros Bizjak wrote:
> > > A && is missing in the split condition to inherit TARGET_AVX.
> >
> > Also, you don't need to emit "#" in output templa
On Tue, Feb 4, 2020 at 2:13 PM Jakub Jelinek wrote:
>
> On Tue, Feb 04, 2020 at 01:38:51PM +0100, Uros Bizjak wrote:
> > As Richard advised, let's put this safety stuff back. Usually, in
> > i386.md, these kind of splitters are implemented as two patterns, one
> > (
On Wed, Feb 5, 2020 at 11:05 AM Jakub Jelinek wrote:
>
> On Tue, Feb 04, 2020 at 02:15:04PM +0100, Uros Bizjak wrote:
> > On Tue, Feb 4, 2020 at 2:13 PM Jakub Jelinek wrote:
> > >
> > > On Tue, Feb 04, 2020 at 01:38:51PM +0100, Uros Bizjak wrote:
> > > >
On Wed, Feb 5, 2020 at 12:03 PM Jakub Jelinek wrote:
>
> On Wed, Feb 05, 2020 at 11:46:51AM +0100, Uros Bizjak wrote:
> > I think we should just enable split4 also for -O0. This would also
> > allow us to remove the "optimize > 0" check above and allow us to
>
On Wed, Feb 5, 2020 at 6:59 PM H.J. Lu wrote:
>
> MS_ABI requires passing aggregates with only float/double in integer
> registers. Checked gcc outputs against Clang and fixed:
>
> FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=54
> -Wno-unused-variable -Wno-unused-parameter
>
Simplify post epilogue_completed splitters.
Now that we have post epilogue_completed split point for all
optimization levels, we can simplify post epilogue_completed splitters
considerably. If corresponding define_peephole2 pattern fails to
allocate a temporary register (or if peephole2 pass isn't
On Thu, Feb 6, 2020 at 9:34 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase shows that for _mm256_set*_m128i and similar
> intrinsics, we sometimes generate bad code. All 4 routines are expressing
> the same thing, a 128-bit vector zero padded to 256-bit vector, but only the
> 3rd one
The names of split_before_sched2 ("split4") and split_before_regstack
("split3") do not reflect their insertion point in the sequence of passes,
where split_before_regstack follows split_before_sched2. Reorder the code
and rename the passes to reflect the reality.
split_before_regstack pass does n
Implement standard approach by emitting "#" for insns that have to be split.
Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
2020-02-06 Uroš Bizjak
* config/i386/i386.md (*pushtf): Emit "#" instead of
calling gcc_unreachable in insn output.
(*pushxf): Ditto.
(*
On Thu, Feb 6, 2020 at 6:07 PM Jakub Jelinek wrote:
>
> On Thu, Feb 06, 2020 at 05:36:43PM +0100, Uros Bizjak wrote:
> > 2020-02-06 Uroš Bizjak
> >
> > * config/i386/i386.md (*pushtf): Emit "#" instead of
> > calling gcc_unreachable
After -fno-common became the default, we can unify various
scan strings between 64bit and 32bit targets.
Tested on x86_64-linux-gnu {,-m32}.
2020-02-06 Uroš Bizjak
* gcc.target/i386/memcpy-strategy-1.c (dg-final):
Unify scan-assembler strings for all targets.
* gcc.target/i386/mem
On Fri, Feb 7, 2020 at 8:58 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following testcase ICEs. The generated split_insns starts
> with recog_data.insn = NULL and then tries to put various operands into
> recog_data.operand array and checks various splitter conditions.
> The problem is that some ato
On Fri, Feb 7, 2020 at 9:05 AM Jakub Jelinek wrote:
>
> Hi!
>
> After thinking some more on this, we can do better; rather than having to
> add a new prereload splitter pattern to catch all other cases where it might
> be beneficial to fold first part of an UNSPEC_CAST back to the unspec
> operand
On Sat, Feb 8, 2020 at 11:05 AM Jakub Jelinek wrote:
>
> On Sat, Feb 08, 2020 at 08:24:38AM +, JonY wrote:
> > It does not, I just checked with the master branch of binutils.
> ...
> > I did a -c test build with an older toolchain, it fails to compile
> > (invalid register for .seh_savexmm) wh
On Fri, Feb 7, 2020 at 5:41 PM Segher Boessenkool
wrote:
>
> On Thu, Feb 06, 2020 at 12:13:35PM +0100, Uros Bizjak wrote:
> > The names of split_before_sched2 ("split4") and split_before_regstack
> > ("split3") do not reflect their insertion poin
On Sat, Feb 8, 2020 at 11:52 AM Jakub Jelinek wrote:
>
> On Sat, Feb 08, 2020 at 11:32:40AM +0100, Uros Bizjak wrote:
> > I think that the patch should also be backported to gcc-9 branch. The
> > change is backward compatible, since the new code will save and
> > restore
Fix target selector for pr91333.c
* gcc.target/i386/pr91333.c (dg-do): Fix target selector.
Tested on x86_64-linux-gnu {,-m32}.
Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr91333.c
b/gcc/testsuite/gcc.target/i386/pr91333.c
index 269491202ae..2bdff871024 100644
--- a/gcc/testsuite/gcc.
On Sat, Feb 8, 2020 at 2:43 PM H.J. Lu wrote:
>
> Linux CET kernel places a restore token on shadow stack for signal
> handler to enhance security. The restore token is 8 bytes and aligned
> to 8 bytes. It is usually transparent to user programs since kernel
> will pop the restore token when sign
On Mon, Feb 10, 2020 at 8:22 PM H.J. Lu wrote:
>
> Since nested function isn't only called directly, there is ENDBR32 at
> function entry and we need to skip it for direct jump in trampoline.
Hm, I'm afraid I don't understand this comment. Can you perhaps rephrase it?
Uros.
> Tested on Linux/x8
On Mon, Feb 10, 2020 at 8:53 PM H.J. Lu wrote:
>
> On Mon, Feb 10, 2020 at 11:40 AM Uros Bizjak wrote:
> >
> > On Mon, Feb 10, 2020 at 8:22 PM H.J. Lu wrote:
> > >
> > > Since nested function isn't only called directly, there is ENDBR32 at
> > >
On Mon, Feb 10, 2020 at 3:33 PM Jakub Jelinek wrote:
>
> Hi!
>
> As mentioned in the PR, for -mavx -mno-avx2 the backend does support
> vcondv4div4df and vcondv8siv8sf optabs (while generally 32-byte vectors
> aren't much supported in that case, it is performed using
> vandps/vandnps/vorps). The