Update AVX512 tests to test the newly added FMSUB, FNMADD and FNMSUB
builtin functions.
PR target/72782
* gcc.target/i386/avx-1.c (__builtin_ia32_vfmsubpd512_mask): New.
(__builtin_ia32_vfmsubpd512_maskz): Likewise.
(__builtin_ia32_vfmsubps512_mask): Likewise.
Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FNMSUB operations. In order to
support AVX512 memory broadcast for FNMSUB, FNMSUB builtin functions are
also added, instead of passing the negated value to FMA builtin functions.
gcc/
Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FMSUB operations. In order to
support AVX512 memory broadcast for FMSUB, FMSUB builtin functions are
also added, instead of passing the negated value to FMA builtin functions.
gcc/
Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FNMADD operations. In order to
support AVX512 memory broadcast for FNMADD, FNMADD builtin functions are
also added, instead of passing the negated value to FMA builtin functions.
gcc/
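For reference, the arithmetic behind the three variants named in these patches can be sketched in scalar form. This is only an illustration of the "pass the negated value to FMA" formulation that the descriptions say the new builtins replace; the function names below are hypothetical helpers, not the GCC builtins, and std::fma stands in for the vector instruction.

```cpp
#include <cassert>
#include <cmath>

// Scalar reference semantics of the fused multiply-add variants.
// Each variant is written as a plain FMA with negated operands.
// All names here are illustrative and do not exist in GCC.
inline double fmadd(double a, double b, double c)  { return std::fma(a, b, c); }    //  (a*b) + c
inline double fmsub(double a, double b, double c)  { return std::fma(a, b, -c); }   //  (a*b) - c
inline double fnmadd(double a, double b, double c) { return std::fma(-a, b, c); }   // -(a*b) + c
inline double fnmsub(double a, double b, double c) { return std::fma(-a, b, -c); }  // -(a*b) - c

// Self-check using values that are exactly representable in binary floating point.
inline void check_fma_variants() {
    assert(fmadd(2.0, 3.0, 1.0) == 7.0);
    assert(fmsub(2.0, 3.0, 1.0) == 5.0);
    assert(fnmadd(2.0, 3.0, 1.0) == -5.0);
    assert(fnmsub(2.0, 3.0, 1.0) == -7.0);
}
```

Calling check_fma_variants() from a main function exercises all four identities.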
On Thu, Oct 18, 2018 at 05:55:35PM -0600, Jeff Law wrote:
> On 10/18/18 2:06 PM, Stafford Horne wrote:
> > On Thu, Oct 18, 2018 at 03:22:56PM +0200, Sebastian Huber wrote:
> >> Hello,
> >>
> >> is there a chance to get the or1k support integrated before the GCC 9 stage
> >> 3?
> >
> > Hello,
> >
> > Is it because you generate something manually and want to limit that
> > work,
>
> I think that this is one of the reasons.
> and as mentioned in my writeup, the targeted users of this new functionality
> are live-patching users who generate
> patches by hand.
Ok just means they need be
Hi!
COMPOUND_LITERAL_EXPRs are removed from static initializers in
record_references_in_initializer, unfortunately decode_addr_const can be
called from const_hash_1 from output_constant_def before that happens
and as record_references_in_initializer needs a varpool node, we can't call
it during th
Hi!
We ICE on the following invalid testcase, because we failed to diagnose if
ordered construct without depend clause binds to loop with ordered(n) clause
(i.e. doacross loop).
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk so far.
Not happy about the si
On 10/19/18 4:16 PM, Peter Bergner wrote:
> Thoughts? I'll note that this does not fix the S390 bugs, since those seem
> to be due to problems with early clobber operands and "matching" constraint
> operands. I'm still working on that and hope to have something soon.
[snip]
> * lra-constrai
While working on
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00228.html I've
accumulated a few easy patches.
The first one renames the functions in question to hopefully encourage
proper future usage. The other ones use the unnumbered version of the
clone name function where I've verified the nu
Reduced test
! { dg-do compile }
MODULE TN4
IMPLICIT NONE
PRIVATE
INTEGER,PARAMETER::SH4=KIND('a')
TYPE,PUBLIC::TOP
CHARACTER(:,KIND=SH4),ALLOCATABLE::ROR
CHARACTER(:,KIND=SH4),ALLOCATABLE::VI8
CONTAINS
PROCEDURE,NON_OVERRIDABLE::SB=>TPX
END TYPE TOP
CONTAINS
SUBROUTINE T
Tests for the implicit allocator rebinding extension will fail if the
extension is disabled, so skip them.
* testsuite/23_containers/array/requirements/explicit_instantiation/
3.cc: Skip test when compiled with a -std=c++NN strict mode.
* testsuite/23_containers/deque/requ
When __STRICT_ANSI__ is defined the incorrect allocators used in these
tests also trigger an additional static assertion. Prune those extra
errors so that the tests don't fail when built with strict dialects.
* testsuite/23_containers/deque/48101_neg.cc: Prune additional errors
p
These tests include uses of the extension to allow allocators with the
wrong value_type in containers. Skip those parts of the tests when
__STRICT_ANSI__ is defined.
* testsuite/23_containers/forward_list/requirements/
explicit_instantiation/5.cc [__STRICT_ANSI__]: Don't test non-
As a GNU extension we allow containers to be instantiated with
allocators that use a different value type from the container, and
automatically rebind the allocator to the correct type. This extension
is disabled in strict modes (when __STRICT_ANSI__ is defined, i.e.
-std=c++NN dialects). These te
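The rebinding described above can be sketched with the standard std::allocator_traits facility. This is a minimal illustration, not the libstdc++ implementation; rebound_alloc_t and allocator_matches are hypothetical names introduced only for this example.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical alias: the allocator type obtained by rebinding Alloc to
// element type T, using the standard allocator_traits mechanism.
template <typename T, typename Alloc>
using rebound_alloc_t =
    typename std::allocator_traits<Alloc>::template rebind_alloc<T>;

// Hypothetical predicate: does Alloc already allocate objects of type T?
// A strict dialect would require this to be true for a container of T.
template <typename T, typename Alloc>
constexpr bool allocator_matches() {
    return std::is_same<typename std::allocator_traits<Alloc>::value_type,
                        T>::value;
}

// Rebinding an allocator with the wrong value_type yields the right one.
static_assert(std::is_same<rebound_alloc_t<int, std::allocator<char>>,
                           std::allocator<int>>::value,
              "rebinding std::allocator<char> to int gives std::allocator<int>");
```

A mismatched allocator such as std::allocator<char> for a container of int is the case the extension handles by rebinding.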
The airy and hypergeometric functions are non-standard extensions and
are only defined for -std=gnu++NN dialects, not -std=c++NN ones.
* ext/special_functions/airy_ai/check_nan.cc: Skip test for
non-standard extension when a strict -std=c++NN dialect is used.
* ext/special
These tests originally existed to check the containers in C++11 mode,
when the default was C++98 mode. Now that the default is C++14 (and we
run most tests for all modes) it serves no purpose to have two copies of
the tests when neither is explicitly using -std=gnu++98 anyway.
* testsuite
Hi Paul,
I get a regression with your patch:
obfuscated_tn4.f90:300:0:
300 | TP6%ROR=TP6%ROR(:PP4-1)
|
internal compiler error: in gfc_trans_deferred_vars, at
fortran/trans-decl.c:4754
I’ll try to reduce the test.
Dominique
Hi,
The x86 intrinsic compatibility headers contain a couple of instances of
undefined behavior where a cast to an aligned type is used when that
alignment is not guaranteed by the expression to be cast from. This
patch fixes that problem by replacing the aligned types with unaligned
versions of
Hi,
For historical reasons, there are different interpretations of whether a
type "__vector " is allowed when is a typedef. For maximum
compatibility between compilers, this patch removes some such cases from
the x86 intrinsic compatibility headers.
Bootstrapped and tested on powerpc64le-linux-
Vlad, Jeff and Segher,
I think I have determined what is happening with the aarch64 test case that
is failing after my r264897 commit. It appears my patch is just exposing
an issue in lra-constraints.c:process_alt_operands() when processing an insn
with early clobber operands. Jeff & Segher, I h
Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FP mul operations.
gcc/
PR target/72782
* config/i386/sse.md (*mul3_bcst_1): New.
(*mul3_bcst_2): Likewise.
gcc/testsuite/
PR target/72782
* g
Many AVX512 vector operations can broadcast from a scalar memory source.
This patch enables memory broadcast for FP div operations.
gcc/
PR target/72782
* config/i386/sse.md (*_div3_bcst): New.
gcc/testsuite/
PR target/72782
* gcc.target/i386/avx512f-div-df-zmm-1
On 10/17/18, Martin Sebor wrote:
> On 10/16/2018 02:06 PM, David Malcolm wrote:
>> I've been extending -fopt-info to cover inlining, and I added a %S
>> format code to dump_printf which accepts a symtab_node *.
>>
>> Unfortunately, -Wformat doesn't like the fact that I'm passing in a
>> subclass p
On Thu, 2018-10-18 at 22:25 -0600, Sandra Loosemore wrote:
> On 10/18/2018 03:12 PM, David Malcolm wrote:
>
> > Here's an updated version of the patch, addressing your above
> > comments,
> > and those from Martin and Richard (I hope).
>
> Thanks, this one looks more readable. Some more specific
This patch changes the Go frontend to not export any functions with
special names. This keeps init functions from appearing in the export
data. Checking for special names in general means that we don't need
to check specifically for nested functions or thunks, which have
special names. Bootstrap
On Tue, 16 Oct 2018 at 17:28, Richard Sandiford
wrote:
>
> Iain Buclaw writes:
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 4b7cec82382..0b2daa320c3 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -2496,6 +2525,7 @@ s-tm-texi: build/genhooks$(build_exeext)
> > $
The Linux kernel requires and emulates LL and SC for the R5900 too. The
special --without-llsc default for the R5900 is therefore not applicable
in that case.
Reviewed-by: Maciej W. Rozycki
---
Changes in v2:
- Double spacing instead of single spacing in commit message
---
gcc/config.gcc | 8 +
Thank you for your review, Maciej,
> > The Linux kernel requires and emulates LL and SC for the R5900 too. The
>
> Two spaces after a full stop please (in commit descriptions too).
Fixed in v2, to be posted shortly.
> I hope this helps you get a general maintainer's approval.
Thanks! It looks
Ping.
On Fri, Oct 12, 2018 at 12:32:43PM -0400, Marek Polacek wrote:
> On Fri, Oct 12, 2018 at 02:26:45AM -0400, Jason Merrill wrote:
> > On Thu, Oct 11, 2018 at 8:25 PM Marek Polacek wrote:
> > >
> > > On Thu, Oct 11, 2018 at 11:35:23AM -0400, Jason Merrill wrote:
> > > > > + /* [dcl.fct.s
On Oct 17, 2018, at 2:19 PM, Jeff Law wrote:
>> 2018-10-17 Marek Polacek
>>
>> * g++.dg/*.C: Use target c++17 instead of explicit dg-options.
>> * lib/g++-dg.exp: Don't test C++11 by default. Add C++17 to
>> the list of default stds to test. Given this follows Jason's
>> recomm
IRA and LRA prefer to use CR7 (which is first in REG_ALLOC_ORDER) over
CR0, although the latter often is cheaper ("x" vs. "y" constraints).
We should figure out why this is and fix it; but until that is done,
this patch makes CR0 the first allocated register: it improves the
current code, and it is
Hi,
>> Maybe I am crazy, or the labels here are wrong, but that looks like the
>> error is three times as *big* after the patch. I.e. it worsened instead
>> of improving.
This error is actually 1ULP, so just a rounding error. Can't expect any better
than that!
> with input : = 9.98807907
The following backports limiting match.pd recursion together with a
new testcase, also applied to trunk.
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
2018-10-19 Richard Biener
PR middle-end/87645
Backport from mainline
2018-07-12 Richard Biener
Jakub Jelinek wrote:
> At this point this seems like something that shouldn't be done inline
> anymore, so either we don't do this optimization at all, because the errors
> are far bigger than what is acceptable even for -ffast-math, or we have a
> library function that does the sinh (tanh (x)) an
> Maybe I am crazy, or the labels here are wrong, but that looks like the
> error is three times as *big* after the patch. I.e. it worsened instead
> of improving.
Oh, sorry. I was not clear in my previous message.
The error did not improve with regard to the original formula. What I
meant is wi
On Fri, Oct 19, 2018 at 01:39:01PM +, Wilco Dijkstra wrote:
> >> Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the
> >> relative error
> >> should be much better. If there is no FMA, 2*(1-fabs(x)) - (1-fabs(x))^2
> >> should be
> >> more accurate when abs(x)>0.5 and still
On Fri, 19 Oct 2018, Richard Biener wrote:
> On Fri, 19 Oct 2018, Richard Sandiford wrote:
>
> > Richard Biener writes:
> > > On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> > > wrote:
> > >>Richard Biener writes:
> > >>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
> > Wh
* include/bits/regex_executor.tcc (_Backref_matcher::_M_apply): Use
_GLIBCXX_STD_A to refer to normal mode algorithms.
* testsuite/28_regex/headers/regex/parallel_mode.cc: New test.
* testsuite/28_regex/headers/regex/std_c++0x_neg.cc: Remove empty
whitespace
Hi all,
On Fri, Oct 19, 2018 at 09:21:07AM -0300, Giuliano Augusto Faulin Belinassi
wrote:
> > Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the
> > relative error
> > should be much better. If there is no FMA, 2*(1-fabs(x)) - (1-fabs(x))^2
> > should be
> > more accurate w
On 11/10/2018 15:34, Christophe Lyon wrote:
We call __aeabi_read_tp() to get the thread pointer. Since this is a
function call, we have to restore the FDPIC register afterwards.
2018-XX-XX Christophe Lyon
Mickaël Guêné
gcc/
* config/arm/arm.c (arm_load_tp): Add FDPIC
On 12/10/2018 12:45, Richard Earnshaw (lists) wrote:
On 11/10/18 14:34, Christophe Lyon wrote:
The FDPIC register is hard-coded to r9, as defined in the ABI.
We have to disable tailcall optimizations if we don't know if the
target function is in the same module. If not, we have to set r9 to
the
Hi,
>> Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the
>> relative error
>> should be much better. If there is no FMA, 2*(1-fabs(x)) - (1-fabs(x))^2
>> should be
>> more accurate when abs(x)>0.5 and still much faster.
>
>No, but I will check how to enable it if FMA is avai
This fixes the following testsuite failures on ia32 when compiled with
-D_GLIBCXX_DEBUG:
FAIL: 23_containers/map/modifiers/erase/dr130-linkage-check.cc
FAIL: 23_containers/multimap/modifiers/erase/dr130-linkage-check.cc
FAIL: 23_containers/multiset/modifiers/erase/dr130-linkage-check.cc
FAIL: 23_
On Tue, 16 Oct 2018, Rainer Orth wrote:
> The following patch documents the Solaris 10 obsoletion in the GCC 9
> changes.html. I've based this on the GCC 4.9 text which allowed for
> obsoletion of several targets. Tested by inspection in Firefox.
>
> Ok to install?
Yes. And technically as main
On Fri, 19 Oct 2018, Richard Sandiford wrote:
> Joseph Myers writes:
> > On Thu, 18 Oct 2018, Richard Sandiford wrote:
> >> - Type introspection for things like parsing format strings
> >>
> >> It sounded like the type descriptors would be fixed-sized types,
> >> a bit like a C version of st
The following fixes an ICE I introduced in the x86 backend by not
considering word_mode vectorization.
Bootstrap & regtest running on x86_64-unknown-linux-gnu, will apply
after that succeeds.
Richard.
2018-10-19 Richard Biener
PR target/87657
* config/i386/i386.c (ix86_bu
Hello,
> Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the
> relative error
> should be much better. If there is no FMA, 2*(1-fabs(x)) - (1-fabs(x))^2
> should be
> more accurate when abs(x)>0.5 and still much faster.
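The two formulations mentioned in the quoted text are algebraically the same: with t = 1 - fabs(x), 2*t - t^2 = t*(2 - t) = (1 - fabs(x))*(1 + fabs(x)) = 1 - x*x. A minimal sketch of both follows, assuming std::fma as the FMA primitive; the function names are hypothetical, and no accuracy claim is made here beyond what the thread states.

```cpp
#include <cassert>
#include <cmath>

// 1 - x*x computed as a single fused operation: (-x)*x + 1.
inline double one_minus_sq_fma(double x) {
    return std::fma(-x, x, 1.0);
}

// The non-FMA alternative from the thread: 2*(1-|x|) - (1-|x|)^2.
inline double one_minus_sq_alt(double x) {
    const double t = 1.0 - std::fabs(x);
    return 2.0 * t - t * t;
}

// Self-check on inputs where every intermediate value is exactly representable.
inline void check_one_minus_sq() {
    assert(one_minus_sq_fma(0.5) == 0.75);
    assert(one_minus_sq_alt(0.5) == 0.75);
    assert(one_minus_sq_fma(-0.75) == 0.4375);
    assert(one_minus_sq_alt(-0.75) == 0.4375);
}
```

Comparing the two functions on inputs close to 1 against a higher-precision reference would be the way to test the accuracy claim in the thread.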
No, but I will check how to enable it if FMA is availabl
On Thu, 18 Oct 2018, David Malcolm wrote:
> On Thu, 2018-10-18 at 15:09 +0200, Richard Biener wrote:
> > PR63155 made me pick up this old work from Steven, it turns our
> > linked-list implementation to a two-mode one with one being a
> > splay tree featuring O(log N) complexity for find/remove.
>
On Fri, 19 Oct 2018, Richard Sandiford wrote:
> Richard Biener writes:
> > On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> > wrote:
> >>Richard Biener writes:
> >>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
> >>>
> Richard Biener writes:
> > PR63155 made me pick up
On Fri, 19 Oct 2018, Steven Bosscher wrote:
> On Fri, Oct 19, 2018 at 8:46 AM Richard Biener <> wrote:
> > Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> > I guess in the end well predicted branches in the out of line code are
> > important...
I specifically meant the fa
Joseph Myers writes:
> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>> - Type introspection for things like parsing format strings
>>
>> It sounded like the type descriptors would be fixed-sized types,
>> a bit like a C version of std::type_info.
>
> It wasn't clear if people might also want
On 10/19/18, Uros Bizjak wrote:
> On Thu, Oct 18, 2018 at 11:44 PM H.J. Lu wrote:
>>
>> Many AVX512 vector operations can broadcast from a scalar memory source.
>> This patch enables memory broadcast for FP add operations.
>>
>> gcc/
>>
>> PR target/72782
>> * config/i386/sse.md
>
On 10/18/18, Jan Hubicka wrote:
>> we need to generate
>>
>> vxorp[ds] %xmmN, %xmmN, %xmmN
>> ...
>> vcvtss2sd f(%rip), %xmmN, %xmmX
>> ...
>> vcvtsi2ss i(%rip), %xmmN, %xmmY
>>
>> to avoid partial XMM register stall. This patch adds a pass to generate
>
Hi!
The spec says:
"The loops associated with an ordered clause with a parameter may not include
range-for loops."
This patch implements this restriction. Committed to gomp-5_0-branch.
2018-10-19 Jakub Jelinek
* parser.c (cp_parser_omp_for_loop): Disallow ordered clause with
Improves the code generation by getting rid of redundant LAs, as seen
in the following example:
- la %r1,0(%r13)
- lg %r4,0(%r1)
+ lg %r4,0(%r13)
Also allows us to proceed with the merge of movdi_64 and movdi_larl.
Currently LRA decides to spi
On Fri, Oct 19, 2018 at 8:46 AM Richard Biener <> wrote:
> Yeah. I also noticed some 'obvious' shortcomings in the heuristics...
> I guess in the end well predicted branches in the out of line code are
> important...
What also would help is to put bitmaps on their own obstack to improve
cache loc
On 18 October 2018 19:34:52 CEST, Qing Zhao wrote:
>A. an option to control GCC's IPA optimizations to provide a safe
>compilation for live-patching purpose. At the same time, provides
>multiple-level control of patch code-size and run time performance
>tradeoff.
>
>-fease-live-patching={none|
Richard Biener writes:
> On October 18, 2018 11:05:32 PM GMT+02:00, Richard Sandiford
> wrote:
>>Richard Biener writes:
>>> On Thu, 18 Oct 2018, Richard Sandiford wrote:
>>>
Richard Biener writes:
> PR63155 made me pick up this old work from Steven, it turns our
> linked-list imp
The compiler currently issues a warning/error mentioning a variable "frame",
which is not very user-friendly. This is changed to using the same wording as
frame_offset_overflow, i.e. "total size of local objects".
Tested on x86-64/Linux, applied on the mainline as obvious.
Btw, in most cases,
On Thu, Oct 18, 2018 at 11:44 PM H.J. Lu wrote:
>
> Many AVX512 vector operations can broadcast from a scalar memory source.
> This patch enables memory broadcast for FP add operations.
>
> gcc/
>
> PR target/72782
> * config/i386/sse.md
> (*3_bcst_1): New.
> (*add3
> Still OK :-)
Committed as r265304.
Regards
Robin