Hi Richard,
> On 5 Nov 2023, at 12:11, Richard Sandiford wrote:
>
> Iain Sandoe writes:
>>> On 26 Oct 2023, at 21:00, Iain Sandoe wrote:
>>
On 26 Oct 2023, at 20:49, Richard Sandiford
>> wrote:
Iain Sandoe writes:
> This was written before Thomas' modification to the EL
fld and fst have same address mode as ld.w and st.w, so the same
optimization as r14-4851 should be applied for them too.
gcc/ChangeLog:
* config/loongarch/loongarch.md (LD_AT_LEAST_32_BIT): New mode
iterator.
(ST_ANY): New mode iterator.
(define_peephole2): Use LD
As the commit message of r14-4674 has indicated, if the assembler does
not support conditional branch relaxation, a relocation overflow may
happen on conditional branches when relaxation is enabled because the
number of NOP instructions inserted by the assembler will be more than
the number estimat
On Fri, 3 Nov 2023, Andre Vieira (lists) wrote:
> Hi,
>
> The current codegen code to support VFs that are multiples of a simdclone
> simdlen relies on BIT_FIELD_REF to create multiple input vectors. This does not
> work for non-constant simdclones, so we should disable using such clones when
> t
Hi All,
This adds an implementation for conditional branch optab for AArch32.
For example:
void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}
For 128-bit vectors we generate:
vcgt.s32 q8, q9, #0
vpmax.u32 d7,
Hi All,
Advanced SIMD lacks flag setting vector comparisons which SVE adds. Since
machines with SVE also support Advanced SIMD we can use the SVE comparisons to
perform the operation in cases where SVE codegen is allowed, but the vectorizer
has decided to generate Advanced SIMD because of loop
Hi All,
This adds an implementation for conditional branch optab for MVE.
Unfortunately MVE has rather limited operations on VPT.P0, we are missing the
ability to do P0 comparisons and logical OR on P0.
For that reason we can only support cbranch with 0, as for comparing to a 0
predicate we don'
Hi All,
Advanced SIMD lacks a cmpeq for vectors, and unlike compare to 0 we can't
rewrite to a cmtst.
This operation is however fairly common, especially now that we support early
break vectorization.
As such this adds a pattern to recognize the negated any comparison and
transform it to an all.
Hi All,
This adds an implementation for conditional branch optab for AArch64.
For example:
void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}
For 128-bit vectors we generate:
cmgt v1.4s, v1.4s, #0
umaxp v1.4s, v1.4s,
On Fri, 3 Nov 2023, Joseph Myers wrote:
> On Fri, 3 Nov 2023, Richard Biener wrote:
>
> > The following tries to clarify the __builtin_constant_p documentation,
> > stating that the argument expression is not evaluated and side-effects
> > are discarded. I'm struggling to find the correct terms
Hi All,
I didn't want these to get lost in the noise of updates.
The following three tests now correctly work for targets that have an
implementation of cbranch for vectors so XFAILs are conditionally removed gated
on vect_early_break support.
Bootstrapped and regtested on aarch64-none-linux-gnu and
Hi All,
What do people think about having the ability to force only the latch connected
exit as the exit as a param? I.e. what's in the patch but as a param.
I found this useful when debugging large example failures as it tells me where
I should be looking. No hard requirement but just figured I
Hi All,
The vectorizer at the moment uses a num_bb check to check for control flow.
This rejects a number of loops for no reason. This patch changes it
to check the destination of the exits instead.
This also allows early break to work by also dropping the single_exit check.
Bootstrapp
Hi All,
This sets LOOP_VINFO_EARLY_BREAKS and does some misc changes so the other
patches are self contained.
Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_form): Analyse all exits.
Hi All,
This finishes wiring that didn't fit in any of the other patches.
Essentially just adding related changes so peeling for early break works.
Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* tree-vect-loop-manip.cc (ve
Hi All,
This wires through the final bits to support adding the guard block between
the loop and epilog.
For an "inverted loop", i.e. one where an early exit was chosen as the main
exit then we can never skip the scalar loop since we know we have side effects
to still perform. For those cases we
Hi All,
This implements vectorizable_early_exit which is used as the codegen part of
vectorizing a gcond.
For the most part it shares the majority of the code with
vectorizable_comparison with the addition that it needs to be able to reduce
multiple resulting statements into a single one for use in the
Hi All,
This updates relevancy analysis to support marking gcond's belonging to early
breaks as relevant for vectorization.
Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* tree-vect-stmts.cc (vect_stmt_relevant_p,
v
Hi All,
This adds support to vectorizable_live_reduction to handle multiple exits by
doing a search for which exit the live value should be materialized in.
Additionally which value in the index we're after depends on whether the exit
it's materialized in is an early exit or whether the loop's mai
Hi All,
This changes the PHI node updates to support early breaks.
It has to support both the case where the loop's exit matches the normal loop
exit and one where the early exit is "inverted", i.e. it's an early exit edge.
In the latter case we must always restart the loop for VF iterations. Fo
Hi All,
As requested, the vectorizer is now free to pick its own exit which can be
different than what the loop CFG infrastructure uses. The vectorizer makes use
of this to vectorize loops that it previously could not.
But this means that loop control must be materialized in the block that needs
Hi All,
This has loop versioning use the vectorizer's IV exit edge when it's available
since single_exit (..) fails with multiple exits.
Bootstrapped and regtested on aarch64-none-linux-gnu and no issues.
Ok for master?
Thanks,
Tamar
gcc/ChangeLog:
* tree-vect-loop-manip.cc (vect_loop_ver
Hi All,
This splits the part of the function that does peeling for loops at exits to
a different function. In this new function we also peel for early breaks.
Peeling for early breaks works by redirecting all early break exits to a
single "early break" block and combine them and the normal exit
Hi All,
When performing early break vectorization we need to be sure that the vector
operations are safe to perform. A simple example is e.g.
for (int i = 0; i < N; i++)
{
  vect_b[i] = x + i;
  if (vect_a[i]*2 != x)
    break;
  vect_a[i] = x;
}
where the store to vect_b is not allowed
Hi All,
This adds pragma GCC novector to testcases that have shown up
since the last regression run because this series detects more.
Is it OK if, when it comes time to commit, I just update any
new cases before committing? This seems to be a cat and mouse game.
Bootstrapped and regtested on
Hi All,
This patch adds initial support for early break vectorization in GCC.
The support is added for any target that implements a vector cbranch optab,
this includes both fully masked and non-masked targets.
Depending on the operation, the vectorizer may also require support for boolean
mask re
On Sun, 5 Nov 2023, Richard Sandiford wrote:
> Robin Dapp writes:
> >> Ah, OK. IMO it's better to keep the optab operands the same as the IFN
> >> operands, even if that makes things inconsistent with vcond_mask.
> >> vcond_mask isn't really a good example to follow, since the operand
> >> order
On Sun, Nov 5, 2023 at 7:33 PM Richard Sandiford
wrote:
>
> align_dynamic_address would output alignment operations even
> for a required alignment of 1 byte.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
OK
> Richard
>
>
> gcc/
> * explow.cc (align_dynamic_address)
On Sun, Nov 5, 2023 at 7:32 PM Richard Sandiford
wrote:
>
> This patch allows allocate_dynamic_stack_space to be called before
> or after virtual registers have been instantiated. It uses the
> same approach as allocate_stack_local, which already supported this.
>
> Tested on aarch64-linux-gnu &
Hi,
With the latest trunk, test case pr106550_1.c fails on ppc under -m32.
However, the case tests 64-bit constant building, so "has_arch_ppc64"
is required.
The test passes on ppc64{,le}.
BR,
Jeff (Jiufu Guo)
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr106550_1.c: Add has_arch_
This patch adds strided load/store support to the loop vectorizer, depending on
STMT_VINFO_STRIDED_P.
Bootstrap and regression testing on x86 passed.
OK for trunk?
gcc/ChangeLog:
* internal-fn.cc (strided_load_direct): New function.
(strided_store_direct): Ditto.
(expand_strided_st
Sorry.
This is a middle-end patch, sent to the wrong CC list.
Please disregard this patch.
juzhe.zh...@rivai.ai
From: Juzhe-Zhong
Date: 2023-11-06 14:52
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] VECT: Support mask_len_strided_load/mask_len_
This patch adds strided load/store support to the loop vectorizer, depending on
STMT_VINFO_STRIDED_P.
Bootstrap and regression testing on x86 passed.
OK for trunk?
gcc/ChangeLog:
* internal-fn.cc (strided_load_direct): New function.
(strided_store_direct): Ditto.
(expand_strided_st
I noticed that we fail to propagate AVL for reductions in more complicated
situations:
double foo (double *__restrict a,
            double *__restrict b,
            double *__restrict c,
            int n)
{
  double result = 0;
  for (int i = 0; i < n; i++)
    result += a[i] * b[i] * c[i];
  return result;
}
vsetvli a5,a3
Hi,
Patch 2 enables 16-byte by-pieces moves on rs6000. This patch fixes
the regression cases caused by the previous patch. For sra-17/18, the long
array with 4 elements can be loaded by one 16-byte by-pieces move on 32-bit
platforms. So the array is not constructed in LC0 and SRA optimization
is
Hi,
This patch enables vector mode for by-pieces equality compare. It
adds a new expand pattern, cbranchv16qi4, and sets MOVE_MAX_PIECES
and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare
relies on both move and compare instructions, so both macros are changed.
The vector load/stor
Iain Sandoe writes:
> Hi Richard,
>
>> On 26 Oct 2023, at 21:00, Iain Sandoe wrote:
>
>>> On 26 Oct 2023, at 20:49, Richard Sandiford
> wrote:
>>>
>>> Iain Sandoe writes:
This was written before Thomas' modification to the ELF-handling to allow
a config-based change for target details
On Nov 5, 2023, at 12:33 PM, FX Coudert wrote:
>
> kind ping for this easy patch
>
>
>> Le 30 oct. 2023 à 15:19, FX Coudert a écrit :
>>
>> Hi,
>>
>> The test is currently failing on x86_64-apple-darwin with "decimal
>> floating-point not supported for this target".
>> Marking the test as r
On Oct 19, 2023, at 8:16 PM, Alexandre Oliva wrote:
>
> On Mar 10, 2021, Alexandre Oliva wrote:
>
>> ppc configurations that have -mstrict-align enabled by default fail
>> gcc.dg/strlenopt-80.c, because some memcpy calls don't get turned into
>> MEM_REFs, which defeats the tested-for strlen opt
On Oct 2, 2023, at 1:24 AM, Christophe Lyon wrote:
>
> ping?
>
> On Sun, 10 Sept 2023 at 21:31, Christophe Lyon
> wrote:
> Some targets like arm-eabi with newlib and default settings rely on
> __sync_synchronize() to ensure synchronization. Newlib does not
> implement it by default, to make u
On Nov 1, 2023, at 6:11 PM, Alexandre Oliva wrote:
>
> Several C++ tests fail with --disable-hosted-libstdcxx, whether
> because stdc++ext gets linked in despite not being built, because
> standard headers that are unavailable in this mode are included,
> or because headers are (mistakenly?)
Hi FX
> On 5 Nov 2023, at 10:33, FX Coudert wrote:
>
> kind ping for this easy patch
IMO adding feature tests for features required by a test falls into the
“obvious” category,
Iain
>
>
>> Le 30 oct. 2023 à 15:19, FX Coudert a écrit :
>>
>> Hi,
>>
>> The test is currently failing on x86
kind ping for this easy patch
> Le 30 oct. 2023 à 15:19, FX Coudert a écrit :
>
> Hi,
>
> The test is currently failing on x86_64-apple-darwin with "decimal
> floating-point not supported for this target".
> Marking the test as requiring dfp fixes the issue.
>
> OK to push?
>
> FX
>
0001
Robin Dapp writes:
>> Ah, OK. IMO it's better to keep the optab operands the same as the IFN
>> operands, even if that makes things inconsistent with vcond_mask.
>> vcond_mask isn't really a good example to follow, since the operand
>> order is not only inconsistent with the IFN, it's also incons
On Oct 27, 2023, at 8:11 AM, Christophe Lyon wrote:
>
> In some configurations of our validation setup, we always call the
> compiler with -Wl,-rpath=XXX, which instructs the driver to invoke the
> linker if none of -c, -S or -E is used.
>
> This happens to be the case in the PCH tests, where dg
Andrew Carlotti writes:
> On Thu, Oct 26, 2023 at 07:41:09PM +0100, Richard Sandiford wrote:
>> Andrew Carlotti writes:
>> > This patch adds support for the "target_version" attribute to the middle
>> > end and the C++ frontend, which will be used to implement function
>> > multiversioning in the
Also rename LEGACY_REGS to LEGACY_GENERAL_REGS.
gcc/ChangeLog:
* config/i386/i386.h (enum reg_class): Add LEGACY_INDEX_REGS.
Rename LEGACY_REGS to LEGACY_GENERAL_REGS.
(REG_CLASS_NAMES): Ditto.
(REG_CLASS_CONTENTS): Ditto.
* config/i386/constraints.md ("R"): Update for rename.
On Sun, Nov 5, 2023 at 9:13 AM Cassio Neri wrote:
>
> I could not find any entry in gcc's bugzilla for that. Perhaps my search
> wasn't good enough.
I filed https://gcc.gnu.org/PR112395 with a first attempt at the patch
(will double check it soon).
Thanks,
Andrew
>
>
> On Sun, 5 Nov 2023 at 15
This patch adds a way for targets to ask that selected mode changes
be brought forward, through a combination of:
(1) requiring a mode in blocks where the entity was previously
transparent
(2) pushing the transition at the head of a block onto incoming edges
SME has two uses for this:
- A
The mode-switching pass assumed that all of an entity's modes
were mutually exclusive. However, the upcoming SME changes
have an entity with some overlapping modes, so that there is
sometimes a "superunion" mode that contains two given modes.
We can use this relationship to pass something more hel
The pass used the edge aux field to record which mode change
should happen on the edge, with -1 meaning "none". It's more
convenient for later patches to leave aux zero for "none",
and use numbers based at 1 to record a change.
gcc/
* mode-switching.cc (commit_mode_sets): Use 1-based edge
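A minimal sketch of the 1-based encoding described above, assuming a simplified edge type. `EdgeSketch` and the helper names are hypothetical and are not taken from mode-switching.cc; the slot is modelled as a plain integer rather than GCC's actual aux field.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a CFG edge; only the aux slot is modelled,
// and it is an integer here (an assumption, not GCC's real field type).
struct EdgeSketch {
  std::intptr_t aux = 0;  // 0 means "no mode change on this edge"
};

// Record a change to `mode` (0-based) by storing mode + 1.
inline void record_mode_change(EdgeSketch &e, int mode) {
  e.aux = static_cast<std::intptr_t>(mode) + 1;
}

// True if some mode change has been recorded on the edge.
inline bool has_mode_change(const EdgeSketch &e) { return e.aux != 0; }

// Decode the recorded mode; only meaningful when has_mode_change(e) is true.
inline int recorded_mode(const EdgeSketch &e) {
  return static_cast<int>(e.aux - 1);
}
```

With this layout a zero-initialised slot already means "none", so no separate pass is needed to fill in -1 before use.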
This patch passes the set of live hard registers to the after hook,
like the previous one did for the needed hook.
gcc/
* target.def (mode_switching.after): Add a regs_live parameter.
* doc/tm.texi: Regenerate.
* config/epiphany/epiphany-protos.h (epiphany_mode_after): Upda
The emit hook already takes the set of live hard registers as input.
This patch passes it to the needed hook too. SME uses this to
optimise the mode choice based on whether state is live or dead.
The main caller already had access to the required info, but the
special handling of return values di
The mode-switching pass already had hooks to say what mode
an entity is in on entry to a function and what mode it must
be in on return. For SME, we also want to say what mode an
entity is guaranteed to be in on entry to an exception handler.
gcc/
* target.def (mode_switching.eh_handler):
An entity isn't transparent in a block that requires a specific mode.
optimize_mode_switching took that into account for normal insns,
but didn't for the exit block. Later patches misbehaved because
of this.
In contrast, an entity was correctly marked as non-transparent
in the entry block, but th
For a given block, an entity is either transparent for
all modes or for none. Each update to the transparency set
therefore used a loop like:
  for (i = 0; i < no_mode; i++)
    clear_mode_bit (transp[bb->index], j, i);
This patch instead starts out with a bit-per-blo
optimize_mode_switching passes an entity's current mode (if known)
to the emit hook. However, the mode that it passed ignored the
effect of the after hook. Instead, the mode for the first emit
call in a block was taken from the incoming mode, whereas the
mode for each subsequent emit call was tak
add_seginfo chained insn information to the end of a list
by starting at the head of the list. This patch avoids the
quadraticness by keeping track of the tail pointer.
gcc/
* mode-switching.cc (add_seginfo): Replace head pointer with
a pointer to the tail pointer.
(optimi
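The tail-pointer technique mentioned above can be sketched on a generic singly linked list. This is an illustration only: `Node`, `List`, `append` and `length` are hypothetical names, not the seginfo structures from mode-switching.cc.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical list node, standing in for a per-insn record.
struct Node {
  int value;
  Node *next;
};

// The list keeps a pointer to the slot where the next node belongs,
// so appending never has to walk from the head.
struct List {
  Node *head = nullptr;
  Node **tail = &head;  // points at `head` or at the last node's `next`

  List() = default;
  // Copying would leave `tail` pointing into the source object.
  List(const List &) = delete;
  List &operator=(const List &) = delete;
};

// O(1) append: fill the tail slot, then advance it.
inline void append(List &list, Node *node) {
  node->next = nullptr;
  *list.tail = node;
  list.tail = &node->next;
}

// Walk the list once to count its nodes.
inline std::size_t length(const List &list) {
  std::size_t n = 0;
  for (const Node *p = list.head; p != nullptr; p = p->next)
    ++n;
  return n;
}
```

Appending n nodes this way costs O(n) in total, whereas walking from the head on each append costs O(n^2).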
optimize_mode_switching uses REG_DEAD notes to track register
liveness, but it failed to tell DF to calculate up-to-date notes.
Noticed by inspection. I don't have a testcase that fails
because of this.
gcc/
* mode-switching.cc (optimize_mode_switching): Call
df_note_add_problem.
I found the documentation for the mode-switching macros/hooks
a bit hard to follow at first. This patch tries to add the
information that I think would have made it easier to understand.
Of course, documentation preferences are personal, and so I could
be changing something that others understood
This series of patches extends the mode-switching pass so that it
can be used for AArch64's SME. I wondered about including a detailed
description of how the SME mode changes work, but it'd probably be
a distraction. The system is quite complex and target-specific, and
hopefully the details aren'
align_dynamic_address would output alignment operations even
for a required alignment of 1 byte.
Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
Richard
gcc/
* explow.cc (align_dynamic_address): Do nothing if the required
alignment is a byte.
---
gcc/explow.cc |
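As a rough integer analogue of the change described above (the real function operates on RTL, which is not shown here), an align-up helper can return early when the required alignment is one byte. `align_up_sketch` is a hypothetical name, and the alignment is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round `addr` up to a multiple of `align_bytes`.
// Assumes `align_bytes` is a power of two, or 0/1 for "no requirement".
constexpr std::uint64_t align_up_sketch(std::uint64_t addr,
                                        std::uint64_t align_bytes) {
  // Byte alignment is always satisfied, so skip the add-and-mask entirely.
  if (align_bytes <= 1)
    return addr;
  return (addr + align_bytes - 1) & ~(align_bytes - 1);
}
```

Without the early return, a 1-byte request would still perform an add of 0 and a mask with all ones, which is correct but wasted work.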
This patch allows allocate_dynamic_stack_space to be called before
or after virtual registers have been instantiated. It uses the
same approach as allocate_stack_local, which already supported this.
Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
Richard
gcc/
* function
seginfo had an unused bbnum field, presumably dating from before
BB information was attached directly to insns.
Pushed as obvious after testing on aarch64-linux-gnu &
x86_64-linux-gnu.
Richard
gcc/
* mode-switching.cc: Remove unused forward references.
(seginfo): Remove bbnum.
read_rtx_operand would spin endlessly for:
(unspec [(...))] UNSPEC_FOO)
because read_nested_rtx does nothing if the next character is not '('.
Pushed after testing on aarch64-linux-gnu & x86_64-linux-gnu.
Richard
gcc/
* read-rtl.cc (read_rtx_operand): Avoid spinning endlessly for
The current implementation returns
(_M_y & (__is_multiple_of_100 ? 15 : 3)) == 0;
where __is_multiple_of_100 is calculated using an obfuscated algorithm which
saves one ror instruction when compared to _M_y % 100 == 0 [1].
In leap year calculations, it's mathematically correct to replace the
d
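To illustrate the return expression quoted above, here is a small sketch that computes the multiple-of-100 test with a plain `% 100` rather than the obfuscated algorithm, which is not reproduced here. The function names are hypothetical, and the year is taken as a plain `int` instead of the `_M_y` member.

```cpp
#include <cassert>

// Hypothetical free function mirroring the quoted return expression.
// A multiple of 100 is a leap year only if it is a multiple of 400, and
// for multiples of 100 that is equivalent to being a multiple of 16.
constexpr bool is_leap_sketch(int y) noexcept {
  const bool is_multiple_of_100 = (y % 100) == 0;
  return (y & (is_multiple_of_100 ? 15 : 3)) == 0;
}

// Textbook Gregorian rule, used only as a cross-check.
constexpr bool is_leap_reference(int y) noexcept {
  return y % 4 == 0 && (y % 100 != 0 || y % 400 == 0);
}

static_assert(is_leap_sketch(2000) == is_leap_reference(2000), "2000");
static_assert(is_leap_sketch(1900) == is_leap_reference(1900), "1900");
```

The mask trick works because 100 = 4 * 25 and 25 is coprime to 16, so a multiple of 100 is divisible by 400 exactly when it is divisible by 16.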
Hi!
check_field_decls for DECL_C_BIT_FIELD FIELD_DECLs with error_mark_node
TREE_TYPE continues early and doesn't call check_bitfield_decl which would
either set DECL_BIT_FIELD, or clear DECL_C_BIT_FIELD. So, the following
testcase ICEs after emitting tons of errors, because
SET_DECL_FIELD_CXX_ZE
Hi!
This patch mentions the C attribute syntax support in the libgomp documentation.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
2023-11-05 Jakub Jelinek
* libgomp.texi (Enabling OpenMP): Adjust wording for attribute syntax
supported also in C.
Hi!
I forgot to tweak c_common_has_attribute for the C++ omp::decl addition and now
also for the C omp::{directive,sequence,decl} addition.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
2023-11-05 Jakub Jelinek
* c-lex.cc (c_common_has_attribute): Return
I could not find any entry in gcc's bugzilla for that. Perhaps my search
wasn't good enough.
On Sun, 5 Nov 2023 at 15:58, Marc Glisse wrote:
> On Sun, 5 Nov 2023, Cassio Neri wrote:
>
> > When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
> > that the operation "& 1"
On Sun, 5 Nov 2023, Cassio Neri wrote:
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.
Is there an entry in gcc's bugzilla about having the
When year_month_day_last::day() was implemented, Dr. Matthias Kretz realised
that the operation "& 1" wasn't necessary but we did not patch it at that
time. This patch removes the unnecessary operation.
libstdc++-v3/ChangeLog:
* include/std/chrono:
diff --git a/libstdc++-v3/include/std/chrono
b/
Bootstrapped and tested on x86_64-linux with no regressions.
Finally, the fabled diagnostics patch. I would like to note really
quickly that there was never a v2 and v3 of this patch, only the first
of these 2 had those versions. Originally I had planned to revise this
patch alongside the first bu
Bootstrapped and tested on x86_64-linux with no regressions.
I originally threw this e-mail together last night, but threw in the
towel when I thought I saw tests failing and went to sleep. I did a
proper bootstrap and comparison and whatnot and found that there were
thankfully no regressions.
An
Committed, thanks Juzhe.
Pan
From: juzhe.zhong
Sent: Sunday, November 5, 2023 5:40 PM
To: Li, Pan2
Cc: gcc-patches@gcc.gnu.org; Li, Pan2 ; Wang, Yanzhang
; kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec
lgtm
Replied Message
From
pan2
lgtm
Replied Message
From: pan2...@intel.com
Date: 11/05/2023 17:30
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai, pan2...@intel.com, yanzhang.w...@intel.com, kito.ch...@gmail.com
Subject: [PATCH v1] RISC-V: Support FP rint to i/l/ll diff size autovec
From: Pan Li
This patch supports auto-vectorization of the FP APIs below
with different type sizes
+--------+----------+----------+
| API    | RV64     | RV32     |
+--------+----------+----------+
| irint  | DF => SI | DF => SI |
| irintf | -        | -        |
| lrint | -