ping
From: Wilco Dijkstra
Sent: 07 August 2017 15:13
To: GCC Patches; James Greenhalgh
Cc: nd; Richard Earnshaw
Subject: [PATCH][AArch64] Simplify aarch64_can_eliminate
Simplify aarch64_can_eliminate - if we need a frame pointer, we must
eliminate to HARD_FRAME_POINTER_REGNUM. Rather than
ping
From: Wilco Dijkstra
Sent: 04 August 2017 13:26
To: GCC Patches; James Greenhalgh
Cc: nd
Subject: [PATCH][AArch64] Introduce emit_frame_chain
The current frame code combines the separate concepts of a frame chain
(saving old FP,LR in a record and pointing new FP to it) and a frame
ping
From: Wilco Dijkstra
Sent: 20 July 2017 13:49
To: GCC Patches; James Greenhalgh
Cc: nd
Subject: [PATCH][AArch64] Improve addressing of TI/TFmode
In https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01125.html Jiong
pointed out some addressing inefficiencies due to a recent change in
ping
From: Wilco Dijkstra
Sent: 04 August 2017 13:41
To: GCC Patches; James Greenhalgh
Cc: nd
Subject: [PATCH][AArch64] Remove aarch64_frame_pointer_required
To implement -fomit-leaf-frame-pointer, there are 2 places where we need
to check whether we have to use a frame chain (since
stack
spills.
SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%.
Bootstrap OK, OK for commit?
ChangeLog:
2017-07-07 Wilco Dijkstra
* config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
Return true for more constants, symbols and label references
ping
From: Wilco Dijkstra
Sent: 25 July 2017 14:58
To: GCC Patches; James Greenhalgh; Jeff Law
Cc: nd
Subject: [PATCH][AArch64] Simplify frame layout for stack probing
This patch makes some changes to the frame layout in order to simplify
stack probing. We want to use the save of LR as
ping
From: Wilco Dijkstra
Sent: 17 January 2017 15:14
To: Richard Earnshaw; GCC Patches; James Greenhalgh
Cc: nd
Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit
Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a
declaration is an integer. So the
ping
From: Wilco Dijkstra
Sent: 26 July 2017 14:46
To: GCC Patches; James Greenhalgh
Cc: nd
Subject: [PATCH][AArch64] Remove '*' from movsi/di/ti patterns
Remove the remaining uses of '*' from the movsi/di/ti patterns.
Using '*' in alternatives is typic
ping
From: Wilco Dijkstra
Sent: 31 July 2017 16:57
To: GCC Patches; James Greenhalgh
Cc: nd
Subject: [PATCH][AArch64] PR71951: Fix unwinding with -fomit-frame-pointer
As described in PR71951, if libgcc is built with -fomit-frame-pointer,
unwinding crashes, for example while doing a
Richard Biener wrote:
> On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra
> wrote:
> > Richard Biener wrote:
>>> > We also change the association of
>>> >
>>> > x / (y * C) -> (x / C) / y
>>> >
>>> > If C is a constant
the transformation for powf (10.0, x) in SPEC
was 2.5. If we allow use of exp10 in match.pd, the ULP error would be lower.
OK for commit?
ChangeLog:
2017-08-17 Wilco Dijkstra
* match.pd: Add pow (C, x) simplification.
--
diff --git a/gcc/match.pd b/gcc/match.pd
in
r of the transformation for powf (10.0, x) in SPEC
was 2.5. If we allow use of exp10 in match.pd, the ULP error would be lower.
ChangeLog:
2017-08-18 Wilco Dijkstra
* match.pd: Add pow (C, x) simplification.
--
diff --git a/gcc/match.pd b/gcc/match.pd
index
0e36f46b914bc63c257cef47
Hi,
The main reason we have this issue is that DImode can be treated as a
vector of size 1. As a result we do not know whether the shift is an integer
or SIMD instruction. One way around this is to never use the SIMD variant,
another is to introduce V1DImode for vectors of size 1.
Short term I be
Hi,
The register allocator inserts move preferences when an instruction has
one or more dead sources in add_insn_allocno_copies. If an instruction
doesn't have a matching constraint (eg. "0"), then any dead source is treated
as a copy with all destination registers with a low priority. In reality
Jeff Law wrote:
On 07/26/2017 05:29 PM, Wilco Dijkstra wrote:
> > But then the check size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0
> > seems wrong too given that round_push uses a different alignment to align
> > to.
> I had started to dig into the history of this code,
Richard Sandiford wrote:
>
> Sorry for only noticing now, but the call to aarch64_legitimate_address_p
> is asking whether the MEM itself is a legitimate LDP/STP address. Also,
> it might be better to pass false for strict_p, since this can be called
> before RA. So maybe:
>
>if (GET_CODE (op
Segher Boessenkool wrote:
> On Tue, Aug 22, 2017 at 10:48:17AM +0000, Wilco Dijkstra wrote:
> > The register allocator inserts move preferences when an instruction has
> > one or more dead sources in add_insn_allocno_copies. If an instruction
> > doesn't have a matching
Segher Boessenkool wrote:
>
> "0,r" might work, or "0,?r", or similar (alternatives have commas
> between them).
No, it doesn't work at all. But that is no surprise if you look at
ira_get_dup_out_num.
It iterates over the constraint string and if you have anything that matches
after a "0",
the "
Vladimir Makarov wrote:
>
> As I correctly understand, you just want an intuitive allocation. The
> current allocation performance has the same quality as the intuitive one.
Performance is affected as well but I didn't want to go into details as that
distracts from the underlying issue. But if yo
Jeff Law wrote:
> Right. exp is painful in glibc, but pow is *dramatically* more painful
> and likely always will be.
>
> Siddhesh did some great work in bringing those costs down in glibc but
> the more code we can reasonably shunt into exp instead of pow, the better.
>
> It's likely pow will alw
Kyrill Tkachov wrote:
> -(define_insn_and_split "*iordi_notsesidi_di"
> - [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
> - (ior:DI (not:DI (sign_extend:DI
> - (match_operand:SI 2 "s_register_operand" "r,r")))
> - (match_operand:DI 1 "s_regis
Kyrill Tkachov wrote:
> > After Bernd's change almost all DI mode instructions are split before
> > register
> > allocation. So instructions using DI mode no longer exist and thus these
> > extend variants can never be matched and are thus redundant.
>
> Bernd's patch splits them when we don't ha
Kyrill Tkachov wrote:
> I like the simplifications in the selection logic here :)
> However, changing the value for ARM from 6 to 4 looks a bit arbitrary to me.
> There's probably a reason why default values for ARM and Thumb-2 are
> different
> (maybe not a good one) and I'd rather not change it
Bernd Edlinger wrote:
> Combine creates an invalid insn out of these two insns:
Yes it looks like a latent bug. We need to use arm_general_register_operand
as arm_adddi3/subdi3 only allow integer registers. You don't need a new
predicate
s_register_operand_nv. Also I'd prefer something like arm_
Bernd Edlinger wrote:
> No, the split condition does not begin with "&& TARGET_32BIT...".
> Therefore the split is enabled in TARGET_NEON after reload_completed.
> And it is invoked from adddi3_neon for all alternatives without vfp
> registers:
Hmm that's a huge mess. I'd argue that any inst_and_s
g the outgoing arguments or setting STACK_BOUNDARY correctly.
Committed as obvious.
ChangeLog:
2017-09-06 Wilco Dijkstra
PR middle-end/78468
* gcc.dg/pr78468.c: Add alignment test.
--
diff --git a/gcc/testsuite/gcc.dg/pr78468.c b/gcc/testsuite/gcc.dg/pr78468.c
new file mode 1
Eric Botcazou wrote:
> The stack is aligned before the allocation but it gets misaligned during the
> allocation because the dynamic offset is not a multiple of STACK_BOUNDARY.
No, the stack never gets misaligned - my patch doesn't change that at all. The
issue is that Sparc backend doesn't corr
Hi Rainer,
Can you post the disassembly for say the 8-byte aligned tests? It may not be
built correctly or hit an offset that is accidentally aligned, however
pass/fail status can't change due to my patch as it doesn't change alignment at
all.
Wilco
Eric Botcazou wrote:
>> No, the stack never gets misaligned - my patch doesn't change that at all.
>
> Yes, it does.
No. Look at the diffs, there is not a single change in alignment anywhere for
all
of the alloca variants. If the alignment is incorrect after my patch, it is also
incorrect
Any further comments?
Kyrill Tkachov wrote:
> > After Bernd's change almost all DI mode instructions are split before
> > register
> > allocation. So instructions using DI mode no longer exist and thus these
> > extend variants can never be matched and are thus redundant.
>
> Bernd's patch
Jeff Law wrote:
> On 09/09/2017 02:51 AM, Eric Botcazou wrote:
> >> No, the stack never gets misaligned - my patch doesn't change that at all.
> >
> > Yes, it does. Dynamic allocation works like this: the amount to be
> > allocated
> > is added to VIRTUAL_STACK_DYNAMIC_REGNUM and the result is
Steve Ellcey wrote:
> This patch fixes the ttest failures on aarch64 by adding AM_CFLAGS to
> the test options, like btest already does and as Wilco says works for
> him in Comment #4 of the bug report.
Thanks for picking this up, this looks OK.
> Tested by me on aarch64. Ok to checkin?
This co
Richard Earnshaw (lists) wrote:
> On 04/05/17 18:38, Wilco Dijkstra wrote:
> > Richard Earnshaw wrote:
> >
>>> - 5, /* Max cond insns. */
>>> + 2, /* Max cond insns. */
>>
Richard Earnshaw (lists) wrote:
> (define_insn "*movdi_vfp"
> - [(set (match_operand:DI 0 "nonimmediate_di_operand"
> "=r,r,r,r,q,q,m,w,r,w,w, Uv")
> + [(set (match_operand:DI 0 "nonimmediate_di_operand"
> "=r,r,r,r,q,q,m,w,!r,w,w, Uv")
> Why have you introduced a no-reloads block on the 9th
Richard Earnshaw (lists) wrote:
> --- a/gcc/config/arm/aarch-common.c
> +++ b/gcc/config/arm/aarch-common.c
> @@ -254,12 +254,7 @@ arm_no_early_alu_shift_dep (rtx producer, rtx consumer)
> return 0;
>
> if ((early_op = arm_find_shift_sub_rtx (op)))
> - {
> - if (REG_P (early_op))
Richard Earnshaw (lists) wrote:
> While on the subject, why is the w->w operation also hidden?
No idea, this just fixes one case where it is obvious the use of '*' is
incorrect.
However I think all uses of '*' in md files are incorrect and the feature should
be removed. '?' already exists for c
Richard Earnshaw (lists) wrote:
> On 05/05/17 13:42, Wilco Dijkstra wrote:
>> Richard Earnshaw (lists) wrote:
>>> On 04/05/17 18:38, Wilco Dijkstra wrote:
>>> > Richard Earnshaw wrote:
>>> >
>>>>> - 5,
Richard Earnshaw (lists) wrote:
> On 05/05/17 17:10, Wilco Dijkstra wrote:
> > However I think all uses of '*' in md files are incorrect and the
> > feature should
> > be removed. '?' already exists for cases where the alternative may be
> > expens
This fixes a few failures on ARM and AArch64 due to a recent change in
alignment peeling by switching the vector cost model off
(https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00407.html).
Tested on AArch64, ARM and x64 - committed as obvious.
ChangeLog:
2017-05-08 Wilco Dijkstra
Move a use-after-free access before the delete.
Committed as obvious.
ChangeLog:
2017-05-10 Wilco Dijkstra
PR target/80671
* config/aarch64/cortex-a57-fma-steering.c (merge_forest):
Move member access before delete.
--
diff --git a/gcc/config/aarch64/cortex-a57-fma
, and while most appear safe or appear aware of the
issue, it is likely not all such calls are safe. This check enables
any such latent bugs to be found.
Bootstrap OK on AArch64.
2017-05-11 Wilco Dijkstra
* final.c (leaf_function_p): Check we are not in a sequence.
--
diff --git a
to
> have a comment here at all? E.g. "Ensure we walk the entire function body
> after
> the following get_insns call".
I've changed it to "Ensure we walk the entire function body."
Wilco
2017-05-11 Wilco Dijkstra
* final.c (leaf_function_p): Check we
erate
illegal instructions with the same hard register as the destination and a
clobber. Fix this by also checking for overlaps with the destination
register.
Bootstrap OK on arm-linux-gnueabihf for ARM and Thumb-2, OK for commit?
ChangeLog:
2017-05-16 Wilco Dijkstra
PR rtl-optimization/
Richard Sandiford wrote:
> Insn patterns shouldn't check can_create_pseudo_p, because there's no
> guarantee that the associated split happens before RA. In this case it
> should be safe to reuse operand 0 after RA if you change it to:
The goal is to only create and split this pattern before reg
Hurugalawadi, Naveen wrote:
>
> Please consider this as a personal reminder to review the patch
> at following link and let me know your comments on the same.
>
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00839.html
That looks good to me now.
Wilco
Hurugalawadi, Naveen wrote:
>
> Please consider this as a personal reminder to review the patch
> at following link and let me know your comments on the same.
>
> https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01333.html
Looks good to me.
Wilco
Hi,
This patch most likely broke all non-x86 targets:
configure: error: conditional "HAVE_AVX128" was never defined.
Usually this means the macro was only invoked conditionally.
Makefile:19843: recipe for target 'configure-target-libgfortran' failed
make[1]: *** [configure-target-libgfortran] Erro
qualifiers]
if (d->code == (const enum arm_builtins) fcode)
^
Avoid the warning by removing const, and bootstrap is OK again.
Committed as trivial patch (r248686).
ChangeLog:
2017-05-30 Wilco Dijkstra
* config/arm/a
Bernd Edlinger wrote:
On 12/20/16 16:09, Wilco Dijkstra wrote:
> > As a result of your patches a few patterns are unused now. All the Thumb-2
> > iordi_notdi*
> > patterns cannot be used anymore. Also I think arm_cmpdi_zero never gets
> > used - a DI
>> mode com
Adhemerval Zanella wrote:
Sorry for the late reply - but I think it's getting there. A few more comments:
+ /* If function uses stacked arguments save the old stack value so morestack
+ can return it. */
+ reg11 = gen_rtx_REG (Pmode, R11_REGNUM);
+ if (cfun->machine->frame.saved_regs_si
that support either
full overlap or no overlap.
Bootstrap & regress on arm-linux-gnueabihf OK on GCC6 branch.
OK for backport?
ChangeLog:
2017-01-05 Wilco Dijkstra
gcc/
PR target/78041
* config/arm/neon.md (ashldi3_neon): Add "r 0 i" and "&r r i"
best schedule. As a result of
these tweaks the performance of the benchmark improves by 20%.
ChangeLog:
2017-01-10 Wilco Dijkstra
* config/arm/cortex-a53.md: Add bypasses for
cortex_a53_r2f_cvt.
(cortex_a53_r2f): Only use for transfers.
(cortex_a53_f2r
James Greenhalgh wrote:
> I've been putting off reviewing this patch for a while now, because I don't
> understand enough about the current eh_return code to understand why what
> you're proposing is correct.
>
> The best way to progress this patch would be to go in to more detail as to
> what the
Wilco Dijkstra
PR77455
gcc/
* config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.
* config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove.
(EH_RETURN_HANDLER_RTX): New define.
* config/aarch64/aarch64.c (aarch64_frame_pointer_required
Wilco Dijkstra wrote:
> Ramana Radhakrishnan wrote:
>> On Wed, Dec 14, 2016 at 5:43 PM, Wilco Dijkstra
>> wrote:
>
> > > Yes, the reason to split the pattern was to introduce the '!' to
> > > discourage Neon->int moves on Cortex-A8
> (https
ping
From: Wilco Dijkstra
Sent: 31 October 2016 18:29
To: GCC Patches
Cc: nd
Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage
This patch cleans up all code related to the frame pointer. On AArch64 we
emit a frame chain even in cases where the frame pointer is not required.
So
ping
From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
without it -255..4091. This reduces the number of addressing instructions
when using DI mode operations (such
ping
From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
Improve the logic when setting max_insns_skipped. Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed
Wilco Dijkstra wrote:
> James Greenhalgh wrote:
>
> > I haven't seen a follow-up to Andrew's point regarding other
> > read-modify-write operations.
> >
> > Did you investigate the cost of these?
>
> I looked at whether there are other similar cas
range for code/data between the symbol and its references.
For symbols with a defined size, limit the offset to be within the size of the
symbol.
ChangeLog:
2017-01-17 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_classify_symbol):
Apply reasonable limit to symbo
ed
on Thumb-2, and after this patch the orndi3_neon pattern matches instead
(which still emits ORN). After this there are no Thumb-2 specific DImode
patterns.
[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html
ChangeLog:
2017-01-17 Wilco Dijkstra
* config/arm/thum
arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.
Bootstrap OK on arm-linux-gnueabihf.
ChangeLog:
2017-01-17 Wilco Dijkstra
* config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
(arm_ashldi3_1bit): Remove pattern.
(ashrdi3): Remove shift by 1 expansion
kugan wrote:
> Wilco Dijkstra wrote:
> > + /* Slightly disparage left shift by 1 so we prefer adddi3. */
> > + if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
> Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is
> done
of an
earlier load is used in an address calculation. This significantly improved
benchmark scores in a proprietary benchmark suite.
Passes AArch64 bootstrap and regress. OK for stage 1?
ChangeLog:
2017-04-05 Wilco Dijkstra
* config/arm/aarch-common.c (arm_early_load_addr_de
weak model only keeps the order if it doesn't make the schedule worse, it
should not impact performance adversely on cores that don't show a gain.
Any objections?
ChangeLog:
2017-04-05 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c (generic_tunings): Update prefetch model.
--
di
-12 Wilco Dijkstra
* config/aarch64/aarch64.c (cortexa35_tunings): Set jump alignment to 4.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(cortexa73_tunings): Likewise.
--
diff --git a/gcc/config/aarch64
codesize cost [2], so setting it to 4 is best. This gives a 0.2% overall
codesize improvement as well as performance gains in several benchmarks.
Any objections?
Bootstrap OK on AArch64, OK for stage 1?
ChangeLog:
2017-04-12 Wilco Dijkstra
* config/aarch64/aarch64.c (generic_tunings
and regress OK on arm-none-linux-gnueabihf.
OK for stage 1?
ChangeLog:
2017-04-12 Wilco Dijkstra
* gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.
(arm_cortex_a35_tune): Likewise.
---
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
Wilco Dijkstra
* gcc/config/aarch64/aarch64.c (generic_addrcost_table):
Change HI/TI mode setting.
---
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index
419b756efcb40e48880cd4529efc4f9f59938325..728ce7029f1e2b5161d9f317d10e564dd5a5f472
100644
--- a
>On Wed, Apr 12, 2017 at 09:29:55AM +, Sudi Das wrote:
> > Hi all
> >
> > This is a fix for PR 80131
> > Currently the code A << (B - C) is not simplified.
>> However at least a more specific case of 1U << (C - x) where C =
>> precision(type) - 1 can be simplified to (1 << C) >> x.
>
> Is tha
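The identity quoted above, 1U << (C - x) == (1U << C) >> x for C = precision(type) - 1, can be checked directly. The following is only a sketch: the names TOP_BIT, shift_original, shift_simplified and forms_agree are hypothetical, and it assumes unsigned int has no padding bits and that x stays within 0..C.

```c
#include <assert.h>
#include <limits.h>

/* C = precision (type) - 1; 31 when unsigned int is 32 bits wide
   (assumes unsigned int has no padding bits).  */
#define TOP_BIT ((unsigned) (sizeof (unsigned) * CHAR_BIT - 1))

/* Original form: 1U << (C - x), defined for 0 <= x <= C.  */
static unsigned shift_original (unsigned x)
{
  assert (x <= TOP_BIT);
  return 1U << (TOP_BIT - x);
}

/* Simplified form: (1U << C) >> x, a constant shifted right.  */
static unsigned shift_simplified (unsigned x)
{
  assert (x <= TOP_BIT);
  return (1U << TOP_BIT) >> x;
}

/* Returns 1 if both forms agree for every valid shift count.  */
static int forms_agree (void)
{
  for (unsigned x = 0; x <= TOP_BIT; x++)
    if (shift_original (x) != shift_simplified (x))
      return 0;
  return 1;
}
```

The simplified form drops the subtraction but needs the constant 1U << C in a register, which is where the constant-cost question raised in the replies comes in.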
Jakub Jelinek wrote:
> No. Some constants sometimes even 7 instructions (e.g. sparc64; not talking
> in particular about 1ULL << 63 constant), or have one instruction
> that is more expensive than normal small constant load. Compare say x86_64
> movl/movq vs. movabsq, I think the latter has
Richard Biener wrote:
> It is IMHO a valid GIMPLE optimization / canonicalization.
>
> movabsq $-9223372036854775808, %rax
>
> so this should then have been generated as 1<<63?
>
> At some point variable shifts were quite expensive as well..
Yes I don't see a major difference between movabs
ping
From: Wilco Dijkstra
Sent: 12 April 2017 14:08
To: GCC Patches
Cc: nd; James Greenhalgh; Evandro Menezes; jim.wil...@linaro.org;
andrew.pin...@cavium.com
Subject: [PATCH][AArch64] Improve address cost for -mcpu=generic
All cores which add a cpu_addrcost_table use a non-zero value for
ping
From: Wilco Dijkstra
Sent: 12 April 2017 14:02
To: GCC Patches
Cc: nd; Kyrylo Tkachov
Subject: [PATCH][ARM] Update max_cond_insns settings
The existing setting of max_cond_insns for most cores is non-optimal.
Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences
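The arithmetic behind "5 means emitting 2 IT sequences" is a ceiling division by the IT block limit. A minimal sketch, assuming the limit of 4 stated above; the helper name it_instructions_needed is hypothetical and is not a function in GCC.

```c
#include <assert.h>

/* Maximum number of conditional instructions one Thumb-2 IT
   instruction can cover (the limit of 4 mentioned above).  */
#define MAX_INSN_PER_IT_BLOCK 4

/* Hypothetical helper: number of IT instructions needed to cover
   n consecutive conditional instructions (ceiling division).  */
static int it_instructions_needed (int n)
{
  assert (n >= 0);
  return (n + MAX_INSN_PER_IT_BLOCK - 1) / MAX_INSN_PER_IT_BLOCK;
}
```

With this, any setting from 1 to 4 costs one IT instruction, while a setting of 5 already costs two.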
ping
From: Wilco Dijkstra
Sent: 12 April 2017 13:58
To: GCC Patches
Cc: nd; James Greenhalgh; jim.wil...@linaro.org; Evandro Menezes;
andrew.pin...@cavium.com
Subject: [PATCH][AArch64] Update alignment for -mcpu=generic
With -mcpu=generic the loop alignment is currently 4. All but one of
ping
From: Wilco Dijkstra
Sent: 12 April 2017 13:50
To: GCC Patches
Cc: nd; James Greenhalgh
Subject: [PATCH][AArch64] Set jump alignment to 4 for Cortex cores
Set jump alignment to 4 for Cortex cores as it reduces codesize by 0.4% on
average
with no obvious performance difference. See
ping
From: Wilco Dijkstra
Sent: 05 April 2017 13:38
To: GCC Patches
Cc: nd; James Greenhalgh; andrew.pin...@cavium.com; Evandro Menezes;
jim.wil...@linaro.org
Subject: [PATCH][AArch64] Enable AUTOPREFETCHER_WEAK with -mcpu=generic
Many supported cores use the AUTOPREFETCHER_WEAK setting
ping
From: Wilco Dijkstra
Sent: 05 April 2017 13:29
To: GCC Patches
Cc: nd; James Greenhalgh
Subject: [PATCH][AArch64] Model Cortex-A53 load forwarding
Code scheduling for Cortex-A53 isn't as good as it could be. It turns out
code runs faster overall if we place loads and stores w
ping
From: Wilco Dijkstra
Sent: 16 March 2017 17:22
To: GCC Patches; Evandro Menezes; andrew.pin...@cavium.com;
jim.wil...@linaro.org
Cc: nd
Subject: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
Many supported cores implement fusion of AES instructions. When fusion
happens it can
ping
From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that
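To illustrate why a DImode left shift by 1 can be done with an add, here is a sketch that doubles a 64-bit value held as two 32-bit halves, adding each half to itself and propagating the carry. The type di_pair and the helper names are hypothetical; the comments naming adds/adc describe the usual ARM instruction pair for a 64-bit add and are an assumption about the expansion, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit (DImode) value held as two 32-bit halves.  */
typedef struct { uint32_t lo, hi; } di_pair;

static di_pair split (uint64_t v)
{
  di_pair r = { (uint32_t) v, (uint32_t) (v >> 32) };
  return r;
}

static uint64_t join (di_pair v)
{
  return ((uint64_t) v.hi << 32) | v.lo;
}

/* Shift left by 1 implemented as x + x on the two halves.  */
static di_pair shl1_via_add (di_pair v)
{
  di_pair r;
  r.lo = v.lo + v.lo;              /* low add, wraps modulo 2^32 (adds) */
  uint32_t carry = r.lo < v.lo;    /* carry out of the low add */
  r.hi = v.hi + v.hi + carry;      /* high add with carry in (adc) */
  return r;
}

/* Aborts if the add-based form differs from a real 64-bit shift.  */
static void check_shl1 (uint64_t v)
{
  assert (join (shl1_via_add (split (v))) == (uint64_t) (v << 1));
}
```

For example, check_shl1 (0x80000001) exercises the carry from the low half into the high half.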
ping
From: Wilco Dijkstra
Sent: 17 January 2017 18:00
To: GCC Patches
Cc: nd; Kyrylo Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove Thumb-2 iordi_not patterns
After Bernd's DImode patch [1] almost all DImode operations are expanded
early (except for -mfpu=neon). This mean
ping
From: Wilco Dijkstra
Sent: 17 January 2017 15:14
To: Richard Earnshaw; GCC Patches; James Greenhalgh
Cc: nd
Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit
Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a
declaration is an integer. So the
ping
From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
Improve the logic when setting max_insns_skipped. Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed
ping
From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
without it -255..4091. This reduces the number of addressing instructions
when using DI mode operations (such
ping
From: Wilco Dijkstra
Sent: 29 November 2016 11:05
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
applied only
ping
From: Wilco Dijkstra
Sent: 31 October 2016 18:29
To: GCC Patches
Cc: nd
Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage
This patch cleans up all code related to the frame pointer. On AArch64 we
emit a frame chain even in cases where the frame pointer is not required.
So
Hi Naveen,
> https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01368.html
This looks good to me - I have just one comment:
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13972,6 +13972,15 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn
*curr)
{
enum at
Hi Naveen,
> https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01369.html
Same comment for this part, we want to return true if we match:
+ if (SET_DEST (curr_set) != (pc_rtx)
+ || GET_CODE (SET_SRC (curr_set)) != IF_THEN_ELSE
+ || ! REG_P (XEXP (XEXP (SET_SRC (curr_set), 0), 0)
cmp w0, 42
bhi .L6
scvtf s0, w0
ret
.L6:
fmul s0, s0, s0
ret
Passes regress & bootstrap, OK for commit?
ChangeLog:
2017-04-26 Wilco Dijkstra
* config/aarch64/aarch64.md (movsi_aarch64): Remove '*' from r=w.
Hi Naveen,
This version has the same issue of claiming that all instructions should
be fused except for the cases that can be fused. You should only return
true if there is a match, not if there is not a match.
Cheers,
Wilco
which
improves the example in PR79665 by ~7%. Given it is no longer used,
remove aarch_forward_to_shift_is_not_shifted_reg.
Passes AArch64 bootstrap and regress. OK for commit?
ChangeLog:
2017-04-27 Wilco Dijkstra
PR target/79665
* config/arm/aarch-common.c
Richard Earnshaw wrote:
> - 5, /* Max cond insns. */
> + 2, /* Max cond insns. */
> This parameter is also used for A32 code. Is that really the right
> number there as well?
Yes, this parameter has always been
Instead of:
ldr w0, [x0]
dup v0.2s, w0
ret
ChangeLog:
2017-09-13 Wilco Dijkstra
* gcc.target/aarch64/vmov_n_1.c: Update dup scan-assembler.
--
diff --git a/gcc/testsuite/gcc.target/aarch64/vmov_n_1.c
b/gcc/testsuite/gcc.target/aarch64/vmov_
Hi Charlie,
I can't see any use for adding a bus width to tune params. There are many
different buses in a modern CPU, so there is no such thing as a single
"bus width".
What we need is to add separate costs for the different kinds of loads and
stores. The timings for these depend mostly on the m
Jeff Law wrote:
> On 09/06/2017 03:55 AM, Jackson Woodruff wrote:
> > On 08/30/2017 01:46 PM, Richard Biener wrote:
>>> rdivtmp = 1 / (y*C);
>>> tem = x *rdivtmp;
>>> tem2= z * rdivtmp;
>>>
>>> instead of
>>>
>>> rdivtmp = 1/y;
>>> tem = x * 1/C * rdivtmp;
>>> tem2 = z * 1/C * rdivtmp;
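The two forms quoted above can be written out in C as follows. This is a sketch with hypothetical names (quot_pair, div_combined, div_split); it only shows which reciprocal is shared in each form. In general both rewrites change floating-point rounding compared with a plain division, so the sample values used here are chosen to be exact in binary.

```c
#include <assert.h>

/* Results of x / (y * C) and z / (y * C).  */
struct quot_pair { double first, second; };

/* First form: one reciprocal of (y * C), shared by both quotients.  */
static struct quot_pair div_combined (double x, double z, double y, double c)
{
  assert (y * c != 0.0);
  double rdivtmp = 1.0 / (y * c);
  struct quot_pair r = { x * rdivtmp, z * rdivtmp };
  return r;
}

/* Second form: reciprocal of y only; the constant factor 1/C is
   applied separately in each quotient.  */
static struct quot_pair div_split (double x, double z, double y, double c)
{
  assert (y != 0.0 && c != 0.0);
  double rdivtmp = 1.0 / y;
  double inv_c = 1.0 / c;   /* a compile-time constant when C is constant */
  struct quot_pair r = { x * inv_c * rdivtmp, z * inv_c * rdivtmp };
  return r;
}
```

With x = 8, z = 12, y = 2 and C = 4, both forms give 1.0 and 1.5; the second form costs one extra multiply per quotient.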
Steve Ellcey wrote:
> And in aarch64 rtl expansion I see:
>
> (insn 10 9 11 (set (reg:QI 81)
> (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8]))
> "pr77729.c":3 -1
> (nil))
Yes using QI/HI mode anywhere in the RTL seems perverse and incorrect given
AArch64
doesn't suppo
Marc Glisse wrote:
> The question is whether, having computed c=a/b, it is cheaper to test a<b or c!=0.
> I think it is usually the second one, but not for all types on all targets.
> Although since
> you mention VRP, it is easier to do further optimizations using the
> information a
Richard Sandiford wrote:
> I don't think it's literally always. Testing the inputs instead of a
> multi-use result tends to mean that all three are live at once. If the
> == 0 condition is only one component of a more complex condition that
> relies on the result of division regardless, then it'
Hi,
Here is the list of my AArch64 patches for review:
* https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02040.html (Fix unwinding with
-fomit-frame-pointer)
* https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01216.html (Fix symbol offset
limit)
* https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00396.
James Greenhalgh wrote:
> This seems like a bit of a theoretical issue as we would normally build
> libgcc with -fno-omit-frame-pointer anyway, but it can't hurt to guarantee
> this, so OK.
It's not theoretical since there were multiple users reporting unwinding issues,
so clearly doing CFLAGS="-