Richard Biener wrote:
On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra wrote:
> > If bswap is false no byte swap is needed, so we found a native endian load
> > and it will always perform the optimization by inserting an unaligned load.
>
> Yes, the general agreement is that t
ping
From: Wilco Dijkstra
Sent: 12 September 2016 15:50
To: Richard Earnshaw; GCC Patches
Cc: nd
Subject: Re: [PATCH v2][AArch64] Fix symbol offset limit
Wilco wrote:
> The original example is from GCC itself, the fixed_regs array is small but
> due to
> optimization w
ping
From: Wilco Dijkstra
Sent: 02 September 2016 12:31
To: Ramana Radhakrishnan; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation
Ramana Radhakrishnan wrote:
> Can you please file a PR for this and add some testcases ? This sounds like
ping
From: Wilco Dijkstra
Sent: 25 October 2016 18:08
To: GCC Patches
Cc: nd
Subject: [PATCH][AArch64] Improve SHA1 scheduling
SHA1H instructions may be scheduled after a SHA1C instruction
that uses the same input register. However SHA1C updates its input,
so if SHA1H is scheduled after
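For context, here is a minimal C sketch (hypothetical, not part of the patch) of the
SHA1C/SHA1H data flow being described, written with the ACLE crypto intrinsics and
assuming the crypto extension is enabled (e.g. -march=armv8-a+crypto):

#include <arm_neon.h>

/* SHA1H derives the rotated E value from the current ABCD state (lane 0),
   while SHA1C consumes and updates that same ABCD state, so the two
   instructions share an input register.  */
uint32x4_t
sha1_round (uint32x4_t abcd, uint32_t e, uint32x4_t wk, uint32_t *e_out)
{
  *e_out = vsha1h_u32 (vgetq_lane_u32 (abcd, 0));
  return vsha1cq_u32 (abcd, e, wk);
}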
Andrew Pinski wrote:
> On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra
> wrote:
> > SHA1H instructions may be scheduled after a SHA1C instruction
> > that uses the same input register. However SHA1C updates its input,
> > so if SHA1H is scheduled after it, i
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020; without it,
-255..4091. This reduces the number of addressing instructions when using
DI mode operations (such as in PR77308).
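For illustration (hypothetical example, not from the patch), a DI mode access whose
offset fits in the Thumb-2 ldrd range can then use a single ldrd instead of needing
a separate address calculation:

struct state
{
  char pad[512];
  long long counter;   /* DI mode field at offset 512 */
};

long long
get_counter (struct state *s)
{
  /* Offset 512 is a multiple of 4 and within +-1020, so on Thumb-2 with
     TARGET_LDRD this can be something like "ldrd r2, r3, [r0, #512]".  */
  return s->counter;
}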
Bootstrap & regress OK.
ChangeLog:
2015-11-03 Wilco Dijkstra
gcc/
* config/arm/a
Hi,
The patch looks correct; however, I would suggest rewriting this bit of the code
urgently in a separate patch, as it is way too complex to assert it is now bug
free - there are too many possible failure scenarios to list... Also it generates
quite inefficient code - pushable_regs should include
Evandro Menezes wrote:
On 06/29/16 07:59, James Greenhalgh wrote:
> On Tue, Jun 21, 2016 at 02:39:23PM +0100, Wilco Dijkstra wrote:
>> ping
>>
>>
>> From: Wilco Dijkstra
>> Sent: 03 June 2016 11:51
>> To: GCC Patches
>> Cc: nd; philipp.toms...@theobrom
This patch improves the accuracy of the Cortex-A53 integer scheduler,
resulting in performance gains across a wide range of benchmarks.
OK for commit?
ChangeLog:
2016-07-05 Wilco Dijkstra
* config/arm/cortex-a53.md: Use final_presence_set for in-order.
(cortex_a53_shift): Add
Fix prototype in vst1Q_laneu64-1.c to unsigned char* so it passes.
Committed as trivial fix.
ChangeLog
2016-07-06 Wilco Dijkstra
gcc/testsuite/
* gcc.target/arm/vst1Q_laneu64-1.c (foo): Use unsigned char*.
---
diff --git a/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c
b/gcc
This patchset improves zero extend costs and code generation.
When zero extending a 32-bit register, we emit a "mov", but currently
report the cost of the "mov" incorrectly.
In terms of speed, we currently say the cost is that of an extend
operation. But the cost of a "mov" is the cost of 1 instr
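For reference, a minimal example of the standalone case being costed (hypothetical,
not from the patch):

/* Zero-extending a 32-bit register to 64 bits on AArch64 is a single
   register move (e.g. "mov w0, w0" - writing a W register clears the
   upper 32 bits), so its cost should be that of one move rather than
   of an extend operation.  */
unsigned long long
zext (unsigned int x)
{
  return x;
}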
When zero extending a 32-bit value to 64 bits, there should always be a
SET operation on the outside, according to the patterns in aarch64.md.
However, the mid-end can also ask for the cost of a made-up instruction,
where the zero-extend is part of another operation, not SET.
In this case we curre
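And a sketch of the folded case (hypothetical, not from the patch), where the
zero_extend is part of a larger operation rather than a plain SET:

/* The zero-extend of A is typically folded into the add itself
   (e.g. "add x0, x1, w0, uxtw"), so its cost in this context differs
   from the standalone mov case above.  */
unsigned long long
add_zext (unsigned int a, unsigned long long b)
{
  return b + a;
}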
where UBFM has
the same performance as AND, and minor speedups across several
benchmarks on an implementation where UBFM is slower than AND.
Bootstrapped and tested on aarch64-none-elf.
2016-07-19 Kristina Martsenko
2016-07-19 Wilco Dijkstra
* config/aarch64/aarch64.md
Richard Earnshaw wrote:
> I'm not sure about this, while rtx_cost is called recursively as it
> walks the RTL, I'd normally expect the outer levels of the recursion to
> catch the cases where zero-extend is folded into a more complex
> operation. Hitting a case like this suggests that something is
Richard Earnshaw wrote:
> Why does combine care what the cost is if the instruction isn't valid?
No idea. Combine does lots of odd things that don't make sense to me.
Unfortunately the costs we give for cases like this need to be accurate or
they negatively affect code quality. The reason for thi
Richard Earnshaw wrote:
> Both of which look reasonable to me.
Yes, the code we generate for these examples is fine; I don't believe this
example ever went bad. It's just the cost calculation that is incorrect with
the outer check.
Wilco
Richard Earnshaw wrote:
> So under what circumstances does it lead to sub-optimal code?
If the cost is incorrect, Combine can make the wrong decision, for example
whether to emit a multiply-add or not. I'm not sure whether this still happens
as Kyrill fixed several issues in Combine since this patc
This patch improves the readability of the prolog and epilog code by moving
some
code into separate functions. There is no difference in generated code.
OK for commit?
ChangeLog:
2016-07-26 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.c (aarch64_pushwb_pair_reg): Rename
(if frame_pointer_needed)
4. stp reg3, reg4, [sp, callee_offset + N*16] (store remaining callee-saves)
5. sub sp, sp, final_adjust
The epilog reverses this, and may omit step 3 if alloca wasn't used.
Bootstrap, GCC & gdb regression OK.
ChangeLog:
2016-07-29 Wilco
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
applied only to one of the duplicates.
Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit?
ChangeLog:
2016-11-29 Wilco Dijkstra
correct uses
of leaf_function_p from the ARM backend.
Bootstrap OK (verified all reads of is_leaf in ARM backend are now after
initialization), OK for commit?
ChangeLog:
2016-11-29 Wilco Dijkstra
* gcc/ira.c (ira_setup_eliminable_regset): Initialize crtl->is_leaf.
(ira): Move init
Jeff Law wrote:
> On 11/29/2016 04:10 AM, Wilco Dijkstra wrote:
> > GCC caches whether a function is a leaf in crtl->is_leaf. Using this
> > in the backend is best as leaf_function_p may not work correctly (eg. while
> > emitting prolog or epilog code).
I fo
Jeff Law wrote:
> On 11/29/2016 11:39 AM, Wilco Dijkstra wrote:
> > I forgot to ask, would it be reasonable to add an assert to check we're not
> > in
> > a sequence in leaf_function_p? I guess this will trigger on several targets
> > (leaf_function_p is used in s
Bernd Edlinger wrote:
> On 11/29/16 16:06, Wilco Dijkstra wrote:
> > Bernd Edlinger wrote:
> >
> > - "TARGET_32BIT && reload_completed
> > + "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
> > &&a
(long long a, long long b)
{
if (a < b) return 1;
return a + b;
}
cmp r0, r2
sbcs ip, r1, r3
ite ge
addge r0, r0, r2
movlt r0, #1
bx lr
Bootstrap OK. CSibe benchmarks unchanged.
ChangeLog:
2016-11-30 Wilco Dijks
t
for PR77308). This should generate identical code in all cases.
ChangeLog:
2016-11-30 Wilco Dijkstra
* gcc/config/arm/arm.md (subsi3_carryin): Add Thumb-2 RSC #0.
(arm_negdi2) Rename to negdi2, allow on Thumb-2.
* gcc/config/arm/thumb2.md (thumb2_negdi2): Remove pa
ion differences
unless there was a bug due to leaf_function_p returning the wrong value.
Bootstrap OK.
ChangeLog:
2016-12-05 Wilco Dijkstra
* gcc/config/arm/arm.h (TARGET_BACKTRACE): Use crtl->is_leaf.
* gcc/config/arm/arm.c (arm_option_check_internal): Improve c
ping
From: Wilco Dijkstra
Sent: 30 November 2016 17:39
To: GCC Patches
Cc: nd; Bernd Edlinger
Subject: [PATCH][ARM] Merge negdi2 patterns
The negdi2 patterns for ARM and Thumb-2 are duplicated because Thumb-2
doesn't support RSC with an immediate. We can however emulate RSC with
ping
From: Wilco Dijkstra
Sent: 30 November 2016 17:32
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve Thumb allocation order
Thumb uses a special register allocation order to increase the use of low
registers. Oddly enough, LR appears before R12, which means that LR must
be saved
ping
From: Wilco Dijkstra
Sent: 29 November 2016 11:05
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
applied only
ping
From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
Improve the logic when setting max_insns_skipped. Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed
ping
From: Wilco Dijkstra
Sent: 31 October 2016 18:29
To: GCC Patches
Cc: nd
Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage
This patch cleans up all code related to the frame pointer. On AArch64 we
emit a frame chain even in cases where the frame pointer is not required
ping
From: Wilco Dijkstra
Sent: 12 September 2016 15:50
To: Richard Earnshaw; GCC Patches
Cc: nd
Subject: Re: [PATCH v2][AArch64] Fix symbol offset limit
Wilco wrote:
> The original example is from GCC itself, the fixed_regs array is small but
> due to
> optimization we c
ping
From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020; without it,
-255..4091. This reduces the number of addressing instructions
when using DI mode operations (such
ping
From: Wilco Dijkstra
Sent: 25 October 2016 18:08
To: GCC Patches
Cc: nd
Subject: [PATCH][AArch64] Improve SHA1 scheduling
SHA1H instructions may be scheduled after a SHA1C instruction
that uses the same input register. However SHA1C updates its input,
so if SHA1H is scheduled
ping
From: Wilco Dijkstra
Sent: 02 September 2016 12:31
To: Ramana Radhakrishnan; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation
Ramana Radhakrishnan wrote:
> Can you please file a PR for this and add some testcases ? This sounds like
ping
From: Wilco Dijkstra
Sent: 11 November 2016 13:14
To: Richard Earnshaw; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64] Improve TI mode address offsets
Richard Earnshaw wrote:
> Has this patch been truncated? The last line above looks to be part-way
> through a hunk.
Oops
James Greenhalgh wrote:
> I haven't seen a follow-up to Andrew's point regarding other
> read-modify-write operations.
>
> Did youi investigate the cost of these?
I looked at whether there are other similar cases, but it appears SHA1
is unique due to the odd dataflow, the mismatch in latencies a
LDP with a PC-relative address if aarch64_pcrelative_literal_loads
is true.
Bootstrap passes with aarch64_pcrelative_literal_loads=true.
ChangeLog:
2015-12-08 Wilco Dijkstra
PR target/78733
* config/aarch64/aarch64.c (aarch64_classify_address):
Set load_store_pair_p
James Greenhalgh wrote:
>
> I presume you also made a testsuite run?
>
> You should be able to do something like:
>
> make check RUNTESTFLAGS="--target_board=unix/-mpc-relative-literal-loads"
Yes, the results of that looked OK; the 250 new failures are gone. I've
committed the fix.
Wilco
Bernd wrote:
> Hmm, it probably doesn't hurt, but looking at the PR I think the originally
> reported problem
> suggests you need a different fix: a separate register class to be used for
> indirect sibling calls.
> I remember seeing similar issues on other targets.
The only safe way to bloc
ping
From: Wilco Dijkstra
Sent: 29 November 2016 11:05
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
ping
From: Wilco Dijkstra
Sent: 30 November 2016 17:39
To: GCC Patches
Cc: nd; Bernd Edlinger
Subject: [PATCH][ARM] Merge negdi2 patterns
The negdi2 patterns for ARM and Thumb-2 are duplicated because Thumb-2
doesn't support RSC with an immediate. We can however emulate RSC
ping
From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
Improve the logic when setting max_insns_skipped. Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are
ping
From: Wilco Dijkstra
Sent: 30 November 2016 17:32
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve Thumb allocation order
Thumb uses a special register allocation order to increase the use of low
registers. Oddly enough, LR appears before R12, which means that LR must
be
ping
From: Wilco Dijkstra
Sent: 05 December 2016 14:52
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Remove uses of leaf_function_p
Using leaf_function_p in a backend is dangerous as it incorrectly returns
false if it is called while in a sequence (for example during prolog/epilog
ping
From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020; without it,
-255..4091. This reduces the number of addressing instructions
when using DI mode operations
ping
From: Wilco Dijkstra
Sent: 31 October 2016 18:29
To: GCC Patches
Cc: nd
Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage
This patch cleans up all code related to the frame pointer. On AArch64 we
emit a frame chain even in cases where the frame pointer is not required.
So
ping
From: Wilco Dijkstra
Sent: 02 September 2016 12:31
To: Ramana Radhakrishnan; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation
Ramana Radhakrishnan wrote:
> Can you please file a PR for this and add some testcases ? This sounds like
> a s
Kyrill Tkachov wrote:
> On 14/12/16 16:37, Wilco Dijkstra wrote:
>
> > Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
> > unnecessary duplication and repeating bugs like PR78439 due to changes being
> > applied only to one of the duplicates.
>
Ramana Radhakrishnan wrote:
> On Wed, Dec 14, 2016 at 5:43 PM, Wilco Dijkstra
> wrote:
> > Yes, the reason to split the pattern was to introduce the '!' to discourage
> > Neon->int moves on Cortex-A8 (https://patches.linaro.org/patch/541/). I am
> > not re
Bernd Edlinger wrote:
> this splits the *arm_negdi2, *arm_cmpdi_insn and *arm_cmpdi_unsigned
> also at split1 except for TARGET_NEON and TARGET_IWMMXT.
>
> In the new test case the stack is reduced to about 270 bytes, except
> for neon and iwmmxt, where this does not change anything.
This looks od
James Greenhalgh wrote:
> So this is off for all cores currently supported by GCC?
>
> I'm not sure I understand why we should take this if it will immediately
> be dead code?
I presume it was meant to have the vector variants enabled with -mcpu=exynos-m1
as that is where you can get a good gain
James Greenhalgh wrote:
> So the part of this patch removing the fallthrough to general operand
> is not OK for trunk.
>
> The other parts look reasonable to me, please resubmit just those.
Right, I removed the removal of the fallthrough. Here is the revised version:
ChangeLog:
2016-
Kyrill Tkachov wrote:
> On 25/04/16 20:21, Wilco Dijkstra wrote:
> > The GCC switch expansion is awful, so
> > even with a good indirect predictor it is better to use conditional
> > branches.
>
> In what way is it awful? If there's something we can do better at
&g
Richard Biener wrote:
>
> Yeah ;) I'm currently bootstrapping/testing the patch that makes it possible
> to
> write all this in match.pd.
So did that pass bootstrap? It would be good to decide how to proceed with this.
Wilco
Bernd Schmidt wrote:
> On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote:
>> On ARM / AArch32 I haven't seen any performance data yet - the one place we
>> are concerned
>> about the impact is on Thumb2 code size as regrename may end up
>> inadvertently putting more
>> things in high registers
Ramana Radhakrishnan wrote:
>
> Can you file a bugzilla entry with a testcase that folks can look at please ?
I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961. Unfortunately
I don't have a simple testcase that I can share.
Wilco
>> The new version does not seem better, as it adds a branch on the path
>> and it is not smaller.
>
> That looks like bb-reorder isn't doing its job? Maybe it thinks that
> pop is too expensive to copy?
It relies on static branch probabilities, which are set completely wrong in GCC,
so it ends u
ping
From: Wilco Dijkstra
Sent: 27 April 2016 17:39
To: James Greenhalgh
Cc: gcc-patches@gcc.gnu.org; nd
Subject: Re: [PATCH][AArch64] print_operand should not fallthrough from
register operand into generic operand
James Greenhalgh wrote:
> So the p
ping
From: Wilco Dijkstra
Sent: 22 April 2016 17:15
To: gcc-patches@gcc.gnu.org
Cc: nd
Subject: [PATCH][AArch64] Improve aarch64_case_values_threshold setting
GCC expands switch statements in a very simplistic way and tries to use a table
expansion even
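For context, a sketch (hypothetical) of the kind of statement this tuning affects:
the case_values_threshold hook returns the smallest number of case values for which
a jump table is preferred over a tree of conditional branches.

int
dispatch (int x)
{
  switch (x)
    {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    default: return 0;
    }
}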
James Greenhalgh wrote:
> As this change will change code generation for all cores (except
> Exynos-M1), I'd like to hear from those with more detailed knowledge of
> ThunderX, X-Gene and qdf24xx before I take this patch.
>
> Let's give it another week or so for comments, and expand the CC list.
N
ping
From: Wilco Dijkstra
Sent: 22 April 2016 16:35
To: gcc-patches@gcc.gnu.org
Cc: nd
Subject: [PATCH][AArch64] Adjust SIMD integer preference
SIMD operations like combine prefer to have their operands in FP registers,
so increase the cost of integer
James Greenhalgh wrote:
> It would be handy if you could raise something in bugzilla for the
> register allocator deficiency.
The register allocation issues are well known and we have multiple
workarounds for this in place. When you allow modes to be tieable
the workarounds are not as effective.
Richard Biener wrote:
>
> Yeah ;) I'm currently bootstrapping/testing the patch that makes it possible
> to
> write all this in match.pd.
So what was the conclusion? Improving match.pd to be able to handle more cases
like this seems like a nice thing.
Wilco
Remove aarch64_cannot_change_mode_class as the underlying issue
(PR67609) has been resolved. This avoids a few unnecessary lane
widening operations like:
faddp d18, v18.2d
mov d18, v18.d[0]
Passes regress, OK for commit?
ChangeLog:
2016-05-19 Wilco Dijkstra
* gcc/config
Jim Wilson wrote:
> It looks like a slight lose on qdf24xx on SPEC CPU2006 at -O3. I see
> about a 0.37% loss on the integer benchmarks, and no significant
> change on the FP benchmarks. The integer loss is mainly due to
> 458.sjeng which drops 2%. We had tried various values for
> max_case_valu
Richard Henderson wrote:
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time. I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for mu
> Richard Henderson wrote:
> On 08/12/2015 08:59 AM, Wilco Dijkstra wrote:
> > I looked at the statistics of AArch64 immediate generation a while ago.
> > The interesting thing is ~95% of calls are queries, and the same query is on
> > average repeated 10 times in a row. So
This is a set of patches to reduce the compile-time overhead of immediate
generation on AArch64.
There have been discussions and investigations into reducing the overhead of
immediate generation
using various caching strategies. However the statistics showed some of the
expensive immediate
loops
checks the mask is repeated across the full 64 bits. Native performance is 5-6x
faster on typical
queries.
No change in generated code, passes GCC regression/bootstrap.
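For illustration, a minimal sketch (not the actual GCC implementation) of the
replication test mentioned above - checking whether a 64-bit value is the same
element repeated across all 64 bits:

#include <stdbool.h>
#include <stdint.h>

/* Return true if VAL is its low SIZE bits replicated across 64 bits,
   for SIZE in {2, 4, 8, 16, 32}.  The real bitmask-immediate check must
   additionally verify that the element is a rotated run of ones.  */
static bool
repeats_at (uint64_t val, unsigned size)
{
  uint64_t elt = val & (((uint64_t) 1 << size) - 1);
  uint64_t rep = 0;
  for (unsigned i = 0; i < 64; i += size)
    rep |= elt << i;
  return rep == val;
}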
ChangeLog:
2015-09-02 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c (aarch64_bitmask_imm):
Reimplement using
tests/bootstrap.
ChangeLog:
2015-09-02 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
Replace slow immediate matching loops with a faster algorithm.
---
gcc/config/aarch64/aarch64.c | 96 +++-
1 file
Remove aarch64_bitmasks, aarch64_build_bitmask_table and aarch64_bitmasks_cmp
as they are no longer
used by the immediate generation code.
No change in generated code, passes GCC regression tests/bootstrap.
ChangeLog:
2015-09-02 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c
used
instead of add/sub (codesize remains the same).
ChangeLog:
2015-09-02 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
Remove redundant immediate generation code.
---
gcc/config/aarch64/aarch64.c | 60
generated code for some
special cases but
codesize is identical.
ChangeLog:
2015-09-02 Wilco Dijkstra
* gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate):
Cleanup immediate generation code.
---
gcc/config/aarch64/aarch64.c | 137
Hi,
Combine canonicalizes certain AND masks in a comparison with zero into extracts
of the widest
register type. During matching these are expanded into a very inefficient
sequence that fails to
match. For example (x & 2) == 0 is matched in combine like this:
Failed to match this instruction:
(
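For reference, a single-bit test of the kind being discussed (the same style of
testcase appears in the replies below):

/* (x & 2) == 0 tests one bit; combine canonicalizes the AND mask into a
   zero_extract in the widest mode, which the backend needs to match so
   that a single tst-style instruction can be used.  */
int
bit1_clear (int x)
{
  return (x & 2) == 0;
}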
> Segher Boessenkool wrote:
> Hi Wilco,
>
> On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote:
> > Combine canonicalizes certain AND masks in a comparison with zero into
> > extracts of the
> widest
> > register type. During matching these are
> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote:
> > > > Combine canonicalizes certain AND masks in a comparison with zero into
> > > > extracts of the
> > > widest
> > > > register t
> Kyrill Tkachov wrote:
> A testcase I was looking at is:
> int
> foo (int a)
> {
>    return (a & 7) != 0;
> }
>
> For me this generates:
> and w0, w0, 7
> cmp w0, wzr
> cset w0, ne
> ret
>
> when it could be:
> tst w0, 7
> cs
> Oleg Endo wrote:
> On 04 Sep 2015, at 01:54, Segher Boessenkool
> wrote:
>
> > On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote:
> >>> void g(void);
> >>> void f(int *x) { if (*x & 2) g(); }
> >
> >> A testcase I was looking at is:
> >> int
> >> foo (int a)
> >> {
> >> return (a
> Segher Boessenkool wrote:
> On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote:
> > >>You will end up with a *lot* of target hooks like this. It will also
> > >>make testing harder (less coverage). I am not sure that is a good idea.
> > >
> > >We certainly need a lot more target hooks in
> Kugan wrote:
> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
> -O3 -g Error: unaligned opcodes detected in executable segment. It works
> fine if I remove the -g. I am looking into it and needs to be fixed as well.
This is a known assembler bug I found a while back, Renl
> pins...@gmail.com wrote:
> > On Sep 7, 2015, at 7:22 PM, Kugan wrote:
> >
> >
> >
> > On 07/09/15 20:46, Wilco Dijkstra wrote:
> >>> Kugan wrote:
> >>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with
> >>&g
> Renlin Li wrote:
> Hi Andrew,
>
> Previously, there is a discussion thread in binutils mailing list:
>
> https://sourceware.org/ml/binutils/2015-04/msg00032.html
>
> Nick proposed a way to fix, Richard Henderson hold similar opinion as you.
Both Nick and Richard H seem to think it is an issue
from vectorizing this loop is around 15-30% which shows vectorizing it is
indeed beneficial.
ChangeLog:
2016-11-10 Wilco Dijkstra
* config/aarch64/aarch64.c (cortexa57_vector_cost):
Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.
--
diff --git a/gcc/config
ference in code
generation.
ChangeLog:
2016-11-10 Wilco Dijkstra
* config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3)
Use bfx attribute.
(aarch64_lshr_sisd_or_int_3): Likewise.
(aarch64_ashr_sisd_or_int_3): Likewise.
(si3_insn_uxtw): Likewise.
pipelines, so swap the bfm
and extend reservations. This results in minor scheduling differences.
I think the XGene-1 scheduler might need a similar change as currently all
AArch64
shifts are modelled as 2-cycle operations.
ChangeLog:
2016-11-10 Wilco Dijkstra
* config/arm/cortex-a57.md
Bootstrap & regress OK.
ChangeLog:
2015-11-10 Wilco Dijkstra
gcc/
* config/aarch64/aarch64.md (movti_aarch64): Change Ump to m.
(movtf_aarch64): Likewise.
* config/aarch64/aarch64.c (aarch64_classify_address):
Use correct intersect
Improve the logic when setting max_insns_skipped. Limit the maximum size of IT
blocks to MAX_INSN_PER_IT_BLOCK, as otherwise multiple IT instructions are
needed, increasing codesize. Given 4 works well for Thumb-2, use the same limit
for ARM for consistency.
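For illustration (hypothetical example, not from the patch), a short conditional
body such as the one below can be if-converted into a single Thumb-2 IT block,
whereas longer conditional sequences would need several IT instructions:

int
clamp_add (int a, int b)
{
  /* The then-branch is a single conditional instruction, so it fits
     comfortably within one IT block after if-conversion.  */
  if (a > b)
    a = b;
  return a + b;
}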
ChangeLog:
2016-11-04 Wilco Dijkstra
Ramana Radhakrishnan wrote:
> On Thu, Nov 3, 2016 at 12:20 PM, Wilco Dijkstra
> wrote:
> HOST_WIDE_INT val = INTVAL (index);
> - /* ??? Can we assume ldrd for thumb2? */
> - /* Thumb-2 ldrd only has reg+const addressing modes. */
> - /* ldr
Richard Earnshaw wrote:
> On 10/11/16 17:19, Wilco Dijkstra wrote:
> > Improve the logic when setting max_insns_skipped. Limit the maximum size
> > of IT
> > to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed,
> > increasing codesize.
Richard Earnshaw wrote:
> Has this patch been truncated? The last line above looks to be part-way
> through a hunk.
Oops sorry, it seems the last few lines are missing. Here is the full version:
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3045e6d6447d5c1860fe
ping
From: Wilco Dijkstra
Sent: 02 November 2016 16:49
To: Ramana Radhakrishnan; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation
ping
From: Wilco Dijkstra
Sent: 02 September 2016 12:31
To: Ramana Radhakrishnan; GCC Patches
Cc: nd
Subject: Re
ping
From: Wilco Dijkstra
Sent: 25 October 2016 18:08
To: GCC Patches
Cc: nd
Subject: [PATCH][AArch64] Improve SHA1 scheduling
SHA1H instructions may be scheduled after a SHA1C instruction
that uses the same input register. However SHA1C updates its input,
so if SHA1H is scheduled after
ping
From: Wilco Dijkstra
Sent: 12 September 2016 15:50
To: Richard Earnshaw; GCC Patches
Cc: nd
Subject: Re: [PATCH v2][AArch64] Fix symbol offset limit
Wilco wrote:
> The original example is from GCC itself, the fixed_regs array is small but
> due to
> optimization we c
ping
From: Wilco Dijkstra
Sent: 31 October 2016 18:29
To: GCC Patches
Cc: nd
Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage
This patch cleans up all code related to the frame pointer. On AArch64 we
emit a frame chain even in cases where the frame pointer is not required.
So
Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
> On 10/11/16 17:19, Wilco Dijkstra wrote:
> Long conditional sequences are slow on modern cores - the value 6 for
> max_insns_skipped is a few decades out of date as it was meant for ARM2!
> Even with -Os the performance loss for l
On 07/11/2016 16:59, Adhemerval Zanella wrote:
> On 14/10/2016 15:59, Wilco Dijkstra wrote:
> There is no limit afaik on gold split stack allocation handling,
> and I think one could be added for each backend (in the method
> override require to implement it).
>
> In fac
-11-12 Wilco Dijkstra
* gcc/target.def (gen_ccmp_first): Update documentation.
(gen_ccmp_next): Likewise.
* gcc/doc/tm.texi (gen_ccmp_first): Update documentation.
(gen_ccmp_next): Likewise.
* gcc/ccmp.c (expand_ccmp_expr): Extract cmp_code from return
ping
> -----Original Message-----
> From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com]
> Sent: 17 November 2015 18:36
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH 2/4 v2][AArch64] Add support for FCCMP
>
> (v2 version removes 4 enums)
>
> This patch adds support