Re: [RFC][PATCH] Remove a bad use of SLOW_UNALIGNED_ACCESS

2016-11-02 Thread Wilco Dijkstra
Richard Biener wrote: On Tue, Nov 1, 2016 at 10:39 PM, Wilco Dijkstra wrote: > > If bswap is false no byte swap is needed, so we found a native endian load > > and it will always perform the optimization by inserting an unaligned load. > > Yes, the general agreement is that t

Re: [PATCH v2][AArch64] Fix symbol offset limit

2016-11-02 Thread Wilco Dijkstra
    ping From: Wilco Dijkstra Sent: 12 September 2016 15:50 To: Richard Earnshaw; GCC Patches Cc: nd Subject: Re: [PATCH v2][AArch64] Fix symbol offset limit     Wilco wrote:    > The original example is from GCC itself, the fixed_regs array is small but > due to > optimization w

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-11-02 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 02 September 2016 12:31 To: Ramana Radhakrishnan; GCC Patches Cc: nd Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation     Ramana Radhakrishnan wrote: > Can you please file a PR for this and add some testcases ?  This sounds like

Re: [PATCH][AArch64] Improve SHA1 scheduling

2016-11-02 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 25 October 2016 18:08 To: GCC Patches Cc: nd Subject: [PATCH][AArch64] Improve SHA1 scheduling   SHA1H instructions may be scheduled after a SHA1C instruction that uses the same input register.  However SHA1C updates its input, so if SHA1H is scheduled after

Re: [PATCH][AArch64] Improve SHA1 scheduling

2016-11-03 Thread Wilco Dijkstra
Andrew Pinski wrote: > On Tue, Oct 25, 2016 at 10:08 AM, Wilco Dijkstra > wrote: > > SHA1H instructions may be scheduled after a SHA1C instruction > > that uses the same input register.  However SHA1C updates its input, > > so if SHA1H is scheduled after it, i

[PATCH][ARM] Fix ldrd offsets

2016-11-03 Thread Wilco Dijkstra
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020, without -255..4091. This reduces the number of addressing instructions when using DI mode operations (such as in PR77308). Bootstrap & regress OK. ChangeLog: 2015-11-03 Wilco Dijkstra gcc/ * config/arm/a

Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr

2016-11-03 Thread Wilco Dijkstra
Hi, The patch looks correct; however, I would suggest rewriting this bit of the code urgently in a separate patch, as it is way too complex to assert it is now bug free - there are too many possible failure scenarios to list... Also it generates quite inefficient code - pushable_regs should include

Re: [PATCH][AArch64] Increase code alignment

2016-06-30 Thread Wilco Dijkstra
Evandro Menezes wrote: On 06/29/16 07:59, James Greenhalgh wrote: > On Tue, Jun 21, 2016 at 02:39:23PM +0100, Wilco Dijkstra wrote: >> ping >> >> >> From: Wilco Dijkstra >> Sent: 03 June 2016 11:51 >> To: GCC Patches >> Cc: nd; philipp.toms...@theobrom

[PATCH][AArch64] Improve Cortex-A53 integer scheduler

2016-07-05 Thread Wilco Dijkstra
This patch improves the accuracy of the Cortex-A53 integer scheduler, resulting in performance gains across a wide range of benchmarks. OK for commit? ChangeLog: 2016-07-05 Wilco Dijkstra * config/arm/cortex-a53.md: Use final_presence_set for in-order. (cortex_a53_shift): Add

[PATCH][ARM][Testsuite] Fix prototype in vst1Q_laneu64-1.c

2016-07-06 Thread Wilco Dijkstra
Fix prototype in vst1Q_laneu64-1.c to unsigned char* so it passes. Committed as trivial fix. ChangeLog 2016-07-06 Wilco Dijkstra gcc/testsuite/ * gcc.target/arm/vst1Q_laneu64-1.c (foo): Use unsigned char*. --- diff --git a/gcc/testsuite/gcc.target/arm/vst1Q_laneu64-1.c b/gcc

[PATCH 1/3][AArch64] Improve zero extend

2016-07-19 Thread Wilco Dijkstra
This patchset improves zero extend costs and code generation. When zero extending a 32-bit register, we emit a "mov", but currently report the cost of the "mov" incorrectly. In terms of speed, we currently say the cost is that of an extend operation. But the cost of a "mov" is the cost of 1 instr

[PATCH 2/3][AArch64] Improve zero extend

2016-07-19 Thread Wilco Dijkstra
When zero extending a 32-bit value to 64 bits, there should always be a SET operation on the outside, according to the patterns in aarch64.md. However, the mid-end can also ask for the cost of a made-up instruction, where the zero-extend is part of another operation, not SET. In this case we curre

[PATCH 3/3][AArch64] Improve zero extend

2016-07-19 Thread Wilco Dijkstra
where UBFM has the same performance as AND, and minor speedups across several benchmarks on an implementation where UBFM is slower than AND. Bootstrapped and tested on aarch64-none-elf. 2016-07-19 Kristina Martsenko 2016-07-19 Wilco Dijkstra * config/aarch64/aarch64.md

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > I'm not sure about this, while rtx_cost is called recursively as it > walks the RTL, I'd normally expect the outer levels of the recursion to > catch the cases where zero-extend is folded into a more complex > operation.  Hitting a case like this suggests that something is

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > Why does combine care what the cost is if the instruction isn't valid? No idea. Combine does lots of odd things that don't make sense to me. Unfortunately the costs we give for cases like this need to be accurate or they negatively affect code quality. The reason for thi

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > Both of which look reasonable to me. Yes the code we generate for these examples is fine, I don't believe this example ever went bad. It's just the cost calculation that is incorrect with the outer check. Wilco

Re: [PATCH 2/3][AArch64] Improve zero extend

2016-07-20 Thread Wilco Dijkstra
Richard Earnshaw wrote: > So under what circumstances does it lead to sub-optimal code? If the cost is incorrect Combine can make the wrong decision, for example whether to emit a multiply-add or not. I'm not sure whether this still happens as Kyrill fixed several issues in Combine since this patc

[PATCH][AArch64] Cleanup frame push/pop code

2016-07-26 Thread Wilco Dijkstra
This patch improves the readability of the prolog and epilog code by moving some code into separate functions. There is no difference in generated code. OK for commit? ChangeLog: 2016-07-26 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_pushwb_pair_reg): Rename

[PATCH][AArch64] Optimize prolog/epilog

2016-07-29 Thread Wilco Dijkstra
(if frame_pointer_needed) 4. stp reg3, reg4, [sp, callee_offset + N*16] (store remaining callee-saves) 5. sub sp, sp, final_adjust The epilog reverses this, and may omit step 3 if alloca wasn't used. Bootstrap, GCC & gdb regression OK. ChangeLog: 2016-07-29 Wilco

[PATCH][ARM] Remove movdi_vfp_cortexa8

2016-11-29 Thread Wilco Dijkstra
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid unnecessary duplication and repeating bugs like PR78439 due to changes being applied only to one of the duplicates. Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit? ChangeLog: 2016-11-29 Wilco Dijkstra

[PATCH] Remove uninitialized reads of is_leaf

2016-11-29 Thread Wilco Dijkstra
correct uses of leaf_function_p from the ARM backend. Bootstrap OK (verified all reads of is_leaf in ARM backend are now after initialization), OK for commit? ChangeLog: 2016-11-29 Wilco Dijkstra * gcc/ira.c (ira_setup_eliminable_regset): Initialize crtl->is_leaf. (ira): Move init

Re: [PATCH] Remove uninitialized reads of is_leaf

2016-11-29 Thread Wilco Dijkstra
Jeff Law wrote: > On 11/29/2016 04:10 AM, Wilco Dijkstra wrote: > > GCC caches whether a function is a leaf in crtl->is_leaf. Using this > > in the backend is best as leaf_function_p may not work correctly (e.g. while > > emitting prolog or epilog code).  I fo

Re: [PATCH] Remove uninitialized reads of is_leaf

2016-11-29 Thread Wilco Dijkstra
Jeff Law wrote: > On 11/29/2016 11:39 AM, Wilco Dijkstra wrote: > > I forgot to ask, would it be reasonable to add an assert to check we're not > > in > > a sequence in leaf_function_p? I guess this will trigger on several targets > > (leaf_function_p is used in s

Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2016-11-30 Thread Wilco Dijkstra
Bernd Edlinger wrote: > On 11/29/16 16:06, Wilco Dijkstra wrote: > > Bernd Edlinger wrote: > > > > -  "TARGET_32BIT && reload_completed > > +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed) > > &&a

[PATCH][ARM] Improve Thumb allocation order

2016-11-30 Thread Wilco Dijkstra
(long long a, long long b) { if (a < b) return 1; return a + b; } cmp r0, r2 sbcs ip, r1, r3 ite ge addge r0, r0, r2 movlt r0, #1 bx lr Bootstrap OK. CSibe benchmarks unchanged. ChangeLog: 2016-11-30 Wilco Dijks

[PATCH][ARM] Merge negdi2 patterns

2016-11-30 Thread Wilco Dijkstra
t for PR77308). This should generate identical code in all cases. ChangeLog: 2016-11-30 Wilco Dijkstra * gcc/config/arm/arm.md (subsi3_carryin): Add Thumb-2 RSC #0. (arm_negdi2) Rename to negdi2, allow on Thumb-2. * gcc/config/arm/thumb2.md (thumb2_negdi2): Remove pa

[PATCH][ARM] Remove uses of leaf_function_p

2016-12-05 Thread Wilco Dijkstra
ion differences unless there was a bug due to leaf_function_p returning the wrong value. Bootstrap OK. ChangeLog: 2016-12-05 Wilco Dijkstra * gcc/config/arm/arm.h (TARGET_BACKTRACE): Use crtl->is_leaf. * gcc/config/arm/arm.c (arm_option_check_internal): Improve c

Re: [PATCH][ARM] Merge negdi2 patterns

2016-12-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 30 November 2016 17:39 To: GCC Patches Cc: nd; Bernd Edlinger Subject: [PATCH][ARM] Merge negdi2 patterns   The negdi2 patterns for ARM and Thumb-2 are duplicated because Thumb-2 doesn't support RSC with an immediate.  We can however emulate RSC with

Re: [PATCH][ARM] Improve Thumb allocation order

2016-12-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 30 November 2016 17:32 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Improve Thumb allocation order   Thumb uses a special register allocation order to increase the use of low registers.  Oddly enough, LR appears before R12, which means that LR must be saved

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2016-12-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 29 November 2016 11:05 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8   Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid unnecessary duplication and repeating bugs like PR78439 due to changes being applied only

Re: [PATCH][ARM] Improve max_insns_skipped logic

2016-12-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 10 November 2016 17:19 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Improve max_insns_skipped logic   Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed

Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2016-12-06 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 31 October 2016 18:29 To: GCC Patches Cc: nd Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage     This patch cleans up all code related to the frame pointer.  On AArch64 we emit a frame chain even in cases where the frame pointer is not required

Re: [PATCH v2][AArch64] Fix symbol offset limit

2016-12-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 12 September 2016 15:50 To: Richard Earnshaw; GCC Patches Cc: nd Subject: Re: [PATCH v2][AArch64] Fix symbol offset limit   Wilco wrote:    > The original example is from GCC itself, the fixed_regs array is small but > due to > optimization we c

Re: [PATCH][ARM] Fix ldrd offsets

2016-12-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 03 November 2016 12:20 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Fix ldrd offsets   Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020, without -255..4091.  This reduces the number of addressing instructions when using DI mode operations (such

Re: [PATCH][AArch64] Improve SHA1 scheduling

2016-12-06 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 25 October 2016 18:08 To: GCC Patches Cc: nd Subject: [PATCH][AArch64] Improve SHA1 scheduling     SHA1H instructions may be scheduled after a SHA1C instruction that uses the same input register.  However SHA1C updates its input, so if SHA1H is scheduled

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-12-06 Thread Wilco Dijkstra
    ping From: Wilco Dijkstra Sent: 02 September 2016 12:31 To: Ramana Radhakrishnan; GCC Patches Cc: nd Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation     Ramana Radhakrishnan wrote: > Can you please file a PR for this and add some testcases ?  This sounds like

Re: [PATCH][AArch64] Improve TI mode address offsets

2016-12-06 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 11 November 2016 13:14 To: Richard Earnshaw; GCC Patches Cc: nd Subject: Re: [PATCH][AArch64] Improve TI mode address offsets   Richard Earnshaw wrote: > Has this patch been truncated?  The last line above looks to be part-way > through a hunk. Oops

Re: [PATCH][AArch64] Improve SHA1 scheduling

2016-12-07 Thread Wilco Dijkstra
James Greenhalgh wrote: > I haven't seen a follow-up to Andrew's point regarding other > read-modify-write operations. > > Did you investigate the cost of these? I looked at whether there are other similar cases, but it appears SHA1 is unique due to the odd dataflow, the mismatch in latencies a

[PATCH][AArch64] Fix PR78733

2016-12-08 Thread Wilco Dijkstra
LDP with a PC-relative address if aarch64_pcrelative_literal_loads is true. Bootstrap passes with aarch64_pcrelative_literal_loads=true. ChangeLog: 2015-12-08 Wilco Dijkstra PR target/78733 * config/aarch64/aarch64.c (aarch64_classify_address): Set load_store_pair_p

Re: [PATCH][AArch64] Fix PR78733

2016-12-08 Thread Wilco Dijkstra
James Greenhalgh wrote: > > I presume you also made a testsuite run? > > You should be able to do something like: > >  make check RUNTESTFLAGS="--target_board=unix/-mpc-relative-literal-loads" Yes the results of that looked OK, the 250 new failures are gone. I've committed the fix. Wilco

Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Wilco Dijkstra
Bernd wrote: > Hmm, it probably doesn't hurt, but looking at the PR I think the originally > reported problem > suggests you need a different fix: a separate register class to be used for > indirect sibling calls. > I remember seeing similar issues on other targets. The only safe way to bloc

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2016-12-14 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 29 November 2016 11:05 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8     Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid unnecessary duplication and repeating bugs like PR78439 due to changes being

Re: [PATCH][ARM] Merge negdi2 patterns

2016-12-14 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 30 November 2016 17:39 To: GCC Patches Cc: nd; Bernd Edlinger Subject: [PATCH][ARM] Merge negdi2 patterns     The negdi2 patterns for ARM and Thumb-2 are duplicated because Thumb-2 doesn't support RSC with an immediate.  We can however emulate RSC

Re: [PATCH][ARM] Improve max_insns_skipped logic

2016-12-14 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 10 November 2016 17:19 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Improve max_insns_skipped logic     Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are

Re: [PATCH][ARM] Improve Thumb allocation order

2016-12-14 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 30 November 2016 17:32 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Improve Thumb allocation order     Thumb uses a special register allocation order to increase the use of low registers.  Oddly enough, LR appears before R12, which means that LR must be

Re: [PATCH][ARM] Remove uses of leaf_function_p

2016-12-14 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 05 December 2016 14:52 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Remove uses of leaf_function_p   Using leaf_function_p in a backend is dangerous as it incorrectly returns false if it is called while in a sequence (for example during prolog/epilog

Re: [PATCH][ARM] Fix ldrd offsets

2016-12-14 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 03 November 2016 12:20 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Fix ldrd offsets     Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020, without -255..4091.  This reduces the number of addressing instructions when using DI mode operations

Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2016-12-14 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 31 October 2016 18:29 To: GCC Patches Cc: nd Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage     This patch cleans up all code related to the frame pointer.  On AArch64 we emit a frame chain even in cases where the frame pointer is not required. So

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-12-14 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 02 September 2016 12:31 To: Ramana Radhakrishnan; GCC Patches Cc: nd Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation     Ramana Radhakrishnan wrote: > Can you please file a PR for this and add some testcases ?  This sounds like > a s

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2016-12-14 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > On 14/12/16 16:37, Wilco Dijkstra wrote: > > > Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid > > unnecessary duplication and repeating bugs like PR78439 due to changes being > > applied only to one of the duplicates. >

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2016-12-19 Thread Wilco Dijkstra
Ramana Radhakrishnan wrote: > On Wed, Dec 14, 2016 at 5:43 PM, Wilco Dijkstra > wrote: > > Yes, the reason to split the pattern was to introduce the '!' to discourage > > Neon->int moves on Cortex-A8 (https://patches.linaro.org/patch/541/). I am > > not re

Re: [PATCH, ARM] Further improve stack usage in sha512, part 2 (PR 77308)

2016-12-20 Thread Wilco Dijkstra
Bernd Edlinger wrote: > this splits the *arm_negdi2, *arm_cmpdi_insn and *arm_cmpdi_unsigned > also at split1 except for TARGET_NEON and TARGET_IWMMXT. > > In the new test case the stack is reduced to about 270 bytes, except > for neon and iwmmxt, where this does not change anything. This looks od

Re: [AArch64] Emit division using the Newton series

2016-04-27 Thread Wilco Dijkstra
James Greenhalgh wrote: > So this is off for all cores currently supported by GCC? > > I'm not sure I understand why we should take this if it will immediately > be dead code? I presume it was meant to have the vector variants enabled with -mcpu=exynos-m1 as that is where you can get a good gain

Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-04-27 Thread Wilco Dijkstra
James Greenhalgh wrote: > So the part of this patch removing the fallthrough to general operand > is not OK for trunk. > > The other parts look reasonable to me, please resubmit just those. Right, I removed the removal of the fallthrough. Here is the revised version: ChangeLog: 2016-

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-28 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > On 25/04/16 20:21, Wilco Dijkstra wrote: > > The GCC switch expansion is awful, so > > even with a good indirect predictor it is better to use conditional > > branches. > > In what way is it awful? If there's something we can do better at &g

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-05-04 Thread Wilco Dijkstra
Richard Biener wrote: > > Yeah ;) I'm currently bootstrapping/testing the patch that makes it possible > to > write all this in match.pd. So did that pass bootstrap? It would be good to decide how to proceed with this. Wilco
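
For readers skimming the thread, the transformation named in the subject rests on a standard C library property: strchr (s, 0) returns a pointer to the terminating NUL, which is the same address as s + strlen (s). The sketch below only illustrates that equivalence; the helper names are made up for this example and are not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: locate the end of the string the way the
   original source code would have written it. */
static const char *end_via_strchr(const char *s)
{
    return strchr(s, 0);
}

/* Hypothetical helper: the equivalent form the optimization would
   produce, using strlen instead of a character search. */
static const char *end_via_strlen(const char *s)
{
    return s + strlen(s);
}
```

Both helpers return the same pointer for any valid C string, which is why a compiler is free to pick whichever form is cheaper on the target.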

Re: Enabling -frename-registers?

2016-05-04 Thread Wilco Dijkstra
Bernd Schmidt wrote: > On 05/04/2016 03:25 PM, Ramana Radhakrishnan wrote: >> On ARM / AArch32 I haven't seen any performance data yet - the one place we >> are concerned >> about the impact is on Thumb2 code size as regrename may end up >> inadvertently putting more >> things in high registers

Re: Enabling -frename-registers?

2016-05-05 Thread Wilco Dijkstra
Ramana Radhakrishnan wrote: > > Can you file a bugzilla entry with a testcase that folks can look at please ? I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70961. Unfortunately I don't have a simple testcase that I can share. Wilco

Re: [PATCH 3/3] shrink-wrap: Remove complicated simple_return manipulations

2016-05-10 Thread Wilco Dijkstra
>> The new version does not seem better, as it adds a branch on the path >> and it is not smaller. > > That looks like bb-reorder isn't doing its job? Maybe it thinks that > pop is too expensive to copy? It relies on static branch probabilities, which are set completely wrong in GCC, so it ends u

Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-05-16 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 27 April 2016 17:39 To: James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd Subject: Re: [PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand James Greenhalgh wrote: > So the p

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-16 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 22 April 2016 17:15 To: gcc-patches@gcc.gnu.org Cc: nd Subject: [PATCH][AArch64] Improve aarch64_case_values_threshold setting GCC expands switch statements in a very simplistic way and tries to use a table expansion even

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-16 Thread Wilco Dijkstra
James Greenhalgh wrote: > As this change will change code generation for all cores (except > Exynos-M1), I'd like to hear from those with more detailed knowledge of > ThunderX, X-Gene and qdf24xx before I take this patch. > > Let's give it another week or so for comments, and expand the CC list. N

Re: [PATCH][AArch64] Adjust SIMD integer preference

2016-05-17 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 22 April 2016 16:35 To: gcc-patches@gcc.gnu.org Cc: nd Subject: [PATCH][AArch64] Adjust SIMD integer preference SIMD operations like combine prefer to have their operands in FP registers, so increase the cost of integer

Re: [PATCH][AArch64] Improve aarch64_modes_tieable_p

2016-05-17 Thread Wilco Dijkstra
James Greenhalgh wrote: > It would be handy if you could raise something in bugzilla for the > register allocator deficiency. The register allocation issues are well known and we have multiple workarounds for this in place. When you allow modes to be tieable the workarounds are not as effective.

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-05-18 Thread Wilco Dijkstra
Richard Biener wrote: > > Yeah ;) I'm currently bootstrapping/testing the patch that makes it possible > to > write all this in match.pd. So what was the conclusion? Improving match.pd to be able to handle more cases like this seems like a nice thing. Wilco

[PATCH][AArch64] Remove aarch64_cannot_change_mode_class

2016-05-19 Thread Wilco Dijkstra
Remove aarch64_cannot_change_mode_class as the underlying issue (PR67609) has been resolved. This avoids a few unnecessary lane widening operations like: faddp d18, v18.2d mov d18, v18.d[0] Passes regress, OK for commit? ChangeLog: 2016-05-19 Wilco Dijkstra * gcc/config

Re: [PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-05-24 Thread Wilco Dijkstra
Jim Wilson wrote: > It looks like a slight lose on qdf24xx on SPEC CPU2006 at -O3. I see > about a 0.37% loss on the integer benchmarks, and no significant > change on the FP benchmarks. The integer loss is mainly due to > 458.sjeng which drops 2%. We had tried various values for > max_case_valu

Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation

2015-08-12 Thread Wilco Dijkstra
Richard Henderson wrote: > However, the way that aarch64 and alpha have done it hasn't > been ideal, in that there's a fairly costly search that must > be done every time. I've thought before about changing this > so that we would be able to cache results, akin to how we do > it in expmed.c for mu

RE: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation

2015-08-25 Thread Wilco Dijkstra
> Richard Henderson wrote: > On 08/12/2015 08:59 AM, Wilco Dijkstra wrote: > > I looked at the statistics of AArch64 immediate generation a while ago. > > The interesting thing is ~95% of calls are queries, and the same query is on > > average repeated 10 times in a row. So
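
The observation that the same query tends to be repeated several times in a row suggests why even a one-entry cache could pay off. The following is only a sketch of that idea under stated assumptions: slow_insn_count is a hypothetical stand-in for an expensive immediate search (here it merely counts non-zero 16-bit chunks), and none of these names come from GCC or from the patches in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Counts how often the expensive path actually runs. */
static int slow_calls;

/* Hypothetical stand-in for a costly immediate search.  As a crude
   model it counts the non-zero 16-bit chunks of the constant, with a
   minimum of one instruction. */
static int slow_insn_count(uint64_t imm)
{
    int chunks = 0;

    slow_calls++;
    for (int shift = 0; shift < 64; shift += 16)
        if ((imm >> shift) & 0xffff)
            chunks++;
    return chunks ? chunks : 1;
}

/* One-entry cache: remember only the most recent query, which is
   enough when identical queries arrive back to back. */
static int cached_insn_count(uint64_t imm)
{
    static int have_last;
    static uint64_t last_imm;
    static int last_count;

    if (!have_last || last_imm != imm) {
        last_count = slow_insn_count(imm);
        last_imm = imm;
        have_last = 1;
    }
    return last_count;
}
```

With this shape, ten consecutive calls for the same constant run the slow path once; a different constant evicts the entry.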

[PATCH][AArch64][0/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
This is a set of patches to reduce the compile-time overhead of immediate generation on AArch64. There have been discussions and investigations into reducing the overhead of immediate generation using various caching strategies. However the statistics showed some of the expensive immediate loops

[PATCH][AArch64][1/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
checks the mask is repeated across the full 64 bits. Native performance is 5-6x faster on typical queries. No change in generated code, passes GCC regression/bootstrap. ChangeLog: 2015-09-02 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (aarch64_bitmask_imm): Reimplement using
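
As background for the repetition check mentioned above: an AArch64 bitmask (logical) immediate is an element of 2, 4, 8, 16, 32 or 64 bits, consisting of a rotated run of ones, replicated across the full 64 bits. The sketch below is a deliberately simple checker written for this listing, not the implementation from the patch; the function name is_bitmask_imm is an assumption, and it favours clarity over the speed the patch is after.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative checker (not GCC's code): returns true if VAL can be
   encoded as an AArch64 logical immediate. */
static bool is_bitmask_imm(uint64_t val)
{
    for (unsigned size = 2; size <= 64; size *= 2) {
        uint64_t mask = size == 64 ? ~0ULL : (1ULL << size) - 1;
        uint64_t elt = val & mask;

        /* Replicate the low element across all 64 bits and require
           that it reproduces VAL exactly. */
        uint64_t rep = elt;
        for (unsigned s = size; s < 64; s *= 2)
            rep |= rep << s;
        if (rep != val)
            continue;

        /* Rotate the element right by one bit within SIZE bits.  A
           single (possibly wrapped) run of ones has exactly two
           positions where neighbouring bits differ. */
        uint64_t rot = ((elt >> 1) | (elt << (size - 1))) & mask;
        uint64_t edges = elt ^ rot;
        uint64_t rest = edges & (edges - 1);   /* clear lowest set bit */

        if (edges != 0 && rest != 0 && (rest & (rest - 1)) == 0)
            return true;
    }
    return false;
}
```

All-zeros and all-ones are rejected because they contain no edges at any element size, which matches the architectural rule that neither value is encodable.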

[PATCH][AArch64][2/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
tests/bootstrap. ChangeLog: 2015-09-02 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate): Replace slow immediate matching loops with a faster algorithm. --- gcc/config/aarch64/aarch64.c | 96 +++- 1 file

[PATCH][AArch64][3/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
Remove aarch64_bitmasks, aarch64_build_bitmask_table and aarch64_bitmasks_cmp as they are no longer used by the immediate generation code. No change in generated code, passes GCC regression tests/bootstrap. ChangeLog: 2015-09-02 Wilco Dijkstra * gcc/config/aarch64/aarch64.c

[PATCH][AArch64][4/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
used instead of add/sub (codesize remains the same). ChangeLog: 2015-09-02 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate): Remove redundant immediate generation code. --- gcc/config/aarch64/aarch64.c | 60

[PATCH][AArch64][5/5] Improve immediate generation

2015-09-02 Thread Wilco Dijkstra
generated code for some special cases but codesize is identical. ChangeLog: 2015-09-02 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (aarch64_internal_mov_immediate): Cleanup immediate generation code. --- gcc/config/aarch64/aarch64.c | 137

RFC: Combine of compare & and oddity

2015-09-02 Thread Wilco Dijkstra
Hi, Combine canonicalizes certain AND masks in a comparison with zero into extracts of the widest register type. During matching these are expanded into a very inefficient sequence that fails to match. For example (x & 2) == 0 is matched in combine like this: Failed to match this instruction: (
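
To make the (x & 2) == 0 example concrete, here is a small C sketch of the two equivalent source-level forms: the single-bit mask compared with zero, and a one-bit extract at position 1, which is roughly the shape an extract-based canonicalization corresponds to. The function names are invented for illustration, and the snippet says nothing about the RTL that combine actually builds.

```c
#include <assert.h>
#include <stdbool.h>

/* The form from the example: AND with a single-bit mask, compare
   with zero. */
static bool bit1_clear_mask(unsigned x)
{
    return (x & 2) == 0;
}

/* The same test expressed as a one-bit extract at bit position 1. */
static bool bit1_clear_extract(unsigned x)
{
    return ((x >> 1) & 1) == 0;
}
```

Both functions agree for every input, so a backend pattern that matches either form covers the same source construct.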

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote: > Hi Wilco, > > On Wed, Sep 02, 2015 at 06:09:24PM +0100, Wilco Dijkstra wrote: > > Combine canonicalizes certain AND masks in a comparison with zero into > > extracts of the > widest > > register type. During matching these are

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote: > On Thu, Sep 03, 2015 at 12:43:34PM +0100, Wilco Dijkstra wrote: > > > > Combine canonicalizes certain AND masks in a comparison with zero into > > > > extracts of the > > > widest > > > > register t

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Kyrill Tkachov wrote: > A testcase I was looking at is: > int > foo (int a) > { > return (a & 7) != 0; > } > > For me this generates: > and w0, w0, 7 > cmp w0, wzr > cset w0, ne > ret > > when it could be: > tst w0, 7 > cs

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Oleg Endo wrote: > On 04 Sep 2015, at 01:54, Segher Boessenkool > wrote: > > > On Thu, Sep 03, 2015 at 05:25:43PM +0100, Kyrill Tkachov wrote: > >>> void g(void); > >>> void f(int *x) { if (*x & 2) g(); } > > > >> A testcase I was looking at is: > >> int > >> foo (int a) > >> { > >> return (a

RE: RFC: Combine of compare & and oddity

2015-09-03 Thread Wilco Dijkstra
> Segher Boessenkool wrote: > On Thu, Sep 03, 2015 at 10:09:36AM -0600, Jeff Law wrote: > > >>You will end up with a *lot* of target hooks like this. It will also > > >>make testing harder (less coverage). I am not sure that is a good idea. > > > > > >We certainly need a lot more target hooks in

RE: [0/7] Type promotion pass and elimination of zext/sext

2015-09-07 Thread Wilco Dijkstra
> Kugan wrote: > 2. vector-compare-1.c from c-c++-common/torture fails to assemble with > -O3 -g Error: unaligned opcodes detected in executable segment. It works > fine if I remove the -g. I am looking into it and needs to be fixed as well. This is a known assembler bug I found a while back, Renl

RE: [0/7] Type promotion pass and elimination of zext/sext

2015-09-07 Thread Wilco Dijkstra
> pins...@gmail.com wrote: > > On Sep 7, 2015, at 7:22 PM, Kugan wrote: > > > > > > > > On 07/09/15 20:46, Wilco Dijkstra wrote: > >>> Kugan wrote: > >>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with > >>&g

RE: [0/7] Type promotion pass and elimination of zext/sext

2015-09-08 Thread Wilco Dijkstra
> Renlin Li wrote: > Hi Andrew, > > Previously, there was a discussion thread on the binutils mailing list: > > https://sourceware.org/ml/binutils/2015-04/msg00032.html > > Nick proposed a way to fix it; Richard Henderson holds a similar opinion to yours. Both Nick and Richard H seem to think it is an issue

[PATCH][AArch64] Tweak Cortex-A57 vector cost

2016-11-10 Thread Wilco Dijkstra
from vectorizing this loop is around 15-30% which shows vectorizing it is indeed beneficial. ChangeLog: 2016-11-10 Wilco Dijkstra * config/aarch64/aarch64.c (cortexa57_vector_cost): Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost. -- diff --git a/gcc/config

[PATCH 1/2][AArch64] Add bfx attribute

2016-11-10 Thread Wilco Dijkstra
ference in code generation. ChangeLog: 2016-11-10 Wilco Dijkstra * config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3) Use bfx attribute. (aarch64_lshr_sisd_or_int_3): Likewise. (aarch64_ashr_sisd_or_int_3): Likewise. (si3_insn_uxtw): Likewise.

[PATCH 2/2][AArch64] Add bfx attribute

2016-11-10 Thread Wilco Dijkstra
pipelines, so swap the bfm and extend reservations. This results in minor scheduling differences. I think the XGene-1 scheduler might need a similar change as currently all AArch64 shifts are modelled as 2-cycle operations. ChangeLog: 2016-11-10 Wilco Dijkstra * config/arm/cortex-a57.md

[PATCH][AArch64] Improve TI mode address offsets

2016-11-10 Thread Wilco Dijkstra
& regress OK. ChangeLog: 2015-11-10 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (movti_aarch64): Change Ump to m. (movtf_aarch64): Likewise. * config/aarch64/aarch64.c (aarch64_classify_address): Use correct intersect

[PATCH][ARM] Improve max_insns_skipped logic

2016-11-10 Thread Wilco Dijkstra
Improve the logic when setting max_insns_skipped. Limit the maximum size of IT to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed, increasing codesize. Given 4 works well for Thumb-2, use the same limit for ARM for consistency. ChangeLog: 2016-11-04 Wilco Dijkstra

Re: [PATCH][ARM] Fix ldrd offsets

2016-11-11 Thread Wilco Dijkstra
Ramana Radhakrishnan wrote: > On Thu, Nov 3, 2016 at 12:20 PM, Wilco Dijkstra > wrote: >   HOST_WIDE_INT val = INTVAL (index); > - /* ??? Can we assume ldrd for thumb2?  */ > - /* Thumb-2 ldrd only has reg+const addressing modes.  */ > - /* ldr

Re: [PATCH][ARM] Improve max_insns_skipped logic

2016-11-11 Thread Wilco Dijkstra
Richard Earnshaw wrote: > On 10/11/16 17:19, Wilco Dijkstra wrote: > > Improve the logic when setting max_insns_skipped.  Limit the maximum size > > of IT > > to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed, > > increasing codesize. 

Re: [PATCH][AArch64] Improve TI mode address offsets

2016-11-11 Thread Wilco Dijkstra
Richard Earnshaw wrote: > Has this patch been truncated?  The last line above looks to be part-way > through a hunk. Oops sorry, it seems the last few lines are missing. Here is the full version: diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 3045e6d6447d5c1860fe

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-11-14 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 02 November 2016 16:49 To: Ramana Radhakrishnan; GCC Patches Cc: nd Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation ping From: Wilco Dijkstra Sent: 02 September 2016 12:31 To: Ramana Radhakrishnan; GCC Patches Cc: nd Subject: Re

Re: [PATCH][AArch64] Improve SHA1 scheduling

2016-11-14 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 25 October 2016 18:08 To: GCC Patches Cc: nd Subject: [PATCH][AArch64] Improve SHA1 scheduling     SHA1H instructions may be scheduled after a SHA1C instruction that uses the same input register.  However SHA1C updates its input, so if SHA1H is scheduled after

Re: [PATCH v2][AArch64] Fix symbol offset limit

2016-11-14 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 12 September 2016 15:50 To: Richard Earnshaw; GCC Patches Cc: nd Subject: Re: [PATCH v2][AArch64] Fix symbol offset limit Wilco wrote: > The original example is from GCC itself, the fixed_regs array is small but > due to > optimization we c

Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2016-11-14 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 31 October 2016 18:29 To: GCC Patches Cc: nd Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage   This patch cleans up all code related to the frame pointer.  On AArch64 we emit a frame chain even in cases where the frame pointer is not required. So

Re: [PATCH][ARM] Improve max_insns_skipped logic

2016-11-14 Thread Wilco Dijkstra
Wilco Dijkstra wrote: > Richard Earnshaw wrote: > On 10/11/16 17:19, Wilco Dijkstra wrote: > Long conditional sequences are slow on modern cores - the value 6 for > max_insns_skipped is a few decades out of date as it was meant for ARM2! > Even with -Os the performance loss for l

Re: [PATCH v2] aarch64: Add split-stack initial support

2016-11-15 Thread Wilco Dijkstra
On 07/11/2016 16:59, Adhemerval Zanella wrote: > On 14/10/2016 15:59, Wilco Dijkstra wrote: > There is no limit afaik on gold split stack allocation handling, > and I think one could be added for each backend (in the method > override require to implement it). > > In fac

Re: [PATCH 1/4 v2][AArch64] Generalize CCMP support

2015-12-15 Thread Wilco Dijkstra
-11-12 Wilco Dijkstra * gcc/target.def (gen_ccmp_first): Update documentation. (gen_ccmp_next): Likewise. * gcc/doc/tm.texi (gen_ccmp_first): Update documentation. (gen_ccmp_next): Likewise. * gcc/ccmp.c (expand_ccmp_expr): Extract cmp_code from return

RE: [PATCH 2/4 v2][AArch64] Add support for FCCMP

2015-12-15 Thread Wilco Dijkstra
ping > -Original Message- > From: Wilco Dijkstra [mailto:wilco.dijks...@arm.com] > Sent: 17 November 2015 18:36 > To: gcc-patches@gcc.gnu.org > Subject: [PATCH 2/4 v2][AArch64] Add support for FCCMP > > (v2 version removes 4 enums) > > This patch adds support
