Re: [PATCH][AArch64] Simplify aarch64_can_eliminate

2017-08-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 07 August 2017 15:13 To: GCC Patches; James Greenhalgh Cc: nd; Richard Earnshaw Subject: [PATCH][AArch64] Simplify aarch64_can_eliminate   Simplify aarch64_can_eliminate - if we need a frame pointer, we must eliminate to HARD_FRAME_POINTER_REGNUM.  Rather than

Re: [PATCH][AArch64] Introduce emit_frame_chain

2017-08-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 04 August 2017 13:26 To: GCC Patches; James Greenhalgh Cc: nd Subject: [PATCH][AArch64] Introduce emit_frame_chain   The current frame code combines the separate concepts of a frame chain (saving old FP,LR in a record and pointing new FP to it) and a frame

Re: [PATCH][AArch64] Improve addressing of TI/TFmode

2017-08-15 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 20 July 2017 13:49 To: GCC Patches; James Greenhalgh Cc: nd Subject: [PATCH][AArch64] Improve addressing of TI/TFmode     In  https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01125.html Jiong pointed out some addressing inefficiencies due to a recent change in

Re: [PATCH][AArch64] Remove aarch64_frame_pointer_required

2017-08-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 04 August 2017 13:41 To: GCC Patches; James Greenhalgh Cc: nd Subject: [PATCH][AArch64] Remove aarch64_frame_pointer_required   To implement -fomit-leaf-frame-pointer, there are 2 places where we need to check whether we have to use a frame chain (since

Re: [PATCH][AArch64] Improve aarch64_legitimate_constant_p

2017-08-15 Thread Wilco Dijkstra
stack spills. SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%. Bootstrap OK, OK for commit? ChangeLog: 2017-07-07  Wilco Dijkstra      * config/aarch64/aarch64.c (aarch64_legitimate_constant_p):     Return true for more constants, symbols and label references

Re: [PATCH][AArch64] Simplify frame layout for stack probing

2017-08-15 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 25 July 2017 14:58 To: GCC Patches; James Greenhalgh; Jeff Law Cc: nd Subject: [PATCH][AArch64] Simplify frame layout for stack probing     This patch makes some changes to the frame layout in order to simplify stack probing.  We want to use the save of LR as

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-08-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 17 January 2017 15:14 To: Richard Earnshaw; GCC Patches; James Greenhalgh Cc: nd Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit     Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a declaration is an integer. So the

Re: [PATCH][AArch64] Remove '*' from movsi/di/ti patterns

2017-08-15 Thread Wilco Dijkstra
  ping From: Wilco Dijkstra Sent: 26 July 2017 14:46 To: GCC Patches; James Greenhalgh Cc: nd Subject: [PATCH][AArch64] Remove '*' from movsi/di/ti patterns     Remove the remaining uses of '*' from the movsi/di/ti patterns. Using '*' in alternatives is typic

Re: [PATCH][AArch64] PR71951: Fix unwinding with -fomit-frame-pointer

2017-08-15 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 31 July 2017 16:57 To: GCC Patches; James Greenhalgh Cc: nd Subject: [PATCH][AArch64] PR71951: Fix unwinding with -fomit-frame-pointer   As described in PR71951, if libgcc is built with -fomit-frame-pointer, unwinding crashes, for example while doing a

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-08-17 Thread Wilco Dijkstra
Richard Biener wrote: > On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra > wrote: > > Richard Biener wrote: >>> > We also change the association of >>> > >>> >  x / (y * C) -> (x / C) / y >>> > >>> > If C is a constant
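The reassociation quoted above, x / (y * C) -> (x / C) / y, can be sketched in C as follows. This is only an illustration: the function names are made up, C is passed as an ordinary parameter rather than a compile-time constant, and the preview is cut off before the condition under which the rewrite applies. The two forms are equal mathematically but can round differently in floating point.

```c
/* Original association: multiply y by the constant, then divide. */
double div_by_product(double x, double y, double c)
{
    return x / (y * c);
}

/* Reassociated form: divide by the constant first, then by y.
   When c is a known constant, x / c is a candidate for further
   simplification (an assumption, not stated in the preview). */
double div_reassociated(double x, double y, double c)
{
    return (x / c) / y;
}
```

For inputs such as x = 24.0, y = 3.0, C = 2.0, every intermediate value is exactly representable, so both functions return 4.0.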

Re: [PATCH v2] Simplify pow with constant

2017-08-17 Thread Wilco Dijkstra
the transformation for powf (10.0, x) in SPEC was 2.5. If we allow use of exp10 in match.pd, the ULP error would be lower. OK for commit? ChangeLog: 2017-08-17 Wilco Dijkstra * match.pd: Add pow (C, x) simplification. -- diff --git a/gcc/match.pd b/gcc/match.pd in

Re: [PATCH v2] Simplify pow with constant

2017-08-18 Thread Wilco Dijkstra
r of the transformation for powf (10.0, x) in SPEC was 2.5. If we allow use of exp10 in match.pd, the ULP error would be lower. ChangeLog: 2017-08-18 Wilco Dijkstra * match.pd: Add pow (C, x) simplification. -- diff --git a/gcc/match.pd b/gcc/match.pd index 0e36f46b914bc63c257cef47

Re: [PATCH] [Aarch64] Optimize subtract in shift counts

2017-08-22 Thread Wilco Dijkstra
Hi, The main reason we have this issue is that DImode can be treated as a vector of size 1. As a result we do not know whether the shift is an integer or SIMD instruction. One way around this is to never use the SIMD variant, another is to introduce V1DImode for vectors of size 1. Short term I be

RFC: Explicit move preference hints

2017-08-22 Thread Wilco Dijkstra
Hi, The register allocator inserts move preferences when an instruction has one or more dead sources in add_insn_allocno_copies. If an instruction doesn't have a matching constraint (eg. "0"), then any dead source is treated as a copy with all destination registers with a low priority. In reality

Re: [PATCH] Improve alloca alignment

2017-08-22 Thread Wilco Dijkstra
Jeff Law wrote: On 07/26/2017 05:29 PM, Wilco Dijkstra wrote: > > But then the check size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0 > > seems wrong too given that round_push uses a different alignment to align > > to. > I had started to dig into the history of this code,

Re: [AArch64, PATCH] Improve Neon store of zero

2017-08-23 Thread Wilco Dijkstra
Richard Sandiford wrote: > > Sorry for only noticing now, but the call to aarch64_legitimate_address_p > is asking whether the MEM itself is a legitimate LDP/STP address. Also, > it might be better to pass false for strict_p, since this can be called > before RA. So maybe: > >if (GET_CODE (op

Re: RFC: Explicit move preference hints

2017-08-23 Thread Wilco Dijkstra
Segher Boessenkool wrote: > On Tue, Aug 22, 2017 at 10:48:17AM +0000, Wilco Dijkstra wrote: > > The register allocator inserts move preferences when an instruction has > > one or more dead sources in add_insn_allocno_copies. If an instruction > > doesn't have a matching

Re: RFC: Explicit move preference hints

2017-08-24 Thread Wilco Dijkstra
Segher Boessenkool wrote: > > "0,r" might work, or "0,?r", or similar (alternatives have commas > between them). No, it doesn't work at all. But that is no surprise if you look at ira_get_dup_out_num. It iterates over the constraint string and if you have anything that matches after a "0", the "

Re: RFC: Explicit move preference hints

2017-08-24 Thread Wilco Dijkstra
Vladimir Makarov wrote: > > As I correctly understand, you just want an intuitive allocation. The > current allocation performance has the same quality as the intuitive one. Performance is affected as well but I didn't want to go into details as that distracts from the underlying issue. But if yo

Re: [PATCH v2] Simplify pow with constant

2017-08-25 Thread Wilco Dijkstra
Jeff Law wrote: > Right.  exp is painful in glibc, but pow is *dramatically* more painful > and likely always will be. > > Siddhesh did some great work in bringing those costs down in glibc but > the more code we can reasonably shunt into exp instead of pow, the better. > > It's likely pow will alw

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-04 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > -(define_insn_and_split "*iordi_notsesidi_di" > -  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r") > -   (ior:DI (not:DI (sign_extend:DI > -    (match_operand:SI 2 "s_register_operand" "r,r"))) > -   (match_operand:DI 1 "s_regis

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-04 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > > After Bernd's change almost all DI mode instructions are split before > > register > > allocation. So instructions using DI mode no longer exist and thus these > > extend variants can never be matched and are thus redundant. > > Bernd's patch splits them when we don't ha

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-09-05 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > I like the simplifications in the selection logic here :) > However, changing the value for ARM from 6 to 4 looks a bit arbitrary to me. > There's probably a reason why default values for ARM and Thumb-2 are > different > (maybe not a good one) and I'd rather not change it

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Wilco Dijkstra
Bernd Edlinger wrote: > Combine creates an invalid insn out of these two insns: Yes it looks like a latent bug. We need to use arm_general_register_operand as arm_adddi3/subdi3 only allow integer registers. You don't need a new predicate s_register_operand_nv. Also I'd prefer something like arm_

Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-05 Thread Wilco Dijkstra
Bernd Edlinger wrote: > No, the split condition does not begin with "&& TARGET_32BIT...". > Therefore the split is enabled in TARGET_NEON after reload_completed. > And it is invoked from adddi3_neon for all alternatives without vfp > registers: Hmm that's a huge mess. I'd argue that any inst_and_s

[committed][Testsuite] PR78468 - add alloca alignment test

2017-09-06 Thread Wilco Dijkstra
g the outgoing arguments or setting STACK_BOUNDARY correctly. Committed as obvious. ChangeLog: 2017-09-06 Wilco Dijkstra PR middle-end/78468 * gcc.dg/pr78468.c: Add alignment test. -- diff --git a/gcc/testsuite/gcc.dg/pr78468.c b/gcc/testsuite/gcc.dg/pr78468.c new file mode 1

Re: [PATCH] Improve alloca alignment

2017-09-08 Thread Wilco Dijkstra
Eric Botcazou wrote: > The stack is aligned before the allocation but it gets misaligned during the > allocation because the dynamic offset is not a multiple of STACK_BOUNDARY. No, the stack never gets misaligned - my patch doesn't change that at all. The issue is that Sparc backend doesn't corr

Re: [PATCH] Improve alloca alignment

2017-09-08 Thread Wilco Dijkstra
Hi Rainer, Can you post the disassembly for say the 8-byte aligned tests? It may not be built correctly or hit an offset that is accidentally aligned, however pass/fail status can't change due to my patch as it doesn't change alignment at all. Wilco

Re: [PATCH] Improve alloca alignment

2017-09-11 Thread Wilco Dijkstra
Eric Botcazou wrote:   >> No, the stack never gets misaligned - my patch doesn't change that at all. > > Yes, it does.  No. Look at the diffs, there is not a single change in alignment anywhere for all of the alloca variants. If the alignment is incorrect after my patch, it is also incorrect

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-11 Thread Wilco Dijkstra
Any further comments?   Kyrill Tkachov wrote: > > After Bernd's change almost all DI mode instructions are split before > > register > > allocation. So instructions using DI mode no longer exist and thus these > > extend variants can never be matched and are thus redundant. > > Bernd's patch

Re: [PATCH] Improve alloca alignment

2017-09-11 Thread Wilco Dijkstra
Jeff Law wrote: > On 09/09/2017 02:51 AM, Eric Botcazou wrote: > >> No, the stack never gets misaligned - my patch doesn't change that at all. > > > > Yes, it does.  Dynamic allocation works like this: the amount to be > > allocated > > is added to VIRTUAL_STACK_DYNAMIC_REGNUM and the result is

RE: [PATCH] Fix PR 81096 (ttest failures)

2017-09-12 Thread Wilco Dijkstra
Steve Ellcey wrote: > This patch fixes the ttest failures on aarch64 by adding AM_CFLAGS to > the test options, like btest already does and as Wilco says works for > him in Comment #4 of the bug report. Thanks for picking this up, this looks OK. > Tested by me on aarch64. Ok to checkin? This co

Re: [PATCH][ARM] Update max_cond_insns settings

2017-05-05 Thread Wilco Dijkstra
Richard Earnshaw (lists) wrote: > On 04/05/17 18:38, Wilco Dijkstra wrote: > > Richard Earnshaw wrote: > > >>> -  5, /* Max cond insns.  */ >>> +  2, /* Max cond insns.  */ >>

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-05-05 Thread Wilco Dijkstra
Richard Earnshaw (lists) wrote: >  (define_insn "*movdi_vfp" > -  [(set (match_operand:DI 0 "nonimmediate_di_operand" > "=r,r,r,r,q,q,m,w,r,w,w, Uv") > +  [(set (match_operand:DI 0 "nonimmediate_di_operand" > "=r,r,r,r,q,q,m,w,!r,w,w, Uv") > Why have you introduced a no-reloads block on the 9th

Re: [PATCH][AArch64] Improve Cortex-A53 shift bypass

2017-05-05 Thread Wilco Dijkstra
Richard Earnshaw (lists) wrote: > --- a/gcc/config/arm/aarch-common.c > +++ b/gcc/config/arm/aarch-common.c > @@ -254,12 +254,7 @@ arm_no_early_alu_shift_dep (rtx producer, rtx consumer) >  return 0; >  >    if ((early_op = arm_find_shift_sub_rtx (op))) > -    { > -  if (REG_P (early_op))

Re: [PATCH][AArch64] Improve float to int moves

2017-05-05 Thread Wilco Dijkstra
Richard Earnshaw (lists) wrote: > While on the subject, why is the w->w operation also hidden? No idea, this just fixes one case where it is obvious the use of '*' is incorrect. However I think all uses of '*' in md files are incorrect and the feature should be removed. '?' already exists for c

Re: [PATCH][ARM] Update max_cond_insns settings

2017-05-05 Thread Wilco Dijkstra
Richard Earnshaw (lists) wrote: > On 05/05/17 13:42, Wilco Dijkstra wrote: >> Richard Earnshaw (lists) wrote: >>> On 04/05/17 18:38, Wilco Dijkstra wrote: >>> > Richard Earnshaw wrote: >>> > >>>>> -  5, 

Re: [PATCH][AArch64] Improve float to int moves

2017-05-05 Thread Wilco Dijkstra
Richard Earnshaw (lists) wrote: > On 05/05/17 17:10, Wilco Dijkstra wrote: > > However I think all uses of '*' in md files are incorrect and the > > feature should > > be removed. '?' already exists for cases where the alternative may be > > expens

[Testsuite, committed] Fix vector peeling test failures

2017-05-08 Thread Wilco Dijkstra
This fixes a few failures on ARM and AArch64 due to a recent change in alignment peeling by switching the vector cost model off (https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00407.html). Tested on AArch64, ARM and x64 - committed as obvious. ChangeLog: 2017-05-08 Wilco Dijkstra

[Committed][AArch64] Fix PR80671

2017-05-10 Thread Wilco Dijkstra
Move a use-after-free access before the delete. Committed as obvious. ChangeLog: 2017-05-10 Wilco Dijkstra PR target/80671 * config/aarch64/cortex-a57-fma-steering.c (merge_forest): Move member access before delete. -- diff --git a/gcc/config/aarch64/cortex-a57-fma

[PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Wilco Dijkstra
, and while most appear safe or appear aware of the issue, it is likely not all such calls are safe. This check enables any such latent bugs to be found. Bootstrap OK on AArch64. 2017-05-11 Wilco Dijkstra * final.c (leaf_function_p): Check we are not in a sequence. -- diff --git a

Re: [PATCH] Add sequence check to leaf_function_p

2017-05-12 Thread Wilco Dijkstra
to > have a comment here at all?  E.g. "Ensure we walk the entire function body > after > the following get_insns call". I've changed it to "Ensure we walk the entire function body." Wilco 2017-05-11 Wilco Dijkstra * final.c (leaf_function_p): Check we

[PATCH] Fix PR80754

2017-05-17 Thread Wilco Dijkstra
erate illegal instructions with the same hard register as the destination and a clobber. Fix this by also checking for overlaps with the destination register. Bootstrap OK on arm-linux-gnueabihf for ARM and Thumb-2, OK for commit? ChangeLog: 2017-05-16 Wilco Dijkstra PR rtl-optimization/

Re: [PATCH] [Aarch64] Variable shift count truncation issues

2017-05-19 Thread Wilco Dijkstra
Richard Sandiford wrote: > Insn patterns shouldn't check can_create_pseudo_p, because there's no > guarantee that the associated split happens before RA. In this case it > should be safe to reuse operand 0 after RA if you change it to: The goal is to only create and split this pattern before reg

Re: [PING 2][PATCH] Move the check for any_condjump_p from sched-deps to target macros

2017-05-26 Thread Wilco Dijkstra
Hurugalawadi, Naveen wrote: > > Please consider this as a personal reminder to review the patch > at following link and let me know your comments on the same. > > https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00839.html That looks good to me now. Wilco

Re: [PING 2] [PATCH] [AArch64] Implement ALU_BRANCH fusion

2017-05-26 Thread Wilco Dijkstra
Hurugalawadi, Naveen wrote: > > Please consider this as a personal reminder to review the patch > at following link and let me know your comments on the same.  > > https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01333.html Looks good to me. Wilco   

Fortran bootstrap failure (was [patch, libfortran] AMD-specific versions of library matmul)

2017-05-26 Thread Wilco Dijkstra
Hi, This patch most likely broke all non-x86 targets: configure: error: conditional "HAVE_AVX128" was never defined. Usually this means the macro was only invoked conditionally. Makefile:19843: recipe for target 'configure-target-libgfortran' failed make[1]: *** [configure-target-libgfortran] Erro

[Committed][ARM] Fix ARM bootstrap failure

2017-05-30 Thread Wilco Dijkstra
qualifiers] if (d->code == (const enum arm_builtins) fcode)   ^ Avoid the warning by removing const, and bootstrap is OK again. Committed as trivial patch (r248686). ChangeLog: 2017-05-30  Wilco Dijkstra      * config/arm/a

Re: [PATCH, ARM] Further improve stack usage in sha512, part 2 (PR 77308)

2016-12-21 Thread Wilco Dijkstra
Bernd Edlinger wrote: On 12/20/16 16:09, Wilco Dijkstra wrote: > > As a result of your patches a few patterns are unused now. All the Thumb-2 > > iordi_notdi* > > patterns cannot be used anymore. Also I think arm_cmpdi_zero never gets > > used - a DI >> mode com

Re: [PATCH v2] aarch64: Add split-stack initial support

2017-01-03 Thread Wilco Dijkstra
Adhemerval Zanella wrote:   Sorry for the late reply - but I think it's getting there. A few more comments: + /* If function uses stacked arguments save the old stack value so morestack + can return it. */ + reg11 = gen_rtx_REG (Pmode, R11_REGNUM); + if (cfun->machine->frame.saved_regs_si

[PATCH][ARM] Backport - Avoid partial overlaps in DImode shifts *

2017-01-05 Thread Wilco Dijkstra
that support either full overlap or no overlap. Bootstrap & regress on arm-linux-gnueabihf OK on GCC6 branch. OK for backport? ChangeLog: 2017-01-05 Wilco Dijkstra gcc/ PR target/78041 * config/arm/neon.md (ashldi3_neon): Add "r 0 i" and "&r r i"

[PATCH][AArch64] Improve Cortex-A53 scheduling of int/fp transfers

2017-01-10 Thread Wilco Dijkstra
best schedule. As a result of these tweaks the performance of the benchmark improves by 20%. ChangeLog: 2017-01-10 Wilco Dijkstra * config/arm/cortex-a53.md: Add bypasses for cortex_a53_r2f_cvt. (cortex_a53_r2f): Only use for transfers. (cortex_a53_f2r

Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2017-01-13 Thread Wilco Dijkstra
James Greenhalgh wrote: > I've been putting off reviewing this patch for a while now, because I don't > understand enough about the current eh_return code to understand why what > you're proposing is correct. > > The best way to progress this patch would be to go in to more detail as to > what the

Re: [PATCH][AArch64 - v4] Simplify eh_return implementation

2017-01-16 Thread Wilco Dijkstra
Wilco Dijkstra PR77455 gcc/ * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter. * config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove. (EH_RETURN_HANDLER_RTX): New define. * config/aarch64/aarch64.c (aarch64_frame_pointer_required

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-01-17 Thread Wilco Dijkstra
Wilco Dijkstra wrote: > Ramana Radhakrishnan wrote: >> On Wed, Dec 14, 2016 at 5:43 PM, Wilco Dijkstra >> wrote: > > > > Yes, the reason to split the pattern was to introduce the '!' to > > > discourage Neon->int moves on Cortex-A8 > (https

Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2017-01-17 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 31 October 2016 18:29 To: GCC Patches Cc: nd Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage     This patch cleans up all code related to the frame pointer.  On AArch64 we emit a frame chain even in cases where the frame pointer is not required. So

Re: [PATCH][ARM] Fix ldrd offsets

2017-01-17 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 03 November 2016 12:20 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Fix ldrd offsets     Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020, without -255..4091.  This reduces the number of addressing instructions when using DI mode operations (such

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-01-17 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 10 November 2016 17:19 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Improve max_insns_skipped logic     Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed

Re: [PATCH][AArch64] Improve SHA1 scheduling

2017-01-17 Thread Wilco Dijkstra
Wilco Dijkstra wrote: > James Greenhalgh wrote: > > > I haven't seen a follow-up to Andrew's point regarding other > > read-modify-write operations. > > > > Did you investigate the cost of these? > > I looked at whether there are other similar cas

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-01-17 Thread Wilco Dijkstra
range for code/data between the symbol and its references. For symbols with a defined size, limit the offset to be within the size of the symbol. ChangeLog: 2017-01-17 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbo

[PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-01-17 Thread Wilco Dijkstra
ed on Thumb-2, and after this patch the orndi3_neon pattern matches instead (which still emits ORN). After this there are no Thumb-2 specific DImode patterns. [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html ChangeLog: 2017-01-17 Wilco Dijkstra * config/arm/thum

[PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-01-17 Thread Wilco Dijkstra
arm_ashrdi3_1bit and arm_lshrdi3_1bit patterns. Bootstrap OK on arm-linux-gnueabihf. ChangeLog: 2017-01-17 Wilco Dijkstra * config/arm/arm.md (ashldi3): Remove shift by 1 expansion. (arm_ashldi3_1bit): Remove pattern. (ashrdi3): Remove shift by 1 expansion

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-01-17 Thread Wilco Dijkstra
kugan wrote: > Wilco Dijkstra wrote: > > +   /* Slightly disparage left shift by 1 at so we prefer adddi3.  */ > > +   if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode)) > Your ChangeLog says decrease cost for ashldi3 by 1 but looks like it is > done

[PATCH][AArch64] Model Cortex-A53 load forwarding

2017-04-05 Thread Wilco Dijkstra
of an earlier load is used in an address calculation. This significantly improved benchmark scores in a proprietary benchmark suite. Passes AArch64 bootstrap and regress. OK for stage 1? ChangeLog: 2017-04-05 Wilco Dijkstra * config/arm/aarch-common.c (arm_early_load_addr_de

[PATCH][AArch64] Enable AUTOPREFETCHER_WEAK with -mcpu=generic

2017-04-05 Thread Wilco Dijkstra
weak model only keeps the order if it doesn't make the schedule worse, it should not impact performance adversely on cores that don't show a gain. Any objections? ChangeLog: 2017-04-05 Wilco Dijkstra * gcc/config/aarch64/aarch64.c (generic_tunings): Update prefetch model. -- di

[PATCH][AArch64] Set jump alignment to 4 for Cortex cores

2017-04-12 Thread Wilco Dijkstra
-12 Wilco Dijkstra * config/aarch64/aarch64.c (cortexa35_tunings): Set jump alignment to 4. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (cortexa72_tunings): Likewise. (cortexa73_tunings): Likewise. -- diff --git a/gcc/config/aarch64

[PATCH][AArch64] Update alignment for -mcpu=generic

2017-04-12 Thread Wilco Dijkstra
codesize cost [2], so setting it to 4 is best. This gives a 0.2% overall codesize improvement as well as performance gains in several benchmarks. Any objections? Bootstrap OK on AArch64, OK for stage 1? ChangeLog: 2017-04-12 Wilco Dijkstra * config/aarch64/aarch64.c (generic_tunings

[PATCH][ARM] Update max_cond_insns settings

2017-04-12 Thread Wilco Dijkstra
and regress OK on arm-none-linux-gnueabihf. OK for stage 1? ChangeLog: 2017-04-12 Wilco Dijkstra * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2. (arm_cortex_a35_tune): Likewise. --- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index

[PATCH][AArch64] Improve address cost for -mcpu=generic

2017-04-12 Thread Wilco Dijkstra
Wilco Dijkstra * gcc/config/aarch64/aarch64.c (generic_addrcost_table): Change HI/TI mode setting. --- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 419b756efcb40e48880cd4529efc4f9f59938325..728ce7029f1e2b5161d9f317d10e564dd5a5f472 100644 --- a

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Wilco Dijkstra
>On Wed, Apr 12, 2017 at 09:29:55AM +, Sudi Das wrote: > > Hi all > > > > This is a fix for PR 80131 > > Currently the code A << (B - C) is not simplified. >> However at least a more specific case of 1U << (C -x) where C = >> precision(type) - 1 can be simplified to (1 << C) >> x. > > Is tha
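The identity quoted above for a 32-bit unsigned type (C = 31) can be checked with a small C sketch. The function names are illustrative and not taken from the patch; both forms are only defined for 0 <= x <= 31.

```c
#include <stdint.h>

/* Original form: shift 1 left by (31 - x). */
uint32_t shift_sub_form(unsigned x)
{
    return (uint32_t)1 << (31u - x);
}

/* Simplified form: shift the constant (1 << 31) right by x,
   which drops the subtraction. */
uint32_t shift_const_form(unsigned x)
{
    return ((uint32_t)1 << 31) >> x;
}
```

The simplified form trades the subtraction for materialising the constant 0x80000000, which is the cost discussed in the follow-up replies.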

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Wilco Dijkstra
Jakub Jelinek wrote:   > No.  Some constants sometimes even 7 instructions (e.g. sparc64; not talking > in particular about 1ULL << 63 constant), or have one instruction > that is more expensive than normal small constant load.  Compare say x86_64 > movl/movq vs. movabsq, I think the latter has

Re: [PATCH][GCC] Simplification of 1U << (31 - x)

2017-04-13 Thread Wilco Dijkstra
Richard Biener wrote: > It is IMHO a valid GIMPLE optimization / canonicalization. > >    movabsq $-9223372036854775808, %rax > > so this should then have been generated as 1<<63? > > At some point variable shifts were quite expensive as well.. Yes I don't see a major difference between movabs

Re: [PATCH][AArch64] Improve address cost for -mcpu=generic

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 12 April 2017 14:08 To: GCC Patches Cc: nd; James Greenhalgh; Evandro Menezes; jim.wil...@linaro.org; andrew.pin...@cavium.com Subject: [PATCH][AArch64] Improve address cost for -mcpu=generic   All cores which add a cpu_addrcost_table use a non-zero value for

Re: [PATCH][ARM] Update max_cond_insns settings

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 12 April 2017 14:02 To: GCC Patches Cc: nd; Kyrylo Tkachov Subject: [PATCH][ARM] Update max_cond_insns settings   The existing setting of max_cond_insns for most cores is non-optimal. Thumb-2 IT has a maximum limit of 4, so 5 means emitting 2 IT sequences
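The arithmetic behind "5 means emitting 2 IT sequences" is a ceiling division by the IT limit of 4. A minimal sketch, where IT_BLOCK_LIMIT and it_blocks_needed are hypothetical names chosen for illustration and not GCC identifiers:

```c
/* Maximum number of instructions one Thumb-2 IT block can cover,
   as stated in the message (hypothetical macro name). */
#define IT_BLOCK_LIMIT 4u

/* Number of IT blocks needed to cover a run of conditional
   instructions: ceiling of cond_insns / IT_BLOCK_LIMIT. */
unsigned it_blocks_needed(unsigned cond_insns)
{
    return (cond_insns + IT_BLOCK_LIMIT - 1u) / IT_BLOCK_LIMIT;
}
```

With a setting of 5 the function returns 2, while any setting from 1 to 4 fits in a single IT block.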

Re: [PATCH][AArch64] Update alignment for -mcpu=generic

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 12 April 2017 13:58 To: GCC Patches Cc: nd; James Greenhalgh; jim.wil...@linaro.org; Evandro Menezes; andrew.pin...@cavium.com Subject: [PATCH][AArch64] Update alignment for -mcpu=generic   With -mcpu=generic the loop alignment is currently 4.  All but one of

Re: [PATCH][AArch64] Set jump alignment to 4 for Cortex cores

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 12 April 2017 13:50 To: GCC Patches Cc: nd; James Greenhalgh Subject: [PATCH][AArch64] Set jump alignment to 4 for Cortex cores   Set jump alignment to 4 for Cortex cores as it reduces codesize by 0.4% on average with no obvious performance difference.  See

Re: [PATCH][AArch64] Enable AUTOPREFETCHER_WEAK with -mcpu=generic

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 05 April 2017 13:38 To: GCC Patches Cc: nd; James Greenhalgh; andrew.pin...@cavium.com; Evandro Menezes; jim.wil...@linaro.org Subject: [PATCH][AArch64] Enable AUTOPREFETCHER_WEAK with -mcpu=generic   Many supported cores use the AUTOPREFETCHER_WEAK setting

Re: [PATCH][AArch64] Model Cortex-A53 load forwarding

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 05 April 2017 13:29 To: GCC Patches Cc: nd; James Greenhalgh Subject: [PATCH][AArch64] Model Cortex-A53 load forwarding   Code scheduling for Cortex-A53 isn't as good as it could be.  It turns out code runs faster overall if we place loads and stores w

Re: [PATCH][AArch64] Enable AES fusion with -mcpu=generic

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 16 March 2017 17:22 To: GCC Patches; Evandro Menezes; andrew.pin...@cavium.com; jim.wil...@linaro.org Cc: nd Subject: [PATCH][AArch64] Enable AES fusion with -mcpu=generic   Many supported cores implement fusion of AES instructions.  When fusion happens it can

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 17 January 2017 19:23 To: GCC Patches Cc: nd; Kyrill Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts   A left shift of 1 can always be done using an add, so slightly adjust rtx cost for DImode left shift by 1 so that
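The claim that a left shift by 1 can always be done using an add can be illustrated for 64-bit (DImode-sized) values. This is a sketch with made-up function names; the version working on two 32-bit halves assumes the add is carried out as a low-half add followed by a high-half add with carry, which is not spelled out in the message.

```c
#include <stdint.h>

/* 64-bit left shift by one. */
uint64_t shl1(uint64_t x)
{
    return x << 1;
}

/* The same result computed as x + x (wraps modulo 2^64). */
uint64_t add_self(uint64_t x)
{
    return x + x;
}

/* x + x on two 32-bit halves: add the low halves, then add the
   high halves plus the carry out of the low addition. */
uint64_t add_self_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t new_lo = lo + lo;
    uint32_t carry = new_lo < lo;   /* 1 if the low add wrapped */
    uint32_t new_hi = hi + hi + carry;
    return ((uint64_t)new_hi << 32) | new_lo;
}
```

All three functions return the same value for every input, including inputs whose top bit is set.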

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 17 January 2017 18:00 To: GCC Patches Cc: nd; Kyrylo Tkachov; Richard Earnshaw Subject: [PATCH][ARM] Remove Thumb-2 iordi_not patterns   After Bernd's DImode patch [1] almost all DImode operations are expanded early (except for -mfpu=neon). This mean

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 17 January 2017 15:14 To: Richard Earnshaw; GCC Patches; James Greenhalgh Cc: nd Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit   Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a declaration is an integer. So the

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 10 November 2016 17:19 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Improve max_insns_skipped logic   Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed

Re: [PATCH][ARM] Fix ldrd offsets

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 03 November 2016 12:20 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Fix ldrd offsets   Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020, without -255..4091.  This reduces the number of addressing instructions when using DI mode operations (such

Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 29 November 2016 11:05 To: GCC Patches Cc: nd Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8   Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid unnecessary duplication and repeating bugs like PR78439 due to changes being applied only

Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage

2017-04-20 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 31 October 2016 18:29 To: GCC Patches Cc: nd Subject: [RFC][PATCH][AArch64] Cleanup frame pointer usage   This patch cleans up all code related to the frame pointer.  On AArch64 we emit a frame chain even in cases where the frame pointer is not required. So

Re: [PING][PATCH] Move the check for any_condjump_p from sched-deps to target macros

2017-04-25 Thread Wilco Dijkstra
Hi Naveen, > https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01368.html This looks good to me - I have just one comment: --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -13972,6 +13972,15 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) { enum at

Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-04-25 Thread Wilco Dijkstra
Hi Naveen, > https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01369.html Same comment for this part, we want to return true if we match: + if (SET_DEST (curr_set) != (pc_rtx) + || GET_CODE (SET_SRC (curr_set)) != IF_THEN_ELSE + || ! REG_P (XEXP (XEXP (SET_SRC (curr_set), 0), 0)

[PATCH][AArch64] Improve float to int moves

2017-04-26 Thread Wilco Dijkstra
cmp w0, 42 bhi .L6 scvtf s0, w0 ret .L6: fmul s0, s0, s0 ret Passes regress & bootstrap, OK for commit? ChangeLog: 2017-04-26 Wilco Dijkstra * config/aarch64/aarch64.md (movsi_aarch64): Remove '*' from r=w.

Re: [PING][PATCH][AArch64] Implement ALU_BRANCH fusion

2017-04-26 Thread Wilco Dijkstra
Hi Naveen, This version has the same issue of claiming that all instructions should be fused except for the cases that can be fused. You should only return true if there is a match, not if there is not a match. Cheers, Wilco   
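The review comment above can be illustrated with a small sketch. This is not the code from the patch: the `insn_kind` enum and both function names are hypothetical stand-ins for GCC's RTL checks, used only to contrast an inverted predicate with one that returns true on a match.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction kinds; not GCC's RTL types. */
enum insn_kind { INSN_ALU, INSN_COND_BRANCH, INSN_LOAD, INSN_OTHER };

/* Inverted logic: rejects the one pair that can be fused and
   claims every other pair is fusible. */
bool fusion_pair_inverted(enum insn_kind prev, enum insn_kind curr)
{
    if (prev == INSN_ALU && curr == INSN_COND_BRANCH)
        return false;
    return true;
}

/* Intended logic: return true only when the pair matches
   the ALU + conditional branch pattern. */
bool fusion_pair_p(enum insn_kind prev, enum insn_kind curr)
{
    if (prev == INSN_ALU && curr == INSN_COND_BRANCH)
        return true;
    return false;
}
```

With the inverted form, an unrelated pair such as a load followed by some other instruction would be reported as fusible, which is the problem the comment describes.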

[PATCH][AArch64] Improve Cortex-A53 shift bypass

2017-04-27 Thread Wilco Dijkstra
which improves the example in PR79665 by ~7%. Given it is no longer used, remove aarch_forward_to_shift_is_not_shifted_reg. Passes AArch64 bootstrap and regress. OK for commit? ChangeLog: 2017-04-27 Wilco Dijkstra PR target/79665 * config/arm/aarch-common.c

Re: [PATCH][ARM] Update max_cond_insns settings

2017-05-04 Thread Wilco Dijkstra
Richard Earnshaw wrote: > -  5, /* Max cond insns.  */ > +  2, /* Max cond insns.  */ > This parameter is also used for A32 code.  Is that really the right > number there as well? Yes, this parameter has always been

Re: [PATCH][AArch64] Remove '*' from movsi/di/ti patterns

2017-09-13 Thread Wilco Dijkstra
Instead of: ldr w0, [x0] dup v0.2s, w0 ret ChangeLog: 2017-09-13 Wilco Dijkstra * gcc.target/aarch64/vmov_n_1.c: Update dup scan-assembler. -- diff --git a/gcc/testsuite/gcc.target/aarch64/vmov_n_1.c b/gcc/testsuite/gcc.target/aarch64/vmov_

Re: [PATCH 1/3] [ARM] Add bus_width_bits to tune_params

2017-09-13 Thread Wilco Dijkstra
Hi Charlie, I can't see any use for adding a bus width to tune params. There are many different buses in a modern CPU, so there is no such thing as a single "bus width". What we need is to add separate costs for the different kinds of loads and stores. The timings for these depend mostly on the m

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-09-13 Thread Wilco Dijkstra
Jeff Law wrote: > On 09/06/2017 03:55 AM, Jackson Woodruff wrote: > > On 08/30/2017 01:46 PM, Richard Biener wrote: >>>   rdivtmp = 1 / (y*C); >>>   tem = x *rdivtmp; >>>   tem2= z * rdivtmp; >>> >>> instead of >>> >>>   rdivtmp = 1/y; >>>   tem = x * 1/C * rdivtmp; >>>   tem2 = z * 1/C * rdivtmp;
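A minimal sketch of the quoted transformation, written by hand rather than taken from the patch. The constant `C_CONST` and both function names are assumptions made for illustration. Note that replacing a division with a multiplication by a reciprocal can change floating-point rounding, so a compiler would only do this under relaxed floating-point options such as GCC's `-freciprocal-math`.

```c
/* Hypothetical constant standing in for C in the quoted snippet. */
#define C_CONST 4.0

/* Straightforward form: two divisions by the same value y*C. */
void scaled_naive(double x, double z, double y, double out[2])
{
    out[0] = x / (y * C_CONST);
    out[1] = z / (y * C_CONST);
}

/* Shared-reciprocal form: one division, then two multiplications. */
void scaled_shared(double x, double z, double y, double out[2])
{
    double rdivtmp = 1.0 / (y * C_CONST);
    out[0] = x * rdivtmp;
    out[1] = z * rdivtmp;
}
```

For inputs where every intermediate value is exactly representable (for example x = 8, z = 2, y = 2) both forms give identical results; in general they may differ in the last bit.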

Re: [PATCH][aarch64] Fix target/pr77729 - missed optimization related to zero extension

2017-09-13 Thread Wilco Dijkstra
Steve Ellcey wrote: > And in aarch64 rtl expansion I see: > > (insn 10 9 11 (set (reg:QI 81) > (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) > "pr77729.c":3 -1 > (nil)) Yes using QI/HI mode anywhere in the RTL seems perverse and incorrect given AArch64 doesn't suppo

Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-15 Thread Wilco Dijkstra
Marc Glisse wrote: > The question is whether, having computed c=a/b, it is cheaper to test a>=b or c!=0. > I think it is usually the second one, but not for all types on all targets. > Although since > you mention VRP, it is easier to do further optimizations using the > information a

Re: Transform (x / y) != 0 to x >=y and (x / y) == 0 to x < y if x, y are unsigned

2017-09-18 Thread Wilco Dijkstra
Richard Sandiford wrote: > I don't think it's literally always.  Testing the inputs instead of a > multi-use result tends to mean that all three are live at once.  If the > == 0 condition is only one component of a more complex condition that > relies on the result of division regardless, then it'
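The equivalence under discussion in this thread can be checked with a short sketch. The function names are hypothetical and the code only demonstrates that, for unsigned operands with a nonzero divisor, `(x / y) != 0` and `x >= y` agree. It says nothing about which form is cheaper on a given target.

```c
#include <stdbool.h>

/* Test the quotient: requires the division to be performed. */
bool quotient_nonzero_div(unsigned x, unsigned y)
{
    return (x / y) != 0;
}

/* Test the inputs directly: no division needed. */
bool quotient_nonzero_cmp(unsigned x, unsigned y)
{
    return x >= y;
}

/* Count disagreements over all x in [0, limit] and y in [1, limit].
   y == 0 is skipped because unsigned division by zero is undefined. */
int count_mismatches(unsigned limit)
{
    int mismatches = 0;
    for (unsigned x = 0; x <= limit; x++)
        for (unsigned y = 1; y <= limit; y++)
            if (quotient_nonzero_div(x, y) != quotient_nonzero_cmp(x, y))
                mismatches++;
    return mismatches;
}
```

The comparison form needs both inputs live at the test, which is the register-pressure concern raised in the message above when the quotient is also used elsewhere.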

[AArch64] Patches for review

2017-09-20 Thread Wilco Dijkstra
Hi, Here is the list of my AArch64 patches for review: * https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02040.html (Fix unwinding with -fomit-frame-pointer) * https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01216.html (Fix symbol offset limit) * https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00396.

Re: [PATCH][AArch64] PR71951: Fix unwinding with -fomit-frame-pointer

2017-09-21 Thread Wilco Dijkstra
James Greenhalgh wrote: > This seems like a bit of a theoretical issue as we would normally build > libgcc with -fno-omit-frame-pointer anyway, but it can't hurt to guarantee > this, so OK. It's not theoretical since there were multiple users reporting unwinding issues, so clearly doing CFLAGS="-
