On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <bin.ch...@arm.com> wrote: >>>> Hi, >>>> Current primitive cost model merges partitions with data references >>>> sharing the same >>>> base address. I believe it's designed to maximize data reuse in >>>> distribution, but >>>> that should be done by dedicated data reusing algorithm. At this stage of >>>> merging, >>>> we should be conservative and only merge partitions with the same >>>> references. >>>> Bootstrap and test on x86_64 and AArch64. Is it OK? >>> >>> Well, I'd say "conservative" is merging more, not less. For example >>> splitting a[i+1] from a[i] >>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for >>> merging. Maybe >>> DR_INIT within a cacheline or so. >>> >>> How many extra distributions in say SPEC do you get from this change alone? >> Hi, >> I collected data for spec2006 only with/without this patch. I am a >> bit surprised that it doesn't change the number of distributed loops. >>> >>> It shows also that having partition->reads_and_writes would be nice >>> ... the code duplication >> Yeah, I merged read/write data references in previous patch, now this >> duplication is gone. Update patch attached. Is it OK? > > + gcc_assert (i < datarefs_vec.length ()); > + dr1 = datarefs_vec[i]; > > these asserts are superfluous -- vec::operator[] does them as well. > > Ok if you remove them. Done. I realized I made mistakes when measuring the impact of this patch. This patch only apparently causes failure of gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch. I also collected the number of distributed loops in spec2k6 as below: trunk: 5882 only this patch: 7130 whole patch series: 5237 So the conclusion is, this patch does aggressive distribution like ldist-6.c, which means worse data-locality. The following patch does more fusion which mitigates impact of this patch and results in conservative distribution overall. But as we lack of data locality cost model, ldist-6.c remains failed even after applying whole patch series. Hmm, a cache-sensitive cost model is need for several passes now, distribution, prefetch and (possible) interchange. Richard, do you have second comment based on the new data?
Thanks, bin 2017-06-20 Bin Cheng <bin.ch...@arm.com> * tree-loop-distribution.c (ref_base_address): Delete. (similar_memory_accesses): Rename ... (share_memory_accesses): ... to this. Check if partitions access the same memory reference. (distribute_loop): Call share_memory_accesses. gcc/testsuite/ChangeLog 2017-06-20 Bin Cheng <bin.ch...@arm.com> * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
From a002d0a88ab9e981d9c57bd8f1203072290623ad Mon Sep 17 00:00:00 2001 From: Bin Cheng <binch...@e108451-lin.cambridge.arm.com> Date: Fri, 9 Jun 2017 12:41:36 +0100 Subject: [PATCH 08/13] share-memory-access-20170608.txt --- gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c | 2 +- gcc/tree-loop-distribution.c | 69 +++++++++++++++------------------ 2 files changed, 32 insertions(+), 39 deletions(-) diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c index 8eb1c62..e0a68d8 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c @@ -34,4 +34,4 @@ int loop1 (int k) return a[1000-2] + b[1000-1] + c[1000-2] + d[1000-2]; } -/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" } } */ +/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" { xfail *-*-* } } } */ diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index eafd119..119863f 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -1268,30 +1268,16 @@ classify_partition (loop_p loop, struct graph *rdg, partition *partition) } } -/* For a data reference REF, return the declaration of its base - address or NULL_TREE if the base is not determined. */ - -static tree -ref_base_address (data_reference_p dr) -{ - tree base_address = DR_BASE_ADDRESS (dr); - if (base_address - && TREE_CODE (base_address) == ADDR_EXPR) - return TREE_OPERAND (base_address, 0); - - return base_address; -} - -/* Returns true when PARTITION1 and PARTITION2 have similar memory - accesses in RDG. */ +/* Returns true when PARTITION1 and PARTITION2 access the same memory + object in RDG. */ static bool -similar_memory_accesses (struct graph *rdg, partition *partition1, - partition *partition2) +share_memory_accesses (struct graph *rdg, + partition *partition1, partition *partition2) { - unsigned i, j, k, l; + unsigned i, j; bitmap_iterator bi, bj; - data_reference_p ref1, ref2; + data_reference_p dr1, dr2; /* First check whether in the intersection of the two partitions are any loads or stores. Common loads are the situation that happens @@ -1301,23 +1287,30 @@ similar_memory_accesses (struct graph *rdg, partition *partition1, || RDG_MEM_READS_STMT (rdg, i)) return true; - /* Then check all data-references against each other. */ - EXECUTE_IF_SET_IN_BITMAP (partition1->stmts, 0, i, bi) - if (RDG_MEM_WRITE_STMT (rdg, i) - || RDG_MEM_READS_STMT (rdg, i)) - EXECUTE_IF_SET_IN_BITMAP (partition2->stmts, 0, j, bj) - if (RDG_MEM_WRITE_STMT (rdg, j) - || RDG_MEM_READS_STMT (rdg, j)) - { - FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, i), k, ref1) - { - tree base1 = ref_base_address (ref1); - if (base1) - FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, j), l, ref2) - if (base1 == ref_base_address (ref2)) - return true; - } - } + /* Then check whether the two partitions access the same memory object. */ + EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi) + { + dr1 = datarefs_vec[i]; + + if (!DR_BASE_ADDRESS (dr1) + || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1)) + continue; + + EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj) + { + dr2 = datarefs_vec[j]; + + if (!DR_BASE_ADDRESS (dr2) + || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2)) + continue; + + if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0) + && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0) + && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0) + && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0)) + return true; + } + } return false; } @@ -1654,7 +1647,7 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts, for (int j = i + 1; partitions.iterate (j, &partition); ++j) { - if (similar_memory_accesses (rdg, into, partition)) + if (share_memory_accesses (rdg, into, partition)) { partition_merge_into (into, partition, FUSE_SHARE_REF); partitions.unordered_remove (j); -- 1.9.1