On Mon, Jun 19, 2017 at 4:20 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Mon, Jun 19, 2017 at 3:40 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Wed, Jun 14, 2017 at 2:54 PM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Mon, Jun 12, 2017 at 7:03 PM, Bin Cheng <bin.ch...@arm.com> wrote:
>>>> Hi,
>>>> Current primitive cost model merges partitions with data references 
>>>> sharing the same
>>>> base address.  I believe it's designed to maximize data reuse in 
>>>> distribution, but
>>>> that should be done by dedicated data reusing algorithm.  At this stage of 
>>>> merging,
>>>> we should be conservative and only merge partitions with the same 
>>>> references.
>>>> Bootstrap and test on x86_64 and AArch64.  Is it OK?
>>>
>>> Well, I'd say "conservative" is merging more, not less.  For example
>>> splitting a[i+1] from a[i]
>>> would be bad(?), so I'd see to allow unequal DR_INIT as "equal" for
>>> merging.  Maybe
>>> DR_INIT within a cacheline or so.
>>>
>>> How many extra distributions in say SPEC do you get from this change alone?
>> Hi,
>> I collected data for spec2006 only with/without this patch.  I am a
>> bit surprised that it doesn't change the number of distributed loops.
>>>
>>> It shows also that having partition->reads_and_writes would be nice
>>> ...  the code duplication
>> Yeah, I merged read/write data references in previous patch, now this
>> duplication is gone.  Update patch attached.  Is it OK?
>
> +      gcc_assert (i < datarefs_vec.length ());
> +      dr1 = datarefs_vec[i];
>
> these asserts are superfluous -- vec::operator[] does them as well.
>
> Ok if you remove them.
Done.
I realized I made a mistake when measuring the impact of this patch.
Apparently this patch alone causes the failure of
gcc.dg/tree-ssa/ldist-6.c, so here is the updated patch.  I also
collected the number of distributed loops in SPEC2006 as below:
     trunk:  5882
     only this patch: 7130
     whole patch series: 5237
So the conclusion is that this patch on its own distributes
aggressively, as in ldist-6.c, which means worse data locality.  A
following patch does more fusion, which mitigates the impact of this
one and results in conservative distribution overall.  But since we
lack a data-locality cost model, ldist-6.c remains failing even after
the whole patch series is applied.  Hmm, a cache-sensitive cost model
is now needed by several passes: distribution, prefetching and
(possibly) interchange.
Richard, do you have further comments based on the new data?

Thanks,
bin
2017-06-20  Bin Cheng  <bin.ch...@arm.com>

    * tree-loop-distribution.c (ref_base_address): Delete.
    (similar_memory_accesses): Rename ...
    (share_memory_accesses): ... to this.  Check if partitions access
    the same memory reference.
    (distribute_loop): Call share_memory_accesses.

gcc/testsuite/ChangeLog
2017-06-20  Bin Cheng  <bin.ch...@arm.com>

    * gcc.dg/tree-ssa/ldist-6.c: XFAIL.
From a002d0a88ab9e981d9c57bd8f1203072290623ad Mon Sep 17 00:00:00 2001
From: Bin Cheng <binch...@e108451-lin.cambridge.arm.com>
Date: Fri, 9 Jun 2017 12:41:36 +0100
Subject: [PATCH 08/13] share-memory-access-20170608.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c |  2 +-
 gcc/tree-loop-distribution.c            | 69 +++++++++++++++------------------
 2 files changed, 32 insertions(+), 39 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
index 8eb1c62..e0a68d8 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ldist-6.c
@@ -34,4 +34,4 @@ int loop1 (int k)
   return a[1000-2] + b[1000-1] + c[1000-2] + d[1000-2];
 }
 
-/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" } } */
+/* { dg-final { scan-tree-dump-times "distributed: split to 2 loops" 0 "ldist" { xfail *-*-* } } } */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index eafd119..119863f 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1268,30 +1268,16 @@ classify_partition (loop_p loop, struct graph *rdg, partition *partition)
     }
 }
 
-/* For a data reference REF, return the declaration of its base
-   address or NULL_TREE if the base is not determined.  */
-
-static tree
-ref_base_address (data_reference_p dr)
-{
-  tree base_address = DR_BASE_ADDRESS (dr);
-  if (base_address
-      && TREE_CODE (base_address) == ADDR_EXPR)
-    return TREE_OPERAND (base_address, 0);
-
-  return base_address;
-}
-
-/* Returns true when PARTITION1 and PARTITION2 have similar memory
-   accesses in RDG.  */
+/* Returns true when PARTITION1 and PARTITION2 access the same memory
+   object in RDG.  */
 
 static bool
-similar_memory_accesses (struct graph *rdg, partition *partition1,
-			 partition *partition2)
+share_memory_accesses (struct graph *rdg,
+		       partition *partition1, partition *partition2)
 {
-  unsigned i, j, k, l;
+  unsigned i, j;
   bitmap_iterator bi, bj;
-  data_reference_p ref1, ref2;
+  data_reference_p dr1, dr2;
 
   /* First check whether in the intersection of the two partitions are
      any loads or stores.  Common loads are the situation that happens
@@ -1301,23 +1287,30 @@ similar_memory_accesses (struct graph *rdg, partition *partition1,
 	|| RDG_MEM_READS_STMT (rdg, i))
       return true;
 
-  /* Then check all data-references against each other.  */
-  EXECUTE_IF_SET_IN_BITMAP (partition1->stmts, 0, i, bi)
-    if (RDG_MEM_WRITE_STMT (rdg, i)
-	|| RDG_MEM_READS_STMT (rdg, i))
-      EXECUTE_IF_SET_IN_BITMAP (partition2->stmts, 0, j, bj)
-	if (RDG_MEM_WRITE_STMT (rdg, j)
-	    || RDG_MEM_READS_STMT (rdg, j))
-	  {
-	    FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, i), k, ref1)
-	      {
-		tree base1 = ref_base_address (ref1);
-		if (base1)
-		  FOR_EACH_VEC_ELT (RDG_DATAREFS (rdg, j), l, ref2)
-		    if (base1 == ref_base_address (ref2))
-		      return true;
-	      }
-	  }
+  /* Then check whether the two partitions access the same memory object.  */
+  EXECUTE_IF_SET_IN_BITMAP (partition1->datarefs, 0, i, bi)
+    {
+      dr1 = datarefs_vec[i];
+
+      if (!DR_BASE_ADDRESS (dr1)
+	  || !DR_OFFSET (dr1) || !DR_INIT (dr1) || !DR_STEP (dr1))
+	continue;
+
+      EXECUTE_IF_SET_IN_BITMAP (partition2->datarefs, 0, j, bj)
+	{
+	  dr2 = datarefs_vec[j];
+
+	  if (!DR_BASE_ADDRESS (dr2)
+	      || !DR_OFFSET (dr2) || !DR_INIT (dr2) || !DR_STEP (dr2))
+	    continue;
+
+	  if (operand_equal_p (DR_BASE_ADDRESS (dr1), DR_BASE_ADDRESS (dr2), 0)
+	      && operand_equal_p (DR_OFFSET (dr1), DR_OFFSET (dr2), 0)
+	      && operand_equal_p (DR_INIT (dr1), DR_INIT (dr2), 0)
+	      && operand_equal_p (DR_STEP (dr1), DR_STEP (dr2), 0))
+	    return true;
+	}
+    }
 
   return false;
 }
@@ -1654,7 +1647,7 @@ distribute_loop (struct loop *loop, vec<gimple *> stmts,
       for (int j = i + 1;
 	   partitions.iterate (j, &partition); ++j)
 	{
-	  if (similar_memory_accesses (rdg, into, partition))
+	  if (share_memory_accesses (rdg, into, partition))
 	    {
 	      partition_merge_into (into, partition, FUSE_SHARE_REF);
 	      partitions.unordered_remove (j);
-- 
1.9.1
