On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjam...@suse.cz> wrote:
>Hi,
>
>On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote:
>> On Wed, 12 Apr 2017, Martin Jambor wrote:
>> 
>> > Hi,
>> > 
>> > the patch below is an attempt to deal with PR 80293 as
>non-invasively
>> > as possible.  Basically, it switches off total SRA scalarization of
>> > any local aggregates which contains an array of elements that have
>one
>> > byte (or less).
>> > 
>> > The logic behind this is that accessing such arrays element-wise
>> > usually results in poor code and that such char arrays are often
>used
>> > for non-statically-typed content anyway, and we do not want to copy
>> > that byte per byte.
>> > 
>> > Alan, do you think this could impact your constant pool
>scalarization
>> > too severely?
>> 
>> Hmm, isn't one of the issues that we have
>> 
>>         if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>>           {
>>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>>                 <= max_scalarization_size)
>>               {
>>                 create_total_scalarization_access (var);
>> 
>> which limits the size of scalarizable vars but not the number
>> of accesses we create for total scalarization?
>
>Well, yes, but at least I understood from your comment #4 in the bug
>that you did not want to limit the number of replacements for gcc 7?
>
>> 
>> Is scalarizable_type_p only used in contexts where we have no hint
>> of the actual accesses?
>
>There are should_scalarize_away_bitmap and
>cannot_scalarize_away_bitmap hints.
>
>Total scalarization is a hack process where we chop up small
>aggregates according to their types - as opposed to actual accesses,
>meaning it is an alien process to the rest of SRA - knowing that they
>will go completely away afterwards (that is ensured by
>cannot_scalarize_away_bitmap) and that this will facilitate copy
>propagation (this is indicated by should_scalarize_away_bitmap, which
>has a bit set if there is a non-scalar assignment _from_ (a part of)
>the aggregate).

OK, but for the copy x = y where x would go away it still depends on uses of x 
what pieces we actually want?  Or is total scalarization only done for x  = y; 
z = x;?
Thus no further accesses to x?

>> That is, for the constant pool case we
>> usually have
>> 
>>   x = .LC0;
>>   .. = x[2];
>> 
>> so we have a "hint" that accesses on x are those we'd want to
>> optimize to accesses to .LC0.
>
>You don't need total scalarization for this, just the existence of
>read from x[2] would be sufficient but thanks for pointing me to the
>right direction.  For constant pool decl scalarization, it is not
>important to introduce artificial accesses for x but for .LC0.
>Therefore, I have adapted the patch to allow char arrays for const
>decls only and verified that it scalarizes a const-pool array of chars
>on Aarch64.  The (otherwise yet untested) result is below.
>
>What do you think?

Why special case char arrays?  I'd simply disallow total scalarization of 
non-const arrays completely.

>Martin
>
>
>2017-04-13  Martin Jambor  <mjam...@suse.cz>
>
>       * tree-sra.c (scalarizable_type_p): New parameter const_decl, make
>       char arrays not totally scalarizable if it is false.
>       (analyze_all_variable_accesses): Pass correct value in the new
>       parameter.
>
>testsuite/
>       * g++.dg/tree-ssa/pr80293.C: New test.
>---
>gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45
>+++++++++++++++++++++++++++++++++
> gcc/tree-sra.c                          | 21 ++++++++++-----
> 2 files changed, 60 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>
>diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>new file mode 100644
>index 00000000000..7faf35ae983
>--- /dev/null
>+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C
>@@ -0,0 +1,45 @@
>+// { dg-do compile }
>+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */
>+
>+#include <array>
>+
>+// Return a copy of the underlying memory of an arbitrary value.
>+template <
>+    typename T,
>+    typename = typename
>std::enable_if<std::is_trivially_copyable<T>::value>::type
>+>
>+auto getMem(
>+    T const & value
>+) -> std::array<char, sizeof(T)> {
>+    auto ret = std::array<char, sizeof(T)>{};
>+    __builtin_memcpy(ret.data(), &value, sizeof(T));
>+    return ret;
>+}
>+
>+template <
>+    typename T,
>+    typename = typename
>std::enable_if<std::is_trivially_copyable<T>::value>::type
>+>
>+auto fromMem(
>+    std::array<char, sizeof(T)> const & buf
>+) -> T {
>+    auto ret = T{};
>+    __builtin_memcpy(&ret, buf.data(), sizeof(T));
>+    return ret;
>+}
>+
>+double foo1(std::uint64_t arg) {
>+    return fromMem<double>(getMem(arg));
>+}
>+
>+double foo2(std::uint64_t arg) {
>+    return *reinterpret_cast<double*>(&arg);
>+}
>+
>+double foo3(std::uint64_t arg) {
>+    double ret;
>+    __builtin_memcpy(&ret, &arg, sizeof(arg));
>+    return ret;
>+}
>+
>+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } }
>diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
>index 02453d3ed9a..ab06be66131 100644
>--- a/gcc/tree-sra.c
>+++ b/gcc/tree-sra.c
>@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool
>write)
> 
>/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or
>fixed-length
>ARRAY_TYPE with fields that are either of gimple register types
>(excluding
>-   bit-fields) or (recursively) scalarizable types.  */
>+   bit-fields) or (recursively) scalarizable types.  CONST_DECL must
>be true if
>+   we are considering a decl from constant pool.  If it is false, char
>arrays
>+   will be refused.  */
> 
> static bool
>-scalarizable_type_p (tree type)
>+scalarizable_type_p (tree type, bool const_decl)
> {
>   gcc_assert (!is_gimple_reg_type (type));
>   if (type_contains_placeholder_p (type))
>@@ -970,7 +972,7 @@ scalarizable_type_p (tree type)
>           return false;
> 
>         if (!is_gimple_reg_type (ft)
>-            && !scalarizable_type_p (ft))
>+            && !scalarizable_type_p (ft, const_decl))
>           return false;
>       }
> 
>@@ -978,10 +980,16 @@ scalarizable_type_p (tree type)
> 
>   case ARRAY_TYPE:
>     {
>+      HOST_WIDE_INT min_elem_size;
>+      if (const_decl)
>+      min_elem_size = 0;
>+      else
>+      min_elem_size = BITS_PER_UNIT;
>+
>       if (TYPE_DOMAIN (type) == NULL_TREE
>         || !tree_fits_shwi_p (TYPE_SIZE (type))
>         || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type)))
>-        || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0)
>+        || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size)
>         || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type))))
>       return false;
>       if (tree_to_shwi (TYPE_SIZE (type)) == 0
>@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type)
> 
>       tree elem = TREE_TYPE (type);
>       if (!is_gimple_reg_type (elem)
>-       && !scalarizable_type_p (elem))
>+        && !scalarizable_type_p (elem, const_decl))
>       return false;
>       return true;
>     }
>@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void)
>       {
>       tree var = candidate (i);
> 
>-      if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var)))
>+      if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
>+                                              constant_decl_p (var)))
>         {
>           if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>               <= max_scalarization_size)

Reply via email to