On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjam...@suse.cz> wrote: >Hi, > >On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote: >> On Wed, 12 Apr 2017, Martin Jambor wrote: >> >> > Hi, >> > >> > the patch below is an attempt to deal with PR 80293 as >non-invasively >> > as possible. Basically, it switches off total SRA scalarization of >> > any local aggregates which contains an array of elements that have >one >> > byte (or less). >> > >> > The logic behind this is that accessing such arrays element-wise >> > usually results in poor code and that such char arrays are often >used >> > for non-statically-typed content anyway, and we do not want to copy >> > that byte per byte. >> > >> > Alan, do you think this could impact your constant pool >scalarization >> > too severely? >> >> Hmm, isn't one of the issues that we have >> >> if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) >> { >> if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) >> <= max_scalarization_size) >> { >> create_total_scalarization_access (var); >> >> which limits the size of scalarizable vars but not the number >> of accesses we create for total scalarization? > >Well, yes, but at least I understood from your comment #4 in the bug >that you did not want to limit the number of replacements for gcc 7? > >> >> Is scalarizable_type_p only used in contexts where we have no hint >> of the actual accesses? > >There are should_scalarize_away_bitmap and >cannot_scalarize_away_bitmap hints. > >Total scalarization is a hack process where we chop up small >aggregates according to their types - as opposed to actual accesses, >meaning it is an alien process to the rest of SRA - knowing that they >will go completely away afterwards (that is ensured by >cannot_scalarize_away_bitmap) and that this will facilitate copy >propagation (this is indicated by should_scalarize_away_bitmap, which >has a bit set if there is a non-scalar assignment _from_ (a part of) >the aggregate).
OK, but for the copy x = y where x would go away it still depends on uses of x what pieces we actually want? Or is total scalarization only done for x = y; z = x;? Thus no further accesses to x? >> That is, for the constant pool case we >> usually have >> >> x = .LC0; >> .. = x[2]; >> >> so we have a "hint" that accesses on x are those we'd want to >> optimize to accesses to .LC0. > >You don't need total scalarization for this, just the existence of >read from x[2] would be sufficient but thanks for pointing me to the >right direction. For constant pool decl scalarization, it is not >important to introduce artificial accesses for x but for .LC0. >Therefore, I have adapted the patch to allow char arrays for const >decls only and verified that it scalarizes a const-pool array of chars >on Aarch64. The (otherwise yet untested) result is below. > >What do you think? Why special case char arrays? I'd simply disallow total scalarization of non-const arrays completely. >Martin > > >2017-04-13 Martin Jambor <mjam...@suse.cz> > > * tree-sra.c (scalarizable_type_p): New parameter const_decl, make > char arrays not totally scalarizable if it is false. > (analyze_all_variable_accesses): Pass correct value in the new > parameter. > >testsuite/ > * g++.dg/tree-ssa/pr80293.C: New test. >--- >gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 >+++++++++++++++++++++++++++++++++ > gcc/tree-sra.c | 21 ++++++++++----- > 2 files changed, 60 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C > >diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C >b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C >new file mode 100644 >index 00000000000..7faf35ae983 >--- /dev/null >+++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C >@@ -0,0 +1,45 @@ >+// { dg-do compile } >+// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */ >+ >+#include <array> >+ >+// Return a copy of the underlying memory of an arbitrary value. >+template < >+ typename T, >+ typename = typename >std::enable_if<std::is_trivially_copyable<T>::value>::type >+> >+auto getMem( >+ T const & value >+) -> std::array<char, sizeof(T)> { >+ auto ret = std::array<char, sizeof(T)>{}; >+ __builtin_memcpy(ret.data(), &value, sizeof(T)); >+ return ret; >+} >+ >+template < >+ typename T, >+ typename = typename >std::enable_if<std::is_trivially_copyable<T>::value>::type >+> >+auto fromMem( >+ std::array<char, sizeof(T)> const & buf >+) -> T { >+ auto ret = T{}; >+ __builtin_memcpy(&ret, buf.data(), sizeof(T)); >+ return ret; >+} >+ >+double foo1(std::uint64_t arg) { >+ return fromMem<double>(getMem(arg)); >+} >+ >+double foo2(std::uint64_t arg) { >+ return *reinterpret_cast<double*>(&arg); >+} >+ >+double foo3(std::uint64_t arg) { >+ double ret; >+ __builtin_memcpy(&ret, &arg, sizeof(arg)); >+ return ret; >+} >+ >+// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } >diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c >index 02453d3ed9a..ab06be66131 100644 >--- a/gcc/tree-sra.c >+++ b/gcc/tree-sra.c >@@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool >write) > >/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or >fixed-length >ARRAY_TYPE with fields that are either of gimple register types >(excluding >- bit-fields) or (recursively) scalarizable types. */ >+ bit-fields) or (recursively) scalarizable types. CONST_DECL must >be true if >+ we are considering a decl from constant pool. If it is false, char >arrays >+ will be refused. */ > > static bool >-scalarizable_type_p (tree type) >+scalarizable_type_p (tree type, bool const_decl) > { > gcc_assert (!is_gimple_reg_type (type)); > if (type_contains_placeholder_p (type)) >@@ -970,7 +972,7 @@ scalarizable_type_p (tree type) > return false; > > if (!is_gimple_reg_type (ft) >- && !scalarizable_type_p (ft)) >+ && !scalarizable_type_p (ft, const_decl)) > return false; > } > >@@ -978,10 +980,16 @@ scalarizable_type_p (tree type) > > case ARRAY_TYPE: > { >+ HOST_WIDE_INT min_elem_size; >+ if (const_decl) >+ min_elem_size = 0; >+ else >+ min_elem_size = BITS_PER_UNIT; >+ > if (TYPE_DOMAIN (type) == NULL_TREE > || !tree_fits_shwi_p (TYPE_SIZE (type)) > || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type))) >- || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0) >+ || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size) > || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) > return false; > if (tree_to_shwi (TYPE_SIZE (type)) == 0 >@@ -995,7 +1003,7 @@ scalarizable_type_p (tree type) > > tree elem = TREE_TYPE (type); > if (!is_gimple_reg_type (elem) >- && !scalarizable_type_p (elem)) >+ && !scalarizable_type_p (elem, const_decl)) > return false; > return true; > } >@@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void) > { > tree var = candidate (i); > >- if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) >+ if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), >+ constant_decl_p (var))) > { > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > <= max_scalarization_size)