Hi, On Wed, Apr 12, 2017 at 01:55:01PM +0200, Richard Biener wrote: > On Wed, 12 Apr 2017, Martin Jambor wrote: > > > Hi, > > > > the patch below is an attempt to deal with PR 80293 as non-invasively > > as possible. Basically, it switches off total SRA scalarization of > > any local aggregates which contains an array of elements that have one > > byte (or less). > > > > The logic behind this is that accessing such arrays element-wise > > usually results in poor code and that such char arrays are often used > > for non-statically-typed content anyway, and we do not want to copy > > that byte per byte. > > > > Alan, do you think this could impact your constant pool scalarization > > too severely? > > Hmm, isn't one of the issues that we have > > if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) > { > if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) > <= max_scalarization_size) > { > create_total_scalarization_access (var); > > which limits the size of scalarizable vars but not the number > of accesses we create for total scalarization?
Well, yes, but at least I understood from your comment #4 in the bug that you did not want to limit the number of replacements for gcc 7? > > Is scalarizable_type_p only used in contexts where we have no hint > of the actual accesses? There are should_scalarize_away_bitmap and cannot_scalarize_away_bitmap hints. Total scalarization is a hack process where we chop up small aggregates according to their types - as opposed to actual accesses, meaning it is an alien process to the rest of SRA - knowing that they will go completely away afterwards (that is ensured by cannot_scalarize_away_bitmap) and that this will facilitate copy propagation (this is indicated by should_scalarize_away_bitmap, which has a bit set if there is a non-scalar assignment _from_ (a part of) the aggregate). > That is, for the constant pool case we > usually have > > x = .LC0; > .. = x[2]; > > so we have a "hint" that accesses on x are those we'd want to > optimize to accesses to .LC0. You don't need total scalarization for this, just the existence of read from x[2] would be sufficient but thanks for pointing me to the right direction. For constant pool decl scalarization, it is not important to introduce artificial accesses for x but for .LC0. Therefore, I have adapted the patch to allow char arrays for const decls only and verified that it scalarizes a const-pool array of chars on Aarch64. The (otherwise yet untested) result is below. What do you think? Martin 2017-04-13 Martin Jambor <mjam...@suse.cz> * tree-sra.c (scalarizable_type_p): New parameter const_decl, make char arrays not totally scalarizable if it is false. (analyze_all_variable_accesses): Pass correct value in the new parameter. testsuite/ * g++.dg/tree-ssa/pr80293.C: New test. --- gcc/testsuite/g++.dg/tree-ssa/pr80293.C | 45 +++++++++++++++++++++++++++++++++ gcc/tree-sra.c | 21 ++++++++++----- 2 files changed, 60 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr80293.C diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr80293.C b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C new file mode 100644 index 00000000000..7faf35ae983 --- /dev/null +++ b/gcc/testsuite/g++.dg/tree-ssa/pr80293.C @@ -0,0 +1,45 @@ +// { dg-do compile } +// { dg-options "-O2 -std=gnu++11 -fdump-tree-optimized" } */ + +#include <array> + +// Return a copy of the underlying memory of an arbitrary value. +template < + typename T, + typename = typename std::enable_if<std::is_trivially_copyable<T>::value>::type +> +auto getMem( + T const & value +) -> std::array<char, sizeof(T)> { + auto ret = std::array<char, sizeof(T)>{}; + __builtin_memcpy(ret.data(), &value, sizeof(T)); + return ret; +} + +template < + typename T, + typename = typename std::enable_if<std::is_trivially_copyable<T>::value>::type +> +auto fromMem( + std::array<char, sizeof(T)> const & buf +) -> T { + auto ret = T{}; + __builtin_memcpy(&ret, buf.data(), sizeof(T)); + return ret; +} + +double foo1(std::uint64_t arg) { + return fromMem<double>(getMem(arg)); +} + +double foo2(std::uint64_t arg) { + return *reinterpret_cast<double*>(&arg); +} + +double foo3(std::uint64_t arg) { + double ret; + __builtin_memcpy(&ret, &arg, sizeof(arg)); + return ret; +} + +// { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "optimized" } } diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index 02453d3ed9a..ab06be66131 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -949,10 +949,12 @@ create_access (tree expr, gimple *stmt, bool write) /* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length ARRAY_TYPE with fields that are either of gimple register types (excluding - bit-fields) or (recursively) scalarizable types. */ + bit-fields) or (recursively) scalarizable types. CONST_DECL must be true if + we are considering a decl from constant pool. If it is false, char arrays + will be refused. */ static bool -scalarizable_type_p (tree type) +scalarizable_type_p (tree type, bool const_decl) { gcc_assert (!is_gimple_reg_type (type)); if (type_contains_placeholder_p (type)) @@ -970,7 +972,7 @@ scalarizable_type_p (tree type) return false; if (!is_gimple_reg_type (ft) - && !scalarizable_type_p (ft)) + && !scalarizable_type_p (ft, const_decl)) return false; } @@ -978,10 +980,16 @@ scalarizable_type_p (tree type) case ARRAY_TYPE: { + HOST_WIDE_INT min_elem_size; + if (const_decl) + min_elem_size = 0; + else + min_elem_size = BITS_PER_UNIT; + if (TYPE_DOMAIN (type) == NULL_TREE || !tree_fits_shwi_p (TYPE_SIZE (type)) || !tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (type))) - || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= 0) + || (tree_to_shwi (TYPE_SIZE (TREE_TYPE (type))) <= min_elem_size) || !tree_fits_shwi_p (TYPE_MIN_VALUE (TYPE_DOMAIN (type)))) return false; if (tree_to_shwi (TYPE_SIZE (type)) == 0 @@ -995,7 +1003,7 @@ scalarizable_type_p (tree type) tree elem = TREE_TYPE (type); if (!is_gimple_reg_type (elem) - && !scalarizable_type_p (elem)) + && !scalarizable_type_p (elem, const_decl)) return false; return true; } @@ -2653,7 +2661,8 @@ analyze_all_variable_accesses (void) { tree var = candidate (i); - if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var))) + if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var), + constant_decl_p (var))) { if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var))) <= max_scalarization_size) -- 2.12.0