On Wed, Jan 4, 2017 at 2:22 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Dec 22, 2016 at 7:26 AM, Jeff Law <l...@redhat.com> wrote:
>> This is the first of the 4 part patchkit to address deficiencies in our DSE
>> implementation.
>>
>> This patch addresses PR 33562, a P2 regression that has been low priority
>> since gcc-4.3.  To summarize, DSE can no longer detect an aggregate store
>> as dead when subsequent stores overwrite it in a piecemeal fashion.
>>
>> I originally tackled this by changing how we lower complex objects. That was
>> sufficient to address 33562, but was reasonably rejected.
>>
>> This version attacks the problem by improving DSE to track stores to memory
>> at a byte level.  That allows us to determine if a series of stores
>> completely covers an earlier store (thus making the earlier store dead).
>>
>> A useful side effect of this is we can detect when parts of a store are dead
>> and potentially rewrite the store.  This patch implements that for complex
>> object initializations.  While not strictly part of 33562, it's so closely
>> related that I felt it belongs as part of this patch.
>>
>> This originally limited the size of the tracked memory space to 64 bytes.  I
>> bumped the limit after working through the CONSTRUCTOR and mem* trimming
>> patches.  The 256 byte limit is still fairly arbitrary and I wouldn't lose
>> sleep if we throttled back to 64 or 128 bytes.
>>
>> Later patches in the kit will build upon this patch.  So if pieces look like
>> skeleton code, that's because they are.
>>
>> The changes since the V2 patch are:
>>
>> 1. Using sbitmaps rather than bitmaps.
>> 2. Returning a tri-state from dse_classify_store (renamed from
>> dse_possible_dead_store_p)
>> 3. More efficient trim computation
>> 4. Moving trimming code out of dse_classify_store
>> 5. Refactoring code to delete dead calls/assignments
>> 6. dse_optimize_stmt moves into the dse_dom_walker class
>>
>> Not surprisingly, this patch contains most of the changes driven by prior
>> feedback, as it includes the raw infrastructure.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.  OK for the trunk?
>
> New functions in sbitmap.c lack function comments.
>
> bitmap_count_bits fails to guard the __builtin_popcount calls with
> GCC_VERSION >= 3400 (the exact version is not important, but non-GCC host
> compilers are).  See bitmap.c for a fallback.
>
> Both bitmap_clear_range and bitmap_set_range look rather inefficient...
> (it's not likely anybody will clean this up after you)
>
> I'd say split out the sbitmap.[ch] changes.
>
> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
> +        "dse-max-object-size",
> +        "Maximum size (in bytes) of objects tracked by dead store elimination.",
> +        256, 0, 0)
>
> the docs suggest that DSE doesn't handle larger stores but it does (just in
> the original limited way).  Maybe "tracked bytewise" is better.

Oh, and new --params need documenting in invoke.texi.

Richard.

> +static bool
> +valid_ao_ref_for_dse (ao_ref *ref)
> +{
> +  return (ao_ref_base (ref)
> +         && ref->max_size != -1
> +         && (ref->offset % BITS_PER_UNIT) == 0
> +         && (ref->size % BITS_PER_UNIT) == 0
> +         && (ref->size / BITS_PER_UNIT) > 0);
>
> I think the last test is better written as ref->size != -1.
>
> Btw, seeing you discount non-byte size/offset stores this somehow asks
> for store-merging being done before the last DSE (it currently runs after).
> Sth to keep in mind for GCC 8.
>
> +/* Delete a dead store STMT, which is mem* call of some kind.  */
>
> call STMT
>
> +static void
> +delete_dead_call (gimple *stmt)
> +{
> +
> excess vertical space
> ......
> +  if (lhs)
> +    {
> +      tree ptr = gimple_call_arg (stmt, 0);
> +      gimple *new_stmt = gimple_build_assign (lhs, ptr);
> +      unlink_stmt_vdef (stmt);
> +      if (gsi_replace (&gsi, new_stmt, true))
> +        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>
>   release_ssa_name (gimple_vdef (stmt));
>
>
> +  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
> (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
>
> formatting.
>
> The DSE parts look good to me with the nits above fixed.  Just re-spin
> the sbitmap.[ch] part please.
>
> Thanks,
> Richard.
>
>>
>>         PR tree-optimization/33562
>>         * params.def (PARM_DSE_MAX_OBJECT_SIZE): New PARAM.
>>         * sbitmap.h (bitmap_clear_range, bitmap_set_range): Prototype new
>>         functions.
>>         (bitmap_count_bits): Likewise.
>>         * sbitmap.c (bitmap_clear_range, bitmap_set_range): New functions.
>>         (bitmap_count_bits): Likewise.
>>         * tree-ssa-dse.c: Include params.h.
>>         (dse_store_status): New enum.
>>         (initialize_ao_ref_for_dse): New, partially extracted from
>>         dse_optimize_stmt.
>>         (valid_ao_ref_for_dse, normalize_ref): New.
>>         (setup_live_bytes_from_ref, compute_trims): Likewise.
>>         (clear_bytes_written_by, trim_complex_store): Likewise.
>>         (maybe_trim_partially_dead_store): Likewise.
>>         (maybe_trim_complex_store): Likewise.
>>         (dse_classify_store): Renamed from dse_possible_dead_store_p.
>>         Track what bytes live from the original store.  Return tri-state
>>         for dead, partially dead or live.
>>         (dse_dom_walker): Add constructor, destructor and new private
>> members.
>>         (delete_dead_call, delete_dead_assignment): New extracted from
>>         dse_optimize_stmt.
>>         (dse_optimize_stmt): Make a member of dse_dom_walker.
>>         Use initialize_ao_ref_for_dse.
>>
>>
>>         * gcc.dg/tree-ssa/complex-4.c: No longer xfailed.
>>         * gcc.dg/tree-ssa/complex-5.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-9.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-18.c: New test.
>>         * gcc.dg/tree-ssa/ssa-dse-19.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-20.c: Likewise.
>>         * gcc.dg/tree-ssa/ssa-dse-21.c: Likewise.
>>
>> diff --git a/gcc/params.def b/gcc/params.def
>> index 50f75a7..f367c1d 100644
>> --- a/gcc/params.def
>> +++ b/gcc/params.def
>> @@ -532,6 +532,11 @@ DEFPARAM(PARAM_AVG_LOOP_NITER,
>>          "Average number of iterations of a loop.",
>>          10, 1, 0)
>>
>> +DEFPARAM(PARAM_DSE_MAX_OBJECT_SIZE,
>> +        "dse-max-object-size",
>> +        "Maximum size (in bytes) of objects tracked by dead store elimination.",
>> +        256, 0, 0)
>> +
>>  DEFPARAM(PARAM_SCEV_MAX_EXPR_SIZE,
>>          "scev-max-expr-size",
>>          "Bound on size of expressions used in the scalar evolutions analyzer.",
>> diff --git a/gcc/sbitmap.c b/gcc/sbitmap.c
>> index 10b4347..2b66a6c 100644
>> --- a/gcc/sbitmap.c
>> +++ b/gcc/sbitmap.c
>> @@ -202,6 +202,39 @@ bitmap_empty_p (const_sbitmap bmap)
>>    return true;
>>  }
>>
>> +void
>> +bitmap_clear_range (sbitmap bmap, unsigned int start, unsigned int count)
>> +{
>> +  for (unsigned int i = start; i < start + count; i++)
>> +    bitmap_clear_bit (bmap, i);
>> +}
>> +
>> +void
>> +bitmap_set_range (sbitmap bmap, unsigned int start, unsigned int count)
>> +{
>> +  for (unsigned int i = start; i < start + count; i++)
>> +    bitmap_set_bit (bmap, i);
>> +}
>> +
>> +
>> +unsigned int
>> +bitmap_count_bits (const_sbitmap bmap)
>> +{
>> +  unsigned int count = 0;
>> +  for (unsigned int i = 0; i < bmap->size; i++)
>> +    if (bmap->elms[i])
>> +      {
>> +# if HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONG
>> +       count += __builtin_popcountl (bmap->elms[i]);
>> +# elif HOST_BITS_PER_WIDEST_FAST_INT == HOST_BITS_PER_LONGLONG
>> +       count += __builtin_popcountll (bmap->elms[i]);
>> +# else
>> +       count += __builtin_popcount (bmap->elms[i]);
>> +# endif
>> +      }
>> +  return count;
>> +}
>> +
>>
>>  /* Zero all elements in a bitmap.  */
>>
>> diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h
>> index bd734f9..aa43d0d 100644
>> --- a/gcc/sbitmap.h
>> +++ b/gcc/sbitmap.h
>> @@ -231,8 +231,11 @@ extern sbitmap *sbitmap_vector_alloc (unsigned int, unsigned int);
>>  extern sbitmap sbitmap_resize (sbitmap, unsigned int, int);
>>  extern void bitmap_copy (sbitmap, const_sbitmap);
>>  extern int bitmap_equal_p (const_sbitmap, const_sbitmap);
>> +extern unsigned int bitmap_count_bits (const_sbitmap);
>>  extern bool bitmap_empty_p (const_sbitmap);
>>  extern void bitmap_clear (sbitmap);
>> +extern void bitmap_clear_range (sbitmap, unsigned, unsigned);
>> +extern void bitmap_set_range (sbitmap, unsigned, unsigned);
>>  extern void bitmap_ones (sbitmap);
>>  extern void bitmap_vector_clear (sbitmap *, unsigned int);
>>  extern void bitmap_vector_ones (sbitmap *, unsigned int);
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> index 87a2638..3155741 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-4.c
>> @@ -10,4 +10,4 @@ int f(void)
>>    return g(&t);
>>  }
>>
>> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> index e2cd403..e6d027f 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/complex-5.c
>> @@ -8,4 +8,4 @@ int f(void)
>>   __imag__ t = 2;
>>  }
>>
>> -/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" { xfail *-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
>> new file mode 100644
>> index 0000000..92b2df8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-18.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +int g(_Complex int*);
>> +int f(void)
>> +{
>> +  _Complex int t = 0;
>> +  int i, j;
>> + __imag__ t += 2;
>> +  return g(&t);
>> +}
>> +
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
>> new file mode 100644
>> index 0000000..718b746
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-19.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +int g(_Complex int*);
>> +int f(void)
>> +{
>> +  _Complex int t = 0;
>> +  int i, j;
>> + __real__ t += 2;
>> +  return g(&t);
>> +}
>> +
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
>> new file mode 100644
>> index 0000000..4e14d9b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-20.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
>> +_Complex int t = 0;
>> +int f(void)
>> +{
>> +  t = 0;
>> + __imag__ t = 2;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> +
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
>> new file mode 100644
>> index 0000000..d1e0b85
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-21.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O1 -fno-tree-dce -fdump-tree-optimized" } */
>> +_Complex int t = 0;
>> +int f(void)
>> +{
>> +  t = 0;
>> + __real__ t = 2;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times "__complex__" 0 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "REALPART_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "IMAGPART_EXPR" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> index 594c20c..ae48ddd 100644
>> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-9.c
>> @@ -11,4 +11,4 @@ foo ()
>>  }
>>
>>  /* We should eliminate the first assignment.  */
>> -/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" { xfail *-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "VDEF" 2 "dse1" } } */
>> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
>> index 778b363..40acd83 100644
>> --- a/gcc/tree-ssa-dse.c
>> +++ b/gcc/tree-ssa-dse.c
>> @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-dfa.h"
>>  #include "domwalk.h"
>>  #include "tree-cfgcleanup.h"
>> +#include "params.h"
>>
>>  /* This file implements dead store elimination.
>>
>> @@ -68,14 +69,242 @@ along with GCC; see the file COPYING3.  If not see
>>     remove their dead edges eventually.  */
>>  static bitmap need_eh_cleanup;
>>
>> +/* Return value from dse_classify_store */
>> +enum dse_store_status
>> +{
>> +  DSE_STORE_LIVE,
>> +  DSE_STORE_MAYBE_PARTIAL_DEAD,
>> +  DSE_STORE_DEAD
>> +};
>> +
>> +/* STMT is a statement that may write into memory.  Analyze it and
>> +   initialize WRITE to describe how STMT affects memory.
>> +
>> +   Return TRUE if the statement was analyzed, FALSE otherwise.
>> +
>> +   It is always safe to return FALSE.  But typically better optimization
>> +   can be achieved by analyzing more statements.  */
>> +
>> +static bool
>> +initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
>> +{
>> +  /* It's advantageous to handle certain mem* functions.  */
>> +  if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
>> +    {
>> +      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
>> +       {
>> +         case BUILT_IN_MEMCPY:
>> +         case BUILT_IN_MEMMOVE:
>> +         case BUILT_IN_MEMSET:
>> +           {
>> +             tree size = NULL_TREE;
>> +             if (gimple_call_num_args (stmt) == 3)
>> +               size = gimple_call_arg (stmt, 2);
>> +             tree ptr = gimple_call_arg (stmt, 0);
>> +             ao_ref_init_from_ptr_and_size (write, ptr, size);
>> +             return true;
>> +           }
>> +         default:
>> +           break;
>> +       }
>> +    }
>> +  else if (is_gimple_assign (stmt))
>> +    {
>> +      ao_ref_init (write, gimple_assign_lhs (stmt));
>> +      return true;
>> +    }
>> +  return false;
>> +}
>> +
>> +/* Given REF from the alias oracle, return TRUE if it is a valid
>> +   memory reference for dead store elimination, false otherwise.
>> +
>> +   In particular, the reference must have a known base, known maximum
>> +   size, start at a byte offset and have a size that is one or more
>> +   bytes.   Finally the offset and size must be smaller than
>> +   PARAM_DSE_MAX_OBJECT_SIZE.  */
>> +
>> +static bool
>> +valid_ao_ref_for_dse (ao_ref *ref)
>> +{
>> +  return (ao_ref_base (ref)
>> +         && ref->max_size != -1
>> +         && (ref->offset % BITS_PER_UNIT) == 0
>> +         && (ref->size % BITS_PER_UNIT) == 0
>> +         && (ref->size / BITS_PER_UNIT) > 0);
>> +}
>> +
>> +/* Normalize COPY (an ao_ref) relative to REF.  Essentially, when we are
>> +   done, COPY will refer only to bytes found within REF.
>> +
>> +   We have already verified that COPY intersects at least one
>> +   byte with REF.  */
>> +
>> +static void
>> +normalize_ref (ao_ref *copy, ao_ref *ref)
>> +{
>> +  /* If COPY starts before REF, then reset the beginning of
>> +     COPY to match REF and decrease the size of COPY by the
>> +     number of bytes removed from COPY.  */
>> +  if (copy->offset < ref->offset)
>> +    {
>> +      copy->size -= (ref->offset - copy->offset);
>> +      copy->offset = ref->offset;
>> +    }
>> +
>> +  /* If COPY extends beyond REF, chop off its size appropriately.  */
>> +  if (copy->offset + copy->size > ref->offset + ref->size)
>> +    copy->size -= (copy->offset + copy->size - (ref->offset + ref->size));
>> +}
>> +
>> +/* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
>> +   address written by STMT must match the one found in REF, which must
>> +   have its base address previously initialized.
>> +
>> +   This routine must be conservative.  If we don't know the offset or
>> +   actual size written, assume nothing was written.  */
>> +
>> +static void
>> +clear_bytes_written_by (sbitmap live_bytes, gimple *stmt, ao_ref *ref)
>> +{
>> +  ao_ref write;
>> +  if (!initialize_ao_ref_for_dse (stmt, &write))
>> +    return;
>> +
>> +  /* Verify we have the same base memory address, the write
>> +     has a known size and overlaps with REF.  */
>> +  if (valid_ao_ref_for_dse (&write)
>> +      && write.base == ref->base
>> +      && write.size == write.max_size
>> +      && ((write.offset < ref->offset
>> +          && write.offset + write.size > ref->offset)
>> +         || (write.offset >= ref->offset
>> +             && write.offset < ref->offset + ref->size)))
>> +    {
>> +      normalize_ref (&write, ref);
>> +      bitmap_clear_range (live_bytes,
>> +                         (write.offset - ref->offset) / BITS_PER_UNIT,
>> +                         write.size / BITS_PER_UNIT);
>> +    }
>> +}
>> +
>> +/* REF is a memory write.  Extract relevant information from it and
>> +   initialize the LIVE_BYTES bitmap.  If successful, return TRUE.
>> +   Otherwise return FALSE.  */
>> +
>> +static bool
>> +setup_live_bytes_from_ref (ao_ref *ref, sbitmap live_bytes)
>> +{
>> +  if (valid_ao_ref_for_dse (ref)
>> +      && (ref->size / BITS_PER_UNIT
>> +         <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
>> +    {
>> +      bitmap_clear (live_bytes);
>> +      bitmap_set_range (live_bytes, 0, ref->size / BITS_PER_UNIT);
>> +      return true;
>> +    }
>> +  return false;
>> +}
>> +
>> +/* Compute the number of elements that we can trim from the head and
>> +   tail of ORIG resulting in a bitmap that is a superset of LIVE.
>> +
>> +   Store the number of elements trimmed from the head and tail in
>> +   TRIM_HEAD and TRIM_TAIL.  */
>> +
>> +static void
>> +compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
>> +{
>> +  /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
>> +     extends through ref->size.  So we know that in the original bitmap
>> +     bits 0..ref->size were true.  We don't actually need the bitmap, just
>> +     the REF to compute the trims.  */
>> +
>> +  /* Now identify how much, if any of the tail we can chop off.  */
>> +  *trim_tail = 0;
>> +  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
>> +  int last_live = bitmap_last_set_bit (live);
>> +  *trim_tail = (last_orig - last_live) & ~0x1;
>> +
>> +  /* Identify how much, if any of the head we can chop off.  */
>> +  int first_orig = 0;
>> +  int first_live = bitmap_first_set_bit (live);
>> +  *trim_head = (first_live - first_orig) & ~0x1;
>> +}
>> +
>> +/* STMT initializes an object from COMPLEX_CST where one or more of the
>> +   bytes written may be dead stores.  REF is a representation of the
>> +   memory written.  LIVE is the bitmap of stores that are actually live.
>> +
>> +   Attempt to rewrite STMT so that only the real or imaginary part of
>> +   the object is actually stored.  */
>> +
>> +static void
>> +maybe_trim_complex_store (ao_ref *ref, sbitmap live, gimple *stmt)
>> +{
>> +  int trim_head, trim_tail;
>> +  compute_trims (ref, live, &trim_head, &trim_tail);
>> +
>> +  /* The amount of data trimmed from the head or tail must be at
>> +     least half the size of the object to ensure we're trimming
>> +     the entire real or imaginary half.  By writing things this
>> +     way we avoid more O(n) bitmap operations.  */
>> +  if (trim_tail * 2 >= ref->size / BITS_PER_UNIT)
>> +    {
>> +      /* TREE_REALPART is live */
>> +      tree x = TREE_REALPART (gimple_assign_rhs1 (stmt));
>> +      tree y = gimple_assign_lhs (stmt);
>> +      y = build1 (REALPART_EXPR, TREE_TYPE (x), y);
>> +      gimple_assign_set_lhs (stmt, y);
>> +      gimple_assign_set_rhs1 (stmt, x);
>> +    }
>> +  else if (trim_head * 2 >= ref->size / BITS_PER_UNIT)
>> +    {
>> +      /* TREE_IMAGPART is live */
>> +      tree x = TREE_IMAGPART (gimple_assign_rhs1 (stmt));
>> +      tree y = gimple_assign_lhs (stmt);
>> +      y = build1 (IMAGPART_EXPR, TREE_TYPE (x), y);
>> +      gimple_assign_set_lhs (stmt, y);
>> +      gimple_assign_set_rhs1 (stmt, x);
>> +    }
>> +
>> +  /* Other cases indicate parts of both the real and imag subobjects
>> +     are live.  We do not try to optimize those cases.  */
>> +}
>> +
>> +/* STMT is a memory write where one or more bytes written are dead
>> +   stores.  ORIG is the bitmap of bytes stored by STMT.  LIVE is the
>> +   bitmap of stores that are actually live.
>> +
>> +   Attempt to rewrite STMT so that it writes fewer memory locations.  Right
>> +   now we only support trimming at the start or end of the memory region.
>> +   It's not clear how much there is to be gained by trimming from the middle
>> +   of the region.  */
>> +
>> +static void
>> +maybe_trim_partially_dead_store (ao_ref *ref, sbitmap live, gimple *stmt)
>> +{
>> +  if (is_gimple_assign (stmt))
>> +    {
>> +      switch (gimple_assign_rhs_code (stmt))
>> +       {
>> +       case COMPLEX_CST:
>> +         maybe_trim_complex_store (ref, live, stmt);
>> +         break;
>> +       default:
>> +         break;
>> +       }
>> +    }
>> +}
>>
>>  /* A helper of dse_optimize_stmt.
>>     Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
>>     statement *USE_STMT that may prove STMT to be dead.
>>     Return TRUE if the above conditions are met, otherwise FALSE.  */
>>
>> -static bool
>> -dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>> +static dse_store_status
>> +dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
>> +                   bool byte_tracking_enabled, sbitmap live_bytes)
>>  {
>>    gimple *temp;
>>    unsigned cnt = 0;
>> @@ -97,7 +326,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>>        /* Limit stmt walking to be linear in the number of possibly
>>           dead stores.  */
>>        if (++cnt > 256)
>> -       return false;
>> +       return DSE_STORE_LIVE;
>>
>>        if (gimple_code (temp) == GIMPLE_PHI)
>>         defvar = PHI_RESULT (temp);
>> @@ -135,7 +364,7 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>>                   fail = true;
>>                   BREAK_FROM_IMM_USE_STMT (ui);
>>                 }
>> -             /* Do not consider the PHI as use if it dominates the
>> +             /* Do not consider the PHI as use if it dominates the
>>                  stmt defining the virtual operand we are processing,
>>                  we have processed it already in this case.  */
>>               if (gimple_bb (defvar_def) != gimple_bb (use_stmt)
>> @@ -164,7 +393,13 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>>         }
>>
>>        if (fail)
>> -       return false;
>> +       {
>> +         /* STMT might be partially dead and we may be able to reduce
>> +            how many memory locations it stores into.  */
>> +         if (byte_tracking_enabled && !gimple_clobber_p (stmt))
>> +           return DSE_STORE_MAYBE_PARTIAL_DEAD;
>> +         return DSE_STORE_LIVE;
>> +       }
>>
>>        /* If we didn't find any definition this means the store is dead
>>           if it isn't a store to global reachable memory.  In this case
>> @@ -172,20 +407,100 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>>        if (!temp)
>>         {
>>           if (ref_may_alias_global_p (ref))
>> -           return false;
>> +           return DSE_STORE_LIVE;
>>
>>           temp = stmt;
>>           break;
>>         }
>> +
>> +      if (byte_tracking_enabled && temp)
>> +       clear_bytes_written_by (live_bytes, temp, ref);
>>      }
>> -  /* Continue walking until we reach a kill.  */
>> -  while (!stmt_kills_ref_p (temp, ref));
>> +  /* Continue walking until we reach a full kill as a single statement
>> +     or there are no more live bytes.  */
>> +  while (!stmt_kills_ref_p (temp, ref)
>> +        && !(byte_tracking_enabled && bitmap_empty_p (live_bytes)));
>>
>>    *use_stmt = temp;
>> +  return DSE_STORE_DEAD;
>> +}
>> +
>> +
>> +class dse_dom_walker : public dom_walker
>> +{
>> +public:
>> +  dse_dom_walker (cdi_direction direction) : dom_walker (direction)
>> +  { m_live_bytes = sbitmap_alloc (PARAM_VALUE
>> (PARAM_DSE_MAX_OBJECT_SIZE));m_byte_tracking_enabled = false; }
>> +
>> +  ~dse_dom_walker () { sbitmap_free (m_live_bytes); }
>> +
>> +  virtual edge before_dom_children (basic_block);
>>
>> -  return true;
>> +private:
>> +  sbitmap m_live_bytes;
>> +  bool m_byte_tracking_enabled;
>> +  void dse_optimize_stmt (gimple_stmt_iterator *);
>> +};
>> +
>> +/* Delete a dead store STMT, which is mem* call of some kind.  */
>> +static void
>> +delete_dead_call (gimple *stmt)
>> +{
>> +
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +    {
>> +      fprintf (dump_file, "  Deleted dead call: ");
>> +      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
>> +      fprintf (dump_file, "\n");
>> +    }
>> +
>> +  tree lhs = gimple_call_lhs (stmt);
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>> +  if (lhs)
>> +    {
>> +      tree ptr = gimple_call_arg (stmt, 0);
>> +      gimple *new_stmt = gimple_build_assign (lhs, ptr);
>> +      unlink_stmt_vdef (stmt);
>> +      if (gsi_replace (&gsi, new_stmt, true))
>> +        bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>> +    }
>> +  else
>> +    {
>> +      /* Then we need to fix the operand of the consuming stmt.  */
>> +      unlink_stmt_vdef (stmt);
>> +
>> +      /* Remove the dead store.  */
>> +      if (gsi_remove (&gsi, true))
>> +       bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
>> +      release_defs (stmt);
>> +    }
>>  }
>>
>> +/* Delete a dead store STMT, which is a gimple assignment. */
>> +
>> +static void
>> +delete_dead_assignment (gimple *stmt)
>> +{
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +    {
>> +      fprintf (dump_file, "  Deleted dead store: ");
>> +      print_gimple_stmt (dump_file, stmt, dump_flags, 0);
>> +      fprintf (dump_file, "\n");
>> +    }
>> +
>> +  /* Then we need to fix the operand of the consuming stmt.  */
>> +  unlink_stmt_vdef (stmt);
>> +
>> +  /* Remove the dead store.  */
>> +  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
>> +  basic_block bb = gimple_bb (stmt);
>> +  if (gsi_remove (&gsi, true))
>> +    bitmap_set_bit (need_eh_cleanup, bb->index);
>> +
>> +  /* And release any SSA_NAMEs set in this statement back to the
>> +     SSA_NAME manager.  */
>> +  release_defs (stmt);
>> +}
>>
>>  /* Attempt to eliminate dead stores in the statement referenced by BSI.
>>
>> @@ -198,8 +513,8 @@ dse_possible_dead_store_p (ao_ref *ref, gimple *stmt, gimple **use_stmt)
>>     is used precisely once by a later store to the same location which
>>     post dominates the first store, then the first store is dead.  */
>>
>> -static void
>> -dse_optimize_stmt (gimple_stmt_iterator *gsi)
>> +void
>> +dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>  {
>>    gimple *stmt = gsi_stmt (*gsi);
>>
>> @@ -214,6 +529,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>           || TREE_CODE (gimple_assign_lhs (stmt)) != MEM_REF))
>>      return;
>>
>> +  ao_ref ref;
>> +  if (!initialize_ao_ref_for_dse (stmt, &ref))
>> +    return;
>> +
>>    /* We know we have virtual definitions.  We can handle assignments and
>>       some builtin calls.  */
>>    if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
>> @@ -225,42 +544,26 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>           case BUILT_IN_MEMSET:
>>             {
>>               gimple *use_stmt;
>> -             ao_ref ref;
>> -             tree size = NULL_TREE;
>> -             if (gimple_call_num_args (stmt) == 3)
>> -               size = gimple_call_arg (stmt, 2);
>> -             tree ptr = gimple_call_arg (stmt, 0);
>> -             ao_ref_init_from_ptr_and_size (&ref, ptr, size);
>> -             if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
>> +             enum dse_store_status store_status;
>> +             m_byte_tracking_enabled
>> +               = setup_live_bytes_from_ref (&ref, m_live_bytes);
>> +             store_status = dse_classify_store (&ref, stmt, &use_stmt,
>> +                                                m_byte_tracking_enabled,
>> +                                                m_live_bytes);
>> +             if (store_status == DSE_STORE_LIVE)
>>                 return;
>>
>> -             if (dump_file && (dump_flags & TDF_DETAILS))
>> -               {
>> -                 fprintf (dump_file, "  Deleted dead call: ");
>> -                 print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags,
>> 0);
>> -                 fprintf (dump_file, "\n");
>> -               }
>> -
>> -             tree lhs = gimple_call_lhs (stmt);
>> -             if (lhs)
>> +             if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
>>                 {
>> -                 gimple *new_stmt = gimple_build_assign (lhs, ptr);
>> -                 unlink_stmt_vdef (stmt);
>> -                 if (gsi_replace (gsi, new_stmt, true))
>> -                   bitmap_set_bit (need_eh_cleanup, gimple_bb
>> (stmt)->index);
>> +                 maybe_trim_partially_dead_store (&ref, m_live_bytes,
>> stmt);
>> +                 return;
>>                 }
>> -             else
>> -               {
>> -                 /* Then we need to fix the operand of the consuming stmt.
>> */
>> -                 unlink_stmt_vdef (stmt);
>>
>> -                 /* Remove the dead store.  */
>> -                 if (gsi_remove (gsi, true))
>> -                   bitmap_set_bit (need_eh_cleanup, gimple_bb
>> (stmt)->index);
>> -                 release_defs (stmt);
>> -               }
>> -             break;
>> +             if (store_status == DSE_STORE_DEAD)
>> +               delete_dead_call (stmt);
>> +             return;
>>             }
>> +
>>           default:
>>             return;
>>         }
>> @@ -276,10 +579,20 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>         use_stmt = stmt;
>>        else
>>         {
>> -         ao_ref ref;
>> -         ao_ref_init (&ref, gimple_assign_lhs (stmt));
>> -         if (!dse_possible_dead_store_p (&ref, stmt, &use_stmt))
>> +         m_byte_tracking_enabled
>> +           = setup_live_bytes_from_ref (&ref, m_live_bytes);
>> +         enum dse_store_status store_status;
>> +         store_status = dse_classify_store (&ref, stmt, &use_stmt,
>> +                                            m_byte_tracking_enabled,
>> +                                            m_live_bytes);
>> +         if (store_status == DSE_STORE_LIVE)
>>             return;
>> +
>> +         if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
>> +           {
>> +             maybe_trim_partially_dead_store (&ref, m_live_bytes, stmt);
>> +             return;
>> +           }
>>         }
>>
>>        /* Now we know that use_stmt kills the LHS of stmt.  */
>> @@ -290,35 +603,10 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi)
>>           && !gimple_clobber_p (use_stmt))
>>         return;
>>
>> -      if (dump_file && (dump_flags & TDF_DETAILS))
>> -       {
>> -         fprintf (dump_file, "  Deleted dead store: ");
>> -         print_gimple_stmt (dump_file, gsi_stmt (*gsi), dump_flags, 0);
>> -         fprintf (dump_file, "\n");
>> -       }
>> -
>> -      /* Then we need to fix the operand of the consuming stmt.  */
>> -      unlink_stmt_vdef (stmt);
>> -
>> -      /* Remove the dead store.  */
>> -      basic_block bb = gimple_bb (stmt);
>> -      if (gsi_remove (gsi, true))
>> -       bitmap_set_bit (need_eh_cleanup, bb->index);
>> -
>> -      /* And release any SSA_NAMEs set in this statement back to the
>> -        SSA_NAME manager.  */
>> -      release_defs (stmt);
>> +      delete_dead_assignment (stmt);
>>      }
>>  }
>>
>> -class dse_dom_walker : public dom_walker
>> -{
>> -public:
>> -  dse_dom_walker (cdi_direction direction) : dom_walker (direction) {}
>> -
>> -  virtual edge before_dom_children (basic_block);
>> -};
>> -
>>  edge
>>  dse_dom_walker::before_dom_children (basic_block bb)
>>  {
>> @@ -391,7 +679,7 @@ pass_dse::execute (function *fun)
>>      }
>>
>>    BITMAP_FREE (need_eh_cleanup);
>> -
>> +
>>    /* For now, just wipe the post-dominator information.  */
>>    free_dominance_info (CDI_POST_DOMINATORS);
>>    return 0;
>>