Hi Richard,

     I finally got around to getting the data you wanted. Thanks for
the response. Please
find my comments below.


On Sun, Aug 9, 2009 at 2:15 PM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Sat, Aug 8, 2009 at 11:59 PM, Sriraman Tallam<tmsri...@google.com> wrote:
>> Hi,
>>
>>    Here is a patch to eliminate redundant zero-extension instructions
>> on x86_64.
>>
>> Tested: Ran the gcc regresssion testsuite on x86_64-linux and verified
>> that the results are the same with/without this patch.
>
> The patch misses testcases.

Added.

Why does zee run after register allocation?
> Your examples suggest that it will free hard registers so doing it before
> regalloc looks odd.

Originally, I had written this patch to have ZEE run before IRA.
However, I noticed
that IRA generates poorer code when my patch is turned on.

Here is to give an example of how badly RA can hurt . I show a piece
of code around a
zero-extend that got eliminated. The code on the right is after
eliminating zero-extends.
The code is pretty much the same except the extra move highlighted in
yellow. IRA is not
able to coalesce %esi and %r15d.

Base line :

48b760: imul   $0x9e406cb5,%r15d,%esi
48b767: mov    %rax,%rcx
48b76a: shr    $0x12,%esi
48b76d: and    %r12d,%esi
48b770: mov    %edi,%eax
48b772: add    $0x1,%edi
48b775: shr    $0x5,%eax
48b778: mov    %eax,%eax    # redundant zero extend
48b77a: lea    (%rcx,%rax,1),%rax
48b77e: cmp    %rax,%r9


-fzee :

48b7d0: imul   $0x9e406cb5,%r15d,%r15d # The destination should have
just been esi.
48b7d7: mov    %rax,%rcx
48b7da: shr    $0x12,%r15d
48b7de: mov    %r15d,%esi   # This move is useless if r15d and esi can
be coalesced into esi.
48b7e1: and    %r12d,%esi
48b7e4: mov    %edi,%eax
48b7e6: add    $0x1,%edi
48b7e9: shr    $0x5,%eax
Ok, zero-extend eliminated.
48b7ec: lea    (%rcx,%rax,1),%rax
48b7f0: cmp    %rax,%r9

Going after IRA preserves code quality and the useless extension gets removed.

>
> What is the compile-time impact of your patch on say, gcc bootstrap?
> How many percent of instructions are removed as useless zero-extensions
> during gcc bootstrap?  How much do CSiBE numbers improve?

CSiBE numbers :

Total number of zero-extension instructions before : 667.
Total number of zero-extension instructions after   : 122.
Performance : no measurable impact.

GCC bootstrap :

Total number of zero-extension instructions before  : 1456
Total number of zero-extension instructions after    :  5814
No impact on boot-strap time.


I have attached the latest patch :


On Sun, Aug 9, 2009 at 2:15 PM, Richard Guenther
<richard.guent...@gmail.com> wrote:
> On Sat, Aug 8, 2009 at 11:59 PM, Sriraman Tallam<tmsri...@google.com> wrote:
>> Hi,
>>
>>    Here is a patch to eliminate redundant zero-extension instructions
>> on x86_64.
>>
>> Tested: Ran the gcc regresssion testsuite on x86_64-linux and verified
>> that the results are the same with/without this patch.
>
> The patch misses testcases.  Why does zee run after register allocation?
> Your examples suggest that it will free hard registers so doing it before
> regalloc looks odd.
>
> What is the compile-time impact of your patch on say, gcc bootstrap?
> How many percent of instructions are removed as useless zero-extensions
> during gcc bootstrap?  How much do CSiBE numbers improve?
>
> Thanks,
> Richard.
>
>>
>> Problem Description :
>> ---------------------------------
>>
>> This pass is intended to be applicable only to targets that implicitly
>> zero-extend 64-bit registers after writing to their lower 32-bit half.
>> For instance, x86_64 zero-extends the upper bits of a register
>> implicitly whenever an instruction writes to its lower 32-bit half.
>> For example, the instruction *add edi,eax* also zero-extends the upper
>> 32-bits of rax after doing the addition.  These zero extensions come
>> for free and GCC does not always exploit this well.  That is, it has
>> been observed that there are plenty of cases where GCC explicitly
>> zero-extends registers for x86_64 that are actually useless because
>> these registers were already implicitly zero-extended in a prior
>> instruction.  This pass tries to eliminate such useless zero extension
>> instructions.
>>
>> Motivating Example I :
>> ----------------------------------
>> For this program :
>> **********************************************
>> bad_code.c
>>
>> int mask[1000];
>>
>> int foo(unsigned x)
>> {
>>  if (x < 10)
>>    x = x * 45;
>>  else
>>    x = x * 78;
>>  return mask[x];
>> }
>> **********************************************
>>
>> $ gcc -O2 bad_code.c
>>  ........
>>  400315:       b8 4e 00 00 00            mov    $0x4e,%eax
>>  40031a:       0f af f8                        imul   %eax,%edi
>>  40031d:       89 ff                             mov    %edi,%edi
>> ---> Useless zero extend.
>>  40031f:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
>>  400326:       c3                                 retq
>>  ......
>>  400330:       ba 2d 00 00 00          mov    $0x2d,%edx
>>  400335:       0f af fa                      imul   %edx,%edi
>>  400338:       89 ff                           mov    %edi,%edi  --->
>> Useless zero extend.
>>  40033a:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
>>  400341:       c3                      retq
>>
>> $ gcc -O2 -fzee bad_code.c
>>  ......
>>  400315:       6b ff 4e                imul   $0x4e,%edi,%edi
>>  400318:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
>>  40031f:       c3                      retq
>>  400320:       6b ff 2d                imul   $0x2d,%edi,%edi
>>  400323:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
>>  40032a:       c3                      retq
>>
>>
>>
>> Thanks,
>>
>> Sriraman M Tallam.
>> Google, Inc.
>> tmsri...@google.com
>>
>
Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 152036)
+++ tree-pass.h (working copy)
@@ -479,6 +479,7 @@ extern struct rtl_opt_pass pass_stack_ptr_mod;
 extern struct rtl_opt_pass pass_initialize_regs;
 extern struct rtl_opt_pass pass_combine;
 extern struct rtl_opt_pass pass_if_after_combine;
+extern struct rtl_opt_pass pass_implicit_zee;
 extern struct rtl_opt_pass pass_partition_blocks;
 extern struct rtl_opt_pass pass_match_asm_constraints;
 extern struct rtl_opt_pass pass_regmove;
Index: testsuite/gcc.target/i386/zee.c
===================================================================
--- testsuite/gcc.target/i386/zee.c     (revision 0)
+++ testsuite/gcc.target/i386/zee.c     (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -fzee -S" } */
+/* { dg-final { scan-assembler-not "mov\[\\t \]+\(%\[\^,\]+\),\[\\t \]*\\1" } 
} */
+int mask[100];
+int foo(unsigned x)
+{
+  if (x < 10)
+    x = x * 45;
+  else
+    x = x * 78;
+  return mask[x];
+}
Index: timevar.def
===================================================================
--- timevar.def (revision 152036)
+++ timevar.def (working copy)
@@ -182,6 +182,7 @@ DEFTIMEVAR (TV_RELOAD                    , "reload")
 DEFTIMEVAR (TV_RELOAD_CSE_REGS       , "reload CSE regs")
 DEFTIMEVAR (TV_SEQABSTR              , "sequence abstraction")
 DEFTIMEVAR (TV_GCSE_AFTER_RELOAD      , "load CSE after reload")
+DEFTIMEVAR (TV_ZEE                  , "zee")
 DEFTIMEVAR (TV_THREAD_PROLOGUE_AND_EPILOGUE, "thread pro- & epilogue")
 DEFTIMEVAR (TV_IFCVT2               , "if-conversion 2")
 DEFTIMEVAR (TV_COMBINE_STACK_ADJUST  , "combine stack adjustments")
Index: implicit-zee.c
===================================================================
--- implicit-zee.c      (revision 0)
+++ implicit-zee.c      (revision 0)
@@ -0,0 +1,1029 @@
+/* Redundant Zero-extension elimination for targets that implicitly
+   zero-extend writes to the lower 32-bit portion of 64-bit registers.
+   Copyright (C) 2009 Free Software Foundation, Inc.
+   Contributed by Sriraman Tallam (tmsri...@google.com) and
+                  Silvius Rus     (r...@google.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>. */
+
+
+/* Problem Description :
+  ---------------------
+
+This pass is intended to be applicable only to targets that implicitly
+zero-extend 64-bit registers after writing to their lower 32-bit half.
+For instance, x86_64 zero-extends the upper bits of a register
+implicitly whenever an instruction writes to its lower 32-bit half.
+For example, the instruction *add edi,eax* also zero-extends the upper
+32-bits of rax after doing the addition.  These zero extensions come
+for free and GCC does not always exploit this well.  That is, it has
+been observed that there are plenty of cases where GCC explicitly
+zero-extends registers for x86_64 that are actually useless because
+these registers were already implicitly zero-extended in a prior
+instruction.  This pass tries to eliminate such useless zero extension
+instructions.
+
+How does this pass work  ?
+--------------------------
+
+This pass is run after register allocation.  Hence, all registers that
+this pass deals with are hard registers.  This pass first looks for a
+zero-extension instruction that could possibly be redundant. Such zero
+extension instructions show up in RTL with the pattern :
+(set (reg:DI x) (zero_extend:DI (reg:SI x))).
+where x can be any one of the 64-bit hard registers.
+Now, this pass tries to eliminate this instruction by merging the
+zero-extension with the definitions of register x. For instance, if
+one of the definitions of register x was  :
+(set (reg:SI x) (plus:SI (reg:SI z1) (reg:SI z2))),
+then the combination converts this into :
+(set (reg:DI x) (zero_extend:DI (plus:SI (reg:SI z1) (reg:SI z2)))).
+If all the merged definitions are recognizable assembly instructions,
+the zero-extension is effectively eliminated.  For example, in x86_64,
+implicit zero-extensions are captured with appropriate patterns in the
+i386.md file.  Hence, these merged definition can be matched to a single
+assembly instruction.  The original zero-extension instruction is then
+deleted if all the definitions can be merged.
+
+However, there are cases where the definition instruction cannot be
+merged with a zero-extend.  Examples are CALL instructions.  In such
+cases, the original zero extension is not redundant and this pass does
+not delete it.
+
+Handling conditional moves :
+----------------------------
+
+Architectures like x86_64 support conditional moves whose semantics for
+zero-extension differ from the other instructions.  For instance, the
+instruction *cmov ebx, eax*
+zero-extends eax onto rax only when the move from ebx to eax happens.
+Otherwise, eax may not be zero-extended.  Conditional moves appear as
+RTL instructions of the form
+(set (reg:SI x) (if_then_else (cond) (reg:SI y) (reg:SI z))).
+This pass tries to merge a zero-extension with a conditional move by
+actually merging the defintions of y and z with a zero-extend and then
+converting the conditional move into :
+(set (reg:DI x) (if_then_else (cond) (reg:DI y) (reg:DI z))).
+Since registers y and z are zero-extended, register x will also be
+zero-extended after the conditional move.  Note that this step has to
+be done transitively since the definition of a conditional copy can be
+another conditional copy.
+
+Motivating Example I :
+---------------------
+For this program :
+**********************************************
+bad_code.c
+
+int mask[1000];
+
+int foo(unsigned x)
+{
+  if (x < 10)
+    x = x * 45;
+  else
+    x = x * 78;
+  return mask[x];
+}
+**********************************************
+
+$ gcc -O2 -fsee bad_code.c (Turned on existing sign-extension elimination.)
+  ........
+  400315:       b8 4e 00 00 00          mov    $0x4e,%eax
+  40031a:       0f af f8                imul   %eax,%edi
+  40031d:       89 ff                   mov    %edi,%edi  ---> Useless extend.
+  40031f:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
+  400326:       c3                      retq
+  ......
+  400330:       ba 2d 00 00 00          mov    $0x2d,%edx
+  400335:       0f af fa                imul   %edx,%edi
+  400338:       89 ff                   mov    %edi,%edi  ---> Useless extend.
+  40033a:       8b 04 bd 60 19 40 00    mov    0x401960(,%rdi,4),%eax
+  400341:       c3                      retq
+
+$ gcc -O2 -fzee bad_code.c
+  ......
+  400315:       6b ff 4e                imul   $0x4e,%edi,%edi
+  400318:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
+  40031f:       c3                      retq
+  400320:       6b ff 2d                imul   $0x2d,%edi,%edi
+  400323:       8b 04 bd 40 19 40 00    mov    0x401940(,%rdi,4),%eax
+  40032a:       c3                      retq
+
+Motivating Example II :
+---------------------
+
+Here is an example with a conditional move.
+
+For this program :
+**********************************************
+
+unsigned long long foo(unsigned x , unsigned y)
+{
+  unsigned z;
+  if (x > 100)
+    z = x + y;
+  else
+    z = x - y;
+  return (unsigned long long)(z);
+}
+
+$ gcc -O2 -fsee bad_code.c (Turned on existing sign-extension elimination.)
+  ............
+  400360:       8d 14 3e                lea    (%rsi,%rdi,1),%edx
+  400363:       89 f8                   mov    %edi,%eax
+  400365:       29 f0                   sub    %esi,%eax
+  400367:       83 ff 65                cmp    $0x65,%edi
+  40036a:       0f 43 c2                cmovae %edx,%eax
+  40036d:       89 c0                   mov    %eax,%eax ---> Useless extend.
+  40036f:       c3                      retq
+
+$ gcc -O2 -fzee bad_code.c
+  .............
+  400360:       89 fa                   mov    %edi,%edx
+  400362:       8d 04 3e                lea    (%rsi,%rdi,1),%eax
+  400365:       29 f2                   sub    %esi,%edx
+  400367:       83 ff 65                cmp    $0x65,%edi
+  40036a:       89 d6                   mov    %edx,%esi
+  40036c:       48 0f 42 c6             cmovb  %rsi,%rax
+  400370:       c3                      retq
+
+
+Usefulness :
+----------
+
+This pass reduces the dynamic instruction count of a compression benchmark by
+2.8% and improves its run-time by about 1%.  The compression benchmark had the
+following code sequence in a very hot region of code before ZEE optimized it :
+
+shr $0x5, %edx
+mov %edx, %edx --> Useless zero-extend.
+
+How to turn on ?
+----------------
+-fzee -O2
+
+*/
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "tm_p.h"
+#include "flags.h"
+#include "regs.h"
+#include "hard-reg-set.h"
+#include "basic-block.h"
+#include "insn-config.h"
+#include "function.h"
+#include "expr.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "real.h"
+#include "toplev.h"
+#include "target.h"
+#include "timevar.h"
+#include "optabs.h"
+#include "insn-codes.h"
+#include "rtlhooks-def.h"
+/* Include output.h for dump_file.  */
+#include "output.h"
+#include "params.h"
+#include "timevar.h"
+#include "tree-pass.h"
+#include "df.h"
+#include "cgraph.h"
+
+/* This says if a register is newly created for the purpose of
+   zero-extension. */
+
+enum insn_merge_code
+{
+  MERGE_NOT_ATTEMPTED = 0,
+  MERGE_SUCCESS
+};
+
+/* This says if a INSN UID or its definition has already been merged
+   with a zero-extend or not. */
+
+static enum insn_merge_code *is_insn_merge_attempted;
+static int max_insn_uid;
+
+/* Get and Set methods for is_insn_merge_attempted */
+
+static enum insn_merge_code
+get_insn_status (rtx insn)
+{
+  gcc_assert (INSN_UID (insn) < max_insn_uid);
+  return is_insn_merge_attempted[INSN_UID (insn)];
+}
+
+static void
+set_insn_status (rtx insn, enum insn_merge_code code)
+{
+  gcc_assert (INSN_UID (insn) < max_insn_uid);
+  is_insn_merge_attempted[INSN_UID (insn)] = code;
+}
+
+/* Check to see if this zero-extend matches a pattern
+   that could be eliminated.  This is called via
+   for_each_rtx in function find_and_remove_ze. */
+
+static int
+is_set_with_extension_DI (rtx *expr, void *data)
+{
+  /* Looking only for patterns of the type :
+     SET (REG:DI X) (ZERO_EXTEND (REG:SI x))
+   */
+
+  if (GET_CODE (*expr) == SET
+      && GET_MODE (SET_DEST (*expr)) == DImode
+      && GET_CODE (SET_DEST (*expr)) == REG)
+    {
+      if (GET_CODE (SET_SRC (*expr)) == ZERO_EXTEND
+          && GET_CODE (XEXP (SET_SRC (*expr),0)) == REG
+          && GET_MODE (XEXP (SET_SRC (*expr),0)) == SImode
+          && REGNO (SET_DEST (*expr)) == REGNO (XEXP (SET_SRC (*expr),0)))
+            {
+              *(rtx **)(data) = expr;
+              return 1;
+            }
+    }
+  return 0;
+}
+
+/* Given a insn and a pointer to the SET rtx that needs to be
+   modified, this code modifies the SET rtx to a new SET rtx
+   that zero_extends the right hand expression into a DImode
+   register on the left hand side.  Note that multiple
+   assumptions are made about the nature of the set that needs
+   to be true for this to work and is called from merge_def_and_ze.
+
+   Original :
+   (set (reg:SI a) (expression))
+
+   Transform :
+   (set (reg:DI a) (zero_extend (expression)))
+
+   Special Cases :
+   If the expression is a constant or another zero_extend directly
+   assign it to the 64-bit register.
+*/
+
+static bool
+combine_set_zero_extend (rtx curr_insn, rtx *orig_set, rtx newreg)
+{
+  rtx temp_extension, simplified_temp_extension, new_set, new_const_int;
+  rtx orig_src,orig_dest;
+  HOST_WIDE_INT val;
+
+  /* Change the SET rtx and validate it. */
+  orig_src = SET_SRC (*orig_set);
+  orig_dest = SET_DEST (*orig_set);
+  new_set = NULL_RTX;
+
+  /* The right hand side can also be VOIDmode.  These cases have to be
+     handled differently. */
+
+  if (GET_MODE (orig_src) != SImode)
+    {
+      /* Merge constants by directly moving the constant into the
+         64-bit register under some conditions. */
+
+      if (GET_CODE (orig_src) == CONST_INT)
+        {
+          if (INTVAL (orig_src) >= 0 && INTVAL (orig_src) <= 0x7fffffff)
+            {
+              new_set = gen_rtx_SET (VOIDmode, newreg, orig_src);
+            }
+          else if (INTVAL (orig_src) < 0)
+            {
+              val = INTVAL (orig_src);
+
+              /* Zero-extending a negative 32-bit integer into 64-bits makes
+                 it a positive integer.  Convert the given negative integer
+                 into the appropriate postive integer when zero-extended. */
+
+              val = 0xffffffff + val + 1;
+              gcc_assert (val > 0);
+              new_const_int = gen_rtx_CONST_INT (VOIDmode, val);
+              new_set = gen_rtx_SET (VOIDmode, newreg, new_const_int);
+            }
+          else
+            {
+              return false;
+            }
+        }
+      else
+        {
+          /* This is mostly due to a call insn that should not be
+             optimized. */
+
+          return false;
+        }
+    }
+  else if (GET_CODE (orig_src) == ZERO_EXTEND)
+    {
+      /* Here a zero-extend is used to get to SI. Why not make it
+         all the  way till DI. */
+
+      temp_extension = gen_rtx_ZERO_EXTEND (DImode, XEXP (orig_src, 0));
+      simplified_temp_extension = simplify_rtx (temp_extension);
+      if (simplified_temp_extension)
+        temp_extension = simplified_temp_extension;
+      new_set = gen_rtx_SET (VOIDmode, newreg, temp_extension);
+    }
+  else if (GET_CODE (orig_src) == IF_THEN_ELSE)
+    {
+      /* Only IF_THEN_ELSE of phi-type copies are combined. Otherwise,
+         in general, IF_THEN_ELSE should not be combined. */
+
+      return false;
+    }
+  else
+    {
+      /* This is the normal case we expect. */
+
+      temp_extension = gen_rtx_ZERO_EXTEND (DImode, orig_src);
+      simplified_temp_extension = simplify_rtx (temp_extension);
+      if (simplified_temp_extension)
+        temp_extension = simplified_temp_extension;
+      new_set = gen_rtx_SET (VOIDmode, newreg, temp_extension);
+    }
+
+  gcc_assert (new_set != NULL_RTX);
+
+  /* This change is a part of a group of changes. Hence,
+     validate_change will not try to commit the change. */
+
+  if (validate_change (curr_insn, orig_set, new_set, true))
+    {
+      if (dump_file)
+        {
+          fprintf (dump_file, "Merged Instruction with ZERO_EXTEND:\n");
+          print_rtl_single (dump_file, curr_insn);
+        }
+      return true;
+    }
+  return false;
+}
+
+/* This returns the DI mode for the SI register. */
+
+static rtx
+get_reg_di (rtx reg_si)
+{
+  rtx newreg;
+
+  newreg = gen_rtx_REG (DImode, REGNO (reg_si));
+  gcc_assert (newreg);
+  return newreg;
+}
+
+/* Treat if_then_else insns, where the operands of both branches
+   are registers, as copies. For instance,
+   Original :
+   (set (reg:SI a) (if_then_else (cond) (reg:SI b) (reg:SI c)))
+   Transformed :
+   (set (reg:DI a) (if_then_else (cond) (reg:DI b) (reg:DI c))) */
+
+static bool
+transform_ifelse (rtx def_insn)
+{
+  rtx set_insn = PATTERN (def_insn);
+  rtx srcreg, dstreg, srcreg2;
+  rtx map_srcreg, map_dstreg, map_srcreg2;
+  rtx ifexpr;
+  rtx cond;
+  rtx new_set;
+
+  gcc_assert (GET_CODE (set_insn) == SET);
+  cond = XEXP (SET_SRC (set_insn), 0);
+  dstreg = SET_DEST (set_insn);
+  srcreg = XEXP (SET_SRC (set_insn), 1);
+  srcreg2 = XEXP (SET_SRC (set_insn), 2);
+  map_srcreg = get_reg_di (srcreg);
+  map_srcreg2 = get_reg_di (srcreg2);
+  map_dstreg = get_reg_di (dstreg);
+  ifexpr = gen_rtx_IF_THEN_ELSE (DImode, cond, map_srcreg, map_srcreg2);
+  new_set = gen_rtx_SET (VOIDmode, map_dstreg, ifexpr);
+
+  if (validate_change (def_insn, &PATTERN (def_insn), new_set, true))
+    {
+      if (dump_file)
+        {
+          fprintf (dump_file, "Cond_Move Instruction's mode extended :\n");
+          print_rtl_single (dump_file, def_insn);
+        }
+      return true;
+    }
+  else
+    return false;
+}
+
+/* Function to get all the immediate definitions of an instruction.
+   The reaching definitions are desired for which_reg used in
+   curr_insn.  This function returns 0 if there was an error getting
+   a definition.  Upon success, this function returns the number of
+   definitions. */
+
+static int
+get_defs (rtx curr_insn, rtx which_reg, VEC (rtx,heap) **dest)
+{
+  df_ref reg_info, *defs;
+  struct df_link *def_chain;
+  int n_refs = 0;
+
+  defs = DF_INSN_USES (curr_insn);
+  reg_info = NULL;
+
+  while (*defs)
+    {
+      reg_info = *defs;
+      if (GET_CODE (DF_REF_REG (reg_info)) == SUBREG)
+        {
+          return 0;
+        }
+      if (REGNO (DF_REF_REG (reg_info)) == REGNO (which_reg))
+        {
+          break;
+        }
+      defs++;
+    }
+
+  gcc_assert (reg_info != NULL && defs != NULL);
+  def_chain = DF_REF_CHAIN (reg_info);
+
+  while (def_chain)
+    {
+      /* Problem getting some definition for this instruction */
+
+      if (def_chain->ref == NULL)
+        {
+          return 0;
+        }
+      if (DF_REF_INSN_INFO (def_chain->ref) == NULL)
+        {
+          return 0;
+        }
+      def_chain = def_chain->next;
+    }
+
+  def_chain = DF_REF_CHAIN (reg_info);
+
+  if (dest == NULL)
+    return 1;
+
+  while (def_chain)
+    {
+      VEC_safe_push (rtx, heap, *dest, DF_REF_INSN (def_chain->ref));
+      def_chain = def_chain->next;
+      n_refs++;
+    }
+  return n_refs;
+}
+
+/* rtx function to check if this SET insn is a conditional copy insn :
+   (set (reg:SI a ) (IF_THEN_ELSE (cond) (reg:SI b) (reg:SI c)))
+   Called from is_insn_cond_copy */
+
+static int
+is_this_a_cmove (rtx expr, void *data)
+{
+  /* Check for conditional (if-then-else) copy. */
+
+  if (GET_CODE (expr) == SET
+      && GET_CODE (SET_DEST (expr)) == REG
+      && GET_MODE (SET_DEST (expr)) == SImode
+      && GET_CODE (SET_SRC (expr))  == IF_THEN_ELSE
+      && GET_CODE (XEXP (SET_SRC (expr), 1)) == REG
+      && GET_MODE (XEXP (SET_SRC (expr), 1)) == SImode
+      && GET_CODE (XEXP (SET_SRC (expr), 2)) == REG
+      && GET_MODE (XEXP (SET_SRC (expr), 2)) == SImode)
+    {
+      ((rtx *)data)[0] = XEXP (SET_SRC (expr), 1);
+      ((rtx *)data)[1] = XEXP (SET_SRC (expr), 2);
+      return 1;
+    }
+  return 0;
+}
+
+/* This returns 1 if it found
+   (SET (reg:SI REGNO (def_reg)) (if_then_else (cond) (REG:SI x1) (REG:SI x2)))
+   in its pattern.  It stores the x1 and x2 in copy_reg_1 and copy_reg_2. */
+
+static int
+is_insn_cond_copy (rtx def_insn, rtx *copy_reg_1, rtx *copy_reg_2)
+{
+  int type;
+  rtx set_expr;
+  rtx srcreg[2];
+
+  srcreg[0] = NULL_RTX;
+  srcreg[1] = NULL_RTX;
+
+  set_expr = single_set (def_insn);
+
+  if (set_expr == NULL_RTX)
+    return 0;
+
+  type = is_this_a_cmove (set_expr, (void *) srcreg);
+
+  if (type)
+    {
+      *copy_reg_1 = srcreg[0];
+      *copy_reg_2 = srcreg[1];
+      return type;
+    }
+
+  return 0;
+}
+
+/* Reaching Definitions of the zero-extended register could be conditional
+   copies or regular definitions.  This function separates the two types into
+   two lists.  This is necessary because, if a reaching definition is a
+   conditional copy, combining the zero_extend with this definition is wrong.
+   Conditional copies are merged by transitively merging its definitions.
+   The defs_list is populated with all the reaching definitions of the
+   zero-extension instruction (zero_extend_insn) which must be merged with
+   a zero_extend.  The copies_list contains all the conditional moves that
+   will later be extended into a DI mode conditonal move if all the merges
+   are successful.  The function returns false when there is a failure in
+   getting some definitions, like that of parameters.  It returns 1 upon
+   success, 0 upon failure and 2 when all definitions of the zero_extend_insn
+   were merged previously. */
+
+static int
+make_defs_and_copies_lists (rtx zero_extend_insn, rtx set_pat,
+                            VEC (rtx,heap) **defs_list,
+                            VEC (rtx,heap) **copies_list)
+{
+  bool *is_insn_visited;
+  VEC (rtx,heap) *work_list;
+  rtx srcreg, copy_reg_1, copy_reg_2;
+  rtx def_insn;
+  int n_defs = 0;
+  int vec_index = 0;
+  int n_worklist = 0;
+  int i, is_copy;
+
+  srcreg = XEXP (SET_SRC (set_pat), 0);
+  work_list = VEC_alloc (rtx, heap, 8);
+
+  is_insn_visited = XNEWVEC (bool, max_insn_uid);
+
+  for (i = 0; i < max_insn_uid; i++)
+    is_insn_visited[i] = false;
+
+  /* Initialize the Work List */
+  n_worklist = get_defs (zero_extend_insn, srcreg, &work_list);
+
+  if (n_worklist == 0)
+    {
+      VEC_free (rtx, heap, work_list);
+      XDELETEVEC (is_insn_visited);
+      /* The number of defs being equal to zero can only imply that all of its
+         definitions have been previously merged. */
+      return 2;
+    }
+
+  /* Perform transitive closure for conditional copies. */
+  while (n_worklist > vec_index)
+    {
+      def_insn = VEC_index (rtx, work_list, vec_index);
+      gcc_assert (INSN_UID (def_insn) < max_insn_uid);
+
+      if (is_insn_visited[INSN_UID (def_insn)])
+        {
+          vec_index++;
+          continue;
+        }
+
+      is_insn_visited[INSN_UID (def_insn)] = true;
+      copy_reg_1 = copy_reg_2 = NULL_RTX;
+      is_copy = is_insn_cond_copy (def_insn, &copy_reg_1, &copy_reg_2);
+      if (is_copy)
+        {
+          gcc_assert (copy_reg_1 && copy_reg_2);
+
+          /* Push it into the copy list first. */
+
+          VEC_safe_push (rtx, heap, *copies_list, def_insn);
+
+          /* Perform transitive closure here */
+
+          n_defs = 0;
+          n_defs = get_defs (def_insn, copy_reg_1, &work_list);
+
+          if (n_defs == 0)
+            {
+              VEC_free (rtx, heap, work_list);
+              XDELETEVEC (is_insn_visited);
+              return 0;
+            }
+          n_worklist += n_defs;
+
+          n_defs = 0;
+          n_defs = get_defs (def_insn, copy_reg_2, &work_list);
+          if (n_defs == 0)
+            {
+              VEC_free (rtx, heap, work_list);
+              XDELETEVEC (is_insn_visited);
+              return 0;
+            }
+          n_worklist += n_defs;
+        }
+      else
+        {
+          VEC_safe_push (rtx, heap, *defs_list, def_insn);
+        }
+      vec_index++;
+    }
+
+  VEC_free (rtx, heap, work_list);
+  XDELETEVEC (is_insn_visited);
+  return 1;
+}
+
+/* Merge the definition with a zero-extend.  Calls combine_set_zero_extend
+   on the SET pattern. */
+
+static bool
+merge_def_and_ze (rtx def_insn)
+{
+  enum rtx_code code;
+  rtx setreg;
+  rtx *sub_rtx;
+  rtx s_expr;
+  int i;
+
+  code = GET_CODE (PATTERN (def_insn));
+  sub_rtx = NULL;
+
+  if (code == PARALLEL)
+    {
+      for (i = 0; i < XVECLEN (PATTERN (def_insn), 0); i++)
+        {
+          s_expr = XVECEXP (PATTERN (def_insn), 0, i);
+          if (GET_CODE (s_expr) != SET)
+            continue;
+
+          if (sub_rtx == NULL)
+            sub_rtx = &XVECEXP (PATTERN (def_insn), 0, i);
+          else
+            {
+              /* PARALLEL with multiple SETs. */
+              return false;
+            }
+        }
+    }
+  else if (code == SET)
+    {
+      sub_rtx = &PATTERN (def_insn);
+    }
+  else
+    {
+      /* It is not a PARALLEL or a SET, what could it be ? */
+      return false;
+    }
+
+  gcc_assert (sub_rtx != NULL);
+
+  if (GET_CODE (SET_DEST (*sub_rtx)) == REG
+      && GET_MODE (SET_DEST (*sub_rtx)) == SImode)
+    {
+      setreg = get_reg_di (SET_DEST (*sub_rtx));
+      if (!combine_set_zero_extend (def_insn, sub_rtx, setreg))
+        {
+          return false;
+        }
+      else
+        {
+          return true;
+        }
+    }
+  else
+    {
+      return false;
+    }
+  return true;
+}
+
+/* This function goes through all reaching defs of the source
+   of the zero extension instruction (zero_extend_insn) and
+   tries to combine the zero extension with the definition
+   instruction.  The changes are made as a group so that even
+   if one definition cannot be merged, all reaching definitions
+   end up not being merged. When a conditional copy is encountered,
+   merging is attempted transitively on its definitions.  It returns
+   true upon success and false upon failure. */
+
+static bool
+combine_reaching_defs (rtx zero_extend_insn, rtx set_pat)
+{
+  rtx newreg = NULL_RTX;
+  rtx srcreg, def_insn;
+  bool merge_successful = true;
+  int i;
+  int defs_ix;
+  int outcome;
+
+  /* To store the definitions that have been merged. */
+
+  VEC (rtx, heap) *defs_list, *copies_list, *vec;
+  enum insn_merge_code merge_code;
+
+  srcreg = XEXP (SET_SRC (set_pat), 0);
+  newreg = NULL_RTX;
+
+  defs_list = VEC_alloc (rtx, heap, 8);
+  copies_list = VEC_alloc (rtx, heap, 8);
+
+  outcome = make_defs_and_copies_lists (zero_extend_insn,
+                                        set_pat, &defs_list, &copies_list);
+
+  /* outcome == 2 implies that all the definitions for this zero_extend were
+     merged while previously when handling other zero_extends. */
+
+  if (outcome == 2)
+    {
+      VEC_free (rtx, heap, defs_list);
+      VEC_free (rtx, heap, copies_list);
+      if (dump_file)
+        fprintf (dump_file, "All definitions have been merged 
previously...\n");
+      return true;
+    }
+
+  if (outcome == 0)
+    {
+      VEC_free (rtx, heap, defs_list);
+      VEC_free (rtx, heap, copies_list);
+      return false;
+    }
+
+  newreg = get_reg_di (srcreg);
+  merge_successful = true;
+
+  /* Go through the defs vector and try to merge all the definitions
+     in this vector. */
+
+  vec = VEC_alloc (rtx, heap, 8);
+  for (defs_ix = 0; VEC_iterate (rtx, defs_list, defs_ix, def_insn); defs_ix++)
+    {
+      merge_code = get_insn_status (def_insn);
+      gcc_assert (merge_code == MERGE_NOT_ATTEMPTED);
+
+      if (merge_def_and_ze (def_insn))
+        {
+          VEC_safe_push (rtx, heap, vec, def_insn);
+        }
+      else
+        {
+          merge_successful = false;
+          break;
+        }
+    }
+
+  /* Now go through the conditional copies vector and try to merge all
+     the copies in this vector.*/
+
+  if (merge_successful)
+    {
+      for (i = 0; VEC_iterate (rtx, copies_list, i, def_insn); i++)
+        {
+          if (transform_ifelse (def_insn))
+            {
+              VEC_safe_push (rtx, heap, vec, def_insn);
+            }
+          else
+            {
+              merge_successful = false;
+              break;
+            }
+        }
+    }
+
+  if (merge_successful)
+    {
+      /* Commit the changes here if possible */
+      /* XXX : Now, it is an all or nothing scenario.  Even if one definition
+         cannot be merged we totally bail.  In future, allow zero-extensions to
+         be partially eliminated along those paths where the definitions could
+         be merged.  */
+
+      if (apply_change_group ())
+        {
+          if (dump_file)
+            fprintf (dump_file, "All merges were successful ....\n");
+
+          for (i = 0; VEC_iterate (rtx, vec, i, def_insn); i++)
+            {
+              set_insn_status (def_insn, MERGE_SUCCESS);
+            }
+
+          VEC_free (rtx, heap, vec);
+          VEC_free (rtx, heap, defs_list);
+          VEC_free (rtx, heap, copies_list);
+          return true;
+        }
+      else
+        {
+          /* Changes need not be cancelled explicitly as apply_change_group ()
+             does it.   Print list of definitions in the dump_file for debug
+             purposes.  This zero-extension cannot be deleted. */
+
+          if (dump_file)
+            {
+              for (i = 0; VEC_iterate (rtx, vec, i, def_insn); i++)
+                {
+                  fprintf (dump_file, " Ummergable definitions : \n");
+                  print_rtl_single (dump_file, def_insn);
+                }
+            }
+        }
+    }
+  else
+    {
+      /* Cancel any changes that have been made so far. */
+      cancel_changes (0);
+    }
+
+  VEC_free (rtx, heap, vec);
+  VEC_free (rtx, heap, defs_list);
+  VEC_free (rtx, heap, copies_list);
+  return false;
+}
+
+/* Goes through the instruction stream looking for zero-extends.  If the zero
+   extension instruction has atleast one def it adds it to a list of possible
+   candidates for deletion.  It returns the list of candidates. */
+static VEC (rtx,heap)*
+find_removable_zero_extends (void)
+{
+  VEC (rtx, heap) *zeinsn_list;
+  basic_block curr_block;
+  rtx curr_insn;
+  rtx *set_insn;
+  rtx which_reg;
+  int type ;
+  int has_defs;
+
+  zeinsn_list = VEC_alloc (rtx, heap, 8);
+  FOR_EACH_BB (curr_block)
+    {
+      FOR_BB_INSNS (curr_block, curr_insn)
+        {
+          if (!INSN_P (curr_insn))
+            continue;
+
+          type = for_each_rtx (&PATTERN (curr_insn),
+                               is_set_with_extension_DI,
+                               (void *)&set_insn);
+
+          if (!type)
+            continue;
+
+          which_reg = XEXP (SET_SRC (*set_insn), 0);
+          has_defs = get_defs (curr_insn, which_reg, NULL);
+          if (has_defs)
+            {
+              VEC_safe_push (rtx, heap, zeinsn_list, curr_insn);
+            }
+          else if (dump_file)
+            {
+              fprintf (dump_file, "Cannot eliminate zero extension : \n");
+              print_rtl_single (dump_file, curr_insn);
+              fprintf (dump_file,
+                       "This has no defs. Could be extending parameters.\n");
+            }
+        }
+    }
+  return zeinsn_list;
+}
+
+static unsigned int
+find_and_remove_ze (void)
+{
+  rtx curr_insn = NULL_RTX;
+  int i;
+  int ix;
+  long long num_realized = 0;
+  long long num_ze_opportunities = 0;
+  VEC (rtx, heap) *zeinsn_list;
+  VEC (rtx, heap) *zeinsn_del_list;
+
+  /* Construct DU chain to get all reaching definitions of each
+     zero-extension instruction. */
+
+  df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN);
+  df_analyze ();
+
+  max_insn_uid = get_max_uid ();
+
+  is_insn_merge_attempted = XNEWVEC (enum insn_merge_code,
+                                     sizeof (enum insn_merge_code)* 
max_insn_uid);
+
+  for (i = 0; i < max_insn_uid; i++)
+    {
+      is_insn_merge_attempted[i] = MERGE_NOT_ATTEMPTED;
+    }
+
+  num_ze_opportunities = num_realized = 0;
+
+  zeinsn_del_list = VEC_alloc (rtx, heap, 4);
+
+  zeinsn_list = find_removable_zero_extends ();
+
+  for (ix = 0; VEC_iterate (rtx, zeinsn_list, ix, curr_insn); ix++)
+    {
+      num_ze_opportunities++;
+      /* Try to combine the zero-extends with the definition here. */
+
+      if (dump_file)
+        {
+          fprintf (dump_file, "Trying to eliminate zero extension : \n");
+          print_rtl_single (dump_file, curr_insn);
+        }
+
+      if (combine_reaching_defs (curr_insn, PATTERN (curr_insn)))
+        {
+          if (dump_file)
+            fprintf (dump_file, "Eliminated the zero extension...\n");
+          num_realized++;
+          VEC_safe_push (rtx, heap, zeinsn_del_list, curr_insn);
+        }
+    }
+
+  /* Delete all useless zero extensions here in one sweep. */
+  for (ix = 0; VEC_iterate (rtx, zeinsn_del_list, ix, curr_insn); ix++)
+    {
+      delete_insn (curr_insn);
+    }
+
+  free (is_insn_merge_attempted);
+  VEC_free (rtx, heap, zeinsn_list);
+  VEC_free (rtx, heap, zeinsn_del_list);
+
+  if (dump_file && num_ze_opportunities > 0)
+    fprintf (dump_file, "\n %s : num_zee_opportunities = %lld "
+                        "num_realized = %lld \n",
+                        current_function_name (),
+                        num_ze_opportunities, num_realized);
+
+  df_finish_pass (false);
+  return 0;
+}
+
+static unsigned int
+rest_of_handle_zee (void)
+{
+  timevar_push (TV_ZEE);
+  find_and_remove_ze ();
+  timevar_pop (TV_ZEE);
+  return 0;
+}
+
+static bool
+gate_handle_zee (void)
+{
+  return (optimize > 0 && flag_zee);
+}
+
+struct rtl_opt_pass pass_implicit_zee =
+{
+ {
+  RTL_PASS,
+  "zee",                                /* name */
+  gate_handle_zee,                      /* gate */
+  rest_of_handle_zee,                   /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_ZEE,                               /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_ggc_collect |
+  TODO_dump_func |
+  TODO_verify_rtl_sharing,              /* todo_flags_finish */
+ }
+};
+
+
Index: common.opt
===================================================================
--- common.opt  (revision 152036)
+++ common.opt  (working copy)
@@ -1099,6 +1099,10 @@ fsee
 Common
 Does nothing.  Preserved for backward compatibility.
 
+fzee
+Common Report Var(flag_zee) Init(0)
+Eliminate redundant zero extensions on targets that support implicit 
extensions.
+
 fshow-column
 Common C ObjC C++ ObjC++ Report Var(flag_show_column) Init(1)
 Show column numbers in diagnostics, when available.  Default on
Index: Makefile.in
===================================================================
--- Makefile.in (revision 152036)
+++ Makefile.in (working copy)
@@ -1190,6 +1190,7 @@ OBJS-common = \
        haifa-sched.o \
        hooks.o \
        ifcvt.o \
+       implicit-zee.o \
        init-regs.o \
        integrate.o \
        intl.o \
@@ -2857,6 +2858,11 @@ fwprop.o : fwprop.c $(CONFIG_H) $(SYSTEM_H) corety
 web.o : web.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h 
$(TOPLEV_H) \
    $(DF_H) $(OBSTACK_H) $(TIMEVAR_H) $(TREE_PASS_H)
+implicit-zee.o : implicit-zee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) 
$(RTL_H) \
+   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
+   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
+   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(REAL_H) 
$(TOPLEV_H) \
+   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
 gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) $(REAL_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h $(TOPLEV_H) \
Index: passes.c
===================================================================
--- passes.c    (revision 152036)
+++ passes.c    (working copy)
@@ -790,6 +790,7 @@ init_optimization_passes (void)
          NEXT_PASS (pass_postreload_cse);
          NEXT_PASS (pass_gcse2);
          NEXT_PASS (pass_split_after_reload);
+         NEXT_PASS (pass_implicit_zee);
          NEXT_PASS (pass_branch_target_load_optimize1);
          NEXT_PASS (pass_thread_prologue_and_epilogue);
          NEXT_PASS (pass_rtl_dse2);
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 152036)
+++ config/i386/i386.c  (working copy)
@@ -4214,6 +4214,10 @@ optimization_options (int level, int size ATTRIBUT
     flag_schedule_insns = 0;
 #endif
 
+/* For -O2 and beyond, turn on -fzee for x86_64 target. */
+  if (level > 1 && TARGET_64BIT)
+    flag_zee = 1;
+
   if (TARGET_MACHO)
     /* The Darwin libraries never set errno, so we might as well
        avoid calling them when that's the only reason we would.  */

Reply via email to