On Mon, 11 Nov 2013, Richard Biener wrote:
On Sun, Nov 10, 2013 at 4:27 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
Hello,
I am posting this patch to get some feedback on the approach. The goal is to
replace malloc+free with a stack allocation (a decl actually) when the size
is a small constant.
For testing, I hijacked the "leaf" attribute, but it isn't right, I'll
remove it from the list (not sure what I'll do for the testcases then). What
I'd want instead is a "returns" attribute that means the function will
return (either by "return" or an exception), as opposed to having an
infinite loop, calling exit or longjmp, etc. (sched-deps.c has a related
call_may_noreturn_p). The situation I am trying to avoid is:
p = malloc (12);
f (p);
free (p);
where f contains somewhere:
free(p); exit(42);
(or longjmp or something else that takes the regular call to free out of the
execution flow).
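The hazard can be seen in a small self-contained C sketch (the names f and
caller are made up for illustration): the callee frees the pointer and then
leaves via longjmp, so the caller's own free is dead code, and if the malloc
had been replaced by a stack decl, the free inside f would receive a stack
address.

```c
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf env;

/* Hypothetical callee: frees its argument and leaves via longjmp,
   so the regular free in the caller is never reached.  */
static void f (void *p)
{
  free (p);          /* the only free that actually runs */
  longjmp (env, 1);
}

/* Returns 1 when the caller's own free (p) was skipped.  */
static int caller (void)
{
  if (setjmp (env) == 0)
    {
      void *p = malloc (12);
      f (p);
      free (p);      /* dead: the longjmp jumped over it */
      return 0;
    }
  /* Had p been turned into a stack array, f would have called
     free on a stack address: undefined behavior.  */
  return 1;
}
```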
Instead of a new attribute IPA should be used to compute this call-graph
property.
Could be both, no? Any idea where I could store this information? Space is
so short that pure and const don't even have their bits in the same place.
Then there is the question of what properties should be looked at exactly:
- may leave the function via longjmp
- must leave the function via return or an exception
- may free memory it didn't allocate itself
- ?
An infinite loop wouldn't be an issue; the real issue is that it
shouldn't free the object, which would only be valid if the free
after the call isn't reached.
So the infinite loop is an issue exactly in the same way calling exit is
an issue, right?
IPA pure-const is used to compute similar properties (may loop,
may throw).
You also have to make sure to properly align the replacement.
Thanks, for some reason I lost this line when I copied from
fold_builtin_alloca_with_align.
The size above which the malloc->stack transformation is not applied should
depend on a parameter, I don't know if it should get its own or depend on an
existing one. In any case, I'd like to avoid ending up with a ridiculously
low threshold (my programs use GMP, which internally uses alloca up to 65536
bytes (even in recursive functions that have a dozen allocations), so I
don't want gcc to tell me that 50 bytes are too much).
A program with a double-free may, with this patch, end up crashing on the
first free instead of the second, but that's an invalid program anyway.
I don't know if tree-ssa-forwprop is a good place for it, but it was
convenient for a prototype. I like the idea of running it several times:
replacing malloc with a decl early can help other optimizations, and other
optimizations can make this one possible later.
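Concretely, the intended rewrite (done by hand here to illustrate; this is
not the literal pass output, and the function names are invented) turns a
provably matched malloc/free pair with a small constant size into a local
array:

```c
#include <stdlib.h>
#include <string.h>

/* Before: heap allocation with a matching free on every path.  */
static int sum_before (const int *src)
{
  int *p = malloc (3 * sizeof (int));   /* small constant size */
  memcpy (p, src, 3 * sizeof (int));
  int s = p[0] + p[1] + p[2];
  free (p);
  return s;
}

/* After (conceptually): the pair is replaced by a stack decl,
   which the later passes can then scalarize or SRA away.  */
static int sum_after (const int *src)
{
  int buf[3];                           /* the decl replacing malloc */
  memcpy (buf, src, 3 * sizeof (int));
  return buf[0] + buf[1] + buf[2];
}
```

Both versions compute the same value; the second just never touches the
allocator.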
We have a similar transform in CCP (fold_builtin_alloca_with_align) which
is there because CCP is a good place where size arguments may
become constant (the case you handle - you don't seem to handle
variable malloc -> alloca transform which would be possible if for example
VRP figures out an acceptable upper bound for the size).
I added something for the case where VRP gives an upper bound on the size,
but I am still using a DECL and not alloca. Using alloca has some
drawbacks: it shouldn't be done inside a loop (unless we know the memory
is deallocated in the same iteration and we can save/restore the stack à
la VLA, but that gets complicated), it should be tried only in late passes
not to hinder inlining, etc. I guess I could try, but I fear it will be
quite rare to have suitable bounds on the size of allocations.
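The variable-size case looks like this when transformed by hand (names and
the bound of 16 are made up for illustration): the early range check is what
lets VRP bound the allocation, and the decl is then sized to the upper bound
while n itself stays variable.

```c
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* Original: size n is variable, but the early check lets VRP
   deduce n <= 16 at the malloc.  */
static size_t count_zero (const unsigned char *src, size_t n)
{
  if (n > 16)
    return 0;
  unsigned char *p = malloc (n);
  memcpy (p, src, n);
  size_t z = 0;
  for (size_t i = 0; i < n; ++i)
    z += (p[i] == 0);
  free (p);
  return z;
}

/* After the transformation (done by hand): a decl sized to the
   upper bound replaces the malloc/free pair.  */
static size_t count_zero_stack (const unsigned char *src, size_t n)
{
  if (n > 16)
    return 0;
  unsigned char buf[16];   /* upper bound from the range check */
  memcpy (buf, src, n);
  size_t z = 0;
  for (size_t i = 0; i < n; ++i)
    z += (buf[i] == 0);
  return z;
}
```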
I am currently having trouble with removing the free calls. I am using
gsi_replace to put a clobber instead (is that useful?) but I also tried
gsi_remove, and adding various release_defs, unlink_stmt_vdef, update_stmt
without success. When I compile heapstack-1.c, the sink pass crashes
because it looks at the uses of p=&array and ends up on __builtin_free(p),
which was removed several passes earlier and doesn't have a bb anymore.
What is the right way to get rid of those statements?

Index: testsuite/g++.dg/tree-ssa/heapstack-1.C
===================================================================
--- testsuite/g++.dg/tree-ssa/heapstack-1.C (revision 0)
+++ testsuite/g++.dg/tree-ssa/heapstack-1.C (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+struct A {
+ void*p;
+ A(void*q) : p(q) {}
+ ~A(){ __builtin_free(p); }
+};
+void f(void*)__attribute__((__leaf__));
+void h(void*)__attribute__((__leaf__,__nothrow__));
+void g(){
+ void*p=__builtin_malloc(12);
+ A a(p);
+ f(p);
+}
+
+void i(){
+ void*p=__builtin_malloc(12);
+ h(p);
+ __builtin_free(p);
+}
+
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Property changes on: testsuite/g++.dg/tree-ssa/heapstack-1.C
___________________________________________________________________
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Index: testsuite/g++.dg/tree-ssa/heapstack-2.C
===================================================================
--- testsuite/g++.dg/tree-ssa/heapstack-2.C (revision 0)
+++ testsuite/g++.dg/tree-ssa/heapstack-2.C (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void f(void*)__attribute__((__leaf__));
+void g(){
+ void*p=__builtin_malloc(12);
+ f(p);
+ __builtin_free(p);
+}
+
+/* { dg-final { scan-tree-dump-times "malloc" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Property changes on: testsuite/g++.dg/tree-ssa/heapstack-2.C
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: testsuite/gcc.dg/tree-ssa/heapstack-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/heapstack-1.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/heapstack-1.c (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void* f(void*,int)__attribute__((__pure__));
+void * volatile q;
+void g(int m,int n,int s){
+ int i;
+ if (s<0||s>3)__builtin_unreachable();
+ void*p=__builtin_calloc(s,4);
+ switch(n){
+ case 1:
+ for (i=0; i<m; ++i)
+ q=f(p,i);
+ break;
+ case 2:
+ if (p)
+ __builtin_free(p);
+ return;
+ case 3:
+ __builtin_exit(42);
+ case 4:
+ __builtin_unreachable();
+ default:;
+ }
+ __builtin_free(p);
+}
+
+/* { dg-final { scan-tree-dump-not "malloc" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Property changes on: testsuite/gcc.dg/tree-ssa/heapstack-1.c
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: testsuite/gcc.dg/tree-ssa/heapstack-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/heapstack-2.c (revision 0)
+++ testsuite/gcc.dg/tree-ssa/heapstack-2.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+void f(void*);
+void g(){
+ void*p=__builtin_malloc(12);
+ f(p);
+ __builtin_free(p);
+}
+
+/* { dg-final { scan-tree-dump-times "malloc" 1 "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */
Property changes on: testsuite/gcc.dg/tree-ssa/heapstack-2.c
___________________________________________________________________
Added: svn:keywords
## -0,0 +1 ##
+Author Date Id Revision URL
\ No newline at end of property
Added: svn:eol-style
## -0,0 +1 ##
+native
\ No newline at end of property
Index: testsuite/gcc.dg/tree-ssa/pr19831-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/pr19831-3.c (revision 204685)
+++ testsuite/gcc.dg/tree-ssa/pr19831-3.c (working copy)
@@ -1,31 +1,31 @@
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
-void test2(void)
+void test2(int n)
{
- int *p = __builtin_malloc (sizeof (int) * 4);
+ int *p = __builtin_malloc (sizeof (int) * n);
if (p == (void *)0)
__builtin_abort ();
__builtin_free (p);
}
-void test5(int b)
+void test5(int n)
{
- int *p = __builtin_malloc (sizeof (int) * 4);
+ int *p = __builtin_malloc (sizeof (int) * n);
if (p)
__builtin_free (p);
}
-void test6(void)
+void test6(int n)
{
- int *p = __builtin_malloc (sizeof (int) * 4);
+ int *p = __builtin_malloc (sizeof (int) * n);
if (p == (void *)0)
__builtin_abort ();
if (p)
__builtin_free (p);
}
/* We should be able to remove all malloc/free pairs with CDDCE.
Assume p was non-NULL for test2.
For test5, it doesn't matter if p is NULL or non-NULL. */
Index: tree-ssa-ccp.c
===================================================================
--- tree-ssa-ccp.c (revision 204685)
+++ tree-ssa-ccp.c (working copy)
@@ -127,20 +127,21 @@ along with GCC; see the file COPYING3.
#include "tree-ssanames.h"
#include "tree-pass.h"
#include "tree-ssa-propagate.h"
#include "value-prof.h"
#include "langhooks.h"
#include "target.h"
#include "diagnostic-core.h"
#include "dbgcnt.h"
#include "params.h"
#include "hash-table.h"
+#include "bitmap.h"
/* Possible lattice values. */
typedef enum
{
UNINITIALIZED,
UNDEFINED,
CONSTANT,
VARYING
} ccp_lattice_t;
@@ -161,20 +162,24 @@ typedef struct prop_value_d prop_value_t
/* Array of propagated constant values. After propagation,
CONST_VAL[I].VALUE holds the constant value for SSA_NAME(I). If
the constant is held in an SSA name representing a memory store
(i.e., a VDEF), CONST_VAL[I].MEM_REF will contain the actual
memory reference used to store (i.e., the LHS of the assignment
doing the store). */
static prop_value_t *const_val;
static unsigned n_const_val;
+/* Avoid looking at the same malloc call twice if there are several
+   matching frees.  The bitmap stores the SSA_NAME version of the variable.  */
+static bitmap malloc_studied;
+
static void canonicalize_value (prop_value_t *);
static bool ccp_fold_stmt (gimple_stmt_iterator *);
/* Dump constant propagation value VAL to file OUTF prefixed by PREFIX. */
static void
dump_lattice_value (FILE *outf, const char *prefix, prop_value_t val)
{
switch (val.lattice_val)
{
@@ -805,20 +810,22 @@ ccp_initialize (void)
for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
{
gimple phi = gsi_stmt (i);
if (virtual_operand_p (gimple_phi_result (phi)))
prop_set_simulate_again (phi, false);
else
prop_set_simulate_again (phi, true);
}
}
+
+ malloc_studied = BITMAP_ALLOC (NULL);
}
/* Debug count support. Reset the values of ssa names
VARYING when the total number ssa names analyzed is
beyond the debug count specified. */
static void
do_dbg_cnt (void)
{
unsigned i;
@@ -886,20 +893,21 @@ ccp_finalize (void)
nonzero_bits = nonzero_bits | tree_to_double_int (val->value);
nonzero_bits &= get_nonzero_bits (name);
set_nonzero_bits (name, nonzero_bits);
}
}
/* Perform substitutions based on the known constant values. */
something_changed = substitute_and_fold (get_constant_value,
ccp_fold_stmt, true);
+ BITMAP_FREE (malloc_studied);
free (const_val);
const_val = NULL;
return something_changed;;
}
/* Compute the meet operator between *VAL1 and *VAL2. Store the result
in VAL1.
any M UNDEFINED = any
@@ -1910,20 +1918,161 @@ fold_builtin_alloca_with_align (gimple s
singleton_p = pt_solution_singleton_p (&pi->pt, &uid);
gcc_assert (singleton_p);
SET_DECL_PT_UID (var, uid);
}
}
/* Fold alloca to the address of the array. */
return fold_convert (TREE_TYPE (lhs), build_fold_addr_expr (var));
}
+/* Returns a constant upper bound if possible, or the variable if not. */
+static tree
+get_upper_bound (tree var)
+{
+ tree cst = get_constant_value (var);
+ if (cst)
+ return cst;
+ double_int min, max;
+ if (TREE_CODE (var) != SSA_NAME
+ || get_range_info (var, &min, &max) != VR_RANGE)
+ return var;
+ return double_int_to_tree (TREE_TYPE (var), max);
+}
+
+/* We want to be sure that free (VAR) can't be called in another function, and
+ the easiest way to ensure that is to prove that it is called in this
+ function. The hardest part is avoiding some call to a function that may not
+   return because of exit, an infinite loop, setjmp, etc., where it might call
+ free on VAR. */
+static bool
+malloc_safe_on_stack (gimple stmt, vec<gimple, va_heap> *list_of_frees)
+{
+ tree var = gimple_call_lhs (stmt);
+ if (!bitmap_set_bit (malloc_studied, SSA_NAME_VERSION (var)))
+ return false;
+ basic_block bb = gimple_bb (stmt);
+ stack_vec<basic_block, 20> bb_to_visit;
+ bitmap bb_visited = BITMAP_ALLOC (NULL);
+ bitmap_set_bit (bb_visited, bb->index);
+ gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+ enum tree_code code;
+
+next_stmt:
+ gsi_next_nondebug (&gsi);
+
+handle_stmt:
+ if (gsi_end_p (gsi))
+ {
+ edge e;
+ edge_iterator ei;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+ bb_to_visit.safe_push (e->dest);
+ goto next_bb;
+ }
+ stmt = gsi_stmt (gsi);
+ if (stmt_can_throw_external (stmt))
+ /* We could give up only if VAR has escaped (same for return), but that
+ means there is a memory leak, so don't bother. */
+ goto unsafe;
+
+ switch (gimple_code (stmt))
+ // TODO: GIMPLE_ASM, EH related gimples?
+ {
+ /* We leave the function without calling free. */
+ case GIMPLE_RETURN:
+ goto unsafe;
+
+ case GIMPLE_COND:
+ code = gimple_cond_code (stmt);
+ /* If we replace malloc by an array on the stack, it can't be NULL. */
+ if ((code == EQ_EXPR || code == NE_EXPR)
+ && var == gimple_cond_lhs (stmt)
+ && integer_zerop (gimple_cond_rhs (stmt)))
+ {
+ edge e;
+ edge_iterator ei;
+ FOR_EACH_EDGE (e, ei, bb->succs)
+ if (e->flags
+ & ((code == NE_EXPR) ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE))
+ bb_to_visit.safe_push (e->dest);
+ goto next_bb;
+ }
+ /* Fallthrough. */
+
+ /* Last stmt of the bb, handled by looking at the outgoing edges. */
+ case GIMPLE_GOTO:
+ case GIMPLE_SWITCH:
+ /* Statements that are irrelevant. */
+ case GIMPLE_ASSIGN:
+ case GIMPLE_LABEL:
+ case GIMPLE_NOP:
+ goto next_stmt;
+
+ case GIMPLE_CALL:
+ {
+ tree callee = gimple_call_fndecl (stmt);
+ if (callee != NULL_TREE
+ && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL)
+ switch (DECL_FUNCTION_CODE (callee))
+ {
+ case BUILT_IN_FREE:
+ if (gimple_call_arg (stmt, 0) != var)
+ goto next_stmt;
+ list_of_frees->safe_push (stmt);
+ /* Fallthrough. */
+
+ case BUILT_IN_ABORT:
+ case BUILT_IN_EXIT:
+ case BUILT_IN__EXIT:
+ case BUILT_IN__EXIT2:
+ case BUILT_IN_TRAP:
+ case BUILT_IN_UNREACHABLE:
+ /* Nothing could throw an exception earlier in the block,
+ so we don't need to check the EH edges. */
+ goto next_bb;
+ default:;
+ }
+
+#if 0
+ // For Joost
+ goto next_stmt;
+#endif
+ int flags = gimple_call_flags (stmt);
+ // FIXME: leaf is actually not safe, we need some new ECF_* flags.
+ // ??? In fortran any call is safe?
+ if (flags & (ECF_CONST|ECF_PURE|ECF_LEAF|ECF_NOVOPS))
+ goto next_stmt;
+ goto unsafe;
+ }
+ default:
+ goto unsafe;
+ }
+
+next_bb:
+ if (bb_to_visit.is_empty ())
+ goto safe;
+ bb = bb_to_visit.last ();
+ bb_to_visit.pop ();
+ if (!bitmap_set_bit (bb_visited, bb->index))
+ goto next_bb;
+ gsi = gsi_start_nondebug_bb (bb);
+ goto handle_stmt;
+
+unsafe:
+ BITMAP_FREE (bb_visited);
+ return false;
+safe:
+ BITMAP_FREE (bb_visited);
+ return true;
+}
+
/* Fold the stmt at *GSI with CCP specific information that propagating
and regular folding does not catch. */
static bool
ccp_fold_stmt (gimple_stmt_iterator *gsi)
{
gimple stmt = gsi_stmt (*gsi);
switch (gimple_code (stmt))
{
@@ -1998,20 +2147,90 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
if (new_rhs)
{
bool res = update_call_from_tree (gsi, new_rhs);
tree var = TREE_OPERAND (TREE_OPERAND (new_rhs, 0),0);
gcc_assert (res);
insert_clobbers_for_var (*gsi, var);
return true;
}
}
+ /* Replace malloc+free with a stack allocation. */
+ if (gimple_call_builtin_p (stmt, BUILT_IN_FREE))
+ {
+ bool is_calloc = false;
+ tree ptr = gimple_call_arg (stmt, 0);
+ if (TREE_CODE (ptr) != SSA_NAME)
+ return false;
+ gimple stmt1 = SSA_NAME_DEF_STMT (ptr);
+ if ((is_calloc = gimple_call_builtin_p (stmt1, BUILT_IN_CALLOC))
+ || gimple_call_builtin_p (stmt1, BUILT_IN_MALLOC))
+ {
+ gcc_checking_assert (gimple_call_lhs (stmt1) == ptr);
+ tree size = gimple_call_arg (stmt1, 0);
+ size = get_upper_bound (size);
+ if (is_calloc)
+ {
+ tree siz2 = gimple_call_arg (stmt1, 1);
+ siz2 = get_upper_bound (siz2);
+ size = fold_binary (MULT_EXPR, size_type_node, size, siz2);
+ }
+ if (!size || !host_integerp (size, 1)
+ /* param_value(...) / 10? Some new parameter? */
+ || TREE_INT_CST_LOW (size)
+ > (unsigned HOST_WIDE_INT)
+ PARAM_VALUE (PARAM_LARGE_STACK_FRAME))
+ return false;
+ stack_vec<gimple, 20> list_of_frees;
+ if (!malloc_safe_on_stack (stmt1, &list_of_frees))
+ return false;
+ /* Declare array. */
+ tree elem_type =
+ build_nonstandard_integer_type (BITS_PER_UNIT, 1);
+ tree array_type =
+ build_array_type_nelts (elem_type, TREE_INT_CST_LOW (size));
+ tree var = create_tmp_var (array_type, NULL);
+ DECL_ALIGN (var) = BIGGEST_ALIGNMENT;
+ tree p = fold_convert (TREE_TYPE (ptr),
+ build_fold_addr_expr (var));
+ /* Replace free with a clobber. */
+ int i;
+ gimple free_stmt;
+ FOR_EACH_VEC_ELT (list_of_frees, i, free_stmt)
+ {
+#if 0
+ unlink_stmt_vdef (free_stmt);
+ gimple_stmt_iterator gsi_free = gsi_for_stmt (free_stmt);
+ gsi_remove (&gsi_free, true);
+ release_defs (free_stmt);
+#else
+ tree clobber = build_constructor (array_type, NULL);
+ TREE_THIS_VOLATILE (clobber) = 1;
+ gimple clobber_stmt = gimple_build_assign (var, clobber);
+ gimple_stmt_iterator gsi_free = gsi_for_stmt (free_stmt);
+ gsi_replace (&gsi_free, clobber_stmt, false);
+#endif
+ }
+ /* Replace malloc with the address of the variable. */
+ gimple_stmt_iterator gsi1 = gsi_for_stmt (stmt1);
+ update_call_from_tree (&gsi1, p);
+ if (is_calloc)
+ {
+ tree memset_decl = builtin_decl_explicit (BUILT_IN_MEMSET);
+ gimple zero = gimple_build_call (memset_decl, 3, ptr,
+ integer_zero_node, size);
+ gsi_insert_after (&gsi1, zero, GSI_SAME_STMT);
+ }
+ return true;
+ }
+ }
+
/* Propagate into the call arguments. Compared to replace_uses_in
this can use the argument slot types for type verification
instead of the current argument type. We also can safely
drop qualifiers here as we are dealing with constants anyway. */
argt = TYPE_ARG_TYPES (gimple_call_fntype (stmt));
for (i = 0; i < gimple_call_num_args (stmt) && argt;
++i, argt = TREE_CHAIN (argt))
{
tree arg = gimple_call_arg (stmt, i);
if (TREE_CODE (arg) == SSA_NAME