On 11/12/2014 01:34 PM, Jakub Jelinek wrote:
On Wed, Nov 12, 2014 at 12:18:31PM +0300, Yury Gribov wrote:
--- gcc/sanopt.c.jj     2014-11-11 09:13:36.698280115 +0100
+++ gcc/sanopt.c        2014-11-11 18:07:17.913539517 +0100
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.
  #include "langhooks.h"
  #include "ubsan.h"
  #include "params.h"
+#include "tree-ssa-operands.h"


  /* This is used to carry information about basic blocks.  It is
@@ -56,7 +57,29 @@ along with GCC; see the file COPYING3.

  struct sanopt_info
  {
-  /* True if this BB has been visited.  */
+  /* True if this BB might call (directly or indirectly) free/munmap
+     or similar operation.  */

Frankly I think this is more about memory poisoned status then free.

For ASAN about anything that might make ASAN_CHECK non-redundant, that
certainly is free as the most common thing (bet munmap doesn't poison
the memory being unmapped) and also the asan_*poison* calls.

It's all about names, "freeing" is just subset of what really goes on ("poisoning"). We can keep current "freeing"/"nonfreeing" stuff but IMHO it doesn't properly express what we really bother about.

@@ -69,11 +92,307 @@ struct sanopt_ctx
       a vector of UBSAN_NULL call statements that check this pointer.  */
    hash_map<tree, auto_vec<gimple> > null_check_map;

+  /* This map maps a pointer (the second argument of ASAN_CHECK) to
+     a vector of ASAN_CHECK call statements that check the access.  */
+  hash_map<tree, auto_vec<gimple> > asan_check_map;

How about using traits class like the one below for both maps?

Well, for null_check_map, it is only SSA_NAMEs that matter (perhaps the
code doesn't return early if it is not, maybe should), address of
decls is always non-NULL and alignment should be known too.
Or, Marek, can you see if we can get there e.g. decls for alignments,
extern char a[];
long long int
foo (void)
{
   *(long long int *) &a[0] = 5;
}
?

See below for maybe_get_single_definition discussion.

struct tree_map_traits : default_hashmap_traits
{
   static inline hashval_t hash (const_tree ref)
     {
       return iterative_hash_expr (ref, 0);
     }

   static inline bool equal_keys (const_tree ref1, const_tree ref2)
     {
       return operand_equal_p (ref1, ref2, 0);
     }
};

Also the hash_map probably deserves a typedef.

For asan you're right, we can have addresses of decls there etc.
If you have spare cycles, feel free to take over the patch and adjust it.

I guess I'd wait when this gets to trunk?

+      tree glen = gimple_call_arg (g, 2);
+      /* If glen is not integer, we'd have added it to the vector only if
+        ASAN_CHECK_NON_ZERO_LEN flag is set, so treat it as length 1.  */

Frankly I don't think we use ASAN_CHECK_NON_ZERO_LEN anymore (it's only set
for trivial cases now).  Perhaps we should just nuke it from asan.c and
sanopt.c alltogether?

I thought that for the builtins libasan doesn't instrument (which includes
very often used functions like __memcpy_chk etc.) we still use it.

We could only emit non-trivial ASAN_CHECK_NON_ZERO_LEN in rare cases e.g. checking buffer length _after_ strlen call but I don't think we have anything like this left.

+  bool asan_check_optimize
+    = (flag_sanitize & SANITIZE_ADDRESS)
+      && ((flag_sanitize & flag_sanitize_recover
+          & SANITIZE_KERNEL_ADDRESS) == 0);

Why do we disable check optimizations for KASan?

Only for -fno-sanitize-recover=kernel-address too.  The thing is,
if you do recover from failed asan checks, supposedly you want to
see all errors reported, not just the first one.

Hm, that's questionable. The error is already reported so why bother user with duplicates (also hurting performance)?

          case IFN_ASAN_CHECK:
-           ctx->asan_num_accesses++;
+           if (asan_check_optimize)
+             remove = maybe_optimize_asan_check_ifn (ctx, stmt);

It may be useful to also store base address in check-table:

static tree
maybe_get_single_definition (tree t)
{
   if (TREE_CODE (t) == SSA_NAME)
     {
       gimple g = SSA_NAME_DEF_STMT (t);
       if (gimple_assign_single_p (g))
         return gimple_assign_rhs1 (g);
     }
   return NULL_TREE;
}

Why?  forwprop etc. should have propagated it into the ASAN_CHECK if
it is is_gimple_val.  Or do you have specific examples which you have in
mind?

Yes, non-gimple cases (struct field addresses, etc.) are not propagated but still seem ok to optimize. So for each SSA name we'll store both it and it's base definition in table. This way different SSA with same base can optimize each other out.

-Y

Reply via email to