First round of comments. I think we should add this to google/main. It's in sufficiently good shape for it. You can keep improving it in the branch.
It is now too late for 4.7's stage 1, so I think a reasonable way to proceed is to keep it in google/main and then present it for trunk inclusion when stage 1 opens for 4.8. http://codereview.appspot.com/5272048/diff/30001/toplev.c File toplev.c (right): http://codereview.appspot.com/5272048/diff/30001/toplev.c#newcode621 toplev.c:621: asan_finish_file(); + /* File-scope initialization for AddressSanitizer. */ Two spaces before '*/' + if (flag_asan) + asan_finish_file(); Space before '()' http://codereview.appspot.com/5272048/diff/30001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode80 tree-asan.c:80: (All I need is to traverse *all* memory accesses and instrument them). + Implementation details: + This is my first code in gcc. I wrote it by copying tree-mudflap.c, + stripping 70% of irrelevant code and modifying the instrumentation routine + build_check_stmt. The code seems to work, but I don't feel I understand it. + In particular, transform_derefs() and transform_statements() seem too complex. + Suggestions are welcome on how to simplify them. + (All I need is to traverse *all* memory accesses and instrument them). No need to have this in the code. These details usually go in the mail message you submit your patch with. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode140 tree-asan.c:140: 'size' is one of 1, 2, 4, 8, 16. */ +/* Instrument the memory access instruction 'base'. + Insert new statements before 'instr_gsi'. + 'location' is source code location. + 'is_store' is either 1 (for a store) or 0 (for a load). + 'size' is one of 1, 2, 4, 8, 16. */ When documenting arguments, you should refer to the arguments in CAPITALS instead of 'quoted'. The comment needs to be formatted so the lines below the first line are indented 3 spaces. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode155 tree-asan.c:155: tree shadow_type = size > 8 ? short_integer_type_node : char_type_node; + tree shadow_type = size > 8 ? short_integer_type_node : char_type_node; s/8/BITS_PER_UNIT/ http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode219 tree-asan.c:219: build1 (VIEW_CONVERT_EXPR, shadow_ptr_type, t)); + t = build2 (RSHIFT_EXPR, uintptr_type, base_addr, + build_int_cst (uintptr_type, asan_scale)); + t = build2 (PLUS_EXPR, uintptr_type, t, + build2 (LSHIFT_EXPR, uintptr_type, + build_int_cst (uintptr_type, 1), + build_int_cst (uintptr_type, asan_offset_log) + )); + t = build1 (INDIRECT_REF, shadow_type, + build1 (VIEW_CONVERT_EXPR, shadow_ptr_type, t)); It helps if this adds documentation on what expressions it's building. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode250 tree-asan.c:250: gimple_seq_add_stmt (&seq, g); [ ... ] + g = gimple_build_assign (cond, t); + gimple_set_location (g, location); + gimple_seq_add_stmt (&seq, g); + g = gimple_build_cond (NE_EXPR, cond, boolean_false_node, NULL_TREE, + NULL_TREE); + gimple_set_location (g, location); + gimple_seq_add_stmt (&seq, g); So, instead of open coding all the access checks, how about we created a new GIMPLE instruction? This instruction would take the same parameters, and have the semantics of the check but it would be optimizable. You could for instance make the instruction produce a pointer SSA name that is then used by the memory reference. With this, you can then allow the optimizers do their thing. These instructions could be moved around, eliminated, coalesced. Resulting in a reduced number of checks. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode251 tree-asan.c:251: /* debug_gimple_seq (seq); */ + /* debug_gimple_seq (seq); */ If you want the pass to dump debugging data, use the dump support. See other passes for examples (look for 'if (dump_file)' snippets)'. When -fdump-tree-asan is passed to the compiler, the pass manager will instantiate a 'dump_file' that you can use to emit debug info. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode253 tree-asan.c:253: /* crash */ + /* crash */ ??? http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode272 tree-asan.c:272: location_t location, int is_store) +static void +transform_derefs (gimple_stmt_iterator *iter, tree *tp, + location_t location, int is_store) Needs documentation. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode326 tree-asan.c:326: bb = ENTRY_BLOCK_PTR ->next_bb; + bb = ENTRY_BLOCK_PTR ->next_bb; No space before '->'. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode327 tree-asan.c:327: do FOR_EACH_BB (bb) http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode335 tree-asan.c:335: if (gimple_code (s) == GIMPLE_ASSIGN) + + /* Only a few GIMPLE statements can reference memory. */ + if (gimple_code (s) == GIMPLE_ASSIGN) You could also simply ask if the statement has memory references, though this would get you GIMPLE_CALLs as well that you generally don't want to deal with. Other than that, only GIMPLE_ASSIGN and GIMPLE_ASM may reference memory. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode349 tree-asan.c:349: while (bb && bb->index <= saved_last_basic_block); + bb = next; + } + while (bb && bb->index <= saved_last_basic_block); Not needed. FOR_EACH_BB does what you want. http://codereview.appspot.com/5272048/