On Thu, May 18, 2017 at 01:22:02PM +0200, Martin Liška wrote: > On 05/16/2017 09:14 PM, David Malcolm wrote: > > On Mon, 2017-05-15 at 15:36 +0200, Martin Liška wrote: > >> Hi. > >> > >> I sent this email to David some time ago, but it should be probably > >> answered > >> on gcc mailing list. > > > >> I have idea one to improve gcov tool and I'm interested in more > >> precise locations for gimple > >> statements. For gcov purpose, we dump location in ipa-profile pass, > >> which is an early IPA > >> pass and this data is used by gcov tool to map statements (blocks) to > >> lines of code. > >> > >> I did a small experiment on the place we emit the location data: > >> inform (gimple_location (stmt), "output_location"); > >> > >> and it shows for: > >> $ cat m2.c > >> unsigned int > >> UuT (void) > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned int > >> ret = 0; if (true_var) /* count(1) */ { if (false_var) /* count(1) */ > >> ret = 111; /* count(#####) */ } else ret = 999; /* count(#####) */ > >> return ret; } > >> > >> int > >> main (int argc, char **argv) > >> { > >> UuT (); > >> return 0; > >> } > >> > >> $ gcc --coverage m2.c > >> m2.c: In function ‘main’: > >> m2.c:8:3: note: output_location > >> UuT (); > >> ^~~~~~ > >> # .MEM_2 = VDEF <.MEM_1(D)> > >> UuT (); > >> m2.c:9:10: note: output_location > >> return 0; > >> ^ > >> _3 = 0; > >> m2.c: In function ‘UuT’: > >> m2.c:3:16: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> ^~~~~~~~ > >> true_var_3 = 1; > >> m2.c:3:43: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> ^~~~~~~~~ > >> false_var_4 = 0; > >> m2.c:3:71: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> > >> ^~~ > >> ret_5 = 0; > >> m2.c:3:83: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> > >> ^ > >> if (true_var_3 != 0) > >> m2.c:3:114: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> > >> ^ > >> if (false_var_4 != 0) > >> m2.c:3:145: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> > >> > >> ~~~~^~~~~ > >> ret_7 = 111; > >> m2.c:3:182: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> > >> > >> ~~~~^~~~~ > >> ret_6 = 999; > >> m2.c:3:215: note: output_location > >> { unsigned int true_var = 1; unsigned int false_var = 0; unsigned > >> int ret = 0; if (true_var) /* count(1) */ { if (false_var) /* > >> count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /* > >> count(#####) */ return ret; } > >> > >> > >> > >> ^~~ > >> _8 = ret_2; > >> m2.c:3:215: note: output_location > >> # VUSE <.MEM_9(D)> > >> return _8; > >> > >> Which is not optimal, for some assignments I see just LHS > >> (false_var_4 = 0),
Note that unsigned int false_var = 0; is not an assignment-expression, it's an initialization. Only the '0' here is parsed as an assignment-expression, but in this case set_c_expr_source_range isn't called. > > > > My first though was: are there assignments for which this isn't the > > case? The only one I see is the: > > ret = 999; > > ~~~~^~~~~ > > > > Are the locations for these assignments coming through from the > > frontend? > > Hi. > > Actually not all, the default assignments are created in gimplifier and > location is assigned from DECL_EXPR: > > (gdb) p debug_tree(*expr_p) > <decl_expr 0x7ffff6988c80 > type <void_type 0x7ffff6878f18 void VOID > align 8 symtab 0 alias set -1 canonical type 0x7ffff6878f18 > pointer_to_this <pointer_type 0x7ffff68800a8>> > side-effects > arg 0 <var_decl 0x7ffff7f9ae10 true_var > type <integer_type 0x7ffff6878690 unsigned int public unsigned SI > size <integer_cst 0x7ffff6860f18 constant 32> > unit size <integer_cst 0x7ffff6860f30 constant 4> > align 32 symtab 0 alias set -1 canonical type 0x7ffff6878690 > precision 32 min <integer_cst 0x7ffff6860f48 0> max <integer_cst > 0x7ffff6860f00 4294967295> > pointer_to_this <pointer_type 0x7ffff6885dc8>> > used unsigned SI file /tmp/m2.c line 4 col 16 size <integer_cst > 0x7ffff6860f18 32> unit size <integer_cst 0x7ffff6860f30 4> > align 32 context <function_decl 0x7ffff697ce00 UuT> initial > <integer_cst 0x7ffff698b258 1> > chain <var_decl 0x7ffff7f9aea0 false_var type <integer_type > 0x7ffff6878690 unsigned int> > used unsigned SI file /tmp/m2.c line 4 col 43 size <integer_cst > 0x7ffff6860f18 32> unit size <integer_cst 0x7ffff6860f30 4> > align 32 context <function_decl 0x7ffff697ce00 UuT> initial > <integer_cst 0x7ffff6860f48 0> chain <var_decl 0x7ffff7f9af30 ret>>> > /tmp/m2.c:4:16 start: /tmp/m2.c:4:16 finish: /tmp/m2.c:4:23> > > That explains why only LHS of these assignments is selected. > > > > > I believe that in the C frontend these are assignment-expression, and > > hence handled by c_parser_expr_no_commas; in particular the use of > > op_location and the call: > > > > set_c_expr_source_range (&ret, lhs.get_start (), rhs.get_finish ()); > > > > ought to be setting up the caret of the assignment to be on the > > operator token, and for the start/finish to range from the start of the > > lhs to the end of the rhs i.e. what we see for: > > > > ret = 999; > > ~~~~^~~~~ > > Yep, MODIFY_EXPRs created in FE go this way and it's fine. > > > > > > >> for return statements only a returned value is displayed. > > > > Is this running on SSA form? If so, I wonder if you're running into > > something like this: > > > > retval_N = PHI <lots of values>; > > return retval_N; > > > > where it's using the location of that "return retval_N;" for all of the > > return statements in the function, rather than the individual source > > locations. > > Yep, but we properly assign each assignment to a SSA name that's going to > be merged in exit BB by PHI node: > > _8 = ret_2; > /tmp/m2.c:7:8: note: output_location > return ret; } > ^~~ > > Here the location comes from c_finish_return function where location > comes from a value that's returned. > > > > >> For conditions, only condition beginning is showed. > >> Is this known behavior or do I miss > >> something? > > > > c_parser_if_statement has: > > > > loc = c_parser_peek_token (parser)->location; > > > > which is that of the open-paren. Maybe we should build a location > > covering the range of the "if ( expression )" part of the if-statement? > > Adding Marek as C FE maintainer to reply the question. I suppose we could do better and I'd probably highlight just the expression part of "if ( expression )". But not sure how many use cases this range location would have. Marek