> On Jul 2, 2024, at 18:02, David Malcolm <dmalc...@redhat.com> wrote: > > On Tue, 2024-07-02 at 16:17 +0000, Qing Zhao wrote: >> due to code duplication from jump threading [PR109071] >> Control this with a new option -fdiagnostic-try-to-explain-harder. > > The name -fdiagnostic-try-to-explain-harder seems a little too "cute" > to me, but I can't think of a better name. Me either -:) > > Various comments inline below... I'm sorry I didn't take a close look > at the copy history implementation; I'm hoping Richi will dig into that > part of the patch. > >> >> This patch has been tested with -fdiagnostic-try-to-expain-harder on >> by >> default to bootstrap gcc and regression testing on both x86 and >> aarch64, >> resolved all bootstrap issues and regression testing issues. >> >> I need some help in the following two items: >> 1. suggestions on better documentation wordings for the new option. >> 2. checking on the new data structures copy_history and the >> memory management for it. >> In the beginning, I tried to use the GGC for it. >> but have met quite some issues (possibly because the gimple and >> the containing >> basic block elimination), then I gave up the GGC scheme and >> instead >> used the obstack and manually clean and deleted the obstack in the >> end of >> the compilation. >> >> The following is more details on the patchs >> >> Thanks a lot! >> >> Qing >> >> $ cat t.c >> extern void warn(void); >> static inline void assign(int val, int *regs, int *index) >> { >> if (*index >= 4) >> warn(); >> *regs = val; >> } >> struct nums {int vals[4];}; >> >> void sparx5_set (int *ptr, struct nums *sg, int index) >> { >> int *val = &sg->vals[index]; >> >> assign(0, ptr, &index); >> assign(*val, ptr, &index); >> } >> >> $ gcc -Wall -O2 -c -o t.o t.c >> t.c: In function ‘sparx5_set’: >> t.c:12:23: warning: array subscript 4 is above array bounds of >> ‘int[4]’ [-Warray-bounds=] >> 12 | int *val = &sg->vals[index]; >> | ~~~~~~~~^~~~~~~ >> t.c:8:18: note: while referencing ‘vals’ >> 8 | struct nums {int vals[4];}; >> | ^~~~ >> >> In the above, Although the warning is correct in theory, the warning >> message >> itself is confusing to the end-user since there is information that >> cannot >> be connected to the source code directly. >> >> It will be a nice improvement to add more information in the warning >> message >> to report where such index value come from. >> >> In order to achieve this, we add a new data structure copy_history to >> record >> the condition and the transformation that triggered the code >> duplication. >> Whenever there is a code duplication due to some specific >> transformations, >> such as jump threading, loop switching, etc, a copy_history structure >> is >> created and attached to the duplicated gimple statement. >> >> During array out-of-bound checking or other warning checking, the >> copy_history >> that was attached to the gimple statement is used to form a sequence >> of >> diagnostic events that are added to the corresponding rich location >> to be used >> to report the warning message. >> >> This behavior is controled by the new option -fdiagnostic-try-to- >> explain-harder >> which is off by default. >> >> With this change, by adding -fdiagnostic-try-to-explain-harder, >> the warning message for the above testing case is now: >> >> t.c: In function ‘sparx5_set’: >> t.c:12:23: warning: array subscript 4 is above array bounds of >> ‘int[4]’ [-Warray-bounds=] >> 12 | int *val = &sg->vals[index]; >> | ~~~~~~~~^~~~~~~ >> event 1 >> | >> | 4 | if (*index >= 4) >> | | ^ >> | | | >> | | (1) when the condition is evaluated to true >> | >> t.c:8:18: note: while referencing ‘vals’ >> 8 | struct nums {int vals[4];}; >> | ^~~~ > > BTW I notice in the above example you have the extra vertical line > below "event 1" here: > > event 1 > | > | 4 | if (*index >= 4) > | | ^ > | | | > | | (1) when the condition is evaluated to true > | > > whereas in the testcase it looks like you're expecting the simpler: > > event 1 > 4 | if (*index >= 4) > | ^ > | | > | (1) when the condition is evaluated to true > > which is due to r15-533-g3cd267446755ab, so I think the example text is > slightly outdated.
Yes, you are right, I updated the format in the testing case but forgot to update it in the comment of my patch. I will update the comment of the patch too. > > I wonder if the wording of the event could be improved to better > explain to the user what the optimizer is "thinking". > > Perhaps it could be two events: > > (1) when specializing the code for both branches... > (2) ...and considering the 'true' branch... > > or something like that? (I'm not sure) I can explain the event in more details as your suggested > > Is there a more complicated testcase with multiple events? When I bootstrapped GCC, there were some issues relating to the multiple copies of the same gimple statement, all in jump threading pass, however, no any warning issue. It’s not easy to come up with a testing case with multiple copies of the same gimple statement and generate warning at the same time. I will try to think harder here. > > Various other points about the path: > > When the analyzer builds paths for its diagnostics, it adds a final > event that repeats the problem. This is somewhat tautological, but > might make the path more readable - I'm not sure. Consider: > > t.c: In function ‘sparx5_set’: > t.c:12:23: warning: array subscript 4 is above array bounds of > ‘int[4]’ [-Warray-bounds=] > 12 | int *val = &sg->vals[index]; > | ~~~~~~~~^~~~~~~ > events 1-2 > 4 | if (*index >= 4) > | ^ > | | > | (1) when the condition is evaluated to true > ... > 12 | int *val = &sg->vals[index]; > | ~~~~~~~~^~~~~~~ > | | > | (2) ⚠️ out of array bounds here > > t.c:8:18: note: while referencing ‘vals’ > 8 | struct nums {int vals[4];}; > | ^~~~ > > FWIW there's a way to get get a ⚠️ emoji on that final event, by > having its meaning have verb == diagnostic_event::VERB_danger. > > GCC 15 also supports visualizing control flow between events, by having > diagnostic_event::connect_to_next_event_p () return true; see r15-636- > g770657d02c986c, so that might be something to consider adding. Yes, that above diagnostic message is much more clear, I will try this. Thanks for the suggestion. > > > [...snip...] > >> diff --git a/gcc/Makefile.in b/gcc/Makefile.in >> index 7d3ea27a6ab..36a5b5fa305 100644 >> --- a/gcc/Makefile.in >> +++ b/gcc/Makefile.in >> @@ -1436,6 +1436,7 @@ OBJS = \ >> df-problems.o \ >> df-scan.o \ >> dfp.o \ >> + diagnostic-copy-history.o \ >> digraph.o \ >> dojump.o \ >> dominance.o \ >> @@ -1821,7 +1822,8 @@ OBJS = \ >> >> # Objects in libcommon.a, potentially used by all host binaries and >> with >> # no target dependencies. >> -OBJS-libcommon = diagnostic-spec.o diagnostic.o diagnostic-color.o \ >> +OBJS-libcommon = diagnostic-spec.o \ >> + diagnostic.o diagnostic-color.o \ > > Is this change to OBJS-libcommon a stray edit? I don't think it > changes the meaning. You are right, will delete this change. > > >> diagnostic-format-json.o \ >> diagnostic-format-sarif.o \ >> diagnostic-global-context.o \ >> diff --git a/gcc/common.opt b/gcc/common.opt >> index 327230967ea..833cc2f4da4 100644 >> --- a/gcc/common.opt >> +++ b/gcc/common.opt >> @@ -1558,6 +1558,10 @@ fdiagnostics-minimum-margin-width= >> Common Joined UInteger Var(diagnostics_minimum_margin_width) Init(6) >> Set minimum width of left margin of source code when showing source. >> >> +fdiagnostics-try-to-explain-harder >> +Common Var(flag_diagnostics_try_to_explain_harder) >> +Collect and print more context information for diagnostics. >> + > > Eventually you'll want to run "make regenerate-opt-urls". Yes, will do that. > > [...snip...] > >> diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc >> index 1637a2fc4f4..9050b06bf1f 100644 >> --- a/gcc/gimple-array-bounds.cc >> +++ b/gcc/gimple-array-bounds.cc > > [...snip...] > >> @@ -801,6 +802,21 @@ array_bounds_checker::check_array_bounds (tree >> *tp, int *walk_subtree, >> else >> location = gimple_location (wi->stmt); >> >> + /* Generate a rich location for this location. */ >> + gcc_rich_location richloc (location); >> + simple_diagnostic_path path (global_dc->printer); >> + >> + /* Add events to this diagnostic_path per the copy_history >> attached to >> + the corresponding STMT. */ >> + copy_history_t cp_history = wi->stmt ? get_copy_history (wi->stmt) >> : NULL; >> + for (copy_history_t cur_ch = cp_history; cur_ch; >> + cur_ch = cur_ch->prev_copy) >> + path.add_event (cur_ch->condition, NULL_TREE, 0, >> + "when the condition is evaluated to %s", >> + cur_ch->is_true_path ? "true" : "false"); > > add_event's 2nd param is the function in which the event occurs, which > you may want to set to cfun, and set the stack depth to 1, rather than > 0, suggesting a purely intraprocedural control flow. > > FWIW the analyzer has some nasty logic to try to reconstruct functions > and stack frames in the light of inlining, which might or might not > make sense for you to make use of (I'm not sure). Okay, thanks for the info, will study a little bit here. > > >> + >> + richloc.set_path (&path); > > Creating and populating "path" here is doing non-trivial work per call > to check_array_bounds, and if I'm reading things right that's being > done for every statement. > > So maybe we want something like: > > lazy_diaqnostic_path path (callback, data); > > that is backed by a diagnostic_path *, which gets populated on demand > if any vfuncs on the lazy_diagnostic_path are actually called. > > Or we could abandon simple_diagnostic_path here and have a: > > copy_history_diagnostic_path path (wi->stmt); > > subclass of diagnostic_path that implements its vfuncs directly in > terms of a copy_history_t; the ctor simply sets up the vtable ptr and > initializes a gimple *. > > Or, all those > > if (for_array_bound) > warned = warning_at (richloc, OPT_Warray_bounds_, > /* etc */); > > could become: > > if (for_array_bound) > { > copy_history_diagnostic_path path (location, stmt); > warned = warning_at (richloc, OPT_Warray_bounds_, > /* etc */); > } > > or somesuch, explicitly deferring creation of the path until we're > actually about to emit a warning, and putting the logic in > copy_history_diagnostic_path's ctor. So, the major complain here is, the creating and populating “path” is expensive, it’s a waste to do this for every gimple, it’s better to delay the creation of the path until we’re about to emit a warning. Agreed, I will try to delay the creation of the path until necessary. > >> + >> *walk_subtree = true; >> >> bool warned = false; >> @@ -808,14 +824,14 @@ array_bounds_checker::check_array_bounds (tree >> *tp, int *walk_subtree, >> gcc_assert (checker->m_stmt == wi->stmt); >> >> if (TREE_CODE (t) == ARRAY_REF) >> - warned = checker->check_array_ref (location, t, wi->stmt, >> + warned = checker->check_array_ref (&richloc, t, wi->stmt, >> false/*ignore_off_by_one*/); >> else if (TREE_CODE (t) == MEM_REF) >> - warned = checker->check_mem_ref (location, t, >> + warned = checker->check_mem_ref (&richloc, t, >> false /*ignore_off_by_one*/); >> else if (TREE_CODE (t) == ADDR_EXPR) >> { >> - checker->check_addr_expr (location, t, wi->stmt); >> + checker->check_addr_expr (&richloc, t, wi->stmt); >> *walk_subtree = false; >> } >> else if (inbounds_memaccess_p (t, wi->stmt)) > > [...snip...] > >> diff --git a/gcc/gimple-array-bounds.h b/gcc/gimple-array-bounds.h >> index aa7ca8e9730..c2f7a017fc2 100644 >> --- a/gcc/gimple-array-bounds.h >> +++ b/gcc/gimple-array-bounds.h >> @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. If not see >> #define GCC_GIMPLE_ARRAY_BOUNDS_H >> >> #include "pointer-query.h" >> +#include "gcc-rich-location.h" >> >> class array_bounds_checker >> { >> @@ -32,9 +33,10 @@ public: >> >> private: >> static tree check_array_bounds (tree *tp, int *walk_subtree, void >> *data); >> - bool check_array_ref (location_t, tree, gimple *, bool >> ignore_off_by_one); >> - bool check_mem_ref (location_t, tree, bool ignore_off_by_one); >> - void check_addr_expr (location_t, tree, gimple *); >> + bool check_array_ref (gcc_rich_location *, tree, gimple *, >> + bool ignore_off_by_one); > > Note that these could just take a rich_location *, I think, and drop > the #include above - I don't think there's anything you're using in > gcc_rich_location that isn't in the rich_location base class. Okay, will update this. > >> + bool check_mem_ref (gcc_rich_location *, tree, bool >> ignore_off_by_one); >> + void check_addr_expr (gcc_rich_location *, tree, gimple *); >> void get_value_range (irange &r, const_tree op, gimple *); >> >> /* Current function. */ > > [...snip...] > > > Hope this is constructive Thanks a lot for all your comment and suggestions, very helpful. Qing > Dave