An update on this task. (Also need more suggestions -:) I have an initial implemenation locally, with this gcc, I can get the following message with the testing case in PR109071:
[ 109071]$ cat t.c extern void warn(void); static inline void assign(int val, int *regs, int *index) { if (*index >= 4) warn(); *regs = val; } struct nums {int vals[4];}; void sparx5_set (int *ptr, struct nums *sg, int index) { int *val = &sg->vals[index]; assign(0, ptr, &index); assign(*val, ptr, &index); } [109071]$ sh t /home/opc/Install/latest-d/bin/gcc -O2 -Warray-bounds=1 -c -o t.o t.c t.c: In function ‘sparx5_set’: t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=] 12 | int *val = &sg->vals[index]; | ~~~~~~~~^~~~~~~ event 1 | | 4 | if (*index >= 4) | | ^ | | | | | (1) when the condition is evaluated to true | t.c:8:18: note: while referencing ‘vals’ 8 | struct nums {int vals[4];}; | ^~~~ 1. Is the above diagnostic message good enough? Any more suggestion? 2. It’s hard for me to come up with a more complicate testing case that has one basic block copied multiple times by the jump thread, do you have any pointer to such testing cases? Thanks a lot for any help. Qing > On Jun 7, 2024, at 15:13, Qing Zhao <qing.z...@oracle.com> wrote: > > Hi, Richard, > >> On Jun 5, 2024, at 13:58, Qing Zhao <qing.z...@oracle.com> wrote: >>> >>>>>>>>>> Like this? >>>>>>>>>> >>>>>>>>>> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h >>>>>>>>>> index e6e2b0897572..ee344f91333b 100644 >>>>>>>>>> --- a/libcpp/include/line-map.h >>>>>>>>>> +++ b/libcpp/include/line-map.h >>>>>>>>>> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro { >>>>>>>>>> struct GTY(()) location_adhoc_data { >>>>>>>>>> location_t locus; >>>>>>>>>> source_range src_range; >>>>>>>>>> - void * GTY((skip)) data; >>>>>>>>>> unsigned discriminator; >>>>>>>>>> + void * GTY((skip)) data; >>>>>>>>>> + void * GTY((skip)) copy_history; >>>>>>>>>> }; >>>>>>>>>> struct htab; >>>>>>>>> >>>>>>>>> Yes. >>>>>>>>> >>>>>>>>>> How about the copy_history? Do we need a new data structure (like >>>>>>>>>> the following, or other suggestion) for this? Where should I add >>>>>>>>>> this new data structure? >>>>>>>>> >>>>>>>>> As it needs to be managed by libcpp it should be in this very same >>>>>>>>> file. >>>>>>>>> >>>>>>>>>> struct copy_history { >>>>>>>>>> location_t condition; >>>>>>>>>> Bool is_true_path; >>>>>>>>>> } >>>>>>>>> >>>>>>>>> I think we want a pointer to the previous copy-of state as well in >>>>>>>>> case a stmt >>>>>>>>> was copied twice. We'll see whether a single (condition) location >>>>>>>>> plus edge flag >>>>>>>>> is sufficient. I'd say we should plan for an enum to indicate the >>>>>>>>> duplication >>>>>>>>> reason at least (jump threading, unswitching, unrolling come to my >>>>>>>>> mind). For >>>>>>>>> jump threading being able to say "when <condition> is true/false" is >>>>>>>>> probably >>>>>>>>> good enough, though it might not always be easy to identify a single >>>>>>>>> condition >>>>>>>>> here given a threading path starts at an incoming edge to a CFG merge >>>>>>>>> and >>>>>>>>> will usually end with the outgoing edge of a condition that we can >>>>>>>>> statically >>>>>>>>> evaluate. The condition controlling the path entry might not be >>>>>>>>> determined >>>>>>>>> fully by a single condition location. >>>>>>>>> >>>>>>>>> Possibly building a full "diagnostic path" object at threading time >>>>>>>>> might be >>>>>>>>> the only way to record all the facts, but that's also going to be >>>>>>>>> more >>>>>>>>> expensive. >>>>>>>> >>>>>>>> Note that a diagnostic_path represents a path through some kind of >>>>>>>> graph, whereas it sounds like you want to be storing the *nodes* in the >>>>>>>> graph, and later generating the diagnostic_path from that graph when we >>>>>>>> need it (which is trivial if the graph is actually just a tree: just >>>>>>>> follow the parent links backwards, then reverse it). >>>>>>> >>>>>>> I think we are mixing two things - one is that a single transform like >>>>>>> jump >>>>>>> threading produces a stmt copy and when we emit a diagnostic on that >>>>>>> copied statement we want to tell the user the condition under which the >>>>>>> copy is executed. That "condition" can be actually a sequence of >>>>>>> conditionals. I wanted to point out that a diagnostic_path instance >>>>>>> could >>>>>>> be used to describe such complex condition. >>>>>>> >>>>>>> But then the other thing I wanted to address with the link to a previous >>>>>>> copy_history - that's when a statement gets copied twice, for example >>>>>>> by two distinct jump threading optimizations. Like when dumping >>>>>>> the inlining decisions for diagnostics we could dump the logical "and" >>>>>>> of the conditions of the two threadings. Since we have a single >>>>>>> location per GIMPLE stmt we'd have to keep a "stack" of copy events >>>>>>> associated with it. That's the linked list (I think a linked list >>>>>>> should >>>>>>> work fine here). >>>>>> Yes, the linked list to keep the “stack” of copy events should be good >>>>>> enough >>>>>> to form the sequence of conditionals event for the diagnostic_path >>>>>> instance. >>>>>>> >>>>>>> I realize things may get a bit "fat", but eventually we are not >>>>>>> duplicating >>>>>>> statements that much. I do hope we can share for example a big >>>>>>> diagnostic_path when we duplicate a basic block during threading >>>>>>> and use one instance for all stmts in such block copy (IIRC we never >>>>>>> release locations or their ad-hoc data, we just make sure to never >>>>>>> use locations with ad-hoc data pointing to BLOCKs that we released, >>>>>>> but the linemap data will still have pointers in "dead" location >>>>>>> entries, >>>>>>> right?) >>>>>> Are you still suggesting to add two artificial stmts in the beginning >>>>>> and the >>>>>> end of the duplicated block to carry the copy history information for >>>>>> all the >>>>>> stmts in the block to save space? >>>>>> >>>>>> Compared with the approach to carry such information to each duplicated >>>>>> stmts (which I preferred), >>>>>> The major concerns with the approach are: >>>>>> 1. Optimizations might move stmts out of these two artificial stmts, >>>>>> therefore we need add >>>>>> Some memory barrier on these two artificial stmts to prevent such >>>>>> movements. >>>>>> This might prevent good optimization from happening and result in >>>>>> runtime performance loss; >>>>>> 2. Every time when query whether the stmt is copied and get its copy >>>>>> history, we have to search backward or >>>>>> Forward through the stmt chain to get to the artificial stmts that >>>>>> carry the copy history, compilation time will >>>>>> be slower. >>>>>> >>>>>> I still think that carrying the copy history info to each duplicated >>>>>> stmts might be better. We can limit the length of >>>>>> the history to control the space, what do you think? >>>>> >>>>> Copying to each stmt is definitely easier so I think it's better to >>>>> start with that and only resort >>>>> to other things when this fails. >>>> Okay. Will try this approach first. >>>>> >>>>> Note we could, instead of putting a pointer to data into the ad-hoc >>>>> info, put in an index >>>>> and have the actual data be maintained elsewhere. That data could >>>>> even only keep >>>>> the "last" 1000 contexts and simply discard "older" ones. Before IPA >>>>> info can still be >>>>> transfered between functions, but after IPA where most of the >>>>> threadings happen the >>>>> extra info could be discarded once we get a function to the last pass >>>>> emitting diagnostics >>>>> we want to amend. Doing an index lookup makes it possible to not >>>>> worry about "stale" >>>>> entries. >>>> >>>> Are you suggesting to use a global array with upper-bound “1000” to hold >>>> the copy-history info? >>>> Then for each copied stmt, put in an index to this array to refer the >>>> copy-history linked list for it? >>>> Is “1000” enough for one function? We can do some experiments to decide >>>> this number. >>>> >>>> What do you mean by “stale” entries? The entries for the previous function? >>>> >>>> Is a function-level array enough for this purpose? i.e, for each function, >>>> we use an array to hold the copy-history info, and free this array after >>>> this function is done with compilation? >>>> A little confused here…. >>> >>> I was thinking of a hash map and noting when we create an entry after IPA >>> so we can scrap it when done with the function >> Okay. I see. >> then where should this hash map be declared? diagnostic-spec.h? > > I studied a little bit on hash_map and some related code. For our case, I > came up with the following: > > ==== > > /* An enum for the reason why this copy is made. Right now, there are > three reasons, we can add more if needed. */ > enum copy_reason { > COPY_BY_THREAD_JUMP, > COPY_BY_LOOP_UNSWITCH, > COPY_BY_LOOP_UNROLL > }; > > /* This data structure records the information when a statement is duplicated > during different transformations. Such information will be used by the later > diagnostic message to report more context of the warning or error. */ > struct copy_history { > /* The location of the condition statement that triggered the code > duplication. */ > location_t condition; > > /* Whether this copy is on the TRUE path of the condition. */ > bool is_true_path; > > /* The reason for the code duplication. */ > enum copy_reason reason; > > /* This statement itself might be a previous code duplication. */ > structure copy_history *prev_copy; > }; > > typedef struct copy_history *copy_history_t; > > typedef hash_map<gimple *, copy_history_t> copy_history_map_t; > > > /* A mapping from a simple to a pointer to the copy history of it. */ > GTY(()) copy_history_map_t *copy_history_map; > > /* Get the copy history for the gimple STMT, return NULL when there is > no associated copy history. */ > copy_history_t > get_copy_history (gimple *stmt) > { > if (!copy_history_map) > return NULL; > > if (const copy_history_t cp_history = copy_history_map->get (stmt)) > return cp_history; > > return NULL; > } > > /* Insert the copy history for the gimple STMT. */ > void > insert_copy_history (gimple *stmt, copy_history_t cp_history) > { > if (!copy_history_map) > copy_history_map = copy_history_map_t::create_ggc (1000); > > copy_history_map->put (stmt, cp_history); > return; > } > > ===== > > Do you have any comment and suggestion on the above? > > I have some questions: > > 1. Should I add a new header file for this purpose, for example, > diagnostic-copy-history.h? > Of just add these new data structures into an existing header file? If so, > which one is better? > > 2. Should I explicitly delete and free the hash_map “copy_history_map” when > it is not used anymore? > If so, where should I do this? > Or “create_ggc” will guarantee that this hash_map will be automatically > deleted and freed. > > 3. You mentioned that this hash_map “copy_history_map” will be scrapped after > the current function is done. > Is there any special places in GCC that I should explicitly create this > hash_map for the current function and > then delete it when the current function is done? > > Thanks a lot. > > Qing > >> >> Qing >> >>> >>>> Thanks for clarification. >>>> >>>> Qing >>>> >>>> >>>>> >>>>> Richard. >>>>> >>>>>> >>>>>> Qing >>>>>> >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> >>>>>>>> Dave >>>>>>>> >>>>>>>>> I can think of having a -fdiagnostic-try-to-explain-harder option to >>>>>>>>> clarify confusing >>>>>>>>> diagnostics people could use on-demand? >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> Qing >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> >>>>>>>>>>>> Qing >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> But feel free to ask me any questions about the >>>>>>>>>>>>>> diagnostic_path >>>>>>>>>>>>>> machinery within the diagnostics subsystem. >>>>>>>>>>>>>> >>>>>>>>>>>>>> [...snip...] >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards >>>>>>>>>>>>>> Dave