Hi, On Thu, Sep 16 2021, Kewen.Lin wrote: > Hi Martin, > > Thanks for the review comments! > > on 2021/9/15 下午8:51, Martin Jambor wrote: >> Hi, >> >> since this is inlining-related, I would somewhat prefer Honza to have a >> look too, but I have the following comments: >> >> On Wed, Sep 08 2021, Kewen.Lin wrote: >>> >> >> [...] >> >>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h >>> index 78399b0b9bb..300b8da4507 100644 >>> --- a/gcc/ipa-fnsummary.h >>> +++ b/gcc/ipa-fnsummary.h >>> @@ -193,6 +194,9 @@ public: >>> vec<ipa_freqcounting_predicate, va_gc> *loop_strides; >>> /* Parameters tested by builtin_constant_p. */ >>> vec<int, va_heap, vl_ptr> GTY((skip)) builtin_constant_p_parms; >>> + /* Like fp_expressions, but it's to hold some target specific >>> information, >>> + such as some target specific isa flags. */ >>> + auto_vec<HOST_WIDE_INT> GTY((skip)) target_info; >>> /* Estimated growth for inlining all copies of the function before start >>> of small functions inlining. >>> This value will get out of date as the callers are duplicated, but >> >> Segher already wrote in the first thread that a vector of HOST_WIDE_INTs >> is an overkill and I agree. So at least make the new field just a >> HOST_WIDE_INT or better yet, an unsigned int. But I would even go >> further and make target_info only a 16-bit bit-field, place it after the >> other bit-fields in class ipa_fn_summary and pass it to the hooks as >> uint16_t. Unless you have plans which require more space, I think we >> should be conservative here. >> > > OK, yeah, the consideration is mainly for the scenario that target has > a few bits to care about. I just realized that to avoid inefficient > bitwise operation for mapping target info bits to isa_flag bits, target > can rearrange the sparse bits in isa_flag, so it's not a deal. > Thanks for re-raising this! I'll use the 16 bits bit-field in v3 as you > suggested, if you don't mind, I will put it before the existing bit-fields > to have a good alignment.
All right. > >> I am also not sure if I agree that the field should not be streamed for >> offloading, but since we do not have an offloading compiler needing them >> I guess for now that is OK. But it should be documented in the comment >> describing the field that it is not streamed to offloading compilers. >> > > Good point, will add it in v3. > >> [...] >> >> >>> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c >>> index 2470937460f..72091b6193f 100644 >>> --- a/gcc/ipa-fnsummary.c >>> +++ b/gcc/ipa-fnsummary.c >>> @@ -2608,6 +2617,7 @@ analyze_function_body (struct cgraph_node *node, bool >>> early) >>> info->conds = NULL; >>> info->size_time_table.release (); >>> info->call_size_time_table.release (); >>> + info->target_info.release(); >>> >>> /* When optimizing and analyzing for IPA inliner, initialize loop >>> optimizer >>> so we can produce proper inline hints. >>> @@ -2659,6 +2669,12 @@ analyze_function_body (struct cgraph_node *node, >>> bool early) >>> bb_predicate, >>> bb_predicate); >>> >>> + /* Only look for target information for inlinable functions. */ >>> + bool scan_for_target_info = >>> + info->inlinable >>> + && targetm.target_option.need_ipa_fn_target_info (node->decl, >>> + info->target_info); >>> + >>> if (fbi.info) >>> compute_bb_predicates (&fbi, node, info, params_summary); >>> const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count; >>> @@ -2876,6 +2892,10 @@ analyze_function_body (struct cgraph_node *node, >>> bool early) >>> if (dump_file) >>> fprintf (dump_file, " fp_expression set\n"); >>> } >>> + if (scan_for_target_info) >>> + scan_for_target_info = >>> + targetm.target_option.update_ipa_fn_target_info >>> + (info->target_info, stmt); >>> } >> >> Practically it probably does not matter, but why is this in the "if >> (this_time || this_size)" block? Although I can see that setting >> fp_expression is also done that way... but it seems like copying a >> mistake to me. > > Yeah, I felt target info scanning is similar to fp_expression scanning, > so I just followed the same way. If I read it right, the case > !(this_time || this_size) means the STMT won't be weighted to any RTL > insn from both time and size perspectives, so guarding it seems to avoid > unnecessary scannings. I assumed that target bifs and inline asm would > not be evaluated as zero cost, it seems safe so far for HTM usage. > > Do you worry about some special STMT which is weighted to zero but it's > necessarily to be checked for target info in a long term? > If so, I'll move it out in v3. It seems that gimple_call_internal_p statements are always costed to zero and I am wondering whether those are something that targets would want to look out for in the future. But hopefully anyone implementing that in the future would come up with a testcase and would need to fix this to have the testcase pass. Thanks, Martin >> >> All that said, the overall approach seems correct to me. >> > > Thanks again. > BR, > Kewen