Hi Martin,

Thanks for the review comments!

on 2021/9/15 下午8:51, Martin Jambor wrote:
> Hi,
> 
> since this is inlining-related, I would somewhat prefer Honza to have a
> look too, but I have the following comments:
> 
> On Wed, Sep 08 2021, Kewen.Lin wrote:
>>
> 
> [...]
> 
>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
>> index 78399b0b9bb..300b8da4507 100644
>> --- a/gcc/ipa-fnsummary.h
>> +++ b/gcc/ipa-fnsummary.h
>> @@ -193,6 +194,9 @@ public:
>>    vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
>>    /* Parameters tested by builtin_constant_p.  */
>>    vec<int, va_heap, vl_ptr> GTY((skip)) builtin_constant_p_parms;
>> +  /* Like fp_expressions, but it's to hold some target specific information,
>> +     such as some target specific isa flags.  */
>> +  auto_vec<HOST_WIDE_INT> GTY((skip)) target_info;
>>    /* Estimated growth for inlining all copies of the function before start
>>       of small functions inlining.
>>       This value will get out of date as the callers are duplicated, but
> 
> Segher already wrote in the first thread that a vector of HOST_WIDE_INTs
> is an overkill and I agree.  So at least make the new field just a
> HOST_WIDE_INT or better yet, an unsigned int.  But I would even go
> further and make target_info only a 16-bit bit-field, place it after the
> other bit-fields in class ipa_fn_summary and pass it to the hooks as
> uint16_t.  Unless you have plans which require more space, I think we
> should be conservative here.
> 

OK, yeah, the consideration is mainly for the scenario that target has
a few bits to care about.  I just realized that to avoid inefficient
bitwise operation for mapping target info bits to isa_flag bits, target
can rearrange the sparse bits in isa_flag, so it's not a deal.
Thanks for re-raising this!  I'll use the 16 bits bit-field in v3 as you
suggested, if you don't mind, I will put it before the existing bit-fields
to have a good alignment.

> I am also not sure if I agree that the field should not be streamed for
> offloading, but since we do not have an offloading compiler needing them
> I guess for now that is OK. But it should be documented in the comment
> describing the field that it is not streamed to offloading compilers.
> 

Good point, will add it in v3.

> [...]
> 
> 
>> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
>> index 2470937460f..72091b6193f 100644
>> --- a/gcc/ipa-fnsummary.c
>> +++ b/gcc/ipa-fnsummary.c
>> @@ -2608,6 +2617,7 @@ analyze_function_body (struct cgraph_node *node, bool 
>> early)
>>    info->conds = NULL;
>>    info->size_time_table.release ();
>>    info->call_size_time_table.release ();
>> +  info->target_info.release();
>>  
>>    /* When optimizing and analyzing for IPA inliner, initialize loop 
>> optimizer
>>       so we can produce proper inline hints.
>> @@ -2659,6 +2669,12 @@ analyze_function_body (struct cgraph_node *node, bool 
>> early)
>>                         bb_predicate,
>>                         bb_predicate);
>>  
>> +  /* Only look for target information for inlinable functions.  */
>> +  bool scan_for_target_info =
>> +    info->inlinable
>> +    && targetm.target_option.need_ipa_fn_target_info (node->decl,
>> +                                                  info->target_info);
>> +
>>    if (fbi.info)
>>      compute_bb_predicates (&fbi, node, info, params_summary);
>>    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>> @@ -2876,6 +2892,10 @@ analyze_function_body (struct cgraph_node *node, bool 
>> early)
>>                if (dump_file)
>>                  fprintf (dump_file, "   fp_expression set\n");
>>              }
>> +          if (scan_for_target_info)
>> +            scan_for_target_info =
>> +              targetm.target_option.update_ipa_fn_target_info
>> +              (info->target_info, stmt);
>>          }
> 
> Practically it probably does not matter, but why is this in the "if
> (this_time || this_size)" block?  Although I can see that setting
> fp_expression is also done that way... but it seems like copying a
> mistake to me.

Yeah, I felt target info scanning is similar to fp_expression scanning,
so I just followed the same way.  If I read it right, the case
!(this_time || this_size) means the STMT won't be weighted to any RTL
insn from both time and size perspectives, so guarding it seems to avoid
unnecessary scannings.  I assumed that target bifs and inline asm would
not be evaluated as zero cost, it seems safe so far for HTM usage.

Do you worry about some special STMT which is weighted to zero but it's
necessarily to be checked for target info in a long term?
If so, I'll move it out in v3.
> 
> All that said, the overall approach seems correct to me.
> 

Thanks again.
BR,
Kewen

Reply via email to