Hi,

On Thu, Sep 16 2021, Kewen.Lin wrote:
> Hi Martin,
>
> Thanks for the review comments!
>
> on 2021/9/15 下午8:51, Martin Jambor wrote:
>> Hi,
>> 
>> since this is inlining-related, I would somewhat prefer Honza to have a
>> look too, but I have the following comments:
>> 
>> On Wed, Sep 08 2021, Kewen.Lin wrote:
>>>
>> 
>> [...]
>> 
>>> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h
>>> index 78399b0b9bb..300b8da4507 100644
>>> --- a/gcc/ipa-fnsummary.h
>>> +++ b/gcc/ipa-fnsummary.h
>>> @@ -193,6 +194,9 @@ public:
>>>    vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
>>>    /* Parameters tested by builtin_constant_p.  */
>>>    vec<int, va_heap, vl_ptr> GTY((skip)) builtin_constant_p_parms;
>>> +  /* Like fp_expressions, but it's to hold some target specific 
>>> information,
>>> +     such as some target specific isa flags.  */
>>> +  auto_vec<HOST_WIDE_INT> GTY((skip)) target_info;
>>>    /* Estimated growth for inlining all copies of the function before start
>>>       of small functions inlining.
>>>       This value will get out of date as the callers are duplicated, but
>> 
>> Segher already wrote in the first thread that a vector of HOST_WIDE_INTs
>> is an overkill and I agree.  So at least make the new field just a
>> HOST_WIDE_INT or better yet, an unsigned int.  But I would even go
>> further and make target_info only a 16-bit bit-field, place it after the
>> other bit-fields in class ipa_fn_summary and pass it to the hooks as
>> uint16_t.  Unless you have plans which require more space, I think we
>> should be conservative here.
>> 
>
> OK, yeah, the consideration is mainly for the scenario that target has
> a few bits to care about.  I just realized that to avoid inefficient
> bitwise operation for mapping target info bits to isa_flag bits, target
> can rearrange the sparse bits in isa_flag, so it's not a deal.
> Thanks for re-raising this!  I'll use the 16 bits bit-field in v3 as you
> suggested, if you don't mind, I will put it before the existing bit-fields
> to have a good alignment.

All right.

>
>> I am also not sure if I agree that the field should not be streamed for
>> offloading, but since we do not have an offloading compiler needing them
>> I guess for now that is OK. But it should be documented in the comment
>> describing the field that it is not streamed to offloading compilers.
>> 
>
> Good point, will add it in v3.
>
>> [...]
>> 
>> 
>>> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
>>> index 2470937460f..72091b6193f 100644
>>> --- a/gcc/ipa-fnsummary.c
>>> +++ b/gcc/ipa-fnsummary.c
>>> @@ -2608,6 +2617,7 @@ analyze_function_body (struct cgraph_node *node, bool 
>>> early)
>>>    info->conds = NULL;
>>>    info->size_time_table.release ();
>>>    info->call_size_time_table.release ();
>>> +  info->target_info.release();
>>>  
>>>    /* When optimizing and analyzing for IPA inliner, initialize loop 
>>> optimizer
>>>       so we can produce proper inline hints.
>>> @@ -2659,6 +2669,12 @@ analyze_function_body (struct cgraph_node *node, 
>>> bool early)
>>>                        bb_predicate,
>>>                        bb_predicate);
>>>  
>>> +  /* Only look for target information for inlinable functions.  */
>>> +  bool scan_for_target_info =
>>> +    info->inlinable
>>> +    && targetm.target_option.need_ipa_fn_target_info (node->decl,
>>> +                                                 info->target_info);
>>> +
>>>    if (fbi.info)
>>>      compute_bb_predicates (&fbi, node, info, params_summary);
>>>    const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN (cfun)->count;
>>> @@ -2876,6 +2892,10 @@ analyze_function_body (struct cgraph_node *node, 
>>> bool early)
>>>               if (dump_file)
>>>                 fprintf (dump_file, "   fp_expression set\n");
>>>             }
>>> +         if (scan_for_target_info)
>>> +           scan_for_target_info =
>>> +             targetm.target_option.update_ipa_fn_target_info
>>> +             (info->target_info, stmt);
>>>         }
>> 
>> Practically it probably does not matter, but why is this in the "if
>> (this_time || this_size)" block?  Although I can see that setting
>> fp_expression is also done that way... but it seems like copying a
>> mistake to me.
>
> Yeah, I felt target info scanning is similar to fp_expression scanning,
> so I just followed the same way.  If I read it right, the case
> !(this_time || this_size) means the STMT won't be weighted to any RTL
> insn from both time and size perspectives, so guarding it seems to avoid
> unnecessary scannings.  I assumed that target bifs and inline asm would
> not be evaluated as zero cost, it seems safe so far for HTM usage.
>
> Do you worry about some special STMT which is weighted to zero but it's
> necessarily to be checked for target info in a long term?
> If so, I'll move it out in v3.

It seems that gimple_call_internal_p statements are always costed to
zero and I am wondering whether those are something that targets would
want to look out for in the future.

But hopefully anyone implementing that in the future would come up with
a testcase and would need to fix this to have the testcase pass.

Thanks,

Martin

>> 
>> All that said, the overall approach seems correct to me.
>> 
>
> Thanks again.
> BR,
> Kewen

Reply via email to