On 09/10/14 12:35, Christian Bruel wrote:
> 
> On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote:
>> Hi Christian,
> 
> << snipped agreed stuf >>
>> 3) about inlining
>>    I dislike inlining different modes, From a conceptual use, a user
>> might want to switch mode only when changing a function's hotness.
>> Usually inlining a cold function into a hot one is not what the user
>> explicitly required when setting different mode attributes for them,
>> __attribute__((thumb)) should not imply coldness or hotness. Inlining 
>> between cold and hot functions should be done based on profile feedback. 
>> The choice of compiling in "Thumb1" state for coldness is a separate one 
>> because that's where the choice needs to be made.
> 
> Ideally yes. but I think that a user motivation to use target attribute
> (("thumb") is to reduce code size even in the cases where PFO is not
> available (libraries, kernel or user application build spec packaged
> without profile data). And there are cases where static probabilities
> are not enough and that a user wants it own control with gprof or oprofile.
> But in this case, we could point to the __attribute__ (("cold")) on the
> function ? That would probably be the best workaround to propose if we
> recommend this
> 

Hot vs cold is interesting, but arm/thumb shouldn't be used to imply
that.  The days when ARM=fast, thumb=small are in the past now, and
thumb2 code should be both fast and small.  Indeed, smaller thumb2 code
can be faster than larger ARM code simply because you can get more of it
in the cache.  The use of arm vs thumb is likely to be much more subtle now.

> But here is another scenario: Using of attribute (("arm")) for exception
> entry points is indeed not related to hotness. But consider a typical
> thumb binary with an entry point in arm compiled in C (ex handler, a
> kernel...). Today due to the file boundary the thumb part is not inlined
> into the arm point. (Using -flto is not possible because the whole
> gimple would be thumb).
> 

We have no-inline attributes for scenarios like that.  I don't think a
specific use case should dominate other cases.

> Now, using attribute (("target")) for the functions others than the
> entry point, with your approach they would all be inlined (assuming the
> cost allow this) and we would end up with a arm binary instead of a
> thumb binary...
> 
> But there are still 3 points  :
> 
> - At least 2 other target (i386, Powerpc) that support attribute_target
> disable inlining between modes that are not subsets. I like to think
> about homogeneity between targets and I find odd to have different
> inlining rules...
> 

That's because use of specific instructions must not be allowed to leak
past a gating check that is in the caller.  It would be a disaster if a
function that used a neon register, for example, was allowed to leak
into code that might run on a target with no Neon register file.  The
ARM/thumb distinction shouldn't, by default, be limited in that manner.

I believe inlining could happen from a subset of the archtiecture into a
function using a superset, just not vice-versa.

> - Scanning the function body to check for ASM_INPUT does not look very
> elegant (if this matters) because the asm could well be unrelated
> 
> The only case when it will always be a win to inline thumb into arm is
> when the cost of the inlined body is less than a BX instruction (but
> still, with branch prediction this cost is pondered).
> 

One of the problems with not inlining is that the C++ abstraction
penalty is likely to shoot up.  There will be many major lost
optimization opportunities if we start down that path.

So I think inlining should only be disabled if there's some technical
reason why it should be disabled, not because of some 'it might not
always be ideal' feelings.  Furthermore, we should expect users to use
the other attributes consistently when they expect specific behaviours
to occur.

My 2p.

R.

> 
>>
>>> The compiler would take a decision that is not what the user wrote. And
>>> in addition if you consider the few instructions to modify R15 to switch
>>> state that would end up with more code executed in the critical path,
>>> voiding a possible size of speed gain.
>> I do not expect there to be any additional instructions needed to switch 
>> state. If function x is inlined into function y the state would be lost 
>> and the state would be in terms of the state of function x.
> Yes, indeed. I was in a LCM/mode-switching thinking mode when writing
> this. In this case the mode is inherited.
> 
>> Obviously if the user doesn't want inlining - the user would add 
>> attributes to disable inlining. You do have extensions such as 
>> __attribute__((noinline)) and __attribute__((never_inline)) to give the 
>> user that control and those bits need to be used in addition.
> 
> Those attributes are overkill. They would disable inlining between
> caller-callee of a same mode. This is not what we want
> 
>>
>> The attribute then purely reflects then the output instruction state of 
>> the function if a copy of it's body is laid out separately in the output.
>>
>> IMHO, the heuristics for inlining should be the best judge of when 
>> functions should be inlined between one and another and we shouldn't be 
>> second guessing that in the backend
>>
>> If there is a copy of the function to be put out by the compiler, only 
>> then should we choose this based on the state of the "target" i.e. arm 
>> or thumb.
>>
> Yes,
> 
> So to summarize, we can:
> 
>   1) don't inline between different modes. Same behavior with other
> targets. Solves asm case
>   2) always inline unless the function contains asm statements. ( I
> reject adding a new compilation switch)
>   3) always inline. But recommend the use of attribute ((noinline)) to
> handle asm or attribute ((cold,hot)) in the absence of profile datas
> 
> I obviously prefer 1) safe and  homogenous, 3) is the worse as it
> requires additional user action (poor user). 2) is less worse.
> 
> Thanks for supporting me ::)
> 
> Christian
> 
> 


Reply via email to