On 10/09/2014 04:11 PM, Richard Earnshaw wrote: > On 09/10/14 12:35, Christian Bruel wrote: >> On 10/08/2014 06:56 PM, Ramana Radhakrishnan wrote: >>> Hi Christian, >> << snipped agreed stuf >> >>> 3) about inlining >>> I dislike inlining different modes, From a conceptual use, a user >>> might want to switch mode only when changing a function's hotness. >>> Usually inlining a cold function into a hot one is not what the user >>> explicitly required when setting different mode attributes for them, >>> __attribute__((thumb)) should not imply coldness or hotness. Inlining >>> between cold and hot functions should be done based on profile feedback. >>> The choice of compiling in "Thumb1" state for coldness is a separate one >>> because that's where the choice needs to be made. >> Ideally yes. but I think that a user motivation to use target attribute >> (("thumb") is to reduce code size even in the cases where PFO is not >> available (libraries, kernel or user application build spec packaged >> without profile data). And there are cases where static probabilities >> are not enough and that a user wants it own control with gprof or oprofile. >> But in this case, we could point to the __attribute__ (("cold")) on the >> function ? That would probably be the best workaround to propose if we >> recommend this >> > Hot vs cold is interesting, but arm/thumb shouldn't be used to imply > that. The days when ARM=fast, thumb=small are in the past now, and > thumb2 code should be both fast and small. Indeed, smaller thumb2 code > can be faster than larger ARM code simply because you can get more of it > in the cache. The use of arm vs thumb is likely to be much more subtle now.
I'm also very interested by this. From my last bench session, ARM mode could bring a speedup (from noise to 5/6%) but with a very big size penalty. So I believe there is room for fine tuning at application level, and I agree this is very subtle and difficult this is another topic. (That was with a GCC 4.8, maybe the gap has reduced since). > >> But here is another scenario: Using of attribute (("arm")) for exception >> entry points is indeed not related to hotness. But consider a typical >> thumb binary with an entry point in arm compiled in C (ex handler, a >> kernel...). Today due to the file boundary the thumb part is not inlined >> into the arm point. (Using -flto is not possible because the whole >> gimple would be thumb). >> > We have no-inline attributes for scenarios like that. I don't think a > specific use case should dominate other cases. That's severe, no-inline attribute would disable inlining same modes ! > >> Now, using attribute (("target")) for the functions others than the >> entry point, with your approach they would all be inlined (assuming the >> cost allow this) and we would end up with a arm binary instead of a >> thumb binary... >> >> But there are still 3 points : >> >> - At least 2 other target (i386, Powerpc) that support attribute_target >> disable inlining between modes that are not subsets. I like to think >> about homogeneity between targets and I find odd to have different >> inlining rules... >> > That's because use of specific instructions must not be allowed to leak > past a gating check that is in the caller. It would be a disaster if a > function that used a neon register, for example, was allowed to leak > into code that might run on a target with no Neon register file. The > ARM/thumb distinction shouldn't, by default, be limited in that manner. > > I believe inlining could happen from a subset of the archtiecture into a > function using a superset, just not vice-versa. I'm afraid I misunderstand this, Do you want inlining from Thumb to a function using ARM because you consider thumb to be a subset of ARM ? You know better that I but I never thought that, or maybe there is something to do with the unified assembler ? In this case I see the problem under a new light. With the unified assembly, indeed we could inline from any mode as long as no divide mode asm inside. > >> - Scanning the function body to check for ASM_INPUT does not look very >> elegant (if this matters) because the asm could well be unrelated >> >> The only case when it will always be a win to inline thumb into arm is >> when the cost of the inlined body is less than a BX instruction (but >> still, with branch prediction this cost is pondered). >> > One of the problems with not inlining is that the C++ abstraction > penalty is likely to shoot up. There will be many major lost > optimization opportunities if we start down that path. I would never consider users to use extensively this attribute on inlined member functions. But I take your point > So I think inlining should only be disabled if there's some technical > reason why it should be disabled, not because of some 'it might not > always be ideal' feelings. Furthermore, we should expect users to use > the other attributes consistently when they expect specific behaviours > to occur. Sure, me also I would have preferred objective benchmarks results, but its a little bit early to have experience with instrumentation of large code. Thanks for your input, it's great to see the problem from all directions between taking a decision Christian > My 2p. > > R. > >>>> The compiler would take a decision that is not what the user wrote. And >>>> in addition if you consider the few instructions to modify R15 to switch >>>> state that would end up with more code executed in the critical path, >>>> voiding a possible size of speed gain. >>> I do not expect there to be any additional instructions needed to switch >>> state. If function x is inlined into function y the state would be lost >>> and the state would be in terms of the state of function x. >> Yes, indeed. I was in a LCM/mode-switching thinking mode when writing >> this. In this case the mode is inherited. >> >>> Obviously if the user doesn't want inlining - the user would add >>> attributes to disable inlining. You do have extensions such as >>> __attribute__((noinline)) and __attribute__((never_inline)) to give the >>> user that control and those bits need to be used in addition. >> Those attributes are overkill. They would disable inlining between >> caller-callee of a same mode. This is not what we want >> >>> The attribute then purely reflects then the output instruction state of >>> the function if a copy of it's body is laid out separately in the output. >>> >>> IMHO, the heuristics for inlining should be the best judge of when >>> functions should be inlined between one and another and we shouldn't be >>> second guessing that in the backend >>> >>> If there is a copy of the function to be put out by the compiler, only >>> then should we choose this based on the state of the "target" i.e. arm >>> or thumb. >>> >> Yes, >> >> So to summarize, we can: >> >> 1) don't inline between different modes. Same behavior with other >> targets. Solves asm case >> 2) always inline unless the function contains asm statements. ( I >> reject adding a new compilation switch) >> 3) always inline. But recommend the use of attribute ((noinline)) to >> handle asm or attribute ((cold,hot)) in the absence of profile datas >> >> I obviously prefer 1) safe and homogenous, 3) is the worse as it >> requires additional user action (poor user). 2) is less worse. >> >> Thanks for supporting me ::) >> >> Christian >> >> >