[PATCH] D98061: [InstrProfiling] Generate runtime hook for ELF platforms

Vedant Kumar via Phabricator via cfe-commits Tue, 09 Mar 2021 16:37:37 -0800

vsk added a comment.

In D98061#2615386 <https://reviews.llvm.org/D98061#2615386>, @phosek wrote:


> In D98061#2615334 <https://reviews.llvm.org/D98061#2615334>, @vsk wrote:
>
>> In D98061#2615250 <https://reviews.llvm.org/D98061#2615250>, @phosek wrote:
>>
>>> In D98061#2615239 <https://reviews.llvm.org/D98061#2615239>, @vsk wrote:
>>>
>>>> @ributzka may have stronger thoughts about when -fprofile-instr-generate 
>>>> must imply that a known set of symbols appear with external visibility. Up 
>>>> until now, the answer has been "always", and this is what tapi enforces 
>>>> for MachO. It's awkward to have this be inconsistent between MachO/ELF, 
>>>> but if there's a compelling performance reason then I think it's fine.
>>>
>>> From the perspective of Fuchsia, we've seen a noticeable impact of this 
>>> change when using `-fprofile-instr-generate` together `-fprofile-list` to 
>>> apply instrumentation selectively only to modified files.
>>
>> What kind of impact do you see? If #counters > 0, is it mostly binary size 
>> cost? If #counters == 0, what's the avg. overhead of writing out the full 
>> profile?
>
> It depends a bit on the runtime and the platform. In Fuchsia where we always 
> use the continuous mode, there's quite a bit of upfront cost (see 
> https://github.com/llvm/llvm-project/blob/75f3f778052cdcd89e79f7a42a50915ee5ce2281/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c#L109),
>  we need to allocate a memory object, map it into address space, publish it, 
> write some additional information into the log.
>
> While it may not be so bad for a single binary, over hundreds of tests it 
> adds up and with this change we saw the total test execution time go down 
> from 30 to 18 minutes. There are other steps we're taking, like eliminating 
> the need for logging, but that's unlikely to eliminate all of the overhead.
>
>> Can it be fixed by doing an early-exit in the runtime initializer, writing 
>> out an empty .profraw?
>
> I considered that initially, but that's less efficient than the approach 
> implemented here, especially when it comes to binary size.

I don't follow where the binary size cost comes from (is it the cost of writing 
out many empty .profraw headers?), but it sounds like the 30 -> 18min speed up 
is achieved by not registering essentially an empty profile with the VM, so it 
does follow that unconditionally writing out an empty .profraw won't work as 
well as disabling the runtime initializer entirely.

>> That raises a question about tooling support: some workflows (like the Xcode 
>> coverage plugin) might assume that a program compiled with 
>> -fprofile-instr-generate always creates a .profraw. If there's no profile 
>> written at all for the #counters == 0 case, that could be a breaking change.
>
> That's a good point, would it be better to put this behind a (frontend or 
> backend) flag?

I don't think it should be an option because I have doubts about how 
discoverable it'd be. My preference would be to add a section to the clang 
coverage docs explaining the different guarantees for .profraw emission.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98061/new/

https://reviews.llvm.org/D98061

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D98061: [InstrProfiling] Generate runtime hook for ELF platforms

Reply via email to