[PATCH] D36850: [ThinLTO] Add norecurse function attribute propagation

Di Mo via Phabricator via cfe-commits Thu, 26 Aug 2021 21:31:44 -0700

modimo marked 3 inline comments as done.
modimo added a comment.

In D36850#2964293 <https://reviews.llvm.org/D36850#2964293>, @tejohnson wrote:


> Good point on indirect calls. Rather than add a bit to the summary, can the 
> flags just be set conservatively in any function containing an indirect call 
> when we build the summaries initially? I think that would get the same effect.

That could have an issue where A calls {indirect, B} and A gets propagated upon 
from B without knowledge that the indirect call exists. Right now I've got a 
FunFlags `hasUnknownCall` which marks these as non-propagatable.

> For speculative devirtualization aka ICP, we will still be left with a 
> fallback indirect call, so it would still need to be treated conservatively. 
> The extra edges added for ICP promotion candidates shouldn't be a problem or 
> affect this.

Ah good point. I was thinking it may pessimize the propagation because of 
having to process all of these edges this is a no-go because of the fallback.

> Note that with class hierarchy analysis we can do better for virtual calls, 
> e.g. when -fwhole-program-vtables is enabled for whole program 
> devirtualization and we have the necessary whole program visibility on 
> vtables. We could potentially integrate WPD decision here. Even if we can't 
> find a single devirtualization target, we can compute the set of all possible 
> targets of virtual calls during the thin link and could use that information 
> to basically merge the attributes from all possible targets. But probably 
> best to leave that as a future refinement as it will take some additional 
> work to get that hooked up. We'd still need to be conservative for virtual 
> calls when we don't have the necessary type info (when 
> -fwhole-program-vtables not enabled or we don't have whole program visibility 
> on the vtable defs), or for non-virtual indirect calls.

Agreed, it's an engineering problem more than anything. I ran an optimistic 
build (same revisions as before, release + noinline) where indirect and virtual 
calls were assumed to always propagate (thinlto_prop_optimistic) and the effect 
in Clang self-build at least is not too large:

thinlto_base/

  "dwarfehprepare.NumCleanupLandingPadsRemaining": 217515,
  "dwarfehprepare.NumNoUnwind": 299126,
  "dwarfehprepare.NumUnwind": 332785,

thinlto_prop/

  "dwarfehprepare.NumCleanupLandingPadsRemaining": 158372,
  "dwarfehprepare.NumNoUnwind": 420918,
  "dwarfehprepare.NumUnwind": 210870,

thinlto_prop_optimistic/

  "dwarfehprepare.NumCleanupLandingPadsRemaining": 154958,
  "dwarfehprepare.NumNoUnwind": 425893,
  "dwarfehprepare.NumUnwind": 205889,

(425893-420918)/(420918-299126) = 4% boost over being conservative and correct. 
This may change in real workloads though so I added a 
`thinlto-funcattrs-optimistic-indirect` flag for easy measurement.



================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:383
+  //   occur in a couple of circumstances:
+  //      a. An internal function gets imported due to its caller getting
+  //      imported, it becomes AvailableExternally
----------------
tejohnson wrote:
> I'm not sure how this case could be happening as we haven't actually done the 
> importing that would create these new available externally copies yet - that 
> happens in the LTO backends, during the thin link we just add them to import 
> lists.
I added the test funcattrs-prop-exported-internal.ll that demonstrates this. 
The summary has its internal linkage converted to external in 
[thinLTOResolvePrevailingInIndex](https://github.com/llvm/llvm-project/blob/92ce6db/llvm/lib/LTO/LTO.cpp#L436)
 then converted to AvailableExternally in 
[thinLTOResolvePrevailingGUID](https://github.com/llvm/llvm-project/blob/92ce6db/llvm/lib/LTO/LTO.cpp#L370).
 Currently being handled conservatively since a prevailing copy doesn't exist.


================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:388
+  //         propagating on its caller.
+  //      b. C++11 [temp.explicit]p10 can generate AvailableExternally
+  //      definitions of explicitly instanced template declarations
----------------
tejohnson wrote:
> There is no prevailing copy presumably because the prevailing copy is in a 
> native library being linked? I think these cases can be handled 
> conservatively.
Yeah the prevailing copy is in the native binary.

This is a [C++ specific 
feature](https://github.com/llvm/llvm-project/blob/92ce6db/clang/lib/AST/ASTContext.cpp#L10755)
 which has ODR and these are already being propagated/inlined from in pre-link. 
The current thinlink propagation implementation is conservative because a 
prevailing copy doesn't exist. Currently being handled conservatively since a 
prevailing copy doesn't exist.


================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:419
+    // Virtual calls are unknown so go conservative
+    if (!FS || FS->getTypeIdInfo())
+      return CachedAttributes[VI];
----------------
tejohnson wrote:
> I think this checking for virtual calls will only work if 
> -fwhole-program-vtables is enabled for whole program devirtualization or CFI. 
> Otherwise we don't have the type tests that cause this to get populated. This 
> also won't detect non-virtual indirect calls.
I see. I added `hasUnknownCall` as an explicit flag for all indirect calls that 
should capture both cases.


================
Comment at: llvm/lib/Transforms/IPO/FunctionAttrs.cpp:443
+          // we'll keep the last one seen.
+          ODR = FS;
+        } else if (!Prevailing && !ODR) {
----------------
tejohnson wrote:
> modimo wrote:
> > tejohnson wrote:
> > > Won't we still have a copy marked prevailing? Wondering if the weak 
> > > linkage cases can all be merged.
> > Yeah, there will still be a copy that's prevailing. Reading through the 
> > linkage descriptions again and also those in 
> > `FunctionImportGlobalProcessing::getLinkage`:
> > 
> > 1. I think with External/WeakODR/LinkOnceODR once the prevailing is found 
> > use that copy
> > 
> > 2. For Weak/LinkOnce even with a prevailing copy I don't know if the copy 
> > ultimately used will be prevailing. I'm wondering if a native definition 
> > could be the victor in which case we just can't propagate off these 
> > functions. 
> > 
> > WDYT about (2)? For C++ at least these don't seem to really exist and 
> > testing with Clang self-build I'm not seeing this kick in anywhere. I added 
> > a flag to specifically disable this case so it's easy to test out the 
> > differences.
> Since the linker which invokes this should have been passed all objects to 
> link, bitcode and native, it can do symbol resolution across all of them. So 
> if there is an overriding native strong symbol, it should see that and the 
> bitcode resolution would be non-prevailing and all bitcode copies marked dead 
> (in computeDeadSymbols). So I think the weak any and linkonce any case can 
> take the prevailing copy.
That makes it much easier and everything folds into the prevailing case! 
Changed and added a test for it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D36850/new/

https://reviews.llvm.org/D36850

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D36850: [ThinLTO] Add norecurse function attribute propagation

Reply via email to