minglotus-6 wrote:

> > > The work sounds interesting. Can you provide a bit more context about it? 
> > > Will it be used to improve ICP when it's sufficient to just compare the 
> > > vtable address instead of the vfunc address?
> > 
> > 
> > yes -- it can not only eliminate vtable load, but also enable target check 
> > combining.
> > What is more important is that it can be combined with more aggressive 
> > interprocedural type propagation that enables full (unconditional) 
> > devirtualization. Example:
> >   base->foo(); base->bar();
> > ==>
> >   if (base->vptr == Derived) {
> >     Derived::foo(base); // base type is known so virtual calls in
> >                         // foo,bar can further be devirtualized.
> >     Derived::bar(base);
> >   } else {.. }
> 
> Thanks for the illustration! Have you enabled this in your fleet, and how 
> much performance improvement have you seen?
> 
> We've also been thinking about similar work based on sample PGO, in both the 
> compiler and BOLT. cc @WenleiHe

I tested a prototype (using the simplest heuristic: do the vtable comparison 
only if the vtable distribution is the same as the vfunc distribution) on 
internal workloads. It shows a statistically significant +0.26% qps improvement 
on one search workload, gcu reductions on two other workloads, and is mostly 
neutral for a database. These are initial numbers without tuning (e.g., whether 
to do the vtable comparison when there are two vtable values and one function, 
etc.).
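
To make the difference concrete, here is a minimal sketch of function-pointer comparison versus vtable comparison. It models the vtable by hand, since the vptr is not portably accessible in C++. All names (`Object`, `VTable`, `kDerivedVTable`, `call_both_*`) are hypothetical and for illustration only; this is not the code the prototype generates.

```cpp
#include <cassert>

// Hand-rolled stand-in for a C++ object with a vptr (assumption: two
// virtual functions, foo and bar, as in the quoted example).
struct Object;
struct VTable {
  int (*foo)(Object*);
  int (*bar)(Object*);
};
struct Object {
  const VTable* vptr;
  int value;
};

int Derived_foo(Object* o) { return o->value + 1; }
int Derived_bar(Object* o) { return o->value * 2; }
int Other_foo(Object* o) { return -o->value; }
int Other_bar(Object*) { return 0; }

const VTable kDerivedVTable = {Derived_foo, Derived_bar};
const VTable kOtherVTable = {Other_foo, Other_bar};

// Function-based promotion: each call site loads the function pointer
// from the vtable and compares it against the hot target.
int call_both_vfunc_compare(Object* o) {
  int r = 0;
  auto f = o->vptr->foo;  // vtable load #1
  r += (f == Derived_foo) ? Derived_foo(o) : f(o);
  auto b = o->vptr->bar;  // vtable load #2
  r += (b == Derived_bar) ? Derived_bar(o) : b(o);
  return r;
}

// Vtable-based promotion: a single vptr comparison guards both direct
// calls, so the two target checks are combined and the function
// pointers are not loaded on the hot path.
int call_both_vtable_compare(Object* o) {
  if (o->vptr == &kDerivedVTable) {
    return Derived_foo(o) + Derived_bar(o);
  }
  // Cold path: fall back to ordinary indirect calls.
  return o->vptr->foo(o) + o->vptr->bar(o);
}
```

Both versions return the same result for any object; the second one only changes what is compared on the hot path. Inside the guarded branch the dynamic type is known, which is what would let further virtual calls in `foo` and `bar` be devirtualized.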

https://github.com/llvm/llvm-project/pull/66825
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
