We are not going to create a new devirtualization framework from
scratch; the intent is to enhance the current speculative
devirtualization. The process does not need to parse native code in
libraries; it relies only on the existing lightweight symbol
resolution done by the LTO prelinker. C++ virtual dispatch is
expected to be translated to GIMPLE IR from C++ source. If a user
attempts to hand-craft it using embedded asm, that should be
considered UB with respect to the C++ ABI.
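
To make the idea concrete, here is a rough source-level analogy of what
speculative devirtualization does to a virtual call. It is only a sketch:
the class names (Shape, Square, Triangle) and helper functions are made up
for illustration, and the real transformation happens on GIMPLE, where, as
far as I understand it, the guard compares the function address loaded from
the vtable rather than using typeid.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical class hierarchy, used only for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Square : Shape {
    int sides() const override { return 4; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// What the programmer writes: an indirect call through the vtable.
int sides_virtual(const Shape &s) { return s.sides(); }

// Source-level analogy of the speculated form: guess a likely target,
// guard the guess, make a direct (inlinable) call on the fast path,
// and keep the indirect call as the fallback.
int sides_speculated(const Shape &s) {
    if (typeid(s) == typeid(Square)) {
        // Qualified call: bypasses virtual dispatch.
        return static_cast<const Square &>(s).Square::sides();
    }
    return s.sides();  // fallback: ordinary virtual dispatch
}

// Both forms must agree for every dynamic type.
void demo() {
    Square sq;
    Triangle tr;
    assert(sides_speculated(sq) == sides_virtual(sq));
    assert(sides_speculated(tr) == sides_virtual(tr));
}
```

If whole-program analysis could prove that Square is the only class that
can reach this call site, the guard and the fallback could be dropped and
only the direct call would remain.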

The compile time of whole-program analysis is not as bad as you might
think; it is realistically acceptable even when the code base is
very large. As far as I know, Google enables WPD when building Chrome,
although that build is based on LLVM.

Thanks,
Feng

________________________________________
From: Basile Starynkevitch <bas...@starynkevitch.net>
Sent: Friday, August 20, 2021 8:36 PM
To: Feng Xue OS
Cc: basile.starynkevi...@cea.fr; gcc@gcc.gnu.org
Subject: GCC [RFC] Whole Program Devirtualization

Hello Feng Xue OS


Your project is interesting, but ambitious.

I think the major points are:

whole program analysis. Static analysis tools like https://frama-c.com/ or 
https://github.com/bstarynk/bismon/ could be relevant. Projects like 
https://www.decoder-project.eu/ could be relevant. With cross-compilation, 
things are becoming harder.

abstract interpretation might be relevant (but difficult and costly to 
implement). See wikipedia.

size of the whole program which is analyzed.  If the entire program (including 
system libraries like libc) has e.g. fewer than ten thousand routines and fewer 
than a million GIMPLE instructions in total, it makes sense. But if the entire 
program is as large as the Linux kernel, or the GCC compiler, or the Firefox 
browser (all have many millions of lines of source code) you probably won't be 
able to do whole program devirtualization in a few years of human work.


computed gotos or labels as values (see 
https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html for more) are making 
this difficult. But they do exist, and probably could be hidden in GNU glibc or 
libstdc++ internal code.

asm statements are difficult. They usually appear inside your libc. How would 
you deal with them?

Can you afford a month of computer time to compile a large piece of software 
with your whole program devirtualizer? In most cases, not, but Pitrat's book 
Artificial Beings - the conscience of a conscious machine (ISBN 9781848211018) 
suggests cases where it might make sense (he is explaining a "compiler like 
system" which runs for a month of CPU time).

My recommendation would be to first code a simple GCC plugin as a proof of 
concept, which rejects programs which could not be realistically 
devirtualized, and otherwise stores somewhere (in some database perhaps) a 
representation of them. I worked 3 years full time on 
https://github.com/bstarynk/bismon/ to achieve a similar goal (and I don't 
claim to have succeeded, and I don't have any more funding). My guess is that 
some code could be useful to you (then contact me by email both at work 
basile.starynkevi...@cea.fr and at home bas...@starynkevitch.net ....)

The most important thing: limit your ambition at first. Write a document (at 
least an internal one) stating what you won't do.


Cheers

--
Basile Starynkevitch                  
<bas...@starynkevitch.net>
(only mine opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/

