We are not going to create a new devirtualization framework from scratch; we hope it will be an enhancement of the current speculative devirtualization. The process does not need to parse native code in libraries. It only relies on the existing lightweight symbol resolution done by the LTO prelinker. C++ virtual dispatch is expected to be translated from C++ source into GIMPLE IR. If a user attempts to hand-craft it using embedded asm, that should be considered undefined behavior with respect to the C++ ABI.
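For readers less familiar with the existing pass, speculative devirtualization turns a virtual call into a guarded direct call to the most likely target and keeps the indirect call as a fallback. The C++ below is only a hand-written, source-level analogue of that idea, assuming `Square` is the predicted dynamic type. The classes `Shape`, `Square`, and `Rect` and the functions `area_virtual` and `area_speculative` are hypothetical. GCC's actual transformation operates on GIMPLE, not on source, and does not use `typeid`.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical class hierarchy, used only for illustration.
struct Shape {
  virtual ~Shape() = default;
  virtual int area() const = 0;
};

struct Square : Shape {
  int side;
  explicit Square(int s) : side(s) {}
  int area() const override { return side * side; }
};

struct Rect : Shape {
  int w, h;
  Rect(int w_, int h_) : w(w_), h(h_) {}
  int area() const override { return w * h; }
};

// Plain virtual dispatch: an indirect call through the vtable.
int area_virtual(const Shape& s) { return s.area(); }

// Source-level analogue of speculative devirtualization: test for the
// predicted dynamic type, make a direct (inlinable) call to its override,
// and keep the ordinary virtual call as a fallback for every other type.
int area_speculative(const Shape& s) {
  if (typeid(s) == typeid(Square)) {
    // The qualified call Square::area bypasses the vtable.
    return static_cast<const Square&>(s).Square::area();
  }
  return s.area();  // fallback: ordinary virtual call
}
```

Because `Rect` also overrides `area`, the fallback branch is still needed here. Whole-program knowledge that no other override of `area` exists in the final link is what would allow the guard and the fallback to be dropped, leaving an unconditional direct call.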
The compile time of whole-program analysis is not as terrible as you think; it is realistically acceptable even when the code base is very large. As far as I know, Google enables WPD when building Chrome, although that is based on LLVM.

Thanks,
Feng

________________________________________
From: Basile Starynkevitch <bas...@starynkevitch.net>
Sent: Friday, August 20, 2021 8:36 PM
To: Feng Xue OS
Cc: basile.starynkevi...@cea.fr; gcc@gcc.gnu.org
Subject: GCC [RFC] Whole Program Devirtualization

Hello Feng Xue OS,

Your project is interesting, but ambitious.

I think the major points are:

- Whole-program analysis. Static analysis tools like https://frama-c.com/ or https://github.com/bstarynk/bismon/ could be relevant. Projects like https://www.decoder-project.eu/ could be relevant. With cross-compilation, things become harder. Abstract interpretation might be relevant (but difficult and costly to implement); see Wikipedia.

- The size of the whole program being analyzed. If the entire program (including system libraries like libc) has, e.g., fewer than ten thousand routines and fewer than a million GIMPLE instructions in total, it makes sense. But if the entire program is as large as the Linux kernel, the GCC compiler, or the Firefox browser (all of which have many millions of lines of source code), you probably won't be able to do whole-program devirtualization in a few years of human work.

- Computed gotos, or labels as values (see https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html for more), make this difficult. But they do exist, and could probably be hidden in GNU glibc or libstdc++ internal code.

- asm statements are difficult. They usually appear inside your libc. How would you deal with them?

- Can you afford a month of computer time to compile a large piece of software with your whole-program devirtualizer?
In most cases, no, but Pitrat's book Artificial Beings: The Conscience of a Conscious Machine (ISBN 9781848211018) suggests cases where it might make sense (he describes a "compiler-like system" that runs for a month of CPU time).

My recommendation would be to first code a simple GCC plugin as a proof of concept, which rejects programs that could not realistically be devirtualized and otherwise stores a representation of them somewhere (perhaps in some database).

I worked 3 years full time on https://github.com/bstarynk/bismon/ to achieve a similar goal (and I don't claim to have succeeded, and I no longer have any funding). My guess is that some of that code could be useful to you (if so, contact me by email both at work, basile.starynkevi...@cea.fr, and at home, bas...@starynkevitch.net ...).

The most important thing: limit your ambition at first. Write a document (at least an internal one) stating what you won't do.

Cheers

--
Basile Starynkevitch <bas...@starynkevitch.net>
(my own opinions only)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/