I've collected some data running an x86_64 SPECINT2006 guest on qemu-system-x86_64. Indirect branches and calls account for an average of 16.49% of all code cache exits across the measured workloads, with a high of 33.2% in 464.h264ref.
Every code cache exit is followed by a TB lookup and a code cache re-entry, which together add up to a non-trivial number of instructions.

https://docs.google.com/spreadsheets/d/1sR7XFpVn4qCAJuU4oTOMIezvEo1WTE7riRPhT6xxUtg/edit?usp=sharing

Thanks,
Xin

On Thu, Sep 4, 2014 at 10:36 AM, Xin Tong <trent.t...@gmail.com> wrote:
> Hi
>
> I would like to implement a well-known indirect branch optimization named
> Polymorphic Inline Caching (PIC) in QEMU. PIC relies on software
> speculation on the likely target of the indirect branch to speed up its
> dispatch.
>
> Currently, QEMU generates an EOB (end of block) after indirect branches and
> relies on the runtime to find the next TB. This results in code cache
> exit/re-entry and TB lookup, which can take up a non-trivial amount of time.
>
> PIC mitigates this by using compares and jumps for a few most likely
> targets to reduce the number of code cache exits as well as TB lookups. An
> example of PIC is shown below.
>
> *without PIC for indirect branch:*
> update IA;
> goto code-cache-epilogue;
> lookup TB;
> goto code-cache-prologue;
>
> *with PIC for indirect branch:*
> update IA;
> compare IA with likely target-#1;
> jump to TB-target-#1 if match;
> compare IA with likely target-#2;
> jump to TB-target-#2 if match;
> compare IA with likely target-#3;
> jump to TB-target-#3 if match;
> goto code-cache-epilogue;
> lookup TB;
> goto code-cache-prologue;
>
> I think target-X/translate.c as well as tcg/X/ need to be changed here.
> And a new TCG opcode needs to be added.
>
> Any comments on how to get started with this are appreciated.
>
> Thanks,
> Xin
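
To make the dispatch sequence in the quoted proposal concrete, here is a minimal, standalone C model of a per-site inline cache. It is only a sketch and not QEMU code: `ToyTB`, `PICSite`, `slow_tb_lookup` and `pic_dispatch` are hypothetical names invented for illustration, the cache is filled first-come-first-served, and it keys on the guest PC alone. A real implementation would emit the compares as generated code, would likely need to match on more CPU state than the PC, and would have to drop cached entries when a TB is invalidated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC_ENTRIES 3   /* three compares, as in the quoted example */
#define NUM_TBS     8   /* capacity of the toy TB table */

/* Hypothetical stand-in for a translated block; not QEMU's TranslationBlock. */
typedef struct {
    uint64_t guest_pc;
} ToyTB;

/* One inline cache per indirect-branch site (hypothetical layout). */
typedef struct {
    uint64_t target_pc[PIC_ENTRIES];
    ToyTB *tb[PIC_ENTRIES];
    unsigned used;
    unsigned long hits, misses;
} PICSite;

static ToyTB toy_tbs[NUM_TBS];
static unsigned toy_tb_count;

/* Slow path: models "code-cache-epilogue; lookup TB; code-cache-prologue".
 * Finds the block for pc, or "translates" a new one if none exists. */
static ToyTB *slow_tb_lookup(uint64_t pc)
{
    for (unsigned i = 0; i < toy_tb_count; i++) {
        if (toy_tbs[i].guest_pc == pc) {
            return &toy_tbs[i];
        }
    }
    assert(toy_tb_count < NUM_TBS);
    toy_tbs[toy_tb_count].guest_pc = pc;
    return &toy_tbs[toy_tb_count++];
}

/* Fast path: compare the computed target against the cached likely targets
 * and return the matching block directly; otherwise take the slow path and
 * remember the target if a slot is still free. */
static ToyTB *pic_dispatch(PICSite *site, uint64_t pc)
{
    assert(site != NULL);
    for (unsigned i = 0; i < site->used; i++) {
        if (site->target_pc[i] == pc) {
            site->hits++;
            return site->tb[i];
        }
    }
    site->misses++;
    ToyTB *tb = slow_tb_lookup(pc);
    if (site->used < PIC_ENTRIES) {
        site->target_pc[site->used] = pc;
        site->tb[site->used] = tb;
        site->used++;
    }
    return tb;
}
```

In this model a hit corresponds to staying inside the code cache, and a miss corresponds to the exit, lookup and re-entry sequence that the measurements above are counting. A fourth distinct target never enters the cache here; a replacement policy is a separate design choice that the sketch leaves out.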