I've collected some data running an x86_64 SPECint2006 guest on
qemu-system-x86_64.  Indirect branches and calls account for an average
of 16.49% of all code cache exits across the measured workloads, with a
high of 33.2% in 464.h264ref.

Every code cache exit is followed by a TB lookup and a code cache
re-entry, which together cost a non-trivial number of instructions.

https://docs.google.com/spreadsheets/d/1sR7XFpVn4qCAJuU4oTOMIezvEo1WTE7riRPhT6xxUtg/edit?usp=sharing
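
To make the exit cost concrete, here is a toy model in C of the dispatch
path for a single indirect branch. It is only a sketch under stated
assumptions, not QEMU code: InlineCache, DispatchStats, dispatch(),
run_trace() and the 3-slot size are all made-up names and values, and the
"learn a target on a miss while a slot is free" policy is one possible
choice among several. It counts how many executions of the branch leave
the code cache with and without the inline compares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIC_SLOTS 3  /* assumed number of inlined compare-and-jump targets */

/* Hypothetical inline cache attached to one indirect branch site. */
typedef struct {
    uint64_t target[PIC_SLOTS];  /* guest addresses predicted so far */
    int used;                    /* number of slots filled */
} InlineCache;

typedef struct {
    unsigned long inline_hits;   /* stayed inside the code cache */
    unsigned long cache_exits;   /* epilogue + TB lookup + prologue */
} DispatchStats;

/* Model one execution of the indirect branch with guest target `ia`. */
static void dispatch(InlineCache *pic, DispatchStats *st, uint64_t ia)
{
    assert(pic->used >= 0 && pic->used <= PIC_SLOTS);

    for (int i = 0; i < pic->used; i++) {
        if (pic->target[i] == ia) {  /* compare IA with likely target #i */
            st->inline_hits++;       /* would jump straight to that TB */
            return;
        }
    }

    /* No inline compare matched: leave the code cache and look up the TB. */
    st->cache_exits++;
    if (pic->used < PIC_SLOTS) {
        pic->target[pic->used++] = ia;  /* learn the target if a slot is free */
    }
}

/* Replay a trace of branch targets; use_pic == 0 models the current scheme. */
static DispatchStats run_trace(const uint64_t *trace, size_t n, int use_pic)
{
    InlineCache pic = { {0}, 0 };
    DispatchStats st = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        if (use_pic) {
            dispatch(&pic, &st, trace[i]);
        } else {
            st.cache_exits++;  /* every indirect branch ends the block */
        }
    }
    return st;
}
```

Calling run_trace() on a recorded sequence of branch targets once with
use_pic = 0 and once with use_pic = 1 gives the two exit counts to
compare. A branch with more distinct hot targets than PIC_SLOTS keeps
missing on the extra ones, so the benefit depends on how skewed the
target distribution is.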

Thanks,
Xin


On Thu, Sep 4, 2014 at 10:36 AM, Xin Tong <trent.t...@gmail.com> wrote:

> Hi
>
> I would like to implement a well-known indirect branch optimization named
> Polymorphic Inline Caching (PIC) in QEMU. PIC relies on software
> speculation on the likely target of the indirect branch to speed up its
> dispatch.
>
> Currently, QEMU generates an EOB (end of block) after indirect branches and
> relies on the runtime to find the next TB. This results in a code cache
> exit/re-entry and a TB lookup, which can take a non-trivial amount of time.
>
> PIC mitigates this by emitting compares and jumps for the few most likely
> targets, reducing the number of code cache exits as well as TB lookups. An
> example of PIC is shown below.
>
> *without PIC for indirect branch*
> update IA;
> goto code-cache-epilogue;
> lookup TB;
> goto code-cache-prologue;
>
> *with PIC for indirect branch:*
> update IA;
> compare IA with likely target-#1;
> jump to TB-target-#1 if match;
> compare IA with likely target-#2;
> jump to TB-target-#2 if match;
> compare IA with likely target-#3;
> jump to TB-target-#3 if match;
> goto code-cache-epilogue;
> lookup TB;
> goto code-cache-prologue;
>
> I think target-X/translate.c as well as tcg/X/ need to be changed here,
> and a new TCG opcode needs to be added.
>
> Any comments on how to get started with this are appreciated.
>
> Thanks,
> Xin
>
>
>
