https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96475
Bug ID: 96475
Summary: direct threaded interpreter with computed gotos generates suboptimal dispatch loop
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: npiggin at gmail dot com
CC: segher at gcc dot gnu.org
Target Milestone: ---
Target: powerpc64le-linux-gnu

Created attachment 48999
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48999&action=edit
test case

With -O2, the code generated for run_program_goto in the attached test case uses a single central indirect branch for dispatch: each handler branches back to the central dispatcher instead of jumping directly to the next handler. Direct threaded code, with an indirect branch from each handler to the next, is faster on a POWER9 when there are no branch mispredictions because it executes fewer branches. It should also generally predict better, since each handler has its own indirect branch.
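
For readers without the attachment, the source-level pattern in question can be sketched roughly as follows. This is not the attached test case: the opcode set, the accumulator semantics, and the name run_goto_sketch are all hypothetical, chosen only to show a direct threaded loop written with the GNU C labels-as-values extension, where every handler ends in its own "goto *".

```c
/* Hypothetical opcodes for illustration; not taken from attachment 48999. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/*
 * Direct threaded interpreter sketch using GNU C labels as values.
 * Assumes the program is well formed and terminated by OP_HALT.
 */
long run_goto_sketch(const unsigned char *pc)
{
    /* Table mapping each opcode to the address of its handler label. */
    static void *const handlers[] = {
        [OP_INC]    = &&do_inc,
        [OP_DEC]    = &&do_dec,
        [OP_DOUBLE] = &&do_double,
        [OP_HALT]   = &&do_halt,
    };
    long acc = 0;

/* In the source, every handler ends with its own indirect branch. */
#define DISPATCH() goto *handlers[*pc++]

    DISPATCH();
do_inc:
    acc += 1;
    DISPATCH();
do_dec:
    acc -= 1;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_halt:
    return acc;
#undef DISPATCH
}
```

The source asks for one indirect branch per handler. Whether the generated code keeps them separate or merges them into one shared dispatch branch is up to the compiler, and that is the behaviour this report describes for powerpc64le at -O2.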