Avoids generating TCG code for guest code tracing events on vCPUs that are not dynamically tracing those events.

Currently, events with the 'tcg' property always generate TCG code to trace
the event, and their dynamic tracing state is checked at guest code execution
time.

This series adds a performance optimization where TCG code for events with the
'tcg' and 'vcpu' properties is not generated if the event is dynamically
disabled. This optimization raises two issues:

* An event can be dynamically disabled/enabled after the corresponding TCG code
  has been generated (i.e., a new TB with the corresponding code should be
  used).

* Each vCPU can have a different dynamic state for the same event (e.g., when
  tracing the memory accesses of only one process pinned to a vCPU).

To handle both issues, this series replicates the shared physical TB cache,
creating a separate physical TB cache for every combination of event states
(for events with the 'vcpu' and 'tcg' properties). All vCPUs tracing the same
events then use the same physical TB cache.

Sharing physical TBs makes this very space efficient (only the physical TB
caches, simple arrays of pointers, are replicated). Sharing physical TB caches
maximizes TB reuse across vCPUs whenever possible, and makes dynamic event state
changes more efficient (the vCPU simply uses a different TB array).

The physical TB cache array is indexed with the vCPU's trace event state
bitmask. This is simpler and more efficient than emitting TCG code to check if
an event needs tracing; in that case we would still have to either move the
tracing call code to a cold path (making tracing performance worse), or leave it
inlined (making non-tracing performance worse).

It is also more efficient than eliding TCG code only when *zero* vCPUs are
tracing an event, since enabling the event on a single vCPU would then impact
the performance of all other vCPUs that are not tracing that event.

Signed-off-by: Lluís Vilanova <vilan...@ac.upc.edu>
---

Changes in v2
=============

* Fix bitmap copy in cpu_tb_cache_set_apply().
* Split generated code re-alignment into a separate patch [Daniel P. Berrange].

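As a rough illustration of the indexing scheme described in the cover text (not
the code in this series), the sketch below selects a physical TB cache from a
vCPU's trace event state bitmask. Every name here (NUM_TCG_VCPU_EVENTS,
TB_CACHE_SIZE, the TB and VCPU structs, vcpu_tb_cache(), vcpu_set_event()) is
hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
#define NUM_TCG_VCPU_EVENTS 2                       /* events with 'tcg' + 'vcpu' */
#define NUM_TB_CACHES (1u << NUM_TCG_VCPU_EVENTS)   /* one per state combination */
#define TB_CACHE_SIZE 8

/* Stand-in for a translation block. */
typedef struct TB {
    unsigned long pc;
} TB;

/* Stand-in for a vCPU: one bit per event that it is dynamically tracing. */
typedef struct VCPU {
    unsigned trace_state;
} VCPU;

/* Replicated physical TB caches: plain arrays of TB pointers. */
static TB *tb_caches[NUM_TB_CACHES][TB_CACHE_SIZE];

/* Select the TB cache that matches this vCPU's event state bitmask. */
static TB **vcpu_tb_cache(const VCPU *cpu)
{
    assert(cpu->trace_state < NUM_TB_CACHES);
    return tb_caches[cpu->trace_state];
}

/*
 * Changing an event's dynamic state only flips a bit; the next lookup
 * then lands in a different TB array, with no generated code to patch.
 */
static void vcpu_set_event(VCPU *cpu, unsigned event, int enabled)
{
    assert(event < NUM_TCG_VCPU_EVENTS);
    if (enabled) {
        cpu->trace_state |= 1u << event;
    } else {
        cpu->trace_state &= ~(1u << event);
    }
}
```

In this sketch, two vCPUs with the same bitmask get the same array from
vcpu_tb_cache(), and enabling an event on one of them makes it diverge to
another array without touching the other vCPU.
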
Lluís Vilanova (5):
      exec: [tcg] Refactor flush of per-CPU virtual TB cache
      exec: [tcg] Use multiple physical TB caches
      exec: [tcg] Switch physical TB cache based on vCPU tracing state
      trace: [tcg] Do not generate TCG code to trace dinamically-disabled events
      trace: [tcg,trivial] Re-align generated code


 cpu-exec.c                               |   11 ++++
 cputlb.c                                 |    2 -
 include/exec/exec-all.h                  |   12 ++++
 include/exec/tb-context.h                |    2 -
 include/qom/cpu.h                        |    4 +
 qom/cpu.c                                |    1
 scripts/tracetool/backend/dtrace.py      |    2 -
 scripts/tracetool/backend/ftrace.py      |   20 ++++---
 scripts/tracetool/backend/log.py         |   16 +++---
 scripts/tracetool/backend/simple.py      |    2 -
 scripts/tracetool/backend/syslog.py      |    6 +-
 scripts/tracetool/backend/ust.py         |    2 -
 scripts/tracetool/format/h.py            |   23 ++++++--
 scripts/tracetool/format/tcg_h.py        |   20 ++++++-
 scripts/tracetool/format/tcg_helper_c.py |    3 +
 trace/control-target.c                   |    2 +
 trace/control.h                          |    3 +
 translate-all.c                          |   83 ++++++++++++++++++++++++++----
 translate-all.h                          |   43 ++++++++++++++++
 translate-all.inc.h                      |   13 +++++
 20 files changed, 221 insertions(+), 49 deletions(-)
 create mode 100644 translate-all.inc.h


To: qemu-devel@nongnu.org
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Eduardo Habkost <ehabk...@redhat.com>
Cc: Eric Blake <ebl...@redhat.com>