Re: [PATCH] plugins: optimize cpu_index code generation

2024-11-28 Thread Pierrick Bouvier
On 11/27/24 09:53, Richard Henderson wrote:
> On 11/26/24 13:02, Pierrick Bouvier wrote:
>> @@ -266,6 +266,19 @@ static void qemu_plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
>>  assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
>>  qemu_rec_mutex_lock(&plugin.lock);
>> +
>> +

Re: [PATCH] plugins: optimize cpu_index code generation

2024-11-27 Thread Pierrick Bouvier
On 11/27/24 10:27, Richard Henderson wrote:
> On 11/27/24 11:57, Pierrick Bouvier wrote:
>> I noticed that it was redundant (for user-mode at least), but it seemed too implicit to rely on this. As well, I didn't observe such a flush in system-mode, does it work the same as user-mode (regarding the

Re: [PATCH] plugins: optimize cpu_index code generation

2024-11-27 Thread Richard Henderson
On 11/27/24 11:57, Pierrick Bouvier wrote:
> I noticed that it was redundant (for user-mode at least), but it seemed too implicit to rely on this. As well, I didn't observe such a flush in system-mode, does it work the same as user-mode (regarding the CF_PARALLEL flag)?
Yes, we set CF_PARALLEL f

Re: [PATCH] plugins: optimize cpu_index code generation

2024-11-27 Thread Pierrick Bouvier
Hi Richard,
On 11/27/24 09:53, Richard Henderson wrote:
> On 11/26/24 13:02, Pierrick Bouvier wrote:
>> @@ -266,6 +266,19 @@ static void qemu_plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
>>  assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
>>  qemu_rec_mutex_lock(&plugin

Re: [PATCH] plugins: optimize cpu_index code generation

2024-11-27 Thread Richard Henderson
On 11/26/24 13:02, Pierrick Bouvier wrote:
> @@ -266,6 +266,19 @@ static void qemu_plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
>  assert(cpu->cpu_index != UNASSIGNED_CPU_INDEX);
>  qemu_rec_mutex_lock(&plugin.lock);
> +
> +/*
> + * We want to flush tb when a second c

Re: [PATCH] plugins: optimize cpu_index code generation

2024-11-26 Thread Pierrick Bouvier
On 11/26/24 11:02, Pierrick Bouvier wrote:
> When running with a single vcpu, we can return a constant instead of a load when accessing cpu_index. A side effect is that all tcg operations using it are optimized, most notably scoreboard access. When running a simple loop in user-mode, the speedup is

[PATCH] plugins: optimize cpu_index code generation

2024-11-26 Thread Pierrick Bouvier
When running with a single vcpu, we can return a constant instead of a load when accessing cpu_index. A side effect is that all tcg operations using it are optimized, most notably scoreboard access. When running a simple loop in user-mode, the speedup is around 20%.
Signed-off-by: Pierrick Bouvier
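For readers skimming the thread, the idea in the commit message can be sketched outside of QEMU. The toy model below is an illustration only, not the patch itself: ToyCPU, ToyOp, toy_gen_cpu_index() and toy_exec() are hypothetical names, and no real TCG or plugin API is used. It shows a translation-time choice between emitting a constant (one vcpu) and emitting a load from CPU state (several vcpus).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-vcpu state; this is not QEMU's CPUState. */
typedef struct {
    int cpu_index;
} ToyCPU;

typedef enum { OP_CONST, OP_LOAD } ToyOpKind;

/* One "generated" operation that yields a cpu_index at run time. */
typedef struct {
    ToyOpKind kind;
    int value; /* only meaningful for OP_CONST */
} ToyOp;

/*
 * Translation time: with a single vcpu the index is known up front, so a
 * constant can be emitted. With more vcpus the same code may run on any of
 * them, so a load from the executing vcpu's state is emitted instead.
 */
static ToyOp toy_gen_cpu_index(int num_vcpus, const ToyCPU *translating_cpu)
{
    ToyOp op;

    assert(translating_cpu != NULL);
    if (num_vcpus == 1) {
        op.kind = OP_CONST;
        op.value = translating_cpu->cpu_index;
    } else {
        op.kind = OP_LOAD;
        op.value = 0; /* unused for loads */
    }
    return op;
}

/* Run time: execute the generated operation on a given vcpu. */
static int toy_exec(const ToyOp *op, const ToyCPU *cpu)
{
    assert(op != NULL && cpu != NULL);
    return op->kind == OP_CONST ? op->value : cpu->cpu_index;
}
```

In this model, an OP_CONST generated while only one vcpu existed keeps returning that vcpu's index even if it is later executed on another vcpu. That is why code specialized for the single-vcpu case has to be discarded and regenerated once a second vcpu exists, which is the flush the replies in this thread discuss.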