Reviewed-by: Glenn Miles <mil...@linux.ibm.com>
On Mon, 2025-05-12 at 13:10 +1000, Nicholas Piggin wrote:
> From: Glenn Miles <mil...@linux.ibm.com>
>
> The current XIVE algorithm for finding a matching group vCPU
> target always uses the first vCPU found. And, since it always
> starts the search with thread 0 of a core, thread 0 is almost
> always used to handle group interrupts. This can lead to additional
> interrupt latency and poor performance for interrupt-intensive
> workloads.
>
> Change this to use a simple round-robin algorithm for deciding which
> thread number to use when starting a search, which distributes
> group interrupt handling more evenly across threads.
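
For readers following along, the rotating start point described above can be modelled outside QEMU roughly as follows. This is only a sketch under stated assumptions: RRCursor, match_fn, rr_pick and match_all are hypothetical names invented for illustration, every core is assumed to have the same number of threads, and the cursor is advanced from the wrapped (core, thread) index of the chosen thread, which is a choice of this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cursor; plays the role of the two static ints in the patch. */
typedef struct RRCursor {
    int next_core;    /* core offset used to start the next search */
    int next_thread;  /* thread offset used to start the next search */
} RRCursor;

/* Hypothetical predicate: does thread `thread` of core `core` match? */
typedef bool (*match_fn)(int core, int thread);

/* Example predicate for demonstration: every thread matches. */
static bool match_all(int core, int thread)
{
    (void)core;
    (void)thread;
    return true;
}

/*
 * Visit every (core, thread) pair once, rotating both indices by the
 * cursor, and return the first match as core * nr_threads + thread,
 * or -1 if nothing matches. On a hit, move the cursor just past the
 * chosen thread so the next call starts its search elsewhere.
 */
static int rr_pick(RRCursor *cur, int nr_cores, int nr_threads,
                   match_fn matches)
{
    assert(nr_cores > 0 && nr_threads > 0);

    for (int i = 0; i < nr_cores; i++) {
        int core = (i + cur->next_core) % nr_cores;

        for (int j = 0; j < nr_threads; j++) {
            int thread = (j + cur->next_thread) % nr_threads;

            if (!matches(core, thread)) {
                continue;
            }

            /* Advance the cursor from the wrapped indices. */
            cur->next_core = core;
            cur->next_thread = thread + 1;
            if (cur->next_thread >= nr_threads) {
                cur->next_thread = 0;
                cur->next_core = (core + 1) % nr_cores;
            }
            return core * nr_threads + thread;
        }
    }
    return -1;
}
```

With two cores of two threads each and match_all as the predicate, successive calls starting from a zeroed cursor pick threads 0, 1, 2, 3 and then wrap back to 0, instead of always landing on thread 0.
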
>
> [npiggin: Also round-robin among threads, not just cores]
> Signed-off-by: Glenn Miles <mil...@linux.ibm.com>
> ---
> hw/intc/pnv_xive2.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
> index 72cdf0f20c..d7ca97ecbb 100644
> --- a/hw/intc/pnv_xive2.c
> +++ b/hw/intc/pnv_xive2.c
> @@ -643,13 +643,18 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
> int i, j;
> bool gen1_tima_os =
> xive->cq_regs[CQ_XIVE_CFG >> 3] & CQ_XIVE_CFG_GEN1_TIMA_OS;
> + static int next_start_core;
> + static int next_start_thread;
> + int start_core = next_start_core;
> + int start_thread = next_start_thread;
>
> for (i = 0; i < chip->nr_cores; i++) {
> - PnvCore *pc = chip->cores[i];
> + PnvCore *pc = chip->cores[(i + start_core) % chip->nr_cores];
> CPUCore *cc = CPU_CORE(pc);
>
> for (j = 0; j < cc->nr_threads; j++) {
> - PowerPCCPU *cpu = pc->threads[j];
> + /* Start search for match with different thread each call */
> + PowerPCCPU *cpu = pc->threads[(j + start_thread) % cc->nr_threads];
> XiveTCTX *tctx;
> int ring;
>
> @@ -694,6 +699,15 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
> if (!match->tctx) {
> match->ring = ring;
> match->tctx = tctx;
> +
> + next_start_thread = j + start_thread + 1;
> + if (next_start_thread >= cc->nr_threads) {
> + next_start_thread = 0;
> + next_start_core = i + start_core + 1;
> + if (next_start_core >= chip->nr_cores) {
> + next_start_core = 0;
> + }
> + }
> }
> count++;
> }