Hi drew,

Thanks for your reply. I agree this patch remains insufficient. Regardless, Guest Interrupt Files represent a new and critical resource type that upper layers currently lack awareness of, and significant work is still needed to fully integrate them.
> From: "Andrew Jones" <ajo...@ventanamicro.com>
> Date: Thu, Jul 17, 2025, 16:02
> Subject: Re: [PATCH] target/riscv/kvm: Introduce simple handler for VS-file allocation failure
> To: "BillXiang" <xiangwench...@lanxincomputing.com>
> Cc: <pal...@dabbelt.com>, <alistair.fran...@wdc.com>, <liwei1...@gmail.com>, <dbarb...@ventanamicro.com>, <zhiwei_...@linux.alibaba.com>, <qemu-ri...@nongnu.org>, <qemu-devel@nongnu.org>
>
> On Wed, Jul 16, 2025 at 03:47:37PM +0800, BillXiang wrote:
> > Consider a system with 8 harts, where each hart supports 5
> > Guest Interrupt Files (GIFs), yielding 40 total GIFs.
> > If we launch a QEMU guest with over 5 vCPUs using
> > "-M virt,aia='aplic-imsic' -accel kvm,riscv-aia=hwaccel" – which
> > relies solely on VS-files (not SW-files) for higher performance – the
> > guest requires more than 5 GIFs. However, the current Linux scheduler
> > lacks GIF awareness, potentially scheduling >5 vCPUs to a single hart.
> > This triggers VS-file allocation failure, and since no handler exists
> > for this error, the QEMU guest becomes corrupted.
>
> What do you mean by "become corrupted"? Shouldn't the VM just stop after
> the vcpu dumps register state?
>
> >
> > To address this, we introduce this simple handler by rescheduling vCPU
> > to alternative harts when VS-file allocation fails on the current hart.
> >
> > Signed-off-by: BillXiang <xiangwench...@lanxincomputing.com>
> > ---
> >  target/riscv/kvm/kvm-cpu.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
> > index 5c19062c19..7cf258604f 100644
> > --- a/target/riscv/kvm/kvm-cpu.c
> > +++ b/target/riscv/kvm/kvm-cpu.c
> > @@ -1706,6 +1706,9 @@ static bool kvm_riscv_handle_debug(CPUState *cs)
> >  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> >  {
> >      int ret = 0;
> > +    uint64_t code;
> > +    cpu_set_t set;
> > +    long cpus;
> >
> >      switch (run->exit_reason) {
> >      case KVM_EXIT_RISCV_SBI:
> >          ret = kvm_riscv_handle_sbi(cs, run);
> > @@ -1718,6 +1721,18 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
> >              ret = EXCP_DEBUG;
> >          }
> >          break;
> > +    case KVM_EXIT_FAIL_ENTRY:
> > +        code = run->fail_entry.hardware_entry_failure_reason;
> > +        if (code == CSR_HSTATUS) {
> > +            // Schedule vcpu to next hart upon VS-file
> > +            // allocation failure on current hart.
> > +            cpus = sysconf(_SC_NPROCESSORS_ONLN);
> > +            CPU_ZERO(&set);
> > +            CPU_SET((run->fail_entry.cpu+1)%cpus, &set);
> > +            ret = sched_setaffinity(0, sizeof(set), &set);
>
> If other guests have already consumed all the VS-files on the selected
> hart then this will fail again and the next hart will be tried and if all
> VS-files of the system are already consumed then we'll just go around and
> around.
>
> Other than that problem, this isn't the right approach because QEMU should
> not be pinning vcpus - that's a higher level virt management layer's job
> since it's a policy.
>
> A better solution to this is to teach KVM to track free VS-files and then
> migrate (but not pin) vcpus to harts with free VS-files, rather than
> immediately fail.
>
> But, if all guests are configured to only use VS-files, then upper layers
> of the virt stack will still need to be aware that they can never schedule
> more vcpus than supported by the number of total VS-files.
> And, if upper
> layers are already involved in the scheduling, then pinning is also an
> option to avoid this problem. Indeed pinning is better for the failure
> case of over scheduling, since over scheduling with the KVM vcpu migration
> approach can result in a VM launched earlier to be killed, whereas with
> the upper layer pinning approach, the last guest launched will fail before
> it runs.
>
> Thanks,
> drew
>
> > +            break;
> > +        }
> > +        /* FALLTHRU */
> >      default:
> >          qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> >                        __func__, run->exit_reason);
> > --
> > 2.46.2.windows.1
> >