On 05/16/2017 02:55 PM, Laurent Vivier wrote: > On 16/05/2017 14:50, Cédric Le Goater wrote: >> On 05/16/2017 02:03 PM, Laurent Vivier wrote: >>> On 26/04/2017 09:00, David Gibson wrote: >>>> From: Cédric Le Goater <c...@kaod.org> >>>> >>>> Today, all the ICPs are created before the CPUs, stored in an array >>>> under the sPAPR machine and linked to the CPU when the core threads >>>> are realized. This modeling brings some complexity when a lookup in >>>> the array is required and it can be simplified by allocating the ICPs >>>> when the CPUs are. >>>> >>>> This is the purpose of this proposal which introduces a new 'icp_type' >>>> field under the machine and creates the ICP objects of the right type >>>> (KVM or not) before the PowerPCCPU object are. >>>> >>>> This change allows more cleanups : the removal of the icps array under >>>> the sPAPR machine and the removal of the xics_get_cpu_index_by_dt_id() >>>> helper. >>>> >>>> Signed-off-by: Cédric Le Goater <c...@kaod.org> >>>> Reviewed-by: David Gibson <da...@gibson.dropbear.id.au> >>>> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> >>>> --- >>>> hw/intc/xics.c | 11 ----------- >>>> hw/ppc/spapr.c | 47 >>>> ++++++++++++++--------------------------------- >>>> hw/ppc/spapr_cpu_core.c | 18 ++++++++++++++---- >>>> include/hw/ppc/spapr.h | 2 +- >>>> include/hw/ppc/xics.h | 2 -- >>>> 5 files changed, 29 insertions(+), 51 deletions(-) >>>> >>> >>> This commit breaks CPU re-hotplugging with KVM >>> >>> the sequence "device_add, device_del, device_add" brings to the >>> following error message: >>> >>> Unable to connect CPUx to kernel XICS: Device or resource busy >>> >>> It comes from icp_kvm_cpu_setup(): >>> >>> ... >>> ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd, >>> kvm_arch_vcpu_id(cs)); >>> if (ret < 0) { >>> error_report("Unable to connect CPU%ld to kernel XICS: %s", >>> kvm_arch_vcpu_id(cs), strerror(errno)); >>> exit(1); >>> } >>> .. >>> >>> It should be protected by cap_irq_xics_enabled: >>> >>> ... >>> /* >>> * If we are reusing a parked vCPU fd corresponding to the CPU >>> * which was hot-removed earlier we don't have to renable >>> * KVM_CAP_IRQ_XICS capability again. >>> */ >>> if (icp->cap_irq_xics_enabled) { >>> return; >>> } >>> >>> ... >>> ret = kvm_vcpu_enable_cap(...); >>> ... >>> icp->cap_irq_xics_enabled = true; >>> ... >>> >>> But since this commit, "icp" is a new object on each call: >>> >>> spapr_cpu_core_realize_child() >>> ... >>> obj = object_new(spapr->icp_type); >>> ... >>> xics_cpu_setup(XICS_FABRIC(spapr), cpu, ICP(obj)); >>> ... >>> icpc->cpu_setup(icp, cpu); -> icp_kvm_cpu_setup() >>> ... >>> ... >>> >>> and "cap_irq_xics_enabled" is reinitialized. >>> >>> Any idea how to fix that? >> >> it seems that a cleanup is not done in the kernel. We are missing >> a way to call kvmppc_xics_free_icp() from QEMU. Today the only >> way is to destroy the vcpu. > > The commit introducing this hack, for reference: > > commit a45863bda90daa8ec39e5a312b9734fd4665b016 > Author: Bharata B Rao <bhar...@linux.vnet.ibm.com> > Date: Thu Jul 2 16:23:20 2015 +1000 > > xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled > > When supporting CPU hot removal by parking the vCPU fd and reusing > it during hotplug again, there can be cases where we try to reenable > KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled. > Introduce a boolean member in ICPState to track this and don't > reenable the CAP if it was already enabled earlier. > > Re-enabling this CAP should ideally work, but currently it results in > kernel trying to create and associate ICP with this vCPU and that > fails since there is already an ICP associated with it. Hence this > patch is needed to work around this problem in the kernel. > > This change allows CPU hot removal to work for sPAPR. > > Signed-off-by: Bharata B Rao <bhar...@linux.vnet.ibm.com> > Reviewed-by: David Gibson <da...@gibson.dropbear.id.au> > Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> > Signed-off-by: Alexander Graf <ag...@suse.de>
OK. Greg is looking at re-adding the ICPState array because of a migration issue with older machines. We might need to do so unconditionally ... But for that specific issue, I think it would have been better to clean up the kernel state. Is that possible ? Thanks, C. >> Else we need to reintroduce the array of icps (again) to keep some >> xics state ... but that just sucks :/ Let me think about it. >> > > Thanks, > Laurent >> C. >> >