On 11.01.2021 09:45, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhi...@citrix.com>
>> Sent: 09 January 2021 00:56
>> To: p...@xen.org; xen-devel@lists.xenproject.org
>> Cc: w...@xen.org; i...@xenproject.org; anthony.per...@citrix.com;
>> andrew.coop...@citrix.com; george.dun...@citrix.com; jbeul...@suse.com;
>> jul...@xen.org; sstabell...@kernel.org; roger....@citrix.com
>> Subject: Re: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
>>
>> On 08/01/2021 08:32, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Igor Druzhinin <igor.druzhi...@citrix.com>
>>>> Sent: 08 January 2021 00:47
>>>> To: xen-devel@lists.xenproject.org
>>>> Cc: p...@xen.org; w...@xen.org; i...@xenproject.org; anthony.per...@citrix.com;
>>>> andrew.coop...@citrix.com; george.dun...@citrix.com; jbeul...@suse.com;
>>>> jul...@xen.org; sstabell...@kernel.org; roger....@citrix.com;
>>>> Igor Druzhinin <igor.druzhi...@citrix.com>
>>>> Subject: [PATCH 1/2] viridian: remove implicit limit of 64 VPs per partition
>>>>
>>>> TLFS 7.8.1 stipulates that "a virtual processor index must be less than
>>>> the maximum number of virtual processors per partition" that "can be
>>>> obtained through CPUID leaf 0x40000005". Furthermore, "Requirements for
>>>> Implementing the Microsoft Hypervisor Interface" defines that starting
>>>> from Windows Server 2012, which allowed more than 64 CPUs to be brought
>>>> up, this leaf can now contain the value -1, meaning the hypervisor has
>>>> no restriction, while 0 (which we currently expose) means the default
>>>> restriction is still present.
>>>>
>>>> Along with the previous changes exposing ExProcessorMasks, this allows a
>>>> recent Windows VM with the Viridian extension enabled to have more than
>>>> 64 vCPUs without going into an immediate BSOD.
>>>>
>>>
>>> This is very odd as I was happily testing with a 128 vCPU VM once
>>> ExProcessorMasks was implemented... no need for the extra leaf.
>>> The documentation for 0x40000005 states "Describes the scale limits
>>> supported in the current hypervisor implementation. If any value is zero,
>>> the hypervisor does not expose the corresponding information". It does not
>>> say (in section 7.8.1 or elsewhere AFAICT) what implications that has for
>>> VP_INDEX.
>>>
>>> In "Requirements for Implementing the Microsoft Hypervisor Interface" I
>>> don't see anything saying what the semantics of not implementing leaf
>>> 0x40000005 are, only that if implementing it then -1 must be used to break
>>> the 64 VP limit. It also says that reporting -1 in 0x40000005 means that
>>> 40000004.EAX bits 1 and 2 *must* be clear, which is clearly not true if
>>> ExProcessorMasks is enabled. That document is dated June 13th 2012 so I
>>> think it should be taken with a pinch of salt.
>>>
>>> Have you actually observed a BSOD with >64 vCPUs when ExProcessorMasks is
>>> enabled? If so, which version of Windows? I'd like to get a repro myself.
>>
>> I return with more testing that confirms both my and your results.
>>
>> 1) with ExProcessorMasks and 66 vCPUs assigned, both the current WS19 build
>> and pre-release build 20270 of vNext boot correctly with no changes
>>
>> and that would be fine, but the existing documentation for
>> ex_processor_masks states:
>> "Hence this enlightenment must be specified for guests with more
>> than 64 vCPUs if B<hcall_remote_tlb_flush> and/or B<hcall_ipi> are also
>> specified."
>>
>> You would then expect a 64+ vCPU VM to boot without ex_processor_masks,
>> hcall_remote_tlb_flush and hcall_ipi.
>
> Indeed.
>
>> So,
>> 2) without ExProcessorMasks and 66 vCPUs assigned, and with
>> hcall_remote_tlb_flush and hcall_ipi disabled: WS19 BSODs and vNext fails
>> to initialize secondary CPUs
>
> This is not what I recall from testing but I can confirm I see the same now.
>
>> After applying my change,
>> 3) without ExProcessorMasks and 66 vCPUs assigned, and with
>> hcall_remote_tlb_flush and hcall_ipi disabled: WS19 and vNext boot correctly
>>
>> So we either need to change the documentation and require ExProcessorMasks
>> for all VMs with more than 64 vCPUs, or go with my change.
>
> I think we go with your change. I guess we can conclude then that the
> separate viridian flag as part of the base set is the right way to go (to
> stop the leaf magically appearing on migrate of existing guests) and leave
> ex_processor_masks (and the documentation) as-is.
>
> You can add my R-b to the patch.
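For readers following the leaf semantics discussed above, here is a minimal,
illustrative sketch, not the actual Xen patch, of how leaf 0x40000005 EAX
could be filled: 0 leaves the guest assuming the historical 64-VP limit,
while -1 advertises that no limit applies. The helper name and the structure
are assumptions made for the example only.

  #include <stdint.h>
  #include <stdbool.h>

  /* Register block returned for a CPUID leaf (illustrative, not Xen's type). */
  struct cpuid_result {
      uint32_t eax, ebx, ecx, edx;
  };

  /*
   * Hypothetical helper: fill Viridian leaf 0x40000005 ("implementation
   * limits").  EAX = 0 leaves the guest assuming the default 64-VP limit;
   * EAX = -1 (~0u) tells the guest there is no VP limit.
   */
  static void fill_viridian_limits_leaf(struct cpuid_result *res,
                                        bool lift_vp_limit)
  {
      res->eax = lift_vp_limit ? ~0u : 0u;
      res->ebx = 0;  /* remaining fields left unreported */
      res->ecx = 0;
      res->edx = 0;
  }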
That's the unchanged patch then, including the libxl change that I had asked
about and where, I have to admit, I don't fully follow Igor's responses? I'm
hesitant to give an ack for that aspect of the change, yet I suppose the libxl
maintainers will defer to the x86 ones there. Alternatively Andrew or Roger
could of course ack this ...

Jan
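For completeness, the configurations exercised in the testing above map
roughly onto xl.cfg fragments like the following. This is a sketch only;
consult xl.cfg(5) for the authoritative list of viridian enlightenment group
names.

  # 1) ex_processor_masks enabled, 66 vCPUs: boots on WS19 and vNext
  vcpus = 66
  viridian = [ "base", "ex_processor_masks" ]

  # 2) and 3) ex_processor_masks not enabled, hcall_remote_tlb_flush and
  # hcall_ipi not specified: BSOD / secondary-CPU failure before the change,
  # boots correctly after it
  vcpus = 66
  viridian = [ "base" ]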