On Tue, May 31, 2016 at 05:07:41PM +0200, Igor Mammedov wrote: > On Tue, 31 May 2016 07:49:16 +0300 > "Michael S. Tsirkin" <m...@redhat.com> wrote: > > > On Tue, May 17, 2016 at 04:43:07PM +0200, Igor Mammedov wrote: > > > Signed-off-by: Igor Mammedov <imamm...@redhat.com> > > > --- > > > docs/specs/acpi_cpu_hotplug.txt | 88 > > > +++++++++++++++++++++++++++++++++++------ > > > 1 file changed, 76 insertions(+), 12 deletions(-) > > > > > > diff --git a/docs/specs/acpi_cpu_hotplug.txt > > > b/docs/specs/acpi_cpu_hotplug.txt > > > index 340b751..c5bce6a 100644 > > > --- a/docs/specs/acpi_cpu_hotplug.txt > > > +++ b/docs/specs/acpi_cpu_hotplug.txt > > > @@ -4,21 +4,85 @@ QEMU<->ACPI BIOS CPU hotplug interface > > > QEMU supports CPU hotplug via ACPI. This document > > > describes the interface between QEMU and the ACPI BIOS. > > > > > > -ACPI GPE block (IO ports 0xafe0-0xafe3, byte access): > > > ------------------------------------------ > > > - > > > -Generic ACPI GPE block. Bit 2 (GPE.2) used to notify CPU > > > -hot-add/remove event to ACPI BIOS, via SCI interrupt. > > > +ACPI BIOS GPE.2 handler is dedicated for notifying OS about CPU hot-add > > > +and hot-remove events. > > > > > > +============================================ > > > +Legacy ACPI CPU hotplug interface registers: > > > +-------------------------------------------- > > > CPU present bitmap for: > > > + One bit per CPU. Bit position reflects corresponding CPU APIC ID. > > > Read-only. > > > ICH9-LPC (IO port 0x0cd8-0xcf7, 1-byte access) > > > PIIX-PM (IO port 0xaf00-0xaf1f, 1-byte access) > > > --------------------------------------------------------------- > > > -One bit per CPU. Bit position reflects corresponding CPU APIC ID. > > > -Read-only. > > > +QEMU sets corresponding CPU bit on hot-add event and issues SCI > > > +with GPE.2 event set. CPU present map read by ACPI BIOS GPE.2 handler > > > +to notify OS about CPU hot-add events. CPU hot-remove isn't supported. > > > + > > > +===================================== > > > +ACPI CPU hotplug interface registers: > > > +------------------------------------- > > > +Register block base address: > > > + ICH9-LPC IO port 0x0cd8 > > > + PIIX-PM IO port 0xaf00 > > > > OK but this means we either use legacy or new one, > > bot both, which is problematic for people using old seabios > > without acpi loading support and with -M pc. > > > > I don't say we must support them with >255 CPUs, > > but I do say we should make an effort for simple > > setups with <255 CPUs. > > > it works with QEMU provided(shipped) BIOS, > it works for migration case as legacy stays enables because of -M > src_legacy_machine > > more that 255 cpus will break old BIOS in different > ways /corrupt/hang/ depending on BIOS version. > > I'm not sure we should care about using old BIOS > with new QEMU+new machine type though and allow > expectations to be beyond what hw vendors usually set. > It's the same as real hw does i.e. new hardware > shipped with new BIOS. > If user insist on old BIOS+new machine type he still has > property knob to force legacy mode. > > on the fist glance, it's probably not that very hard > to switch IO ports handling from legacy to new interface > by sending from new cpu-hotplug AML a command to do so, > my concerns here is: > * +1 more state to migrate > * probably issues with migration as target started with > different IO layout > * IO window freed after switching from legacy to new, > will not be available to guest as it started with > legacy window consumed by CPUS.CRS. > * that legacy switching business is only PC specific > means having a knob to turn it on so it won't pollute ARM > > all in all it's probably too much headache to make sure > that improbable usecase would work, so after considering > this idea I've dropped it and did it the way it's now.
Well I think it's worth the effort. I agree it's tricky to implement but we do maintain compatibility for years. Being orthogonal with bios version is very helpful for a variety of reasons such as debugging. > > > +Register block size: > > > + ACPI_CPU_HOTPLUG_REG_LEN = 12 > > > + > > > +read access: > > > > So this implies acpi must scan all cpus on each event, and > > this seems too aggressive. > > I think we need something hierarchical where > > you read one level and know which cpus to probe. > That's what we do for mem-hotplug as it's not > performance critical path. > > In addition to that depending on guest OS/version > it will anyway do enumeration of all CPUs after > our hotplug AML method scanned all CPUs and > sent notifies. > > not that I'm in favor of complicating this protocol, > but I wouldn't do it hierarchical, > that's what Notify(BUS_CHECK) is supposed to do > but it's broken on some guests. Interesting. Which ones? Would they be easy to detect? > So if I'd do a more complicated protocol, > I'd do polling from AML side telling QEMU > if (cpu = give_me_next_cpu_with_event()) > switch_to_cpu(cpu) > ... > > that means adding extra state to migrate, > probably ther might be other issues I'm not aware of yet. > > On the other hand mem-hotplug like interface > so far hasn't caused any issues > (that of cause doesn't mean that there aren't any). > > > > + offset: > > > + [0x0-0x3] reserved > > > + [0x4] CPU device status fields: (1 byte access) > > > + bits: > > > + 0: Device is enabled and may be used by guest > > > + 1: Device insert event, used to distinguish device for which > > > + no device check event to OSPM was issued. > > > + It's valid only when bit 0 is set. > > > + 2: Device remove event, used to distinguish device for which > > > + no device eject request to OSPM was issued. > > > + 3-7: reserved and should be ignored by OSPM > > > + [0x5-0x7] reserved > > > + [0x8] Command data: (DWORD access) > > > + in case of error or unsupported command reads is 0xFFFFFFFF > > > + current 'Command field' value: > > > + 0: returns PXM value corresponding to device > > > + > > > +write access: > > > + offset: > > > + [0x0-0x3] CPU selector: (DWORD access) > > > + selects active CPU device. All following accesses to other > > > + registers will read/store data from/to selected CPU. > > > > I've been thinking - is it time to add an EmbeddedControl interface? > > This way we have another namespace. > > There were patches on list with it some time ago > > "[PATCH RFC v2 0/7] coordinate cpu hotplug/unplug bewteen QEMU and kernel > by EC" > > but we dropped it as bloatware because there weren't any > real need for it as current IO port interfaces aren't > going away any time soon and work just fine without EC. Right. But now you are removing them for >255 CPUs anyway. > And if we were to move current interfaces into EC namespace > we would have to do it only for new machines types > and keep old code in place as well and a bunch of compat > code for a company. Hmm. Exactly like this current one? > > Not insisting on it, just an idea. > I wouldn't make it a necessary dependency and block this series. > Might be worth to look at again when there is usecases for it > without legacy stuff attached. and then we'll have to maintain 3 interfaces? > > > > > + [0x4] CPU device control fields: (1 byte access) > > > + bits: > > > + 0: reserved, OSPM must clear it before writing to register. > > > + 1: if set to 1 clears device insert event, set by OSPM > > > + after it has emitted device check event for the > > > + selected CPU device > > > + 2: if set to 1 clears device remove event, set by OSPM > > > + after it has emitted device eject request for the > > > + selected CPU device > > > + 3: if set to 1 initiates device eject, set by OSPM when it > > > + triggers CPU device removal and calls _EJ0 method > > > + 4-7: reserved, OSPM must clear them before writing to > > > register > > > + [0x5] Command field: (1 byte access) > > > + value: > > > + 0: following reads from 'Command data' register returns PXM > > > + value of device > > > + 1: following writes to 'Command data' register set OST event > > > + register in QEMU > > > + 2: following writes to 'Command data' register set OST status > > > + register in QEMU > > > + other values: reserved > > > + [0x6-0x7] reserved > > > + [0x8] Command data: (DWORD access) > > > + current 'Command field' value: > > > + 1: stores value into OST event register > > > + 2: stores value into OST status register, triggers > > > + ACPI_DEVICE_OST QMP event from QEMU to external > > > applications > > > + with current values of OST event and status registers. > > > + other values: reserved > > > > > > -CPU hot-add/remove notification: > > > ------------------------------------------------------ > > > -QEMU sets/clears corresponding CPU bit on hot-add/remove event. > > > -CPU present map read by ACPI BIOS GPE.2 handler to notify OS of CPU > > > -hot-(un)plug events. > > > +Selecting CPU device beyond possible range has no effect on platform: > > > + - write accesses to CPU hot-plug registers not documented above are > > > + ignored > > > + - read accesses to CPU hot-plug registers not documented above return > > > + all bits set to 1. > > > > why not 0? > isn't hardware usually return 1s on unhandled IO reads? > (I even thought that's it was you who told me it) PCI does but it's special. > > that's what has been done for memory hotplug, so I'm doing the same here. > > > > > > -- > > > 1.8.3.1 > >