On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf <ag...@suse.de> wrote:
>
> On 24.09.2011, at 09:41, Blue Swirl wrote:
>
>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood <scottw...@freescale.com> wrote:
>>> On 09/19/2011 06:35 AM, Alexander Graf wrote:
>>>>
>>>> On 17.09.2011, at 19:40, Blue Swirl wrote:
>>>>
>>>>> On Sat, Sep 17, 2011 at 5:15 PM, Alexander Graf <ag...@suse.de> wrote:
>>>>>>
>>>>>> On 17.09.2011 at 18:58, Blue Swirl <blauwir...@gmail.com> wrote:
>>>>>>
>>>>>>> On Sparc32, there is no need for a PV device. The CPU is woken up from
>>>>>>> halted state with an IPI. Maybe you could use this approach?
>>>>>>
>>>>>> The way it's done here is defined by u-boot and now also nailed down in
>>>>>> the ePAPR architecture spec. While alternatives might be more appealing,
>>>>>> this is how guests work today :).
>>>>>
>>>>> OK. I hoped that there were no implementations yet. The header (btw
>>>>> missing) should point to the spec.
>>>
>>> The goal with the spin table stuff, suboptimal as it is, was something
>>> that would work on any powerpc implementation. Other
>>> implementation-specific release mechanisms are allowed, and are
>>> indicated by a property in the cpu node, but only if the loader knows
>>> that the OS supports it.
>>>
>>>> IIUC the spec that includes these bits is not finalized yet. It is however
>>>> in use on all u-boot versions for e500 that I'm aware of and the method
>>>> Linux uses to bring up secondary CPUs.
>>>
>>> It's in ePAPR 1.0, which has been out for a while now. ePAPR 1.1 was
>>> just released which clarifies some things such as WIMG.
>>>
>>>> Stuart / Scott, do you have any pointers to documentation where the
>>>> spinning is explained?
>>>
>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>
>> Chapter 5.5.2 describes the table. This is actually an interface
>> between OS and Open Firmware, obviously there can't be a real hardware
>> device that magically loads r3 etc.
>>
>> The device method would break abstraction layers, it's much like
>> vmport stuff in x86. Using a hypercall would be a small improvement.
>> Instead it should be possible to implement a small boot ROM which puts
>> the secondary CPUs into managed halt state without spinning, then the
>> boot CPU could send an IPI to a halted CPU to wake them up based on
>> the spin table, just like real HW would do. On Sparc32 OpenBIOS this
>> is something like a few lines of ASM on both sides.
>
> That sounds pretty close to what I had implemented in v1. Back then the only
> comment was to do it using this method from Scott. Maybe one day we will get
> u-boot support. Then u-boot will spin on the CPU itself and when that time
> comes, we can check if we can implement a prettier version.
>
> Btw, we can't do the IPI method without exposing something to the guest that
> u-boot would usually not expose. There simply is no event. All that happens
> is a write to memory to tell the other CPU that it should wake up. So while
> sending an IPI to the other CPU is the "clean" way to go, I agree, we can
> either be compatible or "clean". And if I get the choice I'm rather
> compatible.

There are also warts in the Sparc32 design. For example, there is no
instruction to halt the CPU; instead, a device (only available on some
models) can do it.

> So we have the choice between having code inside the guest that spins, maybe
> even only checks every x ms, by programming a timer, or we can try to make an
> event out of the memory write. V1 was the former, v2 (this one) is the
> latter. This version performs a lot better and is easier to understand.

The abstraction layers should not be broken lightly. I suppose some
performance or laziness^Wlocal optimization reasons were behind the
vmport design too. The ideal way to solve this could be to detect a
spinning CPU and optimize that for all architectures. That could be
tricky, though: if a CPU remains in the same TB for extended periods,
inspect the TB, and if it performs a loop with a single load
instruction, replace the load by a special wait operation that waits
for any memory store to that page.
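
To make the spin-detection idea concrete, here is a minimal toy sketch
in Python. It is not QEMU code. Everything in it is a made-up
illustration: the Insn and Machine classes, the PAGE_SIZE and
SPIN_THRESHOLD values, and the representation of a block as a plain
list of instructions are all assumptions. A real implementation would
have to work on actual TBs and handle many cases this model ignores.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096        # assumed page size for this toy model
SPIN_THRESHOLD = 1000   # assumed: executions of the same block before inspecting it


@dataclass(frozen=True)
class Insn:
    """Hypothetical, heavily simplified instruction."""
    op: str                        # "load", "store", "cmp", "branch", ...
    addr: Optional[int] = None     # memory address for load/store
    target: Optional[int] = None   # branch target as an index into the block


def spin_load_address(block):
    """Return the polled address if block is a loop around a single load."""
    loads = [i for i in block if i.op == "load"]
    has_store = any(i.op == "store" for i in block)
    loops_to_start = (bool(block) and block[-1].op == "branch"
                      and block[-1].target == 0)
    if len(loads) == 1 and not has_store and loops_to_start:
        return loads[0].addr
    return None


class Machine:
    def __init__(self):
        # page number -> set of CPU ids parked until a store hits that page
        self.waiting = {}

    def maybe_park(self, cpu, block, same_block_count):
        """Park cpu if it has been spinning in a single-load loop long enough."""
        if same_block_count < SPIN_THRESHOLD:
            return False
        addr = spin_load_address(block)
        if addr is None:
            return False
        self.waiting.setdefault(addr // PAGE_SIZE, set()).add(cpu)
        return True

    def store(self, addr):
        """Any store to a watched page wakes every CPU parked on it."""
        return self.waiting.pop(addr // PAGE_SIZE, set())
```

In this model a secondary CPU polling one word would be parked after
SPIN_THRESHOLD passes through the same block, and the boot CPU's store
to that page would return it from Machine.store() as woken. Waking on
any store to the page keeps the check cheap at the cost of occasional
spurious wakeups, after which the CPU simply runs its loop again.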