On 27.09.2011, at 18:53, Blue Swirl wrote:

> On Tue, Sep 27, 2011 at 3:59 PM, Alexander Graf <ag...@suse.de> wrote:
>>
>> On 27.09.2011, at 17:50, Blue Swirl wrote:
>>
>>> On Mon, Sep 26, 2011 at 11:19 PM, Scott Wood <scottw...@freescale.com> wrote:
>>>> On 09/24/2011 05:00 AM, Alexander Graf wrote:
>>>>> On 24.09.2011, at 10:44, Blue Swirl wrote:
>>>>>> On Sat, Sep 24, 2011 at 8:03 AM, Alexander Graf <ag...@suse.de> wrote:
>>>>>>> On 24.09.2011, at 09:41, Blue Swirl wrote:
>>>>>>>> On Mon, Sep 19, 2011 at 4:12 PM, Scott Wood <scottw...@freescale.com> wrote:
>>>>>>>>> The goal with the spin table stuff, suboptimal as it is, was
>>>>>>>>> something that would work on any powerpc implementation. Other
>>>>>>>>> implementation-specific release mechanisms are allowed, and are
>>>>>>>>> indicated by a property in the cpu node, but only if the loader
>>>>>>>>> knows that the OS supports it.
>>>>>>>>>
>>>>>>>>>> IIUC the spec that includes these bits is not finalized yet. It
>>>>>>>>>> is, however, in use on all u-boot versions for e500 that I'm
>>>>>>>>>> aware of, and it is the method Linux uses to bring up secondary
>>>>>>>>>> CPUs.
>>>>>>>>>
>>>>>>>>> It's in ePAPR 1.0, which has been out for a while now. ePAPR 1.1
>>>>>>>>> was just released, which clarifies some things such as WIMG.
>>>>>>>>>
>>>>>>>>>> Stuart / Scott, do you have any pointers to documentation where
>>>>>>>>>> the spinning is explained?
>>>>>>>>>
>>>>>>>>> https://www.power.org/resources/downloads/Power_ePAPR_APPROVED_v1.1.pdf
>>>>>>>>
>>>>>>>> Chapter 5.5.2 describes the table. This is actually an interface
>>>>>>>> between OS and Open Firmware; obviously there can't be a real
>>>>>>>> hardware device that magically loads r3 etc.
>>>>
>>>> Not Open Firmware, but rather an ePAPR-compliant loader.
>>>
>>> 'boot program to client program interface definition'.
>>>
>>>>>>>> The device method would break abstraction layers,
>>>>
>>>> Which abstraction layers?
>>>
>>> QEMU system emulation emulates hardware, not software. Hardware
>>> devices don't touch CPU registers.
>>
>> The great part about this emulated device is that it's basically guest
>> software running in host context. To the guest, it's not a device in
>> the ordinary sense, such as vmport, but rather the same as software
>> running on another core, just that the other core isn't running any
>> software.
>>
>> Sure, if you consider this a device, it does break abstraction layers.
>> Just consider it as the host running guest code; then it makes
>> sense :).
>>
>>>
>>>>>>>> it's much like vmport stuff in x86. Using a hypercall would be a
>>>>>>>> small improvement. Instead it should be possible to implement a
>>>>>>>> small boot ROM which puts the secondary CPUs into a managed halt
>>>>>>>> state without spinning; then the boot CPU could send an IPI to a
>>>>>>>> halted CPU to wake it up based on the spin table, just like real
>>>>>>>> HW would do.
>>>>
>>>> The spin table, with no IPI or halt state, is what real HW does (or
>>>> rather, what software does on real HW) today. It's ugly and
>>>> inefficient, but it should work everywhere. Anything else would be
>>>> dependent on a specific HW implementation.
>>>
>>> Yes. Hardware doesn't ever implement the spin table.
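(For reference, the ePAPR mechanism under discussion boils down to the
loop below. This is only a sketch -- the struct and function names are
mine, the normative layout is in ch. 5.5.2, and real secondaries do
this in asm, usually with translation off:)

  /* Each secondary CPU spins on its own table entry until the OS
   * releases it by writing a jump target with the low bit clear. */
  #include <stdint.h>

  struct spin_table_entry {
      uint64_t entry_addr;  /* 1 while waiting; OS writes jump target */
      uint64_t r3;          /* value the CPU loads into r3 */
      uint32_t rsvd;
      uint32_t pir;         /* value the CPU loads into PIR */
  };

  static void secondary_wait(volatile struct spin_table_entry *e)
  {
      /* The busy loop this whole thread is about: reread the entry
       * until the boot CPU clears the "invalid" low bit. */
      while (e->entry_addr & 1)
          ;

      /* Released: set up r3/PIR (elided, needs asm) and branch. */
      ((void (*)(void))(uintptr_t)e->entry_addr)();
  }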
>>>>>>>> On Sparc32 OpenBIOS this is something like a few lines of ASM
>>>>>>>> on both sides.
>>>>>>>
>>>>>>> That sounds pretty close to what I had implemented in v1. Back
>>>>>>> then the only comment was to do it using this method from Scott.
>>>>
>>>> I had some comments on the actual v1 implementation as well. :-)
>>>>
>>>>>>> So we have the choice between having code inside the guest that
>>>>>>> spins, maybe even only checks every x ms, by programming a timer,
>>>>>>> or we can try to make an event out of the memory write. V1 was
>>>>>>> the former, v2 (this one) is the latter. This version performs a
>>>>>>> lot better and is easier to understand.
>>>>>>
>>>>>> The abstraction layers should not be broken lightly; I suppose
>>>>>> some performance or laziness^Wlocal optimization reasons were
>>>>>> behind the vmport design too. The ideal way to solve this could be
>>>>>> to detect a spinning CPU and optimize that for all architectures.
>>>>>> That could be tricky, though (if a CPU remains in the same TB for
>>>>>> extended periods, inspect the TB: if it performs a loop with a
>>>>>> single load instruction, replace the load by a special wait
>>>>>> operation for any memory stores to that page).
>>>>
>>>> How's that going to work with KVM?
>>>>
>>>>> In fact, the whole kernel loading way we go today is pretty much
>>>>> wrong. We should rather do it similar to OpenBIOS, where firmware
>>>>> always loads and then pulls the kernel from QEMU using a PV
>>>>> interface. At that point, we would have to implement such an
>>>>> optimization as you suggest. Or implement a hypercall :).
>>>>
>>>> I think the current approach is more usable for most purposes. If
>>>> you start U-Boot instead of a kernel, how do you pass information on
>>>> from the user (kernel, rfs, etc)? Require the user to create flash
>>>> images[1]?
>>>
>>> No, for example OpenBIOS gets the kernel command line from the fw_cfg
>>> device.
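(In case anyone here hasn't used it: fw_cfg is just a selector/data
register pair, so the guest side really is tiny. A rough sketch of the
x86 ioport flavor -- the key values should match QEMU's hw/fw_cfg.h,
the helpers are illustrative, and sparc does the same thing over MMIO:)

  /* Read an fw_cfg entry: write a 16-bit selector to port 0x510, then
   * stream the entry's bytes from port 0x511. */
  #include <stdint.h>

  #define FW_CFG_PORT_SEL     0x510
  #define FW_CFG_PORT_DATA    0x511
  #define FW_CFG_CMDLINE_SIZE 0x14   /* as in hw/fw_cfg.h */
  #define FW_CFG_CMDLINE_DATA 0x15

  static inline void outw(uint16_t val, uint16_t port)
  {
      __asm__ volatile("outw %0, %1" : : "a"(val), "Nd"(port));
  }

  static inline uint8_t inb(uint16_t port)
  {
      uint8_t val;
      __asm__ volatile("inb %1, %0" : "=a"(val) : "Nd"(port));
      return val;
  }

  static void fw_cfg_read(uint16_t key, void *buf, unsigned len)
  {
      uint8_t *p = buf;

      outw(key, FW_CFG_PORT_SEL);
      while (len--)
          *p++ = inb(FW_CFG_PORT_DATA);
  }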
>>>> Maybe that's a useful mode of operation in some cases, but I don't
>>>> think we should be slavishly bound to it. Think of the current
>>>> approach as something between whole-system and userspace emulation.
>>>
>>> This is similar to ARM, M68k and Xtensa semi-hosting mode, but at a
>>> level below the kernel. Perhaps this mode should be enabled with the
>>> -semihosting flag or a new flag. Then the bare metal version could be
>>> run without the flag.
>>
>> And then we'd have two implementations for running in system emulation
>> mode and would need to maintain both. I don't think that scales very
>> well.
>
> No, but such hacks are not common.
>
>>>
>>>> Where does the device tree come from? How do you tell the guest
>>>> about what devices it has, especially in virtualization scenarios
>>>> with non-PCI passthrough devices, or custom qdev instantiations?
>>>>
>>>>> But at least we'd always be running the same guest software stack.
>>>>
>>>> No, we wouldn't. Any U-Boot that runs under QEMU would have to be
>>>> heavily modified, unless we want to implement a ton of random device
>>>> emulation, at least one extra memory translation layer (LAWs,
>>>> localbus windows, CCSRBAR, and such), hacks to allow locked cache
>>>> lines to operate despite a lack of backing store, etc.
>>>
>>> I'd say HW emulation business as usual. Now, with the new memory API,
>>> it should be possible to emulate the caches with line locking and
>>> TLBs etc.; this was not previously possible. IIRC implementing locked
>>> cache lines would allow x86 to boot unmodified coreboot.
>>
>> So how would you emulate cache lines with line locking on KVM?
>
> The cache would be an MMIO device which registers to handle all memory
> space. Configuring the cache controller changes how the device
> operates. Put this device between CPU and memory and other devices.
> Performance would probably be horrible, so the CPU should disable the
> device automatically after some time.
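Just to make sure we're talking about the same thing, I guess you mean
something like the sketch below. MemoryRegionOps is the real memory API
structure; the cache model behind it is entirely invented here:

  /* A "cache" MMIO region overlaid on all of memory, so every access
   * traps into the model. Hits are served from the line store (which
   * is what would let locked lines work without backing RAM); misses
   * would be forwarded to the RAM below us. */
  #include "memory.h"

  typedef struct CacheState CacheState;  /* hypothetical model state */

  uint64_t cache_model_read(CacheState *s, target_phys_addr_t addr,
                            unsigned size);
  void cache_model_write(CacheState *s, target_phys_addr_t addr,
                         uint64_t val, unsigned size);

  static uint64_t cache_read(void *opaque, target_phys_addr_t addr,
                             unsigned size)
  {
      return cache_model_read(opaque, addr, size);
  }

  static void cache_write(void *opaque, target_phys_addr_t addr,
                          uint64_t val, unsigned size)
  {
      cache_model_write(opaque, addr, val, size);
  }

  static const MemoryRegionOps cache_ops = {
      .read = cache_read,
      .write = cache_write,
      .endianness = DEVICE_NATIVE_ENDIAN,
  };

  /* Map it over RAM with higher priority, and tear it down again when
   * the cache controller turns the model off, roughly:
   *   memory_region_init_io(&mr, &cache_ops, s, "cache", ram_size);
   *   memory_region_add_subregion_overlap(sysmem, 0, &mr, 1);
   *   ...
   *   memory_region_del_subregion(sysmem, &mr);
   */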
So how would you execute code on this region then? :)

>
>> However, we already have a number of hacks in SeaBIOS to run in QEMU,
>> so I don't see an issue in adding a few here and there in u-boot. The
>> memory pressure is a real issue though. I'm not sure how we'd manage
>> that one. Maybe we could try and reuse the host u-boot binary? heh
>
> I don't think SeaBIOS breaks layering except for fw_cfg.

I'm not saying we're breaking layering there. I'm saying that changing
u-boot is not so bad, since it's the same as what we do with SeaBIOS. It
was an argument in favor of your position.

> For extremely memory-limited situations, perhaps QEMU (or Native KVM
> Tool for a lean and mean version) could be run without glibc, inside
> the kernel, or even interfacing directly with the hypervisor. I'd also
> continue making it possible to disable building unused devices and
> features.

I'm pretty sure you're not the only one with that goal ;).


Alex