On Thu, Oct 18, 2012 at 6:32 AM, Alexander Graf <ag...@suse.de> wrote:
>
>
> On 18.10.2012, at 03:18, Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote:
>
>> On Thu, 2012-10-18 at 11:09 +1100, David Gibson wrote:
>>
>>>>> That's horrible; if you use -boot just once it will clobber a
>>>>> persistent NVRAM's boot order. I see that a means of changing the
>>>>> default boot order from management tools is desirable, but that
>>>>> shouldn't be the normal behaviour of -boot. And the objections to (2)
>>>>> apply even more strongly - we'd need to translate arbitrary -boot
>>>>> strings to NVRAM representation which may not be at all
>>>>> straightforward from the information qemu has available.
>>>>
>>>> It may not be straight forward, but it's what makes the most sense from
>>>> a user's PoV.
>>>
>>> Bollocks. Using -boot to override the normal boot sequence
>>> permanently changing the normal boot sequence absolutely does not make
>>> sense from a user's PoV.
>>
>> I strongly agree with David here. -boot should not change the persistent
>> state.
>
> I think Anthony and you are looking at 2 different use cases, each with their
> own sane reasoning.
>
> You want to have the chance to override the boot order temporarily for things
> like cd boot or quick guest rescue missions.
>
> You also want to be able to permanently change the guest's boot order from a
> management tool. At that same place you want to be able to display it, so you
> don't have to boot your vm to know what it would be doing.
>
> As for device detection logic, both face the same problems. You need to be
> able to say 'boot from cd-rom first temporarily' just the same as you need to
> be able to say 'boot from the first cd-rom as first boot option permanently'.
> The permanent change needs to be possible with the vm turned off though.
>
> I suppose that Anthony's reasoning is that we can implement temporary in the
> management layer (or even qemu) if we have the permanent mechanism, by
> switching back to the previous state after shutdown if the guest written boot
> order didn't change.
>
> I don't mind personally if we have one interface for temporary and persistent
> or 2 separate ones, but I think we should aim for having both options
> available in the long run. Though doing permanent changes first and reverting
> them later could raise problems when you kill your vm, since that wouldn't
> clean up the temporary change.
>
>>
>> In our case, the persistent state will have been carefully crafted by
>> complicated scripts by the distro installer, and while I may want to use
>> -boot to boot once off a cd image or similar, I certainly don't want
>> that to affect my nvram setting pointing to the right on-disk
>> bootloader.
>>
>> Additionally I don't want qemu to have to understand all the intricacies
>> of expressing OFW boot path if we can avoid it.
>
> Yes, the same problem as EFI for example is facing. The solution here is as
> simple as it gets: a new device name space. Instead of having a boot list
> entry saying 'boot from device x, part y, file z' you would get an entry
> saying 'boot from /qemu/disk0' and leave the rest to the firmware. The good
> thing about this approach is that it again is persistable and can be used in
> boot order lists. So you can directly translate -boot cd into
> /qemu/disk0,/qemu/cdrom. And if you screwed up your guest boot config, just
> put that order in by hand into permanent config.
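
To make the proposed translation concrete, here is a minimal sketch of how
-boot letters could be mapped onto the /qemu/... aliases mentioned above.
This is only an illustration: the function and table names are made up, and
the only mappings taken from the thread are c -> /qemu/disk0 and
d -> /qemu/cdrom. Nothing like this exists in QEMU today as far as I know.

```python
# Hypothetical alias table; only 'c' and 'd' are mentioned in the thread.
BOOT_LETTER_TO_ALIAS = {
    "c": "/qemu/disk0",
    "d": "/qemu/cdrom",
}


def boot_letters_to_aliases(letters):
    """Translate a -boot style string such as "cd" into a comma-separated
    list of firmware-neutral alias paths, preserving the given order."""
    aliases = []
    for letter in letters:
        if letter not in BOOT_LETTER_TO_ALIAS:
            # Refuse to guess an alias for letters the table does not cover.
            raise ValueError(f"no alias defined for boot letter {letter!r}")
        aliases.append(BOOT_LETTER_TO_ALIAS[letter])
    return ",".join(aliases)


if __name__ == "__main__":
    print(boot_letters_to_aliases("cd"))  # /qemu/disk0,/qemu/cdrom
```

The point of the indirection is that the resulting string says nothing about
partitions or files, so the firmware stays free to resolve each alias itself.
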

For comparison, QEMU passes a boot device flag (c/d) to OpenBIOS, which
translates it to either the first disk or the CD-ROM. Then we have -prom-env,
which can be used to pass further paths (/iommu/sbus/esp/sd@0,0:/unix), but
this is controlled by the user. OpenBIOS does not understand bootindex yet,
and QEMU cannot generate useful paths either, at least for Sparc32.

So far I have been thinking that bootindex would equal the OFW boot path. Why
would that not work?

>
>>
>> Qemu gives as much info as it can and let the firmware itself inside the
>> guest figure things out.
>
> Yes, that's the only chance we have really. Even for bootindex, which could
> for example get translated to /qemu/pci/0.10.0/disk0 which again would then
> get aliased to the actual disk device node behind pci device 0.10.0 (first
> disk) by SLOF.
>
>>
>> In fact, I don't want Qemu to know anything about our internal nvram
>> format. This is a business between the guest FW and the guest OS. The
>> only thing qemu is allowed to do is wipe it out if asked to do so :-)
>
> It might be useful to use fdt in nvram to store the permanent boot order.
> That way QEMU / management tools have the chance to make persistent changes.
> Everyone around already understands fdt anyways :).
>
>>
>>> Um.. as far as I can tell that's a point in favour of my position. It
>>> makes it impossible for qemu to correctly describe boot sequences
>>> using these devices in the terms firmware uses internally. On the
>>> other hand it certainly is possible for qemu to pass bootorder="cd"
>>> (or whatever) to the firmware via device tree or fw_cfg and have
>>> firmware locally interpret that in terms of what it knows about
>>> available devices.
>>
>> This is more/less what happens with -boot today. IE. If you pass "c"
>> SLOF looks for a bootable disk (though arguably the algorithm could be
>> improved), "d" for a bootable optical media etc...
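
As a rough illustration of "firmware interprets the hint locally", the lookup
described above could look something like the sketch below. It is not how
SLOF or OpenBIOS actually implement it; the Device type, the device paths and
the function name are all invented for the example, and the only behaviour
taken from the thread is that "c" means a bootable disk and "d" a bootable
optical medium.

```python
from dataclasses import dataclass


@dataclass
class Device:
    path: str       # firmware-internal path (invented for this example)
    kind: str       # "disk" or "cdrom"
    bootable: bool


# Assumed meaning of the letters, following the description in the thread.
KIND_FOR_LETTER = {"c": "disk", "d": "cdrom"}


def resolve_boot_order(bootorder, devices):
    """For each letter in bootorder, pick the first bootable device of the
    matching kind. Unknown letters and letters with no match are skipped."""
    resolved = []
    for letter in bootorder:
        kind = KIND_FOR_LETTER.get(letter)
        if kind is None:
            continue
        for dev in devices:
            if dev.kind == kind and dev.bootable:
                resolved.append(dev.path)
                break
    return resolved


if __name__ == "__main__":
    devices = [
        Device("/example/scsi/disk@0", "disk", bootable=False),
        Device("/example/scsi/disk@1", "disk", bootable=True),
        Device("/example/scsi/cdrom@2", "cdrom", bootable=True),
    ]
    print(resolve_boot_order("dc", devices))
```

QEMU only supplies the short hint here; everything that depends on the real
device tree stays inside the firmware.
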
>>
>> We definitely want something a bit more expressive and in some cases
>> might even be able to pass down from the command line a full path to an
>> actual device but we don't necessarily want qemu to understand the nvram
>> format of this.
>>
>> Make it an expressive representation that makes sense to qemu, and let
>> the FW "translate" that to something it understands internally.
>
> Yes :).
>
> Regardless of this problem, I think the conclusion on how to handle default
> -boot makes sense to everyone, so you (Avik?) can already start working on
> that one while we nail down the details of the boot protocol handshakes
> between QEMU and SLOF.
>
>
> Alex
>
>