hi Julius, I very much appreciate you taking the time to elaborate on the issues we'd face trying to pair a "mismatched" RO/RW combination. Thoughtful, well articulated discussions like this are a big part of what makes our community a great one IMO.
-Matt On Wed, Aug 21, 2024 at 6:29 PM Julius Werner <jwer...@chromium.org> wrote: > [resent with correct sender address] > > Hi, > > I wanted to pick up the discussion from the leadership meeting again > here because I think it's easier to show my point with a few examples. > If you want an old coreboot RO build to work together with a ToT RW > build, then you don't just care about the vboot workbuffer matching... > *everything* that is shared in any way between bootblock/verstage and > romstage/ramstage/payload needs to be kept in sync or risks > introducing dangerous and not necessarily immediately obvious errors. > > One major issue is the pre-RAM memory layout. Take for example this > simple one-line CL: https://review.coreboot.org/83680 . Would most > reviewers expect that it makes RW firmware built after that point > incompatible with RO firmware built before? Well, it does, because it > increases the size of a memory region here > > https://review.coreboot.org/plugins/gitiles/coreboot/+/refs/heads/main/src/arch/x86/car.ld#34 > which then moves the offsets of all subsequent memory regions > downwards accordingly. Some of these regions contain buffers that were > passed by the bootblock to later stages (e.g. CBFS_MCACHE, > FMAP_CACHE), and when the romstage tries to reuse those buffers it > suddenly finds different data at the same offsets. > > We actually had to figure out how to support this for an RW update to > one of our shipped Chromebooks, and it looks like this: > > https://chromium-review.googlesource.com/c/chromiumos/third_party/coreboot/+/5732763/3/src/arch/x86/assembly_entry.S > . It needs an ugly, complicated assembly hack to copy bytes around > from the locations where RO firmware left them into the locations > where RW firmware expects them. Since one of these regions is the > stack, you can't even do that in C code because you might already be > trampling over the data you're still trying to save once you enter a C > function. And this patch is just tailored to this one particular case > (and platform), it's not a universal solution — if we were resizing a > different region, or swapping the order of two regions, that would > require a completely different (maybe impossible) workaround. > > Now, we could try to replace our simple memory region management with > something more complicated that comes with a built-in tagged directory > (kinda like a second CBMEM in pre-RAM). Then, if every single access > to one of those regions went through that directory, it would > automatically find the right location. We would need to write the code > to parse that directory in assembly for every architecture, since the > stack setup is one of the things that uses them. But even then, that > doesn't solve the resize problem: if we e.g. find a rare stack > overflow in the FSP that requires us to have a bigger romstage stack, > then we can't just reuse the directory the bootblock has written, we > need to actually change it to fix that bug and move all those regions > around to fit the new requirements. And what if migrating from the old > to the new layout on the fly is not possible? If we flip the order of > two sections, there might not be enough free scratch space to > temporarily save the contents of one section while we move the > contents of the other into its place. (Note that car.ld is only the > shared pre-RAM layout for Intel platforms. Every Arm and (recent) AMD > platform has their own separate layout like this, and some of them > tend to be very tight. We regularly have to land patches like > https://review.coreboot.org/78970 to make some sections larger or > smaller so that things still fit on certain platforms. Each of those > changes would lead to one of these situations where suddenly the > romstage needs to move everything into a different place from where > the bootblock left it because it needs space for new things that the > older code didn't account for.) > > There are plenty of other dependencies between RO and RW stages, many > of them very abstract and platform specific. For example, on Arm SoCs > we often have a clock tree where a few shared top-level PLLs get > multiplexed onto many peripheral devices that each have their own > pre-divisors. The PLLs are usually set up by bootblock code (because > some of them are needed very early), and the peripheral divisors are > usually configured by the driver that initializes that peripheral. If > you e.g. initialize an I2C controller in ramstage to 400KHz, the I2C > driver can calculate what divider to set based on the target frequency > and a #define constant of what the top-level PLL was set to. But what > if for some reason the top-level PLL frequency needs to change? Today > we'd submit a patch to change that #define and all the peripheral > driver code will continue to calculate accurate divisor values. But if > you then try to combine that RW ramstage with an RO bootblock built > before that change, suddenly your divisor will be calculated with an > incorrect value and your I2C clock will either be too fast or too > slow, resulting in maybe the peripheral not working, or maybe your > boot speed just regressing in a way nobody will ever track down, or > maybe the bus running slightly too fast so that on 99.9% of devices it > will still work but on 0.1% of devices you suddenly start having an > awful heisenbug that will be a royal pain to track down. > > There can be a near-infinite number of scenarios like this on various > platforms where code running in a later stage depends on some hardware > initialization that code running in an earlier stage did, and I see no > practical way to track them all even if we did try to implement > explicit backwards-compatibility for all of them. vboot is an optional > feature in coreboot and most people aren't even using it. How are we > going to get every developer/reviewer to care about whether a change > they're about to make could subtly break compatibility for a feature > they've never used? > > And even if we did find a way to perfectly track every possible > dependency and wanted to implement backwards-compatibility for all of > them, what is the cost for doing so (both the effort of writing it, > and the amount of extra romstage code we need to carry around in our > binaries)? Not every incompatibility is easy to fix up once you detect > it. Let's go back to the vboot workbuffer, for example: the workbuffer > contains among other things a couple of data structures that have been > read from the TPM and contain information relevant at various points > in the boot flow. Right now, two of those (secdata_firmware and > secdata_kernel) get loaded in verstage, while the third (secdata_fwmp) > only gets loaded in our payload. But we have been discussing that we > might want to (for various reasons) change this in the future so that > secdata_fwmp is also loaded in verstage. Let's say we make that > change, and we want to stay backwards compatible to RO firmware built > before it. That means that early in romstage, before any new code that > depends on secdata_fwmp runs, we'll need to run some code that checks > the workbuffer version and fixes this incompatibility if it is too > old. But in order to be able to load secdata_fwmp, we'll need to link > the entire TPM driver stack into romstage which otherwise wouldn't be > needed there on many platforms. So we are bloating our binary size in > a potentially space-restricted pre-RAM stage just to be able to fix an > issue that most people are never going to care about. Multiply that by > the dozens of possible incompatibilities that will accumulate over the > years. > > I totally understand that it would be great if we could just build a > system where you can just run any RO and RW together and have them > work it out, but I really don't think that's feasible in practice. > vboot was not designed to support that, for good reason. And if we > tried to just get started on adding compatibility code for specific > issues that people have on a specific platform right now to "see how > far we get", I don't think it's going to work out in the long term and > it will just lead to disruption and extra code complexity right now > that's eventually not going to pay off for anyone. It may work for the > one platform you care about for a while but sooner or later there'll > be another compatibility break that you can't easily mitigate. And if > people start to rely on this working then they may get angry if > someone else makes a change that is going to break compatibility on > their platform in a way that can't be fixed, which I think would be a > bad thing, because we do need to be able to develop new features and > fix architectural issues (both in vboot and in other code that can > cause RO->RW dependencies) unburdened by legacy code from years ago. >
_______________________________________________ coreboot mailing list -- coreboot@coreboot.org To unsubscribe send an email to coreboot-le...@coreboot.org