On Tue, Jan 31, 2023 at 08:39:08PM +0100, Jason A. Donenfeld wrote:
> On Mon, Jan 30, 2023 at 03:19:59PM -0500, Michael S. Tsirkin wrote:
> > From: "Jason A. Donenfeld" <ja...@zx2c4.com>
> >
> > The setup_data links are appended to the compressed kernel image. Since
> > the kernel image is typically loaded at 0x100000, setup_data lives at
> > `0x100000 + compressed_size`, which does not get relocated during the
> > kernel's boot process.
> >
> > The kernel typically decompresses the image starting at address
> > 0x1000000 (note: there's one more zero there than the compressed image
> > above). This usually is fine for most kernels.
> >
> > However, if the compressed image is actually quite large, then
> > setup_data will live at a `0x100000 + compressed_size` that extends into
> > the decompressed zone at 0x1000000. In other words, if compressed_size
> > is larger than `0x1000000 - 0x100000`, then the decompression step will
> > clobber setup_data, resulting in crashes.
> >
> > Visually, what happens now is that QEMU appends setup_data to the kernel
> > image:
> >
> >           kernel image            setup_data
> >    |--------------------------||----------------|
> > 0x100000                  0x100000+l1     0x100000+l1+l2
> >
> > The problem is that this decompresses to 0x1000000 (one more zero). So
> > if l1 is > (0x1000000-0x100000), then this winds up looking like:
> >
> >           kernel image            setup_data
> >    |--------------------------||----------------|
> > 0x100000                  0x100000+l1     0x100000+l1+l2
> >
> >                                  d e c o m p r e s s e d   k e r n e l
> >                      |-------------------------------------------------------------|
> >                 0x1000000                                                     0x1000000+l3
> >
> > The decompressed kernel seemingly overwriting the compressed kernel
> > image isn't a problem, because that gets relocated to a higher address
> > early on in the boot process, at the end of startup_64. setup_data,
> > however, stays in the same place, since those links are self referential
> > and nothing fixes them up. So the decompressed kernel clobbers it.
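
To make the arithmetic above concrete, here is a minimal sketch in C of the
overlap condition. It is not QEMU or kernel code: the helper and macro names
are made up for illustration, and the two addresses are only the typical
values quoted in the commit message. It tests the whole setup_data range
against the decompression range, which is slightly broader than the
"compressed_size > 0x1000000 - 0x100000" shorthand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Typical addresses quoted in the commit message (assumed, not universal). */
#define COMPRESSED_LOAD_ADDR  UINT64_C(0x100000)   /* compressed image is loaded here */
#define DECOMPRESS_ADDR       UINT64_C(0x1000000)  /* decompression typically starts here */

/* Hypothetical helper: true if [a, a+a_len) and [b, b+b_len) intersect. */
static bool ranges_overlap(uint64_t a, uint64_t a_len, uint64_t b, uint64_t b_len)
{
    /* This sketch does not handle ranges that wrap around 2^64. */
    assert(a + a_len >= a && b + b_len >= b);
    return a_len > 0 && b_len > 0 && a < b + b_len && b < a + a_len;
}

/*
 * Hypothetical helper: would setup_data, appended right after a compressed
 * image of l1 bytes, be overwritten by a decompressed kernel of l3 bytes?
 * l2 is the size of the setup_data blob (same l1/l2/l3 as in the diagram).
 */
static bool setup_data_clobbered(uint64_t l1, uint64_t l2, uint64_t l3)
{
    uint64_t setup_data_addr = COMPRESSED_LOAD_ADDR + l1;
    return ranges_overlap(setup_data_addr, l2, DECOMPRESS_ADDR, l3);
}
```

With these assumed addresses, setup_data_clobbered(0x800000, 0x1000, 0x4000000)
is false (setup_data ends well below 0x1000000), while
setup_data_clobbered(0x1000000, 0x1000, 0x4000000) is true (setup_data starts
at 0x1100000, inside the decompressed zone).
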
> >
> > Fix this by appending setup_data to the cmdline blob rather than the
> > kernel image blob, which remains at a lower address that won't get
> > clobbered.
> >
> > This could have been done by overwriting the initrd blob instead, but
> > that poses big difficulties, such as no longer being able to use memory
> > mapped files for initrd, hurting performance, and, more importantly, the
> > initrd address calculation is hard coded in qboot, and it always grows
> > down rather than up, which means lots of brittle semantics would have to
> > be changed around, incurring more complexity. In contrast, using cmdline
> > is simple and doesn't interfere with anything.
> >
> > The microvm machine has a gross hack where it fiddles with fw_cfg data
> > after the fact. So this hack is updated to account for this appending,
> > by reserving some bytes.
> >
> > Fixup-by: Michael S. Tsirkin <m...@redhat.com>
> > Cc: x...@kernel.org
> > Cc: Philippe Mathieu-Daudé <phi...@linaro.org>
> > Cc: H. Peter Anvin <h...@zytor.com>
> > Cc: Borislav Petkov <b...@alien8.de>
> > Cc: Eric Biggers <ebigg...@kernel.org>
> > Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>
> > Message-Id: <20221230220725.618763-1-ja...@zx2c4.com>
> > Message-ID: <20230128061015-mutt-send-email-...@kernel.org>
> > Reviewed-by: Michael S. Tsirkin <m...@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> > Tested-by: Eric Biggers <ebigg...@google.com>
> > Tested-by: Mathias Krause <mini...@grsecurity.net>
>
> This one should wind up in the stable point release too. Dunno what the
> procedure for that is.
>
> Jason

If you want that you need to include

Cc: qemu-sta...@nongnu.org
Fixes: <hash> ("subject")

you can still reply to the original mail with this.

-- 
MST