> -----Original Message----- > From: Laszlo Ersek [mailto:ler...@redhat.com] > Sent: 26 August 2022 13:07 > To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>; > qemu-devel@nongnu.org; qemu-...@nongnu.org > Cc: imamm...@redhat.com; peter.mayd...@linaro.org; Linuxarm > <linux...@huawei.com>; chenxiang (M) <chenxian...@hisilicon.com>; Ard > Biesheuvel (kernel.org address) <a...@kernel.org>; Gerd Hoffmann > <kra...@redhat.com> > Subject: Re: [PATCH] fw_cfg: Don't set callback_opaque NULL in > fw_cfg_modify_bytes_read() > > +Ard +Gerd, one pointer at the bottom > > On 08/26/22 13:59, Laszlo Ersek wrote: > > On 08/25/22 18:18, Shameer Kolothum wrote: > >> Hi > >> > >> On arm/virt platform, Chen Xiang reported a Guest crash while > >> attempting the below steps, > >> > >> 1. Launch the Guest with nvdimm=on > >> 2. Hot-add a NVDIMM dev > >> 3. Reboot > >> 4. Guest boots fine. > >> 5. Reboot again. > >> 6. Guest boot fails. > >> > >> QEMU_EFI reports the below error: > >> ProcessCmdAddPointer: invalid pointer value in "etc/acpi/tables" > >> OnRootBridgesConnected: InstallAcpiTables: Protocol Error > >> > >> Debugging shows that on first reboot(after hot-adding NVDIMM), > >> Qemu updates the etc/table-loader len, > >> > >> qemu_ram_resize() > >> fw_cfg_modify_file() > >> fw_cfg_modify_bytes_read() > >> > >> And in fw_cfg_modify_bytes_read() we set the "callback_opaque" for > >> the "key" entry to NULL. Because of this, on the second reboot, > >> virt_acpi_build_update() is called with a NULL "build_state" and > >> returns without updating the ACPI tables. This seems to be > >> upsetting the firmware. > >> > >> To fix this, don't change the callback_opaque in > fw_cfg_modify_bytes_read(). > >> > >> Reported-by: chenxiang <chenxian...@hisilicon.com> > >> Signed-off-by: Shameer Kolothum > <shameerali.kolothum.th...@huawei.com> > >> --- > >> I am still not very convinced this is the root cause of the issue. > >> Though it looks like setting callback_opaque to NULL while updating > >> the file size is wrong, what puzzles me is that on the second reboot > >> we don't have any ACPI table size changes and ideally firmware should > >> see the updated tables from the first reboot itself. > >> > >> Please take a look and let me know. > >> > >> Thanks, > >> Shameer > >> > >> --- > >> hw/nvram/fw_cfg.c | 1 - > >> 1 file changed, 1 deletion(-) > >> > >> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c > >> index d605f3f45a..dfe8404c01 100644 > >> --- a/hw/nvram/fw_cfg.c > >> +++ b/hw/nvram/fw_cfg.c > >> @@ -728,7 +728,6 @@ static void > *fw_cfg_modify_bytes_read(FWCfgState *s, uint16_t key, > >> ptr = s->entries[arch][key].data; > >> s->entries[arch][key].data = data; > >> s->entries[arch][key].len = len; > >> - s->entries[arch][key].callback_opaque = NULL; > >> s->entries[arch][key].allow_write = false; > >> > >> return ptr; > >> > > > > I vaguely recall seeing the same issue report years ago (also in > > relation to hot-adding NVDIMM). However, I have no capacity to > > participate in the discussion. Making this remark just for clarity. > > The earlier report I've had in mind was from Shameer as well: > > http://mid.mail-archive.com/5FC3163CFD30C246ABAA99954A238FA83F3F > b...@lhreml524-mbs.china.huawei.com
Right. That was a slightly different issue though. It was basically ACPI table size not getting updated on the first reboot of Guest after we hot-add NVDIMM dev. The error from firmware was different in that case, ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables" OnRootBridgesConnected: InstallAcpiTables: Protocol Error And it was fixed with this series here, https://patchwork.kernel.org/project/qemu-devel/cover/20200403101827.30664-1-shameerali.kolothum.th...@huawei.com/ The current issue only happens on the second reboot of the Guest as described in the steps above. Thanks, Shameer