On Mon, Oct 14, 2024 at 9:08 PM Oliver Steffen <ostef...@redhat.com> wrote:
>
> Since the PixieFail CVE fixes, a strong random number generator is
> required to use network functionality, such as booting via PXE or
> HTTP.
> On modern x86_64 CPUs this is not a problem because these support the
> RDRAND instruction.
> On older models one needs to add a virtio-rng device, otherwise network
> initialization fails.
>
> We now observe a very strange problem [1]:
> Network initialization still fails when adding a virtio-rng device to a VM
> with an old CPU, under certain hardware configurations.
>
> For example, it fails in combination with both the COM1 and COM2 isa-serial
> ports, while it works if only one of them is present (it doesn't matter
> which one, as long as they are not both configured in QEMU).
>
> Steps to reproduce the issue:
>
> Use a recent edk2 master branch, for example 596773f5e33e. We used
> qemu-8.2.7-1.fc40.
>
> Build OVMF for X64 like this:
>
>   build -t GCC5 -b DEBUG -a X64 \
>     -p OvmfPkg/OvmfPkgX64.dsc \
>     -D NETWORK_HTTP_BOOT_ENABLE=TRUE \
>     -D NETWORK_IP6_ENABLE=TRUE \
>     -D NETWORK_TLS_ENABLE=TRUE \
>     -D NETWORK_ALLOW_HTTP_CONNECTIONS=TRUE \
>     -D DEBUG_PRINT_ERROR_LEVEL=0xFFFFFFFF
>
> Run QEMU with a CPU that does not feature RDRAND:
>
>   qemu-system-x86_64 \
>     -machine q35,accel=kvm -m 1G -display none -nodefaults \
>     -drive file=OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \
>     -drive file=OVMF_VARS.fd,if=pflash,format=raw,unit=1,readonly=on \
>     -chardev file,id=fw,path="firmware.log" \
>     -device isa-debugcon,iobase=0x402,chardev=fw \
>     -drive file=UefiShell.iso,format=raw,if=none,media=cdrom,id=drive-cd1,readonly=on \
>     -device ide-cd,drive=drive-cd1,id=cd1,bootindex=1 \
>     -netdev user,id=net0 -device virtio-net-pci,netdev=net0,bootindex=2 \
>     -device virtio-rng-pci \
>     -serial stdio \
>     -serial null \
>     -cpu core2duo
>
> The attached CD-ROM image [2] contains an EFI Shell executable that is booted.
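To double-check the premise that a CPU model such as core2duo lacks RDRAND, one could run a small program inside a Linux guest started with the same -cpu option. This is a minimal sketch, assuming GCC or Clang on x86 (it relies on their <cpuid.h>, __get_cpuid() and bit_RDRND); cpu_has_rdrand() and rng_hint() are hypothetical helper names, and the check only reflects what the guest CPU advertises, not what OVMF does with it.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if the CPU advertises RDRAND (CPUID leaf 1, ECX bit 30),
 * 0 otherwise, including on non-x86 hosts. */
int cpu_has_rdrand(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid() returns 0 if leaf 1 is not supported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx & bit_RDRND) != 0;
#else
    return 0; /* Not x86: no RDRAND instruction at all. */
#endif
}

/* Hypothetical helper: maps the CPUID result to the RNG source that,
 * according to the report, network boot ends up depending on. */
const char *rng_hint(void)
{
    return cpu_has_rdrand()
               ? "RDRAND present: RngDxe can produce EFI_RNG_PROTOCOL"
               : "No RDRAND: add -device virtio-rng-pci for network boot";
}
```

Calling puts(rng_hint()) from a main() in a guest started with -cpu core2duo is expected to take the second branch; with -cpu max under KVM the result depends on whether the host CPU has RDRAND.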
> From the shell one can investigate the available boot options:
>
>   # bcfg boot dump
>
> Expectation: PXE and HTTP options are listed.
> Observation: No network boot options are present.
>
> Changing the CPU model on the QEMU command line to “max” makes the PXE and
> HTTP options available. We suspected that virtio-rng-pci was not
> working and that network support was unavailable due to the lack of an RNG.
>
> But the same can be achieved by removing the second serial port
> (“-serial null”) while keeping the CPU model. We can’t explain this at
> all.
>
> While network boot can also be made to work by changing other parts of
> the command line (modifying bootindex, for example), it is very strange
> that the serial port configuration alone influences network boot.
>
> Bisection:
> Bisecting identifies 4c4ceb2ceb ("NetworkPkg: SECURITY PATCH
> CVE-2023-45237") as the commit that introduces this problem.
>
> The problem seems to be pre-existing, but as of this commit, DxeNetLib
> has a new Depex on gEfiRngProtocolGuid
> (3152BCA5-EADE-433D-862E-C01CDC291F44), since it is now a consumer.
> Producers can be VirtioRng (when the device is present) and RngDxe
> (when the CPU supports instructions such as RDRAND). Removing the Depex,
> just for confirmation, solves the problem, but of course DxeNetLib then
> fails an assertion where it expects to find a random number generator.
>
> Observing the logs [3,4] with DEBUG_DISPATCH enabled and adding some
> printing in VirtioRng, we noticed that in both cases (PXE working or
> not), VirtioRng is started at the same point in the log (see line 22240
> in both attached logs), but with both COM1 and COM2 we no longer see any
> dispatcher messages after VirtioRng has started, while we do see them
> when only one port is present. It is exactly this last dispatcher pass
> that loads the network modules, because only then is their dependency on
> gEfiRngProtocolGuid satisfied.
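The dispatcher behaviour described above can be illustrated with a small, deliberately simplified model. It assumes that Depex expressions are only re-evaluated when the dispatcher runs (consistent with the network modules loading only in that last dispatcher pass), and that VirtioRng, being a UEFI driver-model driver, installs EFI_RNG_PROTOCOL only when BDS connects the virtio-rng-pci device, not from its entry point. Driver, dispatch() and net_driver_loaded() are hypothetical names; this is not the DXE core's code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of DXE dispatch: each driver has at most one
 * protocol in its Depex. Not the DXE core's real data structures. */
typedef struct {
    const char *name;
    const char *depex;    /* required protocol, or NULL (always satisfied) */
    const char *produces; /* protocol installed from the entry point, or NULL */
    bool        loaded;
} Driver;

#define MAX_PROTOCOLS 8
static const char *installed[MAX_PROTOCOLS];
static int         num_installed;

static bool protocol_installed(const char *guid)
{
    for (int i = 0; i < num_installed; i++)
        if (strcmp(installed[i], guid) == 0)
            return true;
    return false;
}

static void install_protocol(const char *guid)
{
    if (!protocol_installed(guid) && num_installed < MAX_PROTOCOLS)
        installed[num_installed++] = guid;
}

/* One "Dispatch()" call: repeat passes until no further Depex is satisfied. */
static int dispatch(Driver *drivers, int count)
{
    int  loaded = 0;
    bool progress = true;

    while (progress) {
        progress = false;
        for (int i = 0; i < count; i++) {
            Driver *d = &drivers[i];
            if (d->loaded || (d->depex != NULL && !protocol_installed(d->depex)))
                continue;
            d->loaded = true;
            if (d->produces != NULL)
                install_protocol(d->produces);
            loaded++;
            progress = true;
        }
    }
    return loaded;
}

/* Returns true if the network driver (which inherits the RNG Depex from
 * DxeNetLib) ends up loaded in this model. */
bool net_driver_loaded(bool rdrand_available, bool dispatch_after_connect)
{
    Driver drivers[] = {
        { "NetDriver", "gEfiRngProtocolGuid", NULL, false },
        /* RngDxe installs the RNG protocol at dispatch time on RDRAND CPUs. */
        { "RngDxe", NULL, rdrand_available ? "gEfiRngProtocolGuid" : NULL, false },
        /* Driver-model driver: loading it installs only its driver binding. */
        { "VirtioRngDxe", NULL, NULL, false },
    };
    const int count = (int)(sizeof drivers / sizeof drivers[0]);

    num_installed = 0;
    dispatch(drivers, count);                /* DXE phase */
    install_protocol("gEfiRngProtocolGuid"); /* BDS connects virtio-rng-pci */
    if (dispatch_after_connect)
        dispatch(drivers, count);            /* Depex re-evaluated only here */
    return drivers[0].loaded;
}
```

In this model net_driver_loaded(false, false) is false: without RDRAND the network driver stays pending unless something calls the dispatcher again after the virtio-rng device has been connected. With RDRAND, net_driver_loaded(true, false) is true because RngDxe satisfies the Depex during the initial DXE dispatch, which is consistent with “max” working.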
Going in this direction, I found a hack that solves the problem, but it's
obviously not the right solution (sorry, I have little experience with edk2).

By analyzing the calls to the dispatcher (`gDS->Dispatch ()`), I found that
when we only have COM1, EfiBootManagerConnectDevicePath() at some point
invokes `gDS->Dispatch ()` after VirtioRng has started. This call then
dispatches the network drivers that link against DxeNetLib. With both COM1
and COM2, on the other hand, I don't see this call, maybe because
`RemainingDevicePath` is empty in this case, since edk2 was able to
initialize both ports; but this is just an idea.

So the hack is the following; it forces a call to the dispatcher on every
call of EfiBootManagerConnectDevicePath():

diff --git a/MdeModulePkg/Library/UefiBootManagerLib/BmConnect.c b/MdeModulePkg/Library/UefiBootManagerLib/BmConnect.c
index d1fb0f72ba..621f90d297 100644
--- a/MdeModulePkg/Library/UefiBootManagerLib/BmConnect.c
+++ b/MdeModulePkg/Library/UefiBootManagerLib/BmConnect.c
@@ -121,6 +121,8 @@ EfiBootManagerConnectDevicePath (
   }
 
   CurrentTpl = EfiGetCurrentTpl ();
+  Status = gDS->Dispatch ();
+  DEBUG ((DEBUG_INFO, "%a extra gDS->Dispatch () - Status: %r\n", __func__, Status));
   //
   // Start the real work of connect with RemainingDevicePath
   //

I'm trying to understand better how the dispatcher works. I think the
problem is related to the dispatcher and some dependency, but my knowledge
is limited. Any suggestions are more than welcome.

Thanks,
Stefano

>
> Any help is very much appreciated!
>
> Regards,
> Stefano and Oliver
>
> [1] https://issues.redhat.com/browse/RHEL-58631
> [2] https://osteffen.fedorapeople.org/OvmfNetbootRngIssue/UefiShell.iso
> [3] https://osteffen.fedorapeople.org/OvmfNetbootRngIssue/edk2_PXE_issue_COM1_COM2.log
> [4] https://osteffen.fedorapeople.org/OvmfNetbootRngIssue/edk2_PXE_working_COM1.log
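Stefano's observation fits how, as far as I can tell, the connect loop in upstream BmConnect.c works: it calls gDS->Dispatch () only when gBS->LocateDevicePath () returns the same handle as in the previous iteration, i.e. when ConnectController () made no forward progress (and only at TPL_APPLICATION). Below is a deliberately simplified model of that rule, not the library code; PathModel, connect_device_path() and dispatch_calls are hypothetical names, and the model assumes Dispatch() finds no new driver for the path being connected. It would explain why a configuration in which every path connects cleanly never triggers that extra dispatcher pass, though it does not by itself prove why two serial ports make the difference.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical model: a device path reduced to `len` nodes; driver_for[i]
 * says whether an already-loaded driver can create the handle for the next
 * node when ConnectController() is called on node i. */
typedef struct {
    int  len;
    bool driver_for[MAX_NODES];
    int  dispatch_calls; /* side effect of interest: Depex re-evaluation */
} PathModel;

/* Sketch of the forward-progress rule; returns true if the whole path was
 * connected, false for the EFI_NOT_FOUND case. */
bool connect_device_path(PathModel *m)
{
    int depth    = 0;  /* deepest node that already has a handle */
    int previous = -1; /* depth seen in the previous iteration */

    while (depth < m->len) {
        if (depth == previous) {
            /* Same handle as last time: no progress, so try the dispatcher.
             * In this model Dispatch() finds nothing new for this path. */
            m->dispatch_calls++;
            return false;
        }
        previous = depth;
        if (m->driver_for[depth])
            depth++; /* ConnectController() expanded one more node */
    }
    return true;
}
```

With every node connectable, connect_device_path() returns true and dispatch_calls stays 0; with one node that no loaded driver can expand, the loop calls the dispatcher once and returns false.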
View/Reply Online (#120624): https://edk2.groups.io/g/devel/message/120624