On Sat, 20 May 2023 at 10:37, Oliver Steffen <ostef...@redhat.com> wrote:
>
> Quoting Ard Biesheuvel (2023-05-19 23:36:53)
> > On Fri, 19 May 2023 at 18:32, Oliver Steffen <ostef...@redhat.com> wrote:
> > >
> > >
> > > Hi all,
> > >
> > > I had another look at this and I can now reproduce the issue consistently,
> > > with a quite minimal setup, on recent Linux kernel, Qemu, and EDK2.
> > > It requires rebooting the guest in a tight loop. It happens in silent
> > > and verbose builds alike, but since the verbose ones are slowed down by
> > > the serial output, it takes longer to hit the issue.
> > > It is possible to reproduce it with the silent builds within a few
> > > minutes.
> > > For the verbose case I recommend running multiple Qemu instances in
> > > parallel (as many as the machine allows, in my case ~100).
> > >
> >
> > Thanks a lot for all these details, this is extremely helpful.
> >
> > So what appears to be happening is that we split the 2M block mapping
> > that covers the code that we were called from, and hit a level 2
> > translation fault because the updated page table entry is still
> > observed to be in its transient 'invalid' state as we return to it.
> >
> > Could you please check whether this makes a difference?
> >
> > --- a/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S
> > +++ b/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S
> > @@ -65,6 +65,7 @@
> >    // write updated entry
> >    str x1, [x0]
> >    dsb nshst
> > +  isb
> >
> >  .L2_\@:
> >    .endm
>
> That fixes it - no crash observed within 150k iterations.
> Thanks, Ard!
>
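The diagnosis hinges on the 2M block being split also covering the code that called into the replace-entry helper. As a minimal sketch (Python, assuming the 4 KiB translation granule, which is the granule for which a level 2 block entry maps 2 MiB), the arithmetic below shows how a virtual address selects its level 2 and level 3 entries, how to tell whether two addresses share one 2 MiB block, and which 4 KiB pages replace a block when it is split. The function names and example addresses are hypothetical and not taken from ArmMmuLib.

```python
# Sketch only: assumes a 4 KiB translation granule, where a level 2 block
# entry maps 2 MiB and a level 3 page entry maps 4 KiB. Names are hypothetical.
PAGE_SHIFT = 12            # 4 KiB pages (level 3 entries)
BLOCK_SHIFT = 21           # 2 MiB blocks (level 2 entries)
ENTRIES_PER_TABLE = 512    # a 4 KiB table holds 512 8-byte descriptors


def l2_index(va: int) -> int:
    """Index of the level 2 entry that translates va (VA bits [29:21])."""
    return (va >> BLOCK_SHIFT) & (ENTRIES_PER_TABLE - 1)


def l3_index(va: int) -> int:
    """Index of the level 3 entry that translates va (VA bits [20:12])."""
    return (va >> PAGE_SHIFT) & (ENTRIES_PER_TABLE - 1)


def same_block(a: int, b: int) -> bool:
    """True if a and b are covered by the same 2 MiB block mapping."""
    return (a >> BLOCK_SHIFT) == (b >> BLOCK_SHIFT)


def split_block(block_base: int) -> list[int]:
    """Base addresses of the 512 4 KiB pages that replace one 2 MiB block."""
    if block_base % (1 << BLOCK_SHIFT) != 0:
        raise ValueError("block base must be 2 MiB aligned")
    return [block_base + (i << PAGE_SHIFT) for i in range(ENTRIES_PER_TABLE)]


if __name__ == "__main__":
    code_va = 0x40201234     # hypothetical address of the calling code
    target_va = 0x40300000   # hypothetical page whose attributes change
    print(same_block(code_va, target_va))           # True
    print(l2_index(code_va), l3_index(target_va))   # 1 256
```

In this example the calling code and the page being remapped both live in the block at 0x40200000, so splitting that block to change the latter's attributes also passes the mapping of the running code through the transient 'invalid' state described above.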
Fantastic! Thanks a lot for all the effort in tracking this down.