On Tue, 7 Feb 2023 at 11:51, Oliver Steffen <ostef...@redhat.com> wrote:
>
> On Thu, Feb 2, 2023 at 12:09 PM Oliver Steffen <ostef...@redhat.com> wrote:
>>
>>
>> On Wed, Feb 1, 2023 at 2:29 PM Ard Biesheuvel <a...@kernel.org> wrote:
>>>
>>> On Wed, 1 Feb 2023 at 13:59, Oliver Steffen <ostef...@redhat.com> wrote:
>>> >
>>> > On Wed, Feb 1, 2023 at 12:52 PM Ard Biesheuvel <a...@kernel.org> wrote:
>>> >>
>>> >> On Wed, 1 Feb 2023 at 10:14, Oliver Steffen <ostef...@redhat.com> wrote:
>>> >> >
>> >> [...]
>>>
>>> >> > I am sorry, this story does not seem to be over yet.
>>> >> >
>>> >> > We are using the Erratum patch and also included the commit 406504c7 in
>>> >> > the kernel.
>>> >> > Now the firmware crashes sometimes (10 out of 89 tests).
>>> >> >
>>> >>
>>> >> Thanks for the report. Is this still on ThunderX2?
>>> >>
>>> >> > Any hints are very welcome!
>>> >> >
>>> >>
>>> >> Do you have access to those build artifacts?
>>> >
>>> >
>>> > https://kojihub.stream.centos.org/kojifiles/work/tasks/5251/1835251/edk2-aarch64-20221207gitfff6d81270b5-4.el9.test.noarch.rpm
>>> >
>>> > and/or here:
>>> >
>>> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=1835251
>>> >
>>> > Source for reference:
>>> > https://gitlab.com/redhat/centos-stream/src/edk2/-/merge_requests/24
>>> >
>>>
>>> Any chance the .dll files (which are actually ELF executables) have
>>> been preserved somewhere?
>>
>> Here is the build folder (~90MB):
>> https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/armvirt-thunderx2-issue.tar.xz
>>
>> I am waiting for the tests with the additional debug output to run.
>
>
> We reran the test suite with the Erratum and the additional debug
> output enabled. Strangely, the problem does not occur anymore, the
> firmware boots up normally.
>
> We retried the tests without the additional debug output.
> RHEL ships two firmware flavors for AARCH64: a silent and a verbose
> version.

Are these RELEASE vs DEBUG builds?

> Both were tried. We see no problems with the verbose
> one. The silent one fails noticeably more often if a software TPM device
> is present.
>

This smells like some missing cache or TLB maintenance - the verbose
one exits to the host much more often, and likely relies on cache/TLB
maintenance occurring in the hypervisor.

So the build always includes TPM support but the issue only occurs
when the sw TPM is actually exposed by QEMU?

> Could this be related to how much stuff is going on in the early phase
> of the firmware (when logging is enabled: formatting of messages and
> sending to serial port...) ?
>

I'll try to see if I can rig something up that logs into a buffer
rather than straight to the serial, and dump it all out when handling
the crash


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links:
View/Reply Online (#99734): https://edk2.groups.io/g/devel/message/99734
-=-=-=-=-=-=-=-=-=-=-=-
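
For reference, the idea of logging into a buffer rather than straight to
the serial port, and dumping it when handling the crash, could be sketched
roughly as below. This is only an illustration in plain hosted C, not
EDK2 code: the names log_to_buffer, log_dump and LOG_BUF_SIZE are
hypothetical, and vsnprintf stands in for whatever print routine the
firmware would really use. The point is that logging becomes a few memory
writes with no exit to the host, so it should perturb timing far less than
the verbose build does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical size; must be a power of two so the index can be masked. */
#define LOG_BUF_SIZE 4096

static char   log_buf[LOG_BUF_SIZE];
static size_t log_head;   /* total number of bytes ever logged */

/* Format a message and append it to the in-memory ring.
 * Nothing is sent to the serial port here; old data is overwritten
 * once the ring is full. */
static void log_to_buffer(const char *fmt, ...)
{
    char    msg[256];
    va_list ap;
    int     n;

    va_start(ap, fmt);
    n = vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);

    if (n < 0)
        return;                       /* formatting error: drop the message */
    if ((size_t)n >= sizeof msg)
        n = (int)sizeof msg - 1;      /* message was truncated */

    for (int i = 0; i < n; i++)
        log_buf[log_head++ & (LOG_BUF_SIZE - 1)] = msg[i];
}

/* Copy the retained log, oldest byte first, into 'out' and NUL-terminate.
 * Returns the number of bytes copied. A crash handler would call this
 * (or walk the ring directly) and only then write to the serial port. */
static size_t log_dump(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    size_t avail = log_head < LOG_BUF_SIZE ? log_head : LOG_BUF_SIZE;
    size_t start = log_head - avail;
    size_t n = 0;

    while (n < avail && n + 1 < out_size) {
        out[n] = log_buf[(start + n) & (LOG_BUF_SIZE - 1)];
        n++;
    }
    out[n] = '\0';
    return n;
}
```

In the firmware this would be hooked in wherever the debug output is
currently emitted, with the dump done from the exception handler. Whether
the buffer itself needs cache maintenance before it can be read out
reliably is a separate question that this sketch does not address.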