On Tue, Feb 7, 2023 at 12:57 PM Ard Biesheuvel <a...@kernel.org> wrote:

> On Tue, 7 Feb 2023 at 11:51, Oliver Steffen <ostef...@redhat.com> wrote:
> >
> > On Thu, Feb 2, 2023 at 12:09 PM Oliver Steffen <ostef...@redhat.com> wrote:
> >>
> >>
> >> On Wed, Feb 1, 2023 at 2:29 PM Ard Biesheuvel <a...@kernel.org> wrote:
> >>>
> >>> On Wed, 1 Feb 2023 at 13:59, Oliver Steffen <ostef...@redhat.com> wrote:
> >>> >
> >>> > On Wed, Feb 1, 2023 at 12:52 PM Ard Biesheuvel <a...@kernel.org> wrote:
> >>> >>
> >>> >> On Wed, 1 Feb 2023 at 10:14, Oliver Steffen <ostef...@redhat.com> wrote:
> >>> >> >
> >>
> >> [...]
> >>>
> >>> >> > I am sorry, this story does not seem to be over yet.
> >>> >> >
> >>> >> > We are using the Erratum patch and also included the commit 406504c7 in
> >>> >> > the kernel.
> >>> >> > Now the firmware crashes sometimes (10 out of 89 tests).
> >>> >> >
> >>> >>
> >>> >> Thanks for the report. Is this still on ThunderX2?
> >>> >>
> >>> >> > Any hints are very welcome!
> >>> >> >
> >>> >>
> >>> >> Do you have access to those build artifacts?
> >>> >
> >>> >
> >>> > https://kojihub.stream.centos.org/kojifiles/work/tasks/5251/1835251/edk2-aarch64-20221207gitfff6d81270b5-4.el9.test.noarch.rpm
> >>> >
> >>> > and/or here:
> >>> >
> >>> > https://kojihub.stream.centos.org/koji/taskinfo?taskID=1835251
> >>> >
> >>> > Source for reference:
> >>> > https://gitlab.com/redhat/centos-stream/src/edk2/-/merge_requests/24
> >>> >
> >>>
> >>> Any chance the .dll files (which are actually ELF executables) have
> >>> been preserved somewhere?
> >>
> >> Here is the build folder (~90MB):
> >> https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/armvirt-thunderx2-issue.tar.xz
> >>
> >> I am waiting for the tests with the additional debug output to run.
> >
> >
> > We reran the test suite with the Erratum and the additional debug
> > output enabled.  Strangely, the problem does not occur anymore; the
> > firmware boots up normally.
> >
> > We retried the tests without the additional debug output.
> > RHEL ships two firmware flavors for AARCH64: a silent and a verbose
> > version.
>
> Are these RELEASE vs DEBUG builds?
>

All builds are DEBUG; only the amount of information printed on
the serial port differs (almost none for the "silent" one).


> > Both were tried. We see no problems with the verbose
> > one. The silent one fails noticeably more often if a software TPM device
> > is present.
> >
>
> This smells like some missing cache or TLB maintenance - the verbose
> one exits to the host much more often, and likely relies on cache/TLB
> maintenance occurring in the hypervisor.
>
> So the build always includes TPM support but the issue only occurs
> when the sw TPM is actually exposed by QEMU?
>

Yes.
All builds include support for TPM, but the issue occurs more frequently
if a sw TPM is exposed by QEMU.


> > Could this be related to how much stuff is going on in the early phase
> > of the firmware (when logging is enabled: formatting of messages and
> > sending to serial port...) ?
> >
>
> I'll try to see if I can rig something up that logs into a buffer
> rather than straight to the serial, and dump it all out when handling
> the crash
>
Awesome.
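
For illustration only, here is a minimal sketch of what logging into a
buffer instead of the serial port could look like. It is not EDK2's
DebugLib and not necessarily what Ard has in mind: the names
log_buffered(), log_dump() and LOG_BUF_SIZE are hypothetical, and the
sketch assumes a single writer and a statically allocated buffer.

```c
#include <stddef.h>

/* Hypothetical capacity; a real firmware build would size this to fit. */
#define LOG_BUF_SIZE 4096

static char   log_buf[LOG_BUF_SIZE];
static size_t log_head;   /* index of the next byte to write */
static size_t log_count;  /* number of valid bytes currently held */

/* Append a message to the ring buffer without touching the serial port.
 * When the buffer is full, the oldest bytes are overwritten. */
void log_buffered(const char *msg)
{
    for (; *msg != '\0'; msg++) {
        log_buf[log_head] = *msg;
        log_head = (log_head + 1) % LOG_BUF_SIZE;
        if (log_count < LOG_BUF_SIZE)
            log_count++;
    }
}

/* Drain up to out_size buffered bytes, oldest first, into out.
 * Returns the number of bytes copied; no terminator is appended. */
size_t log_dump(char *out, size_t out_size)
{
    size_t start = (log_head + LOG_BUF_SIZE - log_count) % LOG_BUF_SIZE;
    size_t n = log_count < out_size ? log_count : out_size;

    for (size_t i = 0; i < n; i++)
        out[i] = log_buf[(start + i) % LOG_BUF_SIZE];

    log_count -= n;
    return n;
}
```

A crash handler would call log_dump() in a loop and write each chunk to
the serial port, so the early boot path never exits to the host just to
print a message.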

Thanks,
 Oliver



