On Wed, 7 Aug 2024 15:23:57 +0200 Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> On Wed, 7 Aug 2024 10:34:36 +0100
> Jonathan Cameron <jonathan.came...@huawei.com> wrote:
>
> > On Wed, 7 Aug 2024 09:47:50 +0200
> > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> >
> > > On Tue, 6 Aug 2024 16:31:13 +0200
> > > Igor Mammedov <imamm...@redhat.com> wrote:
> > >
> > > > PS:
> > > > looking at the code, ACPI_GHES_MAX_RAW_DATA_LENGTH is 1K
> > > > and it is the total size of an error block for an error source.
> > > >
> > > > However acpi_hest_ghes.rst (3) says it should be 4K,
> > > > am I mistaken?
> > >
> > > Maybe Jonathan knows better, but I guess the 1K was just some
> > > arbitrary limit to prevent an overly large CPER. The 4K limit
> > > described in acpi_hest_ghes.rst could be just some limit to cope
> > > with the current BIOS implementation, but I didn't check myself
> > > how this is implemented there.
> > >
> > > I was unable to find any limit in the specs. Yet, if you look at:
> > >
> > > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-section
> > >
> >
> > I think both limits are just made up. You can in theory log huge
> > error records. Just no one does.
>
> If both are made up, I would sync them, either patching the
> documentation or the ghes driver.
>
> > >
> > > The processor Error Information Structure, starting at offset
> > > 40, can go up to 255*32, meaning an offset of 8200, which is
> > > bigger than 4K.
> > >
> > > Going further, processor context can have up to 65535 (spec
> > > actually says 65536, but that sounds like a typo, as the size is
> > > stored in a uint16_t), containing multiple register values
> > > there (the spec calls its length "P").
> > >
> > > So, the CPER record could, in theory, have:
> > >     8200 + (65535 * P) + sizeof(vendor-specific-info)
> > >
> > > The CPER length is stored in the Section Length record, which is
> > > uint32_t.
> > >
> > > So, I'd say that the GHES record can theoretically be a lot
> > > bigger than 4K.
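
To make the arithmetic quoted above concrete, here is a small Python sketch.
It only restates the numbers given in this thread (offset 40, 32 bytes per
Error Information Structure, at most 255 of them, at most 65535 contexts of
length P); I have not re-checked them against the spec, and the function name
and its parameters are made up for illustration.

```python
# Figures taken from the discussion above, not re-checked against the spec.
ERR_INFO_OFFSET = 40    # offset where the Error Information Structures start
ERR_INFO_SIZE = 32      # bytes per Error Information Structure
MAX_ERR_INFO = 255      # maximum number of such structures
MAX_CONTEXTS = 65535    # largest count that fits in a uint16_t


def max_arm_cper_size(context_len, vendor_len=0):
    """Upper bound, in bytes, for one ARM processor error section.

    context_len is "P", the length of one processor context structure;
    vendor_len is the size of the vendor-specific info. Both are
    hypothetical inputs chosen by the caller.
    """
    err_info_end = ERR_INFO_OFFSET + MAX_ERR_INFO * ERR_INFO_SIZE
    return err_info_end + MAX_CONTEXTS * context_len + vendor_len


if __name__ == "__main__":
    # Error information alone, with no context and no vendor data.
    print(max_arm_cper_size(0))
    print(max_arm_cper_size(0) > 4096)
    # With a 264-byte context, the type 4 GPR dump size discussed in this thread.
    print(max_arm_cper_size(264))
```

Even with P = 0 the bound already exceeds 4K, and any realistic P pushes it
far beyond that while still fitting in the uint32_t Section Length.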
> > Agreed - but I don't think we care for testing as long as it's
> > big enough for plausible records. Unless you really want
> > to fuzz the limits?
>
> Fuzzing the limits could be interesting, but it is not in my
> current plans.
>
> Yet, 1K could be a little bit short for ARM CPER.
>
> See: N.26 ARMv8 AArch64 GPRs (Type 4) has 256 bytes for
> registers, plus 8 bytes for the header. So, a total size of
> 264 bytes, for a single context register dump. I would expect
> that, in real life, type 4 is always reported on aarch64,
> on BIOS with context register support. Maybe other types could
> also be dumped together (like context registers for EL1,
> EL2 and/or EL3).
>
> If just one type 4 context is encoded, it means that 1K has
> space for 23 errors (of a max limit of 255).
>
> Just looking at the maximum number, my feeling is that 1K
> might be too short to simulate some real life reports,
> but that depends on how firmware is actually grouping
> such events.

As far as I know, firmware is out of the picture here,
since all it does in the HEST case is allocate contiguous space
for the 'etc/hardware_errors' blob as QEMU told it to.

>
> So, maybe this could be expanded to, let's say, 4K, thus
> aligning with the ReST documentation.

Maybe, to get moving, first get your series in with the docs fixed
to today's limit, and then increase the error_block size to the desired
value on top of that, as it's really not relevant to what you
are doing here.

> Regards,
> Mauro