On Wed, 26 Feb 2025 10:56:28 +0100 Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> On Tue, 25 Feb 2025 11:01:15 +0100
> Igor Mammedov <imamm...@redhat.com> wrote:
>
> > On Fri, 21 Feb 2025 10:21:27 +0000
> > Jonathan Cameron <jonathan.came...@huawei.com> wrote:
> >
> > > On Fri, 21 Feb 2025 07:38:23 +0100
> > > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> > >
> > > > On Mon, 3 Feb 2025 16:22:36 +0100
> > > > Igor Mammedov <imamm...@redhat.com> wrote:
> > > >
> > > > > On Mon, 3 Feb 2025 11:09:34 +0000
> > > > > Jonathan Cameron <jonathan.came...@huawei.com> wrote:
> > > > >
> > > > > > On Fri, 31 Jan 2025 18:42:41 +0100
> > > > > > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> > > > > >
> > > > > > > Now that the ghes preparation patches were merged, let's add support
> > > > > > > for error injection.
> > > > > > >
> > > > > > > In this series, the first 6 patches change the math used to calculate
> > > > > > > offsets at the HEST table and the hardware_error firmware file, together
> > > > > > > with its migration code. Migration was tested with both the latest QEMU
> > > > > > > release and upstream, in both directions.
> > > > > > >
> > > > > > > The next patches add a new QAPI to allow injecting GHESv2 errors, and a
> > > > > > > script using such QAPI to inject ARM Processor Error records.
> > > > > > >
> > > > > > > If I'm counting well, this is the 19th submission of my error inject
> > > > > > > patches.
> > > > > >
> > > > > > Looks good to me. All remaining trivial things are in the category
> > > > > > of things to consider only if you are doing another spin. The code
> > > > > > ends up how I'd like it at the end of the series anyway, just
> > > > > > a question of the precise path to that state!
> > > > >
> > > > > if you look at the series as a whole it's more or less fine (I guess you
> > > > > and me got used to it)
> > > > >
> > > > > however if you take it patch by patch (as if you've never seen it)
> > > > > the ordering is messed up (the same would apply to everyone after a while,
> > > > > once it's forgotten)
> > > > >
> > > > > So I'd strongly suggest restructuring the series (especially 2-6/14).
> > > > > To sum up my comments wrt ordering again:
> > > > >
> > > > > 0: add a testcase for the HEST table with the current HEST as the expected
> > > > >    blob (currently missing), so that we can be sure that we haven't messed
> > > > >    up existing tables during refactoring.
> > >
> > > To potentially save time, I think Igor is asking that before you do anything
> > > at all you plug the existing test hole, which is that we don't test HEST
> > > at all. Even after this series I think we don't test HEST. You add
> > > a stub HEST and exclusion, but then in patch 12 the HEST stub is deleted,
> > > whereas it should be replaced with the example data for the test.
> >
> > that's what I was saying.
> > The HEST table should be in the DSDT, but it's optional and one has to use
> > the 'ras=on' option to enable that, which we aren't doing ATM.
> > So whatever changes are happening, we aren't seeing them in tests,
> > nor will we see any regression, for the same reason.
> >
> > While whitelisting the tables before the change and then updating them
> > is the right thing to do, it's not sufficient, since none of the tests
> > run with 'ras' enabled, hence the code is not actually executed.
>
> Ok. Well, again, we're not modifying the HEST table structure in this
> changeset. The only change affecting HEST is that the number of entries
> increased from 1 to 2.
>
> Now, looking at bios-tables-test.c, if I got it right, I should be doing
> something similar to the enclosed patch, right?
>
> If so, I have a couple of questions:
>
> 1. from where should I get the HEST table? by dumping the table from the
>    running VM?
>
> 2. what values should I use to fill those variables:
>
>     int hest_offset = 40 /* HEST */;
>     int hest_entry_size = 4;

you don't need to do that, bios-tables-test will dump all ACPI tables
for you automatically; you only need to add or extend a test with the ras=on option.

1: first add an empty table and whitelist it ("tests/data/acpi/aarch64/virt/HEST")
   (a minimal sketch of this step is appended at the end of this mail)
2: enable ras in an existing testcase

--- a/tests/qtest/bios-tables-test.c
+++ b/tests/qtest/bios-tables-test.c
@@ -2123,7 +2123,8 @@ static void test_acpi_aarch64_virt_tcg(void)
     data.smbios_cpu_max_speed = 2900;
     data.smbios_cpu_curr_speed = 2700;
     test_acpi_one("-cpu cortex-a57 "
-                  "-smbios type=4,max-speed=2900,current-speed=2700", &data);
+                  "-smbios type=4,max-speed=2900,current-speed=2700 "
+                  "-machine ras=on", &data);
     free_test_data(&data);
 }

   then, with IASL installed, run

     V=1 QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/bios-tables-test

   to see the diff

3: rebuild the tables and follow the rest of the procedure to update the expected
   blobs, as described in the comment at the top of tests/qtest/bios-tables-test.c

I'd recommend adding these 3 patches at the beginning of the series;
that way we can be sure that if something changes unintentionally
it won't go unnoticed.

> > >
> > > That indeed doesn't address testing the error data storage, which would be
> > > a different problem.
> >
> > I'd skip hardware_errors/CPER testing from QEMU unit tests.
> > That basically requires a functioning 'APEI driver' to test.
> >
> > Maybe we can use Ani's framework to parse HEST and go all the way
> > towards CPER record(s) traversal, but that's certainly out of
> > scope of this series.
> > It could be done on top, but I won't insist even on that,
> > since Mauro's out-of-tree error injection testing will
> > cover that using an actual guest (which I assume he would
> > like to run periodically).
>
> Yeah, my plan is to periodically test it. I intend to set up somewhere
> a CI to test the Kernel, QEMU and rasdaemon altogether.
>
> Thanks,
> Mauro
>
> ---
>
> diff --git a/tests/qtest/bios-tables-test.c b/tests/qtest/bios-tables-test.c
> index 0a333ec43536..31e69d906db4 100644
> --- a/tests/qtest/bios-tables-test.c
> +++ b/tests/qtest/bios-tables-test.c
> @@ -210,6 +210,8 @@ static void test_acpi_fadt_table(test_data *data)
>      uint32_t val;
>      int dsdt_offset = 40 /* DSDT */;
>      int dsdt_entry_size = 4;
> +    int hest_offset = 40 /* HEST */;
> +    int hest_entry_size = 4;
>
>      g_assert(compare_signature(&table, "FACP"));
>
> @@ -242,6 +244,12 @@ static void test_acpi_fadt_table(test_data *data)
>      /* update checksum */
>      fadt_aml[9 /* Checksum */] = 0;
>      fadt_aml[9 /* Checksum */] -= acpi_calc_checksum(fadt_aml, fadt_len);
> +
> +
> +
> +    acpi_fetch_table(data->qts, &table.aml, &table.aml_len,
> +                     fadt_aml + hest_offset, hest_entry_size, "HEST", true);
> +    g_array_append_val(data->tables, table);
>  }
>
>  static void dump_aml_files(test_data *data, bool rebuild)
> @@ -2411,7 +2419,7 @@ static void test_acpi_aarch64_virt_oem_fields(void)
>      };
>      char *args;
>
> -    args = test_acpi_create_args(&data, "-cpu cortex-a57 "OEM_TEST_ARGS);
> +    args = test_acpi_create_args(&data, "-ras on -cpu cortex-a57 "OEM_TEST_ARGS);
>      data.qts = qtest_init(args);
>      test_acpi_load_tables(&data);
>      test_oem_fields(&data);
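
P.S.: a minimal sketch of step 1 above, i.e. the empty expected blob plus the
whitelist entry. The allowed-diff mechanism is the standard bios-tables-test one,
but treat the exact hunk below as illustrative rather than something taken from
this series:

  # create an empty expected blob so the harness has something to compare against
  touch tests/data/acpi/aarch64/virt/HEST

  # temporarily whitelist the new table
  --- a/tests/qtest/bios-tables-test-allowed-diff.h
  +++ b/tests/qtest/bios-tables-test-allowed-diff.h
  @@ -1 +1,2 @@
   /* List of comma-separated changed AML files to ignore */
  +"tests/data/acpi/aarch64/virt/HEST",

once the real expected blob is generated in step 3, the entry is dropped from
bios-tables-test-allowed-diff.h again.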