Re: [PATCH v3 00/29] PowerPC interrupt rework

Matheus K. Ferst Fri, 21 Oct 2022 07:23:25 -0700

On 21/10/2022 07:56, Daniel Henrique Barboza wrote:

Matheus,
I did some digging yesterday. There are 2 distinct things happening:
- the apparent problem with the avocado test. After doing more and more tests
it seems like the test failure rate is lower than 10%. With a simple script
to exercise it on my laptop:

n=1
while true; do
        make -j check-avocado \
                AVOCADO_TESTS='tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500'
        if [ $? -ne 0 ]; then
                echo "test failed after $n iterations"
                exit 1
        fi
        ((n=n+1))
done
In master I managed to get up to 100+ runs without failure. Sometimes I get
90, 50, 30 runs before failure and so on. This is an OK failure rate in my
opinion, so if any code contribution does not dramatically increase this
failure rate I'm fine with it. This also means that I'll not be skipping the
test.

Thanks for this testing. I suspect we may have more than one bug that causes this test failure.
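
To compare failure rates between master and a patched tree, rather than stopping at the first failure, a variation of the loop above could count failures over a fixed number of runs. This is only a sketch: the function name count_failures and the run count in the usage comment are made up for illustration and are not part of QEMU or its test suite.

```shell
# Hypothetical helper: run a command a fixed number of times and print
# how many of those runs exited with a non-zero status.
# Usage: count_failures RUNS COMMAND [ARGS...]
# Example (assumed run count of 100):
#   count_failures 100 make -j check-avocado \
#       AVOCADO_TESTS='tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500'
count_failures() (
    # The body runs in a subshell so these variables do not leak out.
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" > /dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
)
```

Running the same number of iterations on both trees gives two counts that can be compared directly, which is less noisy than the number of runs before the first failure.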

- back to this series, I couldn't manage to get a single successful run with
patch 27 applied. On the other hand, running the aforementioned script with
patches 1-26 I just got 96 test runs before the first failure. This is enough
evidence for me to believe that, yeah, patch 27 is really doing something that
is messing with the icount replay for e500 one way or the other.

Patch 27 is definitely wrong: other places that write to special registers and SPRs that may cause an interrupt (e.g., gen_helper_store_decr, gen_mtmsr[d]) call gen_io_start, so we should also use it before helper_ppc_maybe_interrupt. Without that call, we hit the cpu_abort in icount_handle_interrupt when using icount if wrtee[i] unmasks a pending interrupt.

The current wrtee[i] may also be wrong in not calling it, as it may cause an interrupt to be delivered. However, before the interrupt rework, CPU_INTERRUPT_HARD was set somewhere else, so it wouldn't trigger the abort.

That said, even after adding this call I still see failures after ~200 iterations of this test, so we may have more problems to tackle here. However, it's not a CPU abort anymore: the second QEMU invocation exits with zero without writing anything to the console.

All that said, patches 1-26 are queued in ppc-next.


On 10/20/22 10:40, Matheus K. Ferst wrote:
On 20/10/2022 08:18, Daniel Henrique Barboza wrote:
On 10/19/22 18:55, Daniel Henrique Barboza wrote:
Matheus,
This series fails 'make check-avocado' in an e500 test. This is the error output:
Scrap that.
This avocado test is also failing on master 10% of the time, give or take.
It might be the case that patch 27 makes the failure more consistent, but I
can't say it's the culprit.
I'll take a closer look and see if I can diagnose one particular commit that
is making the test fail 1 out of 10 times. It may be the case that I need
to skip the test altogether.
Nice catch. I guess we need a gen_icount_io_start before calling helper_ppc_maybe_interrupt, so maybe it's better to make a gen_ppc_maybe_interrupt that calls icount and the helper. I'll give it a bit more testing and re-spin the series.
Don't need to re-spin everything (unless you needed to do some changes in
the patches prior). Just resend patch 27+.


Ok, I'll send 27-29 based on ppc-next.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
