On 10/24/23 02:50, Clément Chigot wrote:
Hi Richard,

This commit has broken some of our internal bareboard testing on
Risc-V 64. At some point in our programs, there is an AMOSWAP (=
atomic swap) instruction on I/O. But since this commit, can_do_io is
set to false triggering an infinite loop.
IIUC the doc (cf [1]), atomic operations on I/O are allowed.

I think there is a CF_LAST_IO flag missing somewhere to allow it, but
I'm not sure where this should be. Do you have any ideas ?

Sadly I cannot provide a reproducer that easily, mainly because our
microchip has a few patches not yet merged, so our binaries do not
run on upstream master.
But here is a bit of the in_asm backtrace:

   | IN: system__bb__riscv_plic__initialize
   | Priv: 3; Virt: 0
   | 0x80000eb4:  1141              addi                    sp,sp,-16
   | 0x80000eb6:  0c0027b7          lui                     a5,49154
   | 0x80000eba:  e406              sd                      ra,8(sp)
   | 0x80000ebc:  00010597          auipc                   a1,16    # 0x80010ebc
   | 0x80000ec0:  47458593          addi                    a1,a1,1140
   | 0x80000ec4:  f3ffe637          lui                     a2,-49154
   | 0x80000ec8:  01878693          addi                    a3,a5,24
   | 0x80000ecc:  00f58733          add                     a4,a1,a5
   | 0x80000ed0:  9732              add                     a4,a4,a2
   | 0x80000ed2:  4318              lw                      a4,0(a4)
   | 0x80000ed4:  2701              sext.w                  a4,a4
   | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
   | 0x80000eda:  0791              addi                    a5,a5,4
   | 0x80000edc:  fed798e3          bne                     a5,a3,-16    # 0x80000ecc
   |
   | ----------------
   | IN: system__bb__riscv_plic__initialize
   | Priv: 3; Virt: 0
   | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
   |
   | ----------------
   | IN: system__bb__riscv_plic__initialize
   | Priv: 3; Virt: 0
   | 0x80000ed6:  08e7a02f          amoswap.w               zero,a4,(a5)
   | * Freeze *

I would expect three translations:

(1) with the original TB, aborts execution on !can_do_io.
(2) with the second TB, we get further into the actual execution and abort execution on TLB_MMIO.
(3) with the third TB, we clear CF_PARALLEL and execute under cpu_exec_step_atomic.

Both 2 and 3 should have had CF_LAST_IO set.
You can verify this with '-d exec' output.

As a trivial example from qemu-system-alpha bios startup:

Trace 0: 0x7f2584008380 [00000000/fffffc0000003ee4/01000000/ff000000] uart_init_line
cpu_io_recompile: rewound execution of TB to fffffc0000003ee4
----------------
IN: uart_init_line
0xfffffc0000003f20:  stb        t8,0(t6)

Trace 0: 0x7f2584008a00 [00000000/fffffc0000003f20/01000000/ff018001] uart_init_line

Note that the final "/" field is cflags. The first "Trace" corresponds to (1), where the store is in the middle of the TB. You can see the io_recompile abort, then the second "Trace" contains {CF_COUNT=1, CF_LAST_IO, CF_MEMI_ONLY}.
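
If you want to check this mechanically, here is a minimal sketch that pulls the pc and cflags out of a '-d exec' "Trace" line and decodes the bits mentioned above. The mask values are my assumption about the bit layout and should be checked against the cflags definitions in your QEMU tree; the function names and the regular expression are hypothetical helpers, not anything QEMU provides.

```python
import re

# Assumed bit layout; verify against the cflags definitions in your QEMU tree.
CF_COUNT_MASK = 0x000001FF    # instruction count limit for the TB
CF_LAST_IO = 0x00008000       # last insn of the TB may perform I/O
CF_MEMI_ONLY = 0x00010000     # only instrument memory ops
CF_CLUSTER_MASK = 0xFF000000  # cpu cluster index

# Matches "Trace N: 0x<host ptr> [<field>/<pc>/<field>/<cflags>]".
TRACE_RE = re.compile(
    r"Trace \d+: 0x[0-9a-f]+ "
    r"\[([0-9a-f]+)/([0-9a-f]+)/([0-9a-f]+)/([0-9a-f]+)\]"
)


def decode_cflags(cflags):
    """Split a raw cflags word into the fields discussed in this thread."""
    return {
        "count": cflags & CF_COUNT_MASK,
        "last_io": bool(cflags & CF_LAST_IO),
        "memi_only": bool(cflags & CF_MEMI_ONLY),
        "cluster": (cflags & CF_CLUSTER_MASK) >> 24,
    }


def parse_trace_line(line):
    """Return (pc, decoded cflags) for a Trace line, or None if no match."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    pc = int(m.group(2), 16)
    cflags = int(m.group(4), 16)  # the final "/" field
    return pc, decode_cflags(cflags)
```

Feeding it the two Trace lines above should report last_io unset for the first TB and count=1 with last_io and memi_only set for the second.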

In the short term, try adding CF_LAST_IO to cflags in cpu_exec_step_atomic.
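
To make the bit arithmetic concrete, here is a small model of what that change amounts to. This is an illustration in Python, not the QEMU source: the helper name is hypothetical, the mask values are my assumption about the bit layout, and the one-instruction count reflects my understanding that the atomic step executes a single insn.

```python
# Assumed bit values; check them against the cflags definitions in your QEMU tree.
CF_COUNT_MASK = 0x000001FF
CF_LAST_IO = 0x00008000
CF_PARALLEL = 0x00080000


def atomic_step_cflags(current, with_last_io):
    """Model the cflags for a one-instruction TB run outside parallel context."""
    # Drop CF_PARALLEL and any previous count, then request exactly one insn.
    cflags = (current & ~CF_PARALLEL & ~CF_COUNT_MASK) | 1
    if with_last_io:
        # The suggested short-term change: mark the single insn as allowed to do I/O.
        cflags |= CF_LAST_IO
    return cflags
```

With the change applied, the cflags field of the third TB in the '-d exec' output should show the CF_LAST_IO bit alongside a count of 1.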

I think probably the logic of CF_LAST_IO should always be applied now, since can_do_io is always live, and thus the flag itself should go away.


r~
