On 16 March 2015 at 20:08, Emilio G. Cota <c...@braap.org> wrote:
> Removing the call to tlb_fill() on a TLB miss solves the problem.
> But of course this also means the helper doesn't work as intended.
>
> I fail to see why calling tlb_fill() from the helper causes
> trouble. What I thought would happen is that the exception
> (if any) is started from the helper, gets serviced, and then
> both the helper and the subsequent store hit in the TLB. I was
> seeing this as a "TLB prefetch", but I cannot make it work.

This isn't how tlb_fill handles page faults. What happens is:

1. tlb_fill calls arm_cpu_handle_mmu_fault to do the page table walk
2. if the page table indicates that the vaddr is invalid (i.e. we need
   to deliver a guest exception) then we return non-zero to tlb_fill
3. tlb_fill calls cpu_restore_state, passing it the address in
   generated TCG code where we were when the exception happened;
   magic happens here to fix up the CPU state (notably the guest PC)
   to the exact correct values at the guest load/store insn that
   caused the fault [using the (host) retaddr to determine exactly
   where in the TB we were and so which guest insn faulted]
4. tlb_fill calls raise_exception, which calls cpu_loop_exit
5. cpu_loop_exit *longjmps out of anything you were in the middle of*,
   back to the top level loop in cpu-exec.c
6. based on the changes to the CPU state made before calling
   cpu_loop_exit, the main loop determines that there's a pending
   exception, and resumes execution at the exception entry point
7. the guest OS may or may not end up fixing up the page tables and
   reattempting execution of whatever failed, but that's entirely
   up to it and might never happen

I suspect your problem is that the host retaddr in step 3 is wrong,
which will result in our generating the guest exception with a wrong
value for "guest PC at point of fault". Linux makes extensive use of
"if guest PC for this fault is in this magic bit of code then fix up
the result so it looks like this kernel load/store accessor function
returned -EFAULT". If you're reporting the wrong guest PC this won't
work and the kernel will end up in the default case path of it being
an unexpected kernel mode data abort and Oopsing.

I suggest you check whether the exception PC reported to the guest is
correct (it's probably reported by the kernel somewhere in the oops
output) compared to the addresses in the kernel of the
load/store/whatever that's faulted.

thanks
-- PMM
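
PS: in case it helps to see why the "TLB prefetch" picture can't work, here is a rough standalone sketch of the control flow around the raise_exception / cpu_loop_exit steps. It is not QEMU code: every toy_* name, main_loop_env, pending_exception and stores_done are made up for illustration, and the page table walk is reduced to a single boolean. The only point it tries to show is that once the fill path longjmps, nothing after the helper call runs.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not QEMU's real definitions. */
static jmp_buf main_loop_env;   /* plays the role of the top-level loop's jump buffer */
static int pending_exception;   /* recorded before unwinding */
static int stores_done;         /* counts stores that actually executed */

/* Toy cpu_loop_exit: unwind straight back to the top-level loop. */
static void toy_cpu_loop_exit(void)
{
    longjmp(main_loop_env, 1);
}

/* Toy tlb_fill: for an invalid page, record the exception and never return. */
static void toy_tlb_fill(bool page_valid)
{
    if (!page_valid) {
        pending_exception = 1;
        toy_cpu_loop_exit();    /* does not return */
    }
    /* Valid page: a real implementation would install the TLB entry here. */
}

/* The "prefetch" helper followed by the store it was meant to prepare. */
static void toy_helper_then_store(bool page_valid)
{
    toy_tlb_fill(page_valid);
    stores_done++;              /* reached only if toy_tlb_fill returned */
}

/* Toy top-level loop: returns 1 if an exception was taken, 0 otherwise. */
int toy_cpu_exec(bool page_valid)
{
    pending_exception = 0;
    if (setjmp(main_loop_env) == 0) {
        toy_helper_then_store(page_valid);
        return 0;               /* normal completion, store executed */
    }
    /* We got here via longjmp, so an exception must have been recorded. */
    assert(pending_exception != 0);
    return pending_exception;
}
```

Calling toy_cpu_exec(true) lets the store run, while toy_cpu_exec(false) comes back through the longjmp path with the store skipped. In the real code the state restored before that longjmp (the guest PC in particular) is what the guest sees, which is why a wrong retaddr matters so much.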