On 07/17/15 15:48, Marc Zyngier wrote:
> On Fri, 17 Jul 2015 14:39:55 +0100
> Laszlo Ersek <ler...@redhat.com> wrote:
>
>> On 07/17/15 15:28, Marc Zyngier wrote:
>>> On Fri, 17 Jul 2015 10:30:38 +0100
>>> Paolo Bonzini <pbonz...@redhat.com> wrote:
>>>
>>>> On 17/07/2015 06:44, Paolo Bonzini wrote:
>>>>> On 16/07/2015 21:05, Richard W.M. Jones wrote:
>>>>>> Sorry to spoil things, but I'm still seeing this bug, although it is
>>>>>> now a lot less frequent with your patch. I would estimate it happens
>>>>>> more often than 1 in 5 runs with qemu.git, and probably 1 in 200 runs
>>>>>> with qemu.git + the v2 patch series.
>>>>>>
>>>>>> It's the exact same hang in both cases.
>>>>>>
>>>>>> Is it possible that this patch doesn't completely close any race?
>>>>>>
>>>>>> Still, it is an improvement, so there is that.
>>>>>
>>>>> Would seem at first glance like a different bug.
>>>>>
>>>>> Interestingly, adding some "tracing" (qemu_clock_get_ns) makes the bug
>>>>> more likely: now it reproduces in about 10 tries. Of course :) adding
>>>>> other kinds of tracing instead make it go away again (>50 tries).
>>>>>
>>>>> Perhaps this:
>>>>>
>>>>>    i/o thread         vcpu thread                    worker thread
>>>>>    ---------------------------------------------------------------------
>>>>>    lock_iothread
>>>>>    notify_me = 1
>>>>>    ...
>>>>>    unlock_iothread
>>>>>                       lock_iothread
>>>>>                       notify_me = 3
>>>>>                       ppoll
>>>>>                       notify_me = 1
>>>>>                                                      bh->scheduled = 1
>>>>>                                                      event_notifier_set
>>>>>                       event_notifier_test_and_clear
>>>>>    ppoll
>>>>>    ^^ hang
>>>>>
>>>>> In the exact shape above, it doesn't seem too likely to happen, but
>>>>> perhaps there's another simpler case. Still, the bug exists.
>>>>>
>>>>> The above is not really related to notify_me. Here the notification is
>>>>> not being optimized away! So I wonder if this one has been there forever.
>>>>>
>>>>> Fam suggested putting the event_notifier_test_and_clear before
>>>>> aio_bh_poll(), but it does not work. I'll look more close
>>>>>
>>>>> However, an unconditional event_notifier_test_and_clear is pretty
>>>>> expensive. On one hand, obviously correctness comes first. On the
>>>>> other hand, an expensive operation at the wrong place can mask the race
>>>>> very easily; I'll let the fix run for a while, but I'm not sure if a
>>>>> successful test really says anything useful.
>>>>
>>>> So it may not be useful, but still successful test is successful. :)
>>>> The following patch, which also includes the delta between v2 and v3
>>>> of this series, survived 674 reboots before hitting a definitely
>>>> unrelated problem:
>>>>
>>>> error: kvm run failed Function not implemented
>>>> PC=00000000bf671210 SP=00000000c00001f0
>>>> X00=000000000a003e70 X01=0000000000000000 X02=00000000bf680548 X03=0000000000000030
>>>> X04=00000000bbb5fc18 X05=00000000004b7770 X06=00000000bf721930 X07=000000000000009a
>>>> X08=00000000bf716858 X09=0000000000000090 X10=0000000000000000 X11=0000000000000046
>>>> X12=00000000a007e03a X13=0000000000000000 X14=0000000000000000 X15=0000000000000000
>>>> X16=00000000bf716df0 X17=0000000000000000 X18=0000000000000000 X19=00000000bf6f5f18
>>>> X20=0000000000000000 X21=0000000000000000 X22=0000000000000000 X23=0000000000000000
>>>> X24=0000000000000000 X25=0000000000000000 X26=0000000000000000 X27=0000000000000000
>>>> X28=0000000000000000 X29=0000000000000000 X30=0000000000000000
>>>> PSTATE=60000305 (flags -ZC-)
>>>>
>>>> For the record, this is the kvm_run struct:
>>>>
>>>> $6 = {request_interrupt_window = 0 '\000', padding1 = "\000\000\000\000\000\000",
>>>>   exit_reason = 0, ready_for_interrupt_injection = 0 '\000', if_flag = 0 '\000',
>>>>   flags = 0, cr8 = 0, apic_base = 0,
>>>>   {hw = {hardware_exit_reason = 150994968},
>>>>    fail_entry = {hardware_entry_failure_reason = 150994968},
>>>>    ex = {exception = 150994968, error_code = 0},
>>>>    io = {direction = 24 '\030', size = 0 '\000', port = 2304, count = 0,
>>>>          data_offset = 144},
>>>>    debug = {arch = {<No data fields>}},
>>>>    mmio = {phys_addr = 150994968, data = "\220\000\000\000\000\000\000",
>>>>            len = 4, is_write = 0 '\000'},
>>>>    hypercall = {nr = 150994968, args = {144, 4, 0, 0, 0, 0}, ret = 0,
>>>>                 longmode = 0, pad = 0},
>>>>    tpr_access = {rip = 150994968, is_write = 144, pad = 0},
>>>>    s390_sieic = {icptcode = 24 '\030', ipa = 2304, ipb = 0},
>>>>    s390_reset_flags = 150994968,
>>>>    s390_ucontrol = {trans_exc_code = 150994968, pgm_code = 144},
>>>>    dcr = {dcrn = 150994968, data = 0, is_write = 144 '\220'},
>>>>    internal = {suberror = 150994968, ndata = 0, data = {144, 4, 0 <repeats 14 times>}},
>>>>    osi = {gprs = {150994968, 144, 4, 0 <repeats 29 times>}},
>>>>    papr_hcall = {nr = 150994968, ret = 144, args = {4, 0, 0, 0, 0, 0, 0, 0, 0}},
>>>>    s390_tsch = {subchannel_id = 24, subchannel_nr = 2304, io_int_parm = 0,
>>>>                 io_int_word = 144, ipb = 0, dequeued = 4 '\004'},
>>>>    epr = {epr = 150994968},
>>>>    system_event = {type = 150994968, flags = 144},
>>>>    s390_stsi = {addr = 150994968, ar = 144 '\220', reserved = 0 '\000',
>>>>                 fc = 0 '\000', sel1 = 0 '\000', sel2 = 0},
>>>>    padding = "\030\000\000\t\000\000\000\000\220\000\000\000\000\000\000\000\004",
>>>>              '\000' <repeats 238 times>},
>>>>   kvm_valid_regs = 0, kvm_dirty_regs = 0,
>>>>   s = {regs = {<No data fields>}, padding = '\000' <repeats 2047 times>}}
>>>>
>>>> Marc, does it ring any bell?
>>>
>>> Well, this is an example of a guest accessing non-memory using an
>>> instruction that we cannot safely emulate - not an IO accessor (load
>>> multiple, for example).
>>>
>>> In this case, we kill the guest (we could as well inject a fault).
>>>
>>> This vcpu seems to be accessing 0x9000018 (the mmio structure points
>>> there), but I can't immediately relate it to the content of the
>>> registers.
>>
>> [VIRT_UART] = { 0x09000000, 0x00001000 },
>>
>
> Still: there is nothing in the registers that remotely points to that
> area. X0 is the closest, but it'd take a big negative offset to get
> there.
>
> Is that a Linux kernel? or something else?
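(A hedged illustration of Marc's point, not taken from the quoted mails: the
snippet below sketches, from the guest side, the difference between an MMIO
access KVM can emulate and one it cannot. It assumes the mach-virt PL011 UART
at the VIRT_UART base 0x09000000 quoted above -- the faulting address
0x9000018 would be offset 0x18 into that window -- and it only builds with an
AArch64 compiler; it is meant to run inside a guest, not on a host.)

/*
 * Illustration only: two guest-side MMIO reads of a UART assumed to live
 * at 0x09000000, as in the VIRT_UART entry quoted above.
 */
#include <stdint.h>

#define VIRT_UART_BASE  0x09000000UL

void emulatable_access(void)
{
    volatile uint32_t *uart = (volatile uint32_t *)VIRT_UART_BASE;
    uint32_t v;

    /*
     * Single-register load: the data abort carries a fully decoded
     * syndrome (ISV=1), KVM fills kvm_run->mmio and exits to QEMU,
     * which emulates the device register read.
     */
    asm volatile("ldr %w0, [%1]" : "=r"(v) : "r"(uart));
    (void)v;
}

void non_emulatable_access(void)
{
    volatile uint64_t *uart = (volatile uint64_t *)VIRT_UART_BASE;
    uint64_t a, b;

    /*
     * Load pair (similar to AArch32 load multiple): the syndrome has
     * ISV=0, so KVM has no register/width information to emulate with.
     * With kernels of that era the run loop fails with -ENOSYS and QEMU
     * prints "kvm run failed Function not implemented", as in the dump
     * above.
     */
    asm volatile("ldp %0, %1, [%2]" : "=r"(a), "=r"(b) : "r"(uart));
    (void)a;
    (void)b;
}

A sane UART driver only ever uses single-register accessors, so an ISV=0
fault inside the UART window more likely points at a stray pointer than at a
deliberate device access -- which is presumably why Marc asks what the guest
is.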
Assuming I managed to decipher the nesting of the contexts, the register
dump was pasted by Paolo. IIRC Paolo has been using the guest firmware
(the ArmVirtPkg/ArmVirtQemu.dsc platform build in edk2, sometimes also
known as "AAVMF") to reproduce the issue.

Thanks
Laszlo
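For completeness, here is a minimal, self-contained sketch of the lost-wakeup
interleaving Paolo diagrams above. It is illustrative only, not QEMU code: a
bare Linux eventfd stands in for QEMU's EventNotifier, plain pthreads stand in
for the i/o, vcpu and worker threads, and the problematic ordering is forced
rather than raced for, so the sketch always loses the wakeup that the real bug
only loses in a narrow timing window.

/* Build: gcc -O2 -pthread lost_wakeup.c -o lost_wakeup (Linux only) */
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

static int notifier_fd;            /* stands in for ctx->notifier  */
static volatile int bh_scheduled;  /* stands in for bh->scheduled  */

static void *worker(void *arg)
{
    uint64_t one = 1;

    (void)arg;
    bh_scheduled = 1;                                /* bh->scheduled = 1  */
    if (write(notifier_fd, &one, sizeof(one)) < 0) { /* event_notifier_set */
        perror("write");
    }
    return NULL;
}

static void *vcpu(void *arg)
{
    uint64_t cnt;

    (void)arg;
    /* The vcpu thread clears the notifier on its way out of its own poll
     * loop -- event_notifier_test_and_clear() -- consuming the wakeup the
     * i/o thread's ppoll() was waiting for.                              */
    if (read(notifier_fd, &cnt, sizeof(cnt)) < 0) {
        perror("read");
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    struct pollfd pfd;
    int ret;

    notifier_fd = eventfd(0, 0);

    /* Force the problematic ordering: signal first, then drain. */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_create(&t2, NULL, vcpu, NULL);
    pthread_join(t2, NULL);

    /* i/o thread: the bottom half is still pending, but the notifier is
     * empty, so nothing wakes us up. QEMU's ppoll() hangs here; the demo
     * uses a 1 s timeout so it terminates.                               */
    pfd.fd = notifier_fd;
    pfd.events = POLLIN;
    ret = poll(&pfd, 1, 1000);
    printf("bh_scheduled=%d, poll=%d (%s)\n",
           bh_scheduled, ret, ret == 0 ? "lost wakeup" : "woken up");
    return 0;
}

The point the sketch makes is the same one the thread circles around: whoever
clears the notifier must still dispatch (or re-signal for) the pending bottom
halves, otherwise another ppoll() ends up waiting on an event that has already
been consumed.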