Hi,
puzzle time: In my continuing explorations of the PVHv2 ABIs for the new
MirageOS Xen stack, I've run into some issues with what looks like
missed deliveries of events on event channels.
While a simple unikernel that only uses the Xen console and effectively
does for (1..5) { printf("foo"); sleep(1); } works fine, once I plug in
the existing OCaml Xenstore and Netfront code, the behaviour I see is
that the unikernel hangs in random places, blocking as if an event that
should have been delivered has been missed.
Multiple runs of the unikernel have it block at different places during
Netfront setup, and sometimes it will run as far as a fully setup
Netfront, and then wait for network packets. However, even if it gets
that far, packets are not actually being delivered:
Solo5: Xen console: port 0x2, ring @0x00000000FEFFF000
            |      ___|
  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.6.5-6-gf4b47d11
Solo5: Memory map: 256 MB addressable:
Solo5: reserved @ (0x0 - 0xfffff)
Solo5: text @ (0x100000 - 0x28ffff)
Solo5: rodata @ (0x290000 - 0x2e0fff)
Solo5: data @ (0x2e1000 - 0x3fafff)
Solo5: heap >= 0x3fb000 < stack < 0x10000000
gnttab_init(): pages=1 entries=256
2020-06-15 13:42:08 -00:00: INF [net-xen frontend] connect 0
[Sometimes we hang here]
2020-06-15 13:42:08 -00:00: INF [net-xen frontend] create: id=0 domid=0
2020-06-15 13:42:08 -00:00: INF [net-xen frontend] sg:true
gso_tcpv4:true rx_copy:true rx_flip:false smart_poll:false
2020-06-15 13:42:08 -00:00: INF [net-xen frontend] MAC:
00:16:3e:30:49:52
[Or here]
gnttab_grant_access(): ref=0x8, domid=0x0, addr=0x8f9000, readonly=0
gnttab_grant_access(): ref=0x9, domid=0x0, addr=0x8fb000, readonly=0
evtchn_alloc_unbound(remote=0x0) = 0x4
2020-06-15 13:42:08 -00:00: INF [ethernet] Connected Ethernet interface
00:16:3e:30:49:52
2020-06-15 13:42:08 -00:00: INF [ARP] Sending gratuitous ARP for
10.0.0.2 (00:16:3e:30:49:52)
gnttab_grant_access(): ref=0xa, domid=0x0, addr=0x8fd000, readonly=1
2020-06-15 13:42:08 -00:00: INF [udp] UDP interface connected on
10.0.0.2
2020-06-15 13:42:08 -00:00: INF [tcpip-stack-direct] stack assembled:
mac=00:16:3e:30:49:52,ip=10.0.0.2
Gntref.get(): Waiting for free grant
Gntref.get(): Waiting for free grant
The above are also rather odd, but not related to event channel
delivery, so one problem at a time...
Once we get this far, packets should be flowing, but none are seen in
either direction. However, Xenstore is obviously working, as we wouldn't
get through Netfront setup without it.
Given that I've essentially re-written the low-level event channel C
code, I'd like to verify that the mechanisms I'm using for event
delivery are indeed the right thing to do on PVHv2.
For event delivery, I'm registering the upcall with Xen as follows:
uint64_t val = 32ULL;
val |= (uint64_t)HVM_PARAM_CALLBACK_TYPE_VECTOR << 56;
int rc = hypercall_hvm_set_param(HVM_PARAM_CALLBACK_IRQ, val);
assert(rc == 0);
i.e. upcalls are to be delivered via a fixed IDT vector, in this case
vector 32.
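For context, that vector ends up pointing at a handler which just calls
into the event channel demux. Very roughly (the registration helper and
handler names here are simplified placeholders, not the actual Solo5
ones):

    #define EVTCHN_UPCALL_VECTOR 32  /* must match the value passed above */

    /*
     * Called (via the usual asm entry stub) whenever Xen injects the
     * upcall vector.
     */
    static void evtchn_upcall_handler(void *arg __attribute__((unused)))
    {
        /* 2-level scan of evtchn_pending, sketched at the end of this mail */
        evtchn_demux_pending();
    }

    static void evtchn_register_upcall(void)
    {
        intr_register_handler(EVTCHN_UPCALL_VECTOR, evtchn_upcall_handler,
                NULL);
    }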
Questions:
1. Being based on the Solo5 virtio code, the low-level setup code is
doing the "usual" i8259 PIC setup, to remap the PIC IRQs to vectors 32
and above. Should I be doing this initialisation for Xen PVH at all? I'm
not interested in using the PIC for anything, and all interrupts will be
delivered via Xen event channels.
2. Related to the above, the IRQ handler code ACKs the interrupt on the
PIC after the handler runs. Should I be doing that? Does ACKing "IRQ" 0
on the PIC have any interaction with Xen's view of event
channels/pending upcalls? (Both the remap and the ACK are sketched below
for reference.)
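The remap and the ACK in question are just the textbook i8259 sequence,
roughly as follows (an outb(port, value) helper is assumed, and masking
of individual IRQs is omitted here):

    static inline void outb(uint16_t port, uint8_t v)
    {
        __asm__ __volatile__("outb %0,%1" : : "a"(v), "dN"(port));
    }

    /* Remap the legacy PICs so IRQs 0..15 land on vectors 32..47. */
    static void i8259_init(void)
    {
        outb(0x20, 0x11);    /* ICW1: initialise, expect ICW4 (master) */
        outb(0xa0, 0x11);    /* ICW1: initialise, expect ICW4 (slave) */
        outb(0x21, 32);      /* ICW2: master vector offset */
        outb(0xa1, 40);      /* ICW2: slave vector offset */
        outb(0x21, 0x04);    /* ICW3: slave attached to IR2 */
        outb(0xa1, 0x02);    /* ICW3: slave cascade identity */
        outb(0x21, 0x01);    /* ICW4: 8086 mode (master) */
        outb(0xa1, 0x01);    /* ICW4: 8086 mode (slave) */
    }

    /* The ACK question 2 refers to: a non-specific EOI to the master PIC. */
    static void i8259_eoi_master(void)
    {
        outb(0x20, 0x20);
    }

Note that with the callback vector set to 32, the upcall and a remapped
PIC IRQ 0 share the same vector number, which is part of why I'm asking
about the interaction.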
Next, for a PVHv2, uniprocessor-only guest, is the following flow
sufficient to unmask an event channel?
    struct shared_info *s = SHARED_INFO();
    int pending = 0;

    atomic_sync_btc(port, &s->evtchn_mask[0]);
    pending = sync_bt(port, &s->evtchn_pending[0]);
    if (pending) {
        /*
         * Slow path:
         *
         * If pending is set here, then there was a race, and we lost the
         * upcall. Mask the port again and force an upcall via a call to
         * hyperspace.
         *
         * This should be sufficient for HVM/PVHv2 based on my
         * understanding of Linux drivers/xen/events/events_2l.c.
         */
        atomic_sync_bts(port, &s->evtchn_mask[0]);
        hypercall_evtchn_unmask(port);
    }
Lastly, the old PV-only Mini-OS-based stack would do delays ("block the
domain") by doing a HYPERVISOR_set_timer_op(deadline) followed by a
HYPERVISOR_sched_op(SCHEDOP_block, 0). In the new code, I'm doing the
following (based on what Mini-OS seems to be doing for HVM):
solo5_time_t deadline = Int64_val(v_deadline);
if (solo5_clock_monotonic() < deadline) {
hypercall_set_timer_op(deadline);
__asm__ __volatile__ ("hlt" : : : "memory");
/* XXX: cancel timer_op here if woken up early? */
}
Again, is this the right thing to do for PVH?
As the comment says, do I need to cancel the timer_op? I understood the
semantics to be "fire once at/after the deadline is reached". If that is
indeed the case, then my current VIRQ_TIMER handler, which does nothing
in interrupt context and has no side effects, should be fine.
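If cancelling does turn out to be necessary, I assume it would just mean
something like the following after an early wakeup; Linux's Xen
clockevent code appears to cancel the singleshot timer by passing 0 to
set_timer_op:

    solo5_time_t deadline = Int64_val(v_deadline);

    if (solo5_clock_monotonic() < deadline) {
        hypercall_set_timer_op(deadline);
        __asm__ __volatile__ ("hlt" : : : "memory");
        if (solo5_clock_monotonic() < deadline)
            hypercall_set_timer_op(0);  /* woken early: cancel the one-shot */
    }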
I can also post the full code that does the actual demuxing of events
from Xen (i.e. the upcall handler), but I've read through that several
times now and I don't think the problem is there; adding diagnostic
prints to both the low-level C event channel code and the higher-level
OCaml Activations code confirms that received events are being mapped to
their ports correctly.
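Its overall shape is the usual two-level scan; a rough sketch follows
(atomic_sync_xchg() and evtchn_deliver() are simplified stand-ins for
the real helpers, and a single vCPU is assumed):

    static void evtchn_demux_pending(void)
    {
        struct shared_info *s = SHARED_INFO();
        struct vcpu_info *vi = &s->vcpu_info[0];  /* uniprocessor guest */

        vi->evtchn_upcall_pending = 0;

        /*
         * Atomically take and clear the L1 selector word; the xchg also
         * serves as the barrier between clearing upcall_pending above
         * and scanning the pending bits below.
         */
        unsigned long l1 = atomic_sync_xchg(&vi->evtchn_pending_sel, 0);
        while (l1 != 0) {
            unsigned long q = __builtin_ctzl(l1);
            l1 &= ~(1UL << q);

            unsigned long l2 = s->evtchn_pending[q] & ~s->evtchn_mask[q];
            while (l2 != 0) {
                unsigned long bit = __builtin_ctzl(l2);
                l2 &= ~(1UL << bit);

                unsigned long port = (q * sizeof(unsigned long) * 8) + bit;
                atomic_sync_btc(port, &s->evtchn_pending[0]);
                evtchn_deliver(port);  /* e.g. mark port for OCaml Activations */
            }
        }
    }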
Any advice much appreciated,
Thanks,
Martin