On 2020-06-15 17:03, Roger Pau Monné wrote:
This way of event channel injection is slightly hackish, and I would
recommend using HVMOP_set_evtchn_upcall_vector; that way vectors will
be properly routed using the lapic.

With HVM_PARAM_CALLBACK_TYPE_VECTOR, vectors are injected without
setting the IRR/ISR bits in the lapic registers.
I picked HVM_PARAM_CALLBACK_TYPE_VECTOR since that seemed like the
easiest option for a uniprocessor, PVH-only guest.
What does <vector> mean in the context of
HVMOP_set_evtchn_upcall_vector? If it's an APIC vector number (sorry,
not too familiar with the post-legacy i8259 world), does that imply that
if I use HVMOP_set_evtchn_upcall_vector I need to do some APIC
initialisation / programming?
All I need for Solo5/Mirage purposes is to have the upcall land on IDT
vector #32 or above.
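For context, the registration I have now is essentially the following
minimal sketch (hypercall_hvm_op() is a stand-in name for our hypercall
wrapper; the struct and constants come from Xen's public hvm_op.h and
params.h):

#define EVTCHN_UPCALL_VECTOR 32    /* first vector above the CPU exceptions */

static void evtchn_register_callback_vector(void)
{
    struct xen_hvm_param p = {
        .domid = DOMID_SELF,
        .index = HVM_PARAM_CALLBACK_IRQ,
        /* val[63:56] == 2 selects HVM_PARAM_CALLBACK_TYPE_VECTOR,
         * val[7:0] is the IDT vector to inject. */
        .value = (2ULL << 56) | EVTCHN_UPCALL_VECTOR
    };
    (void)hypercall_hvm_op(HVMOP_set_param, &p);
}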
Questions:

1. Being based on the Solo5 virtio code, the low-level setup code is
doing the "usual" i8259 PIC setup, to remap the PIC IRQs to vectors 32
and above. Should I be doing this initialisation for Xen PVH at all?
Hm, there are no IO-APICs (or legacy PICs) on a PVH domU, so there's
not much to route. If Solo5 thinks it's somehow configuring them, it's
likely writing to some hole in memory, or to some RAM.
Solo5 only has a very primitive understanding of hardware interrupts, so
it's only configuring the legacy PICs via port IO.
IO-APIC presence is signaled in the ACPI MADT table on a PVH domU.
Hmm, it'd be very unfortunate if I had to deal with ACPI, so here's
hoping that I don't actually need to touch the APIC. I'm not interested
in using the PIC for anything, and all interrupts will be delivered via
Xen event channels.
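(Concretely, the most I'd want to do with the i8259s is mask everything
and then never touch them again; a minimal sketch, assuming the usual
outb() port-IO helper:)

static inline void outb(uint16_t port, uint8_t v)
{
    __asm__ __volatile__ ("outb %0, %1" : : "a"(v), "Nd"(port));
}

/*
 * Mask all lines on both legacy i8259 PICs (data ports 0x21/0xA1).
 * On PVH there are no PICs, so these writes should be harmless no-ops.
 */
static void pic_mask_all(void)
{
    outb(0x21, 0xff);   /* master PIC: mask IRQs 0-7 */
    outb(0xa1, 0xff);   /* slave PIC: mask IRQs 8-15 */
}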
2. Related to the above, the IRQ handler code is ACKing the interrupt
after the handler runs. Should I be doing that? Does ACKing "IRQ" 0 on
the PIC have any interactions with Xen's view of event channels/pending
upcalls?
Which kind of ACKing is it doing? Is it writing to the lapic EOI
register? If so, that would be OK when using
HVMOP_set_evtchn_upcall_vector. If using HVM_PARAM_CALLBACK_TYPE_VECTOR,
there's nothing to ACK, I think.
Legacy PIC EOI via port IO.
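(If I do switch to HVMOP_set_evtchn_upcall_vector, I take it the ACK
would become a lapic EOI instead; something like this sketch, assuming
the lapic is in xAPIC mode at its default MMIO base -- in x2APIC mode
this would be a wrmsr to MSR 0x80b instead:)

#define LAPIC_BASE 0xfee00000UL
#define LAPIC_EOI  0xb0UL

static inline void lapic_eoi(void)
{
    volatile uint32_t *eoi = (volatile uint32_t *)(LAPIC_BASE + LAPIC_EOI);
    *eoi = 0;   /* any value written signals end-of-interrupt */
}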
Next, for a PVHv2, uniprocessor-only guest, is the following flow
sufficient to unmask an event channel?
struct shared_info *s = SHARED_INFO();
int pending = 0;
atomic_sync_btc(port, &s->evtchn_mask[0]);
pending = sync_bt(port, &s->evtchn_mask[0]);
You should check for pending interrupts on evtchn_pending, not
evtchn_mask.
Ah, thanks for spotting that! Fixed, but just that change by itself does
not seem to change the observed behaviour in any way.
if (pending) {
    /*
     * Slow path:
     *
     * If pending is set here, then there was a race, and we lost the
     * upcall. Mask the port again and force an upcall via a call to
     * hyperspace.
     *
     * This should be sufficient for HVM/PVHv2 based on my understanding
     * of Linux drivers/xen/events/events_2l.c.
     */
    atomic_sync_bts(port, &s->evtchn_mask[0]);
    hypercall_evtchn_unmask(port);
}
FWIW, I use the hypercall unconditionally on FreeBSD because I didn't
see a performance difference when compared to this method.
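So, for the record, the corrected flow, combining the evtchn_pending
fix with the existing slow path, would be something like the sketch
below (same helper names as above); per Roger, calling
hypercall_evtchn_unmask(port) unconditionally works just as well:

static void evtchn_unmask(evtchn_port_t port)
{
    struct shared_info *s = SHARED_INFO();

    atomic_sync_btc(port, &s->evtchn_mask[0]);
    if (sync_bt(port, &s->evtchn_pending[0])) {
        /* Event fired while masked: re-mask and force re-delivery. */
        atomic_sync_bts(port, &s->evtchn_mask[0]);
        hypercall_evtchn_unmask(port);
    }
}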
Lastly, the old PV-only Mini-OS based stack would do delays ("block the
domain") by doing a HYPERVISOR_set_timer_op(deadline) followed by a
HYPERVISOR_sched_op(SCHEDOP_block, 0). In the new code, I'm doing the
following (based on what Mini-OS seems to be doing for HVM):
solo5_time_t deadline = Int64_val(v_deadline);

if (solo5_clock_monotonic() < deadline) {
    hypercall_set_timer_op(deadline);
    __asm__ __volatile__ ("hlt" : : : "memory");
    /* XXX: cancel timer_op here if woken up early? */
}
Again, is this the right thing to do for PVH?
hlt will trap into the hypervisor, so it's fine to use.
Thanks for confirming.
As the comment says, do I need to cancel the timer_op?
I'm not sure. Keep in mind that a new call to hypercall_set_timer_op
will overwrite the previous timer, and hence should be fine I think as
long as you are using the one-shot timer.
Is there something other than a one-shot timer? hypercall_set_timer_op
is just a single-argument hypercall with a uint64_t deadline, right?
(And not documented in xen.h either ...)
I understood the semantics to be "fire once at/after the time deadline
is reached"; if that is indeed the case, then with my current VIRQ_TIMER
handler, which does nothing in interrupt context and has no side
effects, I should be fine.
I have no idea how Solo5 works, maybe you should re-set the timer to
the next deadline in the handler?
Or that's fine because the timer is always set before blocking.
Yes, it's always set before blocking, and we always unblock after the
first interrupt (i.e. some event) is received, so should be fine.
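(For the record, the delay path with explicit cancellation would then
look like the sketch below. The cancellation is an assumption on my
part: my reading of Xen's do_set_timer_op() is that a timeout of 0
stops the single-shot timer, and it should be harmless if the timer has
already fired.)

static void delay_until(solo5_time_t deadline)
{
    if (solo5_clock_monotonic() < deadline) {
        hypercall_set_timer_op(deadline);
        __asm__ __volatile__ ("hlt" : : : "memory");
        hypercall_set_timer_op(0);  /* cancel if woken up early */
    }
}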
I can also post the code that does the actual demuxing of events from
Xen (i.e. the upcall handler), but I've read through that several times
now and I don't think the problem is there; adding diagnostic prints to
both the low-level C event channel code and the higher-level OCaml
Activations code confirms that received events are being mapped to their
ports correctly.
I can take a look at the event channel handler if you want. The only
weird thing I've noticed is the error in the unmask, where you seem to
use evtchn_mask instead of evtchn_pending.
Thanks for the offer, this stuff is fairly subtle.
In the Mirage/Xen scenario, there are two parts to the upcall handler.
The top half is executed in interrupt context and basically does nothing
except acknowledge the upcall:
int solo5__xen_evtchn_vector_handler(void *arg __attribute__((unused)))
{
    struct vcpu_info *vi = VCPU0_INFO();

    /*
     * All work is done outside of interrupt context by
     * evtchn_demux_pending(), so all we need to do here is acknowledge
     * the upcall from Xen.
     */
    vi->evtchn_upcall_pending = 0;
    return 1;
}
The bottom half is then called periodically (and always before blocking)
by the OCaml code:
static bool evtchn_demux_pending(void)
{
    struct shared_info *s = SHARED_INFO();
    struct vcpu_info *vi = VCPU0_INFO();
    bool some_pending = false;

    vi->evtchn_upcall_pending = 0;

    /*
     * Demux events received from Xen.
     *
     * pending_l1 is the "outer" per-VCPU selector (evtchn_pending_sel).
     * pending_l2 is the "inner" system-wide word (evtchn_pending[l1i]).
     */
    xen_ulong_t pending_l1, pending_l2;
    atomic_sync_xchg(&vi->evtchn_pending_sel, 0, &pending_l1);
    while (pending_l1 != 0) {
        xen_ulong_t l1i = ffs(pending_l1);

        /*
         * Masking pending_l2 with ~evtchn_mask[l1i] is necessary to
         * pick up only the *currently unmasked* events; otherwise races
         * like the following can occur:
         *
         * 1. X is generated, upcall scheduled by Xen.
         * 2. X is masked.
         * 3. Upcall is delivered.
         * 4. X fires despite now being masked.
         */
        while ((pending_l2 =
                (s->evtchn_pending[l1i] & ~s->evtchn_mask[l1i])) != 0) {
            xen_ulong_t l2i = ffs(pending_l2);
            evtchn_port_t port = (l1i * (sizeof(xen_ulong_t) * 8)) + l2i;

            /*
             * Mark as pending on the OCaml side.
             */
            evtchn_callback_ml[port] = 1;
            some_pending = true;
            atomic_sync_btc(l2i, &s->evtchn_pending[l1i]);
        }
        pending_l1 &= ~(1UL << l1i);
    }
    return some_pending;
}
I would also recommend using HVMOP_set_evtchn_upcall_vector instead of
HVM_PARAM_CALLBACK_TYPE_VECTOR. In order to make some tools happy just
set HVM_PARAM_CALLBACK_TYPE_VECTOR to 1. You can take a look at how
Xen does it when running as a guest:
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/guest/xen/xen.c;h=2ff63d370a8a12fef166677e2ded7ed82a628ce8;hb=HEAD#l205
Thanks for the pointer. As I write above, if I can use
HVMOP_set_evtchn_upcall_vector w/o doing too much "extra work" on the
guest side then I will go with that.
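Roughly, I'd expect the registration Roger describes to look like the
sketch below (the struct and op number are from Xen's public hvm_op.h;
hypercall_hvm_op() is again a stand-in name for our wrapper):

#define HVMOP_set_evtchn_upcall_vector 23

struct xen_hvm_evtchn_upcall_vector {
    uint32_t vcpu;
    uint8_t vector;
};

static void evtchn_register_upcall_vector(uint8_t vector)
{
    struct xen_hvm_evtchn_upcall_vector op = {
        .vcpu = 0,          /* uniprocessor guest: VCPU 0 only */
        .vector = vector    /* a free IDT vector, 32 or above */
    };
    (void)hypercall_hvm_op(HVMOP_set_evtchn_upcall_vector, &op);

    /*
     * Per Roger's suggestion above, HVM_PARAM_CALLBACK_IRQ should
     * additionally be set to 1 to keep the toolstack happy.
     */
}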
Roger.
-mato