On 2020-06-19 01:43, Andrew Cooper wrote:
On 18/06/2020 11:13, Martin Lucina wrote:
On Monday, 15.06.2020 at 17:58, Andrew Cooper wrote:
On 15/06/2020 15:25, Martin Lucina wrote:
Hi,
puzzle time: In my continuing explorations of the PVHv2 ABIs for the
new MirageOS Xen stack, I've run into some issues with what looks like
missed deliveries of events on event channels.
While a simple unikernel that only uses the Xen console and
effectively does for (1..5) { printf("foo"); sleep(1); } works fine,
once I plug in the existing OCaml Xenstore and Netfront code, the
behaviour I see is that the unikernel hangs in random places, blocking
as if an event that should have been delivered has been missed.
You can see what is going on, event channel wise, with the 'e'
debug-key. This will highlight cases such as the event channel being
masked and pending, which is a common guest bug ending up in this
state.
Ok, based on your and Roger's suggestions I've made some changes:
1. I've dropped all the legacy PIC initialisation code from the Solo5
parts, written some basic APIC initialisation code and switched to using
HVMOP_set_evtchn_upcall_vector for upcall registration, along with setting
HVM_PARAM_CALLBACK_IRQ to 1 as suggested by Roger and done by Xen when
running as a guest. Commit at [1], nothing controversial there.
Well...
    uint64_t apic_base = rdmsrq(MSR_IA32_APIC_BASE);
    wrmsrq(MSR_IA32_APIC_BASE,
            apic_base | (APIC_BASE << 4) | MSR_IA32_APIC_BASE_ENABLE);
    apic_base = rdmsrq(MSR_IA32_APIC_BASE);
    if (!(apic_base & MSR_IA32_APIC_BASE_ENABLE)) {
        log(ERROR, "Solo5: Could not enable APIC or not present\n");
        assert(false);
    }
The only reason Xen doesn't crash your guest on that WRMSR is because
0xfee00080ull | (0xfee00000u << 4) == 0xfee00080ull, due to truncation
and 0xfe | 0xee == 0xfe.
Either way, the logic isn't correct.
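To make the truncation concrete, here is a minimal sketch using the values
quoted above. The macro and function names are hypothetical and not taken
from the Solo5 tree, and it assumes a platform where unsigned int is 32 bits
wide. It only shows that the shift is evaluated in 32-bit arithmetic unless
the operand is widened first.

```c
#include <stdint.h>

/* Hypothetical names; the values are the ones quoted in the mail above. */
#define APIC_BASE_U32 0xfee00000u    /* 32-bit unsigned constant */
#define MSR_VALUE     0xfee00080ull  /* example MSR value from the mail */

/* Shift performed in 32-bit arithmetic: the top nibble falls off,
 * leaving 0xee000000 (assuming a 32-bit unsigned int). */
static inline uint64_t shifted_truncated(void)
{
    return (uint64_t)(APIC_BASE_U32 << 4);
}

/* Widen to 64 bits before shifting: nothing is lost, giving 0xfee000000. */
static inline uint64_t shifted_widened(void)
{
    return (uint64_t)APIC_BASE_U32 << 4;
}
```

ORing MSR_VALUE with shifted_truncated() leaves it unchanged, because
0xfe | 0xee == 0xfe in the top byte. ORing it with shifted_widened() would
set bits above bit 31 and so describe a different base address.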
Oh, thanks. Don't you wish C had a "strict" mode where you could
disable/warn on implicit type promotion? I certainly do.
Xen doesn't support moving the APIC MMIO window (and almost certainly
never will, because the only thing which changes it is malware). You
can rely on the default state being correct, because it is
architecturally specified.
Noted. I'll change the code to just verify that APIC_BASE is indeed
FEE00000 at start of day and that the enable operation succeeded -- I
like to keep the code robust, e.g. against cut-n-pasting to somewhere
else that might be used in a non-Xen context later where the
precondition may not hold.
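A rough sketch of what such checks might look like, written as pure
functions over an already-read MSR value so that the logic can be tested
outside a guest. All names here are hypothetical. The bit positions are an
assumption based on the architectural IA32_APIC_BASE layout (global enable
in bit 11, base address from bit 12 upwards), not taken from the Solo5
sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants, assuming the architectural IA32_APIC_BASE layout. */
#define APIC_DEFAULT_BASE    0xfee00000ull
#define APIC_BASE_ENABLE     (1ull << 11)           /* global enable bit */
#define APIC_BASE_ADDR_MASK  0x000ffffffffff000ull  /* bits 12..51 */

/* True if the base address field holds the default MMIO window. */
static inline bool apic_base_is_default(uint64_t msr)
{
    return (msr & APIC_BASE_ADDR_MASK) == APIC_DEFAULT_BASE;
}

/* True if the global enable bit is set. */
static inline bool apic_is_enabled(uint64_t msr)
{
    return (msr & APIC_BASE_ENABLE) != 0;
}

/* Value to write back: only the enable bit is added, the base is untouched. */
static inline uint64_t apic_enable_value(uint64_t msr)
{
    return msr | APIC_BASE_ENABLE;
}
```

The intended use would be to read the MSR, assert apic_base_is_default(),
write apic_enable_value(), read it back and assert apic_is_enabled(). The
base address is never written, which matches the point that Xen does not
support moving the window.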
Martin
~Andrew