Hi Andrew
On 1/2/17 23:37, Andrew Cooper wrote:
On 31/12/2016 05:45, Suravee Suthikulpanit wrote:
[...]
+/*
+ * Note: Current max index allowed for physical APIC ID table is 255.
+ */
+#define AVIC_PHY_APIC_ID_MAX 0xFF
+
+#define AVIC_DOORBELL 0xc001011b
+
+#define AVIC_HPA_SHIFT 12
+#define AVIC_HPA_MASK (((1ULL << 40) - 1) << AVIC_HPA_SHIFT)
What does this mask represent? I have spotted a truncation bug below,
but how to fix the issue depends on the meaning of the mask.
The only limit I can see in the spec is that the host physical
addresses must be present in system RAM, which presumably means they
should follow maxphysaddr like everything else?
Please see below.
[...]
+/*
+ * Note:
+ * Currently, svm-avic mode is not supported with nested virtualization.
+ * Therefore, it is not currently enabled by default. Once the support
+ * is in place, this should be enabled by default.
+ */
+bool svm_avic = 0;
+boolean_param("svm-avic", svm_avic);
We are trying to avoid the proliferation of top-level options. Please
could you introduce a "svm=" option which takes avic as a sub-boolean,
similar to how iommu= currently works?
Sure, I will introduce the custom_param("svm", parse_svm_param).
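Something along these lines, modeled on parse_iommu_param() (just a
sketch; which sub-options to accept besides "avic" is still open):

    static void __init parse_svm_param(char *s)
    {
        char *ss;

        do {
            ss = strchr(s, ',');
            if ( ss )
                *ss = '\0';

            /* "avic" is the only sub-boolean planned so far. */
            if ( !strcmp(s, "avic") )
                svm_avic = 1;
            else if ( !strcmp(s, "no-avic") )
                svm_avic = 0;

            s = ss + 1;
        } while ( ss );
    }
    custom_param("svm", parse_svm_param);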
[...]
+{
+ struct avic_phy_apic_id_ent *avic_phy_apic_id_table;
+ struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
+
+ if ( !d->avic_phy_apic_id_table_mfn )
+ return NULL;
+
+ /*
+ * Note: APIC ID = 0xff is used for broadcast.
+ * APIC ID > 0xff is reserved.
+ */
+ if ( index >= 0xff )
+ return NULL;
+
+ avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn);
This is unsafe and will break on hosts with more than 5TB of RAM.
As you allocate avic_phy_apic_id_table_mfn with alloc_domheap_page(),
you must use {map,unmap}_domain_page() to get temporary mappings (which
will use mfn_to_virt() in the common case), or use
{map,unmap}_domain_page_global() to get permanent mappings.
As the values in here need modifying across a schedule, I would suggest
making a global mapping at create time, and storing the result in a
properly-typed pointer.
Thanks for pointing this out. I'll update both the logical and physical
APIC ID tables to use {map,unmap}_domain_page_global() instead.
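E.g. at table-creation time, keeping a properly-typed pointer in
struct svm_domain (a sketch; the field name is an assumption):

    struct page_info *pg = alloc_domheap_page(d, MEMF_no_owner);

    if ( !pg )
        return -ENOMEM;
    clear_domain_page(_mfn(page_to_mfn(pg)));
    d->arch.hvm_domain.svm.avic_phy_apic_id_table =
        __map_domain_page_global(pg);

with a matching unmap_domain_page_global() in svm_avic_dom_destroy().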
[...]
+
+ if ( !svm_avic )
+ return 0;
|| !has_vlapic(d)
HVMLite domains may legitimately be configured without an LAPIC at all.
Hm... That means I would also need to check this in
svm_avic_dom_destroy() and svm_avic_init_vmcb().
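I.e. something like the following at the top of each of those
functions (sketch):

    if ( !svm_avic || !has_vlapic(d) )
        return 0;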
[...]
+
+static inline u32 *
+avic_get_bk_page_entry(const struct vcpu *v, u32 offset)
+{
+ const struct vlapic *vlapic = vcpu_vlapic(v);
+ char *tmp;
+
+ if ( !vlapic || !vlapic->regs_page )
+ return NULL;
+
+ tmp = (char *)page_to_virt(vlapic->regs_page);
+ return (u32 *)(tmp + offset);
This is also unsafe over 5TB. vlapic->regs is already a global mapping
which you should use.
Good point.
Also, you should leave a comment by the allocation of regs_page that
AVIC depends on it being a full page allocation.
Ok, I will add a comment in arch/x86/hvm/vlapic.c: vlapic_init().
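Something next to the regs_page allocation along these lines (sketch;
wording of the comment is mine):

    /*
     * Note: AMD AVIC uses regs_page as the vAPIC backing page and
     * points a whole frame at it, so this must remain a full-page,
     * page-aligned allocation.
     */
    vlapic->regs_page = alloc_domheap_page(NULL, MEMF_no_owner);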
+}
+
+void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data)
+{
+ const struct arch_svm_struct *s = &v->arch.hvm_svm;
+
+ s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK;
+ s->vmcb->cleanbits.fields.avic = 0;
While I can appreciate what you are trying to do here, altering the
physical address in IA32_APICBASE has never functioned properly under
Xen, as nothing updates the P2M. Worse, with multi-vcpu guests, the use
of a single shared p2m makes this impossible to do properly.
Ah, good to know.
I know it is non-architectural behaviour, but IMO vlapic_msr_set()
should be updated to raise #GP[0] when attempting to change the frame,
to make it fail obviously rather than having interrupts go missing.
Also I see that the Intel side (via vmx_vlapic_msr_changed()) makes the
same mistake.
Okay, it seems that hvm_msr_write_intercept() already handles this,
doesn't it? In that case, I should be able to simply remove the
handling for this MSR here.
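For reference, the check you suggest might look like this in
vlapic_msr_set() (a sketch; the exact plumbing for raising #GP[0] from
there is still to be worked out):

    /* Non-architectural: refuse to relocate the APIC frame. */
    if ( (value ^ vlapic->hw.apic_base_msr) & MSR_IA32_APICBASE_BASE )
    {
        hvm_inject_hw_exception(TRAP_gp_fault, 0);
        return;
    }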
+}
+
+int svm_avic_init_vmcb(struct vcpu *v)
+{
+ paddr_t ma;
+ u32 *apic_id_reg;
+ struct arch_svm_struct *s = &v->arch.hvm_svm;
+ struct vmcb_struct *vmcb = s->vmcb;
+ struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
+ const struct vlapic *vlapic = vcpu_vlapic(v);
+ struct avic_phy_apic_id_ent entry;
+
+ if ( !svm_avic )
+ return 0;
+
+ if ( !vlapic || !vlapic->regs_page )
+ return -EINVAL;
Somewhere you need a bounds check against the APIC ID being within AVIC
limits.
We are checking the bounds in avic_get_phy_apic_id_ent().
+ if ( !s->avic_last_phy_id )
+ return -EINVAL;
Why does a pointer into the physical ID page need storing in
arch_svm_struct?
This was so that we can quickly access the physical APIC ID entry to
update it without having to call avic_get_phy_apic_id_ent().
+
+ vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK;
This use of AVIC_HPA_MASK may truncate the address, which is
definitely a problem.
I'm not quite sure I follow the truncation you pointed out. Could you
please elaborate?
If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to
pass appropriate memflags into the alloc_domheap_page() calls to get a
suitable page.
If by "maxphysaddr" is the 52-bit physical address limit for the PAE
mode, then I think that's related.
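In that case, the allocations could be constrained via memflags, e.g.
(sketch):

    /* Keep the page within the address width the AVIC tables can hold. */
    pg = alloc_domheap_page(d, MEMF_no_owner |
                               MEMF_bits(40 + AVIC_HPA_SHIFT));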
+
+ entry = *(s->avic_last_phy_id);
+ smp_rmb();
+ entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
+ entry.is_running = 0;
+ entry.valid = 1;
+ *(s->avic_last_phy_id) = entry;
+ smp_wmb();
During domain creation, no guests are running, so you can edit this cleanly.
What values are actually being excluded here? This, and other patches,
look like you are actually just trying to do a read_atomic(), modify,
write_atomic() update, rather than actually requiring ordering.
Besides the read_atomic(), modify, write_atomic() sequence to update
the entry, I also want to make sure that the compiler won't shuffle
the order around, which I thought could be achieved via smp_rmb() and
smp_wmb().
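For comparison, the barrier-free form would be something like this
(sketch; assumes the entry can be accessed through a raw 64-bit view,
e.g. a hypothetical .raw member of a bitfield/u64 union):

    struct avic_phy_apic_id_ent entry;

    entry.raw = read_atomic(&s->avic_last_phy_id->raw);
    entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
    entry.is_running = 0;
    entry.valid = 1;
    write_atomic(&s->avic_last_phy_id->raw, entry.raw);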
[...]
@@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
switch ( msr )
{
+ case MSR_IA32_APICBASE:
+ if ( svm_avic_vcpu_enabled(v) )
+ svm_avic_update_vapic_bar(v, msr_content);
+ break;
I think this is dead code. MSR_IA32_APICBASE is handled in
hvm_msr_write_intercept(), and won't fall into the vendor hook. If you
were to continue down this route, I would add another pi_ops hook, and
replace the VMX layering violation in vlapic_msr_set().
I see now. I will remove the code added here.
Thanks,
Suravee
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel