Re: [Xen-devel] [PATCH L1TF v10 7/8] common/grant_table: block speculative out-of-bound accesses
I looked into these changes after a while again. I will split this larger commit into smaller ones, and address parts of the problem in each of them separately.

On 3/29/19 18:11, Jan Beulich wrote:
> On 14.03.19 at 13:50, wrote:
>> Guests can issue grant table operations and provide guest controlled
>> data to them. This data is also used for memory loads. To avoid
>> speculative out-of-bound accesses, we use the array_index_nospec macro
>> where applicable. However, there are also memory accesses that cannot
>> be protected by a single array protection, or multiple accesses in a
>> row. To protect these, a nospec barrier is placed between the actual
>> range check and the access via the block_speculation macro.
>>
>> Speculative execution is not blocked in case one of the following
>> properties is true:
>> - path cannot be triggered by the guest
>> - path does not return to the guest
>> - path does not result in an out-of-bound access
>> - path cannot be executed repeatedly
>> Only the combination of the above properties allows an attacker to
>> actually leak continuous chunks of memory. Therefore, we only add the
>> penalty of protective mechanisms in case a potential speculative
>> out-of-bound access matches all the above properties.
>>
>> As different versions of grant tables use structures of different size,
>> and the status is encoded in an array for version 2, speculative
>> execution might perform out-of-bound accesses of version 2 while
>> the table is actually using version 1. Hence, speculation is prevented
>> when accessing new memory based on the grant table version. In cases
>> where no different memory locations are accessed on the code paths that
>> follow an if statement, no protection is required. No different memory
>> locations are accessed in the following functions after a version check:
>>
>> * _set_status, as the header memory layout is the same
> Isn't this rather by virtue of shared_entry_header() having got
> hardened? I don't think the memory layout alone can serve as a
> reason for there to be no issue - the position in memory matters
> as well.

To be on the safe side, I will add a fix here as well.

>> * unmap_common, as potentially touched memory locations are allocated
>> and initialized
> I can't seem to spot any explicit version checks in unmap_common().
> Do you mean unmap_common_complete()? If so I'm afraid I don't
> understand what "allocated and initialized" is supposed to mean.
> The version check there looks potentially problematic to me, at
> least from a purely theoretical pov.

That likely meant unmap_common_complete, and that one will be fixed.

>> * gnttab_grow_table, as the touched memory is the same for each
>> branch after the conditionals
> How that? gnttab_populate_status_frames() could be speculated
> into for a v1 guest.
>
> Next there's a version check in gnttab_setup_table(), but the function
> doesn't get changed and also isn't listed here.

I will address both.

>> * gnttab_transfer, as no memory access depends on the conditional
>> * release_grant_for_copy, as no out-of-bound access depends on this
>> conditional
> But you add evaluate_nospec() there, and memory accesses very well
> look to depend on the condition, just not inside the bodies of the if/else.

That seems to be a leftover. This function is actually fixed.

>> * gnttab_set_version, as in case of a version change all the memory is
>> touched in both cases
> And you're sure speculation through NULL pointers is impossible? And
> the offset-into-table differences between v1 and v2 don't matter?

Yes, I think this is good enough.

>> * gnttab_release_mappings, as this function is called only during domain
>> destruction and control is not returned to the guest
>> * mem_sharing_gref_to_gfn, as potentially dangerous memory accesses are
>> covered by the next evaluate_nospec
>> * gnttab_get_status_frame, as the potentially dangerous memory accesses
>> are protected in gnttab_get_status_frame_mfn
> But there's quite a bit of code in gnttab_get_status_frame_mfn()
> before the addition you make. But I guess speculation in particular
> into gnttab_grow_table() might be safe?

I think this is safe, yes.

>> @@ -963,9 +988,13 @@ map_grant_ref(
>>          PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n",
>>                   op->ref, rgt->domain->domain_id);
>>
>> -    act = active_entry_acquire(rgt, op->ref);
>> +    /* This call ensures the above check cannot be bypassed speculatively */
>>      shah = shared_entry_header(rgt, op->ref);
> I know I've come across this several times by now, but I'm afraid I
> now get the impression that the comment kind of suggests that
> the call is just for this purpose, instead of fulfilling the purpose as
> a side effect. Would you mind adding "also" to this (and possibly
> further instances)? To avoid the line growing too long,
[Xen-devel] L1TF MDS GT v1
Dear all,

This patch series attempts to mitigate the issues that have been raised in XSA-289 (https://xenbits.xen.org/xsa/advisory-289.html). To block speculative execution on Intel hardware, an lfence instruction is required to make sure that selected checks are not bypassed. Speculative out-of-bound accesses can be prevented by using the array_index_nospec macro.

This series picks up the last remaining commit of my previous L1TF series, and splits it into three commits to help target the discussion better. The actual change is to protect three more functions for grant-table version dependent code execution.

This is part of the speculative hardening effort. As for example mentioned in [1], these changes also help to limit leaks via the MDS vulnerability.

Best, Norbert

[1] https://arxiv.org/abs/1905.05726
[Xen-devel] [PATCH L1TF MDS GT v1 1/3] common/grant_table: harden helpers
Guests can issue grant table operations and provide guest controlled data to them. This data is used for memory loads in helper functions and macros. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro where applicable, or the block_speculation macro.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey

---
Notes:
    v1: split the gnttab commit of the previous L1TF series into multiple commits

 xen/common/grant_table.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -203,8 +204,9 @@ static inline unsigned int nr_status_frames(const struct grant_table *gt)
 }

 #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping))
-#define maptrack_entry(t, e) \
-    ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
+#define maptrack_entry(t, e) \
+    ((t)->maptrack[array_index_nospec(e, (t)->maptrack_limit) / \
+                   MAPTRACK_PER_PAGE][(e) % MAPTRACK_PER_PAGE])

 static inline unsigned int
 nr_maptrack_frames(struct grant_table *t)
@@ -226,10 +228,23 @@ nr_maptrack_frames(struct grant_table *t)
 static grant_entry_header_t *
 shared_entry_header(struct grant_table *t, grant_ref_t ref)
 {
-    if ( t->gt_version == 1 )
+    switch ( t->gt_version )
+    {
+    case 1:
+        /* Returned values should be independent of speculative execution */
+        block_speculation();
         return (grant_entry_header_t*)&shared_entry_v1(t, ref);
-    else
+
+    case 2:
+        /* Returned values should be independent of speculative execution */
+        block_speculation();
         return &shared_entry_v2(t, ref).hdr;
+    }
+
+    ASSERT_UNREACHABLE();
+    block_speculation();
+
+    return NULL;
 }

 /* Active grant entry - used for shadowing GTF_permit_access grants. */
@@ -634,14 +649,24 @@ static unsigned int nr_grant_entries(struct grant_table *gt)
     case 1:
         BUILD_BUG_ON(f2e(INITIAL_NR_GRANT_FRAMES, 1) <
                      GNTTAB_NR_RESERVED_ENTRIES);
+
+        /* Make sure we return a value independently of speculative execution */
+        block_speculation();
         return f2e(nr_grant_frames(gt), 1);
+
     case 2:
         BUILD_BUG_ON(f2e(INITIAL_NR_GRANT_FRAMES, 2) <
                      GNTTAB_NR_RESERVED_ENTRIES);
+
+        /* Make sure we return a value independently of speculative execution */
+        block_speculation();
         return f2e(nr_grant_frames(gt), 2);
 #undef f2e
     }

+    ASSERT_UNREACHABLE();
+    block_speculation();
+
+    return 0;
 }
-- 
2.7.4
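[For readers new to the primitive: array_index_nospec clamps an index to 0 whenever it is out of range, using a branch-free mask, so there is no conditional branch left to mispredict. Below is a minimal stand-alone sketch of the portable form of the trick; it is illustrative only - Xen's x86 build substitutes a cmp/sbb assembly sequence, and the helper name here is invented.]

#include <stdio.h>

/* All-ones mask when index < size, all-zeroes otherwise; valid for
 * index and size below 2^63, like the generic C fallback. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >> (8 * sizeof(long) - 1);
}

int main(void)
{
    unsigned long table[4] = { 10, 20, 30, 40 };
    unsigned long idx = 7;              /* guest-controlled, out of bounds */

    idx &= index_mask_nospec(idx, 4);   /* clamped to 0, even transiently */
    printf("%lu\n", table[idx]);        /* reads table[0], never table[7] */
    return 0;
}

[Note that the clamp does not abort the access; it forces any transient out-of-bounds index to a harmless in-bounds value, which is why the architectural bound check is still needed beforehand.]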
[Xen-devel] [PATCH L1TF MDS GT v1 2/3] common/grant_table: harden bound accesses
Guests can issue grant table operations and provide guest controlled data to them. This data is used as an index for memory loads after bound checks have been done. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro where applicable, or the macro block_speculation. Note that block_speculation is always used in the calls to shared_entry_header and nr_grant_entries, so that no additional protection is required once these functions have been called.

Speculative execution is not blocked in case one of the following properties is true:
 - path cannot be triggered by the guest
 - path does not return to the guest
 - path does not result in an out-of-bound access
 - path cannot be executed repeatedly
Only the combination of the above properties allows an attacker to actually leak continuous chunks of memory. Therefore, we only add the penalty of protective mechanisms in case a potential speculative out-of-bound access matches all the above properties.

This commit addresses only out-of-bound accesses whose index is directly controlled by the guest and checked beforehand. Potential out-of-bound accesses that are caused by speculatively evaluating the version of the current table are not addressed in this commit.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey

---
Notes:
    v1: adapt the comments for shared_entry_header to show that they 'also' block speculative execution

 xen/common/grant_table.c | 43 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -988,9 +988,10 @@ map_grant_ref(
         PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n",
                  op->ref, rgt->domain->domain_id);

-    act = active_entry_acquire(rgt, op->ref);
+    /* This call also ensures the above check cannot be passed speculatively */
     shah = shared_entry_header(rgt, op->ref);
     status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref);
+    act = active_entry_acquire(rgt, op->ref);

     /* If already pinned, check the active domid and avoid refcnt overflow. */
     if ( act->pin &&
@@ -1346,6 +1347,9 @@ unmap_common(
         goto unlock_out;
     }

+    /* Make sure the above bound check cannot be bypassed speculatively */
+    block_speculation();
+
     act = active_entry_acquire(rgt, op->ref);

     /*
@@ -1443,7 +1447,7 @@ unmap_common_complete(struct gnttab_unmap_common *op)
     struct page_info *pg;
     uint16_t *status;

-    if ( !op->done )
+    if ( evaluate_nospec(!op->done) )
     {
         /* unmap_common() didn't do anything - nothing to complete.
          */
         return;
     }

@@ -2051,6 +2055,7 @@ gnttab_prepare_for_transfer(
         goto fail;
     }

+    /* This call also ensures the above check cannot be passed speculatively */
     sha = shared_entry_header(rgt, ref);

     scombo.word = *(u32 *)&sha->flags;
@@ -2248,7 +2253,12 @@ gnttab_transfer(
     spin_unlock(&e->page_alloc_lock);

     okay = gnttab_prepare_for_transfer(e, d, gop.ref);
-    if ( unlikely(!okay || assign_pages(e, page, 0, MEMF_no_refcount)) )
+    /*
+     * Make sure the reference bound check in gnttab_prepare_for_transfer
+     * is respected and speculative execution is blocked accordingly
+     */
+    if ( unlikely(!evaluate_nospec(okay)) ||
+         unlikely(assign_pages(e, page, 0, MEMF_no_refcount)) )
     {
         bool drop_dom_ref;

@@ -2435,8 +2445,10 @@ acquire_grant_for_copy(
         PIN_FAIL(gt_unlock_out, GNTST_bad_gntref,
                  "Bad grant reference %#x\n", gref);

-    act = active_entry_acquire(rgt, gref);
+    /* This call also ensures the above check cannot be passed speculatively */
     shah = shared_entry_header(rgt, gref);
+    act = active_entry_acquire(rgt, gref);
+
     if ( rgt->gt_version == 1 )
     {
         sha2 = NULL;
@@ -2853,6 +2865,9 @@ static int gnttab_copy_buf(const struct gnttab_copy *op,
                  op->dest.offset, dest->ptr.offset,
                  op->len, dest->len);

+    /* Make sure the above checks are not bypassed speculatively */
+    block_speculation();
+
     memcpy(dest->virt + op->dest.offset, src->virt + op->source.offset,
            op->len);
     gnttab_mark_dirty(dest->domain, dest->mfn);
@@ -2972,7 +2987,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     struct grant_table *gt = currd->grant_table;
     grant_entry_v1_t reserved_entries[GNTTAB_NR_RESERVED_ENTRIES];
     int res;
-    unsigned int i;
+    unsigned int i, nr_ents;

     if ( copy_from_guest(&op, uop, 1) )
         return -EFAULT;
@@ -2996,7 +3011,8 @@ gnttab_set_version(XEN_GUES
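[To see the block_speculation pattern of this patch in isolation, here is a stand-alone sketch using x86/GCC inline asm; the function and the table are invented, this is not the Xen code: a bound check, then the barrier, then the checked access.]

#include <stdint.h>

static inline void block_speculation(void)
{
    asm volatile ( "lfence" ::: "memory" );   /* dispatch-serialising on x86 */
}

static uint8_t table[16];

int read_entry(unsigned int idx, uint8_t *out)
{
    if ( idx >= 16 )
        return -1;

    /*
     * Without the fence, a mispredicted branch above could still issue
     * the load below with an out-of-bounds idx during speculation.
     */
    block_speculation();

    *out = table[idx];
    return 0;
}

int main(void)
{
    uint8_t v;
    return read_entry(3, &v);
}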
[Xen-devel] [PATCH L1TF MDS GT v1 3/3] common/grant_table: harden version dependent accesses
Guests can issue grant table operations and provide guest controlled data to them. This data is used as an index for memory loads after bound checks have been done. Depending on the grant table version, the size of elements in containers differs. As the base data structure is a page, the number of elements per page also differs. Consequently, bound checks are version dependent, so that speculative execution can bypass several stages: the bound check as well as the version check. This commit mitigates cases where out-of-bound accesses could happen due to the version comparison.

In cases where no different memory locations are accessed on the code paths that follow an if statement, no protection is required. No different memory locations are accessed in the following functions after a version check:

* gnttab_setup_table: only calculated numbers are used, and then function gnttab_grow_table is called, which is version protected
* gnttab_transfer: the case that depends on the version check just gets into copying a page or not
* acquire_grant_for_copy: the unfixed comparison is on the abort path and does not access other structures, and the else branch only accesses structures that are properly allocated
* gnttab_set_version: all accessible data is allocated for both versions
* gnttab_release_mappings: this function is called only during domain destruction and control is not returned to the guest
* mem_sharing_gref_to_gfn: speculation will be stopped by the second if statement, as that places a barrier on any path to be executed
* gnttab_get_status_frame_mfn: no version dependent check, because all accesses, except the gt->status[idx], do not perform actual out-of-bound accesses, including the gnttab_grow_table function call
* gnttab_get_shared_frame: block_speculation in gnttab_get_status_frame_mfn blocks accesses
* gnttab_usage_print: cannot be triggered by the guest

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey

---
Notes:
    v1: added additional fixes (compared to L1TF series) to:
        _set_status
        unmap_common_complete
        gnttab_grow_table

 xen/common/grant_table.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -837,7 +837,7 @@ static int _set_status(unsigned gt_version,
                        grant_status_t *status)
 {
-    if ( gt_version == 1 )
+    if ( evaluate_nospec(gt_version == 1) )
         return _set_status_v1(domid, readonly, mapflag, shah, act);
     else
         return _set_status_v2(domid, readonly, mapflag, shah, act, status);
@@ -990,9 +990,12 @@ map_grant_ref(
     /* This call also ensures the above check cannot be passed speculatively */
     shah = shared_entry_header(rgt, op->ref);
-    status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref);
     act = active_entry_acquire(rgt, op->ref);

+    /* Make sure we do not access memory speculatively */
+    status = evaluate_nospec(rgt->gt_version == 1) ? &shah->flags
+                                                   : &status_entry(rgt, op->ref);
+
     /* If already pinned, check the active domid and avoid refcnt overflow. */
     if ( act->pin &&
@@ -1013,7 +1016,7 @@ map_grant_ref(
     if ( !act->pin )
     {
-        unsigned long gfn = rgt->gt_version == 1 ?
+        unsigned long gfn = evaluate_nospec(rgt->gt_version == 1) ?
                             shared_entry_v1(rgt, op->ref).frame :
                             shared_entry_v2(rgt, op->ref).full_page.frame;

@@ -1463,7 +1466,7 @@ unmap_common_complete(struct gnttab_unmap_common *op)
     act = active_entry_acquire(rgt, op->ref);
     sha = shared_entry_header(rgt, op->ref);

-    if ( rgt->gt_version == 1 )
+    if ( evaluate_nospec(rgt->gt_version == 1) )
         status = &sha->flags;
     else
         status = &status_entry(rgt, op->ref);

@@ -1795,7 +1798,7 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
     }

     /* Status pages - version 2 */
-    if ( gt->gt_version > 1 )
+    if ( evaluate_nospec(gt->gt_version > 1) )
     {
         if ( gnttab_populate_status_frames(d, gt, req_nr_frames) )
             goto shared_alloc_failed;

@@ -2290,7 +2293,7 @@ gnttab_transfer(
     grant_read_lock(e->grant_table);
     act = active_entry_acquire(e->grant_table, gop.ref);

-    if ( e->grant_table->gt_version == 1 )
+    if ( evaluate_nospec(e->grant_table->gt_version == 1) )
     {
         grant_entry_v1_t *sha = &shared_entry_v1(e->grant_table, gop.ref);

@@ -
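[The version-confusion argument can be made concrete with a small stand-alone sketch. The entry layouts below are simplified stand-ins, but the 8-byte-vs-16-byte size mismatch mirrors the real v1/v2 grant entries.]

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096

typedef struct { uint32_t flags_domid; uint32_t frame; } v1_ent;   /*  8 bytes */
typedef struct { uint64_t hdr; uint64_t frame; } v2_ent;           /* 16 bytes */

int main(void)
{
    unsigned int ref = PAGE_SIZE / sizeof(v1_ent) - 1;   /* 511: valid for v1 */

    /* The same ref, speculatively interpreted with the v2 layout, lands
     * well past the end of the shared page: */
    printf("v1 offset: %zu\n", ref * sizeof(v1_ent));    /* 4088: in bounds   */
    printf("v2 offset: %zu\n", ref * sizeof(v2_ent));    /* 8176: out of page */
    return 0;
}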
Re: [Xen-devel] [PATCH L1TF MDS GT v1 2/3] common/grant_table: harden bound accesses
On 5/23/19 16:17, Jan Beulich wrote:
> On 21.05.19 at 09:45, wrote:
>> Guests can issue grant table operations and provide guest controlled
>> data to them. This data is used as an index for memory loads after bound
>> checks have been done. To avoid speculative out-of-bound accesses, we
>> use the array_index_nospec macro where applicable, or the macro
>> block_speculation. Note that block_speculation is always used in
> s/always/already/ ?

They both work, but 'always' underlines that a caller can rely on the fact that this will happen on all execution paths of that function. Hence, I would like to stick to 'always' here.

>> the calls to shared_entry_header and nr_grant_entries, so that no
>> additional protection is required once these functions have been
>> called.
> Isn't this too broad a statement? There's some protection, but not
> for just anything that follows.

You are right; the protection given is that any bound check that could have been bypassed speculatively is enforced after calling one of the two functions. I will rephrase the commit message accordingly.

>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
>> @@ -988,9 +988,10 @@ map_grant_ref(
>>          PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n",
>>                   op->ref, rgt->domain->domain_id);
>>
>> -    act = active_entry_acquire(rgt, op->ref);
>> +    /* This call also ensures the above check cannot be passed speculatively */
>>      shah = shared_entry_header(rgt, op->ref);
>>      status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref);
>> +    act = active_entry_acquire(rgt, op->ref);
> I know we've been there before, but what guarantees that the
> compiler won't reload op->ref from memory for either of the
> latter two accesses? In fact afaict it always will, due to the
> memory clobber in alternative().

The compiler can reload op->ref from memory; that is fine here, as the bound check happens above, and the shared_entry_header call comes with an lfence() by now, so that we will not continue executing speculatively with op->ref being out-of-bounds, independently of whether it comes from memory or registers.

>> @@ -3863,6 +3883,9 @@ static int gnttab_get_status_frame_mfn(struct domain *d,
>>          return -EINVAL;
>>      }
>>
>> +    /* Make sure idx is bounded wrt nr_status_frames */
>> +    block_speculation();
>> +
>>      *mfn = _mfn(virt_to_mfn(gt->status[idx]));
>>      return 0;
>>  }
> Why don't you use array_index_nospec() here?

There is no specific reason. I will swap.

> And how come speculation into gnttab_grow_table() is fine a few
> lines above?

I do not see a reason why it would be bad to enter that function speculatively. There are no accesses that would have to be protected by extra checks, afaict. Otherwise, that function should be protected on its own.

> And what about the similar code in gnttab_get_shared_frame_mfn()?

I will add an array_index_nospec there as well.

> Jan
[Xen-devel] [PATCH SpectreV1+L1TF v4 02/11] is_hvm/pv_domain: block speculation
When checking for being an HVM domain, or a PV domain, we have to make sure that speculation cannot bypass that check and potentially access data that should not end up in the cache for the current domain type.

Signed-off-by: Norbert Manthey

---
 xen/include/xen/sched.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -892,7 +892,8 @@ void watchdog_domain_destroy(struct domain *d);

 static inline bool is_pv_domain(const struct domain *d)
 {
-    return IS_ENABLED(CONFIG_PV) ? d->guest_type == guest_type_pv : false;
+    return IS_ENABLED(CONFIG_PV)
+           ? evaluate_nospec(d->guest_type == guest_type_pv) : false;
 }

 static inline bool is_pv_vcpu(const struct vcpu *v)
@@ -923,7 +924,8 @@ static inline bool is_pv_64bit_vcpu(const struct vcpu *v)
 #endif

 static inline bool is_hvm_domain(const struct domain *d)
 {
-    return IS_ENABLED(CONFIG_HVM) ? d->guest_type == guest_type_hvm : false;
+    return IS_ENABLED(CONFIG_HVM)
+           ? evaluate_nospec(d->guest_type == guest_type_hvm) : false;
 }

 static inline bool is_hvm_vcpu(const struct vcpu *v)
-- 
2.7.4
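[A toy model of the hazard being closed; stand-alone, x86/GCC. The struct, the union, and the function-style evaluate_nospec below are invented for illustration - the real evaluate_nospec is a macro with several build-time variants.]

#include <stdbool.h>
#include <stdio.h>

enum guest_type { guest_type_pv, guest_type_hvm };

struct domain {
    enum guest_type guest_type;
    union { long pv_only; long hvm_only; } arch;   /* type-gated state */
};

static inline bool evaluate_nospec(bool cond)
{
    asm volatile ( "lfence" ::: "memory" );   /* retire before dependent use */
    return cond;
}

static bool is_pv_domain(const struct domain *d)
{
    return evaluate_nospec(d->guest_type == guest_type_pv);
}

int main(void)
{
    struct domain d = { .guest_type = guest_type_hvm };

    /* Without the fence, a mispredicted 'true' here could touch the
     * PV-only member speculatively, pulling it into the L1 cache. */
    if ( is_pv_domain(&d) )
        printf("%ld\n", d.arch.pv_only);
    return 0;
}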
[Xen-devel] [PATCH SpectreV1+L1TF v4 05/11] common/grant_table: block speculative out-of-bound accesses
Guests can issue grant table operations and provide guest controlled data to them. This data is also used for memory loads. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro where applicable. However, there are also memory accesses that cannot be protected by a single array protection, or multiple accesses in a row. To protect these, an lfence instruction is placed between the actual range check and the access via the newly introduced macro block_speculation.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey

---
 xen/common/grant_table.c | 23 +++++++++++++++++++++--
 xen/include/xen/nospec.h |  9 +++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -963,6 +964,9 @@ map_grant_ref(
         PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n",
                  op->ref, rgt->domain->domain_id);

+    /* Make sure the above check is not bypassed speculatively */
+    op->ref = array_index_nospec(op->ref, nr_grant_entries(rgt));
+
     act = active_entry_acquire(rgt, op->ref);
     shah = shared_entry_header(rgt, op->ref);
     status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref);
@@ -1268,7 +1272,8 @@ unmap_common(
     }

     smp_rmb();
-    map = &maptrack_entry(lgt, op->handle);
+    map = &maptrack_entry(lgt, array_index_nospec(op->handle,
+                                                  lgt->maptrack_limit));

     if ( unlikely(!read_atomic(&map->flags)) )
     {
@@ -2026,6 +2031,9 @@ gnttab_prepare_for_transfer(
         goto fail;
     }

+    /* Make sure the above check is not bypassed speculatively */
+    ref = array_index_nospec(ref, nr_grant_entries(rgt));
+
     sha = shared_entry_header(rgt, ref);

     scombo.word = *(u32 *)&sha->flags;
@@ -2223,7 +2231,8 @@ gnttab_transfer(
     okay = gnttab_prepare_for_transfer(e, d, gop.ref);
     spin_lock(&e->page_alloc_lock);

-    if ( unlikely(!okay) || unlikely(e->is_dying) )
+    /* Make sure this check is not bypassed speculatively */
+    if ( evaluate_nospec(unlikely(!okay) || unlikely(e->is_dying)) )
     {
         bool_t drop_dom_ref = !domain_adjust_tot_pages(e, -1);

@@ -2408,6 +2417,9 @@ acquire_grant_for_copy(
         PIN_FAIL(gt_unlock_out, GNTST_bad_gntref,
                  "Bad grant reference %#x\n", gref);

+    /* Make sure the above check is not bypassed speculatively */
+    gref = array_index_nospec(gref, nr_grant_entries(rgt));
+
     act = active_entry_acquire(rgt, gref);
     shah = shared_entry_header(rgt, gref);
     if ( rgt->gt_version == 1 )
@@ -2826,6 +2838,9 @@ static int gnttab_copy_buf(const struct gnttab_copy *op,
                  op->dest.offset, dest->ptr.offset,
                  op->len, dest->len);

+    /* Make sure the above checks are not bypassed speculatively */
+    block_speculation();
+
     memcpy(dest->virt + op->dest.offset, src->virt + op->source.offset,
            op->len);
     gnttab_mark_dirty(dest->domain, dest->mfn);
@@ -3215,6 +3230,10 @@ swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     if ( ref_a == ref_b )
         goto out;

+    /* Make sure the above check is not bypassed speculatively */
+    ref_a = array_index_nospec(ref_a, nr_grant_entries(d->grant_table));
+    ref_b = array_index_nospec(ref_b, nr_grant_entries(d->grant_table));
+
     act_a = active_entry_acquire(gt, ref_a);
     if ( act_a->pin )
         PIN_FAIL(out, GNTST_eagain, "ref a %#x busy\n", ref_a);
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -87,6 +87,15 @@ static inline bool lfence_true(void) { return true; }
 #define evaluate_nospec(condition) ({ bool res = (condition); rmb(); res; })
 #endif

+/*
+ * allow to block speculative execution in generic code
+ */
+#ifdef CONFIG_X86
+#define block_speculation() rmb()
+#else
+#define block_speculation()
+#endif
+
 #endif /* XEN_NOSPEC_H */

 /*
-- 
2.7.4
[Xen-devel] [PATCH SpectreV1+L1TF v4 03/11] config: introduce L1TF_LFENCE option
This commit introduces the configuration option L1TF_LFENCE that allows to control the implementation of the protection of privilege checks via lfence instructions. The following four alternatives are provided:

 - not injecting lfence instructions
 - inject an lfence instruction for both outcomes of the conditional
 - inject an lfence instruction only if the conditional would evaluate
   to true, so that this case cannot be entered under speculation
 - evaluating the condition and store the result into a local variable;
   before using this value, inject an lfence instruction

The different options allow to control the level of protection vs the slowdown the additional lfence instructions would introduce. The default value is set to protecting both branches.

For non-x86 platforms, the protection is disabled by default.

Signed-off-by: Norbert Manthey

---
 xen/arch/x86/Kconfig     | 24 ++++++++++++++++++++++++
 xen/include/xen/nospec.h | 12 ++++++++++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -176,6 +176,30 @@ config PV_SHIM_EXCLUSIVE
 	  firmware, and will not function correctly in other scenarios.

 	  If unsure, say N.
+
+choice
+	prompt "Default L1TF Branch Protection?"
+
+	config L1TF_LFENCE_BOTH
+		bool "Protect both branches of certain conditionals" if HVM
+		---help---
+		  Inject an lfence instruction after the condition to be
+		  evaluated for both outcomes of the condition
+	config L1TF_LFENCE_TRUE
+		bool "Protect true branch of certain conditionals" if HVM
+		---help---
+		  Protect only the path where the condition is evaluated to true
+	config L1TF_LFENCE_INTERMEDIATE
+		bool "Protect before using certain conditionals value" if HVM
+		---help---
+		  Inject an lfence instruction after evaluating the condition
+		  but before forwarding this value from a local variable
+	config L1TF_LFENCE_NONE
+		bool "No conditional protection"
+		---help---
+		  Do not inject lfences for conditional evaluations
+endchoice
+
 endmenu

 source "common/Kconfig"

diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -68,10 +68,18 @@ static inline bool lfence_true(void) { return true; }
 #endif

 /*
- * protect evaluation of conditional with respect to speculation
+ * allow to protect evaluation of conditional with respect to speculation on x86
 */
-#define evaluate_nospec(condition) \
+#if defined(CONFIG_L1TF_LFENCE_NONE) || !defined(CONFIG_X86)
+#define evaluate_nospec(condition) (condition)
+#elif defined(CONFIG_L1TF_LFENCE_BOTH)
+#define evaluate_nospec(condition) \
     (((condition) && lfence_true()) || !lfence_true())
+#elif defined(CONFIG_L1TF_LFENCE_TRUE)
+#define evaluate_nospec(condition) ((condition) && lfence_true())
+#elif defined(CONFIG_L1TF_LFENCE_INTERMEDIATE)
+#define evaluate_nospec(condition) ({ bool res = (condition); rmb(); res; })
+#endif

 #endif /* XEN_NOSPEC_H */
-- 
2.7.4
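[The expansion of the BOTH variant is easy to misread, so the following stand-alone sketch replaces the fence with a counter to show that exactly one barrier executes on either outcome while the condition's value is preserved. The counting stand-in is invented; the macro body matches the patch.]

#include <stdbool.h>
#include <stdio.h>

static int fences;

static bool lfence_true(void)   /* stand-in: count instead of fencing */
{
    fences++;
    return true;
}

#define evaluate_nospec(cond) (((cond) && lfence_true()) || !lfence_true())

int main(void)
{
    bool r;

    fences = 0;
    r = evaluate_nospec(1 == 1);
    printf("true path:  result=%d, fences=%d\n", r, fences);   /* 1, 1 */

    fences = 0;
    r = evaluate_nospec(1 == 2);
    printf("false path: result=%d, fences=%d\n", r, fences);   /* 0, 1 */
    return 0;
}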
[Xen-devel] SpectreV1+L1TF Patch Series
Dear all,

This patch series attempts to mitigate the issues that have been raised in XSA-289 (https://xenbits.xen.org/xsa/advisory-289.html). To block speculative execution on Intel hardware, an lfence instruction is required to make sure that selected checks are not bypassed. Speculative out-of-bound accesses can be prevented by using the array_index_nospec macro.

The lfence instruction should be added on x86 platforms only. To not affect platforms that are not affected by the L1TF vulnerability, the lfence instruction is patched in via alternative patching on Intel CPUs only. Furthermore, the compile time configuration allows to choose how to protect the evaluation of conditions with the lfence instruction.

Best, Norbert
[Xen-devel] [PATCH SpectreV1+L1TF v4 01/11] is_control_domain: block speculation
Checks of domain properties, such as is_hardware_domain or is_hvm_domain, might be bypassed by speculatively executing these instructions. A reason for bypassing these checks is that these macros access the domain structure via a pointer, and check a certain field. Since this memory access is slow, the CPU assumes a returned value and continues the execution.

In case an is_control_domain check is bypassed, for example during a hypercall, data that should only be accessible by the control domain could be loaded into the cache. Due to the L1TF vulnerability of Intel CPUs, loading hypervisor data into the L1 cache is problematic, because when hyperthreading is used as well, a guest running on the sibling core can leak this potentially secret data.

To prevent these speculative accesses, we block speculation after accessing the domain property field by adding lfence instructions. This way, the CPU continues executing and loading data only once the condition is actually evaluated. As the macros are typically used in if statements, the lfence has to be inserted in a compatible way. Therefore, a function that returns true after an lfence instruction is introduced. To protect both branches after a conditional, an lfence instruction has to be added for the two branches.

As the L1TF vulnerability is only present on the x86 architecture, the macros will not use the lfence instruction on other architectures. Introducing the lfence instructions catches a lot of potential leaks with a simple, non-intrusive code change. During performance testing, we did not notice performance effects.

Signed-off-by: Norbert Manthey

---
 xen/include/xen/nospec.h | 15 +++++++++++++++
 xen/include/xen/sched.h  |  5 +++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -58,6 +58,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
     (typeof(_i)) (_i & _mask); \
 })

+/*
+ * allow to insert a read memory barrier into conditionals
+ */
+#ifdef CONFIG_X86
+static inline bool lfence_true(void) { rmb(); return true; }
+#else
+static inline bool lfence_true(void) { return true; }
+#endif
+
+/*
+ * protect evaluation of conditional with respect to speculation
+ */
+#define evaluate_nospec(condition) \
+    (((condition) && lfence_true()) || !lfence_true())
+
 #endif /* XEN_NOSPEC_H */

 /*
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -882,10 +883,10 @@ void watchdog_domain_destroy(struct domain *d);
  *    (that is, this would not be suitable for a driver domain)
  *  - There is never a reason to deny the hardware domain access to this
  */
-#define is_hardware_domain(_d) ((_d) == hardware_domain)
+#define is_hardware_domain(_d) evaluate_nospec((_d) == hardware_domain)

 /* This check is for functionality specific to a control domain */
-#define is_control_domain(_d) ((_d)->is_privileged)
+#define is_control_domain(_d) evaluate_nospec((_d)->is_privileged)

 #define VM_ASSIST(d, t) (test_bit(VMASST_TYPE_ ## t, &(d)->vm_assist))
-- 
2.7.4
[Xen-devel] [PATCH SpectreV1+L1TF v4 06/11] common/memory: block speculative out-of-bound accesses
The get_page_from_gfn method returns a pointer to a page that belongs to a gfn. Before returning the pointer, the gfn is checked for being valid. Under speculation, these checks can be bypassed, so that the function get_page is still executed partially. Consequently, the function page_get_owner_and_reference might be executed partially as well. In this function, the computed pointer is accessed, resulting in a speculative out-of-bound address load. As the gfn can be controlled by a guest, this access is problematic.

To mitigate the root cause, an lfence instruction is added via the evaluate_nospec macro. To make the protection generic, we do not introduce the lfence instruction for this single check, but add it to the mfn_valid function. This way, other potentially problematic accesses are protected as well.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey

---
 xen/common/pdx.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/common/pdx.c b/xen/common/pdx.c
--- a/xen/common/pdx.c
+++ b/xen/common/pdx.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include

 /* Parameters for PFN/MADDR compression. */
 unsigned long __read_mostly max_pdx;
@@ -33,10 +34,10 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS(

 bool __mfn_valid(unsigned long mfn)
 {
-    return likely(mfn < max_page) &&
-           likely(!(mfn & pfn_hole_mask)) &&
-           likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT,
-                           pdx_group_valid));
+    return evaluate_nospec(likely(mfn < max_page) &&
+                           likely(!(mfn & pfn_hole_mask)) &&
+                           likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT,
+                                           pdx_group_valid)));
 }

 /* Sets all bits from the most-significant 1-bit down to the LSB */
-- 
2.7.4
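[A toy model, with the bound and all names invented and x86/GCC assumed, of why fencing inside mfn_valid() covers its callers: it is the last guest-influenced bound check before the frame-table lookup.]

#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGE 1024   /* invented bound; Xen derives this at boot */

struct page_info { unsigned long count_info; };
static struct page_info frame_table[MAX_PAGE];

static bool mfn_valid(unsigned long mfn)
{
    bool ok = mfn < MAX_PAGE;

    asm volatile ( "lfence" ::: "memory" );   /* the evaluate_nospec() effect */
    return ok;
}

struct page_info *get_page(unsigned long mfn)
{
    if ( !mfn_valid(mfn) )
        return NULL;

    return &frame_table[mfn];   /* reached only after the fence retires */
}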
[Xen-devel] [PATCH SpectreV1+L1TF v4 04/11] x86/hvm: block speculative out-of-bound accesses
There are multiple arrays in the HVM interface that are accessed with indices that are provided by the guest. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro.

When blocking speculative out-of-bound accesses, we can classify arrays into dynamic arrays and static arrays. While the former are allocated at run time, the size of the latter is known at compile time. For static arrays, the compiler might be able to block speculative accesses in the future. We introduce another macro that uses the ARRAY_SIZE macro to block speculative accesses. For arrays that are statically accessed, this macro can be used instead of the usual macro. Using this macro results in more readable code, and allows modifying the way this case is handled in a single place.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Reported-by: Pawel Wieczorkiewicz
Signed-off-by: Norbert Manthey

---
 xen/arch/x86/hvm/hvm.c   | 27 ++++++++++++++++++++-------
 xen/include/xen/nospec.h |  6 ++++++
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2102,7 +2103,7 @@ int hvm_mov_from_cr(unsigned int cr, unsigned int gpr)
     case 2:
     case 3:
     case 4:
-        val = curr->arch.hvm.guest_cr[cr];
+        val = array_access_nospec(curr->arch.hvm.guest_cr, cr);
         break;
     case 8:
         val = (vlapic_get_reg(vcpu_vlapic(curr), APIC_TASKPRI) & 0xf0) >> 4;
         break;
@@ -3448,13 +3449,15 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( !d->arch.cpuid->basic.mtrr )
             goto gp_fault;
         index = msr - MSR_MTRRfix16K_80000;
-        *msr_content = fixed_range_base[index + 1];
+        *msr_content = fixed_range_base[array_index_nospec(index + 1,
+                                        ARRAY_SIZE(v->arch.hvm.mtrr.fixed_ranges))];
         break;
     case MSR_MTRRfix4K_C0000...MSR_MTRRfix4K_F8000:
         if ( !d->arch.cpuid->basic.mtrr )
             goto gp_fault;
         index = msr - MSR_MTRRfix4K_C0000;
-        *msr_content = fixed_range_base[index + 3];
+        *msr_content = fixed_range_base[array_index_nospec(index + 3,
+                                        ARRAY_SIZE(v->arch.hvm.mtrr.fixed_ranges))];
         break;
     case MSR_IA32_MTRR_PHYSBASE(0)...MSR_IA32_MTRR_PHYSMASK(MTRR_VCNT_MAX - 1):
         if ( !d->arch.cpuid->basic.mtrr )
             goto gp_fault;
@@ -3463,7 +3466,8 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( (index / 2) >=
              MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, MTRRcap_VCNT) )
             goto gp_fault;
-        *msr_content = var_range_base[index];
+        *msr_content = var_range_base[array_index_nospec(index,
+                                      MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap,
+                                                MTRRcap_VCNT))];
         break;

     case MSR_IA32_XSS:
@@ -4026,7 +4030,8 @@ static int hvmop_set_evtchn_upcall_vector(
     if ( op.vector < 0x10 )
         return -EINVAL;

-    if ( op.vcpu >= d->max_vcpus || (v = d->vcpu[op.vcpu]) == NULL )
+    if ( op.vcpu >= d->max_vcpus ||
+         (v = d->vcpu[array_index_nospec(op.vcpu, d->max_vcpus)]) == NULL )
         return -ENOENT;

     printk(XENLOG_G_INFO "%pv: upcall vector %02x\n", v, op.vector);
@@ -4114,6 +4119,12 @@ static int hvmop_set_param(
     if ( a.index >= HVM_NR_PARAMS )
         return -EINVAL;

+    /*
+     * Make sure the guest controlled value a.index is bounded even during
+     * speculative execution.
+     */
+    a.index = array_index_nospec(a.index, HVM_NR_PARAMS);
+
     d = rcu_lock_domain_by_any_id(a.domid);
     if ( d == NULL )
         return -ESRCH;
@@ -4380,6 +4391,12 @@ static int hvmop_get_param(
     if ( a.index >= HVM_NR_PARAMS )
         return -EINVAL;

+    /*
+     * Make sure the guest controlled value a.index is bounded even during
+     * speculative execution.
+     */
+    a.index = array_index_nospec(a.index, HVM_NR_PARAMS);
+
     d = rcu_lock_domain_by_any_id(a.domid);
     if ( d == NULL )
         return -ESRCH;
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -59,6 +59,12 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 })

 /*
+ * array_access_nospec - allow nospec access for static size arrays
+ */
+#define array_access_nospec(array, index) \
+    (array)[array_index_nospec(index, ARRAY_SIZE(array))]
+
+/*
  * allow to insert a read memory barrier into conditionals
  */
 #ifdef CONFIG_X86
-- 
2.7.4
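[A stand-alone sketch of the new array_access_nospec() helper, using a portable mask expression as a stand-in for Xen's asm-based array_index_nospec. It assumes LP64/GCC, and note that the macro arguments are evaluated more than once in this simplified form, so side-effect-free arguments only.]

#include <stdio.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

#define array_index_nospec(i, size) \
    ((i) & (unsigned long)(~(long)((i) | ((size) - 1UL - (i))) >> \
                           (8 * sizeof(long) - 1)))

/* The helper itself: clamp with the array's own compile-time size. */
#define array_access_nospec(arr, i) \
    ((arr)[array_index_nospec((unsigned long)(i), ARRAY_SIZE(arr))])

int main(void)
{
    uint64_t guest_cr[5] = { 11, 22, 33, 44, 55 };
    unsigned int cr = 9;   /* guest-controlled, past the end */

    /* Reads guest_cr[0] rather than out-of-bounds memory: */
    printf("%llu\n", (unsigned long long)array_access_nospec(guest_cr, cr));
    return 0;
}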
[Xen-devel] [PATCH SpectreV1+L1TF v4 09/11] x86/vioapic: block speculative out-of-bound accesses
When interacting with the IO APIC, a guest can specify values that are used as indices into structures, and whose values are not compared against upper bounds to prevent speculative out-of-bound accesses. This change prevents these speculative accesses.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey

---
 xen/arch/x86/hvm/vioapic.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -66,6 +67,9 @@ static struct hvm_vioapic *gsi_vioapic(const struct domain *d,
 {
     unsigned int i;

+    /* Make sure the compiler does not optimize the initialization */
+    OPTIMIZER_HIDE_VAR(pin);
+
     for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
     {
         struct hvm_vioapic *vioapic = domain_vioapic(d, i);
@@ -117,7 +121,8 @@ static uint32_t vioapic_read_indirect(const struct hvm_vioapic *vioapic)
             break;
         }

-        redir_content = vioapic->redirtbl[redir_index].bits;
+        redir_content = vioapic->redirtbl[array_index_nospec(redir_index,
+                                                             vioapic->nr_pins)].bits;
         result = (vioapic->ioregsel & 1) ? (redir_content >> 32)
                                          : redir_content;
         break;
@@ -212,7 +217,12 @@ static void vioapic_write_redirent(
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
     union vioapic_redir_entry *pent, ent;
     int unmasked = 0;
-    unsigned int gsi = vioapic->base_gsi + idx;
+    unsigned int gsi;
+
+    /* Make sure no out-of-bound value for idx can be used */
+    idx = array_index_nospec(idx, vioapic->nr_pins);
+
+    gsi = vioapic->base_gsi + idx;

     spin_lock(&d->arch.hvm.irq_lock);
@@ -378,7 +388,8 @@ static inline int pit_channel0_enabled(void)

 static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
 {
-    uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
+    uint16_t dest = vioapic->redirtbl
+                    [pin = array_index_nospec(pin, vioapic->nr_pins)].fields.dest_id;
     uint8_t dest_mode = vioapic->redirtbl[pin].fields.dest_mode;
     uint8_t delivery_mode = vioapic->redirtbl[pin].fields.delivery_mode;
     uint8_t vector = vioapic->redirtbl[pin].fields.vector;
@@ -463,7 +474,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)

 void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     struct hvm_vioapic *vioapic = gsi_vioapic(d, irq, &pin);
     union vioapic_redir_entry *ent;
@@ -560,7 +571,7 @@ int vioapic_get_vector(const struct domain *d, unsigned int gsi)

 int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);

     if ( !vioapic )
-- 
2.7.4
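[One idiom in vioapic_deliver() deserves a note: the clamp is written as an assignment inside the first subscript, so every later use of pin in the function sees the clamped value as well. A stand-alone sketch of the idiom follows; the branchy clamp and all names below are invented stand-ins, not array_index_nospec itself.]

#include <stdio.h>

#define N 4

static unsigned int clamp_nospec(unsigned int i, unsigned int size)
{
    return i < size ? i : 0;   /* illustrative only - a real clamp is branch-free */
}

struct ent { int dest_id; int vector; };
static struct ent redirtbl[N];

static void deliver(unsigned int pin)
{
    /* Clamp once, inside the first subscript; the vector lookup below
     * is covered by the same assignment. */
    int dest = redirtbl[pin = clamp_nospec(pin, N)].dest_id;
    int vector = redirtbl[pin].vector;

    printf("dest=%d vector=%d\n", dest, vector);
}

int main(void)
{
    deliver(7);   /* out of range: uses entry 0 on every access */
    return 0;
}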
[Xen-devel] [PATCH SpectreV1+L1TF v4 10/11] x86/hvm/hpet: block speculative out-of-bound accesses
When interacting with the HPET, read and write operations can be executed during instruction emulation, where the guest controls the data that is used. As it is hard to predict the number of instructions that are executed speculatively, we prevent out-of-bound accesses by using the array_index_nospec function for guest specified addresses that should be used for HPET operations.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey

---
 xen/arch/x86/hvm/hpet.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include

 #define domain_vhpet(x) (&(x)->arch.hvm.pl_time->vhpet)
 #define vcpu_vhpet(x) (domain_vhpet((x)->domain))
@@ -124,15 +125,17 @@ static inline uint64_t hpet_read64(HPETState *h, unsigned long addr,
     case HPET_Tn_CFG(0):
     case HPET_Tn_CFG(1):
     case HPET_Tn_CFG(2):
-        return h->hpet.timers[HPET_TN(CFG, addr)].config;
+        return array_access_nospec(h->hpet.timers, HPET_TN(CFG, addr)).config;
     case HPET_Tn_CMP(0):
     case HPET_Tn_CMP(1):
     case HPET_Tn_CMP(2):
-        return hpet_get_comparator(h, HPET_TN(CMP, addr), guest_time);
+        return hpet_get_comparator(h, array_index_nospec(HPET_TN(CMP, addr),
+                                                         ARRAY_SIZE(h->hpet.timers)),
+                                   guest_time);
     case HPET_Tn_ROUTE(0):
     case HPET_Tn_ROUTE(1):
     case HPET_Tn_ROUTE(2):
-        return h->hpet.timers[HPET_TN(ROUTE, addr)].fsb;
+        return array_access_nospec(h->hpet.timers, HPET_TN(ROUTE, addr)).fsb;
     }

     return 0;
@@ -438,7 +441,7 @@ static int hpet_write(
     case HPET_Tn_CFG(0):
     case HPET_Tn_CFG(1):
     case HPET_Tn_CFG(2):
-        tn = HPET_TN(CFG, addr);
+        tn = array_index_nospec(HPET_TN(CFG, addr), ARRAY_SIZE(h->hpet.timers));

         h->hpet.timers[tn].config = hpet_fixup_reg(new_val, old_val,
@@ -480,7 +483,7 @@ static int hpet_write(
     case HPET_Tn_CMP(0):
     case HPET_Tn_CMP(1):
     case HPET_Tn_CMP(2):
-        tn = HPET_TN(CMP, addr);
+        tn = array_index_nospec(HPET_TN(CMP, addr), ARRAY_SIZE(h->hpet.timers));

         if ( timer_is_periodic(h, tn) &&
              !(h->hpet.timers[tn].config & HPET_TN_SETVAL) )
         {
@@ -523,7 +526,7 @@ static int hpet_write(
     case HPET_Tn_ROUTE(0):
     case HPET_Tn_ROUTE(1):
     case HPET_Tn_ROUTE(2):
-        tn = HPET_TN(ROUTE, addr);
+        tn = array_index_nospec(HPET_TN(ROUTE, addr), ARRAY_SIZE(h->hpet.timers));

         h->hpet.timers[tn].fsb = new_val;
         break;
-- 
2.7.4
[Xen-devel] [PATCH SpectreV1+L1TF v4 08/11] xen/evtchn: block speculative out-of-bound accesses
Guests can issue event channel interaction with guest specified data. To avoid speculative out-of-bound accesses, we use the nospec macros. This commit is part of the SpectreV1+L1TF mitigation patch series. Signed-off-by: Norbert Manthey --- xen/common/event_channel.c | 25 - xen/common/event_fifo.c| 16 +--- xen/include/xen/event.h| 5 +++-- 3 files changed, 36 insertions(+), 10 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -368,8 +368,14 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port) if ( virq_is_global(virq) && (vcpu != 0) ) return -EINVAL; + /* +* Make sure the guest controlled value virq is bounded even during +* speculative execution. +*/ +virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn)); + if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || - ((v = d->vcpu[vcpu]) == NULL) ) + ((v = d->vcpu[array_index_nospec(vcpu, d->max_vcpus)]) == NULL) ) return -ENOENT; spin_lock(&d->event_lock); @@ -419,7 +425,7 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind) long rc = 0; if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || - (d->vcpu[vcpu] == NULL) ) + (d->vcpu[array_index_nospec(vcpu, d->max_vcpus)] == NULL) ) return -ENOENT; spin_lock(&d->event_lock); @@ -816,6 +822,12 @@ int set_global_virq_handler(struct domain *d, uint32_t virq) if (!virq_is_global(virq)) return -EINVAL; + /* +* Make sure the guest controlled value virq is bounded even during +* speculative execution. +*/ +virq = array_index_nospec(virq, ARRAY_SIZE(global_virq_handlers)); + if (global_virq_handlers[virq] == d) return 0; @@ -931,7 +943,8 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) struct evtchn *chn; long rc = 0; -if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) ) +if ( (vcpu_id >= d->max_vcpus) || + (d->vcpu[array_index_nospec(vcpu_id, d->max_vcpus)] == NULL) ) return -ENOENT; spin_lock(&d->event_lock); @@ -969,8 +982,10 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]); chn->notify_vcpu_id = vcpu_id; pirq_set_affinity(d, chn->u.pirq.irq, - cpumask_of(d->vcpu[vcpu_id]->processor)); -link_pirq_port(port, chn, d->vcpu[vcpu_id]); + cpumask_of(d->vcpu[array_index_nospec(vcpu_id, + d->max_vcpus)]->processor)); +link_pirq_port(port, chn, d->vcpu[array_index_nospec(vcpu_id, + d->max_vcpus)]); break; default: rc = -EINVAL; diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c --- a/xen/common/event_fifo.c +++ b/xen/common/event_fifo.c @@ -33,7 +33,8 @@ static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d, */ smp_rmb(); -p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; +p = array_index_nospec(port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE, + d->evtchn_fifo->num_evtchns); w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; return d->evtchn_fifo->event_array[p] + w; @@ -516,14 +517,23 @@ int evtchn_fifo_init_control(struct evtchn_init_control *init_control) gfn = init_control->control_gfn; offset = init_control->offset; -if ( vcpu_id >= d->max_vcpus || !d->vcpu[vcpu_id] ) +if ( vcpu_id >= d->max_vcpus || + !d->vcpu[array_index_nospec(vcpu_id, d->max_vcpus)] ) return -ENOENT; -v = d->vcpu[vcpu_id]; + +v = d->vcpu[array_index_nospec(vcpu_id, d->max_vcpus)]; /* Must not cross page boundary. */ if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) ) return -EINVAL; +/* + * Make sure the guest controlled value offset is bounded even during + * speculative execution. 
+ */ +offset = array_index_nospec(offset, + PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)); + /* Must be 8-bytes aligned. */ if ( offset & (8 - 1) ) return -EINVAL; diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h --- a/xen/include/xen/event.h +++ b/xen/include/xen/event.h @@ -13,6 +13,7 @@ #include #include #include +#include #include /* @@ -96,7 +97,7 @@ void arch_evtchn_inject(struct vcpu *v); * The first bucket is directly accessed via d->evtchn. */ #define group_from_port(d, p) \ -((d)->evtchn_gro
[Xen-devel] [PATCH SpectreV1+L1TF v4 11/11] x86/CPUID: block speculative out-of-bound accesses
During instruction emulation, the cpuid instruction is emulated with data that is controlled by the guest. As speculation might pass bound checks, we have to ensure that no out-of-bound loads are possible. To not rely on the compiler to perform value propagation, we replace the variable with the constant to be propagated, rather than using the array_index_nospec macro.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey
Reviewed-by: Jan Beulich

---
 xen/arch/x86/cpuid.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -1,6 +1,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -629,7 +630,7 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
             if ( subleaf >= ARRAY_SIZE(p->cache.raw) )
                 return;

-            *res = p->cache.raw[subleaf];
+            *res = array_access_nospec(p->cache.raw, subleaf);
             break;

         case 0x7:
@@ -638,25 +639,25 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
                                  ARRAY_SIZE(p->feat.raw) - 1) )
                 return;

-            *res = p->feat.raw[subleaf];
+            *res = array_access_nospec(p->feat.raw, subleaf);
             break;

         case 0xb:
             if ( subleaf >= ARRAY_SIZE(p->topo.raw) )
                 return;

-            *res = p->topo.raw[subleaf];
+            *res = array_access_nospec(p->topo.raw, subleaf);
             break;

         case XSTATE_CPUID:
             if ( !p->basic.xsave || subleaf >= ARRAY_SIZE(p->xstate.raw) )
                 return;

-            *res = p->xstate.raw[subleaf];
+            *res = array_access_nospec(p->xstate.raw, subleaf);
             break;

         default:
-            *res = p->basic.raw[leaf];
+            *res = array_access_nospec(p->basic.raw, leaf);
             break;
         }
         break;
@@ -680,7 +681,7 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
                                  ARRAY_SIZE(p->extd.raw) - 1) )
             return;

-        *res = p->extd.raw[leaf & 0xffff];
+        *res = array_access_nospec(p->extd.raw, leaf & 0xffff);
         break;

     default:
@@ -847,7 +848,7 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
         if ( is_pv_domain(d) && is_hardware_domain(d) &&
              guest_kernel_mode(v, regs) && cpu_has_monitor &&
              regs->entry_vector == TRAP_gp_fault )
-            *res = raw_cpuid_policy.basic.raw[leaf];
+            *res = raw_cpuid_policy.basic.raw[5];
         break;

     case 0x7:
-- 
2.7.4
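[The constant-propagation idea can be seen in isolation in the following toy model, with the array size invented: inside a case for leaf 5 the index is architecturally 5, yet a mispredicted switch dispatch could arrive there with any leaf value, so using the literal closes the window without needing a mask.]

#include <stdint.h>

static uint32_t basic_raw[8];

uint32_t lookup(unsigned int leaf)
{
    switch ( leaf )
    {
    case 5:
        return basic_raw[5];   /* was basic_raw[leaf]: safe even if the
                                  dispatch to this case was speculated */
    default:
        return 0;
    }
}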
[Xen-devel] [PATCH SpectreV1+L1TF v4 07/11] nospec: enable lfence on Intel
While the lfence instruction was added for all x86 platforms in the beginning, it is useful to not burden platforms that are not affected by the L1TF vulnerability. Therefore, the lfence instruction should only be introduced in case the current CPU is an Intel CPU that is capable of hyperthreading. This combination of features is added to the features that activate the alternative.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey

---
 xen/include/xen/nospec.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -7,6 +7,7 @@
 #ifndef XEN_NOSPEC_H
 #define XEN_NOSPEC_H

+#include
 #include

 /**
@@ -68,7 +69,10 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
  * allow to insert a read memory barrier into conditionals
  */
 #ifdef CONFIG_X86
-static inline bool lfence_true(void) { rmb(); return true; }
+static inline bool lfence_true(void) {
+    alternative("", "lfence", X86_VENDOR_INTEL);
+    return true;
+}
 #else
 static inline bool lfence_true(void) { return true; }
 #endif
@@ -91,7 +95,7 @@ static inline bool lfence_true(void) { return true; }
  * allow to block speculative execution in generic code
  */
 #ifdef CONFIG_X86
-#define block_speculation() rmb()
+#define block_speculation() alternative("", "lfence", X86_VENDOR_INTEL)
 #else
 #define block_speculation()
 #endif
-- 
2.7.4
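[For the run-time effect of the alternative() call, a rough stand-alone model follows. This is not the Xen mechanism: Xen patches the instruction stream once at boot, so even the flag test below does not exist in the patched hypervisor; the flag and names are invented.]

#include <stdbool.h>

static bool cpu_is_intel;   /* determined once during early boot */

static inline void block_speculation(void)
{
    if ( cpu_is_intel )                          /* modelled; really patched */
        asm volatile ( "lfence" ::: "memory" );
}

[The design choice is that non-Intel CPUs pay no barrier cost at all, at the price of keying the alternative on the vendor rather than on a dedicated feature flag.]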
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 01/11] is_control_domain: block speculation
On 1/23/19 14:20, Jan Beulich wrote:
> On 23.01.19 at 12:51, wrote:
>> --- a/xen/include/xen/nospec.h
>> +++ b/xen/include/xen/nospec.h
>> @@ -58,6 +58,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>>      (typeof(_i)) (_i & _mask); \
>>  })
>>
>> +/*
>> + * allow to insert a read memory barrier into conditionals
>> + */
>> +#ifdef CONFIG_X86
>> +static inline bool lfence_true(void) { rmb(); return true; }
>> +#else
>> +static inline bool lfence_true(void) { return true; }
>> +#endif
>> +
>> +/*
>> + * protect evaluation of conditional with respect to speculation
>> + */
>> +#define evaluate_nospec(condition) \
>> +    (((condition) && lfence_true()) || !lfence_true())
> It may be just me, but I think
>
> #define evaluate_nospec(condition) \
>     ((condition) ? lfence_true() : !lfence_true())
>
> would better express the two-way nature of this.

I compared the binary output of the two variants, and they are the same (for my build environment). I'll switch to your variant, in case nobody objects.

Best, Norbert

> Jan
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 03/11] config: introduce L1TF_LFENCE option
On 1/23/19 14:18, Jan Beulich wrote:
> On 23.01.19 at 12:51, wrote:
>> This commit introduces the configuration option L1TF_LFENCE that allows to control the implementation of the protection of privilege checks via lfence instructions. The following four alternatives are provided:
>>
>> - not injecting lfence instructions
>> - inject an lfence instruction for both outcomes of the conditional
>> - inject an lfence instruction only if the conditional would evaluate to true, so that this case cannot be entered under speculation
>
> I'd really like to see justification for this variant, as the asymmetric handling doesn't look overly helpful. It's also not clear to me how someone configuring Xen should tell whether this would be a safe selection to make. I'm tempted to request that this option become EXPERT dependent.

I will drop this option. Without properly defining which property checks should be protected (we currently do not protect any XSM based checks that are used in hypercalls like physdev_op), and what to protect, I agree it's hard to judge whether this is useful.

>> - evaluating the condition and storing the result in a local variable; before using this value, inject an lfence instruction.
>>
>> The different options allow to control the level of protection vs the slowdown the additional lfence instructions would introduce. The default value is set to protecting both branches.
>>
>> For non-x86 platforms, the protection is disabled by default.
>
> At least the "by default" is wrong here.

I will drop the "by default" in this sentence.

>> --- a/xen/arch/x86/Kconfig
>> +++ b/xen/arch/x86/Kconfig
>> @@ -176,6 +176,30 @@ config PV_SHIM_EXCLUSIVE
>>       firmware, and will not function correctly in other scenarios.
>>
>>       If unsure, say N.
>> +
>> +choice
>> +    prompt "Default L1TF Branch Protection?"
>> +
>> +config L1TF_LFENCE_BOTH
>> +    bool "Protect both branches of certain conditionals" if HVM
>> +    ---help---
>> +      Inject an lfence instruction after the condition to be
>> +      evaluated for both outcomes of the condition
>> +config L1TF_LFENCE_TRUE
>> +    bool "Protect true branch of certain conditionals" if HVM
>> +    ---help---
>> +      Protect only the path where the condition is evaluated to true
>> +config L1TF_LFENCE_INTERMEDIATE
>> +    bool "Protect before using certain conditionals value" if HVM
>> +    ---help---
>> +      Inject an lfence instruction after evaluating the condition
>> +      but before forwarding this value from a local variable
>> +config L1TF_LFENCE_NONE
>> +    bool "No conditional protection"
>> +    ---help---
>> +      Do not inject lfences for conditional evaluations
>> +endchoice
>
> I guess we should settle on some default for this choice. The description talks about a default, but I don't see it set here (and I don't think relying merely on the order is a good idea).

I will add a "default" statement, and pick the L1TF_LFENCE_BOTH variant there.
>> --- a/xen/include/xen/nospec.h
>> +++ b/xen/include/xen/nospec.h
>> @@ -68,10 +68,18 @@ static inline bool lfence_true(void) { return true; }
>>  #endif
>>
>>  /*
>> - * protect evaluation of conditional with respect to speculation
>> + * allow to protect evaluation of conditional with respect to speculation on x86
>>   */
>> -#define evaluate_nospec(condition)                                      \
>> +#if defined(CONFIG_L1TF_LFENCE_NONE) || !defined(CONFIG_X86)
>> +#define evaluate_nospec(condition) (condition)
>> +#elif defined(CONFIG_L1TF_LFENCE_BOTH)
>> +#define evaluate_nospec(condition)                                      \
>
> I'm puzzled by this line changing - do you really need to move the backslash here?

This looks strange as a stand-alone modification, I agree. I will merge the introduction of the barrier with the new name, and merge it with the configuration option and the alternative patching. This way, this change will be removed.

Best,
Norbert

> Jan
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 03/11] config: introduce L1TF_LFENCE option
On 1/23/19 15:45, Jan Beulich wrote:
> On 23.01.19 at 14:44, wrote:
>> On 23/01/2019 13:39, Jan Beulich wrote:
>>> On 23.01.19 at 14:24, wrote:
>>>> On 23/01/2019 11:51, Norbert Manthey wrote:
>>>>> --- a/xen/include/xen/nospec.h
>>>>> +++ b/xen/include/xen/nospec.h
>>>>> @@ -68,10 +68,18 @@ static inline bool lfence_true(void) { return true; }
>>>>>  #endif
>>>>>
>>>>>  /*
>>>>> - * protect evaluation of conditional with respect to speculation
>>>>> + * allow to protect evaluation of conditional with respect to speculation on x86
>>>>>   */
>>>>> -#define evaluate_nospec(condition)                                      \
>>>>> +#if defined(CONFIG_L1TF_LFENCE_NONE) || !defined(CONFIG_X86)
>>>>> +#define evaluate_nospec(condition) (condition)
>>>>> +#elif defined(CONFIG_L1TF_LFENCE_BOTH)
>>>>> +#define evaluate_nospec(condition)                                      \
>>>>>      (((condition) && lfence_true()) || !lfence_true())
>>>>> +#elif defined(CONFIG_L1TF_LFENCE_TRUE)
>>>>> +#define evaluate_nospec(condition) ((condition) && lfence_true())
>>>>> +#elif defined(CONFIG_L1TF_LFENCE_INTERMEDIATE)
>>>>> +#define evaluate_nospec(condition) ({ bool res = (condition); rmb(); res; })
>>>>> +#endif
>>>>
>>>> None of the configs are defined on Arm, so can this be moved in arch-x86?
>>>
>>> To be honest I'd like to avoid introducing asm/nospec.h for the time being.
>>
>> How about adding them in system.h as we did for array_index_mask_nospec?
>
> To tell you the truth, that's where Norbert had it originally. I think that's not the right place though (also for array_index_mask_nospec()). But I'll listen to a majority thinking differently, at least as far as what is currently lfence_true() goes. evaluate_nospec(), otoh, belongs where it is now, I think.

I will rename the lfence_true macro into "arch_barrier_nospec_true". Furthermore, I will merge the introduction of the macros with the introduction of the configuration and the alternative patching. Finally, I'll reuse the arch_barrier_nospec_true implementation in the evaluate_nospec macro.

Best,
Norbert

> Jan
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 08/11] xen/evtchn: block speculative out-of-bound accesses
On 1/24/19 17:56, Jan Beulich wrote:
> On 23.01.19 at 12:57, wrote:
>> --- a/xen/common/event_channel.c
>> +++ b/xen/common/event_channel.c
>> @@ -368,8 +368,14 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
>>      if ( virq_is_global(virq) && (vcpu != 0) )
>>          return -EINVAL;
>>
>> +    /*
>> +     * Make sure the guest controlled value virq is bounded even during
>> +     * speculative execution.
>> +     */
>> +    virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn));
>
> I think this wants to move ahead of the if() in context, to be independent of the particular implementation of virq_is_global() (the current shape of which is mostly fine, perhaps with the exception of the risk of the compiler translating the switch() there by way of a jump table). This also moves it closer to the if() the construct is a companion to.

I understand the concern. However, because the value of virq would be changed before the virq_is_global check, couldn't that result in returning a wrong error code? The potential out-of-bound value is brought back into the valid range, so that the above check might fire incorrectly?

>> @@ -816,6 +822,12 @@ int set_global_virq_handler(struct domain *d, uint32_t virq)
>>      if (!virq_is_global(virq))
>>          return -EINVAL;
>>
>> +    /*
>> +     * Make sure the guest controlled value virq is bounded even during
>> +     * speculative execution.
>> +     */
>> +    virq = array_index_nospec(virq, ARRAY_SIZE(global_virq_handlers));
>
> Same here then.
>
>> @@ -931,7 +943,8 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
>>      struct evtchn *chn;
>>      long           rc = 0;
>>
>> -    if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) )
>> +    if ( (vcpu_id >= d->max_vcpus) ||
>> +         (d->vcpu[array_index_nospec(vcpu_id, d->max_vcpus)] == NULL) )
>>          return -ENOENT;
>>
>>      spin_lock(&d->event_lock);
>> @@ -969,8 +982,10 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
>>          unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]);
>>          chn->notify_vcpu_id = vcpu_id;
>>          pirq_set_affinity(d, chn->u.pirq.irq,
>> -                          cpumask_of(d->vcpu[vcpu_id]->processor));
>> -        link_pirq_port(port, chn, d->vcpu[vcpu_id]);
>> +                          cpumask_of(d->vcpu[array_index_nospec(vcpu_id,
>> +                                                 d->max_vcpus)]->processor));
>> +        link_pirq_port(port, chn, d->vcpu[array_index_nospec(vcpu_id,
>> +                                                             d->max_vcpus)]);
>
> Using Andrew's new domain_vcpu() will improve readability, especially after your change, quite a bit here. But of course code elsewhere will benefit as well.

You mean I should use the domain_vcpu function in both hunks, because due to the first one, the latter can never return NULL? I will rebase the series on top of this fresh change, and use the domain_vcpu function for the locations where I bound a vcpu_id.

Best,
Norbert

> Jan
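[Editorial sketch: the ordering Jan asks for, which is also what v5 later adopts (see patch 1/9 below). It addresses the error-code worry: the clamp sits after the architectural bounds check, so it never alters an in-range virq; it only narrows the window for speculatively out-of-range values.]

    if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
        return -EINVAL;

    /* Architecturally a no-op at this point; only affects speculation. */
    virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn));

    if ( virq_is_global(virq) && (vcpu != 0) )
        return -EINVAL;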
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 03/11] config: introduce L1TF_LFENCE option
On 1/25/19 11:14, Jan Beulich wrote:
> On 24.01.19 at 22:29, wrote:
>> Worse is the "evaluate condition, stash result, fence, use variable" option, which is almost completely useless. If you work out the resulting instruction stream, you'll have a conditional expression calculated down into a register, then a fence, then a test register and conditional jump into one of two basic blocks. This takes the perf hit, and doesn't protect either of the basic blocks for speculative mis-execution.
>
> How does it not protect anything? It shrinks the speculation window to just the register test and conditional branch, which ought to be far smaller than that behind a memory access which fails to hit any of the caches (and perhaps even any of the TLBs). This is the more that LFENCE does specifically not prevent insn fetching from continuing.
>
> That said I agree that the LFENCE would better sit between the register test and the conditional branch, but as we've said so many times before - this can't be achieved without compiler support. It's sad enough that the default "cc" clobber of asm()-s on x86 alone prevents this from possibly working, while my over four year old patch to add a means to avoid this has not seen sufficient comments to get it into some hopefully acceptable shape, but also has not been approved as is.
>
> Then again, following an earlier reply of mine on another sub-thread, nothing really prevents the compiler from moving ahead and folding the two LFENCEs of the "both branches" model into one. It just so happens that apparently right now this never occurs (assuming Norbert has done full generated code analysis to confirm the intended placement).

I am happy to jump back to my earlier version without a configuration option, protecting both branches with an lfence instruction, using logic operators. For this version, I actually looked into the object dump and checked for various locations that the lfence statement was added for both blocks after the jump instruction. So, the compiler I used did not move the lfence instruction before the jump instruction and merge them. I actually hope that the lazy evaluation of logic prevents the compiler from doing so.

A note on performance: I created a set of micro benchmarks that call certain hypercall+command pairs in a tight loop many times. These hypercalls target locations I modified with this patch series. The current state of testing shows that in the worst case the full series adds at most 3% runtime (relative to what the same hypercall took before the modification). The testing used the evaluate_nospec implementation that protects both branches via logic operators. Given that those are micro benchmarks, I expect the impact for usual user workloads is even lower, but I did not measure any userland benchmarks yet. In case you point me to performance tests you typically use, I can also look into that. Thanks!

Best,
Norbert

> Jan
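[Editorial sketch, not from the thread: the general shape of such a micro benchmark, assuming the libxencall helpers (xencall_open/xencall2) are available and the Xen public headers provide the hypercall numbers; the chosen hypercall, command and iteration count are arbitrary examples.]

    #include <stdint.h>
    #include <stdio.h>
    #include <xencall.h>
    #include <xen/xen.h>            /* __HYPERVISOR_event_channel_op */
    #include <xen/event_channel.h>  /* EVTCHNOP_status */

    /* Serialize before reading the TSC so earlier iterations are retired. */
    static inline uint64_t rdtsc_serialized(void)
    {
        uint32_t lo, hi;
        __asm__ volatile ( "lfence; rdtsc" : "=a" (lo), "=d" (hi) );
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        xencall_handle *xcall = xencall_open(NULL, 0);
        const uint64_t iters = 1000000;
        uint64_t start;

        if ( !xcall )
            return 1;

        start = rdtsc_serialized();
        for ( uint64_t i = 0; i < iters; i++ )
            /* One hypercall+command pair per iteration; the call fails
             * quickly (NULL argument), which is fine for timing the path. */
            xencall2(xcall, __HYPERVISOR_event_channel_op, EVTCHNOP_status, 0);

        printf("cycles/call: %llu\n",
               (unsigned long long)((rdtsc_serialized() - start) / iters));
        xencall_close(xcall);
        return 0;
    }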
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 07/11] nospec: enable lfence on Intel
On 1/24/19 23:29, Andrew Cooper wrote:
> On 23/01/2019 11:57, Norbert Manthey wrote:
>> While the lfence instruction was initially added for all x86 platforms, it is useful not to penalize platforms that are not affected by the L1TF vulnerability. Therefore, the lfence instruction should only be patched in when the current CPU is an Intel CPU that is capable of hyper-threading. This combination of features is added to the feature set that activates the alternative.
>>
>> This commit is part of the SpectreV1+L1TF mitigation patch series.
>>
>> Signed-off-by: Norbert Manthey
>>
>> ---
>>  xen/include/xen/nospec.h | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
>> --- a/xen/include/xen/nospec.h
>> +++ b/xen/include/xen/nospec.h
>> @@ -7,6 +7,7 @@
>>  #ifndef XEN_NOSPEC_H
>>  #define XEN_NOSPEC_H
>>
>> +#include
>>  #include
>>
>>  /**
>> @@ -68,7 +69,10 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>>   * allow to insert a read memory barrier into conditionals
>>   */
>>  #ifdef CONFIG_X86
>> -static inline bool lfence_true(void) { rmb(); return true; }
>> +static inline bool lfence_true(void) {
>> +    alternative("", "lfence", X86_VENDOR_INTEL);
>
> This doesn't do what you expect. It will cause the lfences to be patched into existence on any hardware with an FPU (before a recent patch of mine) or with VME (after a recent patch).

After looking more into this, I would introduce another synthesized CPU feature flag, so that alternative patching can use this flag to patch the lfence in, in case the detected platform is vulnerable to L1TF. I would set this flag based on whether an L1TF vulnerable platform is detected, and an introduced command line option does not prevent this. Is this what you envision, or do I miss something?

Best,
Norbert

> ~Andrew
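[Editorial sketch of the plan Norbert describes: the flag name X86_FEATURE_SC_L1TF_VULN matches the later v5 posting below, but the bit position, option name and detection logic shown here are illustrative assumptions only.]

    /* xen/include/asm-x86/cpufeatures.h: synthesize a feature bit that is
     * only set on affected hardware. */
    XEN_CPUFEATURE(SC_L1TF_VULN, X86_SYNTH(31)) /* L1TF protection required */

    /* xen/arch/x86/spec_ctrl.c: force the capability unless disabled on the
     * command line, so alternative("", "lfence", X86_FEATURE_SC_L1TF_VULN)
     * patches the barrier in only where it is needed. */
    if ( opt_l1tf_barrier && cpu_has_bug_l1tf )
        setup_force_cpu_cap(X86_FEATURE_SC_L1TF_VULN);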
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 03/11] config: introduce L1TF_LFENCE option
On 1/25/19 14:09, Jan Beulich wrote:
> On 25.01.19 at 11:50, wrote:
>> On 1/25/19 11:14, Jan Beulich wrote:
>>> On 24.01.19 at 22:29, wrote:
>>>> Worse is the "evaluate condition, stash result, fence, use variable" option, which is almost completely useless. If you work out the resulting instruction stream, you'll have a conditional expression calculated down into a register, then a fence, then a test register and conditional jump into one of two basic blocks. This takes the perf hit, and doesn't protect either of the basic blocks for speculative mis-execution.
>>>
>>> How does it not protect anything? It shrinks the speculation window to just the register test and conditional branch, which ought to be far smaller than that behind a memory access which fails to hit any of the caches (and perhaps even any of the TLBs). This is the more that LFENCE does specifically not prevent insn fetching from continuing.
>>>
>>> That said I agree that the LFENCE would better sit between the register test and the conditional branch, but as we've said so many times before - this can't be achieved without compiler support. It's sad enough that the default "cc" clobber of asm()-s on x86 alone prevents this from possibly working, while my over four year old patch to add a means to avoid this has not seen sufficient comments to get it into some hopefully acceptable shape, but also has not been approved as is.
>>>
>>> Then again, following an earlier reply of mine on another sub-thread, nothing really prevents the compiler from moving ahead and folding the two LFENCEs of the "both branches" model into one. It just so happens that apparently right now this never occurs (assuming Norbert has done full generated code analysis to confirm the intended placement).
>>
>> I am happy to jump back to my earlier version without a configuration option, protecting both branches with an lfence instruction, using logic operators.
>
> I don't understand this, I'm afraid: What I've said was to support my thinking of the && + || variant being identical in code and risk to that using ?: . I.e. I'm not asking you to switch back.

I understand that you did not ask. However, Andrew raised concerns, and I analyzed the binary output for the variant with logical operators. Hence, I'd like to keep that variant with the logical operators.

Best,
Norbert

> Jan
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 03/11] config: introduce L1TF_LFENCE option
On 1/28/19 08:35, Jan Beulich wrote:
> On 27.01.19 at 21:28, wrote:
>> On 1/25/19 14:09, Jan Beulich wrote:
>>> On 25.01.19 at 11:50, wrote:
>>>> On 1/25/19 11:14, Jan Beulich wrote:
>>>>> On 24.01.19 at 22:29, wrote:
>>>>>> Worse is the "evaluate condition, stash result, fence, use variable" option, which is almost completely useless. If you work out the resulting instruction stream, you'll have a conditional expression calculated down into a register, then a fence, then a test register and conditional jump into one of two basic blocks. This takes the perf hit, and doesn't protect either of the basic blocks for speculative mis-execution.
>>>>>
>>>>> How does it not protect anything? It shrinks the speculation window to just the register test and conditional branch, which ought to be far smaller than that behind a memory access which fails to hit any of the caches (and perhaps even any of the TLBs). This is the more that LFENCE does specifically not prevent insn fetching from continuing.
>>>>>
>>>>> That said I agree that the LFENCE would better sit between the register test and the conditional branch, but as we've said so many times before - this can't be achieved without compiler support. It's sad enough that the default "cc" clobber of asm()-s on x86 alone prevents this from possibly working, while my over four year old patch to add a means to avoid this has not seen sufficient comments to get it into some hopefully acceptable shape, but also has not been approved as is.
>>>>>
>>>>> Then again, following an earlier reply of mine on another sub-thread, nothing really prevents the compiler from moving ahead and folding the two LFENCEs of the "both branches" model into one. It just so happens that apparently right now this never occurs (assuming Norbert has done full generated code analysis to confirm the intended placement).
>>>>
>>>> I am happy to jump back to my earlier version without a configuration option, protecting both branches with an lfence instruction, using logic operators.
>>>
>>> I don't understand this, I'm afraid: What I've said was to support my thinking of the && + || variant being identical in code and risk to that using ?: . I.e. I'm not asking you to switch back.
>>
>> I understand that you did not ask. However, Andrew raised concerns, and I analyzed the binary output for the variant with logical operators. Hence, I'd like to keep that variant with the logical operators.
>
> But didn't you say earlier that there was no difference in generated code between the two variants?

Yes, for the current commit, and for the one compiler I used. Personally, I prefer the logic operand variant. You seem to prefer the ternary variant, and Andrew at least raised concerns there. I would really like to move forward somehow, but currently it does not look really clear how to achieve that. I will try to apply a majority vote for each hunk that has been commented on and create a v5 of the series. I even think about separating the introduction of eval_nospec and the arch_nospec_barrier macro into another series, to move faster with the array_index_nospec-based changes first. Guidance is very welcome.

Best,
Norbert

> Jan
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 03/11] config: introduce L1TF_LFENCE option
On 1/28/19 09:24, Jan Beulich wrote:
> On 28.01.19 at 08:56, wrote:
>> On 1/28/19 08:35, Jan Beulich wrote:
>>> On 27.01.19 at 21:28, wrote:
>>>> On 1/25/19 14:09, Jan Beulich wrote:
>>>>> On 25.01.19 at 11:50, wrote:
>>>>>> On 1/25/19 11:14, Jan Beulich wrote:
>>>>>>> On 24.01.19 at 22:29, wrote:
>>>>>>>> Worse is the "evaluate condition, stash result, fence, use variable" option, which is almost completely useless. If you work out the resulting instruction stream, you'll have a conditional expression calculated down into a register, then a fence, then a test register and conditional jump into one of two basic blocks. This takes the perf hit, and doesn't protect either of the basic blocks for speculative mis-execution.
>>>>>>>
>>>>>>> How does it not protect anything? It shrinks the speculation window to just the register test and conditional branch, which ought to be far smaller than that behind a memory access which fails to hit any of the caches (and perhaps even any of the TLBs). This is the more that LFENCE does specifically not prevent insn fetching from continuing.
>>>>>>>
>>>>>>> That said I agree that the LFENCE would better sit between the register test and the conditional branch, but as we've said so many times before - this can't be achieved without compiler support. It's sad enough that the default "cc" clobber of asm()-s on x86 alone prevents this from possibly working, while my over four year old patch to add a means to avoid this has not seen sufficient comments to get it into some hopefully acceptable shape, but also has not been approved as is.
>>>>>>>
>>>>>>> Then again, following an earlier reply of mine on another sub-thread, nothing really prevents the compiler from moving ahead and folding the two LFENCEs of the "both branches" model into one. It just so happens that apparently right now this never occurs (assuming Norbert has done full generated code analysis to confirm the intended placement).
>>>>>>
>>>>>> I am happy to jump back to my earlier version without a configuration option, protecting both branches with an lfence instruction, using logic operators.
>>>>>
>>>>> I don't understand this, I'm afraid: What I've said was to support my thinking of the && + || variant being identical in code and risk to that using ?: . I.e. I'm not asking you to switch back.
>>>>
>>>> I understand that you did not ask. However, Andrew raised concerns, and I analyzed the binary output for the variant with logical operators. Hence, I'd like to keep that variant with the logical operators.
>>>
>>> But didn't you say earlier that there was no difference in generated code between the two variants?
>>
>> Yes, for the current commit, and for the one compiler I used. Personally, I prefer the logic operand variant. You seem to prefer the ternary variant, and Andrew at least raised concerns there. I would really like to move forward somehow, but currently it does not look really clear how to achieve that.
>
> Well, being able to move forward implies getting a response to my reply suggesting that both variants are equivalent in risk. If there are convincing arguments that the (imo) worse (simply from a readability pov) is indeed better from a risk (of the compiler not doing what we want it to do) pov, I'd certainly give up my opposition.

I understand the readability concern. The C standard makes similar promises about the semantics (left-to-right evaluation, using sequence points). The implementation in the end seems to be up to the compiler.
The risk is that future compilers treat the conditional operator differently from the one I used today. I'm fine with what we have right now. Once I'm done with a v5 candidate, I'll look into this comparison one more time.

>> I will try to apply a majority vote for each hunk that has been commented on and create a v5 of the series. I even think about separating the introduction of eval_nospec and the arch_nospec_barrier macro into another series, to move faster with the array_index_nospec-based changes first. Guidance is very welcome.
>
> I have no problem picking patches out of order for committing. For example I'd commit patches 10 and 11 of v4 as is once it has the necessary release manager ack. I notice only now that you didn't even Cc Jürgen. I guess I'll reply to the cover letter asking for his opinion on the series as a whole.

To be able to merge these patches independently, I will bring back the patch that was listed in the XSA, xsa289/0005-nospec-introduce-method-for-static-arrays.patch, as that function is required by patches 10 and 11.

Best,
Norbert

> Jan
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 09/11] x86/vioapic: block speculative out-of-bound accesses
On 1/25/19 17:34, Jan Beulich wrote:
> On 23.01.19 at 12:57, wrote:
>> @@ -66,6 +67,9 @@ static struct hvm_vioapic *gsi_vioapic(const struct domain *d,
>>  {
>>      unsigned int i;
>>
>> +    /* Make sure the compiler does not optimize the initialization */
>> +    OPTIMIZER_HIDE_VAR(pin);
>
> Since there's no initialization here, I think it would help to add "done in the callers". Perhaps also "optimize away" or "delete"?
>
> And then I think you mean *pin.

True, I will adapt both the comment and the OPTIMIZER_HIDE_VAR call.

>> @@ -212,7 +217,12 @@ static void vioapic_write_redirent(
>>      struct hvm_irq *hvm_irq = hvm_domain_irq(d);
>>      union vioapic_redir_entry *pent, ent;
>>      int unmasked = 0;
>> -    unsigned int gsi = vioapic->base_gsi + idx;
>> +    unsigned int gsi;
>> +
>> +    /* Make sure no out-of-bound value for idx can be used */
>> +    idx = array_index_nospec(idx, vioapic->nr_pins);
>> +
>> +    gsi = vioapic->base_gsi + idx;
>
> I dislike the disconnect from the respective bounds check: There's only one caller, so the construct could be moved there, or otherwise I'd like to see an ASSERT() added documenting that the bounds check is expected to have happened in the caller.

I agree that the idx value is used as an array index in this function only once. However, the gsi value also uses the value of idx, and as that is passed to other functions, I want to bound the gsi variable as well. Therefore, I chose to have a separate assignment for the idx variable.

>> @@ -378,7 +388,8 @@ static inline int pit_channel0_enabled(void)
>>
>>  static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>>  {
>> -    uint16_t dest = vioapic->redirtbl[pin].fields.dest_id;
>> +    uint16_t dest = vioapic->redirtbl
>> +        [pin = array_index_nospec(pin, vioapic->nr_pins)].fields.dest_id;
>>      uint8_t dest_mode = vioapic->redirtbl[pin].fields.dest_mode;
>>      uint8_t delivery_mode = vioapic->redirtbl[pin].fields.delivery_mode;
>>      uint8_t vector = vioapic->redirtbl[pin].fields.vector;
>
> I'm sorry, but despite prior discussions I'm still not happy about this change - all of the callers pass known good values:
> - vioapic_write_redirent() gets adjusted above,
> - vioapic_irq_positive_edge() gets the value passed into here from gsi_vioapic(), which you also take care of,
> - vioapic_update_EOI() loops over all pins, so only passes in-range values.
> Therefore I still don't see what protection this change adds. As per above, if it was to stay, some sort of connection to the range check(s) it guards would otherwise be nice to establish, but I realize that adding an ASSERT() here would go against a certain aspect of review comments I gave on earlier versions.

I will drop this change. As you called out, all callers are bounds-checked already. Hence, I will not add an assert.
>> @@ -463,7 +474,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
>>
>>  void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
>>  {
>> -    unsigned int pin;
>> +    unsigned int pin = 0; /* See gsi_vioapic */
>>      struct hvm_vioapic *vioapic = gsi_vioapic(d, irq, &pin);
>>      union vioapic_redir_entry *ent;
>>
>> @@ -560,7 +571,7 @@ int vioapic_get_vector(const struct domain *d, unsigned int gsi)
>>
>>  int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
>>  {
>> -    unsigned int pin;
>> +    unsigned int pin = 0; /* See gsi_vioapic */
>>      const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);
>>
>>      if ( !vioapic )
>
> Since there are more callers of gsi_vioapic(), justification should be added to the description why only some need adjustment (or otherwise, just to be on the safe side as well as for consistency all of them should be updated, in which case it would still be nice to call out the ones which really [don't] need updating).

I will extend the explanation in the commit message.

Best,
Norbert

> Jan
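[Editorial sketch: how the OPTIMIZER_HIDE_VAR discussed above keeps the callers' initialization alive. The definition shown follows the common Linux/Xen pattern; the exact Xen definition may differ.]

    /* An empty asm that claims to read and modify var: the compiler can no
     * longer prove the caller's "pin = 0" is dead, yet no instruction is
     * emitted for it. */
    #define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "+g" (var))

    unsigned int pin = 0;                     /* initialization that must survive */
    struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin); /* hides *pin inside */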
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 09/11] x86/vioapic: block speculative out-of-bound accesses
On 1/28/19 12:12, Jan Beulich wrote:
> On 28.01.19 at 12:03, wrote:
>> On 1/25/19 17:34, Jan Beulich wrote:
>>> On 23.01.19 at 12:57, wrote:
>>>> @@ -212,7 +217,12 @@ static void vioapic_write_redirent(
>>>>      struct hvm_irq *hvm_irq = hvm_domain_irq(d);
>>>>      union vioapic_redir_entry *pent, ent;
>>>>      int unmasked = 0;
>>>> -    unsigned int gsi = vioapic->base_gsi + idx;
>>>> +    unsigned int gsi;
>>>> +
>>>> +    /* Make sure no out-of-bound value for idx can be used */
>>>> +    idx = array_index_nospec(idx, vioapic->nr_pins);
>>>> +
>>>> +    gsi = vioapic->base_gsi + idx;
>>>
>>> I dislike the disconnect from the respective bounds check: There's only one caller, so the construct could be moved there, or otherwise I'd like to see an ASSERT() added documenting that the bounds check is expected to have happened in the caller.
>>
>> I agree that the idx value is used as an array index in this function only once. However, the gsi value also uses the value of idx, and as that is passed to other functions, I want to bound the gsi variable as well. Therefore, I chose to have a separate assignment for the idx variable.
>
> I don't mind the separate assignment, and I didn't complain about idx being used just once. What I said is that there's only one caller of the function. If the bounds checking was done there, "gsi" here would be equally "bounded" afaict. And I did suggest an alternative in case you dislike the moving of the construct you add.

Ah, I understood your previous sentence differently; thanks for clarifying. I like to keep the nospec statements close to the problematic use, so that eventual future callers benefit from them as well. Therefore, I'll add an ASSERT statement with the bound check.

Best,
Norbert

> Jan
Re: [Xen-devel] SpectreV1+L1TF Patch Series
On 1/24/19 22:05, Andrew Cooper wrote:
> On 23/01/2019 11:51, Norbert Manthey wrote:
>> Dear all,
>>
>> This patch series attempts to mitigate the issues that have been raised in XSA-289 (https://xenbits.xen.org/xsa/advisory-289.html). To block speculative execution on Intel hardware, an lfence instruction is required to make sure that selected checks are not bypassed. Speculative out-of-bound accesses can be prevented by using the array_index_nospec macro.
>>
>> The lfence instruction should be added on x86 platforms only. To not affect platforms that are not affected by the L1TF vulnerability, the lfence instruction is patched in via alternative patching on Intel CPUs only. Furthermore, the compile time configuration allows to choose how to protect the evaluation of conditions with the lfence instruction.
>
> Hello,
>
> First of all, I've dusted off an old patch of mine and made it speculatively safe.
>
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=9e92acf1b752dfdfb294234b32d1fa9f55bfdc0f
>
> Using the new domain_vcpu() helper should tidy up quite a few patches in the series.

I will use the introduced function and apply it where I touched code, thanks!

> Next, to the ordering of patches.
>
> Please introduce the Kconfig variable(s) first. I'll follow up on that thread about options.

I will drop the Kconfig option and go with "protect both branches" only.

> Next, introduce a new synthetic feature bit to cause patching to occur, and logic to trigger it in appropriate circumstances. Look through the history of include/asm-x86/cpufeatures.h to see some examples from the previous speculative mitigation work. In particular, you'll need a command line parameter to control the use of this functionality when it is compiled in.

I will introduce a synthesized feature, and a command line option, and add documentation.

> Next, introduce eval_nospec(). To avoid interfering with other architectures, you probably want something like this:

Do you want me to introduce the new macro in a separate commit, and use it in follow-up commits? I have been told previously to not split introduced functions from their use cases, but merge them with at least one. Your above commit again only introduces an at this point unused function. Is there a Xen specific style rule for this?

> xen/nospec.h contains:
>
> /*
>  * Evaluate a condition in a speculation-safe way.
>  * Stub implementation for builds which don't care.
>  */
> #ifndef eval_nospec
> #define eval_nospec(x) (x)
> #endif
>
> and something containing x86's implementation. TBH, I personally think asm/nospec.h is overdue for introducing now.

For now, I would like to not introduce new files, as Jan also suggested earlier.

Best,
Norbert

> ~Andrew
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 05/11] common/grant_table: block speculative out-of-bound accesses
On 1/23/19 14:37, Jan Beulich wrote:
> On 23.01.19 at 12:51, wrote:
>> @@ -1268,7 +1272,8 @@ unmap_common(
>>      }
>>
>>      smp_rmb();
>> -    map = &maptrack_entry(lgt, op->handle);
>> +    map = &maptrack_entry(lgt, array_index_nospec(op->handle,
>> +                                                  lgt->maptrack_limit));
>
> It might be better to move this into maptrack_entry() itself, or make a maptrack_entry_nospec() clone (as several but not all uses may indeed not be in need of the extra protection). At least the ones in steal_maptrack_handle() and put_maptrack_handle() also look potentially suspicious.

I will move the nospec protection into the macro. I would like to avoid introducing yet another macro.

>> @@ -2223,7 +2231,8 @@ gnttab_transfer(
>>          okay = gnttab_prepare_for_transfer(e, d, gop.ref);
>>          spin_lock(&e->page_alloc_lock);
>>
>> -        if ( unlikely(!okay) || unlikely(e->is_dying) )
>> +        /* Make sure this check is not bypassed speculatively */
>> +        if ( evaluate_nospec(unlikely(!okay) || unlikely(e->is_dying)) )
>>          {
>>              bool_t drop_dom_ref = !domain_adjust_tot_pages(e, -1);
>
> What is it that makes this particular if() different from other surrounding ones? In particular the version dependent code (a few lines down from here as well as elsewhere) looks to be easily divertable onto the wrong branch, then causing out of bounds speculative accesses due to the different (version dependent) shared entry sizes.

This check evaluates the variable okay, which indicates whether the value of gop.ref is bounded correctly. The next conditional that uses code based on a version should be fine, even when being entered speculatively with the wrong version setup, as the value of gop.ref is correct (i.e. architecturally visible after this lfence) already. As the version dependent macros are used, i.e. shared_entry_v1 and shared_entry_v2, I do not see a risk why speculative out-of-bound access should happen here.

>> @@ -3215,6 +3230,10 @@ swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
>>      if ( ref_a == ref_b )
>>          goto out;
>>
>> +    /* Make sure the above check is not bypassed speculatively */
>> +    ref_a = array_index_nospec(ref_a, nr_grant_entries(d->grant_table));
>> +    ref_b = array_index_nospec(ref_b, nr_grant_entries(d->grant_table));
>
> I think this wants to move up ahead of the if() in context, and the comment be changed to plural.

I will move the code above the comparison.

>> --- a/xen/include/xen/nospec.h
>> +++ b/xen/include/xen/nospec.h
>> @@ -87,6 +87,15 @@ static inline bool lfence_true(void) { return true; }
>>  #define evaluate_nospec(condition) ({ bool res = (condition); rmb(); res; })
>>  #endif
>>
>> +/*
>> + * allow to block speculative execution in generic code
>> + */
>
> Comment style again.

I will fix the comment.

>> +#ifdef CONFIG_X86
>> +#define block_speculation() rmb()
>> +#else
>> +#define block_speculation()
>> +#endif
>
> Why does this not simply resolve to what currently is named lfence_true() (perhaps with a cast to void)? And why does this not depend on the Kconfig setting?

I will update the definition of this macro to what is called lfence_true() in this series, and cast it to void. I will furthermore split the introduction of this macro from this commit.

Best,
Norbert

> Jan
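[Editorial sketch of moving the clamp into the macro, as agreed above; this follows the shape of maptrack_entry() in grant_table.c, but is illustrative rather than a quote of the final code.]

    #define maptrack_entry(t, e)                                     \
        ((t)->maptrack[array_index_nospec(e, (t)->maptrack_limit) /  \
                       MAPTRACK_PER_PAGE][(e) % MAPTRACK_PER_PAGE])

The second index only selects an entry within one maptrack page (a modulus by a compile-time constant), so clamping the page index is what matters for speculation.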
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 05/11] common/grant_table: block speculative out-of-bound accesses
On 1/28/19 16:09, Jan Beulich wrote:
> On 28.01.19 at 15:45, wrote:
>> On 1/23/19 14:37, Jan Beulich wrote:
>>> On 23.01.19 at 12:51, wrote:
>>>> @@ -2223,7 +2231,8 @@ gnttab_transfer(
>>>>          okay = gnttab_prepare_for_transfer(e, d, gop.ref);
>>>>          spin_lock(&e->page_alloc_lock);
>>>>
>>>> -        if ( unlikely(!okay) || unlikely(e->is_dying) )
>>>> +        /* Make sure this check is not bypassed speculatively */
>>>> +        if ( evaluate_nospec(unlikely(!okay) || unlikely(e->is_dying)) )
>>>>          {
>>>>              bool_t drop_dom_ref = !domain_adjust_tot_pages(e, -1);
>>>
>>> What is it that makes this particular if() different from other surrounding ones? In particular the version dependent code (a few lines down from here as well as elsewhere) looks to be easily divertable onto the wrong branch, then causing out of bounds speculative accesses due to the different (version dependent) shared entry sizes.
>>
>> This check evaluates the variable okay, which indicates whether the value of gop.ref is bounded correctly.
>
> How does gop.ref come into play here? The if() above does not use or update it.
>
>> The next conditional that uses code based on a version should be fine, even when being entered speculatively with the wrong version setup, as the value of gop.ref is correct (i.e. architecturally visible after this lfence) already. As the version dependent macros are used, i.e. shared_entry_v1 and shared_entry_v2, I do not see a risk why speculative out-of-bound access should happen here.
>
> As said - v2 entries are larger than v1 ones. Therefore, if the processor wrongly speculates along the v2 path, it may use indexes valid for v1, but beyond the size when scaled by v2 element size (whereas ->shared_raw[], aliased with ->shared_v1[] and ->shared_v2[], was actually set up with v1 element size).

I am aware that both versions use the same base array, and access it via different macros, which essentially partition the array based on the size of the respective struct. The underlying raw array has the same size for both versions. In case the CPU decides to enter the wrong branch, but uses a valid gop.ref value, no out-of-bound accesses will happen, because in each branch the accesses via shared_entry_v1 or shared_entry_v2 make sure the correct math is used to divide the raw array into chunks of the size of the correct structure. I agree that speculative execution might access a v1 raw array with v2 offsets, but that does not result in an out-of-bound access. The data that is used afterwards might be garbage, here sha->frame. Whether accesses based on this should be protected could be another discussion, but it at least looks complex to turn that into an exploitable pattern.

> And please don't forget - this subsequent conditional was just an easy example. What I'm really after is why you modify the if() above, without there being any array index involved.

The check that I protected uses the value of the variable okay, which - at least after the introduced protecting lfence instruction - holds the return value of the function gnttab_prepare_for_transfer. This function, among others, checks whether gop.ref is bounded. By protecting the evaluation of okay, I make sure to continue only in case gop.ref is bounded. Consequently, further (speculative) execution is aware of a valid value of gop.ref.

Best,
Norbert

> Jan
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 05/11] common/grant_table: block speculative out-of-bound accesses
On 1/29/19 10:46, Jan Beulich wrote:
>>> Norbert Manthey 01/29/19 9:35 AM >>>
>> I am aware that both versions use the same base array, and access it via different macros, which essentially partition the array based on the size of the respective struct. The underlying raw array has the same size for both versions.
>
> And this is the problem afaics: If a guest has requested its grant table to be sized as a single page, this page can fit twice as many entries for v1 as it can for v2. Hence the v1 grant reference pointing at the last entry would point at the last entry in the (not mapped) second page for v2.

I might understand the code wrong, but a guest would ask to get at most N grant frames, and this number cannot be increased afterwards, i.e. the field gt->max_grant_frames is written exactly once. Furthermore, the void** shared_raw array is allocated and written exactly once with sufficient pointers, namely gt->max_grant_frames many, in the function grant_table_init. Hence, independently of the version being used, at least the shared_raw array cannot be used for out-of-bound accesses during speculation with my above evaluate_nospec.

That being said, let's assume we have a v1 grant table, and speculation uses the v2 accesses. In that case, an existing and zero-initialized entry of shared_raw might be used in the first part of the shared_entry_v2 macro, and even if that pointer would be non-NULL, the page it would point to would have been cleared when growing the grant table in the function gnttab_grow_table. That being said, I believe it is fine to let the above speculation happen without extra hardening.

Best,
Norbert

PS: I just noticed that the shared_raw array might be allocated with a smaller size, as long as more than one grant_entry fits into a page.

> Jan
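[Editorial sketch of the sizing issue being debated; the macros follow the shape of those in grant_table.c (illustrative, not a quote of the source).]

    /* v1 and v2 alias the same array of shared frames, but with different
     * entry sizes: */
    #define SHGNT_PER_PAGE_V1 (PAGE_SIZE / sizeof(grant_entry_v1_t)) /* 512 on 4K pages */
    #define SHGNT_PER_PAGE_V2 (PAGE_SIZE / sizeof(grant_entry_v2_t)) /* 256 on 4K pages */

    #define shared_entry_v1(t, e) \
        ((t)->shared_v1[(e) / SHGNT_PER_PAGE_V1][(e) % SHGNT_PER_PAGE_V1])
    #define shared_entry_v2(t, e) \
        ((t)->shared_v2[(e) / SHGNT_PER_PAGE_V2][(e) % SHGNT_PER_PAGE_V2])

Jan's point: with one shared frame, v1 ref 511 is valid, but taken down the v2 path speculatively it indexes frame 511/256 = 1. Norbert's point: shared_raw[] is sized for max_grant_frames either way, so the first-level access stays in bounds.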
[Xen-devel] SpectreV1+L1TF Patch Series v5
Dear all,

This patch series attempts to mitigate the issues that have been raised in XSA-289 (https://xenbits.xen.org/xsa/advisory-289.html). To block speculative execution on Intel hardware, an lfence instruction is required to make sure that selected checks are not bypassed. Speculative out-of-bound accesses can be prevented by using the array_index_nospec macro.

The lfence instruction should be added on x86 platforms only. To not affect platforms that are not vulnerable to L1TF, the lfence instruction is patched in via alternative patching on L1TF vulnerable CPUs only. To control the patching mechanism, I introduced a command line option and a synthesized CPU feature flag.

Best,
Norbert
[Xen-devel] [PATCH SpectreV1+L1TF v5 1/9] xen/evtchn: block speculative out-of-bound accesses
Guests can issue event channel operations with guest-specified data. To avoid speculative out-of-bound accesses, we use the nospec macros.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey
---
 xen/common/event_channel.c | 25 ++++++++++++++++++++-------
 xen/common/event_fifo.c    | 15 ++++++++++++---
 xen/include/xen/event.h    |  5 +++--
 3 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -365,11 +365,16 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
     if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
         return -EINVAL;
 
+    /*
+     * Make sure the guest controlled value virq is bounded even during
+     * speculative execution.
+     */
+    virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn));
+
     if ( virq_is_global(virq) && (vcpu != 0) )
         return -EINVAL;
 
-    if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         ((v = d->vcpu[vcpu]) == NULL) )
+    if ( (vcpu < 0) || ((v = domain_vcpu(d, vcpu)) == NULL) )
         return -ENOENT;
 
     spin_lock(&d->event_lock);
@@ -418,8 +423,7 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
     int            port, vcpu = bind->vcpu;
     long           rc = 0;
 
-    if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         (d->vcpu[vcpu] == NULL) )
+    if ( (vcpu < 0) || domain_vcpu(d, vcpu) == NULL )
         return -ENOENT;
 
     spin_lock(&d->event_lock);
@@ -813,6 +817,13 @@ int set_global_virq_handler(struct domain *d, uint32_t virq)
     if (virq >= NR_VIRQS)
         return -EINVAL;
 
+    /*
+     * Make sure the guest controlled value virq is bounded even during
+     * speculative execution.
+     */
+    virq = array_index_nospec(virq, ARRAY_SIZE(global_virq_handlers));
+
     if (!virq_is_global(virq))
         return -EINVAL;
 
@@ -931,7 +942,7 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
     struct evtchn *chn;
     long           rc = 0;
 
-    if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) )
+    if ( !domain_vcpu(d, vcpu_id) )
         return -ENOENT;
 
     spin_lock(&d->event_lock);
@@ -969,8 +980,8 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
         unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]);
         chn->notify_vcpu_id = vcpu_id;
         pirq_set_affinity(d, chn->u.pirq.irq,
-                          cpumask_of(d->vcpu[vcpu_id]->processor));
-        link_pirq_port(port, chn, d->vcpu[vcpu_id]);
+                          cpumask_of(domain_vcpu(d, vcpu_id)->processor));
+        link_pirq_port(port, chn, domain_vcpu(d, vcpu_id));
         break;
     default:
         rc = -EINVAL;
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -33,7 +33,8 @@ static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d,
      */
     smp_rmb();
 
-    p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
+    p = array_index_nospec(port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE,
+                           d->evtchn_fifo->num_evtchns);
     w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
 
     return d->evtchn_fifo->event_array[p] + w;
@@ -516,14 +517,22 @@ int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
     gfn = init_control->control_gfn;
     offset = init_control->offset;
 
-    if ( vcpu_id >= d->max_vcpus || !d->vcpu[vcpu_id] )
+    if ( !domain_vcpu(d, vcpu_id) )
         return -ENOENT;
-    v = d->vcpu[vcpu_id];
+
+    v = domain_vcpu(d, vcpu_id);
 
     /* Must not cross page boundary. */
     if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) )
         return -EINVAL;
 
+    /*
+     * Make sure the guest controlled value offset is bounded even during
+     * speculative execution.
+     */
+    offset = array_index_nospec(offset,
+                                PAGE_SIZE - sizeof(evtchn_fifo_control_block_t) + 1);
+
     /* Must be 8-bytes aligned. */
     if ( offset & (8 - 1) )
         return -EINVAL;
 
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 
 /*
@@ -96,7 +97,7 @@ void arch_evtchn_inject(struct vcpu *v);
  * The first bucket is directly accessed via d->evtchn.
  */
 #define group_from_port(d, p) \
-    ((d)->evtchn_group[(p) / EVTCHNS_PER_GROUP])
+    array_access_nospec((d)->evtchn_group, (p) / EVTCHNS_PER_GROUP)
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
@@ -110,7 +111,7 @@ static inline bool_t port_is_v
[Xen-devel] [PATCH SpectreV1+L1TF v5 2/9] x86/vioapic: block speculative out-of-bound accesses
When interacting with the IO APIC, a guest can specify values that are used as indices into structures, and whose values are not compared against upper bounds to prevent speculative out-of-bound accesses. This change prevents these speculative accesses. Furthermore, two variables are initialized and the compiler is asked not to optimize these initializations away, as the uninitialized, potentially guest controlled, variables might otherwise be used in a speculative out-of-bound access. As the two problematic variables are both used in the common function gsi_vioapic, the mitigation is implemented there. Currently, the problematic callers are the functions vioapic_irq_positive_edge and vioapic_get_trigger_mode.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey
---
 xen/arch/x86/hvm/vioapic.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -66,6 +67,12 @@ static struct hvm_vioapic *gsi_vioapic(const struct domain *d,
 {
     unsigned int i;
 
+    /*
+     * Make sure the compiler does not optimize away the initialization done
+     * by callers
+     */
+    OPTIMIZER_HIDE_VAR(*pin);
+
     for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
     {
         struct hvm_vioapic *vioapic = domain_vioapic(d, i);
@@ -117,7 +124,8 @@ static uint32_t vioapic_read_indirect(const struct hvm_vioapic *vioapic)
             break;
         }
 
-        redir_content = vioapic->redirtbl[redir_index].bits;
+        redir_content = vioapic->redirtbl[array_index_nospec(redir_index,
+                                                      vioapic->nr_pins)].bits;
 
         result = (vioapic->ioregsel & 1) ? (redir_content >> 32)
                                          : redir_content;
         break;
@@ -212,7 +220,15 @@ static void vioapic_write_redirent(
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
     union vioapic_redir_entry *pent, ent;
     int unmasked = 0;
-    unsigned int gsi = vioapic->base_gsi + idx;
+    unsigned int gsi;
+
+    /* Callers of this function should make sure idx is bounded appropriately */
+    ASSERT(idx < vioapic->nr_pins);
+
+    /* Make sure no out-of-bound value for idx can be used */
+    idx = array_index_nospec(idx, vioapic->nr_pins);
+
+    gsi = vioapic->base_gsi + idx;
 
     spin_lock(&d->arch.hvm.irq_lock);
@@ -467,7 +483,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
 
 void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     struct hvm_vioapic *vioapic = gsi_vioapic(d, irq, &pin);
     union vioapic_redir_entry *ent;
 
@@ -564,7 +580,7 @@ int vioapic_get_vector(const struct domain *d, unsigned int gsi)
 
 int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);
 
     if ( !vioapic )
-- 
2.7.4
[Xen-devel] [PATCH SpectreV1+L1TF v5 3/9] x86/hvm: block speculative out-of-bound accesses
There are multiple arrays in the HVM interface that are accessed with indices that are provided by the guest. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro.

When blocking speculative out-of-bound accesses, we can classify arrays into dynamic arrays and static arrays. Where the former are allocated during run time, the size of the latter is known at compile time. For static arrays, the compiler might be able to block speculative accesses in the future. We introduce another macro that uses the ARRAY_SIZE macro to block speculative accesses. For arrays that are accessed statically, this macro can be used instead of the usual one. Using this macro results in more readable code, and allows the way this case is handled to be modified in a single place.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Reported-by: Pawel Wieczorkiewicz
Signed-off-by: Norbert Manthey
---
 xen/arch/x86/hvm/hvm.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2092,7 +2093,7 @@ int hvm_mov_from_cr(unsigned int cr, unsigned int gpr)
     case 2:
     case 3:
     case 4:
-        val = curr->arch.hvm.guest_cr[cr];
+        val = array_access_nospec(curr->arch.hvm.guest_cr, cr);
         break;
     case 8:
         val = (vlapic_get_reg(vcpu_vlapic(curr), APIC_TASKPRI) & 0xf0) >> 4;
@@ -3438,13 +3439,15 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( !d->arch.cpuid->basic.mtrr )
             goto gp_fault;
         index = msr - MSR_MTRRfix16K_80000;
-        *msr_content = fixed_range_base[index + 1];
+        *msr_content = fixed_range_base[array_index_nospec(index + 1,
+                                ARRAY_SIZE(v->arch.hvm.mtrr.fixed_ranges))];
         break;
     case MSR_MTRRfix4K_C0000...MSR_MTRRfix4K_F8000:
         if ( !d->arch.cpuid->basic.mtrr )
             goto gp_fault;
         index = msr - MSR_MTRRfix4K_C0000;
-        *msr_content = fixed_range_base[index + 3];
+        *msr_content = fixed_range_base[array_index_nospec(index + 3,
+                                ARRAY_SIZE(v->arch.hvm.mtrr.fixed_ranges))];
         break;
     case MSR_IA32_MTRR_PHYSBASE(0)...MSR_IA32_MTRR_PHYSMASK(MTRR_VCNT_MAX - 1):
         if ( !d->arch.cpuid->basic.mtrr )
             goto gp_fault;
         if ( (index / 2) >=
              MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, MTRRcap_VCNT) )
             goto gp_fault;
-        *msr_content = var_range_base[index];
+        *msr_content = var_range_base[array_index_nospec(index,
+                           MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap,
+                                     MTRRcap_VCNT))];
         break;
 
     case MSR_IA32_XSS:
@@ -4016,7 +4020,7 @@ static int hvmop_set_evtchn_upcall_vector(
     if ( op.vector < 0x10 )
         return -EINVAL;
 
-    if ( op.vcpu >= d->max_vcpus || (v = d->vcpu[op.vcpu]) == NULL )
+    if ( (v = domain_vcpu(d, op.vcpu)) == NULL )
         return -ENOENT;
 
     printk(XENLOG_G_INFO "%pv: upcall vector %02x\n", v, op.vector);
@@ -4104,6 +4108,12 @@ static int hvmop_set_param(
     if ( a.index >= HVM_NR_PARAMS )
         return -EINVAL;
 
+    /*
+     * Make sure the guest controlled value a.index is bounded even during
+     * speculative execution.
+     */
+    a.index = array_index_nospec(a.index, HVM_NR_PARAMS);
+
     d = rcu_lock_domain_by_any_id(a.domid);
     if ( d == NULL )
         return -ESRCH;
@@ -4370,6 +4380,12 @@ static int hvmop_get_param(
     if ( a.index >= HVM_NR_PARAMS )
         return -EINVAL;
 
+    /*
+     * Make sure the guest controlled value a.index is bounded even during
+     * speculative execution.
+ */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); + d = rcu_lock_domain_by_any_id(a.domid); if ( d == NULL ) return -ESRCH; -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
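For reference, a self-contained sketch of the two macros this patch relies on, using the portable mask fallback (the x86 build uses a cmp/sbb asm sequence instead); the names follow the Xen/Linux originals, but the bodies here are paraphrased:

#include <stddef.h>

#define ARRAY_SIZE(a)   (sizeof(a) / sizeof((a)[0]))
#define BITS_PER_LONG   (sizeof(long) * 8)

/* All-ones when index < size, zero otherwise, computed without a branch. */
static inline unsigned long array_index_mask_nospec(unsigned long index,
                                                    unsigned long size)
{
    return ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
}

/* Clamp index to 0 when it is out of range, even under speculation. */
#define array_index_nospec(index, size) \
    ((typeof(index))((index) & array_index_mask_nospec((index), (size))))

/* Static-array convenience wrapper introduced by this series. */
#define array_access_nospec(array, index) \
    ((array)[array_index_nospec(index, ARRAY_SIZE(array))])

/* Mirrors the guest_cr[] hunk: a mispredicted bound check now reads
 * element 0 instead of attacker-chosen out-of-bounds memory. */
static unsigned long guest_cr[5];

unsigned long read_cr(unsigned int cr)
{
    return cr < ARRAY_SIZE(guest_cr) ? array_access_nospec(guest_cr, cr) : 0;
}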
[Xen-devel] [PATCH SpectreV1+L1TF v5 5/9] nospec: introduce evaluate_nospec
Since the L1TF vulnerability of Intel CPUs, loading hypervisor data into L1 cache is problematic, because when hyperthreading is used as well, a guest running on the sibling core can leak this potentially secret data. To prevent these speculative accesses, we block speculation after accessing the domain property field by adding lfence instructions. This way, the CPU continues executing and loading data only once the condition is actually evaluated. As the macros are typically used in if statements, the lfence has to come in a compatible way. Therefore, a function that returns true after an lfence instruction is introduced. To protect both branches after a conditional, an lfence instruction has to be added for the two branches. To be able to block speculation after several evaluations, the generic barrier macro block_speculation is also introduced. As the L1TF vulnerability is only present on the x86 architecture, the macros will not use the lfence instruction on other architectures and the protection is disabled during compilation. By default, the lfence instruction is not present either. Only when an L1TF vulnerable platform is detected, the lfence instruction is patched in via alternative patching. Introducing the lfence instructions catches a lot of potential leaks with a simple unintrusive code change. During performance testing, we did not notice performance effects. Signed-off-by: Norbert Manthey --- xen/include/xen/nospec.h | 28 1 file changed, 28 insertions(+) diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h --- a/xen/include/xen/nospec.h +++ b/xen/include/xen/nospec.h @@ -7,6 +7,7 @@ #ifndef XEN_NOSPEC_H #define XEN_NOSPEC_H +#include #include /** @@ -64,6 +65,33 @@ static inline unsigned long array_index_mask_nospec(unsigned long index, #define array_access_nospec(array, index) \ (array)[array_index_nospec(index, ARRAY_SIZE(array))] +/* + * Allow to insert a read memory barrier into conditionals + */ +#if defined(CONFIG_X86) && defined(CONFIG_HVM) +static inline bool arch_barrier_nospec_true(void) { +alternative("", "lfence", X86_FEATURE_SC_L1TF_VULN); +return true; +} +#else +static inline bool arch_barrier_nospec_true(void) { return true; } +#endif + +/* + * Allow to protect evaluation of conditional with respect to speculation on x86 + */ +#ifndef CONFIG_X86 +#define evaluate_nospec(condition) (condition) +#else +#define evaluate_nospec(condition) \ +((condition) ? arch_barrier_nospec_true() : !arch_barrier_nospec_true()) +#endif + +/* + * Allow to block speculative execution in generic code + */ +#define block_speculation() (void)arch_barrier_nospec_true() + #endif /* XEN_NOSPEC_H */ /* -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
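A usage sketch of the two macros introduced here, with the alternatives-based patching replaced by an unconditional lfence for brevity (so this model always pays the barrier cost, unlike the patched-at-boot version above):

#include <stdbool.h>

/* Model of arch_barrier_nospec_true(): lfence, then return true. */
static inline bool barrier_nospec_true(void)
{
#if defined(__x86_64__) || defined(__i386__)
    asm volatile ( "lfence" ::: "memory" );
#endif
    return true;
}

/* Both outcomes pass through an lfence, so neither branch of the caller
 * begins executing until the condition has architecturally retired. */
#define evaluate_nospec(condition) \
    ((condition) ? barrier_nospec_true() : !barrier_nospec_true())

#define block_speculation() ((void)barrier_nospec_true())

int bounded_read(const int *arr, unsigned int idx, unsigned int bound)
{
    if ( evaluate_nospec(idx < bound) )
        return arr[idx];   /* not reachable with a mispredicted idx */

    return 0;
}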
[Xen-devel] [PATCH SpectreV1+L1TF v5 4/9] spec: add l1tf-barrier
To control the runtime behavior on L1TF vulnerable platforms better, the command line option l1tf-barrier is introduced. This option controls whether on vulnerable x86 platforms the lfence instruction is used to prevent speculative execution from bypassing the evaluation of conditionals that are protected with the evaluate_nospec macro. By now, Xen is capable of identifying L1TF vulnerable hardware. However, this information cannot be used for alternative patching, as a CPU feature is required. To control alternative patching with the command line option, a new x86 feature "X86_FEATURE_SC_L1TF_VULN" is introduced. This feature is used to patch the lfence instruction into the arch_barrier_nospec_true function. The feature is enabled only if L1TF vulnerable hardware is detected and the command line option does not prevent using this feature. Signed-off-by: Norbert Manthey --- docs/misc/xen-command-line.pandoc | 14 ++ xen/arch/x86/spec_ctrl.c | 18 -- xen/include/asm-x86/cpufeatures.h | 1 + xen/include/asm-x86/spec_ctrl.h | 1 + 4 files changed, 28 insertions(+), 6 deletions(-) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -463,9 +463,9 @@ accounting for hardware capabilities as enumerated via CPUID. Currently accepted: -The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`, -`l1d-flush` and `ssbd` are used by default if available and applicable. They can -be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and +The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`, `l1d-flush`, +`l1tf-barrier` and `ssbd` are used by default if available and applicable. They +can be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and won't offer them to guests. ### cpuid_mask_cpu @@ -1876,7 +1876,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`). ### spec-ctrl (x86) > `= List of [ , xen=, {pv,hvm,msr-sc,rsb}=, > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu, -> l1d-flush}= ]` +> l1d-flush,l1tf-barrier}= ]` Controls for speculative execution sidechannel mitigations. By default, Xen will pick the most appropriate mitigations based on compiled in support, @@ -1942,6 +1942,12 @@ Irrespective of Xen's setting, the feature is virtualised for HVM guests to use. By default, Xen will enable this mitigation on hardware believed to be vulnerable to L1TF. +On hardware vulnerable to L1TF, the `l1tf-barrier=` option can be used to force +or prevent Xen from protecting evaluations inside the hypervisor with a barrier +instruction to not load potentially secret information into L1 cache. By +default, Xen will enable this mitigation on hardware believed to be vulnerable +to L1TF. 
+ ### sync_console > `= ` diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c --- a/xen/arch/x86/spec_ctrl.c +++ b/xen/arch/x86/spec_ctrl.c @@ -21,6 +21,7 @@ #include #include +#include #include #include #include @@ -50,6 +51,7 @@ bool __read_mostly opt_ibpb = true; bool __read_mostly opt_ssbd = false; int8_t __read_mostly opt_eager_fpu = -1; int8_t __read_mostly opt_l1d_flush = -1; +int8_t __read_mostly opt_l1tf_barrier = -1; bool __initdata bsp_delay_spec_ctrl; uint8_t __read_mostly default_xen_spec_ctrl; @@ -100,6 +102,7 @@ static int __init parse_spec_ctrl(const char *s) opt_ibpb = false; opt_ssbd = false; opt_l1d_flush = 0; +opt_l1tf_barrier = 0; } else if ( val > 0 ) rc = -EINVAL; @@ -157,6 +160,8 @@ static int __init parse_spec_ctrl(const char *s) opt_eager_fpu = val; else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 ) opt_l1d_flush = val; +else if ( (val = parse_boolean("l1tf-barrier", s, ss)) >= 0 ) +opt_l1tf_barrier = val; else rc = -EINVAL; @@ -248,7 +253,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) "\n"); /* Settings for Xen's protection, irrespective of guests. */ -printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n", +printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n", thunk == THUNK_NONE ? "N/A" : thunk == THUNK_RETPOLINE ? "RETPOLINE" : thunk == THUNK_LFENCE? "LFENCE" : @@ -258,7 +263,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) !boot_cpu_has(X86_FEATURE_SSBD) ? "" : (default_xen_spec_ctrl & SPEC_CTRL_SSBD) ? " SSBD+" : " SSBD-",
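For completeness, the resulting boot-time usage might look as follows (illustrative command lines; parse_boolean() also accepts the `no-` prefix and `=<bool>` forms):

xen.gz ... spec-ctrl=l1tf-barrier      # force the barrier on
xen.gz ... spec-ctrl=no-l1tf-barrier   # opt out despite vulnerable hardware
xen.gz ... spec-ctrl=no-xen            # clears opt_l1tf_barrier with the rest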
[Xen-devel] [PATCH SpectreV1+L1TF v5 6/9] is_control_domain: block speculation
Checks of domain properties, such as is_hardware_domain or is_hvm_domain, might be bypassed by speculatively executing these instructions. A reason for bypassing these checks is that these macros access the domain structure via a pointer, and check a certain field. Since this memory access is slow, the CPU assumes a returned value and continues the execution. In case an is_control_domain check is bypassed, for example during a hypercall, data that should only be accessible by the control domain could be loaded into the cache. Signed-off-by: Norbert Manthey --- xen/include/xen/sched.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -908,10 +909,10 @@ void watchdog_domain_destroy(struct domain *d); *(that is, this would not be suitable for a driver domain) * - There is never a reason to deny the hardware domain access to this */ -#define is_hardware_domain(_d) ((_d) == hardware_domain) +#define is_hardware_domain(_d) evaluate_nospec((_d) == hardware_domain) /* This check is for functionality specific to a control domain */ -#define is_control_domain(_d) ((_d)->is_privileged) +#define is_control_domain(_d) evaluate_nospec((_d)->is_privileged) #define VM_ASSIST(d, t) (test_bit(VMASST_TYPE_ ## t, &(d)->vm_assist)) -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH SpectreV1+L1TF v5 7/9] is_hvm/pv_domain: block speculation
When checking for being an HVM domain, or a PV domain, we have to make sure that speculation cannot bypass that check, and potentially access data that should not end up in the cache for the current domain type. Signed-off-by: Norbert Manthey --- xen/include/xen/sched.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -918,7 +918,8 @@ void watchdog_domain_destroy(struct domain *d); static inline bool is_pv_domain(const struct domain *d) { -return IS_ENABLED(CONFIG_PV) ? d->guest_type == guest_type_pv : false; +return IS_ENABLED(CONFIG_PV) + ? evaluate_nospec(d->guest_type == guest_type_pv) : false; } static inline bool is_pv_vcpu(const struct vcpu *v) @@ -949,7 +950,8 @@ static inline bool is_pv_64bit_vcpu(const struct vcpu *v) #endif static inline bool is_hvm_domain(const struct domain *d) { -return IS_ENABLED(CONFIG_HVM) ? d->guest_type == guest_type_hvm : false; +return IS_ENABLED(CONFIG_HVM) + ? evaluate_nospec(d->guest_type == guest_type_hvm) : false; } static inline bool is_hvm_vcpu(const struct vcpu *v) -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH SpectreV1+L1TF v5 8/9] common/grant_table: block speculative out-of-bound accesses
Guests can issue grant table operations and provide guest controlled data to them. This data is also used for memory loads. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro where applicable. However, there are also memory accesses that cannot be protected by a single array protection, or multiple accesses in a row. To protect these, a nospec barrier is placed between the actual range check and the access via the block_speculation macro. This commit is part of the SpectreV1+L1TF mitigation patch series. Signed-off-by: Norbert Manthey --- xen/common/grant_table.c | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include @@ -203,8 +204,9 @@ static inline unsigned int nr_status_frames(const struct grant_table *gt) } #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) -#define maptrack_entry(t, e) \ -((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) +#define maptrack_entry(t, e) \ +((t)->maptrack[array_index_nospec(e, (t)->maptrack_limit) \ + /MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) static inline unsigned int nr_maptrack_frames(struct grant_table *t) @@ -963,6 +965,9 @@ map_grant_ref( PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", op->ref, rgt->domain->domain_id); +/* Make sure the above check is not bypassed speculatively */ +op->ref = array_index_nospec(op->ref, nr_grant_entries(rgt)); + act = active_entry_acquire(rgt, op->ref); shah = shared_entry_header(rgt, op->ref); status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref); @@ -2026,6 +2031,9 @@ gnttab_prepare_for_transfer( goto fail; } +/* Make sure the above check is not bypassed speculatively */ +ref = array_index_nospec(ref, nr_grant_entries(rgt)); + sha = shared_entry_header(rgt, ref); scombo.word = *(u32 *)&sha->flags; @@ -2223,7 +2231,8 @@ gnttab_transfer( okay = gnttab_prepare_for_transfer(e, d, gop.ref); spin_lock(&e->page_alloc_lock); -if ( unlikely(!okay) || unlikely(e->is_dying) ) +/* Make sure this check is not bypassed speculatively */ +if ( evaluate_nospec(unlikely(!okay) || unlikely(e->is_dying)) ) { bool_t drop_dom_ref = !domain_adjust_tot_pages(e, -1); @@ -2408,6 +2417,9 @@ acquire_grant_for_copy( PIN_FAIL(gt_unlock_out, GNTST_bad_gntref, "Bad grant reference %#x\n", gref); +/* Make sure the above check is not bypassed speculatively */ +gref = array_index_nospec(gref, nr_grant_entries(rgt)); + act = active_entry_acquire(rgt, gref); shah = shared_entry_header(rgt, gref); if ( rgt->gt_version == 1 ) @@ -2826,6 +2838,9 @@ static int gnttab_copy_buf(const struct gnttab_copy *op, op->dest.offset, dest->ptr.offset, op->len, dest->len); +/* Make sure the above checks are not bypassed speculatively */ +block_speculation(); + memcpy(dest->virt + op->dest.offset, src->virt + op->source.offset, op->len); gnttab_mark_dirty(dest->domain, dest->mfn); @@ -3211,6 +3226,10 @@ swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b) if ( unlikely(ref_b >= nr_grant_entries(d->grant_table))) PIN_FAIL(out, GNTST_bad_gntref, "Bad ref-b %#x\n", ref_b); +/* Make sure the above checks are not bypassed speculatively */ +ref_a = array_index_nospec(ref_a, nr_grant_entries(d->grant_table)); +ref_b = array_index_nospec(ref_b, nr_grant_entries(d->grant_table)); + /* Swapping the same ref is a no-op. 
*/ if ( ref_a == ref_b ) goto out; -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
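The gnttab_copy_buf() hunk is the interesting shape here: two independent range checks that a single array_index_nospec() cannot cover. A self-contained model of that pattern, with illustrative buffer names and sizes:

#include <string.h>

static char src_buf[4096], dst_buf[4096];

/* Same barrier as block_speculation() in patch 5/9, inlined here. */
static inline void block_speculation(void)
{
#if defined(__x86_64__) || defined(__i386__)
    asm volatile ( "lfence" ::: "memory" );
#endif
}

int copy_buf(unsigned int s_off, unsigned int d_off, unsigned int len)
{
    /* Two independent range checks; no single index clamp covers both. */
    if ( s_off > sizeof(src_buf) || len > sizeof(src_buf) - s_off ||
         d_off > sizeof(dst_buf) || len > sizeof(dst_buf) - d_off )
        return -1;

    block_speculation();   /* both checks are architecturally done here */

    memcpy(dst_buf + d_off, src_buf + s_off, len);
    return 0;
}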
[Xen-devel] [PATCH SpectreV1+L1TF v5 9/9] common/memory: block speculative out-of-bound accesses
The get_page_from_gfn method returns a pointer to a page that belongs to a gfn. Before returning the pointer, the gfn is checked for being valid. Under speculation, these checks can be bypassed, so that the function get_page is still executed partially. Consequently, the function page_get_owner_and_reference might be executed partially as well. In this function, the computed pointer is accessed, resulting in a speculative out-of-bound address load. As the gfn can be controlled by a guest, this access is problematic. To mitigate the root cause, an lfence instruction is added via the evaluate_nospec macro. To make the protection generic, we do not introduce the lfence instruction for this single check, but add it to the mfn_valid function. This way, other potentially problematic accesses are protected as well. This commit is part of the SpectreV1+L1TF mitigation patch series. Signed-off-by: Norbert Manthey --- xen/common/pdx.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/xen/common/pdx.c b/xen/common/pdx.c --- a/xen/common/pdx.c +++ b/xen/common/pdx.c @@ -18,6 +18,7 @@ #include #include #include +#include /* Parameters for PFN/MADDR compression. */ unsigned long __read_mostly max_pdx; @@ -33,10 +34,10 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS( bool __mfn_valid(unsigned long mfn) { -return likely(mfn < max_page) && - likely(!(mfn & pfn_hole_mask)) && - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, - pdx_group_valid)); +return evaluate_nospec(likely(mfn < max_page) && + likely(!(mfn & pfn_hole_mask)) && + likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, + pdx_group_valid))); } /* Sets all bits from the most-significant 1-bit down to the LSB */ -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
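A compact model of the __mfn_valid() change, reusing the lfence-returning-true trick from patch 5/9 (unconditional x86 lfence here for brevity; max_page and pfn_hole_mask values are made up):

#include <stdbool.h>

static inline bool lfence_true(void)
{
    asm volatile ( "lfence" ::: "memory" );   /* x86 only, for brevity */
    return true;
}

#define evaluate_nospec(c) ((c) ? lfence_true() : !lfence_true())

static unsigned long max_page = 1UL << 20;   /* made-up bounds */
static unsigned long pfn_hole_mask;

bool mfn_valid_model(unsigned long mfn)
{
    /* The whole predicate retires before any caller-side branch runs,
     * so a guest-chosen mfn cannot steer a speculative page lookup. */
    return evaluate_nospec(mfn < max_page && !(mfn & pfn_hole_mask));
}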
Re: [Xen-devel] [PATCH SpectreV1+L1TF v4 05/11] common/grant_table: block speculative out-of-bound accesses
On 1/29/19 16:11, Jan Beulich wrote: >>>> On 29.01.19 at 14:47, wrote: >> On 1/29/19 10:46, Jan Beulich wrote: >>>>>> Norbert Manthey 01/29/19 9:35 AM >>> >>>> I am aware that both versions use the same base array, and access it via >>>> different macros, which essentially partition the array based on the >>>> size of the respective struct. The underlying raw array has the same >>>> size for both versions. >>> And this is the problem afaics: If a guest has requested its grant table to >>> be sized as a single page, this page can fit twice as many entries for >>> v1 than it can fit for v2. Hence the v1 grant reference pointing at the last >>> entry would point at the last entry in the (not mapped) second page for v2. >> I might understand the code wrong, but a guest would ask to get at most >> N grant frames, and this number cannot be increased afterwards, i.e. the >> field gt->max_grant_frames is written exactly once. Furthermore, the >> void** shared_raw array is allocated and written exactly once with >> sufficient pointers for, namely gt->max_grant_frames many in function >> grant_table_init. Hence, independently of the version being used, at >> least the shared_raw array cannot be used for out-of-bound accesses >> during speculation with my above evaluate_nospec. > I'm afraid I'm still not following: A given number of pages is worth > twice as many grants in v1 than it is in v2. Therefore a v1 grant > reference to a grant entry tracked in the second half of the > first page would cause a speculative access to anywhere in the > second page when wrongly interpreted as a v2 ref. Agreed. So you want me to add another lfence to make sure the wrong interpretation does not lead to other out-of-bound accesses down the speculative window? In my opinion, the v1 vs v2 code does not result in actual out-of-bound accesses, except for the NULL page case below. To make the PV case happy, I will add the evaluate_nospec macro for the v1 vs v2 conditionals in functions with guest controlled ref indexes. > >> That being said, let's assume we have a v1 grant table, and speculation >> uses the v2 accesses. In that case, an existing and zero-initialized >> entry of shared_raw might be used in the first part of the >> shared_entry_v2 macro, and even if that pointer would be non-NULL, the >> page it would point to would have been cleared when growing the grant >> table in function gnttab_grow_table. > Not if the v1 ref is no smaller than half the maximum number of > v1 refs. In that case, if taken as a v2 ref, ->shared_raw[] > would need to be twice as big to cope with the larger index > (resulting from the smaller divisor in shared_entry_v2() > compared to shared_entry_v1()) in order to not be overrun. > > Let's look at an example: gref 256 points into the middle of > the first page when using v1 calculations, but at the start > of the second page when using v2 calculations. Hence, if the > maximum number of grant frames was 1, we'd overrun the > array, consisting of just a single element (256 is valid as a > v1 gref in that case, but just out of bounds as a v2 one). If 256 is a valid gref, then the shared_raw array holds sufficient zero-initialized elements for such an access, even without the division operator that is used in the shared_entry_v*() macros. Hence, no out-of-bound access will happen here.
> > Furthermore, even if ->shared_raw[] itself could not be overrun, > an entry of it being NULL could be a problem with PV guests, who > can install a translation for the first page of the address space, > and thus perhaps partly control subsequent speculative execution. I understand the concern. I add the evaluate_nospec as mentioned above. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
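The arithmetic behind Jan's gref 256 example, assuming 4 KiB pages and the standard entry sizes (8 bytes for v1, 16 for v2); frame_v1/frame_v2 are illustrative stand-ins for the indexing done by Xen's shared_entry_v1/v2 macros:

#define PAGE_SIZE 4096u
#define GRANTS_PER_PAGE_V1 (PAGE_SIZE / 8)    /* 512 8-byte v1 entries  */
#define GRANTS_PER_PAGE_V2 (PAGE_SIZE / 16)   /* 256 16-byte v2 entries */

#define frame_v1(ref) ((ref) / GRANTS_PER_PAGE_V1)
#define frame_v2(ref) ((ref) / GRANTS_PER_PAGE_V2)

/*
 * With max_grant_frames == 1, gref 256 is a valid v1 reference:
 *   frame_v1(256) == 0  ->  shared_raw[0], middle of the only page
 * but, mis-speculated as v2:
 *   frame_v2(256) == 1  ->  shared_raw[1], one element past the array
 */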
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 1/9] xen/evtchn: block speculative out-of-bound accesses
On 1/31/19 16:05, Jan Beulich wrote: On 29.01.19 at 15:43, wrote: >> --- a/xen/common/event_channel.c >> +++ b/xen/common/event_channel.c >> @@ -365,11 +365,16 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, >> evtchn_port_t port) >> if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) >> return -EINVAL; >> >> + /* >> +* Make sure the guest controlled value virq is bounded even during >> +* speculative execution. >> +*/ >> +virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn)); >> + >> if ( virq_is_global(virq) && (vcpu != 0) ) >> return -EINVAL; >> >> -if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || >> - ((v = d->vcpu[vcpu]) == NULL) ) >> +if ( (vcpu < 0) || ((v = domain_vcpu(d, vcpu)) == NULL) ) >> return -ENOENT; > Is there a reason for the less-than-zero check to survive? Yes, domain_vcpu uses unsigned integers, and I want to return the proper error code, in case somebody comes with a vcpu number that would overflow into the valid range. > >> @@ -418,8 +423,7 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind) >> intport, vcpu = bind->vcpu; >> long rc = 0; >> >> -if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || >> - (d->vcpu[vcpu] == NULL) ) >> +if ( (vcpu < 0) || domain_vcpu(d, vcpu) == NULL ) >> return -ENOENT; > I'm not sure about this one: We're not after the struct vcpu pointer > here. Right now subsequent code looks fine, but what if the actual > "vcpu" local variable was used again in a risky way further down? I > think here and elsewhere it would be best to eliminate that local > variable, and use v->vcpu_id only for subsequent consumers (or > alternatively latch the local variable's value only _after_ the call to > domain_vcpu(), which might be better especially in cases like). I agree with getting rid of using the local variable. As discussed elsewhere, updating such a variable might not fix the problem. However, in this commit I want to avoid speculative out-of-bound accesses using a guest controlled variable (vcpu). Hence, I add protection to the locations where it is used as index. As the domain_vcpu function comes with protection, I prefer this function over explicitly using array_index_nospec, if possible. > >> @@ -969,8 +980,8 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int >> vcpu_id) >> unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]); >> chn->notify_vcpu_id = vcpu_id; >> pirq_set_affinity(d, chn->u.pirq.irq, >> - cpumask_of(d->vcpu[vcpu_id]->processor)); >> -link_pirq_port(port, chn, d->vcpu[vcpu_id]); >> + cpumask_of(domain_vcpu(d, vcpu_id)->processor)); >> +link_pirq_port(port, chn, domain_vcpu(d, vcpu_id)); > ... this one, where you then wouldn't need to alter code other than > that actually checking the vCPU ID. Instead, I will introduce a struct vcpu variable, assign it in the first check of the function, and continue using this variable instead of performing array accesses again in this function. > >> @@ -516,14 +517,22 @@ int evtchn_fifo_init_control(struct >> evtchn_init_control >> *init_control) >> gfn = init_control->control_gfn; >> offset = init_control->offset; >> >> -if ( vcpu_id >= d->max_vcpus || !d->vcpu[vcpu_id] ) >> +if ( !domain_vcpu(d, vcpu_id) ) >> return -ENOENT; >> -v = d->vcpu[vcpu_id]; >> + >> +v = domain_vcpu(d, vcpu_id); > Please don't call the function twice. I will assign the variable as part of the if statement. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 
38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
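For readers following along, the helper being adopted throughout this exchange looks roughly like this (paraphrased from xen/include/xen/sched.h of that era; exact formatting may differ). The point of the design is that the bound check and the nospec clamp live in one place, so callers cannot apply one without the other:

static inline struct vcpu *domain_vcpu(const struct domain *d,
                                       unsigned int vcpu_id)
{
    unsigned int idx = array_index_nospec(vcpu_id, d->max_vcpus);

    return vcpu_id >= d->max_vcpus ? NULL : d->vcpu[idx];
}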
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 2/9] x86/vioapic: block speculative out-of-bound accesses
On 1/31/19 17:05, Jan Beulich wrote: >>>> On 29.01.19 at 15:43, wrote: >> When interacting with the IO APIC, a guest can specify values that are used >> as indices into structures, and whose values are not compared against >> upper bounds to prevent speculative out-of-bound accesses. This change >> prevents these speculative accesses. >> >> Furthermore, two variables are initialized and the compiler is asked to >> not optimize these initializations, as the uninitialized, potentially >> guest controlled, variables might be used in a speculative out-of-bound >> access. As the two problematic variables are both used in the common >> function gsi_vioapic, the mitigation is implemented there. Currently, >> the problematic callers are the functions vioapic_irq_positive_edge and >> vioapic_get_trigger_mode. > I would have wished for you to say why the other two are _not_ > a problem. Afaict in both cases the functions only ever get > internal data passed. > > Then again I'm not convinced it's worth taking the risk that a > problematic caller gets added down the road. How about you add > initializers everywhere, clarifying in the description that it's "just > in case" for the two currently safe ones? I will add the other initialization and update the commit message. > >> This commit is part of the SpectreV1+L1TF mitigation patch series. >> >> Signed-off-by: Norbert Manthey >> >> --- > Btw., could you please get used to the habit of adding a brief > summary of changes for at least the most recent version here, > which aids review quite a bit? I will start to do this with the next version. > >> @@ -212,7 +220,15 @@ static void vioapic_write_redirent( >> struct hvm_irq *hvm_irq = hvm_domain_irq(d); >> union vioapic_redir_entry *pent, ent; >> int unmasked = 0; >> -unsigned int gsi = vioapic->base_gsi + idx; >> +unsigned int gsi; >> + >> +/* Callers of this function should make sure idx is bounded >> appropriately*/ > Missing blank at the end of the comment (which, if this was the > only open point, would be easy enough to adjust while committing). Will fix. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 3/9] x86/hvm: block speculative out-of-bound accesses
On 1/31/19 17:19, Jan Beulich wrote: On 29.01.19 at 15:43, wrote: >> There are multiple arrays in the HVM interface that are accessed >> with indices that are provided by the guest. To avoid speculative >> out-of-bound accesses, we use the array_index_nospec macro. >> >> When blocking speculative out-of-bound accesses, we can classify arrays >> into dynamic arrays and static arrays. Where the former are allocated >> during run time, the size of the latter is known during compile time. >> On static arrays, the compiler might be able to block speculative accesses >> in the future. >> >> We introduce another macro that uses the ARRAY_SIZE macro to block >> speculative accesses. For arrays that are statically accessed, this macro >> can be used instead of the usual macro. Using this macro results in more >> readable code, and allows to modify the way this case is handled in a >> single place. > I think this paragraph is stale now. I will drop the paragraph. > >> @@ -3453,7 +3456,8 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t >> *msr_content) >> if ( (index / 2) >= >> MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, MTRRcap_VCNT) ) >> goto gp_fault; >> -*msr_content = var_range_base[index]; >> +*msr_content = var_range_base[array_index_nospec(index, >> + MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, >> MTRRcap_VCNT))]; >> break; > I clearly should have noticed this earlier on - the bound passed into > the macro is not in line with the if() condition. I think you're funneling > half the number of entries into array slot 0. I will fix the bound that's used in the array_index_nospec macro. > >> @@ -4104,6 +4108,12 @@ static int hvmop_set_param( >> if ( a.index >= HVM_NR_PARAMS ) >> return -EINVAL; >> >> +/* >> + * Make sure the guest controlled value a.index is bounded even during >> + * speculative execution. >> + */ >> +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); > I'd like to come back to this model of updating local variables: > Is this really safe to do? If such a variable lives in memory > (which here it quite likely does), does speculation always > recognize the update to the value? Wouldn't it rather read > what's currently in that slot, and re-do the calculation in case > a subsequent write happens? (I know I did suggest doing so > earlier on, so I apologize if this results in you having to go > back to some earlier used model.) I will reply to this on the thread that evolved. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
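A sketch of what the corrected clamp discussed above could look like (the committed fix may differ): since PHYSBASE and PHYSMASK MSRs interleave, the valid index range is twice the variable-range count that the if() compares against:

index = array_index_nospec(index,
                           MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap,
                                     MTRRcap_VCNT) * 2);
*msr_content = var_range_base[index];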
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 3/9] x86/hvm: block speculative out-of-bound accesses
On 2/1/19 09:23, Jan Beulich wrote: On 31.01.19 at 21:02, wrote: >> On 31/01/2019 16:19, Jan Beulich wrote: @@ -4104,6 +4108,12 @@ static int hvmop_set_param( if ( a.index >= HVM_NR_PARAMS ) return -EINVAL; +/* + * Make sure the guest controlled value a.index is bounded even during + * speculative execution. + */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); >>> I'd like to come back to this model of updating local variables: >>> Is this really safe to do? If such a variable lives in memory >>> (which here it quite likely does), does speculation always >>> recognize the update to the value? Wouldn't it rather read >>> what's currently in that slot, and re-do the calculation in case >>> a subsequent write happens? (I know I did suggest doing so >>> earlier on, so I apologize if this results in you having to go >>> back to some earlier used model.) >> I'm afraid that is a very complicated set of questions to answer. >> >> The processor needs to track write=>read dependencies to avoid wasting a >> large quantity of time doing erroneous speculation, therefore it does. >> Pending writes which have happened under speculation are forwarded to >> dependent instructions. >> >> This behaviour is what gives rise to Bounds Check Bypass Store - a half >> spectre-v1 gadget but with a store rather than a load. You can e.g. >> speculatively modify the return address on the stack, and hijack >> speculation to an attacker controlled address for a brief period of >> time. If the speculation window is long enough, the processor first >> follows the RSB/RAS (correctly), then later notices that the real value >> on the stack was different, discards the speculation from the RSB/RAS >> and uses the attacker controlled value instead, then eventually notices >> that all of this was bogus and rewinds back to the original branch. >> >> Another corner case is Speculative Store Bypass, where memory >> disambiguation speculation can miss the fact that there is a real >> write=>read dependency, and cause speculation using the older stale >> value for a period of time. >> >> >> As to overall safety, array_index_nospec() only works as intended when >> the index remains in a register between the cmp/sbb which bounds it >> under speculation, and the array access. There is no way to guarantee >> this property, as the compiler can spill any value if it thinks it needs to. >> >> The general safety of the construct relies on the fact that an >> optimising compiler will do its very best to avoid spilling variables to >> the stack. > "Its very best" may be extremely limited with enough variables. > Even if we were to annotate them with the "register" keyword, > that still wouldn't help, as that's only a hint. We simply have no > way to control which variables the compiler wants to hold in > registers. I dare to guess that in the particular example above > it's rather unlikely to be put in a register. > > In any event it looks like you support my suspicion that earlier > comments of mine may have driven things into a less safe > direction, and we instead need to accept the more heavy > clutter of scattering around array_{access,index}_nospec() > at all use sites instead of latching the result of > array_index_nospec() into whatever shape of local variable. > > Which raises another interesting question: Can't CSE and
OPTIMIZER_HIDE_VAR() expands > to a non-volatile asm() (and as per remarks elsewhere I'm > unconvinced adding volatile would actually help), so the > compiler recognizing the same multiple times (perhaps in a > loop) could make it decide to calculate the thing just once. > array_index_mask_nospec() in effect is a pure (and actually > even const) function, and the lack of a respective attribute > doesn't make the compiler not treat it as such if it recognized > the fact. (In effect what I had asked Norbert to do to limit > the clutter was just CSE which the compiler may or may not > have recognized anyway. IOW I'm not convinced going back > would actually buy us anything.) So this means I should stick to the current approach and continue updating variables after their bound check with an array_index_nospec call, correct? Best, Norbert Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
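For reference, the x86 sequence this discussion assumes stays in registers (paraphrased from Xen's asm-x86 nospec header): cmp sets the carry flag exactly when the index is in range, and sbb broadcasts it into a full-width mask with no branch to mispredict -- but, as noted above, nothing stops the compiler from spilling the masked value afterwards:

static inline unsigned long array_index_mask_nospec(unsigned long index,
                                                    unsigned long size)
{
    unsigned long mask;

    asm volatile ( "cmp %[size], %[index]\n\t"  /* CF = (index < size) */
                   "sbb %[mask], %[mask]"       /* mask = CF ? ~0 : 0  */
                   : [mask] "=r" (mask)
                   : [size] "g" (size), [index] "r" (index)
                   : "cc" );

    return mask;
}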
Re: [Xen-devel] [PATCH L1TF MDS GT v3 1/2] common/grant_table: harden bound accesses
On 7/18/19 14:09, Jan Beulich wrote: > On 12.07.2019 10:51, Norbert Manthey wrote: >> Guests can issue grant table operations and provide guest controlled >> data to them. This data is used as index for memory loads after bound >> checks have been done. To avoid speculative out-of-bound accesses, we >> use the array_index_nospec macro where applicable, or the macro >> block_speculation. Note, the block_speculation macro is used on all >> paths in shared_entry_header and nr_grant_entries. This way, after a >> call to such a function, all bound checks that happened before become >> architecturally visible, so that no additional protection is required >> for corresponding array accesses. As the way we introduce an lfence >> instruction might allow the compiler to reload certain values from >> memory multiple times, we try to avoid speculatively continuing >> execution with stale register data by moving relevant data into >> function local variables. >> >> Speculative execution is not blocked in case one of the following >> properties is true: >> - path cannot be triggered by the guest >> - path does not return to the guest >> - path does not result in an out-of-bound access >> - path cannot be executed repeatedly > I notice this sentence is still there without modification. If you > don't want to drop it (and then perhaps make changes to a few more > paths), can we at least settle on a less firm statement like "path > is unlikely to be executed repeatedly in rapid succession"? I will drop the last condition, and post an update one more time. For code paths that can be executed once, e.g. during initialization, no need for mitigation might be obvious enough, and for other paths one has to decide whether a guest can actually trigger them often enough so that a fix is required. Best, Norbert ... snip ... Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH L1TF MDS GT v4 0/2] grant table protection
Dear all, This patch series attempts to mitigate the issues that have been raised in XSA-289 (https://xenbits.xen.org/xsa/advisory-289.html). To block speculative execution on Intel hardware, an lfence instruction is required to make sure that selected checks are not bypassed. Speculative out-of-bound accesses can be prevented by using the array_index_nospec macro. This series picks up the last remaining commit of my previous L1TF series, and splits it into several commits to help targeting the discussion better. The actual change is to protect grant-table code. This is part of the speculative hardening effort. Best, Norbert Norbert Manthey (2): common/grant_table: harden bound accesses common/grant_table: harden version dependent accesses xen/common/grant_table.c | 107 +-- 1 file changed, 75 insertions(+), 32 deletions(-) -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH L1TF MDS GT v4 1/2] common/grant_table: harden bound accesses
Guests can issue grant table operations and provide guest controlled data to them. This data is used as index for memory loads after bound checks have been done. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro where applicable, or the macro block_speculation. Note, the block_speculation macro is used on all paths in shared_entry_header and nr_grant_entries. This way, after a call to such a function, all bound checks that happened before become architecturally visible, so that no additional protection is required for corresponding array accesses. As the way we introduce an lfence instruction might allow the compiler to reload certain values from memory multiple times, we try to avoid speculatively continuing execution with stale register data by moving relevant data into function local variables. Speculative execution is not blocked in case one of the following properties is true: - path cannot be triggered by the guest - path does not return to the guest - path does not result in an out-of-bound access Only the combination of the above properties allows to actually leak continuous chunks of memory. Therefore, we only add the penalty of protective mechanisms in case a potential speculative out-of-bound access matches all the above properties. This commit addresses only out-of-bound accesses whose index is directly controlled by the guest, and the index is checked before. Potential out-of-bound accesses that are caused by speculatively evaluating the version of the current table are not addressed in this commit. Hence, speculative out-of-bound accesses might still be possible, for example in gnttab_get_status_frame_mfn, when calling gnttab_grow_table, the assertion that the grant table version equals two might not hold under speculation. This is part of the speculative hardening effort. Signed-off-by: Norbert Manthey Reviewed-by: Jan Beulich --- Notes: v3: Drop condition to not fix defects in commit message. Copy in reviewed-by. xen/common/grant_table.c | 72 ++-- 1 file changed, 51 insertions(+), 21 deletions(-) diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -911,6 +911,7 @@ map_grant_ref( { struct domain *ld, *rd, *owner = NULL; struct grant_table *lgt, *rgt; +grant_ref_t ref; struct vcpu *led; grant_handle_t handle; mfn_t mfn; @@ -974,13 +975,15 @@ map_grant_ref( grant_read_lock(rgt); /* Bounds check on the grant ref */ -if ( unlikely(op->ref >= nr_grant_entries(rgt))) +ref = op->ref; +if ( unlikely(ref >= nr_grant_entries(rgt))) PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", - op->ref, rgt->domain->domain_id); + ref, rgt->domain->domain_id); -act = active_entry_acquire(rgt, op->ref); -shah = shared_entry_header(rgt, op->ref); -status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref); +/* This call also ensures the above check cannot be passed speculatively */ +shah = shared_entry_header(rgt, ref); +status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, ref); +act = active_entry_acquire(rgt, ref); /* If already pinned, check the active domid and avoid refcnt overflow. */ if ( act->pin && @@ -1003,8 +1006,8 @@ map_grant_ref( if ( !act->pin ) { unsigned long gfn = rgt->gt_version == 1 ?
-shared_entry_v1(rgt, op->ref).frame : -shared_entry_v2(rgt, op->ref).full_page.frame; +shared_entry_v1(rgt, ref).frame : +shared_entry_v2(rgt, ref).full_page.frame; rc = get_paged_frame(gfn, &mfn, &pg, op->flags & GNTMAP_readonly, rd); @@ -1017,7 +1020,7 @@ map_grant_ref( act->length = PAGE_SIZE; act->is_sub_page = false; act->trans_domain = rd; -act->trans_gref = op->ref; +act->trans_gref = ref; } } @@ -1268,6 +1271,7 @@ unmap_common( domid_t dom; struct domain *ld, *rd; struct grant_table *lgt, *rgt; +grant_ref_t ref; struct active_grant_entry *act; s16 rc = 0; struct grant_mapping *map; @@ -1321,6 +1325,7 @@ unmap_common( op->rd = rd; op->ref = map->ref; +ref = map->ref; /* * We can't assume there was no racing unmap for this maptrack entry, @@ -1330,7 +1335,7 @@ unmap_common( * invalid lock. */ smp_rmb(); -if ( unlikely(op->ref >= nr_grant_entries(rgt)) ) +if ( unlikely(ref >= nr_grant_entries(rgt)) )
[Xen-devel] [PATCH L1TF MDS GT v4 2/2] common/grant_table: harden version dependent accesses
Guests can issue grant table operations and provide guest controlled data to them. This data is used as index for memory loads after bound checks have been done. Depending on the grant table version, the size of elements in containers differs. As the base data structure is a page, the number of elements per page also differs. Consequently, bound checks are version dependent, so that speculative execution can happen in several stages, the bound check as well as the version check. This commit mitigates cases where out-of-bound accesses could happen due to the version comparison. In cases where no different memory locations are accessed on the code path that follows an if statement, no protection is required. No different memory locations are accessed in the following functions after a version check: * gnttab_setup_table: only calculated numbers are used, and then function gnttab_grow_table is called, which is version protected * gnttab_transfer: the case that depends on the version check just gets into copying a page or not * acquire_grant_for_copy: the not fixed comparison is on the abort path and does not access other structures, and on the else branch accesses only structures that have been validated before * gnttab_set_version: all accessible data is allocated for both versions Furthermore, the functions gnttab_populate_status_frames and gnttab_unpopulate_status_frames received a block_speculation macro. Hence, this code will only be executed once the correct version is visible in the architectural state. * gnttab_release_mappings: this function is called only during domain destruction and control is not returned to the guest * mem_sharing_gref_to_gfn: speculation will be stopped by the second if statement, as that places a barrier on any path to be executed. * gnttab_get_status_frame_mfn: no version dependent check, because all accesses, except the gt->status[idx], do not perform index-based accesses, or speculative out-of-bound accesses in the gnttab_grow_table function call. * gnttab_usage_print: cannot be triggered by the guest This is part of the speculative hardening effort. Signed-off-by: Norbert Manthey Reviewed-by: Jan Beulich --- xen/common/grant_table.c | 37 + 1 file changed, 25 insertions(+), 12 deletions(-) diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -827,7 +827,7 @@ static int _set_status(const grant_entry_header_t *shah, domid_t ldomid) { -if ( rgt_version == 1 ) +if ( evaluate_nospec(rgt_version == 1) ) return _set_status_v1(shah, rd, act, readonly, mapflag, ldomid); else return _set_status_v2(shah, status, rd, act, readonly, mapflag, ldomid); @@ -982,9 +982,12 @@ map_grant_ref( /* This call also ensures the above check cannot be passed speculatively */ shah = shared_entry_header(rgt, ref); -status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, ref); act = active_entry_acquire(rgt, ref); +/* Make sure we do not access memory speculatively */ +status = evaluate_nospec(rgt->gt_version == 1) ? &shah->flags + : &status_entry(rgt, ref); + /* If already pinned, check the active domid and avoid refcnt overflow. */ if ( act->pin && @@ -1005,7 +1008,7 @@ map_grant_ref( if ( !act->pin ) { -unsigned long gfn = rgt->gt_version == 1 ? +unsigned long gfn = evaluate_nospec(rgt->gt_version == 1) ?
shared_entry_v1(rgt, ref).frame : shared_entry_v2(rgt, ref).full_page.frame; @@ -1461,7 +1464,7 @@ unmap_common_complete(struct gnttab_unmap_common *op) act = active_entry_acquire(rgt, op->ref); sha = shared_entry_header(rgt, op->ref); -if ( rgt->gt_version == 1 ) +if ( evaluate_nospec(rgt->gt_version == 1) ) status = &sha->flags; else status = &status_entry(rgt, op->ref); @@ -1657,6 +1660,10 @@ gnttab_populate_status_frames(struct domain *d, struct grant_table *gt, unsigned req_status_frames; req_status_frames = grant_to_status_frames(req_nr_frames); + +/* Make sure, prior version checks are architectural visible */ +block_speculation(); + for ( i = nr_status_frames(gt); i < req_status_frames; i++ ) { if ( (gt->status[i] = alloc_xenheap_page()) == NULL ) @@ -1685,6 +1692,9 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt) { unsigned int i; +/* Make sure, prior version checks are architectural visible */ +block_speculation(); + for ( i =
Re: [Xen-devel] [PATCH L1TF MDS GT v4 1/2] common/grant_table: harden bound accesses
On 7/30/19 15:38, Jan Beulich wrote: > On 30.07.2019 15:15, Norbert Manthey wrote: >> Guests can issue grant table operations and provide guest controlled >> data to them. This data is used as index for memory loads after bound >> checks have been done. To avoid speculative out-of-bound accesses, we >> use the array_index_nospec macro where applicable, or the macro >> block_speculation. Note, the block_speculation macro is used on all >> paths in shared_entry_header and nr_grant_entries. This way, after a >> call to such a function, all bound checks that happened before become >> architecturally visible, so that no additional protection is required >> for corresponding array accesses. As the way we introduce an lfence >> instruction might allow the compiler to reload certain values from >> memory multiple times, we try to avoid speculatively continuing >> execution with stale register data by moving relevant data into >> function local variables. >> >> Speculative execution is not blocked in case one of the following >> properties is true: >> - path cannot be triggered by the guest >> - path does not return to the guest >> - path does not result in an out-of-bound access >> >> Only the combination of the above properties allows to actually leak >> continuous chunks of memory. Therefore, we only add the penalty of >> protective mechanisms in case a potential speculative out-of-bound >> access matches all the above properties. >> >> This commit addresses only out-of-bound accesses whose index is >> directly controlled by the guest, and the index is checked before. >> Potential out-of-bound accesses that are caused by speculatively >> evaluating the version of the current table are not addressed in this >> commit. Hence, speculative out-of-bound accesses might still be >> possible, for example in gnttab_get_status_frame_mfn, when calling >> gnttab_grow_table, the assertion that the grant table version equals >> two might not hold under speculation. >> >> This is part of the speculative hardening effort. >> >> Signed-off-by: Norbert Manthey >> Reviewed-by: Jan Beulich >> --- >> >> Notes: >>v3: Drop condition to not fix defects in commit message. >> Copy in reviewed-by. > According to this (which aiui means v4) there are no code changes > compared to v3. At the risk of annoying you, this doesn't fit well > with me having said "and then perhaps make changes to a few more > paths" alongside the option of doing this removal in reply to v3. > After all you've now dropped a condition from what is covered by > "Only the combination of ...", and hence there's a wider set of > paths that would need to be fixed. It was for this reason that as > the other alternative I did suggest to simply weaken the wording > of the item you've now dropped. IOW I'm afraid my R-b is not > applicable to v4. I see, and am sorry for the misunderstanding. I am fine with adding the 4th condition in a weakened form (essentially modifying the commit message to the form you suggested). I wonder whether the summary of when to fix a potential speculative out-of-bound access should actually be documented somewhere else than in the commit message of this (more or less random) commit. Best, Norbert > > Jan Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Ralf Herbrich Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH svm] svm: fix p2mt type
A pointer mismatch has been reported when compiling with the compiler goto-gcc of the bounded model checker CBMC. Fixes: 9a779e4f (Implement SVM specific part for Nested Virtualization) Signed-off-by: Norbert Manthey --- xen/arch/x86/hvm/svm/svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1794,7 +1794,7 @@ static void svm_do_nested_pgfault(struct vcpu *v, uint64_t gpa; uint64_t mfn; uint32_t qualification; -uint32_t p2mt; +p2m_type_t p2mt; } _d; p2m = p2m_get_p2m(v); -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 1/9] xen/evtchn: block speculative out-of-bound accesses
On 2/1/19 15:08, Jan Beulich wrote: On 01.02.19 at 14:45, wrote: >> On 1/31/19 16:05, Jan Beulich wrote: >> On 29.01.19 at 15:43, wrote: --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -365,11 +365,16 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port) if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) return -EINVAL; + /* +* Make sure the guest controlled value virq is bounded even during +* speculative execution. +*/ +virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn)); + if ( virq_is_global(virq) && (vcpu != 0) ) return -EINVAL; -if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || - ((v = d->vcpu[vcpu]) == NULL) ) +if ( (vcpu < 0) || ((v = domain_vcpu(d, vcpu)) == NULL) ) return -ENOENT; >>> Is there a reason for the less-than-zero check to survive? >> Yes, domain_vcpu uses unsigned integers, and I want to return the proper >> error code, in case somebody comes with a vcpu number that would >> overflow into the valid range. > I don't see how an overflow into the valid range could occur: Negative > numbers, when converted to unsigned, become large positive numbers. > If anything in this regard was to change here, then the type of _both_ > local variable (which get initialized from a field of type uint32_t). True, I will drop the < 0 check as well. > @@ -418,8 +423,7 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind) intport, vcpu = bind->vcpu; long rc = 0; -if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || - (d->vcpu[vcpu] == NULL) ) +if ( (vcpu < 0) || domain_vcpu(d, vcpu) == NULL ) return -ENOENT; >>> I'm not sure about this one: We're not after the struct vcpu pointer >>> here. Right now subsequent code looks fine, but what if the actual >>> "vcpu" local variable was used again in a risky way further down? I >>> think here and elsewhere it would be best to eliminate that local >>> variable, and use v->vcpu_id only for subsequent consumers (or >>> alternatively latch the local variable's value only _after_ the call to >>> domain_vcpu(), which might be better especially in cases like). >> I agree with getting rid of using the local variable. As discussed >> elsewhere, updating such a variable might not fix the problem. However, >> in this commit I want to avoid speculative out-of-bound accesses using a >> guest controlled variable (vcpu). Hence, I add protection to the >> locations where it is used as index. As the domain_vcpu function comes >> with protection, I prefer this function over explicitly using >> array_index_nospec, if possible. > But domain_vcpu() does not alter an out of bounds value passed > into it in any way, i.e. subsequent array accesses using that value > would still be an issue. IOW in the case here what you do is > sufficient because there's no array access in the first place. It's > debatable whether any change is needed at all here (there would > need to be a speculation path which could observe the result of > the speculative write into chn->notify_vcpu_id). In this method, the access to d->vcpu[vcpu] has to be protected. That happens by using the domain_vcpu function. The rest of this function does not read the vcpu variable, as you mentioned. Therefore, I would keep this version of the fix, and also drop the sign check as above. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 
38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 4/9] spec: add l1tf-barrier
On 1/31/19 17:35, Jan Beulich wrote: On 29.01.19 at 15:43, wrote: >> @@ -1942,6 +1942,12 @@ Irrespective of Xen's setting, the feature is >> virtualised for HVM guests to >> use. By default, Xen will enable this mitigation on hardware believed to >> be >> vulnerable to L1TF. >> >> +On hardware vulnerable to L1TF, the `l1tf-barrier=` option can be used to >> force >> +or prevent Xen from protecting evaluations inside the hypervisor with a >> barrier >> +instruction to not load potentially secret information into L1 cache. By >> +default, Xen will enable this mitigation on hardware believed to be >> vulnerable >> +to L1TF. > ... and having SMT enabled, since aiui this is a non-issue without. In case flushing the L1 cache is not enabled, that is still an issue, because the transition guest -> hypervisor -> guest would still allow retrieving hypervisor data from the cache. Do you want me to extend the logic to consider L1 cache flushing as well? > >> --- a/xen/arch/x86/spec_ctrl.c >> +++ b/xen/arch/x86/spec_ctrl.c >> @@ -21,6 +21,7 @@ >> #include >> #include >> >> +#include > asm/cpuid.h please Will fix. > >> @@ -100,6 +102,7 @@ static int __init parse_spec_ctrl(const char *s) >> opt_ibpb = false; >> opt_ssbd = false; >> opt_l1d_flush = 0; >> +opt_l1tf_barrier = 0; >> } >> else if ( val > 0 ) >> rc = -EINVAL; > Is this really something we want "spec-ctrl=no-xen" to disable? > It would seem to me that this should be restricted to "spec-ctrl=no". I have no strong opinion here. If you ask me to move it somewhere else, I will do that. I just want to make sure it's disabled in case speculation mitigations should be disabled. > >> @@ -843,6 +849,14 @@ void __init init_speculation_mitigations(void) >> opt_l1d_flush = cpu_has_bug_l1tf && !(caps & ARCH_CAPS_SKIP_L1DFL); >> >> /* >> + * By default, enable L1TF_VULN on L1TF-vulnerable hardware >> + */ > This ought to be a single line comment. Will fix. > >> +if ( opt_l1tf_barrier == -1 ) >> +opt_l1tf_barrier = cpu_has_bug_l1tf; > At the very least opt_smt should be taken into account here. But > I guess this setting of the default may need to be deferred > further, until the topology of the system is known (there may > not be any hyperthreads after all). Again, cache flushing also has to be considered. So, I would like to keep it like this for now. > >> +if ( cpu_has_bug_l1tf && opt_l1tf_barrier > 0) >> +setup_force_cpu_cap(X86_FEATURE_SC_L1TF_VULN); > Why the left side of the &&? IMHO, the CPU flag L1TF should only be set when the CPU is reported to be vulnerable, even if the command line wants to enforce mitigations. > >> +/* >> * We do not disable HT by default on affected hardware. >> * >> * Firstly, if the user intends to use exclusively PV, or HVM shadow > Furthermore, as per the comment and logic here and below a > !HVM configuration ought to be safe too, unless "pv-l1tf=" was > used (in which case we defer to the admin anyway), so it's > questionable whether the whole logic should be there in the > first place in this case. This would then in particular keep all > of this out for the PV shim. For the PV shim, I could add pv-shim to my check before enabling the CPU flag.
>> --- a/xen/include/asm-x86/cpufeatures.h >> +++ b/xen/include/asm-x86/cpufeatures.h >> @@ -31,3 +31,4 @@ XEN_CPUFEATURE(SC_RSB_PV, (FSCAPINTS+0)*32+18) /* >> RSB overwrite needed for >> XEN_CPUFEATURE(SC_RSB_HVM, (FSCAPINTS+0)*32+19) /* RSB overwrite >> needed for HVM */ >> XEN_CPUFEATURE(SC_MSR_IDLE, (FSCAPINTS+0)*32+21) /* (SC_MSR_PV || >> SC_MSR_HVM) && default_xen_spec_ctrl */ >> XEN_CPUFEATURE(XEN_LBR, (FSCAPINTS+0)*32+22) /* Xen uses >> MSR_DEBUGCTL.LBR */ >> +XEN_CPUFEATURE(SC_L1TF_VULN,(FSCAPINTS+0)*32+23) /* L1TF protection >> required */ > Would you mind using one of the unused slots above first? I will pick an unused slot. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 5/9] nospec: introduce evaluate_nospec
On 1/31/19 18:05, Jan Beulich wrote: >>>> On 29.01.19 at 15:43, wrote: >> Since the L1TF vulnerability of Intel CPUs, loading hypervisor data into >> L1 cache is problematic, because when hyperthreading is used as well, a >> guest running on the sibling core can leak this potentially secret data. >> >> To prevent these speculative accesses, we block speculation after >> accessing the domain property field by adding lfence instructions. This >> way, the CPU continues executing and loading data only once the condition >> is actually evaluated. >> >> As the macros are typically used in if statements, the lfence has to come >> in a compatible way. Therefore, a function that returns true after an >> lfence instruction is introduced. To protect both branches after a >> conditional, an lfence instruction has to be added for the two branches. >> To be able to block speculation after several evaluations, the generic >> barrier macro block_speculation is also introduced. >> >> As the L1TF vulnerability is only present on the x86 architecture, the >> macros will not use the lfence instruction on other architectures and the >> protection is disabled during compilation. By default, the lfence >> instruction is not present either. Only when an L1TF vulnerable platform >> is detected, the lfence instruction is patched in via alternative patching. >> >> Introducing the lfence instructions catches a lot of potential leaks with >> a simple unintrusive code change. During performance testing, we did not >> notice performance effects. >> >> Signed-off-by: Norbert Manthey > Looks okay to me now, but I'm going to wait with giving an ack > until perhaps others have given comments, as some of this > was not entirely uncontroversial. There are a few cosmetic > issues left though: > >> @@ -64,6 +65,33 @@ static inline unsigned long >> array_index_mask_nospec(unsigned long index, >> #define array_access_nospec(array, index) \ >> (array)[array_index_nospec(index, ARRAY_SIZE(array))] >> >> +/* >> + * Allow to insert a read memory barrier into conditionals >> + */ > Here and below, please make single line comments really be > single lines. Will fix. > >> +#if defined(CONFIG_X86) && defined(CONFIG_HVM) >> +static inline bool arch_barrier_nospec_true(void) { > The brace belongs on its own line. Will fix. > >> +alternative("", "lfence", X86_FEATURE_SC_L1TF_VULN); >> +return true; >> +} >> +#else >> +static inline bool arch_barrier_nospec_true(void) { return true; } > This could be avoided if you placed the #if inside the > function body. I will move the #if inside. > >> +#endif >> + >> +/* >> + * Allow to protect evaluation of conditional with respect to speculation >> on x86 >> + */ >> +#ifndef CONFIG_X86 > Why is this conditional different from the one above? You are right, the two defines should be equal. > >> +#define evaluate_nospec(condition) (condition) >> +#else >> +#define evaluate_nospec(condition) \ >> +((condition) ? arch_barrier_nospec_true() : !arch_barrier_nospec_true()) >> +#endif >> + >> +/* >> + * Allow to block speculative execution in generic code >> + */ >> +#define block_speculation() (void)arch_barrier_nospec_true() > Missing an outer pair of parentheses. Will add them. Best, Norbert
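To make the "protect both branches" point concrete: the quoted macro expands the condition into a ternary whose two arms both call the barrier function, so the lfence is executed on the taken and the not-taken path alike. A usage sketch (hypothetical caller):

/*
 * Both outcomes of the condition pass through arch_barrier_nospec_true(),
 * so on patched, vulnerable x86 hardware an lfence retires before either
 * branch body starts loading data into the L1 cache.
 */
if ( evaluate_nospec(d->is_privileged) )
{
    /* privileged path: reached only after the barrier */
}
else
{
    /* unprivileged path: equally reached only after the barrier */
}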
[Xen-devel] [PATCH svm v2] svm: fix xentrace p2mt access
A pointer mismatch has been reported when compiling with goto-gcc, the compiler of the bounded model checker CBMC. To keep the trace entry size independent of the compiler implementation, use the available p2mt variable for the access, and update the trace record independently. Fixes: 9a779e4f (Implement SVM specific part for Nested Virtualization) Signed-off-by: Norbert Manthey --- Notes: v2: keep type, use local variable in function call and xen/arch/x86/hvm/svm/svm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1800,8 +1800,9 @@ static void svm_do_nested_pgfault(struct vcpu *v, p2m = p2m_get_p2m(v); _d.gpa = gpa; _d.qualification = 0; -mfn = __get_gfn_type_access(p2m, gfn, &_d.p2mt, &p2ma, 0, NULL, 0); +mfn = __get_gfn_type_access(p2m, gfn, &p2mt, &p2ma, 0, NULL, 0); _d.mfn = mfn_x(mfn); +_d.p2mt = p2mt; __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); } -- 2.7.4
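In general terms, the problem the patch avoids is passing the address of a (potentially narrower) trace-record field where the callee expects a pointer to the full p2m_type_t. A minimal sketch of the pattern, with hypothetical types and names standing in for the Xen ones:

#include <stdint.h>

enum wide_type { WT_A, WT_B, WT_C };        /* stand-in for p2m_type_t */

struct trace_record {
    uint32_t p2mt;                          /* width chosen by the record
                                               layout, not by the callee */
};

extern void get_type(enum wide_type *out);  /* stand-in for
                                               __get_gfn_type_access() */

static void fill_record(struct trace_record *rec)
{
    enum wide_type t;   /* local of the exact type the callee writes */

    get_type(&t);       /* no pointer-type mismatch here */
    rec->p2mt = t;      /* explicit, well-defined narrowing assignment */
}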
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 4/9] spec: add l1tf-barrier
On 2/5/19 15:43, Jan Beulich wrote: On 05.02.19 at 15:23, wrote: >> On 1/31/19 17:35, Jan Beulich wrote: >> On 29.01.19 at 15:43, wrote: @@ -1942,6 +1942,12 @@ Irrespective of Xen's setting, the feature is >> virtualised for HVM guests to use. By default, Xen will enable this mitigation on hardware believed to be vulnerable to L1TF. +On hardware vulnerable to L1TF, the `l1tf-barrier=` option can be used to force +or prevent Xen from protecting evaluations inside the hypervisor with a barrier +instruction to not load potentially secret information into L1 cache. By +default, Xen will enable this mitigation on hardware believed to be vulnerable +to L1TF. >>> ... and having SMT enabled, since aiui this is a non-issue without. >> In case flushing the L1 cache is not enabled, that is still an issue, >> because the transition guest -> hypervisor -> guest would still allow >> retrieving hypervisor data from the cache. Do you want me to extend >> the logic to consider L1 cache flushing as well? > Well, I wouldn't be overly concerned of people disabling it from the > command line, but being kind to people without updated microcode > is perhaps a good idea. I will extend the commit message to state that the CPU flag is set automatically, independently of SMT and cache flushing. > @@ -100,6 +102,7 @@ static int __init parse_spec_ctrl(const char *s) opt_ibpb = false; opt_ssbd = false; opt_l1d_flush = 0; +opt_l1tf_barrier = 0; } else if ( val > 0 ) rc = -EINVAL; >>> Is this really something we want "spec-ctrl=no-xen" to disable? >>> It would seem to me that this should be restricted to "spec-ctrl=no". >> I have no strong opinion here. If you ask me to move it somewhere else, >> I will do that. I just want to make sure it's disabled in case >> speculation mitigations should be disabled. > Unless anyone else voices a different opinion, I'd like to see it > moved as suggested. I will move the change above the disable_common label. @@ -843,6 +849,14 @@ void __init init_speculation_mitigations(void) opt_l1d_flush = cpu_has_bug_l1tf && !(caps & ARCH_CAPS_SKIP_L1DFL); /* + * By default, enable L1TF_VULN on L1TF-vulnerable hardware + */ >>> This ought to be a single line comment. >> Will fix. +if ( opt_l1tf_barrier == -1 ) +opt_l1tf_barrier = cpu_has_bug_l1tf; >>> At the very least opt_smt should be taken into account here. But >>> I guess this setting of the default may need to be deferred >>> further, until the topology of the system is known (there may >>> not be any hyperthreads after all). >> Again, cache flushing also has to be considered. So, I would like to >> keep it like this for now. > With the "for now" aspect properly explained in the description, > I guess that would be fine as a first step. I will extend the commit message accordingly. > +if ( cpu_has_bug_l1tf && opt_l1tf_barrier > 0) +setup_force_cpu_cap(X86_FEATURE_SC_L1TF_VULN); >>> Why the left side of the &&? >> IMHO, the CPU flag L1TF should only be set when the CPU is reported to >> be vulnerable, even if the command line wants to enforce mitigations. > What's the command line option good for if it doesn't trigger > patching in of the LFENCEs? Command line options exist, among > other purposes, to aid mitigating flaws in our determination of > what is a vulnerable platform. I will remove the extra conditional and enable patching based on the command line only. > +/* * We do not disable HT by default on affected hardware. * * Firstly, if the user intends to use exclusively PV, or HVM shadow >>> Furthermore, as per the comment and logic here and below a >>> !HVM configuration ought to be safe too, unless "pv-l1tf=" was >>> used (in which case we defer to the admin anyway), so it's >>> questionable whether the whole logic should be there in the >>> first place in this case. This would then in particular keep all >>> of this out for the PV shim. >> For the PV shim, I could add pv-shim to my check before enabling the CPU >> flag. > But the PV shim is just a special case. I'd like this code to be > compiled out for all !HVM configurations. The patch that introduces the evaluate_nospec macro does that already. Based on defined(CONFIG_HVM), lfence patching is disabled there. Do you want me to wrap this command line option into CONFIG_HVM checks as well? Best, Norbert
[Xen-devel] [PATCH Makefile v2] asm: handle comments when creating header file
From: Norbert Manthey In the early steps of compilation, the asm header files are created, such as include/asm-$(TARGET_ARCH)/asm-offsets.h. These files depend on the assembly file arch/$(TARGET_ARCH)/asm-offsets.s, which is generated beforehand. Depending on the toolchain used, there might be comments in the assembly files. In particular, the goto-gcc compiler of the bounded model checker CBMC adds comments that start with a '#' symbol at the beginning of the line. This commit adds handling of comments in the assembly during the creation of the asm header files, in particular ignoring lines that start with '#', which indicates a comment for both the ARM and x86 assemblers. The goto-as tool produces exactly this kind of comment. Signed-off-by: Norbert Manthey Signed-off-by: Michael Tautschnig --- xen/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/Makefile b/xen/Makefile --- a/xen/Makefile +++ b/xen/Makefile @@ -191,7 +191,7 @@ include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s echo "#ifndef __ASM_OFFSETS_H__"; \ echo "#define __ASM_OFFSETS_H__"; \ echo ""; \ - sed -rne "/==>/{s:.*==>(.*)<==.*:\1:; s: [\$$#]: :; p;}"; \ + sed -rne "/^[^#].*==>/{s:.*==>(.*)<==.*:\1:; s: [\$$#]: :; p;}"; \ echo ""; \ echo "#endif") <$< >$@ -- 2.7.4
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 8/9] common/grant_table: block speculative out-of-bound accesses
On 2/6/19 15:52, Jan Beulich wrote: On 29.01.19 at 15:43, wrote: >> @@ -963,6 +965,9 @@ map_grant_ref( >> PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", >> op->ref, rgt->domain->domain_id); >> >> +/* Make sure the above check is not bypassed speculatively */ >> +op->ref = array_index_nospec(op->ref, nr_grant_entries(rgt)); >> + >> act = active_entry_acquire(rgt, op->ref); >> shah = shared_entry_header(rgt, op->ref); >> status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, >> op->ref); > Just FTR - this is a case where the change, according to prior > discussion, is pretty unlikely to help at all. The compiler will have > a hard time realizing that it could keep the result in a register past > the active_entry_acquire() invocation, as that - due to the spin > lock acquired there - acts as a compiler barrier. And looking at > generated code (gcc 8.2) confirms that there's a reload from the > stack. I could change this back to a prior version that protects each read operation. >> @@ -2026,6 +2031,9 @@ gnttab_prepare_for_transfer( >> goto fail; >> } >> >> +/* Make sure the above check is not bypassed speculatively */ >> +ref = array_index_nospec(ref, nr_grant_entries(rgt)); >> + >> sha = shared_entry_header(rgt, ref); >> >> scombo.word = *(u32 *)&sha->flags; >> @@ -2223,7 +2231,8 @@ gnttab_transfer( >> okay = gnttab_prepare_for_transfer(e, d, gop.ref); >> spin_lock(&e->page_alloc_lock); >> >> -if ( unlikely(!okay) || unlikely(e->is_dying) ) >> +/* Make sure this check is not bypassed speculatively */ >> +if ( evaluate_nospec(unlikely(!okay) || unlikely(e->is_dying)) ) > I'm still not really happy about this. The comment isn't helpful in > connecting the use of evaluate_nospec() to the problem site > (in the earlier hunk, which I've left in context), and I still don't > understand why the e->is_dying is getting wrapped as well. > Plus it occurs to me now that you're liable to render unlikely() > ineffective here. So how about > > if ( unlikely(evaluate_nospec(!okay)) || unlikely(e->is_dying) ) > > ? I will move the evaluate_nospec closer to the evaluation of okay, and will improve the comment mentioning that the okay variable represents whether the current reference is actually valid. Best, Norbert
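To spell out the reload concern for readers following along, here is the quoted hunk again with the failure mode as comments (the store-bypass rationale is our reading of the discussion, not stated verbatim above):

/* The clamp writes the masked value back to memory: */
op->ref = array_index_nospec(op->ref, nr_grant_entries(rgt));

/*
 * active_entry_acquire() takes a spin lock and thus acts as a compiler
 * barrier, so the clamped value cannot be kept in a register across the
 * call; gcc 8.2 reloads op->ref from the stack afterwards.
 */
act = active_entry_acquire(rgt, op->ref);

/*
 * That reload may speculatively bypass the store above (memory
 * disambiguation / speculative store bypass) and observe the original,
 * unclamped value - hence the suggestion to use an explicit barrier
 * (block_speculation()) instead of clamping in place.
 */
shah = shared_entry_header(rgt, op->ref);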
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 6/9] is_control_domain: block speculation
On 2/6/19 16:03, Jan Beulich wrote: On 29.01.19 at 15:43, wrote: >> @@ -908,10 +909,10 @@ void watchdog_domain_destroy(struct domain *d); >> *(that is, this would not be suitable for a driver domain) >> * - There is never a reason to deny the hardware domain access to this >> */ >> -#define is_hardware_domain(_d) ((_d) == hardware_domain) >> +#define is_hardware_domain(_d) evaluate_nospec((_d) == hardware_domain) >> >> /* This check is for functionality specific to a control domain */ >> -#define is_control_domain(_d) ((_d)->is_privileged) >> +#define is_control_domain(_d) evaluate_nospec((_d)->is_privileged) > I'm afraid there's another fly in the ointment here: While looking at > the still questionable grant table change I've started wondering > about constructs like > > case XENMEM_machphys_mapping: > { > struct xen_machphys_mapping mapping = { > .v_start = MACH2PHYS_VIRT_START, > .v_end = MACH2PHYS_VIRT_END, > .max_mfn = MACH2PHYS_NR_ENTRIES - 1 > }; > > if ( !mem_hotplug && is_hardware_domain(current->domain) ) > mapping.max_mfn = max_page - 1; > if ( copy_to_guest(arg, &mapping, 1) ) > return -EFAULT; > > return 0; > } > > Granted the example here could be easily re-arranged, but there > are others where this is less easy or not possible at all. What I'm > trying to get at are constructs where the such-protected > predicates sit on the right side of && or || - afaict (also from > looking at some much simplified code examples) the intended > protection is gone in these cases. I do not follow this. Independently of other conditionals in the if statement, there should be an lfence instruction between the "is_hardware_domain(...)" evaluation and accessing the max_page variable - in case the code actually protects accessing that variable via that function. I validated this property for the above code snippet in the generated assembly. However, I just noticed another problem: while my initial version just placed the lfence instruction right into the code, now the arch_barrier_nospec_true function is called via callq. I would like to get the instructions to be embedded into the code directly, without the call detour. In case I cannot force the compiler to do that, I would go back to using a fixed lfence statement on all x86 platforms. Best, Norbert
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 9/9] common/memory: block speculative out-of-bound accesses
On 2/6/19 16:25, Jan Beulich wrote: On 29.01.19 at 15:43, wrote: >> @@ -33,10 +34,10 @@ unsigned long __read_mostly >> pdx_group_valid[BITS_TO_LONGS( >> >> bool __mfn_valid(unsigned long mfn) >> { >> -return likely(mfn < max_page) && >> - likely(!(mfn & pfn_hole_mask)) && >> - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, >> - pdx_group_valid)); >> +return evaluate_nospec(likely(mfn < max_page) && >> + likely(!(mfn & pfn_hole_mask)) && >> + likely(test_bit(pfn_to_pdx(mfn) / >> PDX_GROUP_COUNT, >> + pdx_group_valid))); > Other than in the questionable grant table case, here I agree that > you want to wrap the entire construct. This has an unwanted effect > though: The test_bit() may still be speculated into with an out-of- > bounds mfn. (As mentioned elsewhere, operations on bit arrays are > an open issue altogether.) I therefore think you want to split this into > two: > > bool __mfn_valid(unsigned long mfn) > { > return likely(evaluate_nospec(mfn < max_page)) && >evaluate_nospec(likely(!(mfn & pfn_hole_mask)) && >likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, >pdx_group_valid))); > } I can split the code. However, I wonder whether the test_bit accesses should be protected separately, or actually as part of the test_bit method itself. Do you already have plans to do that? In that case I would not have to modify the code. Best, Norbert
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 9/9] common/memory: block speculative out-of-bound accesses
On 2/6/19 17:08, Jan Beulich wrote: On 06.02.19 at 16:39, wrote: >> On 2/6/19 16:25, Jan Beulich wrote: >> On 29.01.19 at 15:43, wrote: @@ -33,10 +34,10 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS( bool __mfn_valid(unsigned long mfn) { -return likely(mfn < max_page) && - likely(!(mfn & pfn_hole_mask)) && - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, - pdx_group_valid)); +return evaluate_nospec(likely(mfn < max_page) && + likely(!(mfn & pfn_hole_mask)) && + likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, + pdx_group_valid))); >>> Other than in the questionable grant table case, here I agree that >>> you want to wrap the entire construct. This has an unwanted effect >>> though: The test_bit() may still be speculated into with an out-of- >>> bounds mfn. (As mentioned elsewhere, operations on bit arrays are >>> an open issue altogether.) I therefore think you want to split this into >>> two: >>> >>> bool __mfn_valid(unsigned long mfn) >>> { >>> return likely(evaluate_nospec(mfn < max_page)) && >>>evaluate_nospec(likely(!(mfn & pfn_hole_mask)) && >>>likely(test_bit(pfn_to_pdx(mfn) / >>> PDX_GROUP_COUNT, >>>pdx_group_valid))); >>> } >> I can split the code. However, I wonder whether the test_bit accesses >> should be protected separately, or actually as part of the test_bit >> method itself. Do you already have plans to do that? In that case I >> would not have to modify the code. > I don't think we want to do that in test_bit() and friends > themselves, as that would likely produce more unnecessary > changes than necessary ones. Even the change here > already looks to have much bigger impact than would be > wanted, as in the common case MFNs aren't guest controlled. > ISTR that originally you had modified just a single call site, > but I can't seem to find that in my inbox anymore. If that > was the case, what exactly were the criteria upon which > you had chosen this sole caller? I understand that these fixes should not go into test_bit itself. I could add a local array_index_nospec fix for this call, to not introduce another lfence to be passed. I picked the specific caller in the first versions, because there was a direct path from a hypercall where the guest had full control over mfn. Iirc, that call was not spotted by tooling, but by manual analysis. Best, Norbert
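A sketch of the "local array_index_nospec fix" mentioned above, applied to the __mfn_valid() code quoted earlier (the bit-capacity bound used for the clamp is an assumption made for illustration):

bool __mfn_valid(unsigned long mfn)
{
    return likely(evaluate_nospec(mfn < max_page)) &&
           likely(!(mfn & pfn_hole_mask)) &&
           /* Clamp the derived bit index at this one guest-reachable call
            * site instead of hardening test_bit() for all callers. */
           likely(test_bit(array_index_nospec(pfn_to_pdx(mfn) /
                                              PDX_GROUP_COUNT,
                                              ARRAY_SIZE(pdx_group_valid) *
                                              BITS_PER_LONG),
                           pdx_group_valid));
}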
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 8/9] common/grant_table: block speculative out-of-bound accesses
On 2/6/19 16:53, Jan Beulich wrote: On 06.02.19 at 16:06, wrote: >> On 2/6/19 15:52, Jan Beulich wrote: >> On 29.01.19 at 15:43, wrote: @@ -963,6 +965,9 @@ map_grant_ref( PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", op->ref, rgt->domain->domain_id); +/* Make sure the above check is not bypassed speculatively */ +op->ref = array_index_nospec(op->ref, nr_grant_entries(rgt)); + act = active_entry_acquire(rgt, op->ref); shah = shared_entry_header(rgt, op->ref); status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref); >>> Just FTR - this is a case where the change, according to prior >>> discussion, is pretty unlikely to help at all. The compiler will have >>> a hard time realizing that it could keep the result in a register past >>> the active_entry_acquire() invocation, as that - due to the spin >>> lock acquired there - acts as a compiler barrier. And looking at >>> generated code (gcc 8.2) confirms that there's a reload from the >>> stack. >> I could change this back to a prior version that protects each read >> operation. > That or use block_speculation() with a comment explaining why. > > Also - why are there no changes at all to the unmap_grant_ref() / > unmap_and_replace() call paths? Note in particular the security > related comment next to the bounds check of op->ref there. I've > gone through earlier review rounds, but I couldn't find an indication > that this might have been the result of review feedback. You are right. I am not sure whether I had a fix placed there in the beginning. I will replace the first "smp_rmb();" in function unmap_common for the next iteration with the "block_speculation" macro. The other check unlikely(op->ref >= nr_grant_entries(rgt)) can only reach out-of-bounds for the unmap case, in case the map->ref entry has been out-of-bounds beforehand. I did not find an assignment that is not protected by a bound check and a speculation barrier or array_index_nospec. Best, Norbert
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 6/9] is_control_domain: block speculation
On 2/6/19 17:01, Jan Beulich wrote: On 06.02.19 at 16:36, wrote: >> On 2/6/19 16:03, Jan Beulich wrote: >> On 29.01.19 at 15:43, wrote: @@ -908,10 +909,10 @@ void watchdog_domain_destroy(struct domain *d); *(that is, this would not be suitable for a driver domain) * - There is never a reason to deny the hardware domain access to this */ -#define is_hardware_domain(_d) ((_d) == hardware_domain) +#define is_hardware_domain(_d) evaluate_nospec((_d) == hardware_domain) /* This check is for functionality specific to a control domain */ -#define is_control_domain(_d) ((_d)->is_privileged) +#define is_control_domain(_d) evaluate_nospec((_d)->is_privileged) >>> snip >> I validated this property for the above code snippet in the generated >> assembly. However, I just noticed another problem: while my initial >> version just placed the lfence instruction right into the code, now the >> arch_barrier_nospec_true function is called via callq. I would like to get >> the instructions to be embedded into the code directly, without the call >> detour. In case I cannot force the compiler to do that, I would go back >> to using a fixed lfence statement on all x86 platforms. > I think we had made pretty clear that incurring the overhead even > onto unaffected platforms is not an option. Did you try whether > adding always_inline helps? (I take it that this is another case of > the size-of-asm issue that's being worked on in Linux as well iirc.) I fully understand that just using lfence everywhere is not an option. I just tested the always_inline option, and that works for my binary. I will adapt the function definition accordingly. Best, Norbert
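For the record, the adjusted definition ends up looking like this in v6 (see patch 5/9 later in this thread):

/* always_inline keeps the alternative()-patched lfence at the call site
 * instead of behind a callq, so the barrier stays adjacent to the
 * condition it protects: */
static always_inline bool arch_barrier_nospec_true(void)
{
#if defined(CONFIG_HVM)
    alternative("", "lfence", X86_FEATURE_SC_L1TF_VULN);
#endif
    return true;
}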
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 8/9] common/grant_table: block speculative out-of-bound accesses
On 2/7/19 10:50, Norbert Manthey wrote: > On 2/6/19 16:53, Jan Beulich wrote: >>>>> On 06.02.19 at 16:06, wrote: >>> On 2/6/19 15:52, Jan Beulich wrote: >>>>>>> On 29.01.19 at 15:43, wrote: >>>>> @@ -963,6 +965,9 @@ map_grant_ref( >>>>> PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", >>>>> op->ref, rgt->domain->domain_id); >>>>> >>>>> +/* Make sure the above check is not bypassed speculatively */ >>>>> +op->ref = array_index_nospec(op->ref, nr_grant_entries(rgt)); >>>>> + >>>>> act = active_entry_acquire(rgt, op->ref); >>>>> shah = shared_entry_header(rgt, op->ref); >>>>> status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, >>>>> op->ref); >>>> Just FTR - this is a case where the change, according to prior >>>> discussion, is pretty unlikely to help at all. The compiler will have >>>> a hard time realizing that it could keep the result in a register past >>>> the active_entry_acquire() invocation, as that - due to the spin >>>> lock acquired there - acts as a compiler barrier. And looking at >>>> generated code (gcc 8.2) confirms that there's a reload from the >>>> stack. >>> I could change this back to a prior version that protects each read >>> operation. >> That or use block_speculation() with a comment explaining why. >> >> Also - why are there no changes at all to the unmap_grant_ref() / >> unmap_and_replace() call paths? Note in particular the security >> related comment next to the bounds check of op->ref there. I've >> gone through earlier review rounds, but I couldn't find an indication >> that this might have been the result of review feedback. > You are right. I am not sure whether I had a fix placed there in the > beginning. I will replace the first "smp_rmb();" in function > unmap_common for the next iteration with the "block_speculation" macro. I just checked this one more time. The maptrack_entry macro has been extended with the array_index_nospec macro already, so that the assignment to the map variable is in bounds. Therefore, I actually will not introduce the block_speculation macro. > > The other check unlikely(op->ref >= nr_grant_entries(rgt)) can only > reach out-of-bounds for the unmap case, in case the map->ref entry has > been out-of-bounds beforehand. I did not find an assignment that is not > protected by a bound check and a speculation barrier or array_index_nospec. > > Best, > Norbert
Re: [Xen-devel] [PATCH SpectreV1+L1TF v5 8/9] common/grant_table: block speculative out-of-bound accesses
On 2/7/19 15:00, Jan Beulich wrote: >>>> On 07.02.19 at 11:20, wrote: >> On 2/7/19 10:50, Norbert Manthey wrote: >>> On 2/6/19 16:53, Jan Beulich wrote: >>>>>>> On 06.02.19 at 16:06, wrote: >>>>> On 2/6/19 15:52, Jan Beulich wrote: >>>>>>>>> On 29.01.19 at 15:43, wrote: >>>>>>> @@ -963,6 +965,9 @@ map_grant_ref( >>>>>>> PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", >>>>>>> op->ref, rgt->domain->domain_id); >>>>>>> >>>>>>> +/* Make sure the above check is not bypassed speculatively */ >>>>>>> +op->ref = array_index_nospec(op->ref, nr_grant_entries(rgt)); >>>>>>> + >>>>>>> act = active_entry_acquire(rgt, op->ref); >>>>>>> shah = shared_entry_header(rgt, op->ref); >>>>>>> status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, >>>>>>> op->ref); >>>>>> Just FTR - this is a case where the change, according to prior >>>>>> discussion, is pretty unlikely to help at all. The compiler will have >>>>>> a hard time realizing that it could keep the result in a register past >>>>>> the active_entry_acquire() invocation, as that - due to the spin >>>>>> lock acquired there - acts as a compiler barrier. And looking at >>>>>> generated code (gcc 8.2) confirms that there's a reload from the >>>>>> stack. >>>>> I could change this back to a prior version that protects each read >>>>> operation. >>>> That or use block_speculation() with a comment explaining why. >>>> >>>> Also - why are there no changes at all to the unmap_grant_ref() / >>>> unmap_and_replace() call paths? Note in particular the security >>>> related comment next to the bounds check of op->ref there. I've >>>> gone through earlier review rounds, but I couldn't find an indication >>>> that this might have been the result of review feedback. >>> You are right. I am not sure whether I had a fix placed there in the >>> beginning. I will replace the first "smp_rmb();" in function >>> unmap_common for the next iteration with the "block_speculation" macro. >> I just checked this one more time. The maptrack_entry macro has been >> extended with the array_index_nospec macro already, so that the >> assignment to the map variable is in bounds. Therefore, I actually will >> not introduce the block_speculation macro. > unmap_common() uses maptrack_entry() with op->handle. I didn't > refer to that, because - as you say - maptrack_entry() is itself > getting hardened already. Instead I am, as said, referring to > map->ref / op->ref. > > And no, replacing _any_ smp_rmb() would not be correct: The > barriers are needed unconditionally, whereas block_speculation() > inserts a barrier only in a subset of cases (for example never on > Arm). Right. I will protect the index operations based on op->ref in unmap_common via array_index_nospec. > >>> The other check unlikely(op->ref >= nr_grant_entries(rgt)) can only >>> reach out-of-bounds for the unmap case, in case the map->ref entry has >>> been out-of-bounds beforehand. I did not find an assignment that is not >>> protected by a bound check and a speculation barrier or array_index_nospec. > I can only refer you to the comment there again. In essence, the prior > bounds check done may have been against the grant table limits of > another domain. You may want to look at the full commit introducing this > comment. In unmap_common_complete, IMHO it is sufficient to evaluate the first check op->done via evaluate_nospec, so that the return is taken in case nothing has been done, and then invalid values of op->ref should not be used under speculation, or out-of-bounds.
On the other hand, this function is always called after gnttab_flush_tlb. I did not spot a good indicator for that function blocking speculation, hence, I would still add the macro. Best, Norbert
[Xen-devel] SpectreV1+L1TF Patch Series v6
Dear all, This patch series attempts to mitigate the issues that have been raised in XSA-289 (https://xenbits.xen.org/xsa/advisory-289.html). To block speculative execution on Intel hardware, an lfence instruction is required to make sure that selected checks are not bypassed. Speculative out-of-bound accesses can be prevented by using the array_index_nospec macro. The major changes between v5 and v6 of this series are the introduction of asm-specific nospec.h files that introduce macros to add lfence instructions to conditionals that are to be evaluated. Furthermore, we try to avoid updating variables that might not be located in a register. Best, Norbert
[Xen-devel] [PATCH SpectreV1+L1TF v6 1/9] xen/evtchn: block speculative out-of-bound accesses
Guests can issue event channel interactions with guest-specified data. To avoid speculative out-of-bound accesses, we use the nospec macros, or the domain_vcpu function. This commit is part of the SpectreV1+L1TF mitigation patch series. Signed-off-by: Norbert Manthey --- Notes: v6: drop vcpu < 0 check use struct vcpu in evtchn_bind_vcpu do not call domain_vcpu twice in evtchn_fifo_word_from_port xen/common/event_channel.c | 34 +++--- xen/common/event_fifo.c| 13 ++--- xen/include/xen/event.h| 5 +++-- 3 files changed, 36 insertions(+), 16 deletions(-) diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c --- a/xen/common/event_channel.c +++ b/xen/common/event_channel.c @@ -365,11 +365,16 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port) if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) ) return -EINVAL; + /* +* Make sure the guest controlled value virq is bounded even during +* speculative execution. +*/ +virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn)); + if ( virq_is_global(virq) && (vcpu != 0) ) return -EINVAL; -if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || - ((v = d->vcpu[vcpu]) == NULL) ) +if ( (v = domain_vcpu(d, vcpu)) == NULL ) return -ENOENT; spin_lock(&d->event_lock); @@ -418,8 +423,7 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind) int port, vcpu = bind->vcpu; long rc = 0; -if ( (vcpu < 0) || (vcpu >= d->max_vcpus) || - (d->vcpu[vcpu] == NULL) ) +if ( domain_vcpu(d, vcpu) == NULL ) return -ENOENT; spin_lock(&d->event_lock); @@ -813,6 +817,13 @@ int set_global_virq_handler(struct domain *d, uint32_t virq) if (virq >= NR_VIRQS) return -EINVAL; + + /* +* Make sure the guest controlled value virq is bounded even during +* speculative execution. +*/ +virq = array_index_nospec(virq, ARRAY_SIZE(global_virq_handlers)); + if (!virq_is_global(virq)) return -EINVAL; @@ -930,8 +941,9 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) struct domain *d = current->domain; struct evtchn *chn; long rc = 0; +struct vcpu *v; -if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) ) +if ( (v = domain_vcpu(d, vcpu_id)) == NULL ) return -ENOENT; spin_lock(&d->event_lock); @@ -955,22 +967,22 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id) { case ECS_VIRQ: if ( virq_is_global(chn->u.virq) ) -chn->notify_vcpu_id = vcpu_id; +chn->notify_vcpu_id = v->vcpu_id; else rc = -EINVAL; break; case ECS_UNBOUND: case ECS_INTERDOMAIN: -chn->notify_vcpu_id = vcpu_id; +chn->notify_vcpu_id = v->vcpu_id; break; case ECS_PIRQ: -if ( chn->notify_vcpu_id == vcpu_id ) +if ( chn->notify_vcpu_id == v->vcpu_id ) break; unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]); -chn->notify_vcpu_id = vcpu_id; +chn->notify_vcpu_id = v->vcpu_id; pirq_set_affinity(d, chn->u.pirq.irq, - cpumask_of(d->vcpu[vcpu_id]->processor)); -link_pirq_port(port, chn, d->vcpu[vcpu_id]); + cpumask_of(v->processor)); +link_pirq_port(port, chn, v); break; default: rc = -EINVAL; diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c --- a/xen/common/event_fifo.c +++ b/xen/common/event_fifo.c @@ -33,7 +33,8 @@ static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d, */ smp_rmb(); -p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; +p = array_index_nospec(port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE, + d->evtchn_fifo->num_evtchns); w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE; return d->evtchn_fifo->event_array[p] + w; @@ -516,14 +517,20 @@ int evtchn_fifo_init_control(struct evtchn_init_control *init_control) gfn = init_control->control_gfn;
offset = init_control->offset; -if ( vcpu_id >= d->max_vcpus || !d->vcpu[vcpu_id] ) +if ( (v = domain_vcpu(d, vcpu_id)) == NULL ) return -ENOENT; -v = d->vcpu[vcpu_id]; /* Must not cross page boundary. */ if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) ) return -EINVAL; +/* + * Make sure the guest controlled value offset is bounded even during + * speculative execution. + */ +offset = array_index_nospec(offset, + PAGE_SIZE - sizeof(evtchn_fifo_control_block_t) + 1); + /* M
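For context, domain_vcpu() - the helper this patch substitutes for open-coded d->vcpu[] lookups - roughly looks as follows (a sketch reconstructed from the discussion; the exact definition lives in xen/include/xen/sched.h and may differ in detail):

/*
 * Return the vcpu of a domain for a guest-supplied id, or NULL.  The
 * array access itself is clamped with array_index_nospec(); note, as
 * discussed above, that the caller's original id variable is left
 * unmodified and therefore must not itself be reused as an array index.
 */
static inline struct vcpu *domain_vcpu(const struct domain *d,
                                       unsigned int vcpu_id)
{
    unsigned int idx = array_index_nospec(vcpu_id, d->max_vcpus);

    return vcpu_id >= d->max_vcpus ? NULL : d->vcpu[idx];
}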
[Xen-devel] [PATCH SpectreV1+L1TF v6 3/9] x86/hvm: block speculative out-of-bound accesses
There are multiple arrays in the HVM interface that are accessed with indices that are provided by the guest. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro. When blocking speculative out-of-bound accesses, we can classify arrays into dynamic arrays and static arrays. While the former are allocated during run time, the size of the latter is known at compile time. On static arrays, the compiler might be able to block speculative accesses in the future. This commit is part of the SpectreV1+L1TF mitigation patch series. Reported-by: Pawel Wieczorkiewicz Signed-off-by: Norbert Manthey --- Notes: v6: Match commit message with code Fix nospec bound in hvm_msr_read_intercept xen/arch/x86/hvm/hvm.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include #include @@ -2092,7 +2093,7 @@ int hvm_mov_from_cr(unsigned int cr, unsigned int gpr) case 2: case 3: case 4: -val = curr->arch.hvm.guest_cr[cr]; +val = array_access_nospec(curr->arch.hvm.guest_cr, cr); break; case 8: val = (vlapic_get_reg(vcpu_vlapic(curr), APIC_TASKPRI) & 0xf0) >> 4; @@ -3438,13 +3439,15 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) if ( !d->arch.cpuid->basic.mtrr ) goto gp_fault; index = msr - MSR_MTRRfix16K_8; -*msr_content = fixed_range_base[index + 1]; +*msr_content = fixed_range_base[array_index_nospec(index + 1, + ARRAY_SIZE(v->arch.hvm.mtrr.fixed_ranges))]; break; case MSR_MTRRfix4K_C...MSR_MTRRfix4K_F8000: if ( !d->arch.cpuid->basic.mtrr ) goto gp_fault; index = msr - MSR_MTRRfix4K_C; -*msr_content = fixed_range_base[index + 3]; +*msr_content = fixed_range_base[array_index_nospec(index + 3, + ARRAY_SIZE(v->arch.hvm.mtrr.fixed_ranges))]; break; case MSR_IA32_MTRR_PHYSBASE(0)...MSR_IA32_MTRR_PHYSMASK(MTRR_VCNT_MAX - 1): if ( !d->arch.cpuid->basic.mtrr ) goto gp_fault; @@ -3453,7 +3456,8 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) if ( (index / 2) >= MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, MTRRcap_VCNT) ) goto gp_fault; -*msr_content = var_range_base[index]; +*msr_content = var_range_base[array_index_nospec(index, +2*MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, MTRRcap_VCNT))]; break; case MSR_IA32_XSS: @@ -4016,7 +4020,7 @@ static int hvmop_set_evtchn_upcall_vector( if ( op.vector < 0x10 ) return -EINVAL; -if ( op.vcpu >= d->max_vcpus || (v = d->vcpu[op.vcpu]) == NULL ) +if ( (v = domain_vcpu(d, op.vcpu)) == NULL ) return -ENOENT; printk(XENLOG_G_INFO "%pv: upcall vector %02x\n", v, op.vector); @@ -4104,6 +4108,12 @@ static int hvmop_set_param( if ( a.index >= HVM_NR_PARAMS ) return -EINVAL; +/* + * Make sure the guest controlled value a.index is bounded even during + * speculative execution. + */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); + d = rcu_lock_domain_by_any_id(a.domid); if ( d == NULL ) return -ESRCH; @@ -4370,6 +4380,12 @@ static int hvmop_get_param( if ( a.index >= HVM_NR_PARAMS ) return -EINVAL; +/* + * Make sure the guest controlled value a.index is bounded even during + * speculative execution. + */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); + d = rcu_lock_domain_by_any_id(a.domid); if ( d == NULL ) return -ESRCH; -- 2.7.4
[Xen-devel] [PATCH SpectreV1+L1TF v6 2/9] x86/vioapic: block speculative out-of-bound accesses
When interacting with the IO APIC, a guest can specify values that are used as indices into structures, and whose values are not compared against upper bounds to prevent speculative out-of-bound accesses. This change prevents these speculative accesses. Furthermore, variables are initialized and the compiler is asked not to optimize these initializations away, as the uninitialized, potentially guest controlled, variables might be used in a speculative out-of-bound access. Out of the four initialized variables, two are potentially problematic, namely the ones in the functions vioapic_irq_positive_edge and vioapic_get_trigger_mode. As the two problematic variables are both used in the common function gsi_vioapic, the mitigation is implemented there. As the access pattern of the currently non-guest-controlled functions might change in the future, the other variables are initialized as well. This commit is part of the SpectreV1+L1TF mitigation patch series. Signed-off-by: Norbert Manthey --- Notes: v6: Explain initialization in commit message Initialize pin in all 4 functions that call gsi_vioapic Fix space in comment xen/arch/x86/hvm/vioapic.c | 28 ++-- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -66,6 +67,12 @@ static struct hvm_vioapic *gsi_vioapic(const struct domain *d, { unsigned int i; +/* + * Make sure the compiler does not optimize away the initialization done by + * callers + */ +OPTIMIZER_HIDE_VAR(*pin); + for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ ) { struct hvm_vioapic *vioapic = domain_vioapic(d, i); @@ -117,7 +124,8 @@ static uint32_t vioapic_read_indirect(const struct hvm_vioapic *vioapic) break; } -redir_content = vioapic->redirtbl[redir_index].bits; +redir_content = vioapic->redirtbl[array_index_nospec(redir_index, + vioapic->nr_pins)].bits; result = (vioapic->ioregsel & 1) ?
(redir_content >> 32) : redir_content; break; @@ -212,7 +220,15 @@ static void vioapic_write_redirent( struct hvm_irq *hvm_irq = hvm_domain_irq(d); union vioapic_redir_entry *pent, ent; int unmasked = 0; -unsigned int gsi = vioapic->base_gsi + idx; +unsigned int gsi; + +/* Callers of this function should make sure idx is bounded appropriately */ +ASSERT(idx < vioapic->nr_pins); + +/* Make sure no out-of-bound value for idx can be used */ +idx = array_index_nospec(idx, vioapic->nr_pins); + +gsi = vioapic->base_gsi + idx; spin_lock(&d->arch.hvm.irq_lock); @@ -467,7 +483,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin) void vioapic_irq_positive_edge(struct domain *d, unsigned int irq) { -unsigned int pin; +unsigned int pin = 0; /* See gsi_vioapic */ struct hvm_vioapic *vioapic = gsi_vioapic(d, irq, &pin); union vioapic_redir_entry *ent; @@ -542,7 +558,7 @@ void vioapic_update_EOI(struct domain *d, u8 vector) int vioapic_get_mask(const struct domain *d, unsigned int gsi) { -unsigned int pin; +unsigned int pin = 0; /* See gsi_vioapic */ const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin); if ( !vioapic ) @@ -553,7 +569,7 @@ int vioapic_get_mask(const struct domain *d, unsigned int gsi) int vioapic_get_vector(const struct domain *d, unsigned int gsi) { -unsigned int pin; +unsigned int pin = 0; /* See gsi_vioapic */ const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin); if ( !vioapic ) @@ -564,7 +580,7 @@ int vioapic_get_vector(const struct domain *d, unsigned int gsi) int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi) { -unsigned int pin; +unsigned int pin = 0; /* See gsi_vioapic */ const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin); if ( !vioapic ) -- 2.7.4
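The OPTIMIZER_HIDE_VAR() used above is, in its typical form found in the compiler headers (a sketch; the exact constraint may differ), an empty asm statement that makes the variable's value opaque to the optimizer:

/* "+g" marks var as both read and written by the (empty) asm, so the
 * compiler must materialize the caller's pin = 0 initialization and
 * cannot delete it as a dead store. */
#define OPTIMIZER_HIDE_VAR(var) __asm__ ( "" : "+g" (var) )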
[Xen-devel] [PATCH SpectreV1+L1TF v6 7/9] is_hvm/pv_domain: block speculation
When checking whether a domain is an HVM or a PV domain, we have to make sure that speculation cannot bypass that check and eventually access data that should not end up in the cache for the current domain type. Signed-off-by: Norbert Manthey --- xen/include/xen/sched.h | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -922,7 +922,8 @@ void watchdog_domain_destroy(struct domain *d); static inline bool is_pv_domain(const struct domain *d) { -return IS_ENABLED(CONFIG_PV) ? d->guest_type == guest_type_pv : false; +return IS_ENABLED(CONFIG_PV) + ? evaluate_nospec(d->guest_type == guest_type_pv) : false; } static inline bool is_pv_vcpu(const struct vcpu *v) @@ -953,7 +954,8 @@ static inline bool is_pv_64bit_vcpu(const struct vcpu *v) #endif static inline bool is_hvm_domain(const struct domain *d) { -return IS_ENABLED(CONFIG_HVM) ? d->guest_type == guest_type_hvm : false; +return IS_ENABLED(CONFIG_HVM) + ? evaluate_nospec(d->guest_type == guest_type_hvm) : false; } static inline bool is_hvm_vcpu(const struct vcpu *v) -- 2.7.4
[Xen-devel] [PATCH SpectreV1+L1TF v6 8/9] common/grant_table: block speculative out-of-bound accesses
Guests can issue grant table operations and provide guest controlled data to them. This data is also used for memory loads. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro where applicable. However, there are also memory accesses that cannot be protected by a single array protection, or multiple accesses in a row. To protect these, a nospec barrier is placed between the actual range check and the access via the block_speculation macro. As different versions of grant tables use structures of different size, and the status is encoded in an array for version 2, speculative execution might touch zero-initialized structures of version 2 while the table is actually using version 1. As PV guests can have control over their NULL page, these accesses are prevented by protecting the grant table version evaluation. This commit is part of the SpectreV1+L1TF mitigation patch series. Signed-off-by: Norbert Manthey --- Notes: v6: Explain version 1 vs version 2 case in commit message Protect grant table version checks Use block_speculation in map_grant_ref instead of updating op->ref Move evaluate_nospec closer to the okay variable in gnttab_transfer xen/common/grant_table.c | 48 1 file changed, 36 insertions(+), 12 deletions(-) diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c --- a/xen/common/grant_table.c +++ b/xen/common/grant_table.c @@ -37,6 +37,7 @@ #include #include #include +#include #include #include @@ -203,8 +204,9 @@ static inline unsigned int nr_status_frames(const struct grant_table *gt) } #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) -#define maptrack_entry(t, e) \ -((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) +#define maptrack_entry(t, e) \ +((t)->maptrack[array_index_nospec(e, (t)->maptrack_limit) \ + /MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) static inline unsigned int nr_maptrack_frames(struct grant_table *t) @@ -963,9 +965,13 @@ map_grant_ref( PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", op->ref, rgt->domain->domain_id); +/* Make sure the above check is not bypassed speculatively */ +block_speculation(); + act = active_entry_acquire(rgt, op->ref); shah = shared_entry_header(rgt, op->ref); -status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref); +status = evaluate_nospec(rgt->gt_version == 1) ? &shah->flags + : &status_entry(rgt, op->ref); /* If already pinned, check the active domid and avoid refcnt overflow. */ if ( act->pin && @@ -987,7 +993,7 @@ map_grant_ref( if ( !act->pin ) { -unsigned long gfn = rgt->gt_version == 1 ? +unsigned long gfn = evaluate_nospec(rgt->gt_version == 1) ? shared_entry_v1(rgt, op->ref).frame : shared_entry_v2(rgt, op->ref).full_page.frame; @@ -1321,7 +1327,8 @@ unmap_common( goto unlock_out; } -act = active_entry_acquire(rgt, op->ref); +act = active_entry_acquire(rgt, array_index_nospec(op->ref, + nr_grant_entries(rgt))); /* * Note that we (ab)use the active entry lock here to protect against @@ -1418,7 +1425,7 @@ unmap_common_complete(struct gnttab_unmap_common *op) struct page_info *pg; uint16_t *status; -if ( !op->done ) +if ( evaluate_nospec(!op->done) ) { /* unmap_common() didn't do anything - nothing to complete. 
*/ return; @@ -2026,6 +2033,9 @@ gnttab_prepare_for_transfer( goto fail; } +/* Make sure the above check is not bypassed speculatively */ +ref = array_index_nospec(ref, nr_grant_entries(rgt)); + sha = shared_entry_header(rgt, ref); scombo.word = *(u32 *)&sha->flags; @@ -2223,7 +2233,11 @@ gnttab_transfer( okay = gnttab_prepare_for_transfer(e, d, gop.ref); spin_lock(&e->page_alloc_lock); -if ( unlikely(!okay) || unlikely(e->is_dying) ) +/* + * Make sure the reference bound check in gnttab_prepare_for_transfer + * is respected and speculative execution is blocked accordingly + */ +if ( unlikely(!evaluate_nospec(okay)) || unlikely(e->is_dying) ) { bool_t drop_dom_ref = !domain_adjust_tot_pages(e, -1); @@ -2253,7 +2267,7 @@ gnttab_transfer( grant_read_lock(e->grant_table); act = active_entry_acquire(e->grant_table, gop.ref); -if ( e->grant_table->gt_version == 1 ) +
[Xen-devel] [PATCH SpectreV1+L1TF v6 5/9] nospec: introduce evaluate_nospec
Since the L1TF vulnerability of Intel CPUs, loading hypervisor data into L1 cache is problematic, because when hyperthreading is used as well, a guest running on the sibling core can leak this potentially secret data. To prevent these speculative accesses, we block speculation after accessing the domain property field by adding lfence instructions. This way, the CPU continues executing and loading data only once the condition is actually evaluated. As the macros are typically used in if statements, the lfence has to come in a compatible way. Therefore, a function that returns true after an lfence instruction is introduced. To protect both branches after a conditional, an lfence instruction has to be added for the two branches. To be able to block speculation after several evaluations, the generic barrier macro block_speculation is also introduced. As the L1TF vulnerability is only present on the x86 architecture, there is no need to add protection for other architectures. Hence, the introduced macros are defined but empty. On the x86 architecture, by default, the lfence instruction is not present either. Only when an L1TF vulnerable platform is detected, the lfence instruction is patched in via alternative patching. Similarly, PV guests are protected wrt L1TF by default, so that the protection is furthermore disabled in case HVM is excluded via the build configuration. Introducing the lfence instructions catches a lot of potential leaks with a simple unintrusive code change. During performance testing, we did not notice performance effects. Signed-off-by: Norbert Manthey --- Notes: v6: Introduce asm nospec.h files Check CONFIG_HVM consistently Extend commit message to explain CONFIG_HVM and new files Fix typos in commit message xen/include/asm-arm/nospec.h | 20 xen/include/asm-x86/nospec.h | 39 +++ xen/include/xen/nospec.h | 1 + 3 files changed, 60 insertions(+) create mode 100644 xen/include/asm-arm/nospec.h create mode 100644 xen/include/asm-x86/nospec.h diff --git a/xen/include/asm-arm/nospec.h b/xen/include/asm-arm/nospec.h new file mode 100644 --- /dev/null +++ b/xen/include/asm-arm/nospec.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. */ + +#ifndef _ASM_ARM_NOSPEC_H +#define _ASM_ARM_NOSPEC_H + +#define evaluate_nospec(condition) (condition) + +#define block_speculation() + +#endif /* _ASM_ARM_NOSPEC_H */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/asm-x86/nospec.h b/xen/include/asm-x86/nospec.h new file mode 100644 --- /dev/null +++ b/xen/include/asm-x86/nospec.h @@ -0,0 +1,39 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. */ + +#ifndef _ASM_X86_NOSPEC_H +#define _ASM_X86_NOSPEC_H + +#include +#include + +/* Allow to insert a read memory barrier into conditionals */ +static always_inline bool arch_barrier_nospec_true(void) +{ +#if defined(CONFIG_HVM) +alternative("", "lfence", X86_FEATURE_SC_L1TF_VULN); +#endif +return true; +} + +/* Allow to protect evaluation of conditionals with respect to speculation */ +#if defined(CONFIG_HVM) +#define evaluate_nospec(condition) \ +((condition) ?
arch_barrier_nospec_true() : !arch_barrier_nospec_true()) +#else +#define evaluate_nospec(condition) (condition) +#endif + +/* Allow to block speculative execution in generic code */ +#define block_speculation() (void)arch_barrier_nospec_true() + +#endif /* _ASM_X86_NOSPEC_H */ + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h --- a/xen/include/xen/nospec.h +++ b/xen/include/xen/nospec.h @@ -8,6 +8,7 @@ #define XEN_NOSPEC_H #include +#include /** * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise -- 2.7.4
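A usage sketch tying the two macros together (hypothetical caller and data structure, with a guest-controlled index):

struct entry { unsigned int nr_sub; int *sub; };
struct table { unsigned int nr_entries; struct entry *entries; };

/* Hypothetical lookup demonstrating the two protection styles introduced
 * here: evaluate_nospec() for a single guarded conditional, and
 * block_speculation() when several checks must all be respected before
 * any dependent load. */
static int lookup(const struct table *t, unsigned int idx,
                  unsigned int subidx)
{
    /* Single check: the lfence sits between check and dependent access. */
    if ( !evaluate_nospec(idx < t->nr_entries) )
        return -EINVAL;

    if ( subidx >= t->entries[idx].nr_sub )
        return -EINVAL;

    /* Two checks done; a single barrier covers both accesses below. */
    block_speculation();

    return t->entries[idx].sub[subidx];
}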
[Xen-devel] [PATCH SpectreV1+L1TF v6 4/9] spec: add l1tf-barrier
To control the runtime behavior on L1TF vulnerable platforms better, the command line option l1tf-barrier is introduced. This option controls whether on vulnerable x86 platforms the lfence instruction is used to prevent speculative execution from bypassing the evaluation of conditionals that are protected with the evaluate_nospec macro. By now, Xen is capable of identifying L1TF vulnerable hardware. However, this information cannot be used for alternative patching, as a CPU feature is required. To control alternative patching with the command line option, a new x86 feature "X86_FEATURE_SC_L1TF_VULN" is introduced. This feature is used to patch the lfence instruction into the arch_barrier_nospec_true function. The feature is enabled only if L1TF vulnerable hardware is detected and the command line option does not prevent using this feature. The status of hyperthreading is not considered when automatically enabling the addition of the lfence instruction, because platforms without hyperthreading can still be vulnerable to L1TF in case the L1 cache is not flushed properly. Signed-off-by: Norbert Manthey --- Notes: v6: Move disabling l1tf-barrier into spec-ctrl=no Use gap in existing flags Force barrier based on commandline, independently of L1TF detection docs/misc/xen-command-line.pandoc | 14 ++ xen/arch/x86/spec_ctrl.c | 17 +++-- xen/include/asm-x86/cpufeatures.h | 1 + xen/include/asm-x86/spec_ctrl.h | 1 + 4 files changed, 27 insertions(+), 6 deletions(-) diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -483,9 +483,9 @@ accounting for hardware capabilities as enumerated via CPUID. Currently accepted: -The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`, -`l1d-flush` and `ssbd` are used by default if available and applicable. They can -be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and +The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`, `l1d-flush`, +`l1tf-barrier` and `ssbd` are used by default if available and applicable. They +can be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and won't offer them to guests. ### cpuid_mask_cpu @@ -1896,7 +1896,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`). ### spec-ctrl (x86) > `= List of [ , xen=, {pv,hvm,msr-sc,rsb}=, > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu, -> l1d-flush}= ]` +> l1d-flush,l1tf-barrier}= ]` Controls for speculative execution sidechannel mitigations. By default, Xen will pick the most appropriate mitigations based on compiled in support, @@ -1962,6 +1962,12 @@ Irrespective of Xen's setting, the feature is virtualised for HVM guests to use. By default, Xen will enable this mitigation on hardware believed to be vulnerable to L1TF. +On hardware vulnerable to L1TF, the `l1tf-barrier=` option can be used to force +Xen to protect evaluations inside the hypervisor with a barrier instruction, or +to prevent it from doing so, keeping potentially secret information out of the +L1 cache. By default, Xen will enable this mitigation on hardware believed to be +vulnerable to L1TF.
+ ### sync_console > `= ` diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c --- a/xen/arch/x86/spec_ctrl.c +++ b/xen/arch/x86/spec_ctrl.c @@ -21,6 +21,7 @@ #include #include +#include #include #include #include @@ -50,6 +51,7 @@ bool __read_mostly opt_ibpb = true; bool __read_mostly opt_ssbd = false; int8_t __read_mostly opt_eager_fpu = -1; int8_t __read_mostly opt_l1d_flush = -1; +int8_t __read_mostly opt_l1tf_barrier = -1; bool __initdata bsp_delay_spec_ctrl; uint8_t __read_mostly default_xen_spec_ctrl; @@ -91,6 +93,8 @@ static int __init parse_spec_ctrl(const char *s) if ( opt_pv_l1tf_domu < 0 ) opt_pv_l1tf_domu = 0; +opt_l1tf_barrier = 0; + disable_common: opt_rsb_pv = false; opt_rsb_hvm = false; @@ -157,6 +161,8 @@ static int __init parse_spec_ctrl(const char *s) opt_eager_fpu = val; else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 ) opt_l1d_flush = val; +else if ( (val = parse_boolean("l1tf-barrier", s, ss)) >= 0 ) +opt_l1tf_barrier = val; else rc = -EINVAL; @@ -248,7 +254,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps) "\n"); /* Settings for Xen's protection, irrespective of guests. */ -printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s\n", +printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s, Other:%s%s%s\n", thunk == THUNK_NONE ? "N/A"
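To make the wiring concrete, here is a sketch of how the command line value could translate into the synthetic feature that alternative patching keys on; the cpu_has_bug_l1tf predicate is an assumption for illustration, while setup_force_cpu_cap and X86_FEATURE_SC_L1TF_VULN are the names used by the series:

    /* Sketch: derive the default from hardware detection, then force the
     * synthetic feature so that "lfence" gets patched into the barrier. */
    if ( opt_l1tf_barrier == -1 )             /* no explicit choice given */
        opt_l1tf_barrier = cpu_has_bug_l1tf;  /* assumed detection result */

    if ( opt_l1tf_barrier > 0 )
        setup_force_cpu_cap(X86_FEATURE_SC_L1TF_VULN);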
[Xen-devel] [PATCH SpectreV1+L1TF v6 6/9] is_control_domain: block speculation
Checks of domain properties, such as is_hardware_domain or is_hvm_domain, might be bypassed by speculatively executing these instructions. A reason for bypassing these checks is that these macros access the domain structure via a pointer, and check a certain field. Since this memory access is slow, the CPU predicts the returned value and continues execution. In case an is_control_domain check is bypassed, for example during a hypercall, data that should only be accessible by the control domain could be loaded into the cache. Signed-off-by: Norbert Manthey --- Notes: v6: Drop nospec.h include xen/include/xen/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -913,10 +913,10 @@ void watchdog_domain_destroy(struct domain *d); *(that is, this would not be suitable for a driver domain) * - There is never a reason to deny the hardware domain access to this */ -#define is_hardware_domain(_d) ((_d) == hardware_domain) +#define is_hardware_domain(_d) evaluate_nospec((_d) == hardware_domain) /* This check is for functionality specific to a control domain */ -#define is_control_domain(_d) ((_d)->is_privileged) +#define is_control_domain(_d) evaluate_nospec((_d)->is_privileged) #define VM_ASSIST(d, t) (test_bit(VMASST_TYPE_ ## t, &(d)->vm_assist)) -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
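As an illustration of the effect, any existing privilege check now doubles as a speculation barrier; the guarded body below is hypothetical:

    /*
     * Before the change, the body could be entered speculatively while the
     * slow load of d->is_privileged was still in flight. Afterwards,
     * is_control_domain expands to evaluate_nospec(...), closing that window.
     */
    if ( !is_control_domain(currd) )
        return -EPERM;

    /* Hypothetical privileged work that must not run under misspeculation. */
    access_privileged_state(currd);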
[Xen-devel] [PATCH SpectreV1+L1TF v6 9/9] common/memory: block speculative out-of-bound accesses
The get_page_from_gfn method returns a pointer to a page that belongs to a gfn. Before returning the pointer, the gfn is checked for being valid. Under speculation, these checks can be bypassed, so that the function get_page is still executed partially. Consequently, the function page_get_owner_and_reference might be executed partially as well. In this function, the computed pointer is accessed, resulting in a speculative out-of-bound address load. As the gfn can be controlled by a guest, this access is problematic. To mitigate the root cause, an lfence instruction is added via the evaluate_nospec macro. To make the protection generic, we do not introduce the lfence instruction for this single check, but add it to the mfn_valid function. This way, other potentially problematic accesses are protected as well. This commit is part of the SpectreV1+L1TF mitigation patch series. Signed-off-by: Norbert Manthey --- Notes: v6: Add array_index_nospec to test_bit call xen/common/pdx.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/xen/common/pdx.c b/xen/common/pdx.c --- a/xen/common/pdx.c +++ b/xen/common/pdx.c @@ -18,6 +18,7 @@ #include #include #include +#include <xen/nospec.h> /* Parameters for PFN/MADDR compression. */ unsigned long __read_mostly max_pdx; @@ -33,10 +34,11 @@ unsigned long __read_mostly pdx_group_valid[BITS_TO_LONGS( bool __mfn_valid(unsigned long mfn) { -return likely(mfn < max_page) && - likely(!(mfn & pfn_hole_mask)) && - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, - pdx_group_valid)); +return evaluate_nospec( +likely(mfn < max_page) && +likely(!(mfn & pfn_hole_mask)) && +likely(test_bit(pfn_to_pdx(array_index_nospec(mfn, max_page)) + / PDX_GROUP_COUNT, pdx_group_valid))); } /* Sets all bits from the most-significant 1-bit down to the LSB */ -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
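The interplay of the two primitives used in __mfn_valid can be summarized in a short sketch: array_index_nospec clamps the index through a data dependency rather than a branch, so the test_bit access stays in bounds even while the surrounding evaluate_nospec outcome is still being predicted. The helper below is illustrative only:

    /* Simplified sketch of the clamping idea behind array_index_nospec(). */
    static unsigned long nospec_clamp(unsigned long idx, unsigned long size)
    {
        /* ~0 when idx < size, 0 otherwise, per the comment in xen/nospec.h. */
        unsigned long mask = array_index_mask_nospec(idx, size);

        return idx & mask;   /* out-of-range indexes collapse to 0 */
    }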
[Xen-devel] [PATCH XTF perf 1/4] categories: add perf
As XTF allows writing tests that interact with the hypervisor, we would like to use this capability to implement micro benchmarks, so that we can measure the performance impact of modifications to the hypervisor. Signed-off-by: Norbert Manthey --- build/common.mk | 2 +- xtf-runner | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/build/common.mk b/build/common.mk --- a/build/common.mk +++ b/build/common.mk @@ -1,4 +1,4 @@ -ALL_CATEGORIES := special functional xsa utility in-development +ALL_CATEGORIES := special functional xsa utility in-development perf ALL_ENVIRONMENTS := pv64 pv32pae hvm64 hvm32pae hvm32pse hvm32 diff --git a/xtf-runner b/xtf-runner --- a/xtf-runner +++ b/xtf-runner @@ -37,7 +37,7 @@ def exit_code(state): # All test categories default_categories = set(("functional", "xsa")) -non_default_categories = set(("special", "utility", "in-development")) +non_default_categories = set(("special", "utility", "in-development", "perf")) all_categories = default_categories | non_default_categories # All test environments -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] XTF perf - Micro Benchmarks
Dear all, I added a perf category to XTF, and added functions to measure time in the guest. Finally, I added a first micro benchmark that measures the time to call a specified hypercall, and prints the average time the hypercall takes. The added category should be useful for implementing further micro benchmarks. Best, Norbert Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH XTF perf 4/4] perf: measure MMUEXT_MARK_SUPER test
A first simple test is to call a hypercall in a tight loop. To measure implementation aspects of the hypervisor, we picked a hypercall that is not implemented and hence results in a no-op, namely the hypercall mmuext_op with the command MMUEXT_MARK_SUPER. Signed-off-by: Norbert Manthey --- tests/perf-PV-MMUEXT_MARK_SUPER-noop/Makefile | 9 tests/perf-PV-MMUEXT_MARK_SUPER-noop/main.c | 66 +++ 2 files changed, 75 insertions(+) create mode 100644 tests/perf-PV-MMUEXT_MARK_SUPER-noop/Makefile create mode 100644 tests/perf-PV-MMUEXT_MARK_SUPER-noop/main.c diff --git a/tests/perf-PV-MMUEXT_MARK_SUPER-noop/Makefile b/tests/perf-PV-MMUEXT_MARK_SUPER-noop/Makefile new file mode 100644 --- /dev/null +++ b/tests/perf-PV-MMUEXT_MARK_SUPER-noop/Makefile @@ -0,0 +1,9 @@ +include $(ROOT)/build/common.mk + +NAME := perf-PV-MMUEXT_MARK_SUPER-noop +CATEGORY := perf +TEST-ENVS := pv64 + +obj-perenv += main.o + +include $(ROOT)/build/gen.mk diff --git a/tests/perf-PV-MMUEXT_MARK_SUPER-noop/main.c b/tests/perf-PV-MMUEXT_MARK_SUPER-noop/main.c new file mode 100644 --- /dev/null +++ b/tests/perf-PV-MMUEXT_MARK_SUPER-noop/main.c @@ -0,0 +1,66 @@ +/** + * @file tests/perf-PV-MMUEXT_MARK_SUPER-noop/main.c + * @ref test-perf-PV-MMUEXT_MARK_SUPER-noop + * + * @page perf-PV-MMUEXT_MARK_SUPER-noop + * + * This test runs the hypercall mmuext_op with the command MMUEXT_MARK_SUPER in + * a tight loop, and measures how much time all iterations take. Finally, the + * test prints this time. + * + * Since this is a performance test, the actual value is furthermore printed + * using the predefined pattern on a separate line. The reported value + * represents the time it takes to run the mmuext_op hypercall in nanoseconds. + * The average is calculated by using 5 calls. + * + * perf + * + * @see tests/perf-PV-MMUEXT_MARK_SUPER-noop/main.c + */ + +#define MEASUREMENT_RETRIES 5 + +#include +#include + +const char test_title[] = "Test MMUEXT_MARK_SUPER"; + +/* Use a global struct to avoid local variables in call_MMUEXT_MARK_SUPER */ +mmuext_op_t op = +{ +.cmd = MMUEXT_MARK_SUPER, +}; + +/* Schedule a no-op hypercall */ +int call_MMUEXT_MARK_SUPER(void) +{ +return hypercall_mmuext_op(&op, 1, NULL, DOMID_SELF); +} + +void test_main(void) +{ +int rc = 0; + +/* Check that the hypercall is not implemented, as expected */ +rc = hypercall_mmuext_op(&op, 1, NULL, DOMID_SELF); +if ( rc != -EOPNOTSUPP ) +return xtf_error("Unexpected MMUEXT_MARK_SUPER, rc %d\n", rc); + +/* Measure and print to screen how long calling this hypercall takes */ +measure_performance(test_title, +"mmuext_op(MMUEXT_MARK_SUPER, ...)", +MEASUREMENT_RETRIES, +call_MMUEXT_MARK_SUPER); + +return xtf_success("Success: performed MMUEXT_MARK_SUPER hypercall with expected result\n"); +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH XTF perf 2/4] time: add stubs
To measure how long a certain interaction takes, we need time primitives. This commit introduces these primitives, so that future tests can use the gettimeofday function to retrieve the current time. Signed-off-by: Paul Semel Signed-off-by: Norbert Manthey --- build/files.mk | 1 + common/time.c | 203 + include/xtf/time.h | 66 + 3 files changed, 270 insertions(+) create mode 100644 common/time.c create mode 100644 include/xtf/time.h diff --git a/build/files.mk b/build/files.mk --- a/build/files.mk +++ b/build/files.mk @@ -16,6 +16,7 @@ obj-perarch += $(ROOT)/common/libc/vsnprintf.o obj-perarch += $(ROOT)/common/report.o obj-perarch += $(ROOT)/common/setup.o obj-perarch += $(ROOT)/common/xenbus.o +obj-perarch += $(ROOT)/common/time.o obj-perenv += $(ROOT)/arch/x86/decode.o obj-perenv += $(ROOT)/arch/x86/desc.o diff --git a/common/time.c b/common/time.c new file mode 100644 --- /dev/null +++ b/common/time.c @@ -0,0 +1,203 @@ +#include +#include +#include +#include + +/* This function was taken from mini-os source code [tag xen-RELEASE-4.11.1] + + * (C) 2003 - Rolf Neugebauer - Intel Research Cambridge + * (C) 2002-2003 - Keir Fraser - University of Cambridge + * (C) 2005 - Grzegorz Milos - Intel Research Cambridge + * (C) 2006 - Robert Kaiser - FH Wiesbaden + + * + *File: time.c + * Author: Rolf Neugebauer and Keir Fraser + * Changes: Grzegorz Milos + * + * Description: Simple time and timer functions + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. 
+ */ +/* It returns ((delta << shift) * mul_frac) >> 32 */ +static inline uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift) +{ +uint64_t product; +#ifdef __i386__ +uint32_t tmp1, tmp2; +#endif + +if ( shift < 0 ) +delta >>= -shift; +else +delta <<= shift; + +#ifdef __i386__ +__asm__ ( +"mul %5 ; " +"mov %4,%%eax ; " +"mov %%edx,%4 ; " +"mul %5 ; " +"add %4,%%eax ; " +"xor %5,%5; " +"adc %5,%%edx ; " +: "=A" (product), "=r" (tmp1), "=r" (tmp2) +: "a" ((uint32_t)delta), "1" ((uint32_t)(delta >> 32)), "2" (mul_frac) ); +#else +__asm__ ( +"mul %%rdx ; shrd $32,%%rdx,%%rax" +: "=a" (product) : "0" (delta), "d" ((uint64_t)mul_frac) ); +#endif + +return product; +} + + +#if defined(__i386__) +uint32_t since_boot_time(void) +#else +uint64_t since_boot_time(void) +#endif +{ +unsigned long old_tsc, tsc; +#if defined(__i386__) +uint32_t system_time; +#else +uint64_t system_time; +#endif +uint32_t ver1, ver2; + +do { +do { +ver1 = shared_info.vcpu_info[0].time.version; +smp_rmb(); +} while ( (ver1 & 1) == 1 ); + +system_time = shared_info.vcpu_info[0].time.system_time; +old_tsc = shared_info.vcpu_info[0].time.tsc_timestamp; +smp_rmb(); +tsc = rdtscp(); +ver2 = ACCESS_ONCE(shared_info.vcpu_info[0].time.version); +smp_rmb(); +} while ( ver1 != ver2 ); + +system_time += scale_delta(tsc - old_tsc, + shared_info.vcpu_info[0].time.tsc_to_system_mul, + shared_info.vcpu_info[0].time.tsc_shift); + +return system_time; +} + +/* This function returns the epoch time (number of seconds elapsed + * since January 1, 1970) */ +#if defined(__i386__) +uint32_t current_time(void) +#else +
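A small usage sketch of the arithmetic above (illustrative, not part of the patch): scale_delta converts a TSC delta into nanoseconds using the tsc_to_system_mul/tsc_shift pair that Xen publishes in the shared info page, exactly as since_boot_time does:

    /* ns = (((tsc_now - tsc_then) << tsc_shift) * tsc_to_system_mul) >> 32 */
    uint64_t tsc_then = rdtscp();
    /* ... code under measurement ... */
    uint64_t ns = scale_delta(rdtscp() - tsc_then,
                              shared_info.vcpu_info[0].time.tsc_to_system_mul,
                              shared_info.vcpu_info[0].time.tsc_shift);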
[Xen-devel] [PATCH XTF perf 3/4] time: provide measurement template
The added function measure_performance measures the run time of a function by computing the average time it takes to call that function a given number of times. The measured total time is returned in nanoseconds. Furthermore, the value is printed via printk in a fixed format, to allow processing the output further. This format is, where <average-time> provides ns with ps granularity: perf <test_name> <average-time> ns Signed-off-by: Norbert Manthey --- common/time.c | 36 include/xtf/time.h | 8 +++- 2 files changed, 43 insertions(+), 1 deletion(-) diff --git a/common/time.c b/common/time.c --- a/common/time.c +++ b/common/time.c @@ -192,6 +192,42 @@ void msleep(uint64_t t) mspin_sleep(t); } +long measure_performance(const char* test_name, const char* function_name, + unsigned long retries, to_be_measured call) +{ +struct timeval start, end; +int rc = 0; + +printk("Start calling %s %lu times\n", function_name, retries); + +/* Perform all calls, measure start and end time */ +gettimeofday(&start); +for ( unsigned long counter = 0; counter < retries; ++counter ) +{ +rc = call(); +} +gettimeofday(&end); + +/* Calculate the total time in nanoseconds */ +long total_ns = (end.sec - start.sec) * 1000000000 + (end.nsec - start.nsec); +long avg_ns = total_ns / retries; +long avg_ps = ((total_ns % retries) * 1000) / retries; + +/* Show the result of the last query */ +printk("%s last result: %d\n", function_name, rc); + +/* Print average time and total time */ +printk("Avg %s call time: avg: %ld.%s%ld ns total: %ld ns\n", + function_name, avg_ns, + avg_ps < 10 ? "00" : (avg_ps < 100 ? "0" : ""), avg_ps, total_ns); + +/* Print performance value */ +printk("perf %s %ld.%s%ld ns\n", test_name, avg_ns, + avg_ps < 10 ? "00" : (avg_ps < 100 ? "0" : ""), avg_ps); + +return total_ns; +} + /* * Local variables: * mode: C diff --git a/include/xtf/time.h b/include/xtf/time.h --- a/include/xtf/time.h +++ b/include/xtf/time.h @@ -49,10 +49,16 @@ void msleep(uint64_t f); int gettimeofday(struct timeval *tp); - /* This returns the current epoch time */ #define NOW() current_time() +/* Signature of a function to be called for measurement */ +typedef int (*to_be_measured)(void); + +/* Measure the time it takes to call the passed function RETRIES times */ +long measure_performance(const char* test_name, const char* function_name, + unsigned long retries, to_be_measured call); + #endif /* XTF_TIME_H */ /* -- 2.7.4 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
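A usage sketch (with a hypothetical test function): any argument-less function returning int can be timed, and the final printk line is the one downstream tooling would parse:

    static int call_nop(void)
    {
        return 0;   /* stand-in for a real hypercall wrapper */
    }

    void test_main(void)
    {
        /* Prints, among others, a line such as: perf Example test 12.345 ns */
        measure_performance("Example test", "call_nop", 1000, call_nop);
    }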
Re: [Xen-devel] [PATCH Makefile v2] asm: handle comments when creating header file
On 2/6/19 16:10, Jan Beulich wrote: >>>> On 06.02.19 at 15:09, wrote: >> From: Norbert Manthey >> >> In the early steps of compilation, the asm header files are created, such >> as include/asm-$(TARGET_ARCH)/asm-offsets.h. These files depend on the >> assembly file arch/$(TARGET_ARCH)/asm-offsets.s, which is generated >> beforehand. Depending on the toolchain used, there might be comments in the >> assembly files. Especially the goto-gcc compiler of the bounded model >> checker CBMC adds comments that start with a '#' symbol at the beginning >> of the line. >> >> This commit adds handling of comments in assembly files during the creation >> of the asm header files, especially ignoring lines that start with '#', which >> indicate comments for both the ARM and x86 assemblers. The goto-as tool >> produces exactly this kind of comment. >> >> Signed-off-by: Norbert Manthey >> Signed-off-by: Michael Tautschnig > Reviewed-by: Jan Beulich > Jürgen, is there a chance to get this patch into the 4.12 release? It would be nice to be able to compile upstream Xen with the tool chain for the CBMC model checker (i.e. the goto-gcc compiler), as that tool chain allows further reasoning to be applied. Thanks! Best, Norbert Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 3/9] x86/hvm: block speculative out-of-bound accesses
On 2/12/19 14:25, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> @@ -3453,7 +3456,8 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t >> *msr_content) >> if ( (index / 2) >= >> MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, MTRRcap_VCNT) ) >> goto gp_fault; >> -*msr_content = var_range_base[index]; >> +*msr_content = var_range_base[array_index_nospec(index, >> +2*MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap, >> MTRRcap_VCNT))]; > Missing blanks around *. This alone would be easy to adjust while > committing, but there's still the only partially discussed question > regarding ... > >> @@ -4104,6 +4108,12 @@ static int hvmop_set_param( >> if ( a.index >= HVM_NR_PARAMS ) >> return -EINVAL; >> >> +/* >> + * Make sure the guest controlled value a.index is bounded even during >> + * speculative execution. >> + */ >> +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); >> + >> d = rcu_lock_domain_by_any_id(a.domid); >> if ( d == NULL ) >> return -ESRCH; >> @@ -4370,6 +4380,12 @@ static int hvmop_get_param( >> if ( a.index >= HVM_NR_PARAMS ) >> return -EINVAL; >> >> +/* >> + * Make sure the guest controlled value a.index is bounded even during >> + * speculative execution. >> + */ >> +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); > ... the usefulness of these two. To make forward progress it may > be worthwhile to split off these two changes into a separate patch. > If you're fine with this, I could strip these two before committing, > in which case the remaining change is > Reviewed-by: Jan Beulich Taking apart the commit is fine with me. I will submit a follow up change that does not update the values but fixes the reads. Best, Norbert Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
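For reference, the MTRR hunk with the style issue resolved would plausibly read as follows; this is only a sketch of the version to be committed, with blanks around the multiplication:

    *msr_content = var_range_base[array_index_nospec(index,
                                      2 * MASK_EXTR(v->arch.hvm.mtrr.mtrr_cap,
                                                    MTRRcap_VCNT))];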
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 1/9] xen/evtchn: block speculative out-of-bound accesses
On 2/12/19 14:08, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> @@ -813,6 +817,13 @@ int set_global_virq_handler(struct domain *d, uint32_t >> virq) >> >> if (virq >= NR_VIRQS) >> return -EINVAL; >> + >> + /* >> +* Make sure the guest controlled value virq is bounded even during >> +* speculative execution. >> +*/ >> +virq = array_index_nospec(virq, ARRAY_SIZE(global_virq_handlers)); >> + >> if (!virq_is_global(virq)) >> return -EINVAL; > Didn't we agree earlier on that this addition is pointless, as the only > caller is the XEN_DOMCTL_set_virq_handler handler, and most > domctl-s (including this one) are excluded from security considerations > due to XSA-77? I do not recall such a comment, but agree that this hunk can be dropped. > >> @@ -955,22 +967,22 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int >> vcpu_id) >> { >> case ECS_VIRQ: >> if ( virq_is_global(chn->u.virq) ) >> -chn->notify_vcpu_id = vcpu_id; >> +chn->notify_vcpu_id = v->vcpu_id; >> else >> rc = -EINVAL; >> break; >> case ECS_UNBOUND: >> case ECS_INTERDOMAIN: >> -chn->notify_vcpu_id = vcpu_id; >> +chn->notify_vcpu_id = v->vcpu_id; >> break; >> case ECS_PIRQ: >> -if ( chn->notify_vcpu_id == vcpu_id ) >> +if ( chn->notify_vcpu_id == v->vcpu_id ) >> break; >> unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]); >> -chn->notify_vcpu_id = vcpu_id; >> +chn->notify_vcpu_id = v->vcpu_id; > Right now we understand why all of these changes are done, but > without a comment this is liable to be converted back as an > optimization down the road. I will extend the commit message accordingly. Best, Norbert > > Everything else here looks fine to me now. > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
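The vcpu_id to v->vcpu_id rewrite discussed above follows a pattern worth spelling out: the guest-supplied index is used exactly once, for a validated lookup, and only the field of the validated object is used afterwards. A sketch, assuming the domain_vcpu() helper mentioned elsewhere in this series performs the hardened lookup:

    struct vcpu *v;

    /* Single, bounds-checked use of the raw guest input ... */
    if ( (v = domain_vcpu(d, vcpu_id)) == NULL )
        return -ENOENT;

    /*
     * ... and from here on only the validated object is consulted, so the
     * compiler cannot reintroduce a use of the unclamped vcpu_id.
     */
    chn->notify_vcpu_id = v->vcpu_id;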
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 2/9] x86/vioapic: block speculative out-of-bound accesses
On 2/12/19 14:16, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> When interacting with io apic, a guest can specify values that are used >> as index to structures, and whose values are not compared against >> upper bounds to prevent speculative out-of-bound accesses. This change >> prevents these speculative accesses. >> >> Furthermore, variables are initialized and the compiler is asked to not >> optimized these initializations, as the uninitialized, potentially guest >> controlled, variables might be used in a speculative out-of-bound access. > Uninitialized variables can't be guest controlled, not even potentially. > What we want to avoid here is speculation with uninitialized values > (or really stale data still on the stack from use by other code), > regardless of direct guest control. I will drop the part "potentially guest controlled". > >> Out of the four initialized variables, two are potentially problematic, >> namely ones in the functions vioapic_irq_positive_edge and >> vioapic_get_trigger_mode. >> >> As the two problematic variables are both used in the common function >> gsi_vioapic, the mitigation is implemented there. As the access pattern >> of the currently non-guest-controlled functions might change in the >> future as well, the other variables are initialized as well. >> >> This commit is part of the SpectreV1+L1TF mitigation patch series. > Oh, I didn't pay attention in patch 1: You had meant to change this > wording to something including "speculative hardening" (throughout > the series). That slipped through as I did not add that right after the discussion. I added this to the whole series now. > >> @@ -212,7 +220,15 @@ static void vioapic_write_redirent( >> struct hvm_irq *hvm_irq = hvm_domain_irq(d); >> union vioapic_redir_entry *pent, ent; >> int unmasked = 0; >> -unsigned int gsi = vioapic->base_gsi + idx; >> +unsigned int gsi; >> + >> +/* Callers of this function should make sure idx is bounded >> appropriately */ >> +ASSERT(idx < vioapic->nr_pins); >> + >> +/* Make sure no out-of-bound value for idx can be used */ > out-of-bounds Will fix. Best, Norbert > > I'm fine now with all the code changes here. > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 5/9] nospec: introduce evaluate_nospec
On 2/12/19 14:50, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> --- /dev/null >> +++ b/xen/include/asm-x86/nospec.h >> @@ -0,0 +1,39 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +/* Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. >> */ >> + >> +#ifndef _ASM_X86_NOSPEC_H >> +#define _ASM_X86_NOSPEC_H >> + >> +#include <asm/alternative.h> >> +#include <asm/system.h> >> + >> +/* Allow to insert a read memory barrier into conditionals */ >> +static always_inline bool arch_barrier_nospec_true(void) > Now that this is x86-specific (and not used by common code), > I don't think the arch_ prefix is warranted anymore. I will drop the prefix. >> +{ >> +#if defined(CONFIG_HVM) > Here and below I'd prefer if you used the shorter #ifdef. I will use the short version. > >> +alternative("", "lfence", X86_FEATURE_SC_L1TF_VULN); >> +#endif >> +return true; >> +} >> + >> +/* Allow to protect evaluation of conditionals with respect to speculation */ >> +#if defined(CONFIG_HVM) >> +#define evaluate_nospec(condition) \ >> +((condition) ? arch_barrier_nospec_true() : !arch_barrier_nospec_true()) >> +#else >> +#define evaluate_nospec(condition) (condition) >> +#endif >> + >> +/* Allow to block speculative execution in generic code */ >> +#define block_speculation() (void)arch_barrier_nospec_true() > I'm pretty sure that I did point out before that this lacks an > outer pair of parentheses. You did. I will add them. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
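Folding the three review remarks together, the revised header would plausibly end up as the following sketch (the agreed direction, not the committed code):

    /* arch_ prefix dropped, as the construct is not used by common code. */
    static always_inline bool barrier_nospec_true(void)
    {
    #ifdef CONFIG_HVM                 /* the shorter #ifdef form */
        alternative("", "lfence", X86_FEATURE_SC_L1TF_VULN);
    #endif
        return true;
    }

    #ifdef CONFIG_HVM
    #define evaluate_nospec(condition) \
        ((condition) ? barrier_nospec_true() : !barrier_nospec_true())
    #else
    #define evaluate_nospec(condition) (condition)
    #endif

    /* Outer pair of parentheses added, as requested. */
    #define block_speculation() ((void)barrier_nospec_true())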
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 5/9] nospec: introduce evaluate_nospec
On 2/12/19 15:12, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> --- /dev/null >> +++ b/xen/include/asm-x86/nospec.h >> @@ -0,0 +1,39 @@ >> +/* SPDX-License-Identifier: GPL-2.0 */ >> +/* Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. >> */ >> + >> +#ifndef _ASM_X86_NOSPEC_H >> +#define _ASM_X86_NOSPEC_H >> + >> +#include <asm/alternative.h> >> +#include <asm/system.h> > Isn't the latter unnecessary now? You don't use any *mb() construct > anymore. True, I deleted this include. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 6/9] is_control_domain: block speculation
On 2/12/19 15:11, Jan Beulich wrote: >>>> On 08.02.19 at 14:44, wrote: >> Checks of domain properties, such as is_hardware_domain or is_hvm_domain, >> might be bypassed by speculatively executing these instructions. A reason >> for bypassing these checks is that these macros access the domain >> structure via a pointer, and check a certain field. Since this memory >> access is slow, the CPU assumes a returned value and continues the >> execution. >> >> In case an is_control_domain check is bypassed, for example during a >> hypercall, data that should only be accessible by the control domain could >> be loaded into the cache. >> >> Signed-off-by: Norbert Manthey >> >> --- >> >> Notes: >> v6: Drop nospec.h include > And this was because of what? I think it is good practice to include > other headers which added definitions rely on, even if in practice > _right now_ that header gets included already by other means. If > there's some recursion in header dependencies, then it would have > been nice if you had pointed out the actual issue. The nospec.h header has been introduced by the commit "xen/sched: Introduce domain_vcpu() helper" between my v4 and v6, so I had to drop my include there. The sched.h file still includes the nospec.h file, I just do not have to add it any more. I could have been a bit more verbose in the notes section. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 9/9] common/memory: block speculative out-of-bound accesses
On 2/12/19 15:31, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> @@ -33,10 +34,11 @@ unsigned long __read_mostly >> pdx_group_valid[BITS_TO_LONGS( >> >> bool __mfn_valid(unsigned long mfn) >> { >> -return likely(mfn < max_page) && >> - likely(!(mfn & pfn_hole_mask)) && >> - likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, >> - pdx_group_valid)); >> +return evaluate_nospec( >> +likely(mfn < max_page) && >> +likely(!(mfn & pfn_hole_mask)) && >> +likely(test_bit(pfn_to_pdx(array_index_nospec(mfn, max_page)) >> + / PDX_GROUP_COUNT, pdx_group_valid))); >> } > How about this instead: > > bool __mfn_valid(unsigned long mfn) > { > if ( unlikely(evaluate_nospec(mfn >= max_page)) ) > return false; > return likely(!(mfn & pfn_hole_mask)) && >likely(test_bit(pfn_to_pdx(mfn) / PDX_GROUP_COUNT, >pdx_group_valid)); > } > > Initially I really just wanted to improve the line wrapping (at the > very least the / was misplaced), but I think this variant guards > against all that's needed without even introducing wrapping > headaches. That works as well, I will adapt the commit accordingly. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 3/9] x86/hvm: block speculative out-of-bound accesses
On 2/12/19 15:14, Jan Beulich wrote: On 12.02.19 at 15:05, wrote: >> On 2/12/19 14:25, Jan Beulich wrote: >> On 08.02.19 at 14:44, wrote: @@ -4104,6 +4108,12 @@ static int hvmop_set_param( if ( a.index >= HVM_NR_PARAMS ) return -EINVAL; +/* + * Make sure the guest controlled value a.index is bounded even during + * speculative execution. + */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); + d = rcu_lock_domain_by_any_id(a.domid); if ( d == NULL ) return -ESRCH; @@ -4370,6 +4380,12 @@ static int hvmop_get_param( if ( a.index >= HVM_NR_PARAMS ) return -EINVAL; +/* + * Make sure the guest controlled value a.index is bounded even during + * speculative execution. + */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); >>> ... the usefulness of these two. To make forward progress it may >>> be worthwhile to split off these two changes into a separate patch. >>> If you're fine with this, I could strip these two before committing, >>> in which case the remaining change is >>> Reviewed-by: Jan Beulich Taking apart the commit is fine with me. I will submit a follow up change that does not update the values but fixes the reads. > As pointed out during the v5 discussion, I'm unconvinced that if > you do so the compiler can't re-introduce the issue via CSE. I'd > really like a reliable solution to be determined first. I cannot guarantee what future compilers might do. Furthermore, I do not want to wait until all/most compilers ship with such a controllable guarantee. While I would love to have a reliable solution as well, I'd go with what we can do today for now, and reiterate once we have something more stable. Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 4/9] spec: add l1tf-barrier
On 2/12/19 14:44, Jan Beulich wrote: >>>> On 08.02.19 at 14:44, wrote: >> To control the runtime behavior on L1TF vulnerable platforms better, the >> command line option l1tf-barrier is introduced. This option controls >> whether on vulnerable x86 platforms the lfence instruction is used to >> prevent speculative execution from bypassing the evaluation of >> conditionals that are protected with the evaluate_nospec macro. >> >> By now, Xen is capable of identifying L1TF vulnerable hardware. However, >> this information cannot be used for alternative patching, as a CPU feature >> is required. To control alternative patching with the command line option, >> a new x86 feature "X86_FEATURE_SC_L1TF_VULN" is introduced. This feature >> is used to patch the lfence instruction into the arch_barrier_nospec_true >> function. The feature is enabled only if L1TF vulnerable hardware is >> detected and the command line option does not prevent using this feature. >> >> The status of hyperthreading is not considered when automatically enabling >> the addition of the lfence instruction, because platforms without >> hyperthreading can still be vulnerable to L1TF in case the L1 cache is not >> flushed properly. >> >> Signed-off-by: Norbert Manthey >> >> --- >> >> Notes: >> v6: Move disabling l1tf-barrier into spec-ctrl=no >> Use gap in existing flags >> Force barrier based on commandline, independently of L1TF detection >> >> docs/misc/xen-command-line.pandoc | 14 ++ >> xen/arch/x86/spec_ctrl.c | 17 +++-- >> xen/include/asm-x86/cpufeatures.h | 1 + >> xen/include/asm-x86/spec_ctrl.h | 1 + >> 4 files changed, 27 insertions(+), 6 deletions(-) >> >> diff --git a/docs/misc/xen-command-line.pandoc >> b/docs/misc/xen-command-line.pandoc >> --- a/docs/misc/xen-command-line.pandoc >> +++ b/docs/misc/xen-command-line.pandoc >> @@ -483,9 +483,9 @@ accounting for hardware capabilities as enumerated via >> CPUID. >> >> Currently accepted: >> >> -The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`, >> -`l1d-flush` and `ssbd` are used by default if available and applicable. They >> can >> -be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and >> +The Speculation Control hardware features `ibrsb`, `stibp`, `ibpb`, >> `l1d-flush`, >> +`l1tf-barrier` and `ssbd` are used by default if available and applicable. >> They >> +can be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, >> and >> won't offer them to guests. >> >> ### cpuid_mask_cpu >> @@ -1896,7 +1896,7 @@ By default SSBD will be mitigated at runtime (i.e >> `ssbd=runtime`). >> ### spec-ctrl (x86) >> > `= List of [ , xen=, {pv,hvm,msr-sc,rsb}=, >> > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu, >> -> l1d-flush}= ]` >> +> l1d-flush,l1tf-barrier}= ]` >> >> Controls for speculative execution sidechannel mitigations. By default, >> Xen >> will pick the most appropriate mitigations based on compiled in support, >> @@ -1962,6 +1962,12 @@ Irrespective of Xen's setting, the feature is >> virtualised for HVM guests to >> use. By default, Xen will enable this mitigation on hardware believed to >> be >> vulnerable to L1TF. >> >> +On hardware vulnerable to L1TF, the `l1tf-barrier=` option can be used to >> +force Xen to protect evaluations inside the hypervisor with a barrier >> +instruction, or to prevent it from doing so, keeping potentially secret >> +information out of the L1 cache. By default, Xen will enable this mitigation >> +on hardware believed to be vulnerable to L1TF.
>> + >> ### sync_console >> > `= ` >> >> diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c >> --- a/xen/arch/x86/spec_ctrl.c >> +++ b/xen/arch/x86/spec_ctrl.c >> @@ -21,6 +21,7 @@ >> #include >> #include >> >> +#include >> #include >> #include >> #include >> @@ -50,6 +51,7 @@ bool __read_mostly opt_ibpb = true; >> bool __read_mostly opt_ssbd = false; >> int8_t __read_mostly opt_eager_fpu = -1; >> int8_t __read_mostly opt_l1d_flush = -1; >> +int8_t __read_mostly opt_l1tf_barrier = -1; >> >> bool __initdata bsp_delay_spec_ctrl; >> uint8_t __read_mostly default_xen_spec_ctrl; >> @@ -91,6 +93,8 @@ static int __in
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 8/9] common/grant_table: block speculative out-of-bound accesses
On 2/13/19 12:50, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> Guests can issue grant table operations and provide guest controlled >> data to them. This data is also used for memory loads. To avoid >> speculative out-of-bound accesses, we use the array_index_nospec macro >> where applicable. However, there are also memory accesses that cannot >> be protected by a single array protection, or multiple accesses in a >> row. To protect these, a nospec barrier is placed between the actual >> range check and the access via the block_speculation macro. >> >> As different versions of grant tables use structures of different size, >> and the status is encoded in an array for version 2, speculative >> execution might touch zero-initialized structures of version 2 while >> the table is actually using version 1. > Why zero-initialized? Did I still not succeed demonstrating to you > that speculation along a v2 path can actually overrun v1 arrays, > not just access parts with may still be zero-initialized? I believe a speculative v2 access can touch data that has been written by valid v1 accesses before, zero initialized data, or touch the NULL page. Given the macros for the access I do not believe that a v2 access can touch a page that is located behind a page holding valid v1 data. > >> @@ -203,8 +204,9 @@ static inline unsigned int nr_status_frames(const struct >> grant_table *gt) >> } >> >> #define MAPTRACK_PER_PAGE (PAGE_SIZE / sizeof(struct grant_mapping)) >> -#define maptrack_entry(t, e) \ >> -((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) >> +#define maptrack_entry(t, e) >>\ >> +((t)->maptrack[array_index_nospec(e, (t)->maptrack_limit) >>\ >> + >> /MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE]) > I would have hoped that the pointing out of similar formatting > issues elsewhere would have had an impact here as well, but > I see the / is still wrongly at the beginning of a line, and is still > not followed by a blank (would be "preceded" if it was well > placed). And while I realize it's only code movement, adding > the missing blanks around % would be appreciated too at this > occasion. I will move the "/" to the upper line, and add the space around the "%". > >> @@ -963,9 +965,13 @@ map_grant_ref( >> PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", >> op->ref, rgt->domain->domain_id); >> >> +/* Make sure the above check is not bypassed speculatively */ >> +block_speculation(); >> + >> act = active_entry_acquire(rgt, op->ref); >> shah = shared_entry_header(rgt, op->ref); >> -status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, >> op->ref); >> +status = evaluate_nospec(rgt->gt_version == 1) ? &shah->flags >> + : &status_entry(rgt, >> op->ref); > Did you consider folding the two pairs of fences you emit into > one? Moving up the assignment to status ought to achieve this, > as then the block_speculation() could be dropped afaict. > > Then again you don't alter shared_entry_header(). If there's > a reason for you not having done so, then a second fence > here is needed in any event. Instead of the block_speculation() macro, I can also protect the op->ref usage before evaluate_nospec via the array_index_nospec function. > > What about the version check in nr_grant_entries()? It appears > to me as if at least its use in grant_map_exists() (which simply is > the first one I've found) is problematic without an adjustment. > Even worse, ... 
> >> @@ -1321,7 +1327,8 @@ unmap_common( >> goto unlock_out; >> } >> >> -act = active_entry_acquire(rgt, op->ref); >> +act = active_entry_acquire(rgt, array_index_nospec(op->ref, >> + >> nr_grant_entries(rgt))); > ... you add a use e.g. here to _guard_ against speculation. The adjustment you propose is to exchange the switch statement in nr_grant_entries with an if ( evaluate_nospec(gt->gt_version == 1) ), so that the returned values are not speculated? Already before this modification, the function is called and not inlined. Do you want me to cache the value in functions that call this method regularly, to avoid the penalty of the introduced lfence for each call? > > And what about _set_status(), unmap_common_complete(), > gnttab_grow_table(), gnttab_setup_table(), > release_grant_for_copy(), the 2nd one in acquire_grant_for_copy(), > several ones in gnttab_set_version(), gnttab_release_mappings(), > the 3rd one in mem_sharing_gref_to_gfn(), gnttab_map_frame(), > and gnttab_get_status_frame()? Protecting the function itself should make it unnecessary to modify the speculation guards in these functions. I would have to check each of them, whether the guest actually has control, and whether it makes sense to introduce a _nospec variant of the nr_gr
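Sketched as code, the direction under discussion for nr_grant_entries would look roughly like the following; the per-version entry computation is an assumption about the eventual shape, not a quoted hunk:

    static unsigned int nr_grant_entries(const struct grant_table *gt)
    {
        /* Fence on the version check itself, so that every caller sizing
         * an access by the result inherits the hardening. */
        if ( evaluate_nospec(gt->gt_version == 1) )
            return nr_grant_frames(gt) * (PAGE_SIZE / sizeof(grant_entry_v1_t));

        return nr_grant_frames(gt) * (PAGE_SIZE / sizeof(grant_entry_v2_t));
    }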
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 3/9] x86/hvm: block speculative out-of-bound accesses
On 2/15/19 09:55, Jan Beulich wrote: On 15.02.19 at 09:05, wrote: >> On 2/12/19 15:14, Jan Beulich wrote: >> On 12.02.19 at 15:05, wrote: On 2/12/19 14:25, Jan Beulich wrote: On 08.02.19 at 14:44, wrote: >> @@ -4104,6 +4108,12 @@ static int hvmop_set_param( >> if ( a.index >= HVM_NR_PARAMS ) >> return -EINVAL; >> >> +/* >> + * Make sure the guest controlled value a.index is bounded even during >> + * speculative execution. >> + */ >> +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); >> + >> d = rcu_lock_domain_by_any_id(a.domid); >> if ( d == NULL ) >> return -ESRCH; >> @@ -4370,6 +4380,12 @@ static int hvmop_get_param( >> if ( a.index >= HVM_NR_PARAMS ) >> return -EINVAL; >> >> +/* >> + * Make sure the guest controlled value a.index is bounded even during >> + * speculative execution. >> + */ >> +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); >>> ... the usefulness of these two. To make forward progress it may >>> be worthwhile to split off these two changes into a separate patch. >>> If you're fine with this, I could strip these two before committing, >>> in which case the remaining change is >>> Reviewed-by: Jan Beulich Taking apart the commit is fine with me. I will submit a follow up change that does not update the values but fixes the reads. > As pointed out during the v5 discussion, I'm unconvinced that if > you do so the compiler can't re-introduce the issue via CSE. I'd > really like a reliable solution to be determined first. I cannot guarantee what future compilers might do. Furthermore, I do not want to wait until all/most compilers ship with such a controllable guarantee. >>> Guarantee? Future compilers are (hopefully) going to get better at >>> optimizing, and hence are (again hopefully) going to find more >>> opportunities for CSE. So the problem is going to get worse rather >>> than better, and the changes you're proposing to re-instate are >>> therefore more like false promises. I do not want to dive into the future of compilers here. I would like to fix the issue for today's compilers now and not wait until compilers have evolved one way or another. For this patch, the relevant information is whether it should go in like this, or whether you want me to protect all the reads instead. Is there more data I shall provide to help make this decision? Best, Norbert > > Jan > > Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrer: Christian Schlaeger, Ralf Herbrich Ust-ID: DE 289 237 879 Eingetragen am Amtsgericht Charlottenburg HRB 149173 B ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 8/9] common/grant_table: block speculative out-of-bound accesses
On 2/15/19 11:34, Jan Beulich wrote: On 15.02.19 at 10:55, wrote: >> On 2/13/19 12:50, Jan Beulich wrote: >> On 08.02.19 at 14:44, wrote: Guests can issue grant table operations and provide guest controlled data to them. This data is also used for memory loads. To avoid speculative out-of-bound accesses, we use the array_index_nospec macro where applicable. However, there are also memory accesses that cannot be protected by a single array protection, or multiple accesses in a row. To protect these, a nospec barrier is placed between the actual range check and the access via the block_speculation macro. As different versions of grant tables use structures of different size, and the status is encoded in an array for version 2, speculative execution might touch zero-initialized structures of version 2 while the table is actually using version 1. >>> Why zero-initialized? Did I still not succeed demonstrating to you >>> that speculation along a v2 path can actually overrun v1 arrays, >>> not just access parts with may still be zero-initialized? >> I believe a speculative v2 access can touch data that has been written >> by valid v1 accesses before, zero initialized data, or touch the NULL >> page. Given the macros for the access I do not believe that a v2 access >> can touch a page that is located behind a page holding valid v1 data. > I've given examples before of how I see this to be possible. Would > you mind going back to one of the instances, and explaining to me > how you do _not_ see any room for an overrun there? Having > given examples, I simply don't know how else I can explain this to > you without knowing at what specific part of the explanation we > diverge. (And no, I'm not excluding that I'm making up an issue > where there is none.) What we want to rule out is that they actually use version 1, while speculation might use version 2, right? I hope you are referring to this example from your earlier email. On 1/29/19 16:11, Jan Beulich wrote: > Let's look at an example: gref 256 points into the middle of > the first page when using v1 calculations, but at the start > of the second page when using v2 calculations. Hence, if the > maximum number of grant frames was 1, we'd overrun the > array, consisting of just a single element (256 is valid as a > v1 gref in that case, but just out of bounds as a v2 one). From how I read your example and my explanation, the key difference is in the size of the shared_raw array. In case gref 256 is a valid v1 handle, then the shared_raw array has space for at least 256 entries, as shared_raw was allocated for the number of requested entries. The access to shared_raw is controlled with the macro shared_entry_v2: 222 #define SHGNT_PER_PAGE_V2 (PAGE_SIZE / sizeof(grant_entry_v2_t)) 223 #define shared_entry_v2(t, e) \ 224 ((t)->shared_v2[(e)/SHGNT_PER_PAGE_V2][(e)%SHGNT_PER_PAGE_V2]) Since the direct access to the shared_v2 array depends on the SHGNT_PER_PAGE_V2 value, this has to be less than the size of that array. Hence, shared_raw will not be overrun (neither for version 1 nor version 2). However, this division might result in accessing an element of shared_raw that has not been initialized by version 1 before. Yet right after allocation, shared_raw is zero-initialized. Hence, this might result in an access of the NULL page. The second access in the macro allows accessing only a single page, as the value e is bound to the elements per page of the correct version (the version 1 macro uses the corresponding value for the modulo operation).
Either this refers to the NULL page, or it refers to a page that has been (partially) initialized by version 1. I do not see how an out-of-bound access would be possible there. @@ -963,9 +965,13 @@ map_grant_ref( PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref %#x for d%d\n", op->ref, rgt->domain->domain_id); +/* Make sure the above check is not bypassed speculatively */ +block_speculation(); + act = active_entry_acquire(rgt, op->ref); shah = shared_entry_header(rgt, op->ref); -status = rgt->gt_version == 1 ? &shah->flags : &status_entry(rgt, op->ref); +status = evaluate_nospec(rgt->gt_version == 1) ? &shah->flags + : &status_entry(rgt, op->ref); >>> Did you consider folding the two pairs of fences you emit into >>> one? Moving up the assignment to status ought to achieve this, >>> as then the block_speculation() could be dropped afaict. >>> >>> Then again you don't alter shared_entry_header(). If there's >>> a reason for you not having done so, then a second fence >>> here is needed in any event. >> Instead of the block_speculation() macro, I can also protect the op->ref >> usage before evaluate_nospec via the array_index_nospec function. > That's an option (as before), but doesn't help
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 3/9] x86/hvm: block speculative out-of-bound accesses
On 2/15/19 12:46, Jan Beulich wrote: On 15.02.19 at 11:50, wrote: >> On 2/15/19 09:55, Jan Beulich wrote: >> On 15.02.19 at 09:05, wrote: On 2/12/19 15:14, Jan Beulich wrote: On 12.02.19 at 15:05, wrote: >> On 2/12/19 14:25, Jan Beulich wrote: >> On 08.02.19 at 14:44, wrote: @@ -4104,6 +4108,12 @@ static int hvmop_set_param( if ( a.index >= HVM_NR_PARAMS ) return -EINVAL; +/* + * Make sure the guest controlled value a.index is bounded even during + * speculative execution. + */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); + d = rcu_lock_domain_by_any_id(a.domid); if ( d == NULL ) return -ESRCH; @@ -4370,6 +4380,12 @@ static int hvmop_get_param( if ( a.index >= HVM_NR_PARAMS ) return -EINVAL; +/* + * Make sure the guest controlled value a.index is bounded even during + * speculative execution. + */ +a.index = array_index_nospec(a.index, HVM_NR_PARAMS); >>> ... the usefulness of these two. To make forward progress it may >>> be worthwhile to split off these two changes into a separate patch. >>> If you're fine with this, I could strip these two before committing, >>> in which case the remaining change is >>> Reviewed-by: Jan Beulich Taking apart the commit is fine with me. I will submit a follow up change that does not update the values but fixes the reads. > As pointed out during the v5 discussion, I'm unconvinced that if > you do so the compiler can't re-introduce the issue via CSE. I'd > really like a reliable solution to be determined first. I cannot guarantee what future compilers might do. Furthermore, I do not want to wait until all/most compilers ship with such a controllable guarantee. >>> Guarantee? Future compilers are (hopefully) going to get better at >>> optimizing, and hence are (again hopefully) going to find more >>> opportunities for CSE. So the problem is going to get worse rather >>> than better, and the changes you're proposing to re-instate are >>> therefore more like false promises. >> I do not want to dive into the future of compilers here. I would like to fix >> the issue for today's compilers now and not wait until compilers have evolved >> one way or another. For this patch, the relevant information is whether >> it should go in like this, or whether you want me to protect all the >> reads instead. Is there more data I shall provide to help make this >> decision? > I understand that you're not happy with what I've said, and you're > unlikely to become any happier with what I'll add. But please > understand that _if_ we make any changes to address issues with > speculation, the goal has to be that we don't have to come back > and re-investigate after every new compiler release. > > Even beyond that - if, as you say, we'd limit ourselves to current > compilers, did you check that all of them at any optimization level > or with any other flags passed which may affect code generation > produce non-vulnerable code? And in particular considering the > case here never recognize CSE potential where we would like them > not to? > > A code change is, imo, not even worth considering to be put > in if it is solely based on the observations made with a limited set > of compilers and/or options. This might indeed help you, if you > care only about one specific environment. But by putting this in > (and perhaps even backporting it) we're sort of stating that the > issue is under control (to the best of our abilities, and for the given > area of code). For everyone.
I do not see how a fix for problems like the discussed one could enter the code base given the above conditions. However, for this very specific fix, there fortunately is a comparison wrt a constant, and there are many instructions until the potential speculative out-of-bound access might happen, so that not fixing the two above accesses is fine for me. While I cannot guarantee that it is not possible, we did not manage to come up with a PoC for these two places with the effort we put into this.
> So, to answer your question: From what we know, we simply
> can't take a decision, at least not between the two proposed
> variants of how to change the code. If there was a variant that
> firmly worked, then there would not even be a need for any
> discussion. And again from what we know, there is one
> requirement that needs to be fulfilled for a change to be
> considered "firmly working": The index needs to be in a register.
> There must not be a way for the compiler to undermine this,
> be it by CSE or any other means.
>
> Considering changes done elsewhere, of course this may be
> taken with a grain of salt. In other places we als
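[Editorial note] As background for the "index needs to be in a register" requirement: the generic fallback of array_index_nospec() computes the clamp purely arithmetically, so there is no conditional branch to mispredict. A simplified sketch along the lines of the generic Linux/Xen fallback (the x86 build may use dedicated asm instead, and the BUILD_BUG_ON size checks are omitted here):

    /* All-ones when 0 <= index < size, all-zeroes otherwise. */
    static inline unsigned long array_index_mask_nospec(unsigned long index,
                                                        unsigned long size)
    {
        return ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
    }

    #define array_index_nospec(index, size)                             \
    ({                                                                  \
        typeof(index) _i = (index);                                     \
        typeof(size) _s = (size);                                       \
        unsigned long _mask = array_index_mask_nospec(_i, _s);          \
                                                                        \
        (typeof(_i))(_i & _mask);                                       \
    })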
Re: [Xen-devel] [PATCH SpectreV1+L1TF v6 8/9] common/grant_table: block speculative out-of-bound accesses
On 2/18/19 17:08, Jan Beulich wrote: On 18.02.19 at 14:49, wrote: >> On 2/15/19 11:34, Jan Beulich wrote: >> On 15.02.19 at 10:55, wrote: On 2/13/19 12:50, Jan Beulich wrote: On 08.02.19 at 14:44, wrote:
>> Guests can issue grant table operations and provide guest controlled
>> data to them. This data is also used for memory loads. To avoid
>> speculative out-of-bound accesses, we use the array_index_nospec macro
>> where applicable. However, there are also memory accesses that cannot
>> be protected by a single array protection, or multiple accesses in a
>> row. To protect these, a nospec barrier is placed between the actual
>> range check and the access via the block_speculation macro.
>>
>> As different versions of grant tables use structures of different size,
>> and the status is encoded in an array for version 2, speculative
>> execution might touch zero-initialized structures of version 2 while
>> the table is actually using version 1.
> Why zero-initialized? Did I still not succeed demonstrating to you
> that speculation along a v2 path can actually overrun v1 arrays,
> not just access parts which may still be zero-initialized?
I believe a speculative v2 access can touch data that has been written by valid v1 accesses before, zero-initialized data, or the NULL page. Given the macros for the access, I do not believe that a v2 access can touch a page that is located behind a page holding valid v1 data.
>>> I've given examples before of how I see this to be possible. Would
>>> you mind going back to one of the instances, and explaining to me
>>> how you do _not_ see any room for an overrun there? Having
>>> given examples, I simply don't know how else I can explain this to
>>> you without knowing at what specific part of the explanation we
>>> diverge. (And no, I'm not excluding that I'm making up an issue
>>> where there is none.)
>> What we want to rule out is that they actually use version 1, while
>> speculation might use version 2, right? I hope you refer to this example
>> from your earlier email.
>>
>> On 1/29/19 16:11, Jan Beulich wrote:
>>> Let's look at an example: gref 256 points into the middle of
>>> the first page when using v1 calculations, but at the start
>>> of the second page when using v2 calculations. Hence, if the
>>> maximum number of grant frames was 1, we'd overrun the
>>> array, consisting of just a single element (256 is valid as a
>>> v1 gref in that case, but just out of bounds as a v2 one).
>> From how I read your example and my explanation, the key difference is
>> in the size of the shared_raw array. In case gref 256 is a valid v1
>> handle, the shared_raw array has space for at least 256 entries, as
>> shared_raw was allocated for the number of requested entries. The access
>> to shared_raw is controlled by the macro shared_entry_v2:
>> #define SHGNT_PER_PAGE_V2 (PAGE_SIZE / sizeof(grant_entry_v2_t))
>> #define shared_entry_v2(t, e) \
>>     ((t)->shared_v2[(e)/SHGNT_PER_PAGE_V2][(e)%SHGNT_PER_PAGE_V2])
>> Since the direct access to the shared_v2 array depends on the
>> SHGNT_PER_PAGE_V2 value, the resulting page index has to stay below the
>> size of that array. Hence, shared_raw will not be overrun (neither for
>> version 1 nor for version 2). However, this division might result in
>> accessing an element of shared_raw that has not been initialized by
>> version 1 before. Since shared_raw is zero-initialized right after
>> allocation, this might result in an access of the NULL page.
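[Editorial note] Putting numbers on the example above (assumptions: PAGE_SIZE = 4096, sizeof(grant_entry_v1_t) = 8, sizeof(grant_entry_v2_t) = 16, and gt->max_grant_frames = 1, so shared_raw[] has exactly one element):

    SHGNT_PER_PAGE_V1 = 4096 / 8  = 512 entries per page
    SHGNT_PER_PAGE_V2 = 4096 / 16 = 256 entries per page

    gref 256 as v1: shared_v1[256 / 512][256 % 512] -> page 0, entry 256 (in bounds)
    gref 256 as v2: shared_v2[256 / 256][256 % 256] -> page 1, entry 0 (one past
    the end of the single-frame array)

So a speculative v2 access with a gref that is valid for v1 can index one page beyond shared_raw[], a genuine overrun rather than a read of zero-initialized memory.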
> The question is: How much of shared_raw[] will be zero-initialized?
> The example I've given uses relatively small grant reference values,
> so for the purpose here let's assume gt->max_grant_frames is 1.
> In this case shared_raw[] is exactly one entry in size. Hence the
> speculative access you describe will not necessarily access the NULL
> page.
>
> Obviously the same issue exists with higher limits and higher grant
> reference numbers.
The solution to this problem is really simple: I mixed up grant frames and grant entries. I agree that shared_raw can be accessed out-of-bounds and should be protected. I will adapt the commit message accordingly, and revise the modifications I added to the code base.
>
>> @@ -1321,7 +1327,8 @@ unmap_common(
>>          goto unlock_out;
>>      }
>>
>> -    act = active_entry_acquire(rgt, op->ref);
>> +    act = active_entry_acquire(rgt, array_index_nospec(op->ref,
>> +                                                       nr_grant_entries(rgt)));
> ... you add a use e.g. here to _guard_ against speculation.
The adjustment you propose is to exchange the switch statement in nr_grant_entries with an if ( evaluate_nospec(gt->gt_version == 1) ), so that the returned value is not subject to speculation?
>>> At this point I'm not proposing a particular solution. I'm just
>>> putting on the table an is
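[Editorial note] For concreteness, the adjustment asked about in the last paragraph might look roughly as follows; this is a sketch only, assuming the per-page entry-count macros and nr_grant_frames() from the existing code, and ignoring details of the real function:

    /*
     * Sketch: derive the entry count with a speculation-safe version
     * check, so a mis-speculated version cannot yield the larger v1
     * entry count while the table is in v2 mode, or vice versa.
     */
    static unsigned int nr_grant_entries(struct grant_table *gt)
    {
        if ( evaluate_nospec(gt->gt_version == 1) )
            return nr_grant_frames(gt) * SHGNT_PER_PAGE_V1;

        return nr_grant_frames(gt) * SHGNT_PER_PAGE_V2;
    }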
[Xen-devel] [PATCH SpectreV1+L1TF v7 1/9] xen/evtchn: block speculative out-of-bound accesses
Guests can issue event channel interactions with guest-specified data. To avoid speculative out-of-bound accesses, we use the nospec macros, or the domain_vcpu function. Where appropriate, we use the vcpu_id of the selected vcpu instead of the parameter that can be influenced by the guest, so that only one access needs to be protected. This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey

---
Notes:
    v7: mention speculative hardening in commit message
        explain preferred use of internal data in commit message
        drop update in set_global_virq_handler

 xen/common/event_channel.c | 29 ++---
 xen/common/event_fifo.c    | 13 ++---
 xen/include/xen/event.h    |  5 +++--
 3 files changed, 31 insertions(+), 16 deletions(-)
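[Editorial note] Since several hunks below replace the open-coded checks with domain_vcpu(), a sketch of how such a helper combines the architectural check with the speculative clamp may help; this is an illustration along the lines of the Xen helper, not a quote of it:

    static inline struct vcpu *domain_vcpu(const struct domain *d,
                                           unsigned int vcpu_id)
    {
        /* Clamp the id for speculative execution ... */
        unsigned int idx = array_index_nospec(vcpu_id, d->max_vcpus);

        /* ... and check it architecturally, so callers get NULL for
         * out-of-range ids while the array index is always bounded. */
        return vcpu_id >= d->max_vcpus ? NULL : d->vcpu[idx];
    }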
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -365,11 +365,16 @@ int evtchn_bind_virq(evtchn_bind_virq_t *bind, evtchn_port_t port)
     if ( (virq < 0) || (virq >= ARRAY_SIZE(v->virq_to_evtchn)) )
         return -EINVAL;

+    /*
+     * Make sure the guest controlled value virq is bounded even during
+     * speculative execution.
+     */
+    virq = array_index_nospec(virq, ARRAY_SIZE(v->virq_to_evtchn));
+
     if ( virq_is_global(virq) && (vcpu != 0) )
         return -EINVAL;

-    if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         ((v = d->vcpu[vcpu]) == NULL) )
+    if ( (v = domain_vcpu(d, vcpu)) == NULL )
         return -ENOENT;

     spin_lock(&d->event_lock);
@@ -418,8 +423,7 @@ static long evtchn_bind_ipi(evtchn_bind_ipi_t *bind)
     int port, vcpu = bind->vcpu;
     long rc = 0;

-    if ( (vcpu < 0) || (vcpu >= d->max_vcpus) ||
-         (d->vcpu[vcpu] == NULL) )
+    if ( domain_vcpu(d, vcpu) == NULL )
         return -ENOENT;

     spin_lock(&d->event_lock);
@@ -813,6 +817,7 @@ int set_global_virq_handler(struct domain *d, uint32_t virq)
     if (virq >= NR_VIRQS)
         return -EINVAL;
+
     if (!virq_is_global(virq))
         return -EINVAL;
@@ -930,8 +935,10 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
     struct domain *d = current->domain;
     struct evtchn *chn;
     long rc = 0;
+    struct vcpu *v;

-    if ( (vcpu_id >= d->max_vcpus) || (d->vcpu[vcpu_id] == NULL) )
+    /* Use the vcpu info to prevent speculative out-of-bound accesses */
+    if ( (v = domain_vcpu(d, vcpu_id)) == NULL )
         return -ENOENT;

     spin_lock(&d->event_lock);
@@ -955,22 +962,22 @@ long evtchn_bind_vcpu(unsigned int port, unsigned int vcpu_id)
     {
     case ECS_VIRQ:
         if ( virq_is_global(chn->u.virq) )
-            chn->notify_vcpu_id = vcpu_id;
+            chn->notify_vcpu_id = v->vcpu_id;
         else
             rc = -EINVAL;
         break;
     case ECS_UNBOUND:
     case ECS_INTERDOMAIN:
-        chn->notify_vcpu_id = vcpu_id;
+        chn->notify_vcpu_id = v->vcpu_id;
         break;
     case ECS_PIRQ:
-        if ( chn->notify_vcpu_id == vcpu_id )
+        if ( chn->notify_vcpu_id == v->vcpu_id )
             break;
         unlink_pirq_port(chn, d->vcpu[chn->notify_vcpu_id]);
-        chn->notify_vcpu_id = vcpu_id;
+        chn->notify_vcpu_id = v->vcpu_id;
         pirq_set_affinity(d, chn->u.pirq.irq,
-                          cpumask_of(d->vcpu[vcpu_id]->processor));
-        link_pirq_port(port, chn, d->vcpu[vcpu_id]);
+                          cpumask_of(v->processor));
+        link_pirq_port(port, chn, v);
         break;
     default:
         rc = -EINVAL;
diff --git a/xen/common/event_fifo.c b/xen/common/event_fifo.c
--- a/xen/common/event_fifo.c
+++ b/xen/common/event_fifo.c
@@ -33,7 +33,8 @@ static inline event_word_t *evtchn_fifo_word_from_port(const struct domain *d,
      */
     smp_rmb();

-    p = port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;
+    p = array_index_nospec(port / EVTCHN_FIFO_EVENT_WORDS_PER_PAGE,
+                           d->evtchn_fifo->num_evtchns);
     w = port % EVTCHN_FIFO_EVENT_WORDS_PER_PAGE;

     return d->evtchn_fifo->event_array[p] + w;
@@ -516,14 +517,20 @@ int evtchn_fifo_init_control(struct evtchn_init_control *init_control)
     gfn = init_control->control_gfn;
     offset = init_control->offset;

-    if ( vcpu_id >= d->max_vcpus || !d->vcpu[vcpu_id] )
+    if ( (v = domain_vcpu(d, vcpu_id)) == NULL )
         return -ENOENT;
-    v = d->vcpu[vcpu_id];

     /* Must not cross page boundary. */
     if ( offset > (PAGE_SIZE - sizeof(evtchn_fifo_control_block_t)) )
         return -EINVAL;

+    /*
+     * Make sure the guest controlled value offset is bounded even during
+     * speculative execution.
+     */
+    offset = array_index_nospec(offset,
+                                PAGE_SIZE - s
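[Editorial note] The hunk above is cut off in the archive. Presumably the bound completes the same expression used in the architectural check just before it; the following is a guess, labeled as such (note the + 1, since the check rejects offsets strictly greater than the bound, so the bound itself remains a valid value):

    /* Presumed completion of the truncated hunk (assumption, not a quote): */
    offset = array_index_nospec(offset,
                                PAGE_SIZE -
                                sizeof(evtchn_fifo_control_block_t) + 1);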
[Xen-devel] [PATCH SpectreV1+L1TF v7 2/9] x86/vioapic: block speculative out-of-bound accesses
When interacting with the IO APIC, a guest can specify values that are used as index to structures, and whose values are not compared against upper bounds to prevent speculative out-of-bound accesses. This change prevents these speculative accesses.

Furthermore, variables are initialized and the compiler is asked not to optimize these initializations away, as the uninitialized variables might be used in a speculative out-of-bound access. Out of the four initialized variables, two are potentially problematic, namely the ones in the functions vioapic_irq_positive_edge and vioapic_get_trigger_mode. As the two problematic variables are both used in the common function gsi_vioapic, the mitigation is implemented there. As the access pattern of the currently non-guest-controlled functions might change in the future, the other variables are initialized as well.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey

---
Notes:
    v7: mention speculative hardening in commit message
        fix comment typo
        drop 'guest controlled' from commit message

 xen/arch/x86/hvm/vioapic.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)
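[Editorial note] A note on OPTIMIZER_HIDE_VAR, since the first hunk below relies on it: the macro makes a variable's value opaque to the optimizer, so the caller-side initialization cannot be dropped. A common definition, shown here as an assumption rather than a quote of the Xen header:

    /*
     * Sketch of a typical definition: the empty asm with a read-write
     * ("+g") constraint tells the compiler the variable may be read and
     * modified here, so prior stores to it must not be optimized away.
     */
    #define OPTIMIZER_HIDE_VAR(var) __asm__ volatile ( "" : "+g" (var) )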
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -66,6 +67,12 @@ static struct hvm_vioapic *gsi_vioapic(const struct domain *d,
 {
     unsigned int i;

+    /*
+     * Make sure the compiler does not optimize away the initialization done
+     * by callers
+     */
+    OPTIMIZER_HIDE_VAR(*pin);
+
     for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
     {
         struct hvm_vioapic *vioapic = domain_vioapic(d, i);
@@ -117,7 +124,8 @@ static uint32_t vioapic_read_indirect(const struct hvm_vioapic *vioapic)
         break;
     }

-    redir_content = vioapic->redirtbl[redir_index].bits;
+    redir_content = vioapic->redirtbl[array_index_nospec(redir_index,
+                                                         vioapic->nr_pins)].bits;
     result = (vioapic->ioregsel & 1) ?
              (redir_content >> 32) : redir_content;
     break;
@@ -212,7 +220,15 @@ static void vioapic_write_redirent(
     struct hvm_irq *hvm_irq = hvm_domain_irq(d);
     union vioapic_redir_entry *pent, ent;
     int unmasked = 0;
-    unsigned int gsi = vioapic->base_gsi + idx;
+    unsigned int gsi;
+
+    /* Callers of this function should make sure idx is bounded appropriately */
+    ASSERT(idx < vioapic->nr_pins);
+
+    /* Make sure no out-of-bounds value for idx can be used */
+    idx = array_index_nospec(idx, vioapic->nr_pins);
+
+    gsi = vioapic->base_gsi + idx;

     spin_lock(&d->arch.hvm.irq_lock);
@@ -467,7 +483,7 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)

 void vioapic_irq_positive_edge(struct domain *d, unsigned int irq)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     struct hvm_vioapic *vioapic = gsi_vioapic(d, irq, &pin);
     union vioapic_redir_entry *ent;
@@ -542,7 +558,7 @@ void vioapic_update_EOI(struct domain *d, u8 vector)

 int vioapic_get_mask(const struct domain *d, unsigned int gsi)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);

     if ( !vioapic )
@@ -553,7 +569,7 @@ int vioapic_get_mask(const struct domain *d, unsigned int gsi)

 int vioapic_get_vector(const struct domain *d, unsigned int gsi)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);

     if ( !vioapic )
@@ -564,7 +580,7 @@ int vioapic_get_vector(const struct domain *d, unsigned int gsi)

 int vioapic_get_trigger_mode(const struct domain *d, unsigned int gsi)
 {
-    unsigned int pin;
+    unsigned int pin = 0; /* See gsi_vioapic */
     const struct hvm_vioapic *vioapic = gsi_vioapic(d, gsi, &pin);

     if ( !vioapic )
--
2.7.4