>max_apic_id
> and also checking whether or not map->phys_map[min + i] is NULL, since
> max_apic_id is set according to the max apic id. However, some phys_map
> entries may be NULL when the apic id is sparse. In addition, kvm also
> unconditionally sets max_apic_id to 255 to r
> On 29 Aug 2018, at 13:29, Dan Carpenter wrote:
>
> On Wed, Aug 29, 2018 at 06:23:08PM +0800, Wanpeng Li wrote:
>> On Wed, 29 Aug 2018 at 18:18, Dan Carpenter wrote:
>>>
>>> On Wed, Aug 29, 2018 at 01:12:05PM +0300, Dan Carpenter wrote:
>>>> On
> On 21 Aug 2018, at 12:57, David Woodhouse wrote:
>
> Another alternative... I'm told POWER8 does an interesting thing with
> hyperthreading and gang scheduling for KVM. The host kernel doesn't
> actually *see* the hyperthreads at all, and KVM just launches the full
> set of siblings when it e
> On 21 Aug 2018, at 17:22, David Woodhouse wrote:
>
> On Tue, 2018-08-21 at 17:01 +0300, Liran Alon wrote:
>>
>>> On 21 Aug 2018, at 12:57, David Woodhouse
>> wrote:
>>>
>>> Another alternative... I'm told POWER8 does an interesting
- dw...@infradead.org wrote:
> On Sun, 2018-01-21 at 14:27 -0800, Linus Torvalds wrote:
> > On Sun, Jan 21, 2018 at 2:00 PM, David Woodhouse
> wrote:
> > >>
> > >> The patches do things like add the garbage MSR writes to the
> kernel
> > >> entry/exit points. That's insane. That says "we're
- alexander.le...@microsoft.com wrote:
> From: Liran Alon
>
> [ Upstream commit ac9b305caa0df6f5b75d294e4b86c1027648991e ]
>
> When running L2, #UD should be intercepted by L1 or just forwarded
> directly to L2. It should not reach L0 x86 emulator.
> Therefore, set i
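The commit message above is truncated, but the idea it describes can be
sketched roughly as follows; this is an illustration against a VMX-style
exception-bitmap update, not the exact upstream hunk:

/* While running L2, drop L0's #UD intercept so an undefined opcode is
 * handled by L1 (if L1 intercepts #UD) or delivered directly to L2,
 * instead of reaching L0's x86 instruction emulator. */
static void update_exception_bitmap(struct kvm_vcpu *vcpu)
{
        u32 eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << AC_VECTOR);

        if (is_guest_mode(vcpu))
                eb &= ~(1u << UD_VECTOR);

        vmcs_write32(EXCEPTION_BITMAP, eb);
}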
- vkuzn...@redhat.com wrote:
> I was investigating an issue with seabios >= 1.10 which stopped
> working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction()
- dave.han...@intel.com wrote:
> On 01/23/2018 03:13 AM, Liran Alon wrote:
> > Therefore, breaking KASLR. In order to handle this, every exit from
> > kernel-mode to user-mode should stuff RSB. In addition, this
> stuffing
> > of RSB may need to be done from a fixed
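For reference, the RSB-stuffing idiom under discussion looks roughly like
the sketch below. It is modeled on the __FILL_RETURN_BUFFER pattern the
kernel later adopted; the iteration count is illustrative:

/* Overwrite all 32 Return Stack Buffer entries with benign targets so a
 * later `ret` cannot speculate to an attacker-trained address. Each loop
 * iteration performs two calls whose bodies never return architecturally
 * (the pause/lfence loops trap any speculation), and the accumulated fake
 * return addresses are dropped from the stack at the end. */
static inline void stuff_rsb(void)
{
        unsigned long loops;

        asm volatile("mov $16, %0\n\t"
                     "1:\n\t"
                     "call 2f\n\t"
                     "3: pause; lfence; jmp 3b\n\t"
                     "2: call 4f\n\t"
                     "5: pause; lfence; jmp 5b\n\t"
                     "4: dec %0\n\t"
                     "jnz 1b\n\t"
                     "add $(32 * 8), %%rsp"
                     : "=r"(loops) : : "memory");
}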
- dave.han...@intel.com wrote:
> On 01/25/2018 06:11 PM, Liran Alon wrote:
> > It is true that attacker cannot speculate to a kernel-address, but
> it
> > doesn't mean it cannot use the leaked kernel-address together with
> > another unrelated vulnerabilit
- d...@amazon.co.uk wrote:
> On Wed, 2018-01-10 at 10:41 -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jan 10, 2018 at 03:28:43PM +0100, Paolo Bonzini wrote:
> > > On 10/01/2018 15:06, Arjan van de Ven wrote:
> > > > On 1/10/2018 5:20 AM, Paolo Bonzini wrote:
> > > >> * a simple specificati
- dw...@infradead.org wrote:
> On Wed, 2018-01-10 at 08:19 -0800, Liran Alon wrote:
> >
> > (1) On VMEntry, Intel recommends to just restore SPEC_CTRL to guest
> > value (using WRMSR or MSR save/load list) and that's it. As I
> > previously said to Jim, I a
- karah...@amazon.de wrote:
> From: Ashok Raj
>
> Add MSR passthrough for MSR_IA32_PRED_CMD and place branch predictor
> barriers on switching between VMs to avoid inter-VM spectre-v2
> attacks.
>
> [peterz: rebase and changelog rewrite]
> [dwmw2: fixes]
> [karahmed: - vmx: expose PRED_CMD
- dw...@infradead.org wrote:
> On Sun, 2018-01-28 at 15:21 -0500, Konrad Rzeszutek Wilk wrote:
> > >To avoid the overhead of atomically saving and restoring the
> MSR_IA32_SPEC_CTRL
> > >for guests that do not actually use the MSR, only
> add_atomic_switch_msr when a
> > >non-zero is written
(1UL << 0)
+
#define MSR_PPIN_CTL 0x004e
#define MSR_PPIN 0x004f
Trivially,
Reviewed-by: Liran Alon
On 08/01/18 20:08, Paolo Bonzini wrote:
As an interim measure until SPEC_CTRL is supported by upstream
Linux in cpufeatures, add a function that lets vmx.c and svm.c
know whether to save/restore MSR_IA32_SPEC_CTRL.
Signed-off-by: Paolo Bonzini
---
arch/x86/kvm/cpuid.c | 3 ---
arch/x86/kv
On 08/01/18 20:08, Paolo Bonzini wrote:
Direct access to MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD is important
for performance. Allow load/store of MSR_IA32_SPEC_CTRL, restore guest
IBRS on VM entry and set it to 0 on VM exit (because Linux does not use
it yet).
Signed-off-by: Paolo Bonzini
ion.
+ */
+ if (have_spec_ctrl)
+ wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
}
static void vmx_nested_free_vmcs02(struct vcpu_vmx *vmx)
Reviewed-by: Liran Alon
On 08/01/18 20:08, Paolo Bonzini wrote:
Direct access to MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD is important
for performance. Allow load/store of MSR_IA32_SPEC_CTRL, restore guest
IBRS on VM entry and set it to 0 on VM exit (because Linux does not use
it yet).
Signed-off-by: Paolo Bonzini
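The entry side described above ("restore guest IBRS on VM entry") amounts to
something like the following sketch, based on the patch description rather
than the exact hunk:

/* Before VMLAUNCH/VMRESUME: install the guest's SPEC_CTRL value. The
 * host value is 0 (Linux does not use IBRS yet), so a zero guest value
 * needs no write at all. */
if (have_spec_ctrl && vmx->spec_ctrl != 0)
        wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);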
On 08/01/18 20:08, Paolo Bonzini wrote:
From: Tom Lendacky
Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
going to run a VCPU different from what was previously run. Nested
virtualization uses the same VMCB for the second level guest, but the
L1 hypervisor should be us
+++ b/arch/x86/kvm/x86.c
@@ -1032,6 +1032,7 @@ unsigned int kvm_get_pt_addr_cnt(void)
MSR_IA32_RTIT_ADDR1_A, MSR_IA32_RTIT_ADDR1_B,
MSR_IA32_RTIT_ADDR2_A, MSR_IA32_RTIT_ADDR2_B,
MSR_IA32_RTIT_ADDR3_A, MSR_IA32_RTIT_ADDR3_B,
+ MSR_IA32_SPEC_CTRL,
};
static unsigned num_msrs_to_save;
Reviewed-by: Liran Alon
On 08/01/18 21:18, Jim Mattson wrote:
Guest usage of MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD should be
predicated on guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL).
On Mon, Jan 8, 2018 at 10:08 AM, Paolo Bonzini wrote:
Direct access to MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD is important
for
- pbonz...@redhat.com wrote:
> - Original Message -
> > From: "David Woodhouse"
> > To: "Paolo Bonzini" ,
> linux-kernel@vger.kernel.org, k...@vger.kernel.org
> > Cc: jmatt...@google.com, aligu...@amazon.com, "thomas lendacky"
> , b...@alien8.de
> > Sent: Monday, January 8, 2018 8:41
- haozhong.zh...@intel.com wrote:
> On 01/07/18 00:26 -0700, Ross Zwisler wrote:
> > On Wed, Aug 23, 2017 at 10:21 PM, Wanpeng Li
> wrote:
> > > From: Wanpeng Li
> > >
> > > vmx_complete_interrupts() assumes that the exception is always
> injected,
> > > so it would be dropped by kvm_clear_
- pbonz...@redhat.com wrote:
> On 08/01/2018 21:00, Liran Alon wrote:
> >
> >
> > On 08/01/18 20:08, Paolo Bonzini wrote:
> >> From: Tom Lendacky
> >>
> >> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU
> is
> >
t(7, 0, &eax, &ebx, &ecx, &edx);
> +
> + return edx & bit(KVM_CPUID_BIT_SPEC_CTRL);
> +}
> +
> +static inline bool cpu_has_ibpb_support(void)
> +{
> + return cpuid_ebx(0x80000008) & bit(KVM_CPUID_BIT_IBPB_SUPPORT);
> +}
> +
> static inline bool supports_cpuid_fault(struct kvm_vcpu *vcpu)
> {
> return vcpu->arch.msr_platform_info &
> MSR_PLATFORM_INFO_CPUID_FAULT;
> --
> 1.8.3.1
Reviewed-by: Liran Alon
{ .index = MSR_IA32_SYSENTER_CS, .always = true },
> --
> 1.8.3.1
Reviewed-by: Liran Alon
- ar...@linux.intel.com wrote:
> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> > The above ("IBRS simply disables the indirect branch predictor") was
> my
> > take-away message from private discussion with Intel. My guess is
> that
> > the vendors are just handwaving a spec that doesn't match
- ar...@linux.intel.com wrote:
> On 1/9/2018 7:00 AM, Liran Alon wrote:
> >
> > - ar...@linux.intel.com wrote:
> >
> >> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> >>> The above ("IBRS simply disables the indirect branch predictor"
> + (!msr_info->host_initiated &&
> + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)))
> + return 1;
> to_vmx(vcpu)->spec_ctrl = data;
> break;
> case MSR_IA32_CR_PAT:
> --
> 1.8.3.1
Reviewed-by: Liran Alon
The only thing I slightly dislike is that these MSRs are currently always
passed through to the guest, and therefore there is no case where
vmx_set_msr() is called with !msr_info->host_initiated.
Don't you think we should BUG_ON(!msr_info->host_initiated)?
-Liran
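Concretely, the shape being discussed looks like the sketch below; the
WARN_ON_ONCE is a hypothetical softer variant of the BUG_ON proposed above,
not code from the patch:

case MSR_IA32_SPEC_CTRL:
        /* With the MSR always passed through, guest writes never cause a
         * VM exit, so only host-initiated (KVM_SET_MSRS) writes should
         * ever reach this point. */
        WARN_ON_ONCE(!msr_info->host_initiated);
        if (!msr_info->host_initiated &&
            !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
                return 1;
        to_vmx(vcpu)->spec_ctrl = data;
        break;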
- ar...@linux.intel.com wrote:
> >> I'm sorry I'm not familiar with your L0/L1/L2 terminology
> >> (maybe it's before coffee has had time to permeate the brain)
> >
> > These are standard terminology for guest levels:
> > L0 == hypervisor that runs on bare-metal
> > L1 == hypervisor that run
/*
> + * Speculative execution past the above wrmsrl might encounter
> + * an indirect branch and use guest-controlled contents of the
> + * indirect branch predictor; block it.
> + */
> + asm("lfence");
> +
> /* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed
> */
> if (vmx->host_debugctlmsr)
> update_debugctlmsr(vmx->host_debugctlmsr);
> --
> 1.8.3.1
Reviewed-by: Liran Alon
> + rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
> + if (svm->spec_ctrl != 0)
> + wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
> + /*
> + * Speculative execution past the above wrmsrl might encounter
> + * an indirect branch and use guest-controlled contents of the
> + * indirect branch predictor; block it.
> + */
> + asm("lfence");
> +
> #ifdef CONFIG_X86_64
> wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> #else
> --
> 1.8.3.1
Reviewed-by: Liran Alon
static void svm_vcpu_run(struct kvm_vcpu
> > *vcpu)
> > #endif
> > );
> >
> > + if (have_spec_ctrl) {
> > + rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
> > + if (svm->spec_ctrl != 0)
On second thought, I think this condition is a bug.
Intel explicitly specified that after a #VMExit, the IBRS bit should be set
even if it was already set.
Therefore, you should remove the "if (svm->spec_ctrl != 0)" condition here.
Otherwise, guest BTB/BHB contents could be used by the host.
> > + wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > + }
> > + /*
> > + * Speculative execution past the above wrmsrl might encounter
> > + * an indirect branch and use guest-controlled contents of the
> > + * indirect branch predictor; block it.
> > + */
> > + asm("lfence");
> > +
> > #ifdef CONFIG_X86_64
> > wrmsrl(MSR_GS_BASE, svm->host.gs_base);
> > #else
> > --
> > 1.8.3.1
>
> Reviewed-by: Liran Alon
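With the fix Liran suggests, the exit path would read roughly as follows (a
sketch, not the final upstream code):

if (have_spec_ctrl) {
        rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl);
        /* Restore the host value (0) unconditionally: per the Intel
         * guidance cited above, SPEC_CTRL must be written on every
         * #VMExit even if the guest value was already 0, so guest-trained
         * BTB/BHB state cannot be reused by the host. */
        wrmsrl(MSR_IA32_SPEC_CTRL, 0);
}
/* Block speculation past the wrmsrl into an indirect branch. */
asm("lfence");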
{
> > + rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
> > + if (vmx->spec_ctrl != 0)
As I said on the AMD patch as well, I think this is a bug.
Intel specifies that we should set the IBRS bit on every #VMExit, even if
it was already set.
-Liran
> > + wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > + }
> > + /*
> > + * Speculative execution past the above wrmsrl might encounter
> > + * an indirect branch and use guest-controlled contents of the
> > + * indirect branch predictor; block it.
> > + */
> > + asm("lfence");
> > +
> > /* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed
> > */
> > if (vmx->host_debugctlmsr)
> > update_debugctlmsr(vmx->host_debugctlmsr);
> > --
> > 1.8.3.1
>
> Reviewed-by: Liran Alon
- pbonz...@redhat.com wrote:
> On 09/01/2018 17:48, Liran Alon wrote:
> >>>
> >>> + if (have_spec_ctrl) {
> >>> + rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl);
> >>> + if (vmx->spec_ctrl != 0)
> >>> +
LATE_USER_EXIT)
> - return 0;
> - if (er != EMULATE_DONE)
> - kvm_queue_exception(vcpu, UD_VECTOR);
> - return 1;
> - }
> + if (is_invalid_opcode(intr_info))
> + return handle_ud(vcpu);
>
> error_code = 0;
> if (intr_info & INTR_INFO_DELIVER_CODE_MASK)
> --
> 2.7.4
Reviewed-By: Liran Alon
- kernel...@gmail.com wrote:
> From: Wanpeng Li
>
> This patch introduces a Force Emulation Prefix (ud2a; .ascii "kvm") for
> "emulate the next instruction": the code will be executed by the emulator
> instead of the processor, for testing purposes.
I think this should be better explained i
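For illustration, a guest-side test of such a prefix could look like the
sketch below. The byte encoding follows the quoted description (ud2a is
0f 0b, followed by the ASCII bytes "kvm"); cpuid is an arbitrary victim
instruction:

/* Force-emulation prefix: the ud2a raises #UD, KVM spots the "kvm"
 * signature after it, skips the prefix, and emulates the following
 * instruction instead of injecting #UD into the guest. */
#define KVM_FEP ".byte 0x0f, 0x0b, 0x6b, 0x76, 0x6d; " /* ud2a; "kvm" */

static inline unsigned int fep_cpuid_eax(unsigned int leaf)
{
        unsigned int eax = leaf, ebx, ecx = 0, edx;

        asm volatile(KVM_FEP "cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
        return eax;
}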
- pbonz...@redhat.com wrote:
> On 27/03/2018 09:52, Liran Alon wrote:
> > In addition, I think this module parameter should be in kvm module
> > (not kvm_intel) and you should add similar logic to kvm_amd module
> (SVM)
>
> If you can move handle_ud to x86.c, then it
On 20/03/18 16:47, David Miller wrote:
From: Liran Alon
Date: Tue, 13 Mar 2018 17:07:22 +0200
Before this commit, dev_forward_skb() always cleared the packet's
per-network-namespace info, even if the packet doesn't cross
network namespaces.
There was a lot of discussion about
On 20/03/18 18:00, David Miller wrote:
From: Liran Alon
Date: Tue, 20 Mar 2018 17:34:38 +0200
I personally don't understand why we should maintain
backward-compatibility with this behaviour.
The reason is because not breaking things is a cornerstone of Linux
kernel development.
On 20/03/18 18:34, David Miller wrote:
From: Liran Alon
Date: Tue, 20 Mar 2018 18:11:49 +0200
1. Do we want to make a flag for every bug that is user-space visible?
I think there is place for consideration on a per-case basis. I still
don't see how a user can utilize this behaviour.
On 20/03/18 18:24, ebied...@xmission.com wrote:
I don't believe the current behavior is a bug.
I looked through the history. Basically skb_scrub_packet
started out as the scrubbing needed for crossing network
namespaces.
Then tunnels which needed 90% of the functionality started
calling it,
On 20/03/18 20:51, valdis.kletni...@vt.edu wrote:
On Tue, 20 Mar 2018 18:39:47 +0200, Liran Alon said:
What is your opinion in regards if it's OK to put the flag enabling this
"fix" in /proc/sys/net/core? Do you think it's sufficient?
Umm.. *which* /proc/sys/net/core?
l print nothing to dmesg.
After this change, "skb->mark 1337!" will be printed as necessary.
Signed-off-by: Liran Alon
Reviewed-by: Yuval Shaia
Signed-off-by: Yuval Shaia
---
include/linux/netdevice.h | 2 +-
net/core/dev.c| 6 +++---
2 files changed, 4 insertion
atch was applied… Thanks.
Reviewed-by: Liran Alon
> ---
> arch/x86/kvm/vmx.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 4555077d69ce..be6f13f1c25f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -
ive TSC offset to aid debugging. The VMX code is changed to
> look more similar to SVM, which is in my opinion nicer.
>
> Based on a patch by Liran Alon.
>
> Signed-off-by: Paolo Bonzini
I would have applied this refactoring change on top of my original version of
this patch. Easier to
> On 17 Nov 2018, at 0:09, syzbot
> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 006aa39cddee kmsan: don't instrument fixup_bad_iret()
> git tree: https://github.com/google/kmsan.git/master
- kernel...@gmail.com wrote:
> From: Wanpeng Li
>
> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
> It should be checked when PCIDE bit is not set, however commit
> 'd1cd3ce900441 ("KVM: MMU: check guest CR3 reserved bits based on
> its physical address width")' remove
- kernel...@gmail.com wrote:
> From: Wanpeng Li
>
> SDM volume 3, section 4.10.4:
>
> * MOV to CR3. The behavior of the instruction depends on the value of
> CR4.PCIDE:
> — If CR4.PCIDE = 1 and bit 63 of the instruction’s source operand is
> 1, the
> instruction is not required to inval
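Putting the two rules together, the check under discussion can be sketched as
below; the helper name is hypothetical, and kvm_read_cr4_bits() is the
accessor KVM uses for this kind of test:

#define CR3_PCID_INVD BIT_ULL(63)

/* MOV to CR3: with CR4.PCIDE = 1, bit 63 is merely a "don't invalidate
 * this PCID" hint and is stripped before loading; with CR4.PCIDE = 0 it
 * is a reserved bit, and setting it must #GP. */
static bool cr3_pcid_invd_is_reserved(struct kvm_vcpu *vcpu, u64 cr3)
{
        return (cr3 & CR3_PCID_INVD) &&
               !kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE);
}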
- kernel...@gmail.com wrote:
> 2018-05-13 15:53 GMT+08:00 Liran Alon :
> >
> > - kernel...@gmail.com wrote:
> >
> >> From: Wanpeng Li
> >>
> >> MSB of CR3 is a reserved bit if the PCIDE bit is not set in CR4.
> >> It sh
- kernel...@gmail.com wrote:
> 2018-05-13 16:28 GMT+08:00 Liran Alon :
> >
> > - kernel...@gmail.com wrote:
> >
> >> 2018-05-13 15:53 GMT+08:00 Liran Alon :
> >> >
> >> > - kernel...@gmail.com wrote:
> >> >
> >&g
o Bonzini
> Cc: Radim Krčmář
> Cc: Junaid Shahid
> Cc: Liran Alon
> Signed-off-by: Wanpeng Li
> ---
> v1 -> v2:
> * remove CR3_PCID_INVD in rsvd when PCIDE is 1 instead of
>removing CR3_PCID_INVD in new_value
>
> arch/x86/kvm/emulate.c | 4 +++-
> arch/
Sync both unicast and multicast lists instead of unicast twice.
Fixes: cfc80d9a116 ("net: Introduce net_failover driver")
Reviewed-by: Joao Martins
Signed-off-by: Liran Alon
---
drivers/net/net_failover.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/d
int kvm_vcpu_ioctl_enable_cap(struct
> kvm_vcpu *vcpu,
> return -EINVAL;
> return kvm_hv_activate_synic(vcpu, cap->cap ==
> KVM_CAP_HYPERV_SYNIC2);
> + case KVM_CAP_HYPERV_ENLIGHTENED_VMCS:
> + r = kvm_x86_ops->nested_enable_evmcs(vcpu, &vmcs_version);
> + if (!r) {
> + user_ptr = (void __user *)(uintptr_t)cap->args[0];
> + if (copy_to_user(user_ptr, &vmcs_version,
> + sizeof(vmcs_version)))
> + r = -EFAULT;
> + }
> + return r;
> +
> default:
> return -EINVAL;
> }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6270a3b38e9..5c4b79c1af19 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_GET_MSR_FEATURES 153
> #define KVM_CAP_HYPERV_EVENTFD 154
> #define KVM_CAP_HYPERV_TLBFLUSH 155
> +#define KVM_CAP_HYPERV_ENLIGHTENED_VMCS 156
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.14.4
Besides above comments,
Reviewed-By: Liran Alon
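From userspace, enabling the new capability would look roughly like this
minimal sketch, where vcpu_fd is assumed to be an open vCPU file descriptor:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static int enable_evmcs(int vcpu_fd, uint16_t *vmcs_version)
{
        struct kvm_enable_cap cap;

        memset(&cap, 0, sizeof(cap));
        cap.cap = KVM_CAP_HYPERV_ENLIGHTENED_VMCS;
        /* KVM copies the supported eVMCS version back through args[0],
         * as in the quoted kvm_vcpu_ioctl_enable_cap() hunk. */
        cap.args[0] = (uintptr_t)vmcs_version;

        return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}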
}
> + return 1;
> +}
> +
> /* Emulate the VMPTRST instruction */
> static int handle_vmptrst(struct kvm_vcpu *vcpu)
> {
> @@ -8858,6 +8936,9 @@ static int handle_vmptrst(struct kvm_vcpu
> *vcpu)
> if (!nested_vmx_check_permission(vcpu))
> return 1;
>
> + if (unlikely(to_vmx(vcpu)->nested.hv_evmcs))
> + return 1;
> +
> if (get_vmx_mem_address(vcpu, exit_qualification,
> vmx_instruction_info, true, &vmcs_gva))
> return 1;
> @@ -12148,7 +12229,10 @@ static int nested_vmx_run(struct kvm_vcpu
> *vcpu, bool launch)
> if (!nested_vmx_check_permission(vcpu))
> return 1;
>
> - if (!nested_vmx_check_vmcs12(vcpu))
> + if (!nested_vmx_handle_enlightened_vmptrld(vcpu))
> + return 1;
> +
> + if (!vmx->nested.hv_evmcs && !nested_vmx_check_vmcs12(vcpu))
> goto out;
>
> vmcs12 = get_vmcs12(vcpu);
> --
> 2.14.4
Reviewed-By: Liran Alon
- vkuzn...@redhat.com wrote:
> Adds hv_evmcs pointer and implements copy_enlightened_to_vmcs12() and
> copy_vmcs12_to_enlightened().
>
> prepare_vmcs02()/prepare_vmcs02_full() separation is not valid for
> Enlightened VMCS, do full sync for now.
>
> Suggested-by: Ladi Prosek
> Signed-off-b
- vkuzn...@redhat.com wrote:
> When Enlightened VMCS is in use by L1 hypervisor we can avoid
> vmwriting
> VMCS fields which did not change.
>
> Our first goal is to achieve minimal impact on traditional VMCS case
> so
> we're not wrapping each vmwrite() with an if-changed checker. We also
vcpu, u64 data, unsigned
> long len);
> void kvm_lapic_init(void);
> void kvm_lapic_exit(void);
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 06dd4cdb2ca8..a57766b940a5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2442,7 +2442,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu,
> struct msr_data *msr_info)
>
> break;
> case MSR_KVM_PV_EOI_EN:
> - if (kvm_lapic_enable_pv_eoi(vcpu, data))
> + if (kvm_lapic_enable_pv_eoi(vcpu, data, sizeof(u8)))
> return 1;
> break;
>
> --
> 2.14.4
Reviewed-By: Liran Alon
> On 26 Dec 2018, at 10:15, Yang Weijiang wrote:
>
> This bit controls whether guest CET states will be loaded on guest entry.
>
> Signed-off-by: Zhang Yi Z
> Signed-off-by: Yang Weijiang
> ---
> arch/x86/kvm/vmx.c | 19 +++
> 1 file changed, 19 insertions(+)
>
> diff --git
n terms of ABI guarantees. Therefore we are
> still in time to break things and conform as much as possible to the
> interface used for VMX.
>
> Suggested-by: Jim Mattson
> Suggested-by: Liran Alon
> Signed-off-by: Paolo Bonzini
> ---
> arch/x86/kvm/vmx.c | 2 +-
> 1 fil
> On 7 Nov 2018, at 14:10, Alexander Potapenko wrote:
>
> On Wed, Nov 7, 2018 at 2:38 AM syzbot
> wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: 88b95ef4c780 kmsan: use MSan assembly instrumentation
>> git tree: https://github.com/google/kmsan.git/master
> On 7 Nov 2018, at 14:47, Paolo Bonzini wrote:
>
> On 07/11/2018 13:10, Alexander Potapenko wrote:
>> This appears to be a real bug in KVM.
>> Please see a simplified reproducer attached.
>
> Thanks, I agree it's a real bug. The basic issue is that the
> kvm_state->size member is too small
> On 7 Nov 2018, at 20:58, syzbot
> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 7438a3b20295 kmsan: print user address when reporting info..
> git tree: https://github.com/google/kmsan.git/master
> On 28 Sep 2018, at 9:12, Wanpeng Li wrote:
>
> From: Wanpeng Li
>
> In a cloud environment, lapic_timer_advance_ns needs to be tuned for every
> CPU generation and every host kernel version (the
> kvm-unit-tests/tscdeadline_latency.flat
> is 5700 cycles for the upstream kernel and 96
> On 8 Oct 2018, at 13:59, Wanpeng Li wrote:
>
> On Mon, 8 Oct 2018 at 05:02, Liran Alon wrote:
>>
>>
>>
>>> On 28 Sep 2018, at 9:12, Wanpeng Li wrote:
>>>
>>> From: Wanpeng Li
>>>
>>> In cloud envi
>
> string[12] = 0;
> if (strncmp(string, "KVMKVMKVM\0\0\0", 12) == 0)
> printf("kvm guest\n");
> else
> printf("bare hardware\n");
> }
>
> Suggested-by: Andrew Cooper
> Cc: Paolo Bon
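The quoted test program boils down to the following self-contained sketch.
Leaf 0x40000000 is the standard hypervisor-signature CPUID leaf; raw inline
asm is used because compiler helpers such as __get_cpuid() reject this range:

#include <stdio.h>
#include <string.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;
        char sig[13];

        /* Hypervisors return their vendor signature in ebx/ecx/edx. */
        asm volatile("cpuid"
                     : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                     : "a"(0x40000000));
        memcpy(sig + 0, &ebx, 4);
        memcpy(sig + 4, &ecx, 4);
        memcpy(sig + 8, &edx, 4);
        sig[12] = 0;

        if (strncmp(sig, "KVMKVMKVM\0\0\0", 12) == 0)
                printf("kvm guest\n");
        else
                printf("bare hardware\n");
        return 0;
}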
is called from RSM.
>
> Reported-by: Jon Doron
> Suggested-by: Liran Alon
> Fixes: 5bea5123cbf0 ("KVM: VMX: check nested state and CR4.VMXE against SMM")
> Signed-off-by: Vitaly Kuznetsov
Patch looks good to me.
Reviewed-by: Liran Alon
> ---
> - Instread of putting the t
> On 26 Mar 2019, at 15:48, Vitaly Kuznetsov wrote:
>
> Liran Alon writes:
>
>>> On 26 Mar 2019, at 15:07, Vitaly Kuznetsov wrote:
>>> - Instread of putting the temporary HF_SMM_MASK drop to
>>> rsm_enter_protected_mode() (as was suggested by
my guest.
>
> Signed-off-by: Alexander Graf
With some small improvements I wrote inline below:
Reviewed-by: Liran Alon
> ---
> arch/x86/kvm/vmx/vmx.c | 22 ++
> 1 file changed, 22 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/v
* through the full KVM IRQ code, so refuse to take
> + * any direct PI assignments here.
> + */
> + pr_debug("SVM: %s: use legacy intr remap mode for irq %u\n",
> + __func__, irq.vector);
> + return
used only in the VMWare case and is obsoleted by having the emulator
> itself reinject #GP.
>
> Signed-off-by: Sean Christopherson
Reviewed-by: Liran Alon
> ---
> arch/x86/include/asm/kvm_host.h | 3 +--
> arch/x86/kvm/svm.c | 10 ++
> arch/x86/kvm/vmx
in case vCPU
CPL!=0.
In both cases, only #UD is injected into the guest, without userspace being
aware of it.
The problem is that if we changed this ABI to not queue #UD on emulation
error, we would definitely break userspace VMMs that rely on it when they
re-enter the guest in this scenario and expect #UD to be injected.
Therefore, the only way to change this behaviour is to introduce a new KVM_CAP
that needs to be explicitly enabled from userspace.
But because most userspace VMMs most likely just terminate the guest on an
emulation failure, it's probably not worth it and Sean's commit is good
enough.
For the commit itself:
Reviewed-by: Liran Alon
-Liran
kvm_vcpu_do_singlestep(vcpu, &r);
> + r = kvm_vcpu_do_singlestep(vcpu);
> if (!ctxt->have_exception ||
> exception_type(ctxt->exception.vector) == EXCPT_TRAP)
> __kvm_set_rflags(vcpu, ctxt->eflags);
> --
> 2.22.0
>
Reviewed-by: Liran Alon
-Liran
w a
> future patch to move #GP injection (for emulation failure) into
> kvm_emulate_instruction() without having to plumb in the error code.
>
> Signed-off-by: Sean Christopherson
Reviewed-by: Liran Alon
-Liran
> ---
> arch/x86/kvm/svm.c | 6 +-
> arch/x86/kvm/vmx/
only one that uses
“no #UD on fail”.
The diff itself looks fine to me, therefore:
Reviewed-by: Liran Alon
-Liran
> ---
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/svm.c | 3 +--
> arch/x86/kvm/vmx/vmx.c | 3 +--
> arch/x86/kvm/x86.c |
UD interception as well. :P
Besides minor comments inline below:
Reviewed-by: Liran Alon
-Liran
>
> Signed-off-by: Sean Christopherson
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/svm.c | 9 ++---
> arch/x86/kvm/vmx/vmx.c | 9 ++-
> On 23 Aug 2019, at 16:21, Liran Alon wrote:
>
>
>
>> On 23 Aug 2019, at 4:07, Sean Christopherson
>> wrote:
>>
>> The "no #UD on fail" is used only in the VMWare case, and for the VMWare
>> scenario it really means "#GP instea
> On 23 Aug 2019, at 4:07, Sean Christopherson
> wrote:
>
> Add an explicit emulation type for forced #UD emulation and use it to
> detect that KVM should unconditionally inject a #UD instead of falling
> into its standard emulation failure handling.
>
> Signed-off-by: Sean Christopherson
> On 23 Aug 2019, at 4:07, Sean Christopherson
> wrote:
>
> Immediately inject a #UD and return EMULATE done if emulation fails when
> handling an intercepted #UD. This helps pave the way for removing
> EMULATE_FAIL altogether.
>
> Signed-off-by: Sean Christopherson
I suggest squashing th
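The behaviour the patch describes reduces to something like the sketch below
inside the #UD intercept path; names follow that era's emulator API, and the
exact diff is not shown in the snippet:

/* Handling an intercepted #UD: if emulation fails, reflect a #UD into
 * the guest and report the exit as handled (EMULATE_DONE semantics)
 * instead of bailing out to userspace with EMULATE_FAIL. */
er = kvm_emulate_instruction(vcpu, EMULTYPE_TRAP_UD);
if (er == EMULATE_FAIL) {
        kvm_queue_exception(vcpu, UD_VECTOR);
        return 1;
}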
> On 23 Aug 2019, at 17:44, Sean Christopherson
> wrote:
>
> On Fri, Aug 23, 2019 at 04:47:14PM +0300, Liran Alon wrote:
>>
>>
>>> On 23 Aug 2019, at 4:07, Sean Christopherson
>>> wrote:
>>>
>>> Add an explicit emulation typ
_accept_irq() which also only ever passes Fixed and LowPriority
> interrupts as posted interrupts into the guest.
>
> This fixes a bug I have with code which configures real hardware to
> inject virtual SMIs into my guest.
>
> Signed-off-by: Alexander Graf
Reviewed-by: Liran Al
so that the vCPU's 64-bit mode is determined
> directly from EFER_LMA and the VMCS checks are based on that, which
> matches section 26.2.4 of the SDM.
>
> Cc: Sean Christopherson
> Cc: Jim Mattson
> Cc: Krish Sadhukhan
> Fixes: 5845038c111db27902bc220a4f70070fe945871c
> Signed-off-by: Paolo Bonzini
> ---
Reviewed-by: Liran Alon
pu);
> else
> guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr;
> ept_load_pdptrs(vcpu);
> }
>
> - vmcs_writel(GUEST_CR3, guest_cr3);
> + if (!skip_cr3)
Nit: It's a matter of taste, but I prefer positive conditions, i.e. "bool
write_guest_cr3".
Anyway, code seems valid to me. Nice catch.
Reviewed-by: Liran Alon
-Liran
> + vmcs_writel(GUEST_CR3, guest_cr3);
> }
>
> int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> --
> 2.22.0
>
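For what it's worth, the positive-condition spelling suggested above would
read as follows; nested_ept_active stands in for the patch's actual guard
condition:

bool write_guest_cr3 = true;

/* In the nested-EPT case the quoted patch handles, vmcs12 already
 * supplied L2's CR3, so GUEST_CR3 must be left untouched. */
if (nested_ept_active)
        write_guest_cr3 = false;

if (write_guest_cr3)
        vmcs_writel(GUEST_CR3, guest_cr3);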
> On 27 Sep 2019, at 17:27, Sean Christopherson
> wrote:
>
> On Fri, Sep 27, 2019 at 03:06:02AM +0300, Liran Alon wrote:
>>
>>
>>> On 27 Sep 2019, at 0:43, Sean Christopherson
>>> wrote:
>>>
>>> Write the desired L2 CR3
On 28/04/2020 18:25, Alexander Graf wrote:
On 27.04.20 13:44, Liran Alon wrote:
On 27/04/2020 10:56, Paraschiv, Andra-Irina wrote:
On 25/04/2020 18:25, Liran Alon wrote:
On 23/04/2020 16:19, Paraschiv, Andra-Irina wrote:
The memory and CPUs are carved out of the primary VM, they are
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> + if (evmcs_ver)
> + ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
>
> /*
>* Default number of spinlock retry attempts, matches
> --
> 2.20.1
>
Seems to me that there are 2 unrelated patches here. Why not split
them?
For the content itself: Reviewed-by: Liran Alon
> On 24 Jan 2019, at 19:39, Vitaly Kuznetsov wrote:
>
> Liran Alon writes:
>
>>> On 24 Jan 2019, at 19:15, Vitaly Kuznetsov wrote:
>>>
>>> We probably shouldn't be suggesting using Enlightened VMCS when it's not
>>> enabled (not s
> On 13 May 2019, at 18:15, Peter Zijlstra wrote:
>
> On Mon, May 13, 2019 at 04:38:09PM +0200, Alexandre Chartre wrote:
>> From: Liran Alon
>>
>> Export symbols needed to create, manage, populate and switch
>> a mm from a kernel module (kvm in this case).
&
uch as L1TF.
>
> These patches are based on original patches from Liran Alon, completed
> with additional patches to effectively create KVM address space different
> from the full kernel address space.
Great job pushing this forward! Thank you!
>
> The current code is jus
> On 13 May 2019, at 21:17, Andy Lutomirski wrote:
>
>> I expect that the KVM address space can eventually be expanded to include
>> the ioctl syscall entries. By doing so, and also adding the KVM page table
>> to the process userland page table (which should be safe to do because the
>> KVM a
> On 13 May 2019, at 22:31, Nakajima, Jun wrote:
>
> On 5/13/19, 7:43 AM, "kvm-ow...@vger.kernel.org on behalf of Alexandre
> Chartre" wrote:
>
>Proposal
>
>
>To handle both these points, this series introduce the mechanism of KVM
>address space isolation. Note that
> On 13 May 2019, at 18:15, Peter Zijlstra wrote:
>
> On Mon, May 13, 2019 at 04:38:32PM +0200, Alexandre Chartre wrote:
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index 46df4c6..317e105 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -33,6 +33,10 @@
>>
> On 14 May 2019, at 0:42, Nakajima, Jun wrote:
>
>
>
>> On May 13, 2019, at 2:16 PM, Liran Alon wrote:
>>
>>> On 13 May 2019, at 22:31, Nakajima, Jun wrote:
>>>
>>> On 5/13/19, 7:43 AM, "kvm-ow...@vger.kernel.org on beha
> On 14 May 2019, at 10:29, Peter Zijlstra wrote:
>
>
> (please, wrap our emails at 78 chars)
>
> On Tue, May 14, 2019 at 12:08:23AM +0300, Liran Alon wrote:
>
>> 3) From (2), we should have theoretically deduced that for every
>> #VMExit, there is a nee
> On 14 May 2019, at 5:07, Andy Lutomirski wrote:
>
> On Mon, May 13, 2019 at 2:09 PM Liran Alon wrote:
>>
>>
>>
>>> On 13 May 2019, at 21:17, Andy Lutomirski wrote:
>>>
>>>> I expect that the KVM address space can eventually
> On 28 Mar 2019, at 22:31, Vitaly Kuznetsov wrote:
>
> This is embarrassing but we have another Windows/Hyper-V issue to work around
> in KVM (or QEMU). Hope "RFC" makes it less offensive.
>
> It was noticed that Hyper-V guest on q35 KVM/QEMU VM hangs on boot if e.g.
> 'piix4-usb-uhci' device
> On 29 Mar 2019, at 12:14, Vitaly Kuznetsov wrote:
>
> Liran Alon writes:
>
>>> On 28 Mar 2019, at 22:31, Vitaly Kuznetsov wrote:
>>>
>>> This is embarrassing but we have another Windows/Hyper-V issue to work around
>>> in KVM (or QEMU). Hop
> On 29 Mar 2019, at 18:01, Paolo Bonzini wrote:
>
> On 29/03/19 15:40, Vitaly Kuznetsov wrote:
>> Paolo Bonzini writes:
>>
>>> On 28/03/19 21:31, Vitaly Kuznetsov wrote:
The 'hang' scenario develops like this:
1) Hyper-V boots and QEMU is trying to inject two irq simultaneou
> On 1 Apr 2019, at 11:39, Vitaly Kuznetsov wrote:
>
> Paolo Bonzini writes:
>
>> On 29/03/19 16:32, Liran Alon wrote:
>>> Paolo I am not sure this is the case here. Please read my other
>>> replies in this email thread.
>>>
>>> I th
> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov wrote:
>
> When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
> according to TLFS, this should "transition an enlightened VMCS from the
> active to the non-active state". It is, however, wrong to assume that
> it is only valid to do VMC