Hello,

Thanks for the patch.  Comments below:
On Tue, Apr 20, 2021 at 05:37:36PM +0800, Chenyi Qiang wrote:
> Virtual Machines can exploit bus locks to degrade the performance of
> system. To address this kind of performance DOS attack, bus lock VM exit
> is introduced in KVM and it will report the bus locks detected in guest,
> which can help userspace to enforce throttling policies.
> 

Is there anything today that would protect the system from similar
attacks from userspace with access to /dev/kvm?

> The availability of bus lock VM exit can be detected through the
> KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
> policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
> bitmap is the only supported strategy at present. It indicates that KVM
> will exit to userspace to handle the bus locks.
> 
> This patch adds a ratelimit on the bus locks acquired in guest as a
> mitigation policy.
> 
> Introduce a new field "bld" to record the limited speed of bus locks in
> target VM. The user can specify it through the "bus-lock-detection"
> as a machine property. In current implementation, the default value of
> the speed is 0 per second, which means no restriction on the bus locks.
> 
> Ratelimit enforced in data transmission uses a time slice of 100ms to
> get smooth output during regular operations in block jobs. As for
> ratelimit on bus lock detection, simply set the ratelimit interval to 1s
> and restrict the quota of bus lock occurrence to the value of "bld". A
> potential alternative is to introduce the time slice as a property
> which can help the user achieve more precise control.
> 
> The detail of Bus lock VM exit can be found in spec:
> https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
> 
> Signed-off-by: Chenyi Qiang <chenyi.qi...@intel.com>
> 
> ---
> Changes from RFC v1:
> - Remove the rip info output, as the rip can't reflect the bus lock
>   position correctly.
> - RFC v1:
>   https://lore.kernel.org/qemu-devel/20210317084709.15605-1-chenyi.qi...@intel.com/
> ---
>  hw/i386/x86.c         |  6 ++++++
>  include/hw/i386/x86.h |  7 +++++++
>  target/i386/kvm/kvm.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 55 insertions(+)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index ed796fe6ba..42d10857a6 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -1256,6 +1256,12 @@ static void x86_machine_initfn(Object *obj)
>      x86ms->pci_irq_mask = ACPI_BUILD_PCI_IRQS;
>      x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
>      x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
> +    x86ms->bld = 0;
> +
> +    object_property_add_uint64_ptr(obj, "bus-lock-detection",
> +                                   &x86ms->bld, OBJ_PROP_FLAG_READWRITE);
> +    object_property_set_description(obj, "bus-lock-detection",
> +                                    "Bus lock detection ratelimit");

I suggest using a name that indicates this is a rate limit
(e.g. "bus-lock-rate-limit").  "bus-lock-detection" sounds like a
boolean option to just enable/disable detection.

Please register a class property at x86_machine_class_init() instead.
The plan is to eventually stop using instance properties wherever
possible, as class properties make property introspection simpler.
See machine_class_init() for some examples of integer class properties.

Unfortunately object_class_property_add_uint64_ptr() is not very useful
currently, so you'll need to write your own getter/setter function.
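For illustration, here is a rough, untested sketch of what that could
look like, keeping the x86ms->bld field from this patch and using the
"bus-lock-rate-limit" name suggested above.  The getter/setter names
below are made up, and visit_type_uint64() needs "qapi/visitor.h":

static void x86_machine_get_bus_lock_ratelimit(Object *obj, Visitor *v,
                                               const char *name,
                                               void *opaque, Error **errp)
{
    X86MachineState *x86ms = X86_MACHINE(obj);
    uint64_t bld = x86ms->bld;

    /* output visit: pass a local copy of the current value */
    visit_type_uint64(v, name, &bld, errp);
}

static void x86_machine_set_bus_lock_ratelimit(Object *obj, Visitor *v,
                                               const char *name,
                                               void *opaque, Error **errp)
{
    X86MachineState *x86ms = X86_MACHINE(obj);

    /* input visit: store the user-supplied value directly */
    visit_type_uint64(v, name, &x86ms->bld, errp);
}

and then in x86_machine_class_init():

    object_class_property_add(oc, "bus-lock-rate-limit", "uint64",
                              x86_machine_get_bus_lock_ratelimit,
                              x86_machine_set_bus_lock_ratelimit,
                              NULL, NULL);
    object_class_property_set_description(oc, "bus-lock-rate-limit",
            "Bus lock detection rate limit, in locks per second (0 = no limit)");

With that, the object_property_add_uint64_ptr() call and the explicit
x86ms->bld = 0 assignment in x86_machine_initfn() can go away, as the
field should already be zero-initialized when the instance is created.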
>  }
>  
>  static void x86_machine_class_init(ObjectClass *oc, void *data)
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index c09b648dff..d6e198b228 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -74,6 +74,13 @@ struct X86MachineState {
>       * will be translated to MSI messages in the address space.
>       */
>      AddressSpace *ioapic_as;
> +
> +    /*
> +     * ratelimit enforced on detected bus locks, the default value
> +     * is 0 per second
> +     */

I suggest documenting here that 0 means no limit.

> +    uint64_t bld;
> +    RateLimit bld_limit;
>  };
>  
>  #define X86_MACHINE_SMM "smm"
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index 7fe9f52710..a75fac0404 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -130,6 +130,8 @@ static bool has_msr_mcg_ext_ctl;
>  static struct kvm_cpuid2 *cpuid_cache;
>  static struct kvm_msr_list *kvm_feature_msrs;
>  
> +#define SLICE_TIME 1000000000ULL /* ns */
> +

"slice time" is a very generic name.  I suggest "BLD_SLICE_TIME" or
"BUS_LOCK_SLICE_TIME".

>  int kvm_has_pit_state2(void)
>  {
>      return has_pit_state2;
> @@ -2267,6 +2269,27 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>          }
>      }
>  
> +    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
> +        X86MachineState *x86ms = X86_MACHINE(ms);
> +
> +        if (x86ms->bld > 0) {
> +            ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT);
> +            if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) {
> +                error_report("kvm: bus lock detection unsupported");
> +                return -ENOTSUP;
> +            }
> +            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0,
> +                                    KVM_BUS_LOCK_DETECTION_EXIT);
> +            if (ret < 0) {
> +                error_report("kvm: Failed to enable bus lock detection cap: %s",
> +                             strerror(-ret));
> +                return ret;
> +            }
> +
> +            ratelimit_set_speed(&x86ms->bld_limit, x86ms->bld, SLICE_TIME);
> +        }
> +    }
> +
>      return 0;
>  }
>  
> @@ -4221,6 +4244,18 @@ void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run)
>      }
>  }
>  
> +static void kvm_rate_limit_on_bus_lock(void)
> +{
> +    MachineState *ms = MACHINE(qdev_get_machine());
> +    X86MachineState *x86ms = X86_MACHINE(ms);
> +
> +    uint64_t delay_ns = ratelimit_calculate_delay(&x86ms->bld_limit, 1);
> +

This doesn't look thread safe.  Isn't this going to run from the VCPU
thread with no locks acquired?  (See the end of this mail for one
possible way to serialize it.)

> +    if (delay_ns) {
> +        g_usleep(delay_ns / SCALE_US);
> +    }
> +}
> +
>  MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
>  {
>      X86CPU *x86_cpu = X86_CPU(cpu);
> @@ -4236,6 +4271,9 @@ MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run)
>      } else {
>          env->eflags &= ~IF_MASK;
>      }
> +    if (run->flags & KVM_RUN_X86_BUS_LOCK) {
> +        kvm_rate_limit_on_bus_lock();
> +    }
>  
>      /* We need to protect the apic state against concurrent accesses from
>       * different threads in case the userspace irqchip is used. */
> @@ -4594,6 +4632,10 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>          ioapic_eoi_broadcast(run->eoi.vector);
>          ret = 0;
>          break;
> +    case KVM_EXIT_X86_BUS_LOCK:
> +        /* already handled in kvm_arch_post_run */
> +        ret = 0;
> +        break;
>      default:
>          fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
>          ret = -1;
> -- 
> 2.17.1
> 

-- 
Eduardo
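PS: regarding the thread-safety question on kvm_rate_limit_on_bus_lock()
above, one possible (untested) direction is to protect the shared
RateLimit state with a mutex owned by X86MachineState.  The
bld_limit_lock field and its init call below are just my own sketch,
not existing code:

/* include/hw/i386/x86.h: add a lock next to the RateLimit state
 * (QemuMutex comes from "qemu/thread.h") */
    RateLimit bld_limit;
    QemuMutex bld_limit_lock;   /* protects bld_limit across vCPU threads */

/* target/i386/kvm/kvm.c: initialize it where the speed is set */
    qemu_mutex_init(&x86ms->bld_limit_lock);
    ratelimit_set_speed(&x86ms->bld_limit, x86ms->bld, SLICE_TIME);

/* ...and serialize the per-vCPU updates */
static void kvm_rate_limit_on_bus_lock(void)
{
    MachineState *ms = MACHINE(qdev_get_machine());
    X86MachineState *x86ms = X86_MACHINE(ms);
    uint64_t delay_ns;

    /* ratelimit_calculate_delay() mutates bld_limit, so hold the lock */
    qemu_mutex_lock(&x86ms->bld_limit_lock);
    delay_ns = ratelimit_calculate_delay(&x86ms->bld_limit, 1);
    qemu_mutex_unlock(&x86ms->bld_limit_lock);

    /* sleep outside the lock so other vCPUs aren't blocked by our delay */
    if (delay_ns) {
        g_usleep(delay_ns / SCALE_US);
    }
}

Making the RateLimit helpers themselves thread-safe might be cleaner,
but that would touch their other users, so I'm only sketching the local
option here.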