Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

Alexander Graf Fri, 04 Jan 2013 00:41:12 -0800

On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote:

> 
> 
>> -----Original Message-----
>> From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
>> Sent: Friday, January 04, 2013 7:08 AM
>> To: Alexander Graf
>> Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony Liguori; qemu-
>> de...@nongnu.org qemu-devel; Bhushan Bharat-R65777
>> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix
>> do_kvm_cpu_synchronize_state data integrity issue
>> 
>> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote:
>>> 
>>> On 03.01.2013, at 19:48, Jason J. Herne wrote:
>>> 
>>>> On 01/03/2013 08:56 AM, Alexander Graf wrote:
>>>>>> static void do_kvm_cpu_synchronize_state(void *_args)
>>>>>>> {
>>>>>>>    struct kvm_cpu_syncstate_args *args = _args;
>>>>>>> +    CPUArchState *env = args->env;
>>>>>>> +    int register_level = args->register_level;
>>>>>>> 
>>>>> This probably becomes more readable if we explicitly revert back to
>> unsynced state first:
>>>>> 
>>>>> /* Write back local modifications at our current level */ if
>>>>> (register_level > env->kvm_vcpu_dirty) {
>>>>>    kvm_arch_put_registers(...);
>>>>>    env->kvm_vcpu_dirty = 0;
>>>>> }
>>>>> 
>>>>> and then do the sync we are requested to do:
>>>>> 
>>>>> if (!env->kvm_vcpu_dirty) {
>>>>>    ...
>>>>> }
>>>> 
>>>> I agree, but only if we add a second conditional to the if 1st statement  
>>>> as
>> such:
>>>> 
>>>> if (args->env->kvm_vcpu_dirty && register_level >
>>>> env->kvm_vcpu_dirty)
>>>> 
>>>> This is to cover the case where the caller is asking for register level "1"
>> and we're already dirty at level "2". In this case, nothing should happen and
>> we'll need the "args->env->kvm_vcpu_dirty" to ensure that is the case.
>>> 
>>> As before, I'd prefer to make this explicit:
>>> 
>>>> 
>>>> static void do_kvm_cpu_synchronize_state(void *_args) {
>>>>   struct kvm_cpu_syncstate_args *args = _args;
>>>>   CPUArchState *env = args->env;
>>>>   int register_level = args->register_level;
>>> 
>>> if (register_level > env->kvm_vcpu_dirty) {
>>>    /* We are more dirty than we need to - all is well */
>>>    return;
>>> }
>>> 
>>>> 
>>>>   /* Write back local modifications at our current level */
>>>>   if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) {
>>>>       kvm_arch_put_registers(env, env->kvm_vcpu_dirty);
>>>>       env->kvm_vcpu_dirty = 0;
>>>>   }
>>>> 
>>>>   if (!args->env->kvm_vcpu_dirty) {
>>>>       kvm_arch_get_registers(env, register_level);
>>>>       env->kvm_vcpu_dirty = register_level;
>>>>   }
>>>> }
>>>> 
>>>> Do you agree?  Thanks for your time. :)
>>> 
>>> Please also check out the discussions I've had with Bharat about his 
>>> watchdog
>> patches. There we need a mechanism to synchronize registers only when we
>> actually need to, in order to avoid potential race conditions with a kernel
>> timer.
>>> 
>>> That particular case doesn't work well with levels. We can have multiple
>> different potential race producers in the kernel that we need to avoid
>> individually, so we can't always synchronize all of them when only one of 
>> them
>> needs to be synchronized.
>>> 
>>> The big question is what we should be doing about this. We basically have 3
>> options:
>>> 
>>>  * implement levels, treat racy registers as "manually synchronized", as
>> Bharat's latest patch set does
>>>  * implement levels, add a bitmap for additional special synchronization 
>>> bits
>>>  * replace levels by bitmap
>>> 
>>> I'm quite frankly not sure which one of the 3 would be the best way forward.
>>> 
>>> 
>>> Alex
>> 
>> Implement read/write accessors for individual registers, similarly to how its
>> done in the kernel. See kvm_cache_regs.h.
> 
> Read/write for individual registers can be heavy for cases where multiple 
> register needed to be updated. Is not it?


It depends. We can still have classes of registers to synchronize that we could 
even synchronize using MANY_REG. For writes, things become even faster with 
individual accessors since we can easily coalesce all register writes until we 
hit the vcpu again into a MANY_REG array and write them in one go.

> 
> For cases where multiple register needed to be synchronized then I would like 
> to elaborate the options as:
> 
> Option 1:
> int timer_func(CPUArchState env)
> {
>       cpu_synchronize_state(env);
>       //update env->timer_registers
>       env->updated_regs_type |= TIMER_REGS_TYPE_BIT;

To scale this one, it's probably also more clever to swap the logic:

    env->kvm_sync_extra |= SYNC_TIMER_REGS;
    cpu_synchronize_state(env); /* gets timer regs */
    /* update env->timer_registers */
    /* timer regs will be synchronized later because kvm_sync_extra has 
SYNC_TIMER_REGS set */

Your case right now is special in that we don't need to get any register state, 
but only write it. However, if we do

    cpu_synchronize_state(env); /* will not get timer regs */
    env->kvm_sync_extra |= SYNC_TIMER_REGS;

then the next

    cpu_synchronize_state(env);

would overwrite the timer values again. So we would also need to indicate that 
the bits are dirty:

    cpu_synchronize_state(env); /* will not get timer regs */
    env->kvm_sync_extra |= SYNC_TIMER_REGS;
    env->kvm_sync_dirty |= SYNC_TIMER_REGS;

which cpu_synchronize_state() could use to make sure it doesn't read back any 
timer regs again. It would obviously also have to set it in its normal 
operation when it executed first ;).

> }
> 
> Change the kvm_arch_put_registers() to add followings:
> 
> int kvm_arch_put_registers(CPUArchState *env, int level)
> {
>       
>       If (env->updated_regs_type) {
>               struct kvm_sregs sregs;
>               If (env->updated_regs_type & TIMER_REGS_TYPE_BIT) {
>                       // update sregs->timer_registers;
>                       // may be we can have a bitmap to tell kernel for what 
> actually updated
>               }
>               If (env->updated_regs_type & XX_CPU_REGS_TYPE) {
>                       // update respective registers in sregs
>               }
> 
>               ioctl(env, KVM_GET_SREGS, &sregs);
>       }
> }
> 
> This mechanism will get all registers state but this is required only once 
> per entry in QEMU.

I don't understand this bit.

> 
> 
> Option 2:
> Define read_regs_type() and write_regs_type() for cases where it requires 
> multiple registers updates:
> 
> int read_regs_type((CPUArchState env, void *regs_ptr, int reg_type)
> {
>     -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc.
> 
>       Switch(reg_type) {
>       Case TIMER_REGS:
>               Struct *timer_regs = regs_ptr;
>               Ioctl(KVM_GET_SREGS, timer_regs, CPU_reg_type); 
>               Break;
> 
>       Default:
>               Printf(error);
> }
> 
> Similarly will be the write function:
> int write_regs_type((CPUArchState env, int reg_type)
> {
>     -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc.
> 
>       Switch(reg_type) {
>       Case TIMER_REGS:
>               Struct timer_regs;
>               Timer_regs.reg1 = env->timer_regs->reg1;
>               Timer_regs.reg2 = env->timer_regs->reg2;
>               Ioctl(env, KVM_SET_SREGS, &timer_regs, CPU_reg_type); 
>               Break;
> 
>       Default:
>               Printf(error);
> }
> 
> Int timer_func(CPUxxState env)
> {
>    Struct timer_regs;
>    read_regs_type((env, &timer_regs,TIMER_REGS);
>    //update env->timer_registers
>    Write_regs_type(env, TIMER_REGS)
> } 
> 
> This will get one type of register_types and can cause multiple IOCTL per 
> entry to QEMU.

This is still too explicit. How about

static inline void ppc_set_tsr(CPUState *env, target_ulong val)
{
    env->kvm_sync_extra |= SYNC_TIMER_REGS;
    cpu_synchronize_registers(env);
    env->spr[SPR_TSR] = val;
}

static inline void ppc_set_tcr(CPUState *env, target_ulong val)
{
    env->kvm_sync_extra |= SYNC_TIMER_REGS;
    cpu_synchronize_registers(env);
    env->spr[SPR_TCR] = val;
}

int timer_func(CPUState *env) {
    ppc_set_tsr(env, 0);
    ppc_set_tcr(env, 0);
}

The main benefit from this method is that we can do a simple search/replace to 
get rid of essentially all synchronization pitfalls where we might miss out on 
some code path that then doesn't synchronize its register set.

> -------
> 
> Keep on using GET/SET_ONE_REG when only one registers needed to be updated.

We can always use ONE_REG for any of the variants.

The main question I have is whether it makes sense to move from

  /* state subset only touched by the VCPU itself during runtime */
  #define KVM_PUT_RUNTIME_STATE   1
  /* state subset modified during VCPU reset */
  #define KVM_PUT_RESET_STATE     2
  /* full state set, modified during initialization or on vmload */
  #define KVM_PUT_FULL_STATE      3

to a bitmap. So we would always only synchronize KVM_PUT_RUNTIME_STATE, but 
keep a bitmap as explained above around for any state that is more elaborate. 
For easy conversion, we could even add bits for SYNC_RESET and SYNC_FULL = 
(SYNC_RESET | SYNC_FULL_REAL).

That should get rid of all the level based logic, giving us a lot more 
flexibility on what we want to synchronize.


Alex

Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix do_kvm_cpu_synchronize_state data integrity issue

Reply via email to