On 04.01.2013, at 09:57, Bhushan Bharat-R65777 wrote: > > >> -----Original Message----- >> From: Alexander Graf [mailto:ag...@suse.de] >> Sent: Friday, January 04, 2013 2:11 PM >> To: Bhushan Bharat-R65777 >> Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; >> Anthony >> Liguori; qemu-devel@nongnu.org qemu-devel >> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix >> do_kvm_cpu_synchronize_state data integrity issue >> >> >> On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote: >> >>> >>> >>>> -----Original Message----- >>>> From: Marcelo Tosatti [mailto:mtosa...@redhat.com] >>>> Sent: Friday, January 04, 2013 7:08 AM >>>> To: Alexander Graf >>>> Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony >>>> Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777 >>>> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix >>>> do_kvm_cpu_synchronize_state data integrity issue >>>> >>>> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote: >>>>> >>>>> On 03.01.2013, at 19:48, Jason J. Herne wrote: >>>>> >>>>>> On 01/03/2013 08:56 AM, Alexander Graf wrote: >>>>>>>> static void do_kvm_cpu_synchronize_state(void *_args) >>>>>>>>> { >>>>>>>>> struct kvm_cpu_syncstate_args *args = _args; >>>>>>>>> + CPUArchState *env = args->env; >>>>>>>>> + int register_level = args->register_level; >>>>>>>>> >>>>>>> This probably becomes more readable if we explicitly revert back >>>>>>> to >>>> unsynced state first: >>>>>>> >>>>>>> /* Write back local modifications at our current level */ if >>>>>>> (register_level > env->kvm_vcpu_dirty) { >>>>>>> kvm_arch_put_registers(...); >>>>>>> env->kvm_vcpu_dirty = 0; >>>>>>> } >>>>>>> >>>>>>> and then do the sync we are requested to do: >>>>>>> >>>>>>> if (!env->kvm_vcpu_dirty) { >>>>>>> ... >>>>>>> } >>>>>> >>>>>> I agree, but only if we add a second conditional to the if 1st >>>>>> statement as >>>> such: >>>>>> >>>>>> if (args->env->kvm_vcpu_dirty && register_level > >>>>>> env->kvm_vcpu_dirty) >>>>>> >>>>>> This is to cover the case where the caller is asking for register level >>>>>> "1" >>>> and we're already dirty at level "2". In this case, nothing should >>>> happen and we'll need the "args->env->kvm_vcpu_dirty" to ensure that is the >> case. >>>>> >>>>> As before, I'd prefer to make this explicit: >>>>> >>>>>> >>>>>> static void do_kvm_cpu_synchronize_state(void *_args) { >>>>>> struct kvm_cpu_syncstate_args *args = _args; >>>>>> CPUArchState *env = args->env; >>>>>> int register_level = args->register_level; >>>>> >>>>> if (register_level > env->kvm_vcpu_dirty) { >>>>> /* We are more dirty than we need to - all is well */ >>>>> return; >>>>> } >>>>> >>>>>> >>>>>> /* Write back local modifications at our current level */ >>>>>> if (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) { >>>>>> kvm_arch_put_registers(env, env->kvm_vcpu_dirty); >>>>>> env->kvm_vcpu_dirty = 0; >>>>>> } >>>>>> >>>>>> if (!args->env->kvm_vcpu_dirty) { >>>>>> kvm_arch_get_registers(env, register_level); >>>>>> env->kvm_vcpu_dirty = register_level; >>>>>> } >>>>>> } >>>>>> >>>>>> Do you agree? Thanks for your time. :) >>>>> >>>>> Please also check out the discussions I've had with Bharat about his >>>>> watchdog >>>> patches. There we need a mechanism to synchronize registers only when >>>> we actually need to, in order to avoid potential race conditions with >>>> a kernel timer. >>>>> >>>>> That particular case doesn't work well with levels. We can have >>>>> multiple >>>> different potential race producers in the kernel that we need to >>>> avoid individually, so we can't always synchronize all of them when >>>> only one of them needs to be synchronized. >>>>> >>>>> The big question is what we should be doing about this. We basically >>>>> have 3 >>>> options: >>>>> >>>>> * implement levels, treat racy registers as "manually >>>>> synchronized", as >>>> Bharat's latest patch set does >>>>> * implement levels, add a bitmap for additional special >>>>> synchronization bits >>>>> * replace levels by bitmap >>>>> >>>>> I'm quite frankly not sure which one of the 3 would be the best way >>>>> forward. >>>>> >>>>> >>>>> Alex >>>> >>>> Implement read/write accessors for individual registers, similarly to >>>> how its done in the kernel. See kvm_cache_regs.h. >>> >>> Read/write for individual registers can be heavy for cases where multiple >> register needed to be updated. Is not it? >> >> It depends. We can still have classes of registers to synchronize that we >> could >> even synchronize using MANY_REG. For writes, things become even faster with >> individual accessors since we can easily coalesce all register writes until >> we >> hit the vcpu again into a MANY_REG array and write them in one go. >> >>> >>> For cases where multiple register needed to be synchronized then I would >>> like >> to elaborate the options as: >>> >>> Option 1: >>> int timer_func(CPUArchState env) >>> { >>> cpu_synchronize_state(env); >>> //update env->timer_registers >>> env->updated_regs_type |= TIMER_REGS_TYPE_BIT; >> >> To scale this one, it's probably also more clever to swap the logic: >> >> env->kvm_sync_extra |= SYNC_TIMER_REGS; >> cpu_synchronize_state(env); /* gets timer regs */ >> /* update env->timer_registers */ >> /* timer regs will be synchronized later because kvm_sync_extra has >> SYNC_TIMER_REGS set */ >> >> Your case right now is special in that we don't need to get any register >> state, >> but only write it. However, if we do >> >> cpu_synchronize_state(env); /* will not get timer regs */ >> env->kvm_sync_extra |= SYNC_TIMER_REGS; >> >> then the next >> >> cpu_synchronize_state(env); > > Currently the cpu_sychronize_state() will fetch registers only if " > !env->kvm_vcpu_dirty " . and on first cpu_synchromize_state it is set dirty. > I want same logic to continue with under this option.
That's exactly what this patch set here is changing, because the logic is too restrictive. > Once the registered, the rest of code can keep on changing registers and > setting respective bitmap in env->update_reg_type. Then finally on > kvm_arch_put_register() will check all bits in env->update_reg_type for > SET_SREGS ioctl. If we want to be extensible, we have to think outside of the "put" bottle and also consider more advanced "get" operations. Alex