> -----Original Message----- > From: Alexander Graf [mailto:ag...@suse.de] > Sent: Friday, January 04, 2013 3:59 PM > To: Bhushan Bharat-R65777 > Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian Borntraeger; > Anthony > Liguori; qemu-devel@nongnu.org qemu-devel > Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix > do_kvm_cpu_synchronize_state data integrity issue > > > On 04.01.2013, at 11:23, Bhushan Bharat-R65777 wrote: > > > > > > >> -----Original Message----- > >> From: Alexander Graf [mailto:ag...@suse.de] > >> Sent: Friday, January 04, 2013 2:11 PM > >> To: Bhushan Bharat-R65777 > >> Cc: Marcelo Tosatti; jjhe...@linux.vnet.ibm.com; Christian > >> Borntraeger; Anthony Liguori; qemu-devel@nongnu.org qemu-devel > >> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix > >> do_kvm_cpu_synchronize_state data integrity issue > >> > >> > >> On 04.01.2013, at 05:22, Bhushan Bharat-R65777 wrote: > >> > >>> > >>> > >>>> -----Original Message----- > >>>> From: Marcelo Tosatti [mailto:mtosa...@redhat.com] > >>>> Sent: Friday, January 04, 2013 7:08 AM > >>>> To: Alexander Graf > >>>> Cc: jjhe...@linux.vnet.ibm.com; Christian Borntraeger; Anthony > >>>> Liguori; qemu- de...@nongnu.org qemu-devel; Bhushan Bharat-R65777 > >>>> Subject: Re: [Qemu-devel] [PATCH 7/7] KVM regsync: Fix > >>>> do_kvm_cpu_synchronize_state data integrity issue > >>>> > >>>> On Thu, Jan 03, 2013 at 08:09:22PM +0100, Alexander Graf wrote: > >>>>> > >>>>> On 03.01.2013, at 19:48, Jason J. Herne wrote: > >>>>> > >>>>>> On 01/03/2013 08:56 AM, Alexander Graf wrote: > >>>>>>>> static void do_kvm_cpu_synchronize_state(void *_args) > >>>>>>>>> { > >>>>>>>>> struct kvm_cpu_syncstate_args *args = _args; > >>>>>>>>> + CPUArchState *env = args->env; > >>>>>>>>> + int register_level = args->register_level; > >>>>>>>>> > >>>>>>> This probably becomes more readable if we explicitly revert back > >>>>>>> to > >>>> unsynced state first: > >>>>>>> > >>>>>>> /* Write back local modifications at our current level */ if > >>>>>>> (register_level > env->kvm_vcpu_dirty) { > >>>>>>> kvm_arch_put_registers(...); > >>>>>>> env->kvm_vcpu_dirty = 0; > >>>>>>> } > >>>>>>> > >>>>>>> and then do the sync we are requested to do: > >>>>>>> > >>>>>>> if (!env->kvm_vcpu_dirty) { > >>>>>>> ... > >>>>>>> } > >>>>>> > >>>>>> I agree, but only if we add a second conditional to the if 1st > >>>>>> statement as > >>>> such: > >>>>>> > >>>>>> if (args->env->kvm_vcpu_dirty && register_level > > >>>>>> env->kvm_vcpu_dirty) > >>>>>> > >>>>>> This is to cover the case where the caller is asking for register level > "1" > >>>> and we're already dirty at level "2". In this case, nothing should > >>>> happen and we'll need the "args->env->kvm_vcpu_dirty" to ensure > >>>> that is the > >> case. > >>>>> > >>>>> As before, I'd prefer to make this explicit: > >>>>> > >>>>>> > >>>>>> static void do_kvm_cpu_synchronize_state(void *_args) { struct > >>>>>> kvm_cpu_syncstate_args *args = _args; CPUArchState *env = > >>>>>> args->env; int register_level = args->register_level; > >>>>> > >>>>> if (register_level > env->kvm_vcpu_dirty) { > >>>>> /* We are more dirty than we need to - all is well */ > >>>>> return; > >>>>> } > >>>>> > >>>>>> > >>>>>> /* Write back local modifications at our current level */ if > >>>>>> (args->env->kvm_vcpu_dirty && register_level > env->kvm_vcpu_dirty) { > >>>>>> kvm_arch_put_registers(env, env->kvm_vcpu_dirty); > >>>>>> env->kvm_vcpu_dirty = 0; > >>>>>> } > >>>>>> > >>>>>> if (!args->env->kvm_vcpu_dirty) { > >>>>>> kvm_arch_get_registers(env, register_level); > >>>>>> env->kvm_vcpu_dirty = register_level; } } > >>>>>> > >>>>>> Do you agree? Thanks for your time. :) > >>>>> > >>>>> Please also check out the discussions I've had with Bharat about > >>>>> his watchdog > >>>> patches. There we need a mechanism to synchronize registers only > >>>> when we actually need to, in order to avoid potential race > >>>> conditions with a kernel timer. > >>>>> > >>>>> That particular case doesn't work well with levels. We can have > >>>>> multiple > >>>> different potential race producers in the kernel that we need to > >>>> avoid individually, so we can't always synchronize all of them when > >>>> only one of them needs to be synchronized. > >>>>> > >>>>> The big question is what we should be doing about this. We > >>>>> basically have 3 > >>>> options: > >>>>> > >>>>> * implement levels, treat racy registers as "manually > >>>>> synchronized", as > >>>> Bharat's latest patch set does > >>>>> * implement levels, add a bitmap for additional special > >>>>> synchronization bits > >>>>> * replace levels by bitmap > >>>>> > >>>>> I'm quite frankly not sure which one of the 3 would be the best way > forward. > >>>>> > >>>>> > >>>>> Alex > >>>> > >>>> Implement read/write accessors for individual registers, similarly > >>>> to how its done in the kernel. See kvm_cache_regs.h. > >>> > >>> Read/write for individual registers can be heavy for cases where > >>> multiple > >> register needed to be updated. Is not it? > >> > >> It depends. We can still have classes of registers to synchronize > >> that we could even synchronize using MANY_REG. For writes, things > >> become even faster with individual accessors since we can easily > >> coalesce all register writes until we hit the vcpu again into a MANY_REG > array and write them in one go. > >> > >>> > >>> For cases where multiple register needed to be synchronized then I > >>> would like > >> to elaborate the options as: > >>> > >>> Option 1: > >>> int timer_func(CPUArchState env) > >>> { > >>> cpu_synchronize_state(env); > >>> //update env->timer_registers > >>> env->updated_regs_type |= TIMER_REGS_TYPE_BIT; > >> > >> To scale this one, it's probably also more clever to swap the logic: > >> > >> env->kvm_sync_extra |= SYNC_TIMER_REGS; > >> cpu_synchronize_state(env); /* gets timer regs */ > >> /* update env->timer_registers */ > >> /* timer regs will be synchronized later because kvm_sync_extra > >> has SYNC_TIMER_REGS set */ > >> > >> Your case right now is special in that we don't need to get any > >> register state, but only write it. However, if we do > >> > >> cpu_synchronize_state(env); /* will not get timer regs */ > >> env->kvm_sync_extra |= SYNC_TIMER_REGS; > >> > >> then the next > >> > >> cpu_synchronize_state(env); > >> > >> would overwrite the timer values again. So we would also need to > >> indicate that the bits are dirty: > >> > >> cpu_synchronize_state(env); /* will not get timer regs */ > >> env->kvm_sync_extra |= SYNC_TIMER_REGS; > >> env->kvm_sync_dirty |= SYNC_TIMER_REGS; > >> > >> which cpu_synchronize_state() could use to make sure it doesn't read > >> back any timer regs again. > > > > yes > > > >> It would obviously also have to set it in its normal operation when > >> it executed first ;). > > > > We also need get all registers like for debugging and we should have flag > > like > ALL_REGS etc. > > Well, we could just take the current level of "general purpose registers" as > one > flag. > > > > > But I general I agree with the idea. > > > >> > >>> } > >>> > >>> Change the kvm_arch_put_registers() to add followings: > >>> > >>> int kvm_arch_put_registers(CPUArchState *env, int level) { > >>> > >>> If (env->updated_regs_type) { > >>> struct kvm_sregs sregs; > >>> If (env->updated_regs_type & TIMER_REGS_TYPE_BIT) { > >>> // update sregs->timer_registers; > >>> // may be we can have a bitmap to tell kernel for what > >> actually updated > >>> } > >>> If (env->updated_regs_type & XX_CPU_REGS_TYPE) { > >>> // update respective registers in sregs > >>> } > >>> > >>> ioctl(env, KVM_GET_SREGS, &sregs); > >>> } > >>> } > >>> > >>> This mechanism will get all registers state but this is required > >>> only once per > >> entry in QEMU. > >> > >> I don't understand this bit > >> > >>> > >>> > >>> Option 2: > >>> Define read_regs_type() and write_regs_type() for cases where it > >>> requires > >> multiple registers updates: > >>> > >>> int read_regs_type((CPUArchState env, void *regs_ptr, int reg_type) { > >>> -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc. > >>> > >>> Switch(reg_type) { > >>> Case TIMER_REGS: > >>> Struct *timer_regs = regs_ptr; > >>> Ioctl(KVM_GET_SREGS, timer_regs, CPU_reg_type); > >>> Break; > >>> > >>> Default: > >>> Printf(error); > >>> } > >>> > >>> Similarly will be the write function: > >>> int write_regs_type((CPUArchState env, int reg_type) { > >>> -> reg_type are bitmap like: TIMER_REGS, DEBUG_REGS, PERFMON_REGS etc. > >>> > >>> Switch(reg_type) { > >>> Case TIMER_REGS: > >>> Struct timer_regs; > >>> Timer_regs.reg1 = env->timer_regs->reg1; > >>> Timer_regs.reg2 = env->timer_regs->reg2; > >>> Ioctl(env, KVM_SET_SREGS, &timer_regs, CPU_reg_type); > >>> Break; > >>> > >>> Default: > >>> Printf(error); > >>> } > >>> > >>> Int timer_func(CPUxxState env) > >>> { > >>> Struct timer_regs; > >>> read_regs_type((env, &timer_regs,TIMER_REGS); > >>> //update env->timer_registers > >>> Write_regs_type(env, TIMER_REGS) > >>> } > >>> > >>> This will get one type of register_types and can cause multiple > >>> IOCTL per > >> entry to QEMU. > >> > >> This is still too explicit. How about > >> > >> static inline void ppc_set_tsr(CPUState *env, target_ulong val) { > >> env->kvm_sync_extra |= SYNC_TIMER_REGS; > >> cpu_synchronize_registers(env); > >> env->spr[SPR_TSR] = val; > > > > You also want env->kvm_sync_dirty also, right? > > Not quite, since SYNC_TIMER_REGS spans more than only TSR. So we need to make > sure we fetch the non-TSR register values before we can declare TIMER_REGS as > dirty.
Right , I actually want communicate mark dirty the TIMER_REGS only :) > > > > >> } > >> > >> static inline void ppc_set_tcr(CPUState *env, target_ulong val) { > >> env->kvm_sync_extra |= SYNC_TIMER_REGS; > >> cpu_synchronize_registers(env); > >> env->spr[SPR_TCR] = val; > > > > Same as above > > > >> } > >> > >> int timer_func(CPUState *env) { > >> ppc_set_tsr(env, 0); > > > > SYNC_TIMERS_REGS are get only once on first call. > > Sure, but that's implicit. A user of these functions doesn't have to care > which > one comes first. Agree -Bharat > > > Alex >