Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-09-10 Thread alvise rigo
Hi Alex,
On Thu, Sep 10, 2015 at 6:19 PM, Alex Bennée wrote:
>
> alvise rigo writes:
>
>> Hi Paolo,
>>
>> A brief update on this. I have a first implementation of the idea you
>> proposed, though it's not working really well. The failing rate of SCs
>> for some reason is very high.
>
> Due to hi

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-09-10 Thread Paolo Bonzini
On 10/09/2015 15:04, alvise rigo wrote:
> Hi Paolo,
>
> A brief update on this. I have a first implementation of the idea you
> proposed, though it's not working really well. The failing rate of SCs
> for some reason is very high.
> Instead of trying to fix it, I came up with this alternative de

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-09-10 Thread Alex Bennée
alvise rigo writes:
> Hi Paolo,
>
> A brief update on this. I have a first implementation of the idea you
> proposed, though it's not working really well. The failing rate of SCs
> for some reason is very high.
Due to high memory contention on the EXCL page?
> Instead of trying to fix it, I ca

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-09-10 Thread alvise rigo
Hi Paolo, A brief update on this. I have a first implementation of the idea you proposed, though it's not working really well. The failing rate of SCs for some reason is very high. Instead of trying to fix it, I came up with this alternative design: we still use 8 bits per page and we group the sm

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread alvise rigo
On Wed, Aug 12, 2015 at 4:10 PM, Paolo Bonzini wrote:
>
>
> On 12/08/2015 16:04, alvise rigo wrote:
>>> > clear algorithm:
>>> >    if bytemap[vaddr] == 254
>>> >       bytemap[vaddr] = CPU_ID
>> Isn't this also required for the clear algorithm?
>>
>> if bytemap[vaddr] < 254
>>     /* this

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread Paolo Bonzini
On 12/08/2015 16:04, alvise rigo wrote:
>> > clear algorithm:
>> >    if bytemap[vaddr] == 254
>> >       bytemap[vaddr] = CPU_ID
> Isn't this also required for the clear algorithm?
>
> if bytemap[vaddr] < 254
>     /* this can happen for the TLB_EXCL slow path effect */
>     bytema

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread alvise rigo
On Wed, Aug 12, 2015 at 2:36 PM, Paolo Bonzini wrote:
>
>
> On 12/08/2015 09:31, alvise rigo wrote:
>> I think that tlb_flush_entry is not enough, since in theory another
>> vCPU could have a different TLB address referring the same phys
>> address.
>
> You're right, this is a TLB so it's virtuall

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread Peter Maydell
On 12 August 2015 at 13:36, Paolo Bonzini wrote:
>
>
> On 12/08/2015 09:31, alvise rigo wrote:
>> I think that tlb_flush_entry is not enough, since in theory another
>> vCPU could have a different TLB address referring the same phys
>> address.
>
> You're right, this is a TLB so it's virtually-ind

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread Paolo Bonzini
On 12/08/2015 09:31, alvise rigo wrote:
> I think that tlb_flush_entry is not enough, since in theory another
> vCPU could have a different TLB address referring the same phys
> address.
You're right, this is a TLB so it's virtually-indexed. :( I'm not sure what happens on ARM, since it has a

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread alvise rigo
I think that tlb_flush_entry is not enough, since in theory another vCPU could have a different TLB address referring the same phys address.
alvise
On Tue, Aug 11, 2015 at 6:32 PM, Paolo Bonzini wrote:
>
>
> On 11/08/2015 18:11, alvise rigo wrote:
>>> > Why flush the entire cache (I understand y

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-11 Thread Paolo Bonzini
On 11/08/2015 18:11, alvise rigo wrote:
>> > Why flush the entire cache (I understand you mean TLB)?
> Sorry, I meant the TLB.
> If for each removal of an exclusive entry we set also the bit to 1, we
> force the following LL to make a tlb_flush() on every vCPU.
What if you only flush one entry w

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-11 Thread alvise rigo
On Tue, Aug 11, 2015 at 5:55 PM, Paolo Bonzini wrote:
>
>
> On 11/08/2015 17:54, alvise rigo wrote:
>> This can lead to an excessive rate of flush requests, since for one
>> CPU that removes the TLB_EXCL flag, all the others that are competing
>> for the same excl address will need to flush the en

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-11 Thread Paolo Bonzini
On 11/08/2015 17:54, alvise rigo wrote:
> This can lead to an excessive rate of flush requests, since for one
> CPU that removes the TLB_EXCL flag, all the others that are competing
> for the same excl address will need to flush the entire cache and
> start all over again.
Why flush the entire c

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-11 Thread alvise rigo
On Tue, Aug 11, 2015 at 3:52 PM, Paolo Bonzini wrote:
>
>
> On 07/08/2015 19:03, Alvise Rigo wrote:
>> +static inline int cpu_physical_memory_excl_atleast_one_clean(ram_addr_t
>> addr)
>> +{
>> +    unsigned long *bitmap = ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE];
>> +    unsigned long next,

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-11 Thread alvise rigo
On Tue, Aug 11, 2015 at 4:24 PM, Peter Maydell wrote:
> On 11 August 2015 at 14:52, Paolo Bonzini wrote:
>>
>> I don't think real hardware has ll/sc per CPU.
>
> On ARM, the exclusives are handled by the 'global monitor', which
> supports tracking an exclusive access per CPU.
>
>> Can we have th

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-11 Thread Peter Maydell
On 11 August 2015 at 14:52, Paolo Bonzini wrote:
>
> I don't think real hardware has ll/sc per CPU.
On ARM, the exclusives are handled by the 'global monitor', which supports tracking an exclusive access per CPU.
> Can we have the bitmap as:
>
> - 0 if one or more CPUs have the address set to e

Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-11 Thread Paolo Bonzini
On 07/08/2015 19:03, Alvise Rigo wrote:
> +static inline int cpu_physical_memory_excl_atleast_one_clean(ram_addr_t addr)
> +{
> +    unsigned long *bitmap = ram_list.dirty_memory[DIRTY_MEMORY_EXCLUSIVE];
> +    unsigned long next, end;
> +
> +    if (likely(smp_cpus <= BITS_PER_LONG)) {
This onl

[Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-07 Thread Alvise Rigo
The purpose of this new bitmap is to flag the memory pages that are in the middle of LL/SC operations (after a LL, before a SC) on a per-vCPU basis. For all these pages, the corresponding TLB entries will be generated in such a way to force the slow-path if at least one vCPU has the bit not set. Wh
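
For readers skimming the thread, here is a rough standalone sketch of the per-vCPU bitmap idea described in the patch summary above. It is not the QEMU code. The names (sketch_excl_bitmap, sketch_excl_begin and so on), the fixed sizes, and the choice of one 64-bit word per page are all assumptions made for illustration. It follows the dirty-bitmap-like polarity suggested by the summary: a set bit means the vCPU has no LL/SC pending on the page, and a page needs the slow path as soon as at least one vCPU has its bit cleared. It is not thread-safe.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only for this illustration. */
#define SKETCH_NUM_VCPUS 4
#define SKETCH_NUM_PAGES 16

/* One word per page, one bit per vCPU.
 * Assumed polarity: bit set = no LL/SC pending for that vCPU. */
static uint64_t sketch_excl_bitmap[SKETCH_NUM_PAGES];

/* Mask covering the bits of all vCPUs (assumes fewer than 64 vCPUs). */
static uint64_t sketch_all_vcpus_mask(void)
{
    return (UINT64_C(1) << SKETCH_NUM_VCPUS) - 1;
}

/* Start with every bit set: no page is in an exclusive state. */
static void sketch_excl_init(void)
{
    for (size_t page = 0; page < SKETCH_NUM_PAGES; page++) {
        sketch_excl_bitmap[page] = sketch_all_vcpus_mask();
    }
}

/* After an LL: clear this vCPU's bit for the page. */
static void sketch_excl_begin(size_t page, unsigned cpu)
{
    sketch_excl_bitmap[page] &= ~(UINT64_C(1) << cpu);
}

/* After the matching SC: set this vCPU's bit again. */
static void sketch_excl_end(size_t page, unsigned cpu)
{
    sketch_excl_bitmap[page] |= UINT64_C(1) << cpu;
}

/* Accesses to the page must take the slow path while at least one
 * vCPU has its bit cleared. */
static bool sketch_excl_needs_slow_path(size_t page)
{
    uint64_t mask = sketch_all_vcpus_mask();
    return (sketch_excl_bitmap[page] & mask) != mask;
}
```

With this layout the "at least one vCPU" check is a single mask comparison per page, which only works while the vCPU count fits in one word. The quoted patch has a separate branch for smp_cpus > BITS_PER_LONG that this sketch does not attempt to cover.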