[Qemu-devel] [PATCH v5 67/73] sparc: convert to cpu_has_work_with_iothread_lock

2018-12-12 Thread Emilio G. Cota
Soon we will call cpu_has_work without the BQL. Cc: Mark Cave-Ayland Cc: Artyom Tarasenko Reviewed-by: Richard Henderson Signed-off-by: Emilio G. Cota --- target/sparc/cpu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c index

[Qemu-devel] [PATCH v5 65/73] s390x: convert to cpu_has_work_with_iothread_lock

2018-12-12 Thread Emilio G. Cota
Soon we will call cpu_has_work without the BQL. Cc: Cornelia Huck Cc: Alexander Graf Cc: David Hildenbrand Cc: qemu-s3...@nongnu.org Reviewed-by: Richard Henderson Signed-off-by: Emilio G. Cota --- target/s390x/cpu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a

Re: [Qemu-devel] [RFC PATCH 1/2] tests/test-qht-par: test gets stuck intermittently on OSX

2018-11-09 Thread Emilio G. Cota
On Fri, Nov 09, 2018 at 10:30:01 -0500, Cleber Rosa wrote: > To be fully honest, this may not be an OSX (alone) condition, but may > be a situation that only happens with OSX on Travis-CI, where resources > are quite limited. > > I have personal experience with tests that exercise parallelism or > d

Re: [Qemu-devel] [PATCH v2 2/5] util: introduce threaded workqueue

2018-11-13 Thread Emilio G. Cota
On Tue, Nov 06, 2018 at 20:20:22 +0800, guangrong.x...@gmail.com wrote: > From: Xiao Guangrong > > This module implements the lockless and efficient threaded workqueue. (snip) > +++ b/util/threaded-workqueue.c > +struct Threads { > +/* > + * in order to avoid contention, the @requests is

Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line

2018-11-13 Thread Emilio G. Cota
On Mon, Nov 12, 2018 at 22:44:46 +0100, Richard Henderson wrote: > Based on an idea forwarded by Emilio, which suggests a 5-6% > speed gain is possible. I have not spent too much time > measuring this, as the code size gains are significant. Nice! > I believe that I posted an x86_64-only patch s

Re: [Qemu-devel] [RFC 01/48] cpu: introduce run_on_cpu_no_bql

2018-11-14 Thread Emilio G. Cota
On Wed, Nov 14, 2018 at 11:30:19 +0000, Alex Bennée wrote: > > Emilio G. Cota writes: > > > This allows us to queue synchronous CPU work without the BQL. > > > > Will gain a user soon. > > This is also in the cpu-lock series right? No, in the cpu-lock seri

Re: [Qemu-devel] [RFC 02/48] trace: expand mem_info:size_shift to 3 bits

2018-11-14 Thread Emilio G. Cota
On Wed, Nov 14, 2018 at 13:03:19 +0000, Alex Bennée wrote: > > Emilio G. Cota writes: > > > This will allow us to trace 16B-long memory accesses. > > > > While at it, add some defines for the mem_info bits and simplify > > trace_mem_get_info by making it a wr

Re: [Qemu-devel] [RFC 06/48] tcg: use QHT for helper_table

2018-11-14 Thread Emilio G. Cota
On Wed, Nov 14, 2018 at 14:41:53 +0000, Alex Bennée wrote: > Emilio G. Cota writes: (snip) > > -static GHashTable *helper_table; > > +static struct qht helper_table; > > +static bool helper_table_inited; > > Having a flag for initialisation seems a little excessive con

Re: [Qemu-devel] [RFC 06/48] tcg: use QHT for helper_table

2018-11-14 Thread Emilio G. Cota
On Wed, Nov 14, 2018 at 16:11:35 +0000, Alex Bennée wrote: > > Emilio G. Cota writes: (snip) > I needed to do this: > > modified tcg/tcg.c > @@ -884,7 +884,7 @@ static TCGTemp *tcg_global_reg_new_internal(TCGContext > *s, TCGType type, > > static inline uint32_t

Re: [Qemu-devel] [RFC 09/48] tcg: reset runtime helpers when flushing the code cache

2018-11-14 Thread Emilio G. Cota
On Wed, Nov 14, 2018 at 17:01:13 +0000, Alex Bennée wrote: > > Emilio G. Cota writes: > > > In preparation for adding plugin support. One of the clean-up > > actions when uninstalling plugins will be to flush the code > > cache. We'll also have to clear the ru

Re: [Qemu-devel] [RFC 44/48] cpus: lockstep execution support

2018-11-14 Thread Emilio G. Cota
On Wed, Nov 14, 2018 at 16:43:22 +0000, Alex Bennée wrote: > > Emilio G. Cota writes: > > > Signed-off-by: Emilio G. Cota > > --- > > > > > void cpu_interrupt(CPUState *cpu, int mask); > > diff --git a/cpus.c b/cpus.c > > index 3efe89354d..a44

Re: [Qemu-devel] [PATCH] cpus: run work items for all vCPUs if single-threaded

2018-11-14 Thread Emilio G. Cota
On Wed, Nov 14, 2018 at 12:44:00 +0100, Paolo Bonzini wrote: > This avoids the following deadlock: > > 1) a thread calls run_on_cpu for CPU 2 from a timer, and single_tcg_halt_cond > is signaled > > 2) CPU 1 is running and exits. It finds no work item and enters CPU 2 > > 3) because the I/O thr

Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line

2018-11-15 Thread Emilio G. Cota
On Thu, Nov 15, 2018 at 12:32:00 +0100, Richard Henderson wrote: > On 11/14/18 2:00 AM, Emilio G. Cota wrote: > > The following might be related: I'm seeing segfaults with -smp 8 > > and beyond when doing bootup+shutdown of an aarch64 guest on > > an x86-64 host

Re: [Qemu-devel] [PATCH] cpus: run work items for all vCPUs if single-threaded

2018-11-15 Thread Emilio G. Cota
On Fri, Nov 16, 2018 at 00:15:53 +0100, Paolo Bonzini wrote: > On 14/11/2018 20:42, Emilio G. Cota wrote: > > On Wed, Nov 14, 2018 at 12:44:00 +0100, Paolo Bonzini wrote: > >> This avoids the following deadlock: > >> > >> 1) a thread calls ru

Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line

2018-11-15 Thread Emilio G. Cota
On Thu, Nov 15, 2018 at 23:04:50 +0100, Richard Henderson wrote: > On 11/15/18 7:48 PM, Emilio G. Cota wrote: > > - Segfault in code_gen_buffer. This one I don't have a fix for, > > but it's *much* easier to reproduce when -tb-size is very small, > > e.g. "

Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line

2018-11-15 Thread Emilio G. Cota
On Thu, Nov 15, 2018 at 20:13:38 -0500, Emilio G. Cota wrote: > I'll generate now some more perf numbers that we could include in the > commit logs. SPEC numbers are a net perf decrease, unfortunately: Softmmu speedup for SPEC06int (test

Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line

2018-11-16 Thread Emilio G. Cota
On Fri, Nov 16, 2018 at 09:07:50 +0100, Richard Henderson wrote: > On 11/16/18 6:10 AM, Emilio G. Cota wrote: > > It's possible that newer machines with larger reorder buffers > > will be able to take better advantage of the higher instruction > > locality, hiding the la

Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line

2018-11-16 Thread Emilio G. Cota
On Fri, Nov 16, 2018 at 09:10:32 +0100, Richard Henderson wrote: > On 11/16/18 2:13 AM, Emilio G. Cota wrote: > > This allows us to discard most TBs; in the example above, > > we end up *not* discarding only ~70 TBs, that is we end up keeping > > only 70/2500 = 2.8% of the

Re: [Qemu-devel] qemu-user performance

2018-11-16 Thread Emilio G. Cota
On Fri, Nov 16, 2018 at 14:55:01 +0100, Etienne Dublé wrote: (snip) > So the idea is: what if we could share the cache of code already translated > between all those processes? > There would be several ways to achieve this: > * use a shared memory area for the cache, and locking mechanisms. > * hav

Re: [Qemu-devel] [PATCH v2 2/5] util: introduce threaded workqueue

2018-11-20 Thread Emilio G. Cota
On Tue, Nov 20, 2018 at 18:25:25 +0800, Xiao Guangrong wrote: > On 11/14/18 2:38 AM, Emilio G. Cota wrote: > > On Tue, Nov 06, 2018 at 20:20:22 +0800, guangrong.x...@gmail.com wrote: > > > From: Xiao Guangrong (snip) > > Batching achieves higher performance at high cor

Re: [Qemu-devel] [RFC 6/6] cputlb: dynamically resize TLBs based on use rate

2018-10-07 Thread Emilio G. Cota
On Sun, Oct 07, 2018 at 19:37:50 +0200, Philippe Mathieu-Daudé wrote: > On 10/6/18 11:45 PM, Emilio G. Cota wrote: > > 2. System boot + shutdown, ubuntu 18.04 x86_64: > > You can also run the VM tests to build QEMU: > > $ make vm-test Thanks, will give that a look. >

Re: [Qemu-devel] [RFC PATCH 00/21] Trace updates and plugin RFC

2018-10-08 Thread Emilio G. Cota
On Mon, Oct 08, 2018 at 11:28:38 +0100, Alex Bennée wrote: > Emilio G. Cota writes: > > Again, for performance you'd avoid the tracepoint (i.e. calling > > a helper to call another function) and embed directly the > > callback from TCG. Same thing applies to TB's.

Re: [Qemu-devel] [RFC 6/6] cputlb: dynamically resize TLBs based on use rate

2018-10-08 Thread Emilio G. Cota
On Sun, Oct 07, 2018 at 21:48:34 -0400, Emilio G. Cota wrote: > - 70/40% use rate for growing/shrinking the TLB does not > seem a great choice, if one wants to avoid a pathological > case that can induce constant resizing. Imagine we got > exactly 70% use rate, and all TLB
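One way to read the concern about the 70/40% thresholds can be sketched in C. This is an illustration under assumptions, not the code from the patch: the function name `tlb_next_size`, the doubling/halving step, and the exact comparison operators are all hypothetical. The point is only the arithmetic: a table that doubles at exactly 70% use drops to 35% use, which is already below a 40% shrink threshold.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical resize policy (not QEMU's code): double the table when the
 * use rate reaches grow_pct, halve it when the rate falls below shrink_pct.
 */
static size_t tlb_next_size(size_t n_entries, size_t n_used,
                            unsigned grow_pct, unsigned shrink_pct)
{
    assert(n_entries > 0 && shrink_pct < grow_pct);

    /* Compare n_used / n_entries against the thresholds without dividing. */
    if (n_used * 100 >= (size_t)grow_pct * n_entries) {
        return n_entries * 2;
    }
    if (n_used * 100 < (size_t)shrink_pct * n_entries && n_entries > 1) {
        return n_entries / 2;
    }
    return n_entries;
}
```

Under these assumptions, 70 used entries out of 100 grow the table to 200, and 70 out of 200 (35%) shrink it back to 100, so a steady workload could resize on every check. Widening the gap, for example to 70/30, would leave the table at 200.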

Re: [Qemu-devel] [PATCH v3 3/4] cputlb: serialize tlb updates with env->tlb_lock

2018-10-08 Thread Emilio G. Cota
On Mon, Oct 08, 2018 at 14:57:18 +0100, Alex Bennée wrote: > Emilio G. Cota writes: > > The readers that do not hold tlb_lock must use atomic reads when > > reading .addr_write, since this field can be updated by other threads; > > the conversion to atomic reads is done in th

Re: [Qemu-devel] [RFC 2/6] cputlb: do not evict invalid entries to the vtlb

2018-10-08 Thread Emilio G. Cota
On Sun, Oct 07, 2018 at 19:09:01 -0700, Richard Henderson wrote: > On 10/6/18 2:45 PM, Emilio G. Cota wrote: > > Currently we evict an entry to the victim TLB when it doesn't match > > the current address. But it could be that there's no match because > > the current

[Qemu-devel] [PATCH v4 0/4] per-TLB lock

2018-10-08 Thread Emilio G. Cota
v3: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01087.html Changes since v3: - Add R-b's - Add comment to copy_tlb_helper_locked to note that it can only be called from the TLB owner thread. The series is checkpatch-clean. You can fetch it from: https://github.com/cota/qemu/tre

[Qemu-devel] [PATCH v4 3/4] cputlb: serialize tlb updates with env->tlb_lock

2018-10-08 Thread Emilio G. Cota
ntire TLB. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 3 + accel/tcg/cputlb.c | 155 ++-- 2 files changed, 87 insertions(+), 71 deletions(-) diff --git a/include/exec/cpu-defs.h b/include

[Qemu-devel] [PATCH v4 1/4] exec: introduce tlb_init

2018-10-08 Thread Emilio G. Cota
Paves the way for the addition of a per-TLB lock. Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- include/exec/exec-all.h | 8 accel/tcg/cputlb.c | 4 exec.c | 1 + 3 files changed, 13 insertions(+) diff --git a/include/exec/exec-all.h b/include

[Qemu-devel] [PATCH v4 2/4] cputlb: fix assert_cpu_is_self macro

2018-10-08 Thread Emilio G. Cota
Reviewed-by: Richard Henderson Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- accel/tcg/cputlb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 502eea2850..f6b388c961 100644 --- a/accel/tcg/cputlb.c +++ b/accel

[Qemu-devel] [PATCH v4 4/4] cputlb: read CPUTLBEntry.addr_write atomically

2018-10-08 Thread Emilio G. Cota
Notes: - tlb-lock-v2 corresponds to an implementation with a mutex. - tlb-lock-v3 is the current patch series, i.e. with a spinlock and a single lock acquisition in tlb_set_page_with_attrs. Signed-off-by: Emilio G. Cota --- accel/tcg/softmmu_template.h | 16 ++-- include/exec/cpu_

Re: [Qemu-devel] [PATCH] tcg: Add tlb_index and tlb_entry helpers

2018-10-08 Thread Emilio G. Cota
On Sun, Oct 07, 2018 at 18:05:22 -0700, Richard Henderson wrote: > Isolate the computation of an index from an address into a > helper before we change that function. > > Signed-off-by: Richard Henderson > --- > > Emilio, this should make your dynamic tlb sizing patch 1/6 > significantly smaller
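For readers unfamiliar with the pattern, the idea of isolating the index computation in a helper might look roughly like the sketch below. All names here (`ExTLB`, `ex_tlb_index`, `ex_tlb_entry`) and the constants are made up for illustration and are not the signatures from the patch; the sketch assumes a direct-mapped table with a power-of-two number of entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; a real target would define its own. */
#define EX_PAGE_BITS 12
#define EX_TLB_BITS  8
#define EX_TLB_SIZE  (1u << EX_TLB_BITS)

typedef struct ExTLBEntry {
    uint64_t addr_read;
    uint64_t addr_write;
    uint64_t addr_code;
} ExTLBEntry;

typedef struct ExTLB {
    ExTLBEntry table[EX_TLB_SIZE];
} ExTLB;

/* Map an address to a slot: drop the page offset, then mask to table size. */
static inline size_t ex_tlb_index(uint64_t addr)
{
    return (addr >> EX_PAGE_BITS) & (EX_TLB_SIZE - 1);
}

/* Callers use this instead of open-coding the index computation. */
static inline ExTLBEntry *ex_tlb_entry(ExTLB *tlb, uint64_t addr)
{
    assert(tlb != NULL);
    return &tlb->table[ex_tlb_index(addr)];
}
```

Once every caller goes through helpers like these, changing how the index is derived (for instance, replacing the fixed mask with a per-TLB one) touches only the helpers.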

Re: [Qemu-devel] [RFC 2/6] cputlb: do not evict invalid entries to the vtlb

2018-10-08 Thread Emilio G. Cota
On Mon, Oct 08, 2018 at 12:46:26 -0700, Richard Henderson wrote: > On 10/8/18 7:42 AM, Emilio G. Cota wrote: > > On Sun, Oct 07, 2018 at 19:09:01 -0700, Richard Henderson wrote: > >> On 10/6/18 2:45 PM, Emilio G. Cota wrote: > >>> Currently we evict an entry to the vi

[Qemu-devel] [RFC v2 4/5] cputlb: track TLB use rate

2018-10-08 Thread Emilio G. Cota
This paves the way for implementing a dynamically-sized softmmu. Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 5 + accel/tcg/cputlb.c | 17 ++--- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/include/exec/cpu-defs.h b/include/exec/cpu

[Qemu-devel] [RFC v2 2/5] (XXX) cputlb: introduce indirection for TLB size

2018-10-08 Thread Emilio G. Cota
This paves the way for implementing dynamic TLB resizing. XXX: convert other TCG backends Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 10 ++ include/exec/cpu_ldst.h | 14 +- accel/tcg/cputlb.c| 18 +++--- tcg/i386/tcg-target.inc.c

[Qemu-devel] [RFC v2 0/5] Dynamic TLB sizing

2018-10-08 Thread Emilio G. Cota
v1: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01146.html Changes since v1: - Add tlb_index and tlb_entry helpers from Richard - Introduce sizeof_tlb() and tlb_n_entries() - Extract tlb_mask as its own array in CPUArchState, as suggested by Richard. For the associated helpers (t

[Qemu-devel] [RFC v2 3/5] cputlb: do not evict empty entries to the vtlb

2018-10-08 Thread Emilio G. Cota
s keep track of the TLB's use rate. Signed-off-by: Emilio G. Cota --- include/exec/cpu-all.h | 9 + accel/tcg/cputlb.c | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index 117d2fbbca..e21140049b 100644 --- a/in

[Qemu-devel] [RFC v2 1/5] tcg: Add tlb_index and tlb_entry helpers

2018-10-08 Thread Emilio G. Cota
From: Richard Henderson Isolate the computation of an index from an address into a helper before we change that function. Signed-off-by: Richard Henderson [ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ] Signed-off-by: Emilio G. Cota --- accel/tcg/softmmu_template.h

[Qemu-devel] [RFC v2 5/5] cputlb: dynamically resize TLBs based on use rate

2018-10-08 Thread Emilio G. Cota
[ASCII bar chart not preserved in the archive; the last x-axis label is geomean.] png: https://imgur.com/a/eXkjMCE After this series, we bring down the average softmmu overhead from 2.77x to 1.80x, with a maximum slowdown of 2.48x (omnetpp).

Re: [Qemu-devel] [RFC v2 0/5] Dynamic TLB sizing

2018-10-09 Thread Emilio G. Cota
On Tue, Oct 09, 2018 at 13:34:40 +0100, Alex Bennée wrote: > > Emilio G. Cota writes: > > > v1: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01146.html > > > > Changes since v1: > > Hmm I'm seeing some qtest failures, for exampl

Re: [Qemu-devel] [RFC v2 0/5] Dynamic TLB sizing

2018-10-09 Thread Emilio G. Cota
On Tue, Oct 09, 2018 at 15:45:36 +0100, Alex Bennée wrote: > > Emilio G. Cota writes: > > > On Tue, Oct 09, 2018 at 13:34:40 +0100, Alex Bennée wrote: > >> > >> Emilio G. Cota writes: > >> > >> > v1: https://lists.gnu.org/archive/html/

Re: [Qemu-devel] [RFC v2 5/5] cputlb: dynamically resize TLBs based on use rate

2018-10-09 Thread Emilio G. Cota
On Tue, Oct 09, 2018 at 15:54:21 +0100, Alex Bennée wrote: > Emilio G. Cota writes: > > +if (new_size == old_size) { > > +return; > > +} > > + > > +g_free(env->tlb_table[mmu_idx]); > > +g_free(env->iotlb[mmu_idx]); > > +

[Qemu-devel] [PATCH v5 1/6] target/alpha: remove tlb_flush from alpha_cpu_initfn

2018-10-09 Thread Emilio G. Cota
As far as I can tell tlb_flush does not need to be called this early. tlb_flush is eventually called after the CPU has been realized. This change paves the way to the introduction of tlb_init, which will be called from cpu_exec_realizefn. Signed-off-by: Emilio G. Cota --- target/alpha/cpu.c

[Qemu-devel] [PATCH v5 4/6] cputlb: fix assert_cpu_is_self macro

2018-10-09 Thread Emilio G. Cota
Reviewed-by: Richard Henderson Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- accel/tcg/cputlb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c index 502eea2850..f6b388c961 100644 --- a/accel/tcg/cputlb.c +++ b/accel

[Qemu-devel] [PATCH v5 3/6] exec: introduce tlb_init

2018-10-09 Thread Emilio G. Cota
Paves the way for the addition of a per-TLB lock. Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- include/exec/exec-all.h | 8 accel/tcg/cputlb.c | 4 exec.c | 1 + 3 files changed, 13 insertions(+) diff --git a/include/exec/exec-all.h b/include

[Qemu-devel] [PATCH v5 0/6] per-TLB lock

2018-10-09 Thread Emilio G. Cota
v4: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01421.html Changes since v4: - Add two patches to remove early calls to tlb_flush. You can fetch the series from: https://github.com/cota/qemu/tree/tlb-lock-v5 Thanks, Emilio

[Qemu-devel] [PATCH v5 2/6] target/unicore32: remove tlb_flush from uc32_init_fn

2018-10-09 Thread Emilio G. Cota
As far as I can tell tlb_flush does not need to be called this early. tlb_flush is eventually called after the CPU has been realized. This change paves the way to the introduction of tlb_init, which will be called from cpu_exec_realizefn. Cc: Guan Xuetao Signed-off-by: Emilio G. Cota

[Qemu-devel] [PATCH v5 5/6] cputlb: serialize tlb updates with env->tlb_lock

2018-10-09 Thread Emilio G. Cota
ntire TLB. Tested-by: Alex Bennée Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 3 + accel/tcg/cputlb.c | 155 ++-- 2 files changed, 87 insertions(+), 71 deletions(-) diff --git a/include/exec/cpu-defs.h b/include

[Qemu-devel] [RFC v3 3/5] cputlb: do not evict empty entries to the vtlb

2018-10-09 Thread Emilio G. Cota
s keep track of the TLB's use rate. Reviewed-by: Alex Bennée Reviewed-by: Richard Henderson Signed-off-by: Emilio G. Cota --- include/exec/cpu-all.h | 9 + accel/tcg/cputlb.c | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/exec/cpu-all.h b/include/

[Qemu-devel] [PATCH v5 6/6] cputlb: read CPUTLBEntry.addr_write atomically

2018-10-09 Thread Emilio G. Cota
Notes: - tlb-lock-v2 corresponds to an implementation with a mutex. - tlb-lock-v3 is the current patch series, i.e. with a spinlock and a single lock acquisition in tlb_set_page_with_attrs. Signed-off-by: Emilio G. Cota --- accel/tcg/softmmu_template.h | 16 ++-- include/exec/cpu_

[Qemu-devel] [RFC v3 1/5] tcg: Add tlb_index and tlb_entry helpers

2018-10-09 Thread Emilio G. Cota
From: Richard Henderson Isolate the computation of an index from an address into a helper before we change that function. Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson [ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ] Signed-off-by: Emilio G. Cota --- accel/tcg

[Qemu-devel] [RFC v3 5/5] (XXX) cputlb: dynamically resize TLBs based on use rate

2018-10-09 Thread Emilio G. Cota
After this series, we bring down the average softmmu overhead from 2.77x to 1.80x, with a maximum slowdown of 2.48x (omnetpp). Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 39 + accel/tcg/cputlb.c| 41

[Qemu-devel] [RFC v3 0/5] Dynamic TLB sizing

2018-10-09 Thread Emilio G. Cota
v2: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01495.html Changes since v2: - Add R-b's - Apply on top of tlb-lock-v5 series, fixing the alpha boot segfault due to the early tlb_flush + The series now passes `make check-qtest' - Alloc the iotlb with g_new instead of g_new0 -

[Qemu-devel] [RFC v3 4/5] cputlb: track TLB use rate

2018-10-09 Thread Emilio G. Cota
This paves the way for implementing a dynamically-sized softmmu. Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 5 + accel/tcg/cputlb.c | 17 ++--- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/include/exec/cpu

[Qemu-devel] [RFC v3 2/5] (XXX) cputlb: introduce indirection for TLB size

2018-10-09 Thread Emilio G. Cota
This paves the way for implementing dynamic TLB resizing. XXX: convert other TCG backends Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 10 ++ include/exec/cpu_ldst.h | 14 +- accel/tcg/cputlb.c| 18 +++--- tcg/i386/tcg-target.inc.c

Re: [Qemu-devel] [PATCH v5 1/6] target/alpha: remove tlb_flush from alpha_cpu_initfn

2018-10-09 Thread Emilio G. Cota
On Tue, Oct 09, 2018 at 18:55:30 +0100, Peter Maydell wrote: > On 9 October 2018 at 18:45, Emilio G. Cota wrote: (snip) > > @@ -201,7 +201,6 @@ static void alpha_cpu_initfn(Object *obj) > > CPUAlphaState *env = &cpu->env; > > > > cs->

Re: [Qemu-devel] [PATCH v3 0/9] tcg: Reorg 128-bit atomic operations

2018-10-09 Thread Emilio G. Cota
On Wed, Oct 03, 2018 at 14:39:22 -0500, Richard Henderson wrote: (snip) > Richard Henderson (9): > tcg: Split CONFIG_ATOMIC128 > target/i386: Convert to HAVE_CMPXCHG128 > target/arm: Convert to HAVE_CMPXCHG128 > target/arm: Check HAVE_CMPXCHG128 at translate time > target/ppc: Convert to

[Qemu-devel] [Bug 1793119] Re: Wrong floating-point emulation on AArch64 with FPCR set to zero

2018-10-10 Thread Emilio G. Cota
** Changed in: qemu Status: Confirmed => Fix Committed -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1793119 Title: Wrong floating-point emulation on AArch64 with FPCR set to zero Status in

[Qemu-devel] [PATCH 0/4] some TCG fixes

2018-10-10 Thread Emilio G. Cota
The first patch we've seen before -- I'm taking it from the atomic interrupt_request series. The other three patches are related to TCG profiling. One of them is a build fix that I suspect has gone unnoticed due to its dependence on CONFIG_PROFILER. The series is checkpatch-clean. You can fetch i

[Qemu-devel] [PATCH 2/4] tcg: fix use of uninitialized variable under CONFIG_PROFILER

2018-10-10 Thread Emilio G. Cota
We forgot to initialize n in commit 15fa08f845 ("tcg: Dynamically allocate TCGOps", 2017-12-29). Signed-off-by: Emilio G. Cota --- tcg/tcg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/tcg.c b/tcg/tcg.c index f27b22bd3c..8f26916b99 100644 --- a/tcg/tcg.c

[Qemu-devel] [PATCH 4/4] tcg: distribute tcg_time into TCG contexts

2018-10-10 Thread Emilio G. Cota
called "cpu_exec_time", which is more descriptive than "tcg_time". Add a function to query this value directly, and for completeness, fill in the field in tcg_profile_snapshot, even though its callers do not use it. Signed-off-by: Emilio G. Cota --- include/qemu/timer

[Qemu-devel] [PATCH 1/4] tcg: access cpu->icount_decr.u16.high with atomics

2018-10-10 Thread Emilio G. Cota
Consistently access u16.high with atomics to avoid undefined behaviour in MTTCG. Note that icount_decr.u16.low is only used in icount mode, so regular accesses to it are OK. Reviewed-by: Richard Henderson Signed-off-by: Emilio G. Cota --- accel/tcg/tcg-all.c | 2 +- accel/tcg/translate
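A rough sketch of what "consistently access with atomics" can mean for a 16-bit field shared between threads follows. It is not the patch itself: the union layout, the helper names, and the role given to `high` are assumptions, and it uses the GCC/Clang `__atomic` builtins with relaxed ordering rather than any QEMU macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the counter discussed above; the field order is assumed. */
typedef union ExIcountDecr {
    uint32_t u32;
    struct {
        uint16_t low;   /* only used in icount mode, per the commit message */
        uint16_t high;  /* assumed here: set by other threads as a flag */
    } u16;
} ExIcountDecr;

/* Writer side: a relaxed atomic store instead of a plain assignment. */
static inline void ex_request_exit(ExIcountDecr *d)
{
    assert(d != NULL);
    __atomic_store_n(&d->u16.high, (uint16_t)-1, __ATOMIC_RELAXED);
}

/* Reader side: a relaxed atomic load, so the access is not a data race. */
static inline int ex_exit_requested(ExIcountDecr *d)
{
    assert(d != NULL);
    return __atomic_load_n(&d->u16.high, __ATOMIC_RELAXED) != 0;
}
```

The store and load touch only the `high` half, so plain accesses to `low` by the owning thread remain separate from the cross-thread flag.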

[Qemu-devel] [PATCH 3/4] tcg: plug holes in struct TCGProfile

2018-10-10 Thread Emilio G. Cota
This plugs two 4-byte holes in 64-bit. Signed-off-by: Emilio G. Cota --- tcg/tcg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/tcg.h b/tcg/tcg.h index f9f12378e9..d80ef2a883 100644 --- a/tcg/tcg.h +++ b/tcg/tcg.h @@ -633,8 +633,8 @@ typedef struct TCGProfile
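As a generic illustration of how two 4-byte holes arise and disappear on 64-bit hosts, consider the sketch below. The struct and field names are invented and do not mirror TCGProfile; the sizes quoted in the comments assume a typical LP64 ABI where `int` is 4 bytes and `int64_t` is 8-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* On a typical LP64 ABI, each int is followed by 4 bytes of padding. */
struct ExWithHoles {
    int     a;
    int64_t b;
    int     c;
    int64_t d;
};

/* Grouping the two ints lets them share a single 8-byte slot. */
struct ExReordered {
    int     a;
    int     c;
    int64_t b;
    int64_t d;
};

/* Expected to hold on common ABIs; typically 24 versus 32 bytes on LP64. */
static_assert(sizeof(struct ExReordered) <= sizeof(struct ExWithHoles),
              "reordering must not grow the struct");
```

A tool such as pahole reports holes of this kind directly.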

[Qemu-devel] [PATCH v4 2/3] tcg: introduce dynamic TLB sizing

2018-10-12 Thread Emilio G. Cota
Disable for all TCG backends for now. Signed-off-by: Emilio G. Cota --- include/exec/cpu-defs.h | 43 +++- include/exec/cpu_ldst.h | 21 ++ tcg/aarch64/tcg-target.h | 1 + tcg/arm/tcg-target.h | 1 + tcg/i386/tcg-target.h| 1 + tcg/mips/tcg-target.h| 1 + tcg

[Qemu-devel] [PATCH v4 1/3] cputlb: do not evict empty entries to the vtlb

2018-10-12 Thread Emilio G. Cota
s keep track of the TLB's use rate, which we'll use to implement a policy for dynamic TLB sizing. Reviewed-by: Alex Bennée Reviewed-by: Richard Henderson Signed-off-by: Emilio G. Cota --- include/exec/cpu-all.h | 9 + accel/tcg/cputlb.c | 2 +- 2 files changed, 10 insertion

[Qemu-devel] [PATCH v4 3/3] tcg/i386: enable dynamic TLB sizing

2018-10-12 Thread Emilio G. Cota
[ASCII bar chart not preserved in the archive; the last x-axis label is geomean.] png: https://imgur.com/a/eXkjMCE After this series, we bring down the average softmmu overhead from 2.77x to 1.80x, with

[Qemu-devel] [PATCH v4 0/3] Dynamic TLB sizing

2018-10-12 Thread Emilio G. Cota
RFC v3: https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01753.html Changes since RFC v3: - This is now a proper patch series, since it should not (knowingly) break anything. - Rebase on top of rth's tcg-next (ffd8994b90f5), which includes patch 1 from RFC v3. - Make the feature opt

[Qemu-devel] [PATCH v5 03/13] target/tricore: use float32_is_denormal

2018-10-13 Thread Emilio G. Cota
Reviewed-by: Bastian Koppelmann Signed-off-by: Emilio G. Cota --- target/tricore/fpu_helper.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/target/tricore/fpu_helper.c b/target/tricore/fpu_helper.c index df162902d6..31df462e4a 100644 --- a/target/tricore

[Qemu-devel] [PATCH v5 04/13] softfloat: rename canonicalize to sf_canonicalize

2018-10-13 Thread Emilio G. Cota
an Koppelmann Tested-by: Bastian Koppelmann Signed-off-by: Emilio G. Cota --- fpu/softfloat.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fpu/softfloat.c b/fpu/softfloat.c index 46ae206172..0cbb08be32 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -336,8 +33

[Qemu-devel] [PATCH v5 01/13] fp-test: pick TARGET_ARM to get its specialization

2018-10-13 Thread Emilio G. Cota
F v [...] - After: In 6133248 tests, no errors found in f64_mulAdd, rounding near_even, tininess before rounding. [...] Signed-off-by: Emilio G. Cota --- tests/fp/Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tests/fp/Makefile b/tests/fp/Makefile index d649a5a1db..49cdcd1bd2 100644 --

[Qemu-devel] [PATCH v5 05/13] softfloat: add float{32, 64}_is_zero_or_normal

2018-10-13 Thread Emilio G. Cota
These will gain some users very soon. Signed-off-by: Emilio G. Cota --- include/fpu/softfloat.h | 10 ++ 1 file changed, 10 insertions(+) diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h index 9eeccd88a5..38a5e99cf3 100644 --- a/include/fpu/softfloat.h +++ b/include/fpu

[Qemu-devel] [PATCH v5 02/13] softfloat: add float{32, 64}_is_{de, }normal

2018-10-13 Thread Emilio G. Cota
This paves the way for upcoming work. Reviewed-by: Bastian Koppelmann Reviewed-by: Alex Bennée Signed-off-by: Emilio G. Cota --- include/fpu/softfloat.h | 20 1 file changed, 20 insertions(+) diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h index 8fd9f9bbae
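The classification that helpers with these names would perform follows from the IEEE 754 binary32 layout. The sketch below is an independent illustration operating on raw bit patterns; the `ex_` names are hypothetical and the bodies are not copied from softfloat.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 binary32: 1 sign bit, 8 exponent bits, 23 fraction bits. */
static inline uint32_t ex_f32_exp(uint32_t bits)  { return (bits >> 23) & 0xff; }
static inline uint32_t ex_f32_frac(uint32_t bits) { return bits & 0x7fffff; }

/* Reinterpret a host float as its bit pattern without aliasing issues. */
static inline uint32_t ex_f32_bits(float f)
{
    uint32_t bits;
    static_assert(sizeof(f) == sizeof(bits), "float must be 32 bits wide");
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}

/* Denormal: the exponent field is zero but the fraction is not. */
static inline bool ex_f32_is_denormal(uint32_t bits)
{
    return ex_f32_exp(bits) == 0 && ex_f32_frac(bits) != 0;
}

/* Normal: the exponent field is neither all zeros nor all ones. */
static inline bool ex_f32_is_normal(uint32_t bits)
{
    uint32_t e = ex_f32_exp(bits);
    return e != 0 && e != 0xff;
}
```

Zero, infinity, and NaN are neither normal nor denormal under these definitions, which is why a separate "zero or normal" predicate can be useful.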

[Qemu-devel] [PATCH v5 00/13] hardfloat

2018-10-13 Thread Emilio G. Cota
v4: https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg02960.html Changes since v4: - Rebase on current master (a73549f99). - Add a patch for fp-test to pick a specialization; this gets rid of the muladd errors, since our default "no specialization" does not raise invalid when one of t

[Qemu-devel] [PATCH v5 10/13] hardfloat: implement float32/64 division

2018-10-13 Thread Emilio G. Cota
G. Cota --- fpu/softfloat.c | 88 +++-- 1 file changed, 86 insertions(+), 2 deletions(-) diff --git a/fpu/softfloat.c b/fpu/softfloat.c index 78837fa9d8..8ef0571c6e 100644 --- a/fpu/softfloat.c +++ b/fpu/softfloat.c @@ -1678,7 +1678,8 @@ float16

[Qemu-devel] [PATCH v5 09/13] hardfloat: implement float32/64 multiplication

2018-10-13 Thread Emilio G. Cota
: mul-single: 73.41 MFlops mul-double: 76.93 MFlops 3. IBM POWER8E @ 2.1 GHz - before: mul-single: 58.40 MFlops mul-double: 59.33 MFlops - after: mul-single: 60.25 MFlops mul-double: 94.79 MFlops Signed-off-by: Emilio G. Cota --- fpu/softfloat.c | 66

[Qemu-devel] [PATCH v5 07/13] fpu: introduce hardfloat

2018-10-13 Thread Emilio G. Cota
ag to disable hardfloat. In the long run though it would be good to fix the targets so that at least the inexact flag passed to softfloat is indeed sticky. Signed-off-by: Emilio G. Cota --- fpu/softfloat.c | 341 1 file changed, 341 insertions(+)

[Qemu-devel] [PATCH v5 13/13] hardfloat: implement float32/64 comparison

2018-10-13 Thread Emilio G. Cota
[ASCII bar chart not preserved in the archive; its legend includes a "before" series.]

[Qemu-devel] [PATCH v5 08/13] hardfloat: implement float32/64 addition and subtraction

2018-10-13 Thread Emilio G. Cota
machine, having 2F64 set to 1 pays off, but it doesn't for 2F32: - Intel i7-6700K: add-single: [1] 285.79 vs [0] 426.70 MFlops add-double: [1] 302.15 vs [0] 278.82 MFlops Signed-off-by: Emilio G. Cota --- fpu/softfloat.c | 106 1 file change

[Qemu-devel] [PATCH v5 12/13] hardfloat: implement float32/64 square root

2018-10-13 Thread Emilio G. Cota
23% slower for single precision, with it enabled, and 17% slower for double precision. Signed-off-by: Emilio G. Cota --- fpu/softfloat.c | 73 +++-- 1 file changed, 71 insertions(+), 2 deletions(-) diff --git a/fpu/softfloat.c b/fpu/softfloat.c index

[Qemu-devel] [PATCH v5 06/13] tests/fp: add fp-bench

2018-10-13 Thread Emilio G. Cota
r-mode). Signed-off-by: Emilio G. Cota --- tests/fp/fp-bench.c | 630 tests/fp/.gitignore | 1 + tests/fp/Makefile | 5 +- 3 files changed, 635 insertions(+), 1 deletion(-) create mode 100644 tests/fp/fp-bench.c diff --git a/tests/fp/fp-b

[Qemu-devel] [PATCH v5 11/13] hardfloat: implement float32/64 fused multiply-add

2018-10-13 Thread Emilio G. Cota
: fma-single: 66.14 MFlops fma-double: 63.10 MFlops 3. IBM POWER8E @ 2.1 GHz - before: fma-single: 37.26 MFlops fma-double: 37.29 MFlops - after: fma-single: 48.90 MFlops fma-double: 59.51 MFlops Here having 3FP64 set to 1 pays off for x86_64: [1] 170.15 vs [0] 153.12 MFlops Signed-off-by: Emilio G

Re: [Qemu-devel] [PATCH v3 4/4] cputlb: read CPUTLBEntry.addr_write atomically

2018-10-16 Thread Emilio G. Cota
On Tue, Oct 16, 2018 at 08:03:03 +0200, Paolo Bonzini wrote: > On 16/10/2018 04:52, Richard Henderson wrote: > > On 10/5/18 2:14 PM, Emilio G. Cota wrote: > >> -target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write; > >> +target_ulong tlb_addr =

[Qemu-devel] [PATCH tcg-next] cputlb: read CPUTLBEntry.addr_write atomically

2018-10-16 Thread Emilio G. Cota
https://imgur.com/a/BHzpPTW Notes: - tlb-lock-v2 corresponds to an implementation with a mutex. - tlb-lock-v3 corresponds to the current implementation, i.e. a spinlock and a single lock acquisition in tlb_set_page_with_attrs. Signed-off-by: Emilio G. Cota --- accel/tcg/softmmu_template.h | 12 +

Re: [Qemu-devel] [PATCH 1/4] ptr_ring: port ptr_ring from linux kernel to QEMU

2018-10-16 Thread Emilio G. Cota
On Tue, Oct 16, 2018 at 19:10:03 +0800, guangrong.x...@gmail.com wrote: (snip) > diff --git a/include/qemu/ptr_ring.h b/include/qemu/ptr_ring.h > new file mode 100644 > index 00..d8266d45f6 > --- /dev/null > +++ b/include/qemu/ptr_ring.h > @@ -0,0 +1,235 @@ (snip) > +#define SMP_CACHE_BYTES

Re: [Qemu-devel] [PULL 2/7] tests/migration: Enable the migration test on s390x, too

2018-10-17 Thread Emilio G. Cota
On Thu, Oct 11, 2018 at 20:25:08 +0100, Dr. David Alan Gilbert (git) wrote: > From: Thomas Huth > > We can re-use the s390-ccw bios code to implement a small firmware > for a s390x guest which prints out the "A" and "B" characters and > modifies the memory, as required for the migration test. >

Re: [Qemu-devel] [PULL 2/7] tests/migration: Enable the migration test on s390x, too

2018-10-18 Thread Emilio G. Cota
On Thu, Oct 18, 2018 at 14:38:01 +0200, Thomas Huth wrote: > On 2018-10-17 21:28, Emilio G. Cota wrote: > > Can anyone reproduce this? Otherwise, let me know what other info > > I could provide. > > I've finally been able to reproduce it - seems like it only happens her

[Qemu-devel] [RFC v3 11/56] sh4: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
Cc: Aurelien Jarno Signed-off-by: Emilio G. Cota --- target/sh4/op_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c index 4f825bae5a..57cc363ccc 100644 --- a/target/sh4/op_helper.c +++ b/target/sh4/op_helper.c @@ -105,7

[Qemu-devel] [RFC v3 03/56] cpu: introduce cpu_mutex_lock/unlock

2018-10-18 Thread Emilio G. Cota
The few direct users of &cpu->lock will be converted soon. Cc: Peter Crosthwaite Cc: Richard Henderson Signed-off-by: Emilio G. Cota --- include/qom/cpu.h | 26 cpus.c | 48 +++-- stubs/cpu-lock.c

[Qemu-devel] [RFC v3 06/56] cpu: introduce process_queued_cpu_work_locked

2018-10-18 Thread Emilio G. Cota
It will gain a user once we protect more of CPUState under cpu->lock. This completes the conversion to cpu_mutex_lock/unlock in the file. Signed-off-by: Emilio G. Cota --- include/qom/cpu.h | 9 + cpus-common.c | 17 +++-- 2 files changed, 20 insertions(+), 6 deleti

[Qemu-devel] [RFC v3 05/56] cpu: move run_on_cpu to cpus-common

2018-10-18 Thread Emilio G. Cota
We don't pass a pointer to qemu_global_mutex anymore. Cc: Peter Crosthwaite Cc: Richard Henderson Signed-off-by: Emilio G. Cota --- include/qom/cpu.h | 10 -- cpus-common.c | 2 +- cpus.c| 5 - 3 files changed, 1 insertion(+), 16 deletions(-) diff --

[Qemu-devel] [RFC v3 08/56] cpu: define cpu_halted helpers

2018-10-18 Thread Emilio G. Cota
cpu->halted will soon be protected by cpu->lock. We will use these helpers to ease the transition, since right now cpu->halted has many direct callers. Signed-off-by: Emilio G. Cota --- include/qom/cpu.h | 24 1 file changed, 24 insertions(+) diff --git a/in
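
The diff is cut off in this listing, so the following is only a minimal sketch of the idea described above: routing every access to a field through helpers so that a lock can later be enforced in one place. It assumes a plain pthread mutex, and `DemoCPU`, `demo_cpu_init`, `demo_cpu_halted` and `demo_cpu_halted_set` are hypothetical names for illustration, not the code from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a CPU state struct (not QEMU's CPUState). */
typedef struct DemoCPU {
    pthread_mutex_t lock;   /* plays the role of cpu->lock */
    uint32_t halted;        /* plays the role of cpu->halted */
} DemoCPU;

/* Initialize the lock and start in the "not halted" state. */
void demo_cpu_init(DemoCPU *cpu)
{
    int rc = pthread_mutex_init(&cpu->lock, NULL);

    assert(rc == 0);
    (void)rc;               /* avoid an unused warning when NDEBUG is set */
    cpu->halted = 0;
}

/* Read the field while holding the lock. */
uint32_t demo_cpu_halted(DemoCPU *cpu)
{
    uint32_t ret;

    pthread_mutex_lock(&cpu->lock);
    ret = cpu->halted;
    pthread_mutex_unlock(&cpu->lock);
    return ret;
}

/* Write the field while holding the lock. */
void demo_cpu_halted_set(DemoCPU *cpu, uint32_t val)
{
    pthread_mutex_lock(&cpu->lock);
    cpu->halted = val;
    pthread_mutex_unlock(&cpu->lock);
}
```

With accessors like these, a direct read or write of the field is replaced by a call to the matching helper, which is the kind of mechanical conversion the per-target "convert to cpu_halted" patches in this series appear to perform.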

[Qemu-devel] [RFC v3 21/56] openrisc: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
Cc: Stafford Horne Signed-off-by: Emilio G. Cota --- target/openrisc/sys_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/openrisc/sys_helper.c b/target/openrisc/sys_helper.c index b66a45c1e0..ab4d8fb520 100644 --- a/target/openrisc/sys_helper.c +++ b/target

[Qemu-devel] [RFC v3 13/56] lm32: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
Cc: Michael Walle Signed-off-by: Emilio G. Cota --- target/lm32/op_helper.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target/lm32/op_helper.c b/target/lm32/op_helper.c index 234d55e056..392634441b 100644 --- a/target/lm32/op_helper.c +++ b/target/lm32/op_helper.c

[Qemu-devel] [RFC v3 02/56] cpu: rename cpu->work_mutex to cpu->lock

2018-10-18 Thread Emilio G. Cota
This lock will soon protect more fields of the struct. Give it a more appropriate name. Cc: Peter Crosthwaite Cc: Richard Henderson Signed-off-by: Emilio G. Cota --- include/qom/cpu.h | 5 +++-- cpus-common.c | 14 +++--- cpus.c| 4 ++-- qom/cpu.c | 2 +- 4

[Qemu-devel] [RFC v3 16/56] riscv: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
Cc: Michael Clark Cc: Palmer Dabbelt Cc: Sagar Karandikar Cc: Bastian Koppelmann Cc: Alistair Francis Signed-off-by: Emilio G. Cota --- target/riscv/op_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c index

[Qemu-devel] [RFC v3 04/56] cpu: make qemu_work_cond per-cpu

2018-10-18 Thread Emilio G. Cota
This eliminates the need to use the BQL to queue CPU work. While at it, give the per-cpu field a generic name ("cond") since it will soon be used for more than just queueing CPU work. Cc: Peter Crosthwaite Cc: Richard Henderson Signed-off-by: Emilio G. Cota --- include/qom/

[Qemu-devel] [RFC v3 01/56] cpu: convert queued work to a QSIMPLEQ

2018-10-18 Thread Emilio G. Cota
Instead of open-coding it. While at it, make sure that all accesses to the list are performed while holding the list's lock. Cc: Peter Crosthwaite Cc: Richard Henderson Signed-off-by: Emilio G. Cota --- include/qom/cpu.h | 6 +++--- cpus-common.c | 25 - c

[Qemu-devel] [RFC v3 09/56] arm: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
Cc: Andrzej Zaborowski Cc: Peter Maydell Cc: qemu-...@nongnu.org Signed-off-by: Emilio G. Cota --- hw/arm/omap1.c| 4 ++-- hw/arm/pxa2xx_gpio.c | 2 +- hw/arm/pxa2xx_pic.c | 2 +- target/arm/arm-powerctl.c | 4 ++-- target/arm/cpu.c | 2 +- target/arm

[Qemu-devel] [RFC v3 15/56] mips: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
Cc: Aurelien Jarno Cc: Aleksandar Markovic Cc: James Hogan Signed-off-by: Emilio G. Cota --- hw/mips/cps.c | 2 +- hw/misc/mips_itu.c | 4 ++-- target/mips/kvm.c | 2 +- target/mips/op_helper.c | 8 target/mips/translate.c | 4 ++-- 5 files changed, 10 insertions

[Qemu-devel] [RFC v3 19/56] xtensa: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
Cc: Max Filippov Signed-off-by: Emilio G. Cota --- target/xtensa/cpu.c | 2 +- target/xtensa/helper.c| 2 +- target/xtensa/op_helper.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c index a54dbe4260..d4ca35e6cc 100644

[Qemu-devel] [RFC v3 10/56] ppc: convert to cpu_halted

2018-10-18 Thread Emilio G. Cota
In ppce500_spin.c, acquire the lock just once to update both cpu->halted and cpu->stopped. Cc: David Gibson Cc: Alexander Graf Cc: qemu-...@nongnu.org Signed-off-by: Emilio G. Cota --- target/ppc/helper_regs.h| 2 +- hw/ppc/e500.c | 4 ++-- hw/ppc
