On Tue, Jul 04, 2017 at 13:12:02 +0200, Paolo Bonzini wrote:
> Reviewed-by: Richard Henderson
> Signed-off-by: Paolo Bonzini
> ---
> accel/tcg/translate-all.c | 5 +
> hmp-commands-info.hx | 4
> monitor.c | 2 ++
> 3 files changed, 11 insertions(+)
>
> diff --git
On Fri, Jul 07, 2017 at 05:46:19 -1000, Richard Henderson wrote:
> I do wonder if we should provide a generic empty hook, so that a target that
> does not need a particular hook need not define an empty function. It could
> just put e.g. "translator_noop" into the structure. Ok, maybe a better na
Signed-off-by: Emilio G. Cota
---
tcg/i386/tcg-target.inc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 01e3b4e..06df01a 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2514,7 +2514,7
t size,
as well as the expansion ratio.
In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.
Signed-off-by: Emilio G. Cota
---
to them.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 38
accel/tcg/translate-all.c | 23 +-
tcg/tcg.c | 108 ++
3 files changed, 124 insertions(+), 45 deletions(-)
diff --git a/tcg/tcg.h b
It is only used by this object, and it's not exported to any other.
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 72ce445..2fa9f65 100644
--- a/acce
h later by configure_accelerator().
Fix it by unconditionally exiting if the flag is passed to a QEMU binary
built with !CONFIG_TCG.
Signed-off-by: Emilio G. Cota
---
vl.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/vl.c b/vl.c
index d17c863..9ece570 100644
--- a/vl.c
+++ b/vl.c
@
Before we make TCGContext thread-local.
Signed-off-by: Emilio G. Cota
---
include/exec/gen-icount.h | 7 +++
tcg/tcg.h | 2 ++
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 9b3cb14..489aff7 100644
Signed-off-by: Emilio G. Cota
---
tcg/mips/tcg-target.inc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 8cff9a6..790b4fc 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -2323,7 +2323,7
r not to lose counts we'd either have to use atomic ops
or distribute the counter, which is more scalable.
This patch does the latter by embedding tlb_flush_count in CPUArchState.
The global count is then easily obtained by iterating over the CPU list.
Signed-off-by: Emilio G. Cota
---
inclu
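The distributed-counter idea described in this commit message can be sketched as follows. This is a minimal illustration and not the patch itself: DemoCPU, demo_count_flush and demo_total_flushes are hypothetical names, and C11 relaxed atomics stand in for QEMU's atomic_set/atomic_read. It assumes each counter is written only by its owning thread, so a plain load-plus-store is enough and no atomic read-modify-write is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-vCPU state struct; not QEMU's CPUArchState. */
typedef struct DemoCPU {
    atomic_size_t tlb_flush_count;  /* written only by the owning vCPU thread */
    struct DemoCPU *next;           /* link in the (hypothetical) CPU list */
} DemoCPU;

/* Owner-only increment: a relaxed load plus a relaxed store, no atomic RMW. */
static void demo_count_flush(DemoCPU *cpu)
{
    size_t n;

    assert(cpu != NULL);
    n = atomic_load_explicit(&cpu->tlb_flush_count, memory_order_relaxed);
    atomic_store_explicit(&cpu->tlb_flush_count, n + 1, memory_order_relaxed);
}

/* Any thread may compute the global count by walking the CPU list. */
static size_t demo_total_flushes(DemoCPU *head)
{
    size_t total = 0;

    for (DemoCPU *cpu = head; cpu != NULL; cpu = cpu->next) {
        total += atomic_load_explicit(&cpu->tlb_flush_count,
                                      memory_order_relaxed);
    }
    return total;
}
```

The total read this way is only a snapshot: counters may advance while the list is being walked, which is usually acceptable for statistics.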
get MTTCG working with TCI.
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 4 +++-
tcg/tcg.h | 12 +---
accel/tcg/translate-all.c | 20 +---
cpus.c | 3 +++
tcg/optimize.c | 4 ++--
tcg/tcg.c
e check also
returns with tb_lock held. So we can either do the check before tb_lock is
acquired, or just get rid of it. Given that it is redundant, I am going
for the latter option.
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 5 -
1 file changed, 5 deletions(-)
diff --git a/
20% ) (83.26%)
30.340574988 seconds time elapsed
( +- 0.39% )
That is, a speedup of 1.25X.
Signed-off-by: Emilio G. Cota
---
accel/tcg/cpu-exec.c | 7 ++-
accel/tcg/translate-all.c | 22 ++
2 files changed, 28 insertions(+
Original RFC here:
https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg06874.html
I included Richard's feedback (Thanks!) from the original RFC, and
added quite a few things. This is now a proper PATCHset since it is
a lot more mature.
Highlights:
- It works! I tested single/multi-threa
avg cycles 111.0
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a936a5f..72ce445 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/trans
Before TCGContext is made thread-local.
The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 2 --
tcg/tcg.c | 10 +-
2 files changed, 5 insertions(+), 7 deletions(-)
diff
Before TCGContext is made thread-local.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/tb-context.h | 2 ++
tcg/tcg.h | 2 --
accel/tcg/cpu-exec.c | 2 +-
accel/tcg/translate-all.c | 57
dicted.
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a388756..fd20bca 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -327,8 +327,6 @@
Will come in handy very soon.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index c19c473..2f003a0 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_const_match
ut in some systems that would take > 1 byte.
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 8326e7d..a388756 100644
--- a/include/exec/exec-all.h
+++ b/include/ex
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 8096d64..8326e7d 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec
Before we make TCGContext thread-local.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 1 +
tcg/tcg.c | 14 ++
2 files changed, 15 insertions(+)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 2a64ee2..be5f3fd 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -778,6 +778,7 @@ static inline void
that trivial, because vCPUs are spawned in
parallel. So let's just keep it simple and use a list protected by a lock.
Note that this lock will soon be used for other purposes, hence the
generic "tcg_lock" name.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 3 +++
10,803,480,261 branches # 1326.192 M/sec
( +- 1.95% )
195,601,289 branch-misses # 1.81% of all branches
( +- 0.39% )
8.828660235 seconds time elapsed
( +- 0.38% )
Signed-off-b
leading to many unnecessary flushes.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 8 +++
accel/tcg/translate-all.c | 61
bsd-user/main.c | 1 +
linux-user/main.c | 1 +
tcg/tcg.c | 175
On Thu, Jul 06, 2017 at 16:26:52 -0400, Emilio G. Cota wrote:
> On Tue, Jul 04, 2017 at 13:12:02 +0200, Paolo Bonzini wrote:
> > Reviewed-by: Richard Henderson
> > Signed-off-by: Paolo Bonzini
(snip)
> > +++ b/accel/tcg/translate-all.c
> > @@ -1851,6 +1851,11 @@
On Sun, Jul 09, 2017 at 03:49:52 -0400, Emilio G. Cota wrote:
> The series applies on top of the current master (b11365867568).
It's a lot of patches -- you can fetch them from:
https://github.com/cota/qemu/commits/multi-tcg
Note that there's a patch in the branch there that is no
On Sun, Jul 09, 2017 at 10:00:01 -1000, Richard Henderson wrote:
> On 07/08/2017 09:49 PM, Emilio G. Cota wrote:
> >+atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
>
> Want atomic_read here, so they're all the same.
It's not needed. Note th
On Sun, Jul 09, 2017 at 10:33:41 -1000, Richard Henderson wrote:
> On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
> > #if defined(DEBUG_TB_FLUSH)
> >+nb_tbs = g_tree_nnodes(tcg_ctx.tb_ctx.tb_tree);
> > printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n
On Sun, Jul 09, 2017 at 10:48:27 -1000, Richard Henderson wrote:
> On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
> >@@ -409,6 +411,18 @@ void tcg_context_init(TCGContext *s)
> > }
> > /*
> >+ * Clone the initial TCGContext. Used by TCG threads to copy the TCGContext
On Sun, Jul 09, 2017 at 10:45:55 -1000, Richard Henderson wrote:
> On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
> >+/* includes aborted translations because of exceptions */
> >+atomic_set(&prof->tb_count1, prof->tb_count1 + 1);
>
> Again, atomic_set w
On Sun, Jul 09, 2017 at 16:56:23 -0400, Emilio G. Cota wrote:
> On Sun, Jul 09, 2017 at 10:00:01 -1000, Richard Henderson wrote:
> > On 07/08/2017 09:49 PM, Emilio G. Cota wrote:
> > >+atomic_set(&env->tlb_flush_count, env->tlb_flush_count + 1);
> >
>
On Sun, Jul 09, 2017 at 11:19:37 -1000, Richard Henderson wrote:
> On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
> >This allows us to generate TCG code in parallel. MTTCG already uses
> >it, although the next commit pushes down a lock to actually
> >perform parallel generation
On Sun, Jul 09, 2017 at 11:38:50 -1000, Richard Henderson wrote:
> On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
(snip)
> I think it would be better to have a tb_htable_lookup_or_insert function,
> which performs the insert iff a matching object isn't already there,
> returning th
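The lookup-or-insert operation suggested above can be sketched like this. It is a single-threaded illustration under stated assumptions: DemoEntry, DemoTable and demo_lookup_or_insert are hypothetical names, the table is a plain chained hash table rather than QEMU's hash table, and the return convention (hand back whichever entry ends up in the table) is one plausible reading of the truncated suggestion.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUCKETS 16

/* Hypothetical entry type; stands in for a translation block keyed by pc. */
typedef struct DemoEntry {
    unsigned long pc;
    struct DemoEntry *next;
} DemoEntry;

typedef struct DemoTable {
    DemoEntry *buckets[DEMO_BUCKETS];
} DemoTable;

/*
 * Insert 'candidate' only if no entry with the same key exists.
 * Returns the entry that ends up in the table: either the pre-existing
 * match or 'candidate' itself, so the caller can tell which case
 * happened by comparing pointers.
 */
static DemoEntry *demo_lookup_or_insert(DemoTable *t, DemoEntry *candidate)
{
    DemoEntry **head = &t->buckets[candidate->pc % DEMO_BUCKETS];

    for (DemoEntry *e = *head; e != NULL; e = e->next) {
        if (e->pc == candidate->pc) {
            return e;   /* a matching object is already there */
        }
    }
    candidate->next = *head;
    *head = candidate;
    return candidate;
}
```

Folding the check and the insert into one call means a caller that loses the race can discard its own candidate and use the returned entry instead. A concurrent version would need the table's own locking, which this sketch omits.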
On Sun, Jul 09, 2017 at 11:48:53 -1000, Richard Henderson wrote:
> On 07/09/2017 11:29 AM, Emilio G. Cota wrote:
(snip)
> >Exactly. Also, in user-mode "vCPU threads" (i.e. host threads) come and
> >go all the time, so this doesn't work well with having a single
>
On Sun, Jul 09, 2017 at 19:59:47 -1000, Richard Henderson wrote:
> On 07/09/2017 05:51 PM, Emilio G. Cota wrote:
> >On Sun, Jul 09, 2017 at 11:38:50 -1000, Richard Henderson wrote:
> >>On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
> >(snip)
> >>I t
On Sun, Jul 09, 2017 at 11:44:10 -1000, Richard Henderson wrote:
> On 07/09/2017 11:14 AM, Emilio G. Cota wrote:
> >On Sun, Jul 09, 2017 at 10:45:55 -1000, Richard Henderson wrote:
> >>On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
> >>>+/* includes aborted tran
On Mon, Jul 10, 2017 at 14:05:01 +0200, Paolo Bonzini wrote:
> On 09/07/2017 09:50, Emilio G. Cota wrote:
> > User-mode is kept out of this: contention due to concurrent translation
> > is more commonly found in full-system mode.
>
> Out of curiosity, is it harder or you
On Mon, Jul 10, 2017 at 17:33:07 -0400, Paolo Bonzini wrote:
>
> > I agree that it would be nice to have the same mechanism for all.
> >
> > The main hurdle I see is how to allow for concurrent code generation while
> > minimizing flushes of the single, fixed-size[*] code_gen_buffer.
> > In user-
On Sun, Jul 09, 2017 at 10:11:21 -1000, Richard Henderson wrote:
> On 07/08/2017 09:50 PM, Emilio G. Cota wrote:
> >To avoid wasting a byte. I don't have any use in mind for this byte,
> >but I think it's good to leave this byte explicitly free for future use.
> >
On Mon, Aug 07, 2017 at 19:52:16 -0400, Emilio G. Cota wrote:
> This series applies on top of the "multiple TCG contexts" series, v4:
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06769.html
>
> Highlights:
>
> - First, fix a few typos I encountered wh
On Fri, Jul 28, 2017 at 19:05:43 +0300, Lluís Vilanova wrote:
> As for the (minimum) requirements I've collected:
>
> * Peek at register values and guest memory.
> * Enumerate guest cpus.
> * Control when events are raised to minimize overheads (e.g., avoid generating
> TCG code to trace a guest
On Thu, Aug 03, 2017 at 12:54:57 +0100, Stefan Hajnoczi wrote:
> > > Please post an example of the API you'd like.
> >
> > In my opinion, the instrumentation support in this series provides an API
> > that
> > works in the opposite way you're suggesting (let's ignore the fact that it's
> > built
On Sun, Jul 30, 2017 at 17:08:18 +0300, Lluís Vilanova wrote:
> The hypertrace channel allows guest code to emit events in QEMU (the host)
> using
> its tracing infrastructure (see "docs/trace.txt"). This works in both 'system'
> and 'user' modes, is architecture-agnostic and introduces minimal no
On Sun, Aug 27, 2017 at 23:53:25 -0400, Pranith Kumar wrote:
> Using heaptrack, I found that quite a few of our temporary allocations
> are coming from allocating work items. Instead of doing this
> continuously, we can cache the allocated items and reuse them instead
> of freeing them.
>
> This re
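The caching approach described above (reuse work items instead of freeing them) can be sketched with a simple free list. This is not the posted patch: DemoWorkItem, DemoWorkCache, demo_work_get and demo_work_put are hypothetical names, and the sketch is single-threaded, so a real version would need per-thread caches or a lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical work item; the fields are illustrative only. */
typedef struct DemoWorkItem {
    void (*func)(void *);
    void *data;
    struct DemoWorkItem *next_free;   /* link used while the item is cached */
} DemoWorkItem;

typedef struct DemoWorkCache {
    DemoWorkItem *free_list;          /* items waiting to be reused */
} DemoWorkCache;

/* Take an item from the cache, falling back to malloc when it is empty. */
static DemoWorkItem *demo_work_get(DemoWorkCache *c)
{
    DemoWorkItem *wi = c->free_list;

    if (wi != NULL) {
        c->free_list = wi->next_free;
    } else {
        wi = malloc(sizeof(*wi));
        if (wi == NULL) {
            return NULL;
        }
    }
    wi->func = NULL;
    wi->data = NULL;
    wi->next_free = NULL;
    return wi;
}

/* Return a finished item to the cache instead of freeing it. */
static void demo_work_put(DemoWorkCache *c, DemoWorkItem *wi)
{
    wi->next_free = c->free_list;
    c->free_list = wi;
}
```

An unbounded cache like this never gives memory back; capping the list length is a common refinement.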
On Mon, Jul 17, 2017 at 19:09:28 -1000, Richard Henderson wrote:
> On 07/16/2017 10:04 AM, Emilio G. Cota wrote:
> >+#ifdef CONFIG_SOFTMMU
> >+/*
> >+ * It is likely that some vCPUs will translate more code than others, so we
> >+ * first try to set more regions than s
On Mon, Jul 17, 2017 at 19:25:14 -1000, Richard Henderson wrote:
> On 07/16/2017 10:04 AM, Emilio G. Cota wrote:
> >+
> >+/* claim the first free pointer in tcg_ctxs and increment n_tcg_ctxs */
> >+for (i = 0; i < smp_cpus; i++) {
> >+if (atomic
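The quoted loop is cut off, but the comment describes claiming the first free slot in an array of context pointers and bumping a counter. A minimal sketch of that pattern with C11 atomics follows. The names demo_ctxs, demo_n_ctxs and demo_claim_slot are hypothetical, and the use of compare-and-swap is an assumption about how the claim is made atomic, not a reconstruction of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_MAX_CTXS 4

/* Hypothetical context type; stands in for a per-thread TCG context. */
typedef struct DemoCtx {
    int id;
} DemoCtx;

/* Static storage is zero-initialized, so every slot starts out NULL. */
static _Atomic(DemoCtx *) demo_ctxs[DEMO_MAX_CTXS];
static atomic_uint demo_n_ctxs;

/* Claim the first free slot for 'ctx'; returns its index, or -1 if full. */
static int demo_claim_slot(DemoCtx *ctx)
{
    for (int i = 0; i < DEMO_MAX_CTXS; i++) {
        DemoCtx *expected = NULL;

        /* Succeeds only if the slot is still NULL, so no slot is shared. */
        if (atomic_compare_exchange_strong(&demo_ctxs[i], &expected, ctx)) {
            atomic_fetch_add(&demo_n_ctxs, 1);
            return i;
        }
    }
    return -1;
}
```

Because threads may start in parallel, slot order does not have to match thread creation order; each thread simply ends up with some unique index.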
on such that it is
> just two loads and one mask. Which is one practically-free mask
> away from being as minimal as one can get.
Tested-by: Emilio G. Cota
for the series.
I tried to get some perf numbers but really booting linux
doesn't spend much time in lookup_tb_ptr, nor does dbt
On Mon, Jul 17, 2017 at 12:16:32 +0200, Gerd Hoffmann wrote:
> Based on a old patch by Laszlo.
> Time to get this in ...
>
> Signed-off-by: Gerd Hoffmann
> ---
> scripts/git.orderfile | 29 +++++
Reviewed-by: Emilio G. Cota
Been using this orderfile
On Mon, Jul 17, 2017 at 19:29:57 -1000, Richard Henderson wrote:
> On 07/17/2017 06:54 PM, Emilio G. Cota wrote:
> >What threw me off was that in lookup_tb_ptr we're not checking tb->invalid,
> >and that biased me into thinking that it's not needed. But I should have
Reusing the have_tb_lock name, which is also defined in translate-all.c,
makes code reviewing unnecessarily harder.
Avoid potential confusion by renaming the local have_tb_lock variable
to something else.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
accel/tcg/cpu-exec.c
v2:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04749.html
v3 applies on top of the current master (d4e59218a).
To ease review/testing, you can pull this series from:
https://github.com/cota/qemu/tree/multi-tcg-v3
Note: I cannot even compile-test _WIN32 bits, help appreciated! S
It is only used by this object, and it's not exported to any other.
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/accel/tcg/translate-all.c b/acce
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Emilio G. Cota
---
tcg/i386/tcg-target.inc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 01e3b4e
Groundwork for supporting multiple TCG contexts.
The hash table becomes read-only after it is filled in,
so we can save space by keeping just a global pointer to it.
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 2 --
tcg/tcg.c | 10
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
target/i386/translate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/i386/translate.c b/target/i386
on
Signed-off-by: Emilio G. Cota
---
tcg/tcg-op.h | 4 ++--
tcg/tcg-runtime.h | 2 +-
target/alpha/translate.c | 2 +-
target/arm/translate-a64.c | 4 ++--
target/arm/translate.c | 5 +
target/hppa/translate.c | 6 +++---
target/i386/translate
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Emilio G. Cota
---
tcg/mips/tcg-target.inc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 85756b8
Thereby decoupling the resulting translated code from the current state
of the system.
Signed-off-by: Emilio G. Cota
---
target/m68k/helper.h | 1 +
target/m68k/op_helper.c | 33 -
target/m68k/translate.c | 12 ++--
3 files changed, 31 insertions
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
target/sparc/translate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/sparc/translate.c b/target/sparc
updating the accessors to
tlb_flush_count to use atomic_read/set whenever there may be conflicting
accesses (as defined in C11) to it.
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
include/exec/cpu-defs.h | 1 +
include/exec/cputlb.h | 3 +--
accel
avg cycles 111.0
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/trans
This gets rid of a hole in struct TranslationBlock.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 3 +--
include/exec/tb-lookup.h | 2 +-
accel/tcg/cpu-exec.c | 4 ++--
accel/tcg/translate-all.c | 3 +--
4 files changed, 5 insertions(+), 7
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
target/arm/helper-a64.h | 4
target/arm/helper-a64.c | 38 --
target/arm/op_helper.c | 7
This gets rid of some ifdef checks while ensuring that the debug code
is compiled, which prevents bit rot.
Suggested-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 20 +---
1 file changed, 13 insertions(+), 7
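The technique named in this commit message (replace ifdef'd debug blocks with code that is always compiled) is commonly done with a constant "gate" macro. A minimal sketch follows, assuming hypothetical names DEMO_DEBUG_FLUSH, DEMO_DEBUG_FLUSH_GATE and demo_flush; the exact macro names used in the patch are not visible in this snippet.

```c
#include <assert.h>
#include <stdio.h>

/* #define DEMO_DEBUG_FLUSH */   /* uncomment to enable the debug output */

/* The only remaining ifdef: it turns the option into a 0/1 constant. */
#ifdef DEMO_DEBUG_FLUSH
#define DEMO_DEBUG_FLUSH_GATE 1
#else
#define DEMO_DEBUG_FLUSH_GATE 0
#endif

/* Returns the number of characters printed (0 when the gate is off). */
static int demo_flush(long code_size, int nb_tbs)
{
    int printed = 0;

    /*
     * The body is always parsed and type-checked, so it cannot bit-rot;
     * with the gate at 0 the compiler can drop it as dead code.
     */
    if (DEMO_DEBUG_FLUSH_GATE) {
        long avg = nb_tbs > 0 ? code_size / nb_tbs : 0;

        printed = printf("demo: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
                         code_size, nb_tbs, avg);
    }
    return printed;
}
```

The trade-off is that any variable used only by the debug code must now exist in all builds, which is exactly what keeps the debug path compiling.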
Thereby decoupling the resulting translated code from the current state
of the system.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
target/hppa/helper.h | 2 ++
target/hppa/op_helper.c | 32
target/hppa/translate.c | 12 ++--
3
isters, which results in a 4-byte hole
in TCGContext. Use this hole for the bit we need, which we store in a bool.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 1 +
accel/tcg/translate-all.c | 1 +
tcg/tcg-op.c | 10 +-
3 files c
This gets rid of an ifdef check while ensuring that the debug code
is compiled, which prevents bit rot.
Suggested-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)
diff
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 87b1b74..69c1b36 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec
.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
accel/tcg/cpu-exec.c | 30 ++
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index b71e015..526cab3 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel
Thereby decoupling the resulting translated code from the current state
of the system.
Signed-off-by: Emilio G. Cota
---
target/sh4/translate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 9fcaefd..52fabb3 100644
--- a
We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.
Suggested-by: Alex Bennée
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 2 +-
accel/tcg/cpu-exec.c | 2 +-
accel/tcg/translate-
t size,
as well as the expansion ratio.
In the future we might want to consider reporting the accurate numbers for
the total translated code, together with a "bookkeeping/overhead" field to
account for the TB structs.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
elems)
or having to check with ifdef's for usermode/softmmu.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.c | 4
1 file changed, 4 insertions(+)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f907c47..2217314 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -115,6 +115,8 @@ static int tcg_target_c
: Emilio G. Cota
---
accel/tcg/translate-all.c | 28 ++--
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 962e9b3..845585b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -82,6
]
261,962,386 branch-misses # 2.03% of all branches
( +- 0.71% ) [83.35%]
19.700174670 seconds time elapsed
( +- 0.56% )
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 20 ++--
accel/tcg/cpu-exec.c | 6 +++---
acce
or target 'accel/tcg/translate-all.o'
failed
make[1]: *** [accel/tcg/translate-all.o] Error 1
Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
make: *** [subdir-mipsn32-linux-user] Error 2
cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$
Reviewed-by: Richard
Signed-off-by: Emilio G. Cota
---
include/qemu/osdep.h | 2 ++
util/osdep.c | 41 +
2 files changed, 43 insertions(+)
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 0cba871..2c7d7db 100644
--- a/include/qemu/osdep.h
+++ b/include
Groundwork for supporting multiple TCG contexts.
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
include/exec/tb-context.h | 2 ++
tcg/tcg.h | 2 --
accel/tcg/cpu-exec.c | 2 +-
accel/tcg/translate-all.c | 57
Groundwork for supporting multiple TCG contexts.
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
include/exec/gen-icount.h | 7 +++
tcg/tcg.h | 2 ++
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/exec/gen
The helpers require the address and size to be page-aligned, so
do that before calling them.
Signed-off-by: Emilio G. Cota
---
accel/tcg/translate-all.c | 61 ++-
1 file changed, 13 insertions(+), 48 deletions(-)
diff --git a/accel/tcg/translate
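Page-aligning an address range before handing it to a helper usually means rounding the start down and the end up. The sketch below shows that arithmetic only. demo_page_align is a hypothetical name, and the rounding direction is an assumption, since the snippet does not show which way the patch rounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Expand [addr, addr + size) to the smallest page-aligned range that
 * contains it. 'page_size' must be a power of two.
 */
static void demo_page_align(uintptr_t addr, size_t size, size_t page_size,
                            uintptr_t *aligned_addr, size_t *aligned_size)
{
    uintptr_t mask, start, end;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    mask = (uintptr_t)page_size - 1;
    start = addr & ~mask;                  /* round the start down */
    end = (addr + size + mask) & ~mask;    /* round the end up */

    *aligned_addr = start;
    *aligned_size = (size_t)(end - start);
}
```

A caller that must not touch memory outside the original range would round the other way (start up, end down) instead.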
-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/cpu-all.h | 2 --
include/qemu/osdep.h | 6 ++
exec.c | 4
util/pagesize.c | 18 ++
util/Makefile.objs | 1 +
5 files changed, 25 insertions(+), 6 deletions(-)
create
flags" inline.
Signed-off-by: Emilio G. Cota
---
include/exec/exec-all.h | 20 +++-
include/exec/tb-hash-xx.h | 9 ++---
include/exec/tb-hash.h | 4 ++--
include/exec/tb-lookup.h | 6 +++---
tcg/tcg.h | 1 -
accel/tcg/cpu-exec.c
$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z]*)->tb->cflags/tb_cflags($1->tb)/g' $FILES
Then manually fixed the few errors that checkpatch reported.
Compile-tested for all targets.
Suggested-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/gen-icount.
accesses (as defined
in C11) to them.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 38 +---
accel/tcg/translate-all.c | 23 +-
tcg/tcg.c | 110 ++
3 files changed, 126
avg_tb_size=364
That is, 20 flushes. Note how a static partitioning approach uses
the code buffer poorly, leading to many unnecessary flushes.
Signed-off-by: Emilio G. Cota
---
tcg/tcg.h | 6 ++
accel/tcg/translate-all.c | 63 +---
bsd-user/main.c
Thereby decoupling the resulting translated code from the current state
of the system.
Signed-off-by: Emilio G. Cota
---
target/s390x/helper.h | 4 +++
target/s390x/mem_helper.c | 80 +--
target/s390x/translate.c | 26 ---
3 files
Will come in handy very soon.
Reviewed-by: Richard Henderson
Reviewed-by: Alex Bennée
Signed-off-by: Emilio G. Cota
---
tcg/tcg.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0ddd0dc..cb4ecbd 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
Groundwork for supporting multiple TCG contexts.
Compile-tested for all targets on an x86_64 host.
Suggested-by: Richard Henderson
Acked-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
tcg/tci.c | 552 +++---
1 file changed, 279
ild_trampolines),
use_vis3_instructions
Only written by tcg_prologue_init:
- 'struct jit_code_entry one_entry'
- aarch64: tb_ret_addr
- arm: tb_ret_addr
- i386: tb_ret_addr, guest_base_flags
- ia64: tb_ret_addr
- mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr
Signed-off-by: Emili
l branches
( +- 0.95% ) [83.31%]
20.601366430 seconds time elapsed
( +- 0.60% )
That is, 1.77% slowdown.
Suggested-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
tcg/optimize.c | 307 ++
( +- 1.95% )
195,601,289 branch-misses # 1.81% of all branches
( +- 0.39% )
8.828660235 seconds time elapsed
( +- 0.38% )
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/ex
- in this case &tcg_init_ctx.
Reviewed-by: Richard Henderson
Signed-off-by: Emilio G. Cota
---
include/exec/gen-icount.h | 10 ++---
include/exec/helper-gen.h | 12 +++---
tcg/tcg-op.h | 80 +--
tcg/tcg.h | 15 +++--
On Wed, Jul 19, 2017 at 22:04:50 -1000, Richard Henderson wrote:
> On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> >+/* We do not yet support multiple TCG contexts, so use one region for
> >now */
> >+n_regions = 1;
> >+
> >+/* start on a pa
On Thu, Jul 20, 2017 at 11:22:10 -1000, Richard Henderson wrote:
> >Perhaps we should then enlarge both the first and last regions so that we
> >fully use the buffer.
>
> I really like the idea. That's a lot of space recovered for 64k page hosts.
>
> I do think we can make the computation cleare
On Wed, Jul 19, 2017 at 21:39:35 -1000, Richard Henderson wrote:
> On 07/19/2017 05:09 PM, Emilio G. Cota wrote:
> >Groundwork for supporting multiple TCG contexts.
> >That is, 2.70% slowdown.
>
> That's disappointing. How about using tcg_malloc?
>
> Maximum al
On Thu, Jul 20, 2017 at 14:02:53 -1000, Richard Henderson wrote:
> On 07/20/2017 01:53 PM, Emilio G. Cota wrote:
> >BTW, is there any chance that the pool will be initialized before we copy
> >tcg_init_ctx? That'd mean the main thread has performed translation, which
> >
tcg_gen_and_i64(tcg_tmp, tcg_tmp, mask);
> +tcg_gen_shli_i64(tcg_rd, tcg_rd, 8);
> +tcg_gen_or_i64(tcg_rd, tcg_rd, tcg_tmp);
>
> tcg_temp_free_i64(tcg_tmp);
const leak! patch below -- cut with `git am --scissors'.
Emilio
---8<---
Signed-off-
v3:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06353.html
To ease review/testing, you can pull this series from:
https://github.com/cota/qemu/tree/multi-tcg-v4
[ head commit: 1d50a9f24e ]
In this iteration I'm sending only the few patches that contain changes
from v3; they are