date:20170127

[Qemu-devel] [RFC PATCH v2 1/2] softfloat: Handle float64 rounding properly for underflow case

2017-01-27 Thread Bharata B Rao

When rounding a floating point result to float64 precision, the
existing code doesn't re-calculate the required round increment
for the underflow case. Fix this.

Signed-off-by: Bharata B Rao 
---
 fpu/softfloat.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index c295f31..b04699c 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -651,6 +651,23 @@ static float64 roundAndPackFloat64(flag zSign, int zExp, 
uint64_t zSig,
 if (isTiny && roundBits) {
 float_raise(float_flag_underflow, status);
 }
+switch (roundingMode) {
+case float_round_nearest_even:
+case float_round_ties_away:
+roundIncrement = 0x200;
+break;
+case float_round_to_zero:
+roundIncrement = 0;
+break;
+case float_round_up:
+roundIncrement = zSign ? 0 : 0x3ff;
+break;
+case float_round_down:
+roundIncrement = zSign ? 0x3ff : 0;
+break;
+default:
+abort();
+}
 }
 }
 if (roundBits) {
-- 
2.7.4

[Qemu-devel] [RFC PATCH v2 0/2] softfloat: Add round-to-odd rounding mode

2017-01-27 Thread Bharata B Rao

Hi,

Here is the next version of round-to-odd rounding mode implementation.

In this version I have addressed the reveiw comments from v1 and added
a new patch to take care of 64 bit rounding in underflow case. This
fix was found necessary when comparing the result of PowerPC ISA 3.0
instruction xscvqpdp between QEMU implementation and a known-good
implementation.

I have tested these patches and compared the results of
xsaddqp[o], xsmulqp[o], xsdivqp[o] and xscvqpdp[0] between QEMU
implementation and a known-good implementation.

I wanted to test with RISU to check if any older floating point
instructions (prior to PowerPC ISA v3.0) were affected by these
rounding changes. But even without my patchset, I am seeing RISU
reporting failures between QEMU implementation and P8 hardware.
While I am investigating the cause for these failures, I also plan
to do RISU verification for ISA 3.0 instructions with a known good
implementation.

Changes in v2:
-
- Do odd or even for the right precision bit in 64bit rounding. (Peter Maydell)
- Handle the overflow case correctly in 64bit rounding. (Peter Maydell)
- Add a patch to handle underflow case correctly in 64bit rounding.
 
v1: http://patchwork.ozlabs.org/patch/717562/

Bharata B Rao (2):
  softfloat: Handle float64 rounding properly for underflow case
  softfloat: Add round-to-odd rounding mode

 fpu/softfloat.c | 34 +-
 include/fpu/softfloat.h |  2 ++
 2 files changed, 35 insertions(+), 1 deletion(-)

-- 
2.7.4

[Qemu-devel] [RFC PATCH v2 2/2] softfloat: Add round-to-odd rounding mode

2017-01-27 Thread Bharata B Rao

Power ISA 3.0 introduces a few quadruple precision floating point
instructions that support round-to-odd rounding mode. The
round-to-odd mode is explained as under:

Let Z be the intermediate arithmetic result or the operand of a convert
operation. If Z can be represented exactly in the target format, the
result is Z. Otherwise the result is either Z1 or Z2 whichever is odd.
Here Z1 and Z2 are the next larger and smaller numbers representable
in the target format respectively.

Signed-off-by: Bharata B Rao 
---
 fpu/softfloat.c | 17 -
 include/fpu/softfloat.h |  2 ++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index b04699c..1c322ad 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -623,6 +623,9 @@ static float64 roundAndPackFloat64(flag zSign, int zExp, 
uint64_t zSig,
 case float_round_down:
 roundIncrement = zSign ? 0x3ff : 0;
 break;
+case float_round_to_odd:
+roundIncrement = (zSig & 0x400) ? 0 : 0x3ff;
+break;
 default:
 abort();
 }
@@ -632,8 +635,10 @@ static float64 roundAndPackFloat64(flag zSign, int zExp, 
uint64_t zSig,
  || (( zExp == 0x7FD )
   && ( (int64_t) ( zSig + roundIncrement ) < 0 ) )
) {
+bool overflow_to_inf = roundingMode != float_round_to_odd &&
+   roundIncrement != 0;
 float_raise(float_flag_overflow | float_flag_inexact, status);
-return packFloat64( zSign, 0x7FF, - ( roundIncrement == 0 ));
+return packFloat64( zSign, 0x7FF, - ( !overflow_to_inf ));
 }
 if ( zExp < 0 ) {
 if (status->flush_to_zero) {
@@ -665,6 +670,9 @@ static float64 roundAndPackFloat64(flag zSign, int zExp, 
uint64_t zSig,
 case float_round_down:
 roundIncrement = zSign ? 0x3ff : 0;
 break;
+case float_round_to_odd:
+roundIncrement = (zSig & 0x400) ? 0 : 0x3ff;
+break;
 default:
 abort();
 }
@@ -1166,6 +1174,9 @@ static float128 roundAndPackFloat128(flag zSign, int32_t 
zExp,
 case float_round_down:
 increment = zSign && zSig2;
 break;
+case float_round_to_odd:
+increment = !(zSig1 & 0x1) && zSig2;
+break;
 default:
 abort();
 }
@@ -1185,6 +1196,7 @@ static float128 roundAndPackFloat128(flag zSign, int32_t 
zExp,
 if (( roundingMode == float_round_to_zero )
  || ( zSign && ( roundingMode == float_round_up ) )
  || ( ! zSign && ( roundingMode == float_round_down ) )
+ || ( roundingMode == float_round_to_odd )
) {
 return
 packFloat128(
@@ -1232,6 +1244,9 @@ static float128 roundAndPackFloat128(flag zSign, int32_t 
zExp,
 case float_round_down:
 increment = zSign && zSig2;
 break;
+case float_round_to_odd:
+increment = !(zSig1 & 0x1) && zSig2;
+break;
 default:
 abort();
 }
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 842ec6b..8a39028 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -180,6 +180,8 @@ enum {
 float_round_up   = 2,
 float_round_to_zero  = 3,
 float_round_ties_away= 4,
+/* Not an IEEE rounding mode: round to the closest odd mantissa value */
+float_round_to_odd   = 5,
 };
 
 /*
-- 
2.7.4

Re: [Qemu-devel] [RFC PATCH v2 0/2] softfloat: Add round-to-odd rounding mode

2017-01-27 Thread no-reply

Hi,

Your series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [RFC PATCH v2 0/2] softfloat: Add round-to-odd rounding 
mode
Message-id: 1485504213-21632-1-git-send-email-bhar...@linux.vnet.ibm.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] 
patchew/1485504213-21632-1-git-send-email-bhar...@linux.vnet.ibm.com -> 
patchew/1485504213-21632-1-git-send-email-bhar...@linux.vnet.ibm.com
Switched to a new branch 'test'
3adbab9 softfloat: Add round-to-odd rounding mode
afe7f83 softfloat: Handle float64 rounding properly for underflow case

=== OUTPUT BEGIN ===
Checking PATCH 1/2: softfloat: Handle float64 rounding properly for underflow 
case...
Checking PATCH 2/2: softfloat: Add round-to-odd rounding mode...
ERROR: space prohibited after that '-' (ctx:WxW)
#41: FILE: fpu/softfloat.c:641:
+return packFloat64( zSign, 0x7FF, - ( !overflow_to_inf ));
   ^

ERROR: space prohibited after that open parenthesis '('
#41: FILE: fpu/softfloat.c:641:
+return packFloat64( zSign, 0x7FF, - ( !overflow_to_inf ));

ERROR: space prohibited before that close parenthesis ')'
#41: FILE: fpu/softfloat.c:641:
+return packFloat64( zSign, 0x7FF, - ( !overflow_to_inf ));

ERROR: space prohibited after that open parenthesis '('
#69: FILE: fpu/softfloat.c:1199:
+ || ( roundingMode == float_round_to_odd )

ERROR: space prohibited before that close parenthesis ')'
#69: FILE: fpu/softfloat.c:1199:
+ || ( roundingMode == float_round_to_odd )

total: 5 errors, 0 warnings, 62 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [RFC PATCH v2 0/2] softfloat: Add round-to-odd rounding mode

2017-01-27 Thread Bharata B Rao

On Fri, Jan 27, 2017 at 12:09:13AM -0800, no-re...@patchew.org wrote:
> Hi,
> 
> Your series seems to have some coding style problems. See output below for
> more information:
> 
> Type: series
> Subject: [Qemu-devel] [RFC PATCH v2 0/2] softfloat: Add round-to-odd rounding 
> mode
> Message-id: 1485504213-21632-1-git-send-email-bhar...@linux.vnet.ibm.com
> 
> Checking PATCH 1/2: softfloat: Handle float64 rounding properly for underflow 
> case...
> Checking PATCH 2/2: softfloat: Add round-to-odd rounding mode...
> ERROR: space prohibited after that '-' (ctx:WxW)
> #41: FILE: fpu/softfloat.c:641:
> +return packFloat64( zSign, 0x7FF, - ( !overflow_to_inf ));
>^
> 
> ERROR: space prohibited after that open parenthesis '('
> #41: FILE: fpu/softfloat.c:641:
> +return packFloat64( zSign, 0x7FF, - ( !overflow_to_inf ));
> 
> ERROR: space prohibited before that close parenthesis ')'
> #41: FILE: fpu/softfloat.c:641:
> +return packFloat64( zSign, 0x7FF, - ( !overflow_to_inf ));
> 
> ERROR: space prohibited after that open parenthesis '('
> #69: FILE: fpu/softfloat.c:1199:
> + || ( roundingMode == float_round_to_odd )
> 
> ERROR: space prohibited before that close parenthesis ')'
> #69: FILE: fpu/softfloat.c:1199:
> + || ( roundingMode == float_round_to_odd )
> 
> total: 5 errors, 0 warnings, 62 lines checked

fpu/softfloat.c follows a different coding style and hence it made
sense to stick to the existing style in the file.

Regards,
Bharata.

[Qemu-devel] [PATCH] target/sparc: Restore ldstub of odd asis

2017-01-27 Thread Richard Henderson

Fixes the booting of ss20 roms.

Reported-by: Mark Cave-Ayland 
Signed-off-by: Richard Henderson 
---
 target/sparc/translate.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 655060c..aa6734d 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -2448,8 +2448,31 @@ static void gen_ldstub_asi(DisasContext *dc, TCGv dst, 
TCGv addr, int insn)
 gen_ldstub(dc, dst, addr, da.mem_idx);
 break;
 default:
-/* ??? Should be DAE_invalid_asi.  */
-gen_exception(dc, TT_DATA_ACCESS);
+/* ??? In theory, this should be raise DAE_invalid_asi.
+   But the SS-20 roms do ldstuba [%l0] #ASI_M_CTL, %o1.  */
+if (parallel_cpus) {
+gen_helper_exit_atomic(cpu_env);
+} else {
+TCGv_i32 r_asi = tcg_const_i32(da.asi);
+TCGv_i32 r_mop = tcg_const_i32(MO_UB);
+TCGv_i64 s64, t64;
+
+save_state(dc);
+t64 = tcg_temp_new_i64();
+gen_helper_ld_asi(t64, cpu_env, addr, r_asi, r_mop);
+
+s64 = tcg_const_i64(0xff);
+gen_helper_st_asi(cpu_env, addr, s64, r_asi, r_mop);
+tcg_temp_free_i64(s64);
+tcg_temp_free_i32(r_mop);
+tcg_temp_free_i32(r_asi);
+
+tcg_gen_trunc_i64_tl(dst, t64);
+tcg_temp_free_i64(t64);
+
+/* End the TB.  */
+dc->npc = DYNAMIC_PC;
+}
 break;
 }
 }
-- 
2.9.3

Re: [Qemu-devel] [PATCH] spapr: clock should count only if vm is running

2017-01-27 Thread Paolo Bonzini


> This is a port to ppc of the i386 commit:
> 00f4d64 kvmclock: clock should count only if vm is running
> 
> We remove timebase_/pre_save/post_load/ functions,
> and use the VM state change handler to save and restore
> the guest_timebase (on stop and continue).
> 
> Time base offset has originally been introduced by commit
> 98a8b52 spapr: Add support for time base offset migration
> 
> So while VM is paused, the time is stopped. This allows to have
> the same result with date (based on Time Base Register) and
> hwclock (based on "get-time-of-day" RTAS call).
> 
> Moreover in TCG mode, the Time Base is always paused, so this
> patch also adjust the behavior between TCG and KVM.
> 
> VM state field "time_of_the_day_ns" is now useless but we keep
> it to be able to migrate to older version of the machine.
> 
> As vmstate_ppc_timebase structure (with timebase_pre_save() and
> timebase_post_load() functions) was only used by vmstate_spapr,
> we register the VM state change handler only in ppc_spapr_init().
> 
> Signed-off-by: Laurent Vivier 

I think you should keep the pre_save handler, otherwise after
migration the timebase register will be off by as long as the
time needed to do the final RAM transfer.  See commit 6053a86
("kvmclock: reduce kvmclock difference on migration", 2016-12-22).

Paolo


> ---
>  hw/ppc/ppc.c | 76
>  ++--
>  hw/ppc/spapr.c   |  6 +
>  hw/ppc/trace-events  |  3 ---
>  target/ppc/cpu-qom.h |  3 +++
>  4 files changed, 35 insertions(+), 53 deletions(-)
> 
> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> index 8945869..a839e25 100644
> --- a/hw/ppc/ppc.c
> +++ b/hw/ppc/ppc.c
> @@ -847,10 +847,11 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t
> freq)
>  cpu_ppc_store_purr(cpu, 0xULL);
>  }
>  
> -static void timebase_pre_save(void *opaque)
> +#if defined(TARGET_PPC64) && defined(CONFIG_KVM)
> +void cpu_ppc_clock_vm_state_change(void *opaque, int running,
> +   RunState state)
>  {
>  PPCTimebase *tb = opaque;
> -uint64_t ticks = cpu_get_host_ticks();
>  PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
>  
>  if (!first_ppc_cpu->env.tb_env) {
> @@ -858,64 +859,39 @@ static void timebase_pre_save(void *opaque)
>  return;
>  }
>  
> -tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
> -/*
> - * tb_offset is only expected to be changed by migration so
> - * there is no need to update it from KVM here
> - */
> -tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> -}
> +if (running) {
> +uint64_t ticks = cpu_get_host_ticks();
> +CPUState *cpu;
> +int64_t tb_offset;
>  
> -static int timebase_post_load(void *opaque, int version_id)
> -{
> -PPCTimebase *tb_remote = opaque;
> -CPUState *cpu;
> -PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
> -int64_t tb_off_adj, tb_off, ns_diff;
> -int64_t migration_duration_ns, migration_duration_tb, guest_tb, host_ns;
> -unsigned long freq;
> +tb_offset = tb->guest_timebase - ticks;
>  
> -if (!first_ppc_cpu->env.tb_env) {
> -error_report("No timebase object");
> -return -1;
> -}
> -
> -freq = first_ppc_cpu->env.tb_env->tb_freq;
> -/*
> - * Calculate timebase on the destination side of migration.
> - * The destination timebase must be not less than the source timebase.
> - * We try to adjust timebase by downtime if host clocks are not
> - * too much out of sync (1 second for now).
> - */
> -host_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
> -ns_diff = MAX(0, host_ns - tb_remote->time_of_the_day_ns);
> -migration_duration_ns = MIN(NANOSECONDS_PER_SECOND, ns_diff);
> -migration_duration_tb = muldiv64(freq, migration_duration_ns,
> - NANOSECONDS_PER_SECOND);
> -guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
> -
> -tb_off_adj = guest_tb - cpu_get_host_ticks();
> -
> -tb_off = first_ppc_cpu->env.tb_env->tb_offset;
> -trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,
> -(tb_off_adj - tb_off) / freq);
> -
> -/* Set new offset to all CPUs */
> -CPU_FOREACH(cpu) {
> -PowerPCCPU *pcpu = POWERPC_CPU(cpu);
> -pcpu->env.tb_env->tb_offset = tb_off_adj;
> +/* Set new offset to all CPUs */
> +CPU_FOREACH(cpu) {
> +PowerPCCPU *pcpu = POWERPC_CPU(cpu);
> +pcpu->env.tb_env->tb_offset = tb_offset;
> +kvm_set_one_reg(cpu, KVM_REG_PPC_TB_OFFSET,
> +&pcpu->env.tb_env->tb_offset);
> +}
> +} else {
> +uint64_t ticks = cpu_get_host_ticks();
> +
> +/* not used anymore, we keep it for compatibility */
> +tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
> +/*
> + * tb_offset

Re: [Qemu-devel] [Xen-devel] Commit 3a6c9 breaks QEMU on FreeBSD/Xen

2017-01-27 Thread Juergen Gross

On 26/01/17 22:21, Peter Maydell wrote:
> On 26 January 2017 at 20:47, Peter Maydell  wrote:
>> On 26 January 2017 at 19:36, Stefano Stabellini  
>> wrote:
>>> It should be just a matter of replacing qdev_init_nofail with something
>>> that can fail. I couldn't find a regular qdev_init that can return
>>> error, so maybe we would need to add it.
>>
>> That's just
>> object_property_set_bool(OBJECT(whatever), true, "realized", &err);
>>
>> ie "please realize the device".
> 
> (PS watch out for ownership refcounting issues depending on what
> you did with parenting the device: you likely want to object_unparent()
> and then object_unref() the thing in the error-exit path. See
> qdev_device_add() for some example code, maybe.)

I'll have a try later today or early next week.


Juergen

Re: [Qemu-devel] [PATCH] spapr: clock should count only if vm is running

2017-01-27 Thread Laurent Vivier

On 27/01/2017 09:52, Paolo Bonzini wrote:
> 
>> This is a port to ppc of the i386 commit:
>> 00f4d64 kvmclock: clock should count only if vm is running
>>
>> We remove timebase_/pre_save/post_load/ functions,
>> and use the VM state change handler to save and restore
>> the guest_timebase (on stop and continue).
>>
>> Time base offset has originally been introduced by commit
>> 98a8b52 spapr: Add support for time base offset migration
>>
>> So while VM is paused, the time is stopped. This allows to have
>> the same result with date (based on Time Base Register) and
>> hwclock (based on "get-time-of-day" RTAS call).
>>
>> Moreover in TCG mode, the Time Base is always paused, so this
>> patch also adjust the behavior between TCG and KVM.
>>
>> VM state field "time_of_the_day_ns" is now useless but we keep
>> it to be able to migrate to older version of the machine.
>>
>> As vmstate_ppc_timebase structure (with timebase_pre_save() and
>> timebase_post_load() functions) was only used by vmstate_spapr,
>> we register the VM state change handler only in ppc_spapr_init().
>>
>> Signed-off-by: Laurent Vivier 
> 
> I think you should keep the pre_save handler, otherwise after
> migration the timebase register will be off by as long as the
> time needed to do the final RAM transfer.  See commit 6053a86
> ("kvmclock: reduce kvmclock difference on migration", 2016-12-22).

I will.

Thank you,
Laurent

Re: [Qemu-devel] QEMU websockets support is laggy?

2017-01-27 Thread Daniel P. Berrange

On Tue, Jan 24, 2017 at 05:02:25PM -0500, Brian Rak wrote:
> We've been considering switching over to using qemu's built in websockets
> support (to avoid the overhead of needing websockify running).  We've been
> seeing very poor performance after the switch (it takes the console 4-5
> seconds to update after pressing a key).  So far, I haven't been able to
> find any indication of why this is happening.  The exact same configuration
> works perfectly when running with websockify, but laggy when hitting qemu
> directly.
> 
> I've tried a few things (disabling encryption, bypassing our usual nginx
> proxy, even connecting via a ssh tunnel), and haven't made any sort of
> progress here.  Has anyone else seen this?  Any suggestions as to where I
> should start looking?

Can you clarify the exact setup you have ?  As mentioned on IRC, I don't
see any degradation in performance betweeen builtin websockets vs a
websockets proxy - if anything the builtin websockets is marginally less
laggy.  I was connecting over TCP localhost, however, so would not see
any effects of network latency. My test was QEMU git master, with noVNC
git master.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH] migrate: Migration aborts abruptly for machine "none"

2017-01-27 Thread Daniel P. Berrange

On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote:
> Migration of a "none" machine with no RAM crashes abruptly as
> bitmap_new() fails and thus aborts. Instead, place a check for
> last_ram_offset() being '0' at the start of ram_save_setup() and
> error out with a meaningful error message.
> 
> Signed-off-by: Ashijeet Acharya 
> ---
>  migration/ram.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index ef8fadf..bf05d69 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>  {
>  RAMBlock *block;
>  
> +if (last_ram_offset() == 0) {
> +error_report("Failed to migrate: No RAM available!");
> +return -1;
> +}
> +

If we're merely going to block migration, as opposed to making it work,
then IMHO we should use migration blockers registered at machine setup
for this task.

We have a new cli arg added to QEMU to tell it to abort startup if the
machine configuration is not migratable. That only works if using
migration blockers - this check you've added is too late to be detected
at startup.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH 2/2] qapi2texi: produce type information

2017-01-27 Thread Markus Armbruster

Marc-André Lureau  writes:

> Add type information to the generated documentation. Without it the
> written documentation is not explicit enough to know how to handle
> the various arguments and members.

This is actually a regression of sorts: the type information we used to
have in qmp-commands.txt got lost when we replaced it by generated
qemu-qmp-ref.*.

> Array types have the following syntax: type[]. Ex: str[].
>
> - Struct, commands and events use the following members syntax:
>
>   { 'member': type, ('foo': str), ... }
>
> Optional members are under parentheses.
>
> A structure with a base type will have 'BaseStruct +' prepended.
>
> - Alternates use the following syntax:
>
>   [ 'foo': type, 'bar': type, ... ]
>
> - Simple unions use the following syntax:
>
>   { 'type': str, 'data': 'type' = [ 'foo': type, 'bar': type... ] }
>
> - Flat unions use the following syntax:
>
>   BaseStruct + 'discriminator' = [ 'foo': type, 'bar': type... ]
>
> Signed-off-by: Marc-André Lureau 

This is a formal language to describe QAPI/QMP from the user's point of
view (the QAPI schema specifies it from the implementor's point of
view).

I intend to implement ideas outlined in my review of "[PATCH v6 05/17]
docs: add master qapi texi files"[*], in particular adding type
information to the argument table.  That will address the stated purpose
of this patch, namely fixing the documentation regression.

We can then compare which fix provides the type information in more
readable form, and whether the additional formal description in your fix
is worth having.

Unfortunately, I need to to some QemuOpts / QAPI work first, because it
blocks features we'd like to have in 2.9.

[*] Message-ID: <87zijp8bxr@dusky.pond.sub.org>
http://lists.gnu.org/archive/html/qemu-devel/2016-12/msg03002.html

Re: [Qemu-devel] [libvirt] char: Logging serial pty output when disconnected

2017-01-27 Thread Daniel P. Berrange

On Thu, Jan 26, 2017 at 05:07:16PM -0800, Ed Swierk wrote:
> Interactive access to a guest serial console can be enabled by hooking
> the serial device to a pty backend, e.g. -device
> isa-serial,chardev=cs0 -chardev pty,id=cs0. With libvirt this can be
> configured via  port='0'/>.
> 
> Output from the same serial device can also be logged to a file by
> adding logfile=/somefile to the -chardev option (
> in libvirt).
> 
> Unfortunately output gets logged only when a client like virsh console
> is connected to the pty; otherwise qemu drops it on the floor. This
> makes chardev logging much less useful than it could be for debugging
> guest problems after the fact.
> 
> Currently qemu_chr_fe_write() calls qemu_chr_fe_write_log() only for
> data consumed by the backend chr_write function. With the pty backend,
> pty_chr_write() returns 0 indicating that the data was not consumed
> when the pty is disconnected. Simply changing it to return len instead
> of 0 tricks the caller into logging the data even when the pty is
> disconnected. I don't know what problems this might cause, but one
> data point is that tcp_chr_write() already happens to work this way.
> 
> Alternatively, qemu_chr_fe_write() could be modified to log everything
> passed to it, regardless of how much data chr_write claims to have
> consumed. The trouble is that the serial device retries writing
> unconsumed data, so when the pty is disconnected you'd see every
> character duplicated 4 times in the log file.
> 
> Any opinions on either approach, or other suggestions? If there are no
> objections to the first one, I'll prepare a patch.

If the pty backend intends to just drop data into a blackhole when
no client is connected, then its chr_write() impl should return
the length of the data discarded, not zero.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|

[Qemu-devel] [PATCH] qemu-doc: Clarify that -vga std is now the default

2017-01-27 Thread Alberto Garcia

The QEMU manual page states that Cirrus Logic is the default video
card if the user doesn't specify any. However this is not true since
QEMU 2.2.

Signed-off-by: Alberto Garcia 
---
 qemu-options.hx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 80df52651a..66ee562821 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1195,12 +1195,12 @@ Select type of VGA card to emulate. Valid values for 
@var{type} are
 Cirrus Logic GD5446 Video card. All Windows versions starting from
 Windows 95 should recognize and use this graphic card. For optimal
 performances, use 16 bit color depth in the guest and the host OS.
-(This one is the default)
+(This card was the default before QEMU 2.2)
 @item std
 Standard VGA card with Bochs VBE extensions.  If your guest OS
 supports the VESA 2.0 VBE extensions (e.g. Windows XP) and if you want
 to use high resolution modes (>= 1280x1024x16) then you should use
-this option.
+this option. (This card is the default since QEMU 2.2)
 @item vmware
 VMWare SVGA-II compatible adapter. Use it if you have sufficiently
 recent XFree86/XOrg server or Windows guest with a driver for this
-- 
2.11.0

Re: [Qemu-devel] [PATCH] spapr: clock should count only if vm is running

2017-01-27 Thread Thomas Huth

On 26.01.2017 21:45, Laurent Vivier wrote:
> This is a port to ppc of the i386 commit:
> 00f4d64 kvmclock: clock should count only if vm is running
> 
> We remove timebase_/pre_save/post_load/ functions,
> and use the VM state change handler to save and restore
> the guest_timebase (on stop and continue).
> 
> Time base offset has originally been introduced by commit
> 98a8b52 spapr: Add support for time base offset migration
> 
> So while VM is paused, the time is stopped. This allows to have
> the same result with date (based on Time Base Register) and
> hwclock (based on "get-time-of-day" RTAS call).
> 
> Moreover in TCG mode, the Time Base is always paused, so this
> patch also adjust the behavior between TCG and KVM.
> 
> VM state field "time_of_the_day_ns" is now useless but we keep
> it to be able to migrate to older version of the machine.

Not sure, but the cpu_ppc_clock_vm_state_change() handler is only used
with KVM, isn't it? So what happens if you migrate in TCG mode from a
new QEMU to an older one? Don't you have to update time_of_the_day_ns
here somewhere, too (e.g. in a pre_save handler)?

 Thomas

Re: [Qemu-devel] [PATCH] migrate: Migration aborts abruptly for machine "none"

2017-01-27 Thread Dr. David Alan Gilbert

* Daniel P. Berrange (berra...@redhat.com) wrote:
> On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote:
> > Migration of a "none" machine with no RAM crashes abruptly as
> > bitmap_new() fails and thus aborts. Instead, place a check for
> > last_ram_offset() being '0' at the start of ram_save_setup() and
> > error out with a meaningful error message.
> > 
> > Signed-off-by: Ashijeet Acharya 
> > ---
> >  migration/ram.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index ef8fadf..bf05d69 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
> >  {
> >  RAMBlock *block;
> >  
> > +if (last_ram_offset() == 0) {
> > +error_report("Failed to migrate: No RAM available!");
> > +return -1;
> > +}
> > +
> 
> If we're merely going to block migration, as opposed to making it work,
> then IMHO we should use migration blockers registered at machine setup
> for this task.
> 
> We have a new cli arg added to QEMU to tell it to abort startup if the
> machine configuration is not migratable. That only works if using
> migration blockers - this check you've added is too late to be detected
> at startup.

Hmm, yes you're right, that would be better.
Although it does lead to an interesting situation where starting with a 'none'
and constructing it component at a time would get rejected by --only-migratable
even though the final construction is fine.

Dave

> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migrate: Migration aborts abruptly for machine "none"

2017-01-27 Thread Ashijeet Acharya

Okay

On Friday, 27 January 2017, Daniel P. Berrange  wrote:

> On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote:
> > Migration of a "none" machine with no RAM crashes abruptly as
> > bitmap_new() fails and thus aborts. Instead, place a check for
> > last_ram_offset() being '0' at the start of ram_save_setup() and
> > error out with a meaningful error message.
> >
> > Signed-off-by: Ashijeet Acharya  >
> > ---
> >  migration/ram.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index ef8fadf..bf05d69 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void
> *opaque)
> >  {
> >  RAMBlock *block;
> >
> > +if (last_ram_offset() == 0) {
> > +error_report("Failed to migrate: No RAM available!");
> > +return -1;
> > +}
> > +
>
> If we're merely going to block migration, as opposed to making it work,
> then IMHO we should use migration blockers registered at machine setup
> for this task.
>
> We have a new cli arg added to QEMU to tell it to abort startup if the
> machine configuration is not migratable. That only works if using


Are you referring to the new "--only-migratable" option I added along with
John last week?

migration blockers - this check you've added is too late to be detected
> at startup.


I think that the machine is not completely 'non-migratable' because if I
boot qemu with "-m 1G" (say) the migration actually completes successfully.

Ashijeet


>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
> :|
> |: http://libvirt.org  -o- http://virt-manager.org
> :|
> |: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/
> :|
>

Re: [Qemu-devel] [PATCH] migrate: Migration aborts abruptly for machine "none"

2017-01-27 Thread Daniel P. Berrange

On Fri, Jan 27, 2017 at 09:46:13AM +, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berra...@redhat.com) wrote:
> > On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote:
> > > Migration of a "none" machine with no RAM crashes abruptly as
> > > bitmap_new() fails and thus aborts. Instead, place a check for
> > > last_ram_offset() being '0' at the start of ram_save_setup() and
> > > error out with a meaningful error message.
> > > 
> > > Signed-off-by: Ashijeet Acharya 
> > > ---
> > >  migration/ram.c | 5 +
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index ef8fadf..bf05d69 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void 
> > > *opaque)
> > >  {
> > >  RAMBlock *block;
> > >  
> > > +if (last_ram_offset() == 0) {
> > > +error_report("Failed to migrate: No RAM available!");
> > > +return -1;
> > > +}
> > > +
> > 
> > If we're merely going to block migration, as opposed to making it work,
> > then IMHO we should use migration blockers registered at machine setup
> > for this task.
> > 
> > We have a new cli arg added to QEMU to tell it to abort startup if the
> > machine configuration is not migratable. That only works if using
> > migration blockers - this check you've added is too late to be detected
> > at startup.
> 
> Hmm, yes you're right, that would be better.
> Although it does lead to an interesting situation where starting with a 'none'
> and constructing it component at a time would get rejected by 
> --only-migratable
> even though the final construction is fine.

Even in that case I would expect the ram to be specified upfront e.g.

 $qemu -machine none -m 500

thus avoiding addition of the migration blocker

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [PATCH] migrate: Migration aborts abruptly for machine "none"

2017-01-27 Thread Daniel P. Berrange

On Fri, Jan 27, 2017 at 03:22:38PM +0530, Ashijeet Acharya wrote:
> Okay
> 
> On Friday, 27 January 2017, Daniel P. Berrange  wrote:
> 
> > On Thu, Jan 26, 2017 at 02:46:52PM +0530, Ashijeet Acharya wrote:
> > > Migration of a "none" machine with no RAM crashes abruptly as
> > > bitmap_new() fails and thus aborts. Instead, place a check for
> > > last_ram_offset() being '0' at the start of ram_save_setup() and
> > > error out with a meaningful error message.
> > >
> > > Signed-off-by: Ashijeet Acharya  > >
> > > ---
> > >  migration/ram.c | 5 +
> > >  1 file changed, 5 insertions(+)
> > >
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index ef8fadf..bf05d69 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -1947,6 +1947,11 @@ static int ram_save_setup(QEMUFile *f, void
> > *opaque)
> > >  {
> > >  RAMBlock *block;
> > >
> > > +if (last_ram_offset() == 0) {
> > > +error_report("Failed to migrate: No RAM available!");
> > > +return -1;
> > > +}
> > > +
> >
> > If we're merely going to block migration, as opposed to making it work,
> > then IMHO we should use migration blockers registered at machine setup
> > for this task.
> >
> > We have a new cli arg added to QEMU to tell it to abort startup if the
> > machine configuration is not migratable. That only works if using
> 
> 
> Are you referring to the new "--only-migratable" option I added along with
> John last week?
> 
> migration blockers - this check you've added is too late to be detected
> > at startup.
> 
> 
> I think that the machine is not completely 'non-migratable' because if I
> boot qemu with "-m 1G" (say) the migration actually completes successfully.

Of course - you would only add the migration blocker when ram size is 0


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|

[Qemu-devel] [PATCH 4/4] block/gluster: add missing QLIST_HEAD_INITIALIZER()

2017-01-27 Thread Stefan Hajnoczi

The "qemu/queue.h" data structures provide static initializer macros.
The QLIST version just initializes to NULL so code happens to work when
the initializer is forgotten.  Other types like SLIST are not so
forgiving because they set fields to non-NULL values.

The initializer macro should always be used for consistency and so that
no errors are introduced when switching between list/queue variants.

Signed-off-by: Stefan Hajnoczi 
---
 block/gluster.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/gluster.c b/block/gluster.c
index 181b345..3ac9105 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -63,7 +63,8 @@ typedef struct GlfsPreopened {
 QLIST_ENTRY(GlfsPreopened) list;
 } GlfsPreopened;
 
-static QLIST_HEAD(glfs_list, GlfsPreopened) glfs_list;
+static QLIST_HEAD(glfs_list, GlfsPreopened) glfs_list =
+QLIST_HEAD_INITIALIZER(glfs_list);
 
 static QemuOptsList qemu_gluster_create_opts = {
 .name = "qemu-gluster-create-opts",
-- 
2.9.3

[Qemu-devel] [PATCH 0/4] block/gluster: cleanups for GlfsPreopened

2017-01-27 Thread Stefan Hajnoczi

Code added in commit 6349c15410361d3fe52c9beee309954d606f8ccd ("block/gluster:
memory usage: use one glfs instance per volume") does not follow conventions
and violates QEMU coding style.  Although any single issue in isolation is not
worth patching, there are several of these and I think it's worth resolving
them together in one sweep.

Stefan Hajnoczi (4):
  block/gluster: fix wrong indent in glfs_find_preopened()
  block/gluster: drop intermediate ListElement struct
  block/gluster: use conventional names for GlfsPreopened functions
  block/gluster: add missing QLIST_HEAD_INITIALIZER()

 block/gluster.c | 66 +++--
 1 file changed, 31 insertions(+), 35 deletions(-)

-- 
2.9.3

[Qemu-devel] [PATCH 2/4] block/gluster: drop intermediate ListElement struct

2017-01-27 Thread Stefan Hajnoczi

The "qemu/queue.h" data structures are used without intermediate list
node structs.  They are designed to be embedded in the main struct.
Drop the unnecessary ListElement struct.

Signed-off-by: Stefan Hajnoczi 
---
 block/gluster.c | 39 +--
 1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 516a1e1..171a323 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -56,19 +56,14 @@ typedef struct BDRVGlusterReopenState {
 struct glfs_fd *fd;
 } BDRVGlusterReopenState;
 
-
 typedef struct GlfsPreopened {
 char *volume;
 glfs_t *fs;
 int ref;
+QLIST_ENTRY(GlfsPreopened) list;
 } GlfsPreopened;
 
-typedef struct ListElement {
-QLIST_ENTRY(ListElement) list;
-GlfsPreopened saved;
-} ListElement;
-
-static QLIST_HEAD(glfs_list, ListElement) glfs_list;
+static QLIST_HEAD(glfs_list, GlfsPreopened) glfs_list;
 
 static QemuOptsList qemu_gluster_create_opts = {
 .name = "qemu-gluster-create-opts",
@@ -210,26 +205,26 @@ static QemuOptsList runtime_tcp_opts = {
 
 static void glfs_set_preopened(const char *volume, glfs_t *fs)
 {
-ListElement *entry = NULL;
+GlfsPreopened *entry = NULL;
 
-entry = g_new(ListElement, 1);
+entry = g_new(GlfsPreopened, 1);
 
-entry->saved.volume = g_strdup(volume);
+entry->volume = g_strdup(volume);
 
-entry->saved.fs = fs;
-entry->saved.ref = 1;
+entry->fs = fs;
+entry->ref = 1;
 
 QLIST_INSERT_HEAD(&glfs_list, entry, list);
 }
 
 static glfs_t *glfs_find_preopened(const char *volume)
 {
-ListElement *entry = NULL;
+GlfsPreopened *entry = NULL;
 
 QLIST_FOREACH(entry, &glfs_list, list) {
-if (strcmp(entry->saved.volume, volume) == 0) {
-entry->saved.ref++;
-return entry->saved.fs;
+if (strcmp(entry->volume, volume) == 0) {
+entry->ref++;
+return entry->fs;
 }
 }
 
@@ -238,23 +233,23 @@ static glfs_t *glfs_find_preopened(const char *volume)
 
 static void glfs_clear_preopened(glfs_t *fs)
 {
-ListElement *entry = NULL;
-ListElement *next;
+GlfsPreopened *entry = NULL;
+GlfsPreopened *next;
 
 if (fs == NULL) {
 return;
 }
 
 QLIST_FOREACH_SAFE(entry, &glfs_list, list, next) {
-if (entry->saved.fs == fs) {
-if (--entry->saved.ref) {
+if (entry->fs == fs) {
+if (--entry->ref) {
 return;
 }
 
 QLIST_REMOVE(entry, list);
 
-glfs_fini(entry->saved.fs);
-g_free(entry->saved.volume);
+glfs_fini(entry->fs);
+g_free(entry->volume);
 g_free(entry);
 }
 }
-- 
2.9.3

[Qemu-devel] [PATCH 1/4] block/gluster: fix wrong indent in glfs_find_preopened()

2017-01-27 Thread Stefan Hajnoczi

QEMU uses 4-space indentation.  Fix this now so checkpatch.pl is happy
with future code changes.

Signed-off-by: Stefan Hajnoczi 
---
 block/gluster.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 1a22f29..516a1e1 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -226,12 +226,12 @@ static glfs_t *glfs_find_preopened(const char *volume)
 {
 ListElement *entry = NULL;
 
- QLIST_FOREACH(entry, &glfs_list, list) {
+QLIST_FOREACH(entry, &glfs_list, list) {
 if (strcmp(entry->saved.volume, volume) == 0) {
 entry->saved.ref++;
 return entry->saved.fs;
 }
- }
+}
 
 return NULL;
 }
-- 
2.9.3

[Qemu-devel] [PATCH 3/4] block/gluster: use conventional names for GlfsPreopened functions

2017-01-27 Thread Stefan Hajnoczi

The naming of GlfsPreopened functions is a little unusual:

glfs_set_preopened() appends items to the list.  Normally this operation
is called "add".

glfs_find_preopened() is paired with glfs_clear_preopened().  Normally
this is called "get" and "put" (or "ref" and "unref").

This patch renames these functions.

Signed-off-by: Stefan Hajnoczi 
---
 block/gluster.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 171a323..181b345 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -203,7 +203,7 @@ static QemuOptsList runtime_tcp_opts = {
 },
 };
 
-static void glfs_set_preopened(const char *volume, glfs_t *fs)
+static void glfs_add_preopened(const char *volume, glfs_t *fs)
 {
 GlfsPreopened *entry = NULL;
 
@@ -217,7 +217,7 @@ static void glfs_set_preopened(const char *volume, glfs_t 
*fs)
 QLIST_INSERT_HEAD(&glfs_list, entry, list);
 }
 
-static glfs_t *glfs_find_preopened(const char *volume)
+static glfs_t *glfs_get_preopened(const char *volume)
 {
 GlfsPreopened *entry = NULL;
 
@@ -231,7 +231,7 @@ static glfs_t *glfs_find_preopened(const char *volume)
 return NULL;
 }
 
-static void glfs_clear_preopened(glfs_t *fs)
+static void glfs_put_preopened(glfs_t *fs)
 {
 GlfsPreopened *entry = NULL;
 GlfsPreopened *next;
@@ -393,7 +393,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 GlusterServerList *server;
 unsigned long long port;
 
-glfs = glfs_find_preopened(gconf->volume);
+glfs = glfs_get_preopened(gconf->volume);
 if (glfs) {
 return glfs;
 }
@@ -403,7 +403,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 goto out;
 }
 
-glfs_set_preopened(gconf->volume, glfs);
+glfs_add_preopened(gconf->volume, glfs);
 
 for (server = gconf->server; server; server = server->next) {
 if (server->value->type  == GLUSTER_TRANSPORT_UNIX) {
@@ -463,7 +463,7 @@ static struct glfs 
*qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
 out:
 if (glfs) {
 old_errno = errno;
-glfs_clear_preopened(glfs);
+glfs_put_preopened(glfs);
 errno = old_errno;
 }
 return NULL;
@@ -844,7 +844,7 @@ out:
 glfs_close(s->fd);
 }
 
-glfs_clear_preopened(s->glfs);
+glfs_put_preopened(s->glfs);
 
 return ret;
 }
@@ -913,7 +913,7 @@ static void qemu_gluster_reopen_commit(BDRVReopenState 
*state)
 glfs_close(s->fd);
 }
 
-glfs_clear_preopened(s->glfs);
+glfs_put_preopened(s->glfs);
 
 /* use the newly opened image / connection */
 s->fd = reop_s->fd;
@@ -938,7 +938,7 @@ static void qemu_gluster_reopen_abort(BDRVReopenState 
*state)
 glfs_close(reop_s->fd);
 }
 
-glfs_clear_preopened(reop_s->glfs);
+glfs_put_preopened(reop_s->glfs);
 
 g_free(state->opaque);
 state->opaque = NULL;
@@ -1062,7 +1062,7 @@ static int qemu_gluster_create(const char *filename,
 out:
 g_free(tmp);
 qapi_free_BlockdevOptionsGluster(gconf);
-glfs_clear_preopened(glfs);
+glfs_put_preopened(glfs);
 return ret;
 }
 
@@ -1135,7 +1135,7 @@ static void qemu_gluster_close(BlockDriverState *bs)
 glfs_close(s->fd);
 s->fd = NULL;
 }
-glfs_clear_preopened(s->glfs);
+glfs_put_preopened(s->glfs);
 }
 
 static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
-- 
2.9.3

Re: [Qemu-devel] Qemu-devel] Poll on QEMU documentation project

2017-01-27 Thread Peter Maydell

On 27 January 2017 at 06:51, Markus Armbruster  wrote:
> "What can we cut" is the wrong question.  The right one is "what are our
> requirements".  Here's my try:
>
> HTML: required
> nroff with an macros: required
> PDF: wanted (try printing a website)
> plain text: nice to have (for me personally, more than that)
> info: nice to have
>
> If a solution we like can't provide something that's nice to have, we
> can decide to take it anyway.
>
> If a solution we like can provide something that's nice to have, we
> should let it provide, unless it turns out to be a drag.

Well, every extra documentation format:
 * increases the build time
 * increases the chances of makefile bugs
 * may require extra tooling to produce
 * either requires us to check it for problems or increases
   the chance of confusing users because that output format
   has a formatting problem that doesn't happen in the doc
   formats most people use
 * may require significant extra work to produce something
   that's actually useful: a manpage and an info doc aren't
   just the same content in a different file format, they
   should have definitely different contents and structure
   to fit what people expect a manpage or an info doc to be

So my list is:
 * HTML: required
 * PDF: nice-to-have

thanks
-- PMM

[Qemu-devel] [PATCH] iothread: enable AioContext polling by default

2017-01-27 Thread Stefan Hajnoczi

IOThread AioContexts are likely to consist only of event sources like
virtqueue ioeventfds and LinuxAIO completion eventfds that are pollable
from userspace (without system calls).

We recently merged the AioContext polling feature but didn't enable it
by default yet.  I have gone back over the performance data on the
mailing list and picked a default polling value that gave good results.

Let's enable AioContext polling by default so users don't have another
switch they need to set manually.  If performance regressions are found
we can still disable this for the QEMU 2.9 release.

Cc: Paolo Bonzini 
Cc: Christian Borntraeger 
Cc: Karl Rister 
Signed-off-by: Stefan Hajnoczi 
---
 iothread.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/iothread.c b/iothread.c
index 7bedde8..257b01d 100644
--- a/iothread.c
+++ b/iothread.c
@@ -30,6 +30,12 @@ typedef ObjectClass IOThreadClass;
 #define IOTHREAD_CLASS(klass) \
OBJECT_CLASS_CHECK(IOThreadClass, klass, TYPE_IOTHREAD)
 
+/* Benchmark results from 2016 on NVMe SSD drives show max polling times around
+ * 16-32 microseconds yield IOPS improvements for both iodepth=1 and iodepth=32
+ * workloads.
+ */
+#define IOTHREAD_POLL_MAX_NS_DEFAULT 32768ULL
+
 static __thread IOThread *my_iothread;
 
 AioContext *qemu_get_current_aio_context(void)
@@ -71,6 +77,13 @@ static int iothread_stop(Object *object, void *opaque)
 return 0;
 }
 
+static void iothread_instance_init(Object *obj)
+{
+IOThread *iothread = IOTHREAD(obj);
+
+iothread->poll_max_ns = IOTHREAD_POLL_MAX_NS_DEFAULT;
+}
+
 static void iothread_instance_finalize(Object *obj)
 {
 IOThread *iothread = IOTHREAD(obj);
@@ -215,6 +228,7 @@ static const TypeInfo iothread_info = {
 .parent = TYPE_OBJECT,
 .class_init = iothread_class_init,
 .instance_size = sizeof(IOThread),
+.instance_init = iothread_instance_init,
 .instance_finalize = iothread_instance_finalize,
 .interfaces = (InterfaceInfo[]) {
 {TYPE_USER_CREATABLE},
-- 
2.9.3

Re: [Qemu-devel] [PATCH v2 1/8] hw: Default -drive to if=ide explicitly where it works

2017-01-27 Thread Yongbok Kim


>> Slightly off-topic, but: Is fulong2e still maintained? I did not spot an
>> entry in MAINTAINERS...?
> 
> It's covered by the general MIPS stanza:
> 
> $ scripts/get_maintainer.pl -f hw/mips/mips_fulong2e.c 
> Aurelien Jarno  (maintainer:MIPS)
> Yongbok Kim  (maintainer:MIPS)
> qemu-devel@nongnu.org (open list:All patches CC here)
> 

I'm not actively looking after the device at the moment but if it has any
issues I love to handle that.

Regards,
Yongbok

Re: [Qemu-devel] MIPS machines (was: [PATCH v2 1/8] hw: Default -drive to if=ide explicitly where it works)

2017-01-27 Thread Thomas Huth

On 27.01.2017 11:21, Yongbok Kim wrote:
> 
>>> Slightly off-topic, but: Is fulong2e still maintained? I did not spot an
>>> entry in MAINTAINERS...?
>>
>> It's covered by the general MIPS stanza:
>>
>> $ scripts/get_maintainer.pl -f hw/mips/mips_fulong2e.c 
>> Aurelien Jarno  (maintainer:MIPS)
>> Yongbok Kim  (maintainer:MIPS)
>> qemu-devel@nongnu.org (open list:All patches CC here)
>>
> 
> I'm not actively looking after the device at the moment but if it has any
> issues I love to handle that.

Great! Then could you maybe send a patch for the MAINTAINERS file to add
an entry for that machine?

Also it's a little bit confusing that "magnum" and "pica61" do not show
up in MAINTAINERS, but I guess that's what is meant by the "Jazz" entry?

 Thanks,
  Thomas

[Qemu-devel] [PATCH v8 03/25] mttcg: Add missing tb_lock/unlock() in cpu_exec_step()

2017-01-27 Thread Alex Bennée

From: Pranith Kumar 

The recent patch enabling lock assertions uncovered the missing lock
acquisition in cpu_exec_step(). This patch adds them.

CC: Richard Henderson 
CC: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 cpu-exec.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4188fed3c6..1b8685dc21 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -233,14 +233,18 @@ static void cpu_exec_step(CPUState *cpu)
 uint32_t flags;
 
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+tb_lock();
 tb = tb_gen_code(cpu, pc, cs_base, flags,
  1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
 tb->orig_tb = NULL;
+tb_unlock();
 /* execute the generated code */
 trace_exec_tb_nocache(tb, pc);
 cpu_tb_exec(cpu, tb);
+tb_lock();
 tb_phys_invalidate(tb, -1);
 tb_free(tb);
+tb_unlock();
 }
 
 void cpu_exec_step_atomic(CPUState *cpu)
-- 
2.11.0

[Qemu-devel] [PATCH v8 02/25] mttcg: translate-all: Enable locking debug in a debug build

2017-01-27 Thread Alex Bennée

From: Pranith Kumar 

Enable tcg lock debug asserts in a debug build by default instead of
relying on DEBUG_LOCKING. None of the other DEBUG_* macros have
asserts, so this patch removes DEBUG_LOCKING and enable these asserts
in a debug build.

CC: Richard Henderson 
Signed-off-by: Pranith Kumar 
[AJB: tweak ifdefs so can be early in series]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 translate-all.c | 52 
 1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 20262938bb..055436a676 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -59,7 +59,6 @@
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
-/* #define DEBUG_LOCKING */
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
@@ -74,20 +73,10 @@
  * access to the memory related structures are protected with the
  * mmap_lock.
  */
-#ifdef DEBUG_LOCKING
-#define DEBUG_MEM_LOCKS 1
-#else
-#define DEBUG_MEM_LOCKS 0
-#endif
-
 #ifdef CONFIG_SOFTMMU
 #define assert_memory_lock() do { /* nothing */ } while (0)
 #else
-#define assert_memory_lock() do {   \
-if (DEBUG_MEM_LOCKS) {  \
-g_assert(have_mmap_lock()); \
-}   \
-} while (0)
+#define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
 #define SMC_BITMAP_USE_THRESHOLD 10
@@ -169,10 +158,18 @@ static void page_table_config_init(void)
 assert(v_l2_levels >= 0);
 }
 
+#ifdef CONFIG_USER_ONLY
+#define assert_tb_locked() tcg_debug_assert(have_tb_lock)
+#define assert_tb_unlocked() tcg_debug_assert(!have_tb_lock)
+#else
+#define assert_tb_locked()  do { /* nothing */ } while (0)
+#define assert_tb_unlocked()  do { /* nothing */ } while (0)
+#endif
+
 void tb_lock(void)
 {
 #ifdef CONFIG_USER_ONLY
-assert(!have_tb_lock);
+assert_tb_unlocked();
 qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
 have_tb_lock++;
 #endif
@@ -181,7 +178,7 @@ void tb_lock(void)
 void tb_unlock(void)
 {
 #ifdef CONFIG_USER_ONLY
-assert(have_tb_lock);
+assert_tb_locked();
 have_tb_lock--;
 qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
 #endif
@@ -197,23 +194,6 @@ void tb_lock_reset(void)
 #endif
 }
 
-#ifdef DEBUG_LOCKING
-#define DEBUG_TB_LOCKS 1
-#else
-#define DEBUG_TB_LOCKS 0
-#endif
-
-#ifdef CONFIG_SOFTMMU
-#define assert_tb_lock() do { /* nothing */ } while (0)
-#else
-#define assert_tb_lock() do {   \
-if (DEBUG_TB_LOCKS) {   \
-g_assert(have_tb_lock); \
-}   \
-} while (0)
-#endif
-
-
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
@@ -847,7 +827,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
 TranslationBlock *tb;
 
-assert_tb_lock();
+assert_tb_locked();
 
 if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
 return NULL;
@@ -862,7 +842,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
-assert_tb_lock();
+assert_tb_locked();
 
 /* In practice this is mostly used for single use temporary TB
Ignore the hard cases and just back up if this TB happens to
@@ -1104,7 +1084,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 uint32_t h;
 tb_page_addr_t phys_pc;
 
-assert_tb_lock();
+assert_tb_locked();
 
 atomic_set(&tb->invalid, true);
 
@@ -1419,7 +1399,7 @@ static void tb_invalidate_phys_range_1(tb_page_addr_t 
start, tb_page_addr_t end)
 #ifdef CONFIG_SOFTMMU
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
-assert_tb_lock();
+assert_tb_locked();
 tb_invalidate_phys_range_1(start, end);
 }
 #else
@@ -1462,7 +1442,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 #endif /* TARGET_HAS_PRECISE_SMC */
 
 assert_memory_lock();
-assert_tb_lock();
+assert_tb_locked();
 
 p = page_find(start >> TARGET_PAGE_BITS);
 if (!p) {
-- 
2.11.0

[Qemu-devel] [PATCH v8 06/25] tcg: add kick timer for single-threaded vCPU emulation

2017-01-27 Thread Alex Bennée

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored and
rationalised we now do it explicitly here.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v2
  - re-base fixes
  - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
v3
  - add define for TCG_KICK_FREQ
  - fix checkpatch warning
v4
  - wrap next calc in inline qemu_tcg_next_kick() instead of macro
v5
  - move all kick code into own section
  - use global for timer
  - add helper functions to start/stop timer
  - stop timer when all cores paused
v7
  - checkpatch > 80 char fix
---
 cpus.c | 61 +
 1 file changed, 61 insertions(+)

diff --git a/cpus.c b/cpus.c
index 76b6e04332..a98925105c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -767,6 +767,53 @@ void configure_icount(QemuOpts *opts, Error **errp)
 }
 
 /***/
+/* TCG vCPU kick timer
+ *
+ * The kick timer is responsible for moving single threaded vCPU
+ * emulation on to the next vCPU. If more than one vCPU is running a
+ * timer event with force a cpu->exit so the next vCPU can get
+ * scheduled.
+ *
+ * The timer is removed if all vCPUs are idle and restarted again once
+ * idleness is complete.
+ */
+
+static QEMUTimer *tcg_kick_vcpu_timer;
+
+static void qemu_cpu_kick_no_halt(void);
+
+#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
+
+static inline int64_t qemu_tcg_next_kick(void)
+{
+return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
+}
+
+static void kick_tcg_thread(void *opaque)
+{
+timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+qemu_cpu_kick_no_halt();
+}
+
+static void start_tcg_kick_timer(void)
+{
+if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+   kick_tcg_thread, NULL);
+timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+}
+}
+
+static void stop_tcg_kick_timer(void)
+{
+if (tcg_kick_vcpu_timer) {
+timer_del(tcg_kick_vcpu_timer);
+tcg_kick_vcpu_timer = NULL;
+}
+}
+
+
+/***/
 void hw_error(const char *fmt, ...)
 {
 va_list ap;
@@ -1020,9 +1067,12 @@ static void qemu_wait_io_event_common(CPUState *cpu)
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
 while (all_cpu_threads_idle()) {
+stop_tcg_kick_timer();
 qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
 }
 
+start_tcg_kick_timer();
+
 while (iothread_requesting_mutex) {
 qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
 }
@@ -1222,6 +1272,15 @@ static void deal_with_unplugged_cpus(void)
 }
 }
 
+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
@@ -1248,6 +1307,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 }
 
+start_tcg_kick_timer();
+
 /* process any pending work */
 atomic_mb_set(&exit_request, 1);
 
-- 
2.11.0

[Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-27 Thread Alex Bennée

From: Jan Kiszka 

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka 
Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic 
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota 
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v8:
 - merged in BQL fixes for PPC target: ppc_set_irq
 - merged in BQL fixes for ARM target: ARM_CP_IO helpers
 - merged in BQL fixes for ARM target: arm_call_el_change_hook

v5 (ajb, base patches):
 - added an assert to BQL unlock/lock functions instead of hanging
 - ensure all cpu->interrupt_requests *modifications* protected by BQL
 - add a re-read on cpu->interrupt_request for correctness
 - BQL fixes for:
   - assert BQL held for PPC hypercalls (emulate_spar_hypercall)
   - SCLP service calls on s390x
 - merge conflict with kick timer patch
v4 (ajb, base patches):
 - protect cpu->interrupt updates with BQL
 - fix wording io_mem_notdirty calls
 - s/we/with/
v3 (ajb, base-patches):
  - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
  with it in the longjmp).
  - fix re-base conflicts
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
- add assert on mem region modification
- ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
v1 (ajb, base-patches):
  - SMP failure now fixed by previous commit

Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c | 20 ++--
 cpus.c | 28 +---
 cputlb.c   | 21 -
 exec.c | 12 +---
 hw/core/irq.c  |  1 +
 hw/i386/kvmvapic.c |  4 ++--
 hw/intc/arm_gicv3_cpuif.c  |  3 +++
 hw/ppc/ppc.c   | 16 +++-
 hw/ppc/spapr.c |  3 +++
 include/qom/cpu.h  |  1 +
 memory.c   |  2 ++
 qom/cpu.c  | 10 ++
 target/arm/helper.c|  6 ++
 target/arm/op_helper.c | 43 +++
 target/i386/smm_helper.c   |  7 +++
 target/s390x/misc_helper.c |  5 -
 translate-all.c|  9 +++--
 translate-common.c | 21 +++--
 18 files changed, 163 insertions(+), 49 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f9e836c8dd..f42a128bdf 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
 && replay_interrupt()) {
 X86CPU *x86_cpu = X86_CPU(cpu);
+qemu_mutex_lock_iothread();
 apic_poll_irq(x86_cpu->apic_state);
 cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+qemu_mutex_unlock_iothread();
 }
 #endif
 if (!cpu_has_work(cpu)) {
@@ -443,7 +446,9 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 #else
 if (replay_exception()) {
 CPUClass *cc = CPU_GET_CLASS(cpu);
+qemu_mutex_lock_iothread();
 cc->do_interrupt(cpu);
+qemu_mutex_unlock_iothread();
 cpu->exception_index = -1;
 } else if (!replay_has_interrupt()) {

[Qemu-devel] [PATCH v8 04/25] tcg: move TCG_MO/BAR types into own file

2017-01-27 Thread Alex Bennée

We'll be using the memory ordering definitions to define values for
both the host and guest. To avoid fighting with circular header
dependencies just move these types into their own minimal header.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 tcg/tcg-mo.h | 45 +
 tcg/tcg.h| 18 +-
 2 files changed, 46 insertions(+), 17 deletions(-)
 create mode 100644 tcg/tcg-mo.h

diff --git a/tcg/tcg-mo.h b/tcg/tcg-mo.h
new file mode 100644
index 00..429b022561
--- /dev/null
+++ b/tcg/tcg-mo.h
@@ -0,0 +1,45 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TCG_MO_H
+#define TCG_MO_H
+
+typedef enum {
+/* Used to indicate the type of accesses on which ordering
+   is to be ensured.  Modeled after SPARC barriers.  */
+TCG_MO_LD_LD  = 0x01,
+TCG_MO_ST_LD  = 0x02,
+TCG_MO_LD_ST  = 0x04,
+TCG_MO_ST_ST  = 0x08,
+TCG_MO_ALL= 0x0F,  /* OR of the above */
+
+/* Used to indicate the kind of ordering which is to be ensured by the
+   instruction.  These types are derived from x86/aarch64 instructions.
+   It should be noted that these are different from C11 semantics.  */
+TCG_BAR_LDAQ  = 0x10,  /* Following ops will not come forward */
+TCG_BAR_STRL  = 0x20,  /* Previous ops will not be delayed */
+TCG_BAR_SC= 0x30,  /* No ops cross barrier; OR of the above */
+} TCGBar;
+
+#endif /* TCG_MO_H */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 631c6f69b1..f946452049 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -29,6 +29,7 @@
 #include "cpu.h"
 #include "exec/tb-context.h"
 #include "qemu/bitops.h"
+#include "tcg-mo.h"
 #include "tcg-target.h"
 
 /* XXX: make safe guess about sizes */
@@ -498,23 +499,6 @@ static inline intptr_t QEMU_ARTIFICIAL 
GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_DUMMY_TCGV MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG  ((TCGArg)(-1))
 
-typedef enum {
-/* Used to indicate the type of accesses on which ordering
-   is to be ensured.  Modeled after SPARC barriers.  */
-TCG_MO_LD_LD  = 0x01,
-TCG_MO_ST_LD  = 0x02,
-TCG_MO_LD_ST  = 0x04,
-TCG_MO_ST_ST  = 0x08,
-TCG_MO_ALL= 0x0F,  /* OR of the above */
-
-/* Used to indicate the kind of ordering which is to be ensured by the
-   instruction.  These types are derived from x86/aarch64 instructions.
-   It should be noted that these are different from C11 semantics.  */
-TCG_BAR_LDAQ  = 0x10,  /* Following ops will not come forward */
-TCG_BAR_STRL  = 0x20,  /* Previous ops will not be delayed */
-TCG_BAR_SC= 0x30,  /* No ops cross barrier; OR of the above */
-} TCGBar;
-
 /* Conditions.  Note that these are laid out for easy manipulation by
the functions below:
  bit 0 is used for inverting;
-- 
2.11.0

[Qemu-devel] [PATCH v8 05/25] tcg: add options for enabling MTTCG

2017-01-27 Thread Alex Bennée

From: KONRAD Frederic 

We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support. It will however be useful to
be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.

Signed-off-by: KONRAD Frederic 
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v1:
  - merge with add mttcg option.
  - update commit message
v2:
  - machine_init->opts_init
v3:
  - moved from -tcg to -accel tcg,thread=single|multi
  - fix checkpatch warnings
v4:
  - make mttcg_enabled extern, qemu_tcg_mttcg_enabled() now just macro
  - qemu_tcg_configure now propagates Error instead of exiting
  - better error checking of thread=foo
  - use CONFIG flags for default_mttcg_enabled()
  - disable mttcg with icount, error if both forced on
v7
  - explicitly disable MTTCG for TCG_OVERSIZED_GUEST
  - use check_tcg_memory_orders_compatible() instead of CONFIG_MTTCG_HOST
  - change CONFIG_MTTCG_TARGET to TARGET_SUPPORTS_MTTCG
v8
  - fix missing include tcg.h
  - change mismatched MOs to a warning instead of error
---
 cpus.c| 72 +++
 include/qom/cpu.h |  9 +++
 include/sysemu/cpus.h |  2 ++
 qemu-options.hx   | 20 ++
 tcg/tcg.h |  9 +++
 vl.c  | 49 ++-
 6 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 71a82e5004..76b6e04332 100644
--- a/cpus.c
+++ b/cpus.c
@@ -25,6 +25,7 @@
 /* Needed early for CONFIG_BSD etc. */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/config-file.h"
 #include "cpu.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
@@ -45,6 +46,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/bitmap.h"
 #include "qemu/seqlock.h"
+#include "tcg.h"
 #include "qapi-event.h"
 #include "hw/nmi.h"
 #include "sysemu/replay.h"
@@ -150,6 +152,76 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+bool mttcg_enabled;
+
+/*
+ * We default to false if we know other options have been enabled
+ * which are currently incompatible with MTTCG. Otherwise when each
+ * guest (target) has been updated to support:
+ *   - atomic instructions
+ *   - memory ordering primitives (barriers)
+ * they can set the appropriate CONFIG flags in ${target}-softmmu.mak
+ *
+ * Once a guest architecture has been converted to the new primitives
+ * there are two remaining limitations to check.
+ *
+ * - The guest can't be oversized (e.g. 64 bit guest on 32 bit host)
+ * - The host must have a stronger memory order than the guest
+ *
+ * It may be possible in future to support strong guests on weak hosts
+ * but that will require tagging all load/stores in a guest with their
+ * implicit memory order requirements which would likely slow things
+ * down a lot.
+ */
+
+static bool check_tcg_memory_orders_compatible(void)
+{
+#if defined(TCG_DEFAULT_MO) && defined(TCG_TARGET_DEFAULT_MO)
+return (TCG_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) == 0;
+#else
+return false;
+#endif
+}
+
+static bool default_mttcg_enabled(void)
+{
+QemuOpts *icount_opts = qemu_find_opts_singleton("icount");
+const char *rr = qemu_opt_get(icount_opts, "rr");
+
+if (rr || TCG_OVERSIZED_GUEST) {
+return false;
+} else {
+#ifdef TARGET_SUPPORTS_MTTCG
+return check_tcg_memory_orders_compatible();
+#else
+return false;
+#endif
+}
+}
+
+void qemu_tcg_configure(QemuOpts *opts, Error **errp)
+{
+const char *t = qemu_opt_get(opts, "thread");
+if (t) {
+if (strcmp(t, "multi") == 0) {
+if (TCG_OVERSIZED_GUEST) {
+error_setg(errp, "No MTTCG when guest word size > hosts");
+} else {
+if (!check_tcg_memory_orders_compatible()) {
+error_report("Guest requires stronger MO that host");
+error_printf("Results will likely be unpredictable");
+}
+mttcg_enabled = true;
+}
+} else if (strcmp(t, "single") == 0) {
+mttcg_enabled = false;
+} else {
+error_setg(errp, "Invalid 'thread' setting %s", t);
+}
+} else {
+mttcg_enabled = default_mttcg_enabled();
+}
+}
 
 int64_t cpu_get_icount_raw(void)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index ca4d0fb1b4..11db2015a4 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -412,6 +412,15 @@ extern struct CPUTailQ cpus;
 extern __thread CPUState *current_cpu;
 
 /**
+ * qemu_tcg_mttcg_enabled:
+ * Check whether we are running MultiThread TCG or not.
+ *
+ * Returns: %true if we are in MTTCG mode %false otherwise.
+ */
+extern bool mttcg_enabled;
+#define qemu_tcg_mttcg_enabled() (mttcg_enabled)
+
+/**
  * cpu_paging_enabled:
  * @cp

[Qemu-devel] [PATCH v8 01/25] docs: new design document multi-thread-tcg.txt

2017-01-27 Thread Alex Bennée

This documents the current design for upgrading TCG emulation to take
advantage of modern CPUs by running a thread-per-CPU. The document goes
through the various areas of the code affected by such a change and
proposes design requirements for each part of the solution.

The text marked with (Current solution[s]) to document what the current
approaches being used are.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v1
  - initial version
v2
  - update discussion on locks
  - bit more detail on vCPU scheduling
  - explicitly mention Translation Blocks
  - emulated hardware state already covered by iomutex
  - a few minor rewords
v3
  - mention this covers system-mode
  - describe main main-loop and lookup hot-path
  - mention multi-concurrent-reader lookups
  - enumerate reasons for invalidation
  - add more details on lookup structures
  - describe the softmmu hot-path better
  - mention store-after-load barrier problem
v4
  - mention some cross-over between linux-user/system emulation
  - various minor grammar and scanning fixes
  - fix reference to tb_ctx.htbale
  - describe the solution for hot-path
  - more detail on TB flushing and invalidation
  - add (Current solution) following design requirements
  - more detail on iothread/BQL mutex
  - mention implicit memory barriers
  - add links to current LL/SC and cmpxchg patch sets
  - add TLB flag setting as an additional requirement
v6
 - remove DRAFTING, update copyright dates
 - document current solutions to each design requirement
 - tb_lock() serialisation for codegen/patch
 - cputlb changes to defer cross-vCPU flushes
 - cputlb atomic updates for slow-path
 - BQL usage for hardware serialisation
 - cmpxchg as initial atomic/synchronisation support mechanism
v7
 - minor format fix
 - include target-mips in list of MB aware front-ends
 - mention BQL around IRQ raising
 - update with notes on _all_cpus and the wait flag
---
 docs/multi-thread-tcg.txt | 350 ++
 1 file changed, 350 insertions(+)
 create mode 100644 docs/multi-thread-tcg.txt

diff --git a/docs/multi-thread-tcg.txt b/docs/multi-thread-tcg.txt
new file mode 100644
index 00..a99b4564c6
--- /dev/null
+++ b/docs/multi-thread-tcg.txt
@@ -0,0 +1,350 @@
+Copyright (c) 2015-2016 Linaro Ltd.
+
+This work is licensed under the terms of the GNU GPL, version 2 or
+later. See the COPYING file in the top-level directory.
+
+Introduction
+
+
+This document outlines the design for multi-threaded TCG system-mode
+emulation. The current user-mode emulation mirrors the thread
+structure of the translated executable. Some of the work will be
+applicable to both system and linux-user emulation.
+
+The original system-mode TCG implementation was single threaded and
+dealt with multiple CPUs with simple round-robin scheduling. This
+simplified a lot of things but became increasingly limited as systems
+being emulated gained additional cores and per-core performance gains
+for host systems started to level off.
+
+vCPU Scheduling
+===
+
+We introduce a new running mode where each vCPU will run on its own
+user-space thread. This will be enabled by default for all FE/BE
+combinations that have had the required work done to support this
+safely.
+
+In the general case of running translated code there should be no
+inter-vCPU dependencies and all vCPUs should be able to run at full
+speed. Synchronisation will only be required while accessing internal
+shared data structures or when the emulated architecture requires a
+coherent representation of the emulated machine state.
+
+Shared Data Structures
+==
+
+Main Run Loop
+-
+
+Even when there is no code being generated there are a number of
+structures associated with the hot-path through the main run-loop.
+These are associated with looking up the next translation block to
+execute. These include:
+
+tb_jmp_cache (per-vCPU, cache of recent jumps)
+tb_ctx.htable (global hash table, phys address->tb lookup)
+
+As TB linking only occurs when blocks are in the same page this code
+is critical to performance as looking up the next TB to execute is the
+most common reason to exit the generated code.
+
+DESIGN REQUIREMENT: Make access to lookup structures safe with
+multiple reader/writer threads. Minimise any lock contention to do it.
+
+The hot-path avoids using locks where possible. The tb_jmp_cache is
+updated with atomic accesses to ensure consistent results. The fall
+back QHT based hash table is also designed for lockless lookups. Locks
+are only taken when code generation is required or TranslationBlocks
+have their block-to-block jumps patched.
+
+Global TCG State
+
+
+We need to protect the entire code generation cycle including any post
+generation patching of the translated code. This also implies a shared
+translation buffer which contains code running on all cores. Any
+execution path that comes to the main run l

[Qemu-devel] [PATCH v8 11/25] tcg: enable thread-per-vCPU

2017-01-27 Thread Alex Bennée

There are a couple of changes that occur at the same time here:

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

  One of these is spawned per vCPU with its own Thread and Condition
  variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
  single threaded function.

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
vCPU threads. This is for future work where async jobs need to know
the vCPU context they are operating in.

The user to switch on multi-thread behaviour and spawn a thread
per-vCPU. For a simple test kvm-unit-test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

Will now use 4 vCPU threads and have an expected FAIL (instead of the
unexpected PASS) as the default mode of the test has no protection when
incrementing a shared variable.

We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the front and backends. As each back end
and front end is updated they can add CONFIG_MTTCG_TARGET and
CONFIG_MTTCG_HOST to their respective make configurations so
default_mttcg_enabled does the right thing.

Signed-off-by: KONRAD Frederic 
Signed-off-by: Paolo Bonzini 
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
v2
  - re-base fixes (no longer has tb_find_fast lock tweak ahead)
  - remove bogus break condition on cpu->stop/stopped
  - only process exiting cpus exit_request
  - handle all cpus idle case (fixes shutdown issues)
  - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
  - move icount timer into helper
v3
  - update the commit message
  - rm kick_timer tweaks (move to earlier tcg_current_cpu tweaks)
  - ensure linux-user clears cpu->exit_request in loop
  - purging of global exit_request and tcg_current_cpu in earlier patches
  - fix checkpatch warnings
v4
  - don't break loop on stopped, we may never schedule next in RR mode
  - make sure we flush iorequests of current cpu if we exited on one
  - add tcg_cpu_exec_start/end wraps for async work functions
  - stop killing of current_cpu on loop exit
  - set current_cpu in the single thread function
  - remove sleep special case, add qemu_tcg_should_sleep() for mttcg
  - no need to atomic set cpu->exit_request going into the loop
  - removed extraneous setting of exit_request
  - split tb_lock() part of patch
  - rename single thread fn to qemu_tcg_rr_cpu_thread_fn
v5
  - enable parallel_cpus for MTTCG (for barriers/atomics)
  - expand on CONFIG_ flags in commit message
v7
  - move parallel_cpus down into the mttcg leg
  - minor ws merge fix
---
 cpu-exec.c |   5 ---
 cpus.c | 134 +++--
 2 files changed, 103 insertions(+), 36 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index cc09c1fc37..ef328087be 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -396,7 +396,6 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 }
 #endif
 if (!cpu_has_work(cpu)) {
-current_cpu = NULL;
 return true;
 }
 
@@ -540,7 +539,6 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 
 
 if (unlikely(atomic_read(&cpu->exit_request) || replay_has_interrupt())) {
-atomic_set(&cpu->exit_request, 0);
 cpu->exception_index = EXCP_INTERRUPT;
 cpu_loop_exit(cpu);
 }
@@ -675,8 +673,5 @@ int cpu_exec(CPUState *cpu)
 cc->cpu_exec_exit(cpu);
 rcu_read_unlock();
 
-/* fail safe : never use current_cpu outside cpu_exec() */
-current_cpu = NULL;
-
 return ret;
 }
diff --git a/cpus.c b/cpus.c
index 18daf41dae..ecd1ec08d3 100644
--- a/cpus.c
+++ b/cpus.c
@@ -808,7 +808,7 @@ static void kick_tcg_thread(void *opaque)
 
 static void start_tcg_kick_timer(void)
 {
-if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+if (!mttcg_enabled && !tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
 tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
kick_tcg_thread, NULL);
 timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
@@ -1062,27 +1062,34 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+atomic_mb_set(&cpu->thread_kicked, false);
 if (cpu->stop) {
 cpu->stop = false;
 cpu->stopped = true;
 qemu_cond_broadcast(&qemu_pause_cond);
 }
 process_queued_cpu_work(cpu);
-cpu->thread_kicked = false;
+}
+
+static bool qemu_tcg_should_sleep(CPUState *cpu)
+{
+if (mttcg_enabled) {
+return cpu_thread_is_idle(cpu);
+} else {
+return all_cpu_threads_idle();
+}
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-while (all_cpu_threads_idle()) {
+while (qemu_tcg_should_sleep(cpu)) {
 stop_tcg_kick_timer();
 qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);

[Qemu-devel] [PATCH v8 12/25] tcg: handle EXCP_ATOMIC exception for system emulation

2017-01-27 Thread Alex Bennée

From: Pranith Kumar 

The patch enables handling atomic code in the guest. This should be
preferably done in cpu_handle_exception(), but the current assumptions
regarding when we can execute atomic sections cause a deadlock.

Signed-off-by: Pranith Kumar 
[AJB: tweak title]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 cpus.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/cpus.c b/cpus.c
index ecd1ec08d3..e3d9f3fe21 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1346,6 +1346,11 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
 if (r == EXCP_DEBUG) {
 cpu_handle_guest_debug(cpu);
 break;
+} else if (r == EXCP_ATOMIC) {
+qemu_mutex_unlock_iothread();
+cpu_exec_step_atomic(cpu);
+qemu_mutex_lock_iothread();
+break;
 }
 } else if (cpu->stop) {
 if (cpu->unplug) {
@@ -1456,6 +1461,10 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
  */
 g_assert(cpu->halted);
 break;
+case EXCP_ATOMIC:
+qemu_mutex_unlock_iothread();
+cpu_exec_step_atomic(cpu);
+qemu_mutex_lock_iothread();
 default:
 /* Ignore everything else? */
 break;
-- 
2.11.0

[Qemu-devel] [PATCH v8 15/25] cputlb: introduce tlb_flush_* async work.

2017-01-27 Thread Alex Bennée

From: KONRAD Frederic 

Some architectures allow to flush the tlb of other VCPUs. This is not a problem
when we have only one thread for all VCPUs but it definitely needs to be an
asynchronous work when we are in true multithreaded work.

We take the tb_lock() when doing this to avoid racing with other threads
which may be invalidating TB's at the same time. The alternative would
be to use proper atomic primitives to clear the tlb entries en-mass.

This patch doesn't do anything to protect other cputlb function being
called in MTTCG mode making cross vCPU changes.

Signed-off-by: KONRAD Frederic 
[AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
Signed-off-by: Alex Bennée 

---
v8
  - fix merge failure mentioning global flush
v6 (base patches)
  - don't use cmpxchg_bool (we drop it later anyway)
  - use RUN_ON_CPU macros instead of inlines
  - bug out of tlb_flush if !tcg_enabled() (MacOSX make check failure)
v5 (base patches)
  - take tb_lock() for memset
  - ensure tb_flush_page properly asyncs work for other vCPUs
  - use run_on_cpu_data
v4 (base_patches)
  - brought forward from arm enabling series
  - restore pending_tlb_flush flag
v1
  - Remove tlb_flush_all just do the check in tlb_flush.
  - remove the need to g_malloc
  - tlb_flush calls direct if !cpu->created
---
 cputlb.c| 66 +++--
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h   |  6 +
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 94fa9977c5..5dfd3c3ba9 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -64,6 +64,10 @@
 } \
 } while (0)
 
+/* run_on_cpu_data.target_ptr should always be big enough for a
+ * target_ulong even on 32 bit builds */
+QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
+
 /* statistics */
 int tlb_flush_count;
 
@@ -72,13 +76,22 @@ int tlb_flush_count;
  * flushing more entries than required is only an efficiency issue,
  * not a correctness issue.
  */
-void tlb_flush(CPUState *cpu)
+static void tlb_flush_nocheck(CPUState *cpu)
 {
 CPUArchState *env = cpu->env_ptr;
 
+/* The QOM tests will trigger tlb_flushes without setting up TCG
+ * so we bug out here in that case.
+ */
+if (!tcg_enabled()) {
+return;
+}
+
 assert_cpu_is_self(cpu);
 tlb_debug("(count: %d)\n", tlb_flush_count++);
 
+tb_lock();
+
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -86,6 +99,27 @@ void tlb_flush(CPUState *cpu)
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
 env->tlb_flush_mask = 0;
+
+tb_unlock();
+
+atomic_mb_set(&cpu->pending_tlb_flush, false);
+}
+
+static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
+{
+tlb_flush_nocheck(cpu);
+}
+
+void tlb_flush(CPUState *cpu)
+{
+if (cpu->created && !qemu_cpu_is_self(cpu)) {
+if (atomic_cmpxchg(&cpu->pending_tlb_flush, false, true) == true) {
+async_run_on_cpu(cpu, tlb_flush_global_async_work,
+ RUN_ON_CPU_NULL);
+}
+} else {
+tlb_flush_nocheck(cpu);
+}
 }
 
 static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
@@ -95,6 +129,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
va_list argp)
 assert_cpu_is_self(cpu);
 tlb_debug("start\n");
 
+tb_lock();
+
 for (;;) {
 int mmu_idx = va_arg(argp, int);
 
@@ -109,6 +145,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
va_list argp)
 }
 
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+
+tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, ...)
@@ -131,13 +169,15 @@ static inline void tlb_flush_entry(CPUTLBEntry 
*tlb_entry, target_ulong addr)
 }
 }
 
-void tlb_flush_page(CPUState *cpu, target_ulong addr)
+static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
+target_ulong addr = (target_ulong) data.target_ptr;
 int i;
 int mmu_idx;
 
 assert_cpu_is_self(cpu);
+
 tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
 /* Check if we need to flush due to large pages.  */
@@ -167,6 +207,18 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 tb_flush_jmp_cache(cpu, addr);
 }
 
+void tlb_flush_page(CPUState *cpu, target_ulong addr)
+{
+tlb_debug("page :" TARGET_FMT_lx "\n", addr);
+
+if (!qemu_cpu_is_self(cpu)) {
+async_run_on_cpu(cpu, tlb_flush_page_async_work,
+ RUN_ON_CPU_TARGET_PTR(addr));
+} else {
+tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr));
+}
+}
+
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
 {
 CPUArchState *env = cpu->env_ptr;
@@ -213,6 +265,16 @@ void tlb_flus

[Qemu-devel] [PATCH v8 07/25] tcg: rename tcg_current_cpu to tcg_current_rr_cpu

2017-01-27 Thread Alex Bennée

..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.

For the time being the setting of the global exit_request remains.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v4:
  - keep global exit_request setting for now
  - fix merge conflicts
v5:
  - merge conflicts with kick changes
---
 cpu-exec-common.c   |  1 -
 cpu-exec.c  |  3 ---
 cpus.c  | 41 ++---
 include/exec/exec-all.h |  1 -
 4 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 767d9c6f0c..e2bc053372 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -24,7 +24,6 @@
 #include "exec/memory-internal.h"
 
 bool exit_request;
-CPUState *tcg_current_cpu;
 
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
diff --git a/cpu-exec.c b/cpu-exec.c
index 1b8685dc21..f9e836c8dd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -609,7 +609,6 @@ int cpu_exec(CPUState *cpu)
 return EXCP_HALTED;
 }
 
-atomic_mb_set(&tcg_current_cpu, cpu);
 rcu_read_lock();
 
 if (unlikely(atomic_mb_read(&exit_request))) {
@@ -668,7 +667,5 @@ int cpu_exec(CPUState *cpu)
 /* fail safe : never use current_cpu outside cpu_exec() */
 current_cpu = NULL;
 
-/* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-atomic_set(&tcg_current_cpu, NULL);
 return ret;
 }
diff --git a/cpus.c b/cpus.c
index a98925105c..6d64199831 100644
--- a/cpus.c
+++ b/cpus.c
@@ -779,8 +779,7 @@ void configure_icount(QemuOpts *opts, Error **errp)
  */
 
 static QEMUTimer *tcg_kick_vcpu_timer;
-
-static void qemu_cpu_kick_no_halt(void);
+static CPUState *tcg_current_rr_cpu;
 
 #define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
 
@@ -789,10 +788,23 @@ static inline int64_t qemu_tcg_next_kick(void)
 return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
 }
 
+/* Kick the currently round-robin scheduled vCPU */
+static void qemu_cpu_kick_rr_cpu(void)
+{
+CPUState *cpu;
+atomic_mb_set(&exit_request, 1);
+do {
+cpu = atomic_mb_read(&tcg_current_rr_cpu);
+if (cpu) {
+cpu_exit(cpu);
+}
+} while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
+}
+
 static void kick_tcg_thread(void *opaque)
 {
 timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-qemu_cpu_kick_no_halt();
+qemu_cpu_kick_rr_cpu();
 }
 
 static void start_tcg_kick_timer(void)
@@ -812,7 +824,6 @@ static void stop_tcg_kick_timer(void)
 }
 }
 
-
 /***/
 void hw_error(const char *fmt, ...)
 {
@@ -1323,6 +1334,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 
 for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
 qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
   (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1342,6 +1354,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 
 } /* for cpu.. */
+/* Does not need atomic_mb_set because a spurious wakeup is okay.  */
+atomic_set(&tcg_current_rr_cpu, NULL);
 
 /* Pairs with smp_wmb in qemu_cpu_kick.  */
 atomic_mb_set(&exit_request, 0);
@@ -1420,24 +1434,13 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #endif
 }
 
-static void qemu_cpu_kick_no_halt(void)
-{
-CPUState *cpu;
-/* Ensure whatever caused the exit has reached the CPU threads before
- * writing exit_request.
- */
-atomic_mb_set(&exit_request, 1);
-cpu = atomic_mb_read(&tcg_current_cpu);
-if (cpu) {
-cpu_exit(cpu);
-}
-}
-
 void qemu_cpu_kick(CPUState *cpu)
 {
 qemu_cond_broadcast(cpu->halt_cond);
 if (tcg_enabled()) {
-qemu_cpu_kick_no_halt();
+cpu_exit(cpu);
+/* Also ensure current RR cpu is kicked */
+qemu_cpu_kick_rr_cpu();
 } else {
 if (hax_enabled()) {
 /*
@@ -1485,7 +1488,7 @@ void qemu_mutex_lock_iothread(void)
 atomic_dec(&iothread_requesting_mutex);
 } else {
 if (qemu_mutex_trylock(&qemu_global_mutex)) {
-qemu_cpu_kick_no_halt();
+qemu_cpu_kick_rr_cpu();
 qemu_mutex_lock(&qemu_global_mutex);
 }
 atomic_dec(&iothread_requesting_mutex);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bbc9478a50..3cbd359dd7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -404,7 +404,6

[Qemu-devel] [PATCH v8 20/25] target-arm/powerctl: defer cpu reset work to CPU context

2017-01-27 Thread Alex Bennée

When switching a new vCPU on we want to complete a bunch of the setup
work before we start scheduling the vCPU thread. To do this cleanly we
defer vCPU setup to async work which will run the vCPUs execution
context as the thread is woken up. The scheduling of the work will kick
the vCPU awake.

This avoids potential races in MTTCG system emulation.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v7
  - add const to static mode_for_el[] array
  - fix checkpatch long lines
---
 target/arm/arm-powerctl.c | 146 --
 1 file changed, 88 insertions(+), 58 deletions(-)

diff --git a/target/arm/arm-powerctl.c b/target/arm/arm-powerctl.c
index fbb7a15daa..082788e3a4 100644
--- a/target/arm/arm-powerctl.c
+++ b/target/arm/arm-powerctl.c
@@ -48,11 +48,87 @@ CPUState *arm_get_cpu_by_id(uint64_t id)
 return NULL;
 }
 
+struct cpu_on_info {
+uint64_t entry;
+uint64_t context_id;
+uint32_t target_el;
+bool target_aa64;
+};
+
+
+static void arm_set_cpu_on_async_work(CPUState *target_cpu_state,
+  run_on_cpu_data data)
+{
+ARMCPU *target_cpu = ARM_CPU(target_cpu_state);
+struct cpu_on_info *info = (struct cpu_on_info *) data.host_ptr;
+
+/* Initialize the cpu we are turning on */
+cpu_reset(target_cpu_state);
+target_cpu->powered_off = false;
+target_cpu_state->halted = 0;
+
+if (info->target_aa64) {
+if ((info->target_el < 3) && arm_feature(&target_cpu->env,
+ ARM_FEATURE_EL3)) {
+/*
+ * As target mode is AArch64, we need to set lower
+ * exception level (the requested level 2) to AArch64
+ */
+target_cpu->env.cp15.scr_el3 |= SCR_RW;
+}
+
+if ((info->target_el < 2) && arm_feature(&target_cpu->env,
+ ARM_FEATURE_EL2)) {
+/*
+ * As target mode is AArch64, we need to set lower
+ * exception level (the requested level 1) to AArch64
+ */
+target_cpu->env.cp15.hcr_el2 |= HCR_RW;
+}
+
+target_cpu->env.pstate = aarch64_pstate_mode(info->target_el, true);
+} else {
+/* We are requested to boot in AArch32 mode */
+static const uint32_t mode_for_el[] = { 0,
+ARM_CPU_MODE_SVC,
+ARM_CPU_MODE_HYP,
+ARM_CPU_MODE_SVC };
+
+cpsr_write(&target_cpu->env, mode_for_el[info->target_el], CPSR_M,
+   CPSRWriteRaw);
+}
+
+if (info->target_el == 3) {
+/* Processor is in secure mode */
+target_cpu->env.cp15.scr_el3 &= ~SCR_NS;
+} else {
+/* Processor is not in secure mode */
+target_cpu->env.cp15.scr_el3 |= SCR_NS;
+}
+
+/* We check if the started CPU is now at the correct level */
+assert(info->target_el == arm_current_el(&target_cpu->env));
+
+if (info->target_aa64) {
+target_cpu->env.xregs[0] = info->context_id;
+target_cpu->env.thumb = false;
+} else {
+target_cpu->env.regs[0] = info->context_id;
+target_cpu->env.thumb = info->entry & 1;
+info->entry &= 0xfffe;
+}
+
+/* Start the new CPU at the requested address */
+cpu_set_pc(target_cpu_state, info->entry);
+g_free(info);
+}
+
 int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, uint64_t context_id,
uint32_t target_el, bool target_aa64)
 {
 CPUState *target_cpu_state;
 ARMCPU *target_cpu;
+struct cpu_on_info *info;
 
 DPRINTF("cpu %" PRId64 " (EL %d, %s) @ 0x%" PRIx64 " with R0 = 0x%" PRIx64
 "\n", cpuid, target_el, target_aa64 ? "aarch64" : "aarch32", entry,
@@ -109,64 +185,18 @@ int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, 
uint64_t context_id,
 return QEMU_ARM_POWERCTL_INVALID_PARAM;
 }
 
-/* Initialize the cpu we are turning on */
-cpu_reset(target_cpu_state);
-target_cpu->powered_off = false;
-target_cpu_state->halted = 0;
-
-if (target_aa64) {
-if ((target_el < 3) && arm_feature(&target_cpu->env, ARM_FEATURE_EL3)) 
{
-/*
- * As target mode is AArch64, we need to set lower
- * exception level (the requested level 2) to AArch64
- */
-target_cpu->env.cp15.scr_el3 |= SCR_RW;
-}
-
-if ((target_el < 2) && arm_feature(&target_cpu->env, ARM_FEATURE_EL2)) 
{
-/*
- * As target mode is AArch64, we need to set lower
- * exception level (the requested level 1) to AArch64
- */
-target_cpu->env.cp15.hcr_el2 |= HCR_RW;
-}
-
-target_cpu->env.pstate = aarch64_pstate_mode(target_el, true);
-} else {
-/* We are requested to boot in AArch32 mode */
-

[Qemu-devel] [PATCH v8 10/25] tcg: enable tb_lock() for SoftMMU

2017-01-27 Thread Alex Bennée

tb_lock() has long been used for linux-user mode to protect code
generation. By enabling it now we prepare for MTTCG and ensure all code
generation is serialised by this lock. The other major structure that
needs protecting is the l1_map and its PageDesc structures. For the
SoftMMU case we also use tb_lock() to protect these structures instead
of linux-user mmap_lock() which as the name suggests serialises updates
to the structure as a result of guest mmap operations.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v4
  - split from main tcg: enable thread-per-vCPU patch
v7
  - fixed up with Pranith's tcg_debug_assert() changes
---
 translate-all.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 41b36f04c6..87e9d00d14 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -75,7 +75,7 @@
  * mmap_lock.
  */
 #ifdef CONFIG_SOFTMMU
-#define assert_memory_lock() do { /* nothing */ } while (0)
+#define assert_memory_lock() tcg_debug_assert(have_tb_lock)
 #else
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
@@ -135,9 +135,7 @@ TCGContext tcg_ctx;
 bool parallel_cpus;
 
 /* translation block context */
-#ifdef CONFIG_USER_ONLY
 __thread int have_tb_lock;
-#endif
 
 static void page_table_config_init(void)
 {
@@ -159,40 +157,29 @@ static void page_table_config_init(void)
 assert(v_l2_levels >= 0);
 }
 
-#ifdef CONFIG_USER_ONLY
 #define assert_tb_locked() tcg_debug_assert(have_tb_lock)
 #define assert_tb_unlocked() tcg_debug_assert(!have_tb_lock)
-#else
-#define assert_tb_locked()  do { /* nothing */ } while (0)
-#define assert_tb_unlocked()  do { /* nothing */ } while (0)
-#endif
 
 void tb_lock(void)
 {
-#ifdef CONFIG_USER_ONLY
 assert_tb_unlocked();
 qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
 have_tb_lock++;
-#endif
 }
 
 void tb_unlock(void)
 {
-#ifdef CONFIG_USER_ONLY
 assert_tb_locked();
 have_tb_lock--;
 qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
-#endif
 }
 
 void tb_lock_reset(void)
 {
-#ifdef CONFIG_USER_ONLY
 if (have_tb_lock) {
 qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
 have_tb_lock = 0;
 }
-#endif
 }
 
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-- 
2.11.0

[Qemu-devel] [PATCH v8 04/25] tcg: move TCG_MO/BAR types into own file

2017-01-27 Thread Alex Bennée

We'll be using the memory ordering definitions to define values for
both the host and guest. To avoid fighting with circular header
dependencies just move these types into their own minimal header.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 tcg/tcg-mo.h | 45 +
 tcg/tcg.h| 18 +-
 2 files changed, 46 insertions(+), 17 deletions(-)
 create mode 100644 tcg/tcg-mo.h

diff --git a/tcg/tcg-mo.h b/tcg/tcg-mo.h
new file mode 100644
index 00..429b022561
--- /dev/null
+++ b/tcg/tcg-mo.h
@@ -0,0 +1,45 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TCG_MO_H
+#define TCG_MO_H
+
+typedef enum {
+/* Used to indicate the type of accesses on which ordering
+   is to be ensured.  Modeled after SPARC barriers.  */
+TCG_MO_LD_LD  = 0x01,
+TCG_MO_ST_LD  = 0x02,
+TCG_MO_LD_ST  = 0x04,
+TCG_MO_ST_ST  = 0x08,
+TCG_MO_ALL= 0x0F,  /* OR of the above */
+
+/* Used to indicate the kind of ordering which is to be ensured by the
+   instruction.  These types are derived from x86/aarch64 instructions.
+   It should be noted that these are different from C11 semantics.  */
+TCG_BAR_LDAQ  = 0x10,  /* Following ops will not come forward */
+TCG_BAR_STRL  = 0x20,  /* Previous ops will not be delayed */
+TCG_BAR_SC= 0x30,  /* No ops cross barrier; OR of the above */
+} TCGBar;
+
+#endif /* TCG_MO_H */
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 631c6f69b1..f946452049 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -29,6 +29,7 @@
 #include "cpu.h"
 #include "exec/tb-context.h"
 #include "qemu/bitops.h"
+#include "tcg-mo.h"
 #include "tcg-target.h"
 
 /* XXX: make safe guess about sizes */
@@ -498,23 +499,6 @@ static inline intptr_t QEMU_ARTIFICIAL 
GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_DUMMY_TCGV MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG  ((TCGArg)(-1))
 
-typedef enum {
-/* Used to indicate the type of accesses on which ordering
-   is to be ensured.  Modeled after SPARC barriers.  */
-TCG_MO_LD_LD  = 0x01,
-TCG_MO_ST_LD  = 0x02,
-TCG_MO_LD_ST  = 0x04,
-TCG_MO_ST_ST  = 0x08,
-TCG_MO_ALL= 0x0F,  /* OR of the above */
-
-/* Used to indicate the kind of ordering which is to be ensured by the
-   instruction.  These types are derived from x86/aarch64 instructions.
-   It should be noted that these are different from C11 semantics.  */
-TCG_BAR_LDAQ  = 0x10,  /* Following ops will not come forward */
-TCG_BAR_STRL  = 0x20,  /* Previous ops will not be delayed */
-TCG_BAR_SC= 0x30,  /* No ops cross barrier; OR of the above */
-} TCGBar;
-
 /* Conditions.  Note that these are laid out for easy manipulation by
the functions below:
  bit 0 is used for inverting;
-- 
2.11.0

[Qemu-devel] [PATCH v8 19/25] cputlb: introduce tlb_flush_*_all_cpus[_synced]

2017-01-27 Thread Alex Bennée

This introduces support to the cputlb API for flushing all CPUs TLBs
with one call. This avoids the need for target helpers to iterate
through the vCPUs themselves.

An additional variant of the API (_synced) do not return from the
caller and will cause the work to be scheduled as "safe work". The
result will be all the flush operations will be complete by the time
the originating vCPU starts executing again. It is up to the caller to
ensure enough state has been saved so execution can be restarted at
the next appropriate instruction.

Some guest architectures can defer completion of flush operations
until later. If they later schedule work using the async_safe_work
mechanism they can be sure other vCPUs will have flushed their TLBs by
the point execution returns from the safe work.

Signed-off-by: Alex Bennée 

---
v7
  - some checkpatch long line fixes
v8
  - change from varg to bitmap calling convention
  - add _synced variants, re-factored helper
---
 cputlb.c| 110 +++---
 include/exec/exec-all.h | 114 ++--
 2 files changed, 215 insertions(+), 9 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 65003350e3..7f9a54f253 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -73,6 +73,25 @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > 
sizeof(run_on_cpu_data));
 QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
 #define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
 
+/* flush_all_helper: run fn across all cpus
+ *
+ * If the wait flag is set then the src cpu's helper will be queued as
+ * "safe" work and the loop exited creating a synchronisation point
+ * where all queued work will be finished before execution starts
+ * again.
+ */
+static void flush_all_helper(CPUState *src, run_on_cpu_func fn,
+ run_on_cpu_data d)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+if (cpu != src) {
+async_run_on_cpu(cpu, fn, d);
+}
+}
+}
+
 /* statistics */
 int tlb_flush_count;
 
@@ -128,6 +147,19 @@ void tlb_flush(CPUState *cpu)
 }
 }
 
+void tlb_flush_all_cpus(CPUState *src_cpu)
+{
+flush_all_helper(src_cpu, tlb_flush_global_async_work, RUN_ON_CPU_NULL);
+tlb_flush_global_async_work(src_cpu, RUN_ON_CPU_NULL);
+}
+
+void QEMU_NORETURN tlb_flush_all_cpus_synced(CPUState *src_cpu)
+{
+flush_all_helper(src_cpu, tlb_flush_global_async_work, RUN_ON_CPU_NULL);
+tlb_flush_global_async_work(src_cpu, RUN_ON_CPU_NULL);
+cpu_loop_exit(src_cpu);
+}
+
 static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
@@ -178,6 +210,30 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 }
 }
 
+void tlb_flush_by_mmuidx_all_cpus(CPUState *src_cpu, uint16_t idxmap)
+{
+const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
+
+tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+
+flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
+fn(src_cpu, RUN_ON_CPU_HOST_INT(idxmap));
+}
+
+void QEMU_NORETURN tlb_flush_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
+   uint16_t idxmap)
+{
+const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
+
+tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+
+flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
+async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
+cpu_loop_exit(src_cpu);
+}
+
+
+
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
 {
 if (addr == (tlb_entry->addr_read &
@@ -317,14 +373,56 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong 
addr, uint16_t idxmap)
 }
 }
 
-void tlb_flush_page_all(target_ulong addr)
+void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
+   uint16_t idxmap)
 {
-CPUState *cpu;
+const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+target_ulong addr_and_mmu_idx;
 
-CPU_FOREACH(cpu) {
-async_run_on_cpu(cpu, tlb_flush_page_async_work,
- RUN_ON_CPU_TARGET_PTR(addr));
-}
+tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+
+/* This should already be page aligned */
+addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
+addr_and_mmu_idx |= idxmap;
+
+flush_all_helper(src_cpu, fn, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+fn(src_cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+}
+
+void QEMU_NORETURN tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
+target_ulong addr,
+uint16_t idxmap)
+{
+const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+target_ulong addr_and_mmu_idx;
+
+tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+
+/* This should alread

[Qemu-devel] [PATCH v8 09/25] tcg: remove global exit_request

2017-01-27 Thread Alex Bennée

There are now only two uses of the global exit_request left.

The first ensures we exit the run_loop when we first start to process
pending work and in the kick handler. This is just as easily done by
setting the first_cpu->exit_request flag.

The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now the iothread isn't being held while
running we can just rely on the kick handler to push us out as intended.

We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
cause us to exit the main loop and process any IO requests that might
come along.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v5
  - minor merge conflict with kick patch
v4
  - moved to after iothread unlocking patch
  - needed to remove kick exit_request as well.
  - remove extraneous cpu->exit_request check
  - remove stray exit_request setting
  - remove needless atomic operation
---
 cpu-exec-common.c   |  2 --
 cpu-exec.c  |  9 ++---
 cpus.c  | 18 ++
 include/exec/exec-all.h |  3 ---
 4 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index e2bc053372..0504a9457b 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -23,8 +23,6 @@
 #include "exec/exec-all.h"
 #include "exec/memory-internal.h"
 
-bool exit_request;
-
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
 {
diff --git a/cpu-exec.c b/cpu-exec.c
index f42a128bdf..cc09c1fc37 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -565,9 +565,8 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 /* Something asked us to stop executing
  * chained TBs; just continue round the main
  * loop. Whatever requested the exit will also
- * have set something else (eg exit_request or
- * interrupt_request) which we will handle
- * next time around the loop.  But we need to
+ * have set something else (eg interrupt_request) which we
+ * will handle next time around the loop.  But we need to
  * ensure the tcg_exit_req read in generated code
  * comes before the next read of cpu->exit_request
  * or cpu->interrupt_request.
@@ -623,10 +622,6 @@ int cpu_exec(CPUState *cpu)
 
 rcu_read_lock();
 
-if (unlikely(atomic_mb_read(&exit_request))) {
-cpu->exit_request = 1;
-}
-
 cc->cpu_exec_enter(cpu);
 
 /* Calculate difference between guest clock and host clock.
diff --git a/cpus.c b/cpus.c
index c48bc8d5b3..18daf41dae 100644
--- a/cpus.c
+++ b/cpus.c
@@ -792,7 +792,6 @@ static inline int64_t qemu_tcg_next_kick(void)
 static void qemu_cpu_kick_rr_cpu(void)
 {
 CPUState *cpu;
-atomic_mb_set(&exit_request, 1);
 do {
 cpu = atomic_mb_read(&tcg_current_rr_cpu);
 if (cpu) {
@@ -1315,11 +1314,11 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
 start_tcg_kick_timer();
 
-/* process any pending work */
-atomic_mb_set(&exit_request, 1);
-
 cpu = first_cpu;
 
+/* process any pending work */
+cpu->exit_request = 1;
+
 while (1) {
 /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
 qemu_account_warp_timer();
@@ -1328,7 +1327,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 cpu = first_cpu;
 }
 
-for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+while (cpu && !cpu->exit_request) {
 atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
 qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
@@ -1348,12 +1347,15 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 break;
 }
 
-} /* for cpu.. */
+cpu = CPU_NEXT(cpu);
+} /* while (cpu && !cpu->exit_request).. */
+
 /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
 atomic_set(&tcg_current_rr_cpu, NULL);
 
-/* Pairs with smp_wmb in qemu_cpu_kick.  */
-atomic_mb_set(&exit_request, 0);
+if (cpu && cpu->exit_request) {
+atomic_mb_set(&cpu->exit_request, 0);
+}
 
 handle_icount_deadline();
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3cbd359dd7..bd4622ac5d 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -403,7 +403,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 /* vl.c */
 extern int singlestep;
 
-/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern bool exit_request;
-
 #endif
-- 
2.11.0

[Qemu-devel] [PATCH v8 03/25] mttcg: Add missing tb_lock/unlock() in cpu_exec_step()

2017-01-27 Thread Alex Bennée

From: Pranith Kumar 

The recent patch enabling lock assertions uncovered the missing lock
acquisition in cpu_exec_step(). This patch adds them.

CC: Richard Henderson 
CC: Alex Bennée 
Signed-off-by: Pranith Kumar 
---
 cpu-exec.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/cpu-exec.c b/cpu-exec.c
index 4188fed3c6..1b8685dc21 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -233,14 +233,18 @@ static void cpu_exec_step(CPUState *cpu)
 uint32_t flags;
 
 cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
+tb_lock();
 tb = tb_gen_code(cpu, pc, cs_base, flags,
  1 | CF_NOCACHE | CF_IGNORE_ICOUNT);
 tb->orig_tb = NULL;
+tb_unlock();
 /* execute the generated code */
 trace_exec_tb_nocache(tb, pc);
 cpu_tb_exec(cpu, tb);
+tb_lock();
 tb_phys_invalidate(tb, -1);
 tb_free(tb);
+tb_unlock();
 }
 
 void cpu_exec_step_atomic(CPUState *cpu)
-- 
2.11.0

[Qemu-devel] [PATCH v8 00/25] Remaining MTTCG Base patches and ARM enablement

2017-01-27 Thread Alex Bennée

Hi,

All of the changes in this revision are addressing comments from v7
posted last week. A new pre-cursor patch was added:

  cputlb and arm/sparc targets: convert mmuidx flushes from varg to
bitmap

To change the cputlb API to use a bitmap instead of vargs. This has
generated quite a bit of churn in the ARM target but it is pretty
mechanical.

I also folded the BQL irq protection patches from v7 into:

  tcg: drop global lock during TCG code execution

This is required to keep the series bisectable although the BQL safety
is only really relevant to guests using MTTCG. I didn't think it was
worth making the asserts conditional on parallel_cpus although it does
mean this patch gets a little bigger.

The other big change was to:

  cputlb: introduce tlb_flush_*_all_cpus[_synced]

Where I replaced the wait flag with an expanded set of API calls. The
*_synced variants which are marked as QEMU_NORETURN to make their
behaviour clear.

The series applies to origin/master as of today and you can find my
tree at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v8

There is the usual collection of r-b tags and minor merge/re-base
fixes all documented in the --- sections of the commit messages.

In terms of merging strategy I would appreciate some thoughts. While I
think the series is ready to go I appreciate it is quite a chunk to
merge in one go. That said an early merge gives us plenty of time to
shake out any lingering issues before feature freeze.

I guess the key decider is that we are happy the design provides for
solutions for any other things we come across?

Cheers,

Alex

Alex Bennée (19):
  docs: new design document multi-thread-tcg.txt
  tcg: move TCG_MO/BAR types into own file
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  tcg: remove global exit_request
  tcg: enable tb_lock() for SoftMMU
  tcg: enable thread-per-vCPU
  cputlb: add assert_cpu_is_self checks
  cputlb: tweak qemu_ram_addr_from_host_nofail reporting
  cputlb and arm/sparc targets: convert mmuidx flushes from varg to
bitmap
  cputlb: add tlb_flush_by_mmuidx async routines
  cputlb: atomically update tlb fields used by tlb_reset_dirty
  cputlb: introduce tlb_flush_*_all_cpus[_synced]
  target-arm/powerctl: defer cpu reset work to CPU context
  target-arm: don't generate WFE/YIELD calls for MTTCG
  target-arm/cpu.h: make ARM_CP defined consistent
  target-arm: introduce ARM_CP_EXIT_PC
  target-arm: ensure all cross vCPUs TLB flushes complete
  tcg: enable MTTCG by default for ARM on x86 hosts

Jan Kiszka (1):
  tcg: drop global lock during TCG code execution

KONRAD Frederic (2):
  tcg: add options for enabling MTTCG
  cputlb: introduce tlb_flush_* async work.

Pranith Kumar (3):
  mttcg: translate-all: Enable locking debug in a debug build
  mttcg: Add missing tb_lock/unlock() in cpu_exec_step()
  tcg: handle EXCP_ATOMIC exception for system emulation

 configure  |   6 +
 cpu-exec-common.c  |   3 -
 cpu-exec.c |  41 ++--
 cpus.c | 343 ++---
 cputlb.c   | 465 +
 docs/multi-thread-tcg.txt  | 350 ++
 exec.c |  12 +-
 hw/core/irq.c  |   1 +
 hw/i386/kvmvapic.c |   4 +-
 hw/intc/arm_gicv3_cpuif.c  |   3 +
 hw/ppc/ppc.c   |  16 +-
 hw/ppc/spapr.c |   3 +
 include/exec/cputlb.h  |   2 -
 include/exec/exec-all.h| 130 +++--
 include/qom/cpu.h  |  16 ++
 include/sysemu/cpus.h  |   2 +
 memory.c   |   2 +
 qemu-options.hx|  20 ++
 qom/cpu.c  |  10 +
 target/arm/arm-powerctl.c  | 146 --
 target/arm/cpu.h   |  73 ---
 target/arm/helper.c| 385 ++---
 target/arm/op_helper.c |  50 -
 target/arm/translate-a64.c |  26 ++-
 target/arm/translate.c |  46 +++--
 target/arm/translate.h |   4 +-
 target/i386/smm_helper.c   |   7 +
 target/s390x/misc_helper.c |   5 +-
 target/sparc/ldst_helper.c |   8 +-
 tcg/i386/tcg-target.h  |  16 ++
 tcg/tcg-mo.h   |  45 +
 tcg/tcg.h  |  27 +--
 translate-all.c|  66 ++-
 translate-common.c |  21 +-
 vl.c   |  49 -
 35 files changed, 1818 insertions(+), 585 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt
 create mode 100644 tcg/tcg-mo.h

-- 
2.11.0

[Qemu-devel] [PATCH v8 02/25] mttcg: translate-all: Enable locking debug in a debug build

2017-01-27 Thread Alex Bennée

From: Pranith Kumar 

Enable tcg lock debug asserts in a debug build by default instead of
relying on DEBUG_LOCKING. None of the other DEBUG_* macros have
asserts, so this patch removes DEBUG_LOCKING and enable these asserts
in a debug build.

CC: Richard Henderson 
Signed-off-by: Pranith Kumar 
[AJB: tweak ifdefs so can be early in series]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 translate-all.c | 52 
 1 file changed, 16 insertions(+), 36 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 20262938bb..055436a676 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -59,7 +59,6 @@
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
-/* #define DEBUG_LOCKING */
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
@@ -74,20 +73,10 @@
  * access to the memory related structures are protected with the
  * mmap_lock.
  */
-#ifdef DEBUG_LOCKING
-#define DEBUG_MEM_LOCKS 1
-#else
-#define DEBUG_MEM_LOCKS 0
-#endif
-
 #ifdef CONFIG_SOFTMMU
 #define assert_memory_lock() do { /* nothing */ } while (0)
 #else
-#define assert_memory_lock() do {   \
-if (DEBUG_MEM_LOCKS) {  \
-g_assert(have_mmap_lock()); \
-}   \
-} while (0)
+#define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
 
 #define SMC_BITMAP_USE_THRESHOLD 10
@@ -169,10 +158,18 @@ static void page_table_config_init(void)
 assert(v_l2_levels >= 0);
 }
 
+#ifdef CONFIG_USER_ONLY
+#define assert_tb_locked() tcg_debug_assert(have_tb_lock)
+#define assert_tb_unlocked() tcg_debug_assert(!have_tb_lock)
+#else
+#define assert_tb_locked()  do { /* nothing */ } while (0)
+#define assert_tb_unlocked()  do { /* nothing */ } while (0)
+#endif
+
 void tb_lock(void)
 {
 #ifdef CONFIG_USER_ONLY
-assert(!have_tb_lock);
+assert_tb_unlocked();
 qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
 have_tb_lock++;
 #endif
@@ -181,7 +178,7 @@ void tb_lock(void)
 void tb_unlock(void)
 {
 #ifdef CONFIG_USER_ONLY
-assert(have_tb_lock);
+assert_tb_locked();
 have_tb_lock--;
 qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
 #endif
@@ -197,23 +194,6 @@ void tb_lock_reset(void)
 #endif
 }
 
-#ifdef DEBUG_LOCKING
-#define DEBUG_TB_LOCKS 1
-#else
-#define DEBUG_TB_LOCKS 0
-#endif
-
-#ifdef CONFIG_SOFTMMU
-#define assert_tb_lock() do { /* nothing */ } while (0)
-#else
-#define assert_tb_lock() do {   \
-if (DEBUG_TB_LOCKS) {   \
-g_assert(have_tb_lock); \
-}   \
-} while (0)
-#endif
-
-
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
@@ -847,7 +827,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
 TranslationBlock *tb;
 
-assert_tb_lock();
+assert_tb_locked();
 
 if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
 return NULL;
@@ -862,7 +842,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
-assert_tb_lock();
+assert_tb_locked();
 
 /* In practice this is mostly used for single use temporary TB
Ignore the hard cases and just back up if this TB happens to
@@ -1104,7 +1084,7 @@ void tb_phys_invalidate(TranslationBlock *tb, 
tb_page_addr_t page_addr)
 uint32_t h;
 tb_page_addr_t phys_pc;
 
-assert_tb_lock();
+assert_tb_locked();
 
 atomic_set(&tb->invalid, true);
 
@@ -1419,7 +1399,7 @@ static void tb_invalidate_phys_range_1(tb_page_addr_t 
start, tb_page_addr_t end)
 #ifdef CONFIG_SOFTMMU
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
-assert_tb_lock();
+assert_tb_locked();
 tb_invalidate_phys_range_1(start, end);
 }
 #else
@@ -1462,7 +1442,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, 
tb_page_addr_t end,
 #endif /* TARGET_HAS_PRECISE_SMC */
 
 assert_memory_lock();
-assert_tb_lock();
+assert_tb_locked();
 
 p = page_find(start >> TARGET_PAGE_BITS);
 if (!p) {
-- 
2.11.0

[Qemu-devel] [PATCH v8 14/25] cputlb: tweak qemu_ram_addr_from_host_nofail reporting

2017-01-27 Thread Alex Bennée

This moves the helper function closer to where it is called and updates
the error message to report via error_report instead of the deprecated
fprintf.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 cputlb.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index af0e65cd2c..94fa9977c5 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -246,18 +246,6 @@ void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, 
uintptr_t start,
 }
 }
 
-static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
-{
-ram_addr_t ram_addr;
-
-ram_addr = qemu_ram_addr_from_host(ptr);
-if (ram_addr == RAM_ADDR_INVALID) {
-fprintf(stderr, "Bad ram pointer %p\n", ptr);
-abort();
-}
-return ram_addr;
-}
-
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
 CPUArchState *env;
@@ -469,6 +457,18 @@ static void report_bad_exec(CPUState *cpu, target_ulong 
addr)
 log_cpu_state_mask(LOG_GUEST_ERROR, cpu, CPU_DUMP_FPU | CPU_DUMP_CCOP);
 }
 
+static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+ram_addr_t ram_addr;
+
+ram_addr = qemu_ram_addr_from_host(ptr);
+if (ram_addr == RAM_ADDR_INVALID) {
+error_report("Bad ram pointer %p", ptr);
+abort();
+}
+return ram_addr;
+}
+
 /* NOTE: this function can trigger an exception */
 /* NOTE2: the returned address is not exactly the physical address: it
  * is actually a ram_addr_t (in system mode; the user mode emulation
-- 
2.11.0

[Qemu-devel] [PATCH v8 10/25] tcg: enable tb_lock() for SoftMMU

2017-01-27 Thread Alex Bennée

tb_lock() has long been used for linux-user mode to protect code
generation. By enabling it now we prepare for MTTCG and ensure all code
generation is serialised by this lock. The other major structure that
needs protecting is the l1_map and its PageDesc structures. For the
SoftMMU case we also use tb_lock() to protect these structures instead
of linux-user mmap_lock() which as the name suggests serialises updates
to the structure as a result of guest mmap operations.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v4
  - split from main tcg: enable thread-per-vCPU patch
v7
  - fixed up with Pranith's tcg_debug_assert() changes
---
 translate-all.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 41b36f04c6..87e9d00d14 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -75,7 +75,7 @@
  * mmap_lock.
  */
 #ifdef CONFIG_SOFTMMU
-#define assert_memory_lock() do { /* nothing */ } while (0)
+#define assert_memory_lock() tcg_debug_assert(have_tb_lock)
 #else
 #define assert_memory_lock() tcg_debug_assert(have_mmap_lock())
 #endif
@@ -135,9 +135,7 @@ TCGContext tcg_ctx;
 bool parallel_cpus;
 
 /* translation block context */
-#ifdef CONFIG_USER_ONLY
 __thread int have_tb_lock;
-#endif
 
 static void page_table_config_init(void)
 {
@@ -159,40 +157,29 @@ static void page_table_config_init(void)
 assert(v_l2_levels >= 0);
 }
 
-#ifdef CONFIG_USER_ONLY
 #define assert_tb_locked() tcg_debug_assert(have_tb_lock)
 #define assert_tb_unlocked() tcg_debug_assert(!have_tb_lock)
-#else
-#define assert_tb_locked()  do { /* nothing */ } while (0)
-#define assert_tb_unlocked()  do { /* nothing */ } while (0)
-#endif
 
 void tb_lock(void)
 {
-#ifdef CONFIG_USER_ONLY
 assert_tb_unlocked();
 qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
 have_tb_lock++;
-#endif
 }
 
 void tb_unlock(void)
 {
-#ifdef CONFIG_USER_ONLY
 assert_tb_locked();
 have_tb_lock--;
 qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
-#endif
 }
 
 void tb_lock_reset(void)
 {
-#ifdef CONFIG_USER_ONLY
 if (have_tb_lock) {
 qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
 have_tb_lock = 0;
 }
-#endif
 }
 
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-- 
2.11.0

[Qemu-devel] [PATCH v8 06/25] tcg: add kick timer for single-threaded vCPU emulation

2017-01-27 Thread Alex Bennée

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored and
rationalised we now do it explicitly here.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v2
  - re-base fixes
  - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
v3
  - add define for TCG_KICK_FREQ
  - fix checkpatch warning
v4
  - wrap next calc in inline qemu_tcg_next_kick() instead of macro
v5
  - move all kick code into own section
  - use global for timer
  - add helper functions to start/stop timer
  - stop timer when all cores paused
v7
  - checkpatch > 80 char fix
---
 cpus.c | 61 +
 1 file changed, 61 insertions(+)

diff --git a/cpus.c b/cpus.c
index 76b6e04332..a98925105c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -767,6 +767,53 @@ void configure_icount(QemuOpts *opts, Error **errp)
 }
 
 /***/
+/* TCG vCPU kick timer
+ *
+ * The kick timer is responsible for moving single threaded vCPU
+ * emulation on to the next vCPU. If more than one vCPU is running a
+ * timer event with force a cpu->exit so the next vCPU can get
+ * scheduled.
+ *
+ * The timer is removed if all vCPUs are idle and restarted again once
+ * idleness is complete.
+ */
+
+static QEMUTimer *tcg_kick_vcpu_timer;
+
+static void qemu_cpu_kick_no_halt(void);
+
+#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
+
+static inline int64_t qemu_tcg_next_kick(void)
+{
+return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
+}
+
+static void kick_tcg_thread(void *opaque)
+{
+timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+qemu_cpu_kick_no_halt();
+}
+
+static void start_tcg_kick_timer(void)
+{
+if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+   kick_tcg_thread, NULL);
+timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
+}
+}
+
+static void stop_tcg_kick_timer(void)
+{
+if (tcg_kick_vcpu_timer) {
+timer_del(tcg_kick_vcpu_timer);
+tcg_kick_vcpu_timer = NULL;
+}
+}
+
+
+/***/
 void hw_error(const char *fmt, ...)
 {
 va_list ap;
@@ -1020,9 +1067,12 @@ static void qemu_wait_io_event_common(CPUState *cpu)
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
 while (all_cpu_threads_idle()) {
+stop_tcg_kick_timer();
 qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
 }
 
+start_tcg_kick_timer();
+
 while (iothread_requesting_mutex) {
 qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
 }
@@ -1222,6 +1272,15 @@ static void deal_with_unplugged_cpus(void)
 }
 }
 
+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
@@ -1248,6 +1307,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 }
 
+start_tcg_kick_timer();
+
 /* process any pending work */
 atomic_mb_set(&exit_request, 1);
 
-- 
2.11.0

[Qemu-devel] [PATCH v8 01/25] docs: new design document multi-thread-tcg.txt

2017-01-27 Thread Alex Bennée

This documents the current design for upgrading TCG emulation to take
advantage of modern CPUs by running a thread-per-CPU. The document goes
through the various areas of the code affected by such a change and
proposes design requirements for each part of the solution.

The text marked with (Current solution[s]) to document what the current
approaches being used are.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v1
  - initial version
v2
  - update discussion on locks
  - bit more detail on vCPU scheduling
  - explicitly mention Translation Blocks
  - emulated hardware state already covered by iomutex
  - a few minor rewords
v3
  - mention this covers system-mode
  - describe main main-loop and lookup hot-path
  - mention multi-concurrent-reader lookups
  - enumerate reasons for invalidation
  - add more details on lookup structures
  - describe the softmmu hot-path better
  - mention store-after-load barrier problem
v4
  - mention some cross-over between linux-user/system emulation
  - various minor grammar and scanning fixes
  - fix reference to tb_ctx.htbale
  - describe the solution for hot-path
  - more detail on TB flushing and invalidation
  - add (Current solution) following design requirements
  - more detail on iothread/BQL mutex
  - mention implicit memory barriers
  - add links to current LL/SC and cmpxchg patch sets
  - add TLB flag setting as an additional requirement
v6
 - remove DRAFTING, update copyright dates
 - document current solutions to each design requirement
 - tb_lock() serialisation for codegen/patch
 - cputlb changes to defer cross-vCPU flushes
 - cputlb atomic updates for slow-path
 - BQL usage for hardware serialisation
 - cmpxchg as initial atomic/synchronisation support mechanism
v7
 - minor format fix
 - include target-mips in list of MB aware front-ends
 - mention BQL around IRQ raising
 - update with notes on _all_cpus and the wait flag
---
 docs/multi-thread-tcg.txt | 350 ++
 1 file changed, 350 insertions(+)
 create mode 100644 docs/multi-thread-tcg.txt

diff --git a/docs/multi-thread-tcg.txt b/docs/multi-thread-tcg.txt
new file mode 100644
index 00..a99b4564c6
--- /dev/null
+++ b/docs/multi-thread-tcg.txt
@@ -0,0 +1,350 @@
+Copyright (c) 2015-2016 Linaro Ltd.
+
+This work is licensed under the terms of the GNU GPL, version 2 or
+later. See the COPYING file in the top-level directory.
+
+Introduction
+
+
+This document outlines the design for multi-threaded TCG system-mode
+emulation. The current user-mode emulation mirrors the thread
+structure of the translated executable. Some of the work will be
+applicable to both system and linux-user emulation.
+
+The original system-mode TCG implementation was single threaded and
+dealt with multiple CPUs with simple round-robin scheduling. This
+simplified a lot of things but became increasingly limited as systems
+being emulated gained additional cores and per-core performance gains
+for host systems started to level off.
+
+vCPU Scheduling
+===
+
+We introduce a new running mode where each vCPU will run on its own
+user-space thread. This will be enabled by default for all FE/BE
+combinations that have had the required work done to support this
+safely.
+
+In the general case of running translated code there should be no
+inter-vCPU dependencies and all vCPUs should be able to run at full
+speed. Synchronisation will only be required while accessing internal
+shared data structures or when the emulated architecture requires a
+coherent representation of the emulated machine state.
+
+Shared Data Structures
+==
+
+Main Run Loop
+-
+
+Even when there is no code being generated there are a number of
+structures associated with the hot-path through the main run-loop.
+These are associated with looking up the next translation block to
+execute. These include:
+
+tb_jmp_cache (per-vCPU, cache of recent jumps)
+tb_ctx.htable (global hash table, phys address->tb lookup)
+
+As TB linking only occurs when blocks are in the same page this code
+is critical to performance as looking up the next TB to execute is the
+most common reason to exit the generated code.
+
+DESIGN REQUIREMENT: Make access to lookup structures safe with
+multiple reader/writer threads. Minimise any lock contention to do it.
+
+The hot-path avoids using locks where possible. The tb_jmp_cache is
+updated with atomic accesses to ensure consistent results. The fall
+back QHT based hash table is also designed for lockless lookups. Locks
+are only taken when code generation is required or TranslationBlocks
+have their block-to-block jumps patched.
+
+Global TCG State
+
+
+We need to protect the entire code generation cycle including any post
+generation patching of the translated code. This also implies a shared
+translation buffer which contains code running on all cores. Any
+execution path that comes to the main run l

[Qemu-devel] [PATCH v8 09/25] tcg: remove global exit_request

2017-01-27 Thread Alex Bennée

There are now only two uses of the global exit_request left.

The first ensures we exit the run_loop when we first start to process
pending work and in the kick handler. This is just as easily done by
setting the first_cpu->exit_request flag.

The second use is in the round robin kick routine. The global
exit_request ensured every vCPU would set its local exit_request and
cause a full exit of the loop. Now the iothread isn't being held while
running we can just rely on the kick handler to push us out as intended.

We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
cause us to exit the main loop and process any IO requests that might
come along.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v5
  - minor merge conflict with kick patch
v4
  - moved to after iothread unlocking patch
  - needed to remove kick exit_request as well.
  - remove extraneous cpu->exit_request check
  - remove stray exit_request setting
  - remove needless atomic operation
---
 cpu-exec-common.c   |  2 --
 cpu-exec.c  |  9 ++---
 cpus.c  | 18 ++
 include/exec/exec-all.h |  3 ---
 4 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index e2bc053372..0504a9457b 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -23,8 +23,6 @@
 #include "exec/exec-all.h"
 #include "exec/memory-internal.h"
 
-bool exit_request;
-
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
 {
diff --git a/cpu-exec.c b/cpu-exec.c
index f42a128bdf..cc09c1fc37 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -565,9 +565,8 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, 
TranslationBlock *tb,
 /* Something asked us to stop executing
  * chained TBs; just continue round the main
  * loop. Whatever requested the exit will also
- * have set something else (eg exit_request or
- * interrupt_request) which we will handle
- * next time around the loop.  But we need to
+ * have set something else (eg interrupt_request) which we
+ * will handle next time around the loop.  But we need to
  * ensure the tcg_exit_req read in generated code
  * comes before the next read of cpu->exit_request
  * or cpu->interrupt_request.
@@ -623,10 +622,6 @@ int cpu_exec(CPUState *cpu)
 
 rcu_read_lock();
 
-if (unlikely(atomic_mb_read(&exit_request))) {
-cpu->exit_request = 1;
-}
-
 cc->cpu_exec_enter(cpu);
 
 /* Calculate difference between guest clock and host clock.
diff --git a/cpus.c b/cpus.c
index c48bc8d5b3..18daf41dae 100644
--- a/cpus.c
+++ b/cpus.c
@@ -792,7 +792,6 @@ static inline int64_t qemu_tcg_next_kick(void)
 static void qemu_cpu_kick_rr_cpu(void)
 {
 CPUState *cpu;
-atomic_mb_set(&exit_request, 1);
 do {
 cpu = atomic_mb_read(&tcg_current_rr_cpu);
 if (cpu) {
@@ -1315,11 +1314,11 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 
 start_tcg_kick_timer();
 
-/* process any pending work */
-atomic_mb_set(&exit_request, 1);
-
 cpu = first_cpu;
 
+/* process any pending work */
+cpu->exit_request = 1;
+
 while (1) {
 /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
 qemu_account_warp_timer();
@@ -1328,7 +1327,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 cpu = first_cpu;
 }
 
-for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+while (cpu && !cpu->exit_request) {
 atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
 qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
@@ -1348,12 +1347,15 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 break;
 }
 
-} /* for cpu.. */
+cpu = CPU_NEXT(cpu);
+} /* while (cpu && !cpu->exit_request).. */
+
 /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
 atomic_set(&tcg_current_rr_cpu, NULL);
 
-/* Pairs with smp_wmb in qemu_cpu_kick.  */
-atomic_mb_set(&exit_request, 0);
+if (cpu && cpu->exit_request) {
+atomic_mb_set(&cpu->exit_request, 0);
+}
 
 handle_icount_deadline();
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 3cbd359dd7..bd4622ac5d 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -403,7 +403,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 /* vl.c */
 extern int singlestep;
 
-/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern bool exit_request;
-
 #endif
-- 
2.11.0

[Qemu-devel] [PATCH v8 05/25] tcg: add options for enabling MTTCG

2017-01-27 Thread Alex Bennée

From: KONRAD Frederic 

We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support. It will however be useful to
be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.

Signed-off-by: KONRAD Frederic 
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v1:
  - merge with add mttcg option.
  - update commit message
v2:
  - machine_init->opts_init
v3:
  - moved from -tcg to -accel tcg,thread=single|multi
  - fix checkpatch warnings
v4:
  - make mttcg_enabled extern, qemu_tcg_mttcg_enabled() now just macro
  - qemu_tcg_configure now propagates Error instead of exiting
  - better error checking of thread=foo
  - use CONFIG flags for default_mttcg_enabled()
  - disable mttcg with icount, error if both forced on
v7
  - explicitly disable MTTCG for TCG_OVERSIZED_GUEST
  - use check_tcg_memory_orders_compatible() instead of CONFIG_MTTCG_HOST
  - change CONFIG_MTTCG_TARGET to TARGET_SUPPORTS_MTTCG
v8
  - fix missing include tcg.h
  - change mismatched MOs to a warning instead of error
---
 cpus.c| 72 +++
 include/qom/cpu.h |  9 +++
 include/sysemu/cpus.h |  2 ++
 qemu-options.hx   | 20 ++
 tcg/tcg.h |  9 +++
 vl.c  | 49 ++-
 6 files changed, 160 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 71a82e5004..76b6e04332 100644
--- a/cpus.c
+++ b/cpus.c
@@ -25,6 +25,7 @@
 /* Needed early for CONFIG_BSD etc. */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/config-file.h"
 #include "cpu.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
@@ -45,6 +46,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/bitmap.h"
 #include "qemu/seqlock.h"
+#include "tcg.h"
 #include "qapi-event.h"
 #include "hw/nmi.h"
 #include "sysemu/replay.h"
@@ -150,6 +152,76 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+bool mttcg_enabled;
+
+/*
+ * We default to false if we know other options have been enabled
+ * which are currently incompatible with MTTCG. Otherwise when each
+ * guest (target) has been updated to support:
+ *   - atomic instructions
+ *   - memory ordering primitives (barriers)
+ * they can set the appropriate CONFIG flags in ${target}-softmmu.mak
+ *
+ * Once a guest architecture has been converted to the new primitives
+ * there are two remaining limitations to check.
+ *
+ * - The guest can't be oversized (e.g. 64 bit guest on 32 bit host)
+ * - The host must have a stronger memory order than the guest
+ *
+ * It may be possible in future to support strong guests on weak hosts
+ * but that will require tagging all load/stores in a guest with their
+ * implicit memory order requirements which would likely slow things
+ * down a lot.
+ */
+
+static bool check_tcg_memory_orders_compatible(void)
+{
+#if defined(TCG_DEFAULT_MO) && defined(TCG_TARGET_DEFAULT_MO)
+return (TCG_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO) == 0;
+#else
+return false;
+#endif
+}
+
+static bool default_mttcg_enabled(void)
+{
+QemuOpts *icount_opts = qemu_find_opts_singleton("icount");
+const char *rr = qemu_opt_get(icount_opts, "rr");
+
+if (rr || TCG_OVERSIZED_GUEST) {
+return false;
+} else {
+#ifdef TARGET_SUPPORTS_MTTCG
+return check_tcg_memory_orders_compatible();
+#else
+return false;
+#endif
+}
+}
+
+void qemu_tcg_configure(QemuOpts *opts, Error **errp)
+{
+const char *t = qemu_opt_get(opts, "thread");
+if (t) {
+if (strcmp(t, "multi") == 0) {
+if (TCG_OVERSIZED_GUEST) {
+error_setg(errp, "No MTTCG when guest word size > hosts");
+} else {
+if (!check_tcg_memory_orders_compatible()) {
+error_report("Guest requires stronger MO that host");
+error_printf("Results will likely be unpredictable");
+}
+mttcg_enabled = true;
+}
+} else if (strcmp(t, "single") == 0) {
+mttcg_enabled = false;
+} else {
+error_setg(errp, "Invalid 'thread' setting %s", t);
+}
+} else {
+mttcg_enabled = default_mttcg_enabled();
+}
+}
 
 int64_t cpu_get_icount_raw(void)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index ca4d0fb1b4..11db2015a4 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -412,6 +412,15 @@ extern struct CPUTailQ cpus;
 extern __thread CPUState *current_cpu;
 
 /**
+ * qemu_tcg_mttcg_enabled:
+ * Check whether we are running MultiThread TCG or not.
+ *
+ * Returns: %true if we are in MTTCG mode %false otherwise.
+ */
+extern bool mttcg_enabled;
+#define qemu_tcg_mttcg_enabled() (mttcg_enabled)
+
+/**
  * cpu_paging_enabled:
  * @cp

[Qemu-devel] [PATCH v8 07/25] tcg: rename tcg_current_cpu to tcg_current_rr_cpu

2017-01-27 Thread Alex Bennée

..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.

For the time being the setting of the global exit_request remains.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v4:
  - keep global exit_request setting for now
  - fix merge conflicts
v5:
  - merge conflicts with kick changes
---
 cpu-exec-common.c   |  1 -
 cpu-exec.c  |  3 ---
 cpus.c  | 41 ++---
 include/exec/exec-all.h |  1 -
 4 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 767d9c6f0c..e2bc053372 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -24,7 +24,6 @@
 #include "exec/memory-internal.h"
 
 bool exit_request;
-CPUState *tcg_current_cpu;
 
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
diff --git a/cpu-exec.c b/cpu-exec.c
index 1b8685dc21..f9e836c8dd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -609,7 +609,6 @@ int cpu_exec(CPUState *cpu)
 return EXCP_HALTED;
 }
 
-atomic_mb_set(&tcg_current_cpu, cpu);
 rcu_read_lock();
 
 if (unlikely(atomic_mb_read(&exit_request))) {
@@ -668,7 +667,5 @@ int cpu_exec(CPUState *cpu)
 /* fail safe : never use current_cpu outside cpu_exec() */
 current_cpu = NULL;
 
-/* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-atomic_set(&tcg_current_cpu, NULL);
 return ret;
 }
diff --git a/cpus.c b/cpus.c
index a98925105c..6d64199831 100644
--- a/cpus.c
+++ b/cpus.c
@@ -779,8 +779,7 @@ void configure_icount(QemuOpts *opts, Error **errp)
  */
 
 static QEMUTimer *tcg_kick_vcpu_timer;
-
-static void qemu_cpu_kick_no_halt(void);
+static CPUState *tcg_current_rr_cpu;
 
 #define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
 
@@ -789,10 +788,23 @@ static inline int64_t qemu_tcg_next_kick(void)
 return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
 }
 
+/* Kick the currently round-robin scheduled vCPU */
+static void qemu_cpu_kick_rr_cpu(void)
+{
+CPUState *cpu;
+atomic_mb_set(&exit_request, 1);
+do {
+cpu = atomic_mb_read(&tcg_current_rr_cpu);
+if (cpu) {
+cpu_exit(cpu);
+}
+} while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
+}
+
 static void kick_tcg_thread(void *opaque)
 {
 timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
-qemu_cpu_kick_no_halt();
+qemu_cpu_kick_rr_cpu();
 }
 
 static void start_tcg_kick_timer(void)
@@ -812,7 +824,6 @@ static void stop_tcg_kick_timer(void)
 }
 }
 
-
 /***/
 void hw_error(const char *fmt, ...)
 {
@@ -1323,6 +1334,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 
 for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
 qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
   (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1342,6 +1354,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
 }
 
 } /* for cpu.. */
+/* Does not need atomic_mb_set because a spurious wakeup is okay.  */
+atomic_set(&tcg_current_rr_cpu, NULL);
 
 /* Pairs with smp_wmb in qemu_cpu_kick.  */
 atomic_mb_set(&exit_request, 0);
@@ -1420,24 +1434,13 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #endif
 }
 
-static void qemu_cpu_kick_no_halt(void)
-{
-CPUState *cpu;
-/* Ensure whatever caused the exit has reached the CPU threads before
- * writing exit_request.
- */
-atomic_mb_set(&exit_request, 1);
-cpu = atomic_mb_read(&tcg_current_cpu);
-if (cpu) {
-cpu_exit(cpu);
-}
-}
-
 void qemu_cpu_kick(CPUState *cpu)
 {
 qemu_cond_broadcast(cpu->halt_cond);
 if (tcg_enabled()) {
-qemu_cpu_kick_no_halt();
+cpu_exit(cpu);
+/* Also ensure current RR cpu is kicked */
+qemu_cpu_kick_rr_cpu();
 } else {
 if (hax_enabled()) {
 /*
@@ -1485,7 +1488,7 @@ void qemu_mutex_lock_iothread(void)
 atomic_dec(&iothread_requesting_mutex);
 } else {
 if (qemu_mutex_trylock(&qemu_global_mutex)) {
-qemu_cpu_kick_no_halt();
+qemu_cpu_kick_rr_cpu();
 qemu_mutex_lock(&qemu_global_mutex);
 }
 atomic_dec(&iothread_requesting_mutex);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bbc9478a50..3cbd359dd7 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -404,7 +404,6

[Qemu-devel] [PATCH v8 13/25] cputlb: add assert_cpu_is_self checks

2017-01-27 Thread Alex Bennée

For SoftMMU the TLB flushes are an example of a task that can be
triggered on one vCPU by another. To deal with this properly we need to
use safe work to ensure these changes are done safely. The new assert
can be enabled while debugging to catch these cases.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 cputlb.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/cputlb.c b/cputlb.c
index 1cc9d9da51..af0e65cd2c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -58,6 +58,12 @@
 } \
 } while (0)
 
+#define assert_cpu_is_self(this_cpu) do { \
+if (DEBUG_TLB_GATE) { \
+g_assert(!cpu->created || qemu_cpu_is_self(cpu)); \
+} \
+} while (0)
+
 /* statistics */
 int tlb_flush_count;
 
@@ -70,6 +76,9 @@ void tlb_flush(CPUState *cpu)
 {
 CPUArchState *env = cpu->env_ptr;
 
+assert_cpu_is_self(cpu);
+tlb_debug("(count: %d)\n", tlb_flush_count++);
+
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -77,13 +86,13 @@ void tlb_flush(CPUState *cpu)
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
 env->tlb_flush_mask = 0;
-tlb_flush_count++;
 }
 
 static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
 {
 CPUArchState *env = cpu->env_ptr;
 
+assert_cpu_is_self(cpu);
 tlb_debug("start\n");
 
 for (;;) {
@@ -128,6 +137,7 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 int i;
 int mmu_idx;
 
+assert_cpu_is_self(cpu);
 tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
 /* Check if we need to flush due to large pages.  */
@@ -165,6 +175,7 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong 
addr, ...)
 
 va_start(argp, addr);
 
+assert_cpu_is_self(cpu);
 tlb_debug("addr "TARGET_FMT_lx"\n", addr);
 
 /* Check if we need to flush due to large pages.  */
@@ -253,6 +264,8 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, 
ram_addr_t length)
 
 int mmu_idx;
 
+assert_cpu_is_self(cpu);
+
 env = cpu->env_ptr;
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 unsigned int i;
@@ -284,6 +297,8 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 int i;
 int mmu_idx;
 
+assert_cpu_is_self(cpu);
+
 vaddr &= TARGET_PAGE_MASK;
 i = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -343,6 +358,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong 
vaddr,
 unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
 int asidx = cpu_asidx_from_attrs(cpu, attrs);
 
+assert_cpu_is_self(cpu);
 assert(size >= TARGET_PAGE_SIZE);
 if (size != TARGET_PAGE_SIZE) {
 tlb_add_large_page(env, vaddr, size);
-- 
2.11.0

[Qemu-devel] [PATCH v8 22/25] target-arm/cpu.h: make ARM_CP defined consistent

2017-01-27 Thread Alex Bennée

This is a purely mechanical change to make the ARM_CP flags neatly
align and use a consistent format so it is easier to see which bit
each flag is.

Signed-off-by: Alex Bennée 
---
 target/arm/cpu.h | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 274ef17562..f56a96c675 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1398,20 +1398,20 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
  * need to be surrounded by gen_io_start()/gen_io_end(). In particular,
  * registers which implement clocks or timers require this.
  */
-#define ARM_CP_SPECIAL 1
-#define ARM_CP_CONST 2
-#define ARM_CP_64BIT 4
-#define ARM_CP_SUPPRESS_TB_END 8
-#define ARM_CP_OVERRIDE 16
-#define ARM_CP_ALIAS 32
-#define ARM_CP_IO 64
-#define ARM_CP_NO_RAW 128
-#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
-#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
-#define ARM_CP_NZCV (ARM_CP_SPECIAL | (3 << 8))
-#define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | (4 << 8))
-#define ARM_CP_DC_ZVA (ARM_CP_SPECIAL | (5 << 8))
-#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
+#define ARM_CP_SPECIAL (1 << 0)
+#define ARM_CP_CONST   (1 << 1)
+#define ARM_CP_64BIT   (1 << 2)
+#define ARM_CP_SUPPRESS_TB_END (1 << 3)
+#define ARM_CP_OVERRIDE(1 << 4)
+#define ARM_CP_ALIAS   (1 << 5)
+#define ARM_CP_IO  (1 << 6)
+#define ARM_CP_NO_RAW  (1 << 7)
+#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
+#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
+#define ARM_CP_NZCV(ARM_CP_SPECIAL | (3 << 8))
+#define ARM_CP_CURRENTEL   (ARM_CP_SPECIAL | (4 << 8))
+#define ARM_CP_DC_ZVA  (ARM_CP_SPECIAL | (5 << 8))
+#define ARM_LAST_SPECIAL   ARM_CP_DC_ZVA
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0x
 /* Mask of only the flag bits in a type field */
-- 
2.11.0

[Qemu-devel] [PATCH v8 15/25] cputlb: introduce tlb_flush_* async work.

2017-01-27 Thread Alex Bennée

From: KONRAD Frederic 

Some architectures allow to flush the tlb of other VCPUs. This is not a problem
when we have only one thread for all VCPUs but it definitely needs to be an
asynchronous work when we are in true multithreaded work.

We take the tb_lock() when doing this to avoid racing with other threads
which may be invalidating TB's at the same time. The alternative would
be to use proper atomic primitives to clear the tlb entries en-mass.

This patch doesn't do anything to protect other cputlb function being
called in MTTCG mode making cross vCPU changes.

Signed-off-by: KONRAD Frederic 
[AJB: remove need for g_malloc on defer, make check fixes, tb_lock]
Signed-off-by: Alex Bennée 

---
v8
  - fix merge failure mentioning global flush
v6 (base patches)
  - don't use cmpxchg_bool (we drop it later anyway)
  - use RUN_ON_CPU macros instead of inlines
  - bug out of tlb_flush if !tcg_enabled() (MacOSX make check failure)
v5 (base patches)
  - take tb_lock() for memset
  - ensure tb_flush_page properly asyncs work for other vCPUs
  - use run_on_cpu_data
v4 (base_patches)
  - brought forward from arm enabling series
  - restore pending_tlb_flush flag
v1
  - Remove tlb_flush_all just do the check in tlb_flush.
  - remove the need to g_malloc
  - tlb_flush calls direct if !cpu->created
---
 cputlb.c| 66 +++--
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h   |  6 +
 3 files changed, 71 insertions(+), 2 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 94fa9977c5..5dfd3c3ba9 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -64,6 +64,10 @@
 } \
 } while (0)
 
+/* run_on_cpu_data.target_ptr should always be big enough for a
+ * target_ulong even on 32 bit builds */
+QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
+
 /* statistics */
 int tlb_flush_count;
 
@@ -72,13 +76,22 @@ int tlb_flush_count;
  * flushing more entries than required is only an efficiency issue,
  * not a correctness issue.
  */
-void tlb_flush(CPUState *cpu)
+static void tlb_flush_nocheck(CPUState *cpu)
 {
 CPUArchState *env = cpu->env_ptr;
 
+/* The QOM tests will trigger tlb_flushes without setting up TCG
+ * so we bug out here in that case.
+ */
+if (!tcg_enabled()) {
+return;
+}
+
 assert_cpu_is_self(cpu);
 tlb_debug("(count: %d)\n", tlb_flush_count++);
 
+tb_lock();
+
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -86,6 +99,27 @@ void tlb_flush(CPUState *cpu)
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
 env->tlb_flush_mask = 0;
+
+tb_unlock();
+
+atomic_mb_set(&cpu->pending_tlb_flush, false);
+}
+
+static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
+{
+tlb_flush_nocheck(cpu);
+}
+
+void tlb_flush(CPUState *cpu)
+{
+if (cpu->created && !qemu_cpu_is_self(cpu)) {
+if (atomic_cmpxchg(&cpu->pending_tlb_flush, false, true) == true) {
+async_run_on_cpu(cpu, tlb_flush_global_async_work,
+ RUN_ON_CPU_NULL);
+}
+} else {
+tlb_flush_nocheck(cpu);
+}
 }
 
 static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
@@ -95,6 +129,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
va_list argp)
 assert_cpu_is_self(cpu);
 tlb_debug("start\n");
 
+tb_lock();
+
 for (;;) {
 int mmu_idx = va_arg(argp, int);
 
@@ -109,6 +145,8 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
va_list argp)
 }
 
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+
+tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, ...)
@@ -131,13 +169,15 @@ static inline void tlb_flush_entry(CPUTLBEntry 
*tlb_entry, target_ulong addr)
 }
 }
 
-void tlb_flush_page(CPUState *cpu, target_ulong addr)
+static void tlb_flush_page_async_work(CPUState *cpu, run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
+target_ulong addr = (target_ulong) data.target_ptr;
 int i;
 int mmu_idx;
 
 assert_cpu_is_self(cpu);
+
 tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
 /* Check if we need to flush due to large pages.  */
@@ -167,6 +207,18 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 tb_flush_jmp_cache(cpu, addr);
 }
 
+void tlb_flush_page(CPUState *cpu, target_ulong addr)
+{
+tlb_debug("page :" TARGET_FMT_lx "\n", addr);
+
+if (!qemu_cpu_is_self(cpu)) {
+async_run_on_cpu(cpu, tlb_flush_page_async_work,
+ RUN_ON_CPU_TARGET_PTR(addr));
+} else {
+tlb_flush_page_async_work(cpu, RUN_ON_CPU_TARGET_PTR(addr));
+}
+}
+
 void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
 {
 CPUArchState *env = cpu->env_ptr;
@@ -213,6 +265,16 @@ void tlb_flus

[Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-27 Thread Alex Bennée

From: Jan Kiszka 

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka 
Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic 
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota 
[AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v8:
 - merged in BQL fixes for PPC target: ppc_set_irq
 - merged in BQL fixes for ARM target: ARM_CP_IO helpers
 - merged in BQL fixes for ARM target: arm_call_el_change_hook

v5 (ajb, base patches):
 - added an assert to BQL unlock/lock functions instead of hanging
 - ensure all cpu->interrupt_requests *modifications* protected by BQL
 - add a re-read on cpu->interrupt_request for correctness
 - BQL fixes for:
   - assert BQL held for PPC hypercalls (emulate_spar_hypercall)
   - SCLP service calls on s390x
 - merge conflict with kick timer patch
v4 (ajb, base patches):
 - protect cpu->interrupt updates with BQL
 - fix wording io_mem_notdirty calls
 - s/we/with/
v3 (ajb, base-patches):
  - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
  with it in the longjmp).
  - fix re-base conflicts
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
- add assert on mem region modification
- ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
v1 (ajb, base-patches):
  - SMP failure now fixed by previous commit

Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c | 20 ++--
 cpus.c | 28 +---
 cputlb.c   | 21 -
 exec.c | 12 +---
 hw/core/irq.c  |  1 +
 hw/i386/kvmvapic.c |  4 ++--
 hw/intc/arm_gicv3_cpuif.c  |  3 +++
 hw/ppc/ppc.c   | 16 +++-
 hw/ppc/spapr.c |  3 +++
 include/qom/cpu.h  |  1 +
 memory.c   |  2 ++
 qom/cpu.c  | 10 ++
 target/arm/helper.c|  6 ++
 target/arm/op_helper.c | 43 +++
 target/i386/smm_helper.c   |  7 +++
 target/s390x/misc_helper.c |  5 -
 translate-all.c|  9 +++--
 translate-common.c | 21 +++--
 18 files changed, 163 insertions(+), 49 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index f9e836c8dd..f42a128bdf 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
 && replay_interrupt()) {
 X86CPU *x86_cpu = X86_CPU(cpu);
+qemu_mutex_lock_iothread();
 apic_poll_irq(x86_cpu->apic_state);
 cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
+qemu_mutex_unlock_iothread();
 }
 #endif
 if (!cpu_has_work(cpu)) {
@@ -443,7 +446,9 @@ static inline bool cpu_handle_exception(CPUState *cpu, int 
*ret)
 #else
 if (replay_exception()) {
 CPUClass *cc = CPU_GET_CLASS(cpu);
+qemu_mutex_lock_iothread();
 cc->do_interrupt(cpu);
+qemu_mutex_unlock_iothread();
 cpu->exception_index = -1;
 } else if (!replay_has_interrupt()) {

[Qemu-devel] [PATCH v8 14/25] cputlb: tweak qemu_ram_addr_from_host_nofail reporting

2017-01-27 Thread Alex Bennée

This moves the helper function closer to where it is called and updates
the error message to report via error_report instead of the deprecated
fprintf.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 cputlb.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index af0e65cd2c..94fa9977c5 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -246,18 +246,6 @@ void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, 
uintptr_t start,
 }
 }
 
-static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
-{
-ram_addr_t ram_addr;
-
-ram_addr = qemu_ram_addr_from_host(ptr);
-if (ram_addr == RAM_ADDR_INVALID) {
-fprintf(stderr, "Bad ram pointer %p\n", ptr);
-abort();
-}
-return ram_addr;
-}
-
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
 CPUArchState *env;
@@ -469,6 +457,18 @@ static void report_bad_exec(CPUState *cpu, target_ulong 
addr)
 log_cpu_state_mask(LOG_GUEST_ERROR, cpu, CPU_DUMP_FPU | CPU_DUMP_CCOP);
 }
 
+static inline ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+ram_addr_t ram_addr;
+
+ram_addr = qemu_ram_addr_from_host(ptr);
+if (ram_addr == RAM_ADDR_INVALID) {
+error_report("Bad ram pointer %p", ptr);
+abort();
+}
+return ram_addr;
+}
+
 /* NOTE: this function can trigger an exception */
 /* NOTE2: the returned address is not exactly the physical address: it
  * is actually a ram_addr_t (in system mode; the user mode emulation
-- 
2.11.0

[Qemu-devel] [PATCH v8 13/25] cputlb: add assert_cpu_is_self checks

2017-01-27 Thread Alex Bennée

For SoftMMU the TLB flushes are an example of a task that can be
triggered on one vCPU by another. To deal with this properly we need to
use safe work to ensure these changes are done safely. The new assert
can be enabled while debugging to catch these cases.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 cputlb.c | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/cputlb.c b/cputlb.c
index 1cc9d9da51..af0e65cd2c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -58,6 +58,12 @@
 } \
 } while (0)
 
+#define assert_cpu_is_self(this_cpu) do { \
+if (DEBUG_TLB_GATE) { \
+g_assert(!cpu->created || qemu_cpu_is_self(cpu)); \
+} \
+} while (0)
+
 /* statistics */
 int tlb_flush_count;
 
@@ -70,6 +76,9 @@ void tlb_flush(CPUState *cpu)
 {
 CPUArchState *env = cpu->env_ptr;
 
+assert_cpu_is_self(cpu);
+tlb_debug("(count: %d)\n", tlb_flush_count++);
+
 memset(env->tlb_table, -1, sizeof(env->tlb_table));
 memset(env->tlb_v_table, -1, sizeof(env->tlb_v_table));
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -77,13 +86,13 @@ void tlb_flush(CPUState *cpu)
 env->vtlb_index = 0;
 env->tlb_flush_addr = -1;
 env->tlb_flush_mask = 0;
-tlb_flush_count++;
 }
 
 static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
 {
 CPUArchState *env = cpu->env_ptr;
 
+assert_cpu_is_self(cpu);
 tlb_debug("start\n");
 
 for (;;) {
@@ -128,6 +137,7 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 int i;
 int mmu_idx;
 
+assert_cpu_is_self(cpu);
 tlb_debug("page :" TARGET_FMT_lx "\n", addr);
 
 /* Check if we need to flush due to large pages.  */
@@ -165,6 +175,7 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong 
addr, ...)
 
 va_start(argp, addr);
 
+assert_cpu_is_self(cpu);
 tlb_debug("addr "TARGET_FMT_lx"\n", addr);
 
 /* Check if we need to flush due to large pages.  */
@@ -253,6 +264,8 @@ void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, 
ram_addr_t length)
 
 int mmu_idx;
 
+assert_cpu_is_self(cpu);
+
 env = cpu->env_ptr;
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 unsigned int i;
@@ -284,6 +297,8 @@ void tlb_set_dirty(CPUState *cpu, target_ulong vaddr)
 int i;
 int mmu_idx;
 
+assert_cpu_is_self(cpu);
+
 vaddr &= TARGET_PAGE_MASK;
 i = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -343,6 +358,7 @@ void tlb_set_page_with_attrs(CPUState *cpu, target_ulong 
vaddr,
 unsigned vidx = env->vtlb_index++ % CPU_VTLB_SIZE;
 int asidx = cpu_asidx_from_attrs(cpu, attrs);
 
+assert_cpu_is_self(cpu);
 assert(size >= TARGET_PAGE_SIZE);
 if (size != TARGET_PAGE_SIZE) {
 tlb_add_large_page(env, vaddr, size);
-- 
2.11.0

[Qemu-devel] [PATCH v8 12/25] tcg: handle EXCP_ATOMIC exception for system emulation

2017-01-27 Thread Alex Bennée

From: Pranith Kumar 

The patch enables handling atomic code in the guest. This should be
preferably done in cpu_handle_exception(), but the current assumptions
regarding when we can execute atomic sections cause a deadlock.

Signed-off-by: Pranith Kumar 
[AJB: tweak title]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 cpus.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/cpus.c b/cpus.c
index ecd1ec08d3..e3d9f3fe21 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1346,6 +1346,11 @@ static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
 if (r == EXCP_DEBUG) {
 cpu_handle_guest_debug(cpu);
 break;
+} else if (r == EXCP_ATOMIC) {
+qemu_mutex_unlock_iothread();
+cpu_exec_step_atomic(cpu);
+qemu_mutex_lock_iothread();
+break;
 }
 } else if (cpu->stop) {
 if (cpu->unplug) {
@@ -1456,6 +1461,10 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
  */
 g_assert(cpu->halted);
 break;
+case EXCP_ATOMIC:
+qemu_mutex_unlock_iothread();
+cpu_exec_step_atomic(cpu);
+qemu_mutex_lock_iothread();
 default:
 /* Ignore everything else? */
 break;
-- 
2.11.0

[Qemu-devel] [PATCH v8 16/25] cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap

2017-01-27 Thread Alex Bennée

While the vargs approach was flexible the original MTTCG ended up
having munge the bits to a bitmap so the data could be used in
deferred work helpers. Instead of hiding that in cputlb we push the
change to the API to make it take a bitmap of MMU indexes instead.

This change is fairly mechanical but as storing the actual index is
useful for cases like the current running context. As a result the
constants are renamed to ARMMMUBit_foo and a couple of helper
functions added to convert between a single bit and a scalar index.

Signed-off-by: Alex Bennée 
---
 cputlb.c   |  60 +---
 include/exec/exec-all.h|  13 +--
 target/arm/cpu.h   |  41 +---
 target/arm/helper.c| 227 ++---
 target/arm/translate-a64.c |  14 +--
 target/arm/translate.c |  24 +++--
 target/arm/translate.h |   4 +-
 target/sparc/ldst_helper.c |   8 +-
 8 files changed, 194 insertions(+), 197 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 5dfd3c3ba9..97e5c12de8 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -122,26 +122,25 @@ void tlb_flush(CPUState *cpu)
 }
 }
 
-static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
+static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
 CPUArchState *env = cpu->env_ptr;
+unsigned long mmu_idx_bitmask = idxmap;
+int mmu_idx;
 
 assert_cpu_is_self(cpu);
 tlb_debug("start\n");
 
 tb_lock();
 
-for (;;) {
-int mmu_idx = va_arg(argp, int);
-
-if (mmu_idx < 0) {
-break;
-}
+for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 
-tlb_debug("%d\n", mmu_idx);
+if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
+tlb_debug("%d\n", mmu_idx);
 
-memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
-memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+}
 }
 
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -149,12 +148,9 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
va_list argp)
 tb_unlock();
 }
 
-void tlb_flush_by_mmuidx(CPUState *cpu, ...)
+void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
-va_list argp;
-va_start(argp, cpu);
-v_tlb_flush_by_mmuidx(cpu, argp);
-va_end(argp);
+v_tlb_flush_by_mmuidx(cpu, idxmap);
 }
 
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
@@ -219,13 +215,11 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 }
 }
 
-void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
+void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t 
idxmap)
 {
 CPUArchState *env = cpu->env_ptr;
-int i, k;
-va_list argp;
-
-va_start(argp, addr);
+unsigned long mmu_idx_bitmap = idxmap;
+int i, page, mmu_idx;
 
 assert_cpu_is_self(cpu);
 tlb_debug("addr "TARGET_FMT_lx"\n", addr);
@@ -236,31 +230,23 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong 
addr, ...)
   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
   env->tlb_flush_addr, env->tlb_flush_mask);
 
-v_tlb_flush_by_mmuidx(cpu, argp);
-va_end(argp);
+v_tlb_flush_by_mmuidx(cpu, idxmap);
 return;
 }
 
 addr &= TARGET_PAGE_MASK;
-i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-
-for (;;) {
-int mmu_idx = va_arg(argp, int);
+page = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
 
-if (mmu_idx < 0) {
-break;
-}
-
-tlb_debug("idx %d\n", mmu_idx);
-
-tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
+for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
+tlb_flush_entry(&env->tlb_table[mmu_idx][page], addr);
 
-/* check whether there are vltb entries that need to be flushed */
-for (k = 0; k < CPU_VTLB_SIZE; k++) {
-tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
+/* check whether there are vltb entries that need to be flushed */
+for (i = 0; i < CPU_VTLB_SIZE; i++) {
+tlb_flush_entry(&env->tlb_v_table[mmu_idx][i], addr);
+}
 }
 }
-va_end(argp);
 
 tb_flush_jmp_cache(cpu, addr);
 }
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e43cb68355..a6c17ed74a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -106,21 +106,22 @@ void tlb_flush(CPUState *cpu);
  * tlb_flush_page_by_mmuidx:
  * @cpu: CPU whose TLB should be flushed
  * @addr: virtual address of page to be flushed
- * @...: list of MMU indexes to flush, terminated by a negative value
+ * @idxmap: bitmap of MMU indexes to flush
  *
  * Flush one page from the TLB o

[Qemu-devel] [PATCH v8 23/25] target-arm: introduce ARM_CP_EXIT_PC

2017-01-27 Thread Alex Bennée

Some helpers may trigger an immediate exit of the cpu_loop. If this
happens the PC need to be rectified to ensure the restart will begin
on the next instruction.

Signed-off-by: Alex Bennée 
---
 target/arm/cpu.h   | 3 ++-
 target/arm/translate-a64.c | 4 
 target/arm/translate.c | 4 
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f56a96c675..1b0670ae11 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1411,7 +1411,8 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
 #define ARM_CP_NZCV(ARM_CP_SPECIAL | (3 << 8))
 #define ARM_CP_CURRENTEL   (ARM_CP_SPECIAL | (4 << 8))
 #define ARM_CP_DC_ZVA  (ARM_CP_SPECIAL | (5 << 8))
-#define ARM_LAST_SPECIAL   ARM_CP_DC_ZVA
+#define ARM_CP_EXIT_PC (ARM_CP_SPECIAL | (6 << 8))
+#define ARM_LAST_SPECIAL   ARM_CP_EXIT_PC
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0x
 /* Mask of only the flag bits in a type field */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 05162f335e..a3f37d8bec 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1561,6 +1561,10 @@ static void handle_sys(DisasContext *s, uint32_t insn, 
bool isread,
 tcg_rt = cpu_reg(s, rt);
 gen_helper_dc_zva(cpu_env, tcg_rt);
 return;
+case ARM_CP_EXIT_PC:
+/* The helper may exit the cpu_loop so ensure PC is correct */
+gen_a64_set_pc_im(s->pc);
+break;
 default:
 break;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 444a24c2b6..7bd18cd25d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7508,6 +7508,10 @@ static int disas_coproc_insn(DisasContext *s, uint32_t 
insn)
 gen_set_pc_im(s, s->pc);
 s->is_jmp = DISAS_WFI;
 return 0;
+case ARM_CP_EXIT_PC:
+/* The helper may exit the cpu_loop so ensure PC is correct */
+gen_set_pc_im(s, s->pc);
+break;
 default:
 break;
 }
-- 
2.11.0

[Qemu-devel] [PATCH v8 25/25] tcg: enable MTTCG by default for ARM on x86 hosts

2017-01-27 Thread Alex Bennée

This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:

  - The ARM translate.c/translate-64.c have been converted to
- use MTTCG safe atomic primitives
- emit the appropriate barrier ops
  - The ARM machine has been updated to
- hold the BQL when modifying shared cross-vCPU state
- defer cpu_reset to async safe work

All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.

Signed-off-by: Alex Bennée 

---
v7
  - drop configure check for backend
  - declare backend memory order for x86
  - declare guest memory order for ARM
  - add configure snippet to set TARGET_SUPPORTS_MTTCG
---
 configure |  6 ++
 target/arm/cpu.h  |  3 +++
 tcg/i386/tcg-target.h | 16 
 3 files changed, 25 insertions(+)

diff --git a/configure b/configure
index 86fd833feb..9f2a665f5b 100755
--- a/configure
+++ b/configure
@@ -5879,6 +5879,7 @@ mkdir -p $target_dir
 echo "# Automatically generated by configure - do not modify" > 
$config_target_mak
 
 bflt="no"
+mttcg="no"
 interp_prefix1=$(echo "$interp_prefix" | sed "s/%M/$target_name/g")
 gdb_xml_files=""
 
@@ -5897,11 +5898,13 @@ case "$target_name" in
   arm|armeb)
 TARGET_ARCH=arm
 bflt="yes"
+mttcg="yes"
 gdb_xml_files="arm-core.xml arm-vfp.xml arm-vfp3.xml arm-neon.xml"
   ;;
   aarch64)
 TARGET_BASE_ARCH=arm
 bflt="yes"
+mttcg="yes"
 gdb_xml_files="aarch64-core.xml aarch64-fpu.xml arm-core.xml arm-vfp.xml 
arm-vfp3.xml arm-neon.xml"
   ;;
   cris)
@@ -6066,6 +6069,9 @@ if test "$target_bigendian" = "yes" ; then
 fi
 if test "$target_softmmu" = "yes" ; then
   echo "CONFIG_SOFTMMU=y" >> $config_target_mak
+  if test "$mttcg" = "yes" ; then
+echo "TARGET_SUPPORTS_MTTCG=y" >> $config_target_mak
+  fi
 fi
 if test "$target_user_only" = "yes" ; then
   echo "CONFIG_USER_ONLY=y" >> $config_target_mak
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 1b0670ae11..47a42ec6d6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -29,6 +29,9 @@
 #  define TARGET_LONG_BITS 32
 #endif
 
+/* ARM processors have a weak memory model */
+#define TCG_DEFAULT_MO  (0)
+
 #define CPUArchState struct CPUARMState
 
 #include "qemu-common.h"
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 21d96ec35c..536190f647 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -165,4 +165,20 @@ static inline void flush_icache_range(uintptr_t start, 
uintptr_t stop)
 {
 }
 
+/* This defines the natural memory order supported by this
+ * architecture before guarantees made by various barrier
+ * instructions.
+ *
+ * The x86 has a pretty strong memory ordering which only really
+ * allows for some stores to be re-ordered after loads.
+ */
+#include "tcg-mo.h"
+
+static inline int get_tcg_target_mo(void)
+{
+return TCG_MO_ALL & ~TCG_MO_LD_ST;
+}
+
+#define TCG_TARGET_DEFAULT_MO get_tcg_target_mo()
+
 #endif
-- 
2.11.0

[Qemu-devel] [PATCH v8 11/25] tcg: enable thread-per-vCPU

2017-01-27 Thread Alex Bennée

There are a couple of changes that occur at the same time here:

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

  One of these is spawned per vCPU with its own Thread and Condition
  variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
  single threaded function.

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
vCPU threads. This is for future work where async jobs need to know
the vCPU context they are operating in.

The user to switch on multi-thread behaviour and spawn a thread
per-vCPU. For a simple test kvm-unit-test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

Will now use 4 vCPU threads and have an expected FAIL (instead of the
unexpected PASS) as the default mode of the test has no protection when
incrementing a shared variable.

We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the front and backends. As each back end
and front end is updated they can add CONFIG_MTTCG_TARGET and
CONFIG_MTTCG_HOST to their respective make configurations so
default_mttcg_enabled does the right thing.

Signed-off-by: KONRAD Frederic 
Signed-off-by: Paolo Bonzini 
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
v2
  - re-base fixes (no longer has tb_find_fast lock tweak ahead)
  - remove bogus break condition on cpu->stop/stopped
  - only process exiting cpus exit_request
  - handle all cpus idle case (fixes shutdown issues)
  - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
  - move icount timer into helper
v3
  - update the commit message
  - rm kick_timer tweaks (move to earlier tcg_current_cpu tweaks)
  - ensure linux-user clears cpu->exit_request in loop
  - purging of global exit_request and tcg_current_cpu in earlier patches
  - fix checkpatch warnings
v4
  - don't break loop on stopped, we may never schedule next in RR mode
  - make sure we flush iorequests of current cpu if we exited on one
  - add tcg_cpu_exec_start/end wraps for async work functions
  - stop killing of current_cpu on loop exit
  - set current_cpu in the single thread function
  - remove sleep special case, add qemu_tcg_should_sleep() for mttcg
  - no need to atomic set cpu->exit_request going into the loop
  - removed extraneous setting of exit_request
  - split tb_lock() part of patch
  - rename single thread fn to qemu_tcg_rr_cpu_thread_fn
v5
  - enable parallel_cpus for MTTCG (for barriers/atomics)
  - expand on CONFIG_ flags in commit message
v7
  - move parallel_cpus down into the mttcg leg
  - minor ws merge fix
---
 cpu-exec.c |   5 ---
 cpus.c | 134 +++--
 2 files changed, 103 insertions(+), 36 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index cc09c1fc37..ef328087be 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -396,7 +396,6 @@ static inline bool cpu_handle_halt(CPUState *cpu)
 }
 #endif
 if (!cpu_has_work(cpu)) {
-current_cpu = NULL;
 return true;
 }
 
@@ -540,7 +539,6 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 
 
 if (unlikely(atomic_read(&cpu->exit_request) || replay_has_interrupt())) {
-atomic_set(&cpu->exit_request, 0);
 cpu->exception_index = EXCP_INTERRUPT;
 cpu_loop_exit(cpu);
 }
@@ -675,8 +673,5 @@ int cpu_exec(CPUState *cpu)
 cc->cpu_exec_exit(cpu);
 rcu_read_unlock();
 
-/* fail safe : never use current_cpu outside cpu_exec() */
-current_cpu = NULL;
-
 return ret;
 }
diff --git a/cpus.c b/cpus.c
index 18daf41dae..ecd1ec08d3 100644
--- a/cpus.c
+++ b/cpus.c
@@ -808,7 +808,7 @@ static void kick_tcg_thread(void *opaque)
 
 static void start_tcg_kick_timer(void)
 {
-if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+if (!mttcg_enabled && !tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
 tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
kick_tcg_thread, NULL);
 timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
@@ -1062,27 +1062,34 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+atomic_mb_set(&cpu->thread_kicked, false);
 if (cpu->stop) {
 cpu->stop = false;
 cpu->stopped = true;
 qemu_cond_broadcast(&qemu_pause_cond);
 }
 process_queued_cpu_work(cpu);
-cpu->thread_kicked = false;
+}
+
+static bool qemu_tcg_should_sleep(CPUState *cpu)
+{
+if (mttcg_enabled) {
+return cpu_thread_is_idle(cpu);
+} else {
+return all_cpu_threads_idle();
+}
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-while (all_cpu_threads_idle()) {
+while (qemu_tcg_should_sleep(cpu)) {
 stop_tcg_kick_timer();
 qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);

[Qemu-devel] [PATCH v8 20/25] target-arm/powerctl: defer cpu reset work to CPU context

2017-01-27 Thread Alex Bennée

When switching a new vCPU on we want to complete a bunch of the setup
work before we start scheduling the vCPU thread. To do this cleanly we
defer vCPU setup to async work which will run the vCPUs execution
context as the thread is woken up. The scheduling of the work will kick
the vCPU awake.

This avoids potential races in MTTCG system emulation.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 

---
v7
  - add const to static mode_for_el[] array
  - fix checkpatch long lines
---
 target/arm/arm-powerctl.c | 146 --
 1 file changed, 88 insertions(+), 58 deletions(-)

diff --git a/target/arm/arm-powerctl.c b/target/arm/arm-powerctl.c
index fbb7a15daa..082788e3a4 100644
--- a/target/arm/arm-powerctl.c
+++ b/target/arm/arm-powerctl.c
@@ -48,11 +48,87 @@ CPUState *arm_get_cpu_by_id(uint64_t id)
 return NULL;
 }
 
+struct cpu_on_info {
+uint64_t entry;
+uint64_t context_id;
+uint32_t target_el;
+bool target_aa64;
+};
+
+
+static void arm_set_cpu_on_async_work(CPUState *target_cpu_state,
+  run_on_cpu_data data)
+{
+ARMCPU *target_cpu = ARM_CPU(target_cpu_state);
+struct cpu_on_info *info = (struct cpu_on_info *) data.host_ptr;
+
+/* Initialize the cpu we are turning on */
+cpu_reset(target_cpu_state);
+target_cpu->powered_off = false;
+target_cpu_state->halted = 0;
+
+if (info->target_aa64) {
+if ((info->target_el < 3) && arm_feature(&target_cpu->env,
+ ARM_FEATURE_EL3)) {
+/*
+ * As target mode is AArch64, we need to set lower
+ * exception level (the requested level 2) to AArch64
+ */
+target_cpu->env.cp15.scr_el3 |= SCR_RW;
+}
+
+if ((info->target_el < 2) && arm_feature(&target_cpu->env,
+ ARM_FEATURE_EL2)) {
+/*
+ * As target mode is AArch64, we need to set lower
+ * exception level (the requested level 1) to AArch64
+ */
+target_cpu->env.cp15.hcr_el2 |= HCR_RW;
+}
+
+target_cpu->env.pstate = aarch64_pstate_mode(info->target_el, true);
+} else {
+/* We are requested to boot in AArch32 mode */
+static const uint32_t mode_for_el[] = { 0,
+ARM_CPU_MODE_SVC,
+ARM_CPU_MODE_HYP,
+ARM_CPU_MODE_SVC };
+
+cpsr_write(&target_cpu->env, mode_for_el[info->target_el], CPSR_M,
+   CPSRWriteRaw);
+}
+
+if (info->target_el == 3) {
+/* Processor is in secure mode */
+target_cpu->env.cp15.scr_el3 &= ~SCR_NS;
+} else {
+/* Processor is not in secure mode */
+target_cpu->env.cp15.scr_el3 |= SCR_NS;
+}
+
+/* We check if the started CPU is now at the correct level */
+assert(info->target_el == arm_current_el(&target_cpu->env));
+
+if (info->target_aa64) {
+target_cpu->env.xregs[0] = info->context_id;
+target_cpu->env.thumb = false;
+} else {
+target_cpu->env.regs[0] = info->context_id;
+target_cpu->env.thumb = info->entry & 1;
+info->entry &= 0xfffe;
+}
+
+/* Start the new CPU at the requested address */
+cpu_set_pc(target_cpu_state, info->entry);
+g_free(info);
+}
+
 int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, uint64_t context_id,
uint32_t target_el, bool target_aa64)
 {
 CPUState *target_cpu_state;
 ARMCPU *target_cpu;
+struct cpu_on_info *info;
 
 DPRINTF("cpu %" PRId64 " (EL %d, %s) @ 0x%" PRIx64 " with R0 = 0x%" PRIx64
 "\n", cpuid, target_el, target_aa64 ? "aarch64" : "aarch32", entry,
@@ -109,64 +185,18 @@ int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, 
uint64_t context_id,
 return QEMU_ARM_POWERCTL_INVALID_PARAM;
 }
 
-/* Initialize the cpu we are turning on */
-cpu_reset(target_cpu_state);
-target_cpu->powered_off = false;
-target_cpu_state->halted = 0;
-
-if (target_aa64) {
-if ((target_el < 3) && arm_feature(&target_cpu->env, ARM_FEATURE_EL3)) 
{
-/*
- * As target mode is AArch64, we need to set lower
- * exception level (the requested level 2) to AArch64
- */
-target_cpu->env.cp15.scr_el3 |= SCR_RW;
-}
-
-if ((target_el < 2) && arm_feature(&target_cpu->env, ARM_FEATURE_EL2)) 
{
-/*
- * As target mode is AArch64, we need to set lower
- * exception level (the requested level 1) to AArch64
- */
-target_cpu->env.cp15.hcr_el2 |= HCR_RW;
-}
-
-target_cpu->env.pstate = aarch64_pstate_mode(target_el, true);
-} else {
-/* We are requested to boot in AArch32 mode */
-

Re: [Qemu-devel] [PATCH v2 8/8] hw: Drop superfluous special checks for orphaned -drive

2017-01-27 Thread John Snow



On 01/26/2017 10:09 AM, Markus Armbruster wrote:
> We've traditionally rejected orphans here and there, but not
> systematically.  For instance, the sun4m machines have an onboard SCSI
> HBA (bus=0), and have always rejected bus>0.  Other machines with an
> onboard SCSI HBA don't.
> 
> Commit a66c9dc made all orphans trigger a warning, and the previous
> commit turned this into an error.  The checks "here and there" are now
> redundant.  Drop them.
> 
> Note that the one in mips_jazz.c was wrong: it rejected bus > MAX_FD,
> but MAX_FD is the number of floppy drives per bus.
> 
> Error messages change from
> 
> $ qemu-system-x86_64 -drive if=ide,bus=2
> qemu-system-x86_64: Too many IDE buses defined (3 > 2)
> $ qemu-system-mips64 -M magnum,accel=qtest -drive if=floppy,bus=2,id=fd1
> qemu: too many floppy drives
> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
> qemu: too many SCSI bus
> 
> to
> 
> $ qemu-system-x86_64 -drive if=ide,bus=2
> qemu-system-x86_64: -drive if=ide,bus=2: machine type does not support 
> this drive
> $ qemu-system-mips64 -M magnum,accel=qtest -drive if=floppy,bus=2,id=fd1
> qemu-system-mips64: -drive if=floppy,bus=2,id=fd1: machine type does not 
> support this drive
> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
> qemu-system-sparc: -drive if=scsi,bus=1: machine type does not support 
> this drive
> 

Hm, that's a lot less helpful, isn't it? Can we augment with hints?

[Qemu-devel] [PATCH v8 22/25] target-arm/cpu.h: make ARM_CP defined consistent

2017-01-27 Thread Alex Bennée

This is a purely mechanical change to make the ARM_CP flags neatly
align and use a consistent format so it is easier to see which bit
each flag is.

Signed-off-by: Alex Bennée 
---
 target/arm/cpu.h | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 274ef17562..f56a96c675 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1398,20 +1398,20 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
  * need to be surrounded by gen_io_start()/gen_io_end(). In particular,
  * registers which implement clocks or timers require this.
  */
-#define ARM_CP_SPECIAL 1
-#define ARM_CP_CONST 2
-#define ARM_CP_64BIT 4
-#define ARM_CP_SUPPRESS_TB_END 8
-#define ARM_CP_OVERRIDE 16
-#define ARM_CP_ALIAS 32
-#define ARM_CP_IO 64
-#define ARM_CP_NO_RAW 128
-#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
-#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
-#define ARM_CP_NZCV (ARM_CP_SPECIAL | (3 << 8))
-#define ARM_CP_CURRENTEL (ARM_CP_SPECIAL | (4 << 8))
-#define ARM_CP_DC_ZVA (ARM_CP_SPECIAL | (5 << 8))
-#define ARM_LAST_SPECIAL ARM_CP_DC_ZVA
+#define ARM_CP_SPECIAL (1 << 0)
+#define ARM_CP_CONST   (1 << 1)
+#define ARM_CP_64BIT   (1 << 2)
+#define ARM_CP_SUPPRESS_TB_END (1 << 3)
+#define ARM_CP_OVERRIDE(1 << 4)
+#define ARM_CP_ALIAS   (1 << 5)
+#define ARM_CP_IO  (1 << 6)
+#define ARM_CP_NO_RAW  (1 << 7)
+#define ARM_CP_NOP (ARM_CP_SPECIAL | (1 << 8))
+#define ARM_CP_WFI (ARM_CP_SPECIAL | (2 << 8))
+#define ARM_CP_NZCV(ARM_CP_SPECIAL | (3 << 8))
+#define ARM_CP_CURRENTEL   (ARM_CP_SPECIAL | (4 << 8))
+#define ARM_CP_DC_ZVA  (ARM_CP_SPECIAL | (5 << 8))
+#define ARM_LAST_SPECIAL   ARM_CP_DC_ZVA
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0x
 /* Mask of only the flag bits in a type field */
-- 
2.11.0

[Qemu-devel] [PATCH v8 18/25] cputlb: atomically update tlb fields used by tlb_reset_dirty

2017-01-27 Thread Alex Bennée

The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code otherwise a race could be introduced.

To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.

On 32 bit systems attempting to emulate 64 bit guests we don't even
bother as we might not have the atomic primitives available. MTTCG is
disabled in this case and can't be forced on. The copy_tlb_helper
function helps keep the atomic semantics in one place to avoid
confusion.

The dirty helper function is made static as it isn't used outside of
cputlb.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v6
  - use TARGET_PAGE_BITS_MIN
  - use run_on_cpu helpers
v7
  - fix tlb_debug fmt for 32bit build
  - un-merged the mmuidx async work which got mashed in last round
  - introduced copy_tlb_helper function and made TCG_OVERSIZED_GUEST aware
---
 cputlb.c  | 120 +++---
 include/exec/cputlb.h |   2 -
 2 files changed, 95 insertions(+), 27 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index c50254be26..65003350e3 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -342,32 +342,90 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
 cpu_physical_memory_set_dirty_flag(ram_addr, DIRTY_MEMORY_CODE);
 }
 
-static bool tlb_is_dirty_ram(CPUTLBEntry *tlbe)
-{
-return (tlbe->addr_write & (TLB_INVALID_MASK|TLB_MMIO|TLB_NOTDIRTY)) == 0;
-}
 
-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
+/*
+ * Dirty write flag handling
+ *
+ * When the TCG code writes to a location it looks up the address in
+ * the TLB and uses that data to compute the final address. If any of
+ * the lower bits of the address are set then the slow path is forced.
+ * There are a number of reasons to do this but for normal RAM the
+ * most usual is detecting writes to code regions which may invalidate
+ * generated code.
+ *
+ * Because we want other vCPUs to respond to changes straight away we
+ * update the te->addr_write field atomically. If the TLB entry has
+ * been changed by the vCPU in the mean time we skip the update.
+ *
+ * As this function uses atomic accesses we also need to ensure
+ * updates to tlb_entries follow the same access rules. We don't need
+ * to worry about this for oversized guests as MTTCG is disabled for
+ * them.
+ */
+
+static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
uintptr_t length)
 {
-uintptr_t addr;
+#if TCG_OVERSIZED_GUEST
+uintptr_t addr = tlb_entry->addr_write;
 
-if (tlb_is_dirty_ram(tlb_entry)) {
-addr = (tlb_entry->addr_write & TARGET_PAGE_MASK) + tlb_entry->addend;
+if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+addr &= TARGET_PAGE_MASK;
+addr += tlb_entry->addend;
 if ((addr - start) < length) {
 tlb_entry->addr_write |= TLB_NOTDIRTY;
 }
 }
+#else
+/* paired with atomic_mb_set in tlb_set_page_with_attrs */
+uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
+uintptr_t addr = orig_addr;
+
+if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+addr &= TARGET_PAGE_MASK;
+addr += atomic_read(&tlb_entry->addend);
+if ((addr - start) < length) {
+uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
+atomic_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr);
+}
+}
+#endif
+}
+
+/* For atomic correctness when running MTTCG we need to use the right
+ * primitives when copying entries */
+static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
+   bool atomic_set)
+{
+#if TCG_OVERSIZED_GUEST
+*d = *s;
+#else
+if (atomic_set) {
+d->addr_read = s->addr_read;
+d->addr_code = s->addr_code;
+atomic_set(&d->addend, atomic_read(&s->addend));
+/* Pairs with flag setting in tlb_reset_dirty_range */
+atomic_mb_set(&d->addr_write, atomic_read(&s->addr_write));
+} else {
+d->addr_read = s->addr_read;
+d->addr_write = atomic_read(&s->addr_write);
+d->addr_code = s->addr_code;
+d->addend = atomic_read(&s->addend);
+}
+#endif
 }
 
+/* This is a cross vCPU call (i.e. another vCPU resetting the flags of
+ * the target vCPU). As such care needs to be taken that we don't
+ * dangerously race with another vCPU update. The only thing actually
+ * updated is the target TLB entry ->addr_write flags.
+ */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
 CPUArchState *env;
 
 int mmu_idx;
 
-assert_cpu_is_self(cpu);
-

[Qemu-devel] [PATCH v8 23/25] target-arm: introduce ARM_CP_EXIT_PC

2017-01-27 Thread Alex Bennée

Some helpers may trigger an immediate exit of the cpu_loop. If this
happens the PC need to be rectified to ensure the restart will begin
on the next instruction.

Signed-off-by: Alex Bennée 
---
 target/arm/cpu.h   | 3 ++-
 target/arm/translate-a64.c | 4 
 target/arm/translate.c | 4 
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f56a96c675..1b0670ae11 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1411,7 +1411,8 @@ static inline uint64_t cpreg_to_kvm_id(uint32_t cpregid)
 #define ARM_CP_NZCV(ARM_CP_SPECIAL | (3 << 8))
 #define ARM_CP_CURRENTEL   (ARM_CP_SPECIAL | (4 << 8))
 #define ARM_CP_DC_ZVA  (ARM_CP_SPECIAL | (5 << 8))
-#define ARM_LAST_SPECIAL   ARM_CP_DC_ZVA
+#define ARM_CP_EXIT_PC (ARM_CP_SPECIAL | (6 << 8))
+#define ARM_LAST_SPECIAL   ARM_CP_EXIT_PC
 /* Used only as a terminator for ARMCPRegInfo lists */
 #define ARM_CP_SENTINEL 0x
 /* Mask of only the flag bits in a type field */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 05162f335e..a3f37d8bec 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1561,6 +1561,10 @@ static void handle_sys(DisasContext *s, uint32_t insn, 
bool isread,
 tcg_rt = cpu_reg(s, rt);
 gen_helper_dc_zva(cpu_env, tcg_rt);
 return;
+case ARM_CP_EXIT_PC:
+/* The helper may exit the cpu_loop so ensure PC is correct */
+gen_a64_set_pc_im(s->pc);
+break;
 default:
 break;
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 444a24c2b6..7bd18cd25d 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7508,6 +7508,10 @@ static int disas_coproc_insn(DisasContext *s, uint32_t 
insn)
 gen_set_pc_im(s, s->pc);
 s->is_jmp = DISAS_WFI;
 return 0;
+case ARM_CP_EXIT_PC:
+/* The helper may exit the cpu_loop so ensure PC is correct */
+gen_set_pc_im(s, s->pc);
+break;
 default:
 break;
 }
-- 
2.11.0

[Qemu-devel] [PATCH v8 21/25] target-arm: don't generate WFE/YIELD calls for MTTCG

2017-01-27 Thread Alex Bennée

The WFE and YIELD instructions are really only hints and in TCG's case
they were useful to move the scheduling on from one vCPU to the next. In
the parallel context (MTTCG) this just causes an unnecessary cpu_exit
and contention of the BQL.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 target/arm/op_helper.c |  7 +++
 target/arm/translate-a64.c |  8 ++--
 target/arm/translate.c | 20 
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index e1a883c595..abfa7cdd39 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -436,6 +436,13 @@ void HELPER(yield)(CPUARMState *env)
 ARMCPU *cpu = arm_env_get_cpu(env);
 CPUState *cs = CPU(cpu);
 
+/* When running in MTTCG we don't generate jumps to the yield and
+ * WFE helpers as it won't affect the scheduling of other vCPUs.
+ * If we wanted to more completely model WFE/SEV so we don't busy
+ * spin unnecessarily we would need to do something more involved.
+ */
+g_assert(!parallel_cpus);
+
 /* This is a non-trappable hint instruction that generally indicates
  * that the guest is currently busy-looping. Yield control back to the
  * top level loop so that a more deserving VCPU has a chance to run.
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 88a4df6959..05162f335e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1342,10 +1342,14 @@ static void handle_hint(DisasContext *s, uint32_t insn,
 s->is_jmp = DISAS_WFI;
 return;
 case 1: /* YIELD */
-s->is_jmp = DISAS_YIELD;
+if (!parallel_cpus) {
+s->is_jmp = DISAS_YIELD;
+}
 return;
 case 2: /* WFE */
-s->is_jmp = DISAS_WFE;
+if (!parallel_cpus) {
+s->is_jmp = DISAS_WFE;
+}
 return;
 case 4: /* SEV */
 case 5: /* SEVL */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index dc67887918..444a24c2b6 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4343,20 +4343,32 @@ static void gen_exception_return(DisasContext *s, 
TCGv_i32 pc)
 gen_rfe(s, pc, load_cpu_field(spsr));
 }
 
+/*
+ * For WFI we will halt the vCPU until an IRQ. For WFE and YIELD we
+ * only call the helper when running single threaded TCG code to ensure
+ * the next round-robin scheduled vCPU gets a crack. In MTTCG mode we
+ * just skip this instruction. Currently the SEV/SEVL instructions
+ * which are *one* of many ways to wake the CPU from WFE are not
+ * implemented so we can't sleep like WFI does.
+ */
 static void gen_nop_hint(DisasContext *s, int val)
 {
 switch (val) {
 case 1: /* yield */
-gen_set_pc_im(s, s->pc);
-s->is_jmp = DISAS_YIELD;
+if (!parallel_cpus) {
+gen_set_pc_im(s, s->pc);
+s->is_jmp = DISAS_YIELD;
+}
 break;
 case 3: /* wfi */
 gen_set_pc_im(s, s->pc);
 s->is_jmp = DISAS_WFI;
 break;
 case 2: /* wfe */
-gen_set_pc_im(s, s->pc);
-s->is_jmp = DISAS_WFE;
+if (!parallel_cpus) {
+gen_set_pc_im(s, s->pc);
+s->is_jmp = DISAS_WFE;
+}
 break;
 case 4: /* sev */
 case 5: /* sevl */
-- 
2.11.0

[Qemu-devel] [PATCH v8 17/25] cputlb: add tlb_flush_by_mmuidx async routines

2017-01-27 Thread Alex Bennée

This converts the remaining TLB flush routines to use async work when
detecting a cross-vCPU flush. The only minor complication is having to
serialise the var_list of MMU indexes into a form that can be punted
to an asynchronous job.

The pending_tlb_flush field on QOM's CPU structure also becomes a
bitfield rather than a boolean.

Signed-off-by: Alex Bennée 

---
v7
  - un-merged from the atomic cputlb patch in the last series
  - fix long line reported by checkpatch
v8
  - re-base merge/fixes
---
 cputlb.c  | 110 +++---
 include/qom/cpu.h |   2 +-
 2 files changed, 89 insertions(+), 23 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 97e5c12de8..c50254be26 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -68,6 +68,11 @@
  * target_ulong even on 32 bit builds */
 QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
 
+/* We currently can't handle more than 16 bits in the MMUIDX bitmask.
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
+#define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
+
 /* statistics */
 int tlb_flush_count;
 
@@ -102,7 +107,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
 tb_unlock();
 
-atomic_mb_set(&cpu->pending_tlb_flush, false);
+atomic_mb_set(&cpu->pending_tlb_flush, 0);
 }
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
@@ -113,7 +118,8 @@ static void tlb_flush_global_async_work(CPUState *cpu, 
run_on_cpu_data data)
 void tlb_flush(CPUState *cpu)
 {
 if (cpu->created && !qemu_cpu_is_self(cpu)) {
-if (atomic_cmpxchg(&cpu->pending_tlb_flush, false, true) == true) {
+if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
+atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);
 async_run_on_cpu(cpu, tlb_flush_global_async_work,
  RUN_ON_CPU_NULL);
 }
@@ -122,17 +128,18 @@ void tlb_flush(CPUState *cpu)
 }
 }
 
-static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
+static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
-unsigned long mmu_idx_bitmask = idxmap;
+unsigned long mmu_idx_bitmask = data.host_int;
 int mmu_idx;
 
 assert_cpu_is_self(cpu);
-tlb_debug("start\n");
 
 tb_lock();
 
+tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
+
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 
 if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
@@ -145,12 +152,30 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
uint16_t idxmap)
 
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
 
+tlb_debug("done\n");
+
 tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
-v_tlb_flush_by_mmuidx(cpu, idxmap);
+tlb_debug("mmu_idx: 0x%" PRIx16 "\n", idxmap);
+
+if (!qemu_cpu_is_self(cpu)) {
+uint16_t pending_flushes = idxmap;
+pending_flushes &= ~atomic_mb_read(&cpu->pending_tlb_flush);
+
+if (pending_flushes) {
+tlb_debug("reduced mmu_idx: 0x%" PRIx16 "\n", pending_flushes);
+
+atomic_or(&cpu->pending_tlb_flush, pending_flushes);
+async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
+ RUN_ON_CPU_HOST_INT(pending_flushes));
+}
+} else {
+tlb_flush_by_mmuidx_async_work(cpu,
+   RUN_ON_CPU_HOST_INT(idxmap));
+}
 }
 
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
@@ -215,27 +240,26 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 }
 }
 
-void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t 
idxmap)
+/* As we are going to hijack the bottom bits of the page address for a
+ * mmuidx bit mask we need to fail to build if we can't do that
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > TARGET_PAGE_BITS_MIN);
+
+static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
+run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
-unsigned long mmu_idx_bitmap = idxmap;
-int i, page, mmu_idx;
+target_ulong addr_and_mmuidx = (target_ulong) data.target_ptr;
+target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
+unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
+int page = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+int mmu_idx;
+int i;
 
 assert_cpu_is_self(cpu);
-tlb_debug("addr "TARGET_FMT_lx"\n", addr);
-
-/* Check if we need to flush due to large pages.  */
-if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-tlb_debug("forced full flush ("
-  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-  env->tlb_flush_addr, env->tlb_flush_mask);
-
-v_tlb_flush_by_mmuidx(cpu, idxmap);
-return;
-}
 
-addr &= TARGET_PAGE_MASK;
-page = (add

[Qemu-devel] [PATCH v8 25/25] tcg: enable MTTCG by default for ARM on x86 hosts

2017-01-27 Thread Alex Bennée

This enables the multi-threaded system emulation by default for ARMv7
and ARMv8 guests using the x86_64 TCG backend. This is because on the
guest side:

  - The ARM translate.c/translate-64.c have been converted to
- use MTTCG safe atomic primitives
- emit the appropriate barrier ops
  - The ARM machine has been updated to
- hold the BQL when modifying shared cross-vCPU state
- defer cpu_reset to async safe work

All the host backends support the barrier and atomic primitives but
need to provide same-or-better support for normal load/store
operations.

Signed-off-by: Alex Bennée 

---
v7
  - drop configure check for backend
  - declare backend memory order for x86
  - declare guest memory order for ARM
  - add configure snippet to set TARGET_SUPPORTS_MTTCG
---
 configure |  6 ++
 target/arm/cpu.h  |  3 +++
 tcg/i386/tcg-target.h | 16 
 3 files changed, 25 insertions(+)

diff --git a/configure b/configure
index 86fd833feb..9f2a665f5b 100755
--- a/configure
+++ b/configure
@@ -5879,6 +5879,7 @@ mkdir -p $target_dir
 echo "# Automatically generated by configure - do not modify" > 
$config_target_mak
 
 bflt="no"
+mttcg="no"
 interp_prefix1=$(echo "$interp_prefix" | sed "s/%M/$target_name/g")
 gdb_xml_files=""
 
@@ -5897,11 +5898,13 @@ case "$target_name" in
   arm|armeb)
 TARGET_ARCH=arm
 bflt="yes"
+mttcg="yes"
 gdb_xml_files="arm-core.xml arm-vfp.xml arm-vfp3.xml arm-neon.xml"
   ;;
   aarch64)
 TARGET_BASE_ARCH=arm
 bflt="yes"
+mttcg="yes"
 gdb_xml_files="aarch64-core.xml aarch64-fpu.xml arm-core.xml arm-vfp.xml 
arm-vfp3.xml arm-neon.xml"
   ;;
   cris)
@@ -6066,6 +6069,9 @@ if test "$target_bigendian" = "yes" ; then
 fi
 if test "$target_softmmu" = "yes" ; then
   echo "CONFIG_SOFTMMU=y" >> $config_target_mak
+  if test "$mttcg" = "yes" ; then
+echo "TARGET_SUPPORTS_MTTCG=y" >> $config_target_mak
+  fi
 fi
 if test "$target_user_only" = "yes" ; then
   echo "CONFIG_USER_ONLY=y" >> $config_target_mak
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 1b0670ae11..47a42ec6d6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -29,6 +29,9 @@
 #  define TARGET_LONG_BITS 32
 #endif
 
+/* ARM processors have a weak memory model */
+#define TCG_DEFAULT_MO  (0)
+
 #define CPUArchState struct CPUARMState
 
 #include "qemu-common.h"
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 21d96ec35c..536190f647 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -165,4 +165,20 @@ static inline void flush_icache_range(uintptr_t start, 
uintptr_t stop)
 {
 }
 
+/* This defines the natural memory order supported by this
+ * architecture before guarantees made by various barrier
+ * instructions.
+ *
+ * The x86 has a pretty strong memory ordering which only really
+ * allows for some stores to be re-ordered after loads.
+ */
+#include "tcg-mo.h"
+
+static inline int get_tcg_target_mo(void)
+{
+return TCG_MO_ALL & ~TCG_MO_LD_ST;
+}
+
+#define TCG_TARGET_DEFAULT_MO get_tcg_target_mo()
+
 #endif
-- 
2.11.0

[Qemu-devel] [PATCH v8 18/25] cputlb: atomically update tlb fields used by tlb_reset_dirty

2017-01-27 Thread Alex Bennée

The main use case for tlb_reset_dirty is to set the TLB_NOTDIRTY flags
in TLB entries to force the slow-path on writes. This is used to mark
page ranges containing code which has been translated so it can be
invalidated if written to. To do this safely we need to ensure the TLB
entries in question for all vCPUs are updated before we attempt to run
the code otherwise a race could be introduced.

To achieve this we atomically set the flag in tlb_reset_dirty_range and
take care when setting it when the TLB entry is filled.

On 32 bit systems attempting to emulate 64 bit guests we don't even
bother as we might not have the atomic primitives available. MTTCG is
disabled in this case and can't be forced on. The copy_tlb_helper
function helps keep the atomic semantics in one place to avoid
confusion.

The dirty helper function is made static as it isn't used outside of
cputlb.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
v6
  - use TARGET_PAGE_BITS_MIN
  - use run_on_cpu helpers
v7
  - fix tlb_debug fmt for 32bit build
  - un-merged the mmuidx async work which got mashed in last round
  - introduced copy_tlb_helper function and made TCG_OVERSIZED_GUEST aware
---
 cputlb.c  | 120 +++---
 include/exec/cputlb.h |   2 -
 2 files changed, 95 insertions(+), 27 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index c50254be26..65003350e3 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -342,32 +342,90 @@ void tlb_unprotect_code(ram_addr_t ram_addr)
 cpu_physical_memory_set_dirty_flag(ram_addr, DIRTY_MEMORY_CODE);
 }
 
-static bool tlb_is_dirty_ram(CPUTLBEntry *tlbe)
-{
-return (tlbe->addr_write & (TLB_INVALID_MASK|TLB_MMIO|TLB_NOTDIRTY)) == 0;
-}
 
-void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
+/*
+ * Dirty write flag handling
+ *
+ * When the TCG code writes to a location it looks up the address in
+ * the TLB and uses that data to compute the final address. If any of
+ * the lower bits of the address are set then the slow path is forced.
+ * There are a number of reasons to do this but for normal RAM the
+ * most usual is detecting writes to code regions which may invalidate
+ * generated code.
+ *
+ * Because we want other vCPUs to respond to changes straight away we
+ * update the te->addr_write field atomically. If the TLB entry has
+ * been changed by the vCPU in the mean time we skip the update.
+ *
+ * As this function uses atomic accesses we also need to ensure
+ * updates to tlb_entries follow the same access rules. We don't need
+ * to worry about this for oversized guests as MTTCG is disabled for
+ * them.
+ */
+
+static void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry, uintptr_t start,
uintptr_t length)
 {
-uintptr_t addr;
+#if TCG_OVERSIZED_GUEST
+uintptr_t addr = tlb_entry->addr_write;
 
-if (tlb_is_dirty_ram(tlb_entry)) {
-addr = (tlb_entry->addr_write & TARGET_PAGE_MASK) + tlb_entry->addend;
+if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+addr &= TARGET_PAGE_MASK;
+addr += tlb_entry->addend;
 if ((addr - start) < length) {
 tlb_entry->addr_write |= TLB_NOTDIRTY;
 }
 }
+#else
+/* paired with atomic_mb_set in tlb_set_page_with_attrs */
+uintptr_t orig_addr = atomic_mb_read(&tlb_entry->addr_write);
+uintptr_t addr = orig_addr;
+
+if ((addr & (TLB_INVALID_MASK | TLB_MMIO | TLB_NOTDIRTY)) == 0) {
+addr &= TARGET_PAGE_MASK;
+addr += atomic_read(&tlb_entry->addend);
+if ((addr - start) < length) {
+uintptr_t notdirty_addr = orig_addr | TLB_NOTDIRTY;
+atomic_cmpxchg(&tlb_entry->addr_write, orig_addr, notdirty_addr);
+}
+}
+#endif
+}
+
+/* For atomic correctness when running MTTCG we need to use the right
+ * primitives when copying entries */
+static inline void copy_tlb_helper(CPUTLBEntry *d, CPUTLBEntry *s,
+   bool atomic_set)
+{
+#if TCG_OVERSIZED_GUEST
+*d = *s;
+#else
+if (atomic_set) {
+d->addr_read = s->addr_read;
+d->addr_code = s->addr_code;
+atomic_set(&d->addend, atomic_read(&s->addend));
+/* Pairs with flag setting in tlb_reset_dirty_range */
+atomic_mb_set(&d->addr_write, atomic_read(&s->addr_write));
+} else {
+d->addr_read = s->addr_read;
+d->addr_write = atomic_read(&s->addr_write);
+d->addr_code = s->addr_code;
+d->addend = atomic_read(&s->addend);
+}
+#endif
 }
 
+/* This is a cross vCPU call (i.e. another vCPU resetting the flags of
+ * the target vCPU). As such care needs to be taken that we don't
+ * dangerously race with another vCPU update. The only thing actually
+ * updated is the target TLB entry ->addr_write flags.
+ */
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length)
 {
 CPUArchState *env;
 
 int mmu_idx;
 
-assert_cpu_is_self(cpu);
-

[Qemu-devel] [PATCH v8 16/25] cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap

2017-01-27 Thread Alex Bennée

While the vargs approach was flexible the original MTTCG ended up
having munge the bits to a bitmap so the data could be used in
deferred work helpers. Instead of hiding that in cputlb we push the
change to the API to make it take a bitmap of MMU indexes instead.

This change is fairly mechanical but as storing the actual index is
useful for cases like the current running context. As a result the
constants are renamed to ARMMMUBit_foo and a couple of helper
functions added to convert between a single bit and a scalar index.

Signed-off-by: Alex Bennée 
---
 cputlb.c   |  60 +---
 include/exec/exec-all.h|  13 +--
 target/arm/cpu.h   |  41 +---
 target/arm/helper.c| 227 ++---
 target/arm/translate-a64.c |  14 +--
 target/arm/translate.c |  24 +++--
 target/arm/translate.h |   4 +-
 target/sparc/ldst_helper.c |   8 +-
 8 files changed, 194 insertions(+), 197 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 5dfd3c3ba9..97e5c12de8 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -122,26 +122,25 @@ void tlb_flush(CPUState *cpu)
 }
 }
 
-static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
+static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
 CPUArchState *env = cpu->env_ptr;
+unsigned long mmu_idx_bitmask = idxmap;
+int mmu_idx;
 
 assert_cpu_is_self(cpu);
 tlb_debug("start\n");
 
 tb_lock();
 
-for (;;) {
-int mmu_idx = va_arg(argp, int);
-
-if (mmu_idx < 0) {
-break;
-}
+for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 
-tlb_debug("%d\n", mmu_idx);
+if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
+tlb_debug("%d\n", mmu_idx);
 
-memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
-memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
+memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
+}
 }
 
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
@@ -149,12 +148,9 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
va_list argp)
 tb_unlock();
 }
 
-void tlb_flush_by_mmuidx(CPUState *cpu, ...)
+void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
-va_list argp;
-va_start(argp, cpu);
-v_tlb_flush_by_mmuidx(cpu, argp);
-va_end(argp);
+v_tlb_flush_by_mmuidx(cpu, idxmap);
 }
 
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
@@ -219,13 +215,11 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 }
 }
 
-void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
+void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t 
idxmap)
 {
 CPUArchState *env = cpu->env_ptr;
-int i, k;
-va_list argp;
-
-va_start(argp, addr);
+unsigned long mmu_idx_bitmap = idxmap;
+int i, page, mmu_idx;
 
 assert_cpu_is_self(cpu);
 tlb_debug("addr "TARGET_FMT_lx"\n", addr);
@@ -236,31 +230,23 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong 
addr, ...)
   TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
   env->tlb_flush_addr, env->tlb_flush_mask);
 
-v_tlb_flush_by_mmuidx(cpu, argp);
-va_end(argp);
+v_tlb_flush_by_mmuidx(cpu, idxmap);
 return;
 }
 
 addr &= TARGET_PAGE_MASK;
-i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
-
-for (;;) {
-int mmu_idx = va_arg(argp, int);
+page = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
 
-if (mmu_idx < 0) {
-break;
-}
-
-tlb_debug("idx %d\n", mmu_idx);
-
-tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
+for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
+if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
+tlb_flush_entry(&env->tlb_table[mmu_idx][page], addr);
 
-/* check whether there are vltb entries that need to be flushed */
-for (k = 0; k < CPU_VTLB_SIZE; k++) {
-tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
+/* check whether there are vltb entries that need to be flushed */
+for (i = 0; i < CPU_VTLB_SIZE; i++) {
+tlb_flush_entry(&env->tlb_v_table[mmu_idx][i], addr);
+}
 }
 }
-va_end(argp);
 
 tb_flush_jmp_cache(cpu, addr);
 }
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e43cb68355..a6c17ed74a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -106,21 +106,22 @@ void tlb_flush(CPUState *cpu);
  * tlb_flush_page_by_mmuidx:
  * @cpu: CPU whose TLB should be flushed
  * @addr: virtual address of page to be flushed
- * @...: list of MMU indexes to flush, terminated by a negative value
+ * @idxmap: bitmap of MMU indexes to flush
  *
  * Flush one page from the TLB o

[Qemu-devel] [PATCH v8 17/25] cputlb: add tlb_flush_by_mmuidx async routines

2017-01-27 Thread Alex Bennée

This converts the remaining TLB flush routines to use async work when
detecting a cross-vCPU flush. The only minor complication is having to
serialise the var_list of MMU indexes into a form that can be punted
to an asynchronous job.

The pending_tlb_flush field on QOM's CPU structure also becomes a
bitfield rather than a boolean.

Signed-off-by: Alex Bennée 

---
v7
  - un-merged from the atomic cputlb patch in the last series
  - fix long line reported by checkpatch
v8
  - re-base merge/fixes
---
 cputlb.c  | 110 +++---
 include/qom/cpu.h |   2 +-
 2 files changed, 89 insertions(+), 23 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 97e5c12de8..c50254be26 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -68,6 +68,11 @@
  * target_ulong even on 32 bit builds */
 QEMU_BUILD_BUG_ON(sizeof(target_ulong) > sizeof(run_on_cpu_data));
 
+/* We currently can't handle more than 16 bits in the MMUIDX bitmask.
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
+#define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
+
 /* statistics */
 int tlb_flush_count;
 
@@ -102,7 +107,7 @@ static void tlb_flush_nocheck(CPUState *cpu)
 
 tb_unlock();
 
-atomic_mb_set(&cpu->pending_tlb_flush, false);
+atomic_mb_set(&cpu->pending_tlb_flush, 0);
 }
 
 static void tlb_flush_global_async_work(CPUState *cpu, run_on_cpu_data data)
@@ -113,7 +118,8 @@ static void tlb_flush_global_async_work(CPUState *cpu, 
run_on_cpu_data data)
 void tlb_flush(CPUState *cpu)
 {
 if (cpu->created && !qemu_cpu_is_self(cpu)) {
-if (atomic_cmpxchg(&cpu->pending_tlb_flush, false, true) == true) {
+if (atomic_mb_read(&cpu->pending_tlb_flush) != ALL_MMUIDX_BITS) {
+atomic_mb_set(&cpu->pending_tlb_flush, ALL_MMUIDX_BITS);
 async_run_on_cpu(cpu, tlb_flush_global_async_work,
  RUN_ON_CPU_NULL);
 }
@@ -122,17 +128,18 @@ void tlb_flush(CPUState *cpu)
 }
 }
 
-static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
+static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
-unsigned long mmu_idx_bitmask = idxmap;
+unsigned long mmu_idx_bitmask = data.host_int;
 int mmu_idx;
 
 assert_cpu_is_self(cpu);
-tlb_debug("start\n");
 
 tb_lock();
 
+tlb_debug("start: mmu_idx:0x%04lx\n", mmu_idx_bitmask);
+
 for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
 
 if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
@@ -145,12 +152,30 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
uint16_t idxmap)
 
 memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
 
+tlb_debug("done\n");
+
 tb_unlock();
 }
 
 void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 {
-v_tlb_flush_by_mmuidx(cpu, idxmap);
+tlb_debug("mmu_idx: 0x%" PRIx16 "\n", idxmap);
+
+if (!qemu_cpu_is_self(cpu)) {
+uint16_t pending_flushes = idxmap;
+pending_flushes &= ~atomic_mb_read(&cpu->pending_tlb_flush);
+
+if (pending_flushes) {
+tlb_debug("reduced mmu_idx: 0x%" PRIx16 "\n", pending_flushes);
+
+atomic_or(&cpu->pending_tlb_flush, pending_flushes);
+async_run_on_cpu(cpu, tlb_flush_by_mmuidx_async_work,
+ RUN_ON_CPU_HOST_INT(pending_flushes));
+}
+} else {
+tlb_flush_by_mmuidx_async_work(cpu,
+   RUN_ON_CPU_HOST_INT(idxmap));
+}
 }
 
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
@@ -215,27 +240,26 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
 }
 }
 
-void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t 
idxmap)
+/* As we are going to hijack the bottom bits of the page address for a
+ * mmuidx bit mask we need to fail to build if we can't do that
+ */
+QEMU_BUILD_BUG_ON(NB_MMU_MODES > TARGET_PAGE_BITS_MIN);
+
+static void tlb_flush_page_by_mmuidx_async_work(CPUState *cpu,
+run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
-unsigned long mmu_idx_bitmap = idxmap;
-int i, page, mmu_idx;
+target_ulong addr_and_mmuidx = (target_ulong) data.target_ptr;
+target_ulong addr = addr_and_mmuidx & TARGET_PAGE_MASK;
+unsigned long mmu_idx_bitmap = addr_and_mmuidx & ALL_MMUIDX_BITS;
+int page = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+int mmu_idx;
+int i;
 
 assert_cpu_is_self(cpu);
-tlb_debug("addr "TARGET_FMT_lx"\n", addr);
-
-/* Check if we need to flush due to large pages.  */
-if ((addr & env->tlb_flush_mask) == env->tlb_flush_addr) {
-tlb_debug("forced full flush ("
-  TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
-  env->tlb_flush_addr, env->tlb_flush_mask);
-
-v_tlb_flush_by_mmuidx(cpu, idxmap);
-return;
-}
 
-addr &= TARGET_PAGE_MASK;
-page = (add

Re: [Qemu-devel] MIPS machines

2017-01-27 Thread Yongbok Kim



On 27/01/2017 10:31, Thomas Huth wrote:
> On 27.01.2017 11:21, Yongbok Kim wrote:
>>
 Slightly off-topic, but: Is fulong2e still maintained? I did not spot an
 entry in MAINTAINERS...?
>>>
>>> It's covered by the general MIPS stanza:
>>>
>>> $ scripts/get_maintainer.pl -f hw/mips/mips_fulong2e.c 
>>> Aurelien Jarno  (maintainer:MIPS)
>>> Yongbok Kim  (maintainer:MIPS)
>>> qemu-devel@nongnu.org (open list:All patches CC here)
>>>
>>
>> I'm not actively looking after the device at the moment but if it has any
>> issues I love to handle that.
> 
> Great! Then could you maybe send a patch for the MAINTAINERS file to add
> an entry for that machine?
> 

Sure I will send a patch for that.

> Also it's a little bit confusing that "magnum" and "pica61" do not show
> up in MAINTAINERS, but I guess that's what is meant by the "Jazz" entry?
> 
>  Thanks,
>   Thomas
> 

I believe that is the case. The file listed in the Jazz entry
"hw/mips/mips_jazz.c" is the one to support these two machines.

Regards,
Yongbok

[Qemu-devel] [PATCH v8 21/25] target-arm: don't generate WFE/YIELD calls for MTTCG

2017-01-27 Thread Alex Bennée

The WFE and YIELD instructions are really only hints and in TCG's case
they were useful to move the scheduling on from one vCPU to the next. In
the parallel context (MTTCG) this just causes an unnecessary cpu_exit
and contention of the BQL.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
---
 target/arm/op_helper.c |  7 +++
 target/arm/translate-a64.c |  8 ++--
 target/arm/translate.c | 20 
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index e1a883c595..abfa7cdd39 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -436,6 +436,13 @@ void HELPER(yield)(CPUARMState *env)
 ARMCPU *cpu = arm_env_get_cpu(env);
 CPUState *cs = CPU(cpu);
 
+/* When running in MTTCG we don't generate jumps to the yield and
+ * WFE helpers as it won't affect the scheduling of other vCPUs.
+ * If we wanted to more completely model WFE/SEV so we don't busy
+ * spin unnecessarily we would need to do something more involved.
+ */
+g_assert(!parallel_cpus);
+
 /* This is a non-trappable hint instruction that generally indicates
  * that the guest is currently busy-looping. Yield control back to the
  * top level loop so that a more deserving VCPU has a chance to run.
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 88a4df6959..05162f335e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1342,10 +1342,14 @@ static void handle_hint(DisasContext *s, uint32_t insn,
 s->is_jmp = DISAS_WFI;
 return;
 case 1: /* YIELD */
-s->is_jmp = DISAS_YIELD;
+if (!parallel_cpus) {
+s->is_jmp = DISAS_YIELD;
+}
 return;
 case 2: /* WFE */
-s->is_jmp = DISAS_WFE;
+if (!parallel_cpus) {
+s->is_jmp = DISAS_WFE;
+}
 return;
 case 4: /* SEV */
 case 5: /* SEVL */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index dc67887918..444a24c2b6 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4343,20 +4343,32 @@ static void gen_exception_return(DisasContext *s, 
TCGv_i32 pc)
 gen_rfe(s, pc, load_cpu_field(spsr));
 }
 
+/*
+ * For WFI we will halt the vCPU until an IRQ. For WFE and YIELD we
+ * only call the helper when running single threaded TCG code to ensure
+ * the next round-robin scheduled vCPU gets a crack. In MTTCG mode we
+ * just skip this instruction. Currently the SEV/SEVL instructions
+ * which are *one* of many ways to wake the CPU from WFE are not
+ * implemented so we can't sleep like WFI does.
+ */
 static void gen_nop_hint(DisasContext *s, int val)
 {
 switch (val) {
 case 1: /* yield */
-gen_set_pc_im(s, s->pc);
-s->is_jmp = DISAS_YIELD;
+if (!parallel_cpus) {
+gen_set_pc_im(s, s->pc);
+s->is_jmp = DISAS_YIELD;
+}
 break;
 case 3: /* wfi */
 gen_set_pc_im(s, s->pc);
 s->is_jmp = DISAS_WFI;
 break;
 case 2: /* wfe */
-gen_set_pc_im(s, s->pc);
-s->is_jmp = DISAS_WFE;
+if (!parallel_cpus) {
+gen_set_pc_im(s, s->pc);
+s->is_jmp = DISAS_WFE;
+}
 break;
 case 4: /* sev */
 case 5: /* sevl */
-- 
2.11.0

Re: [Qemu-devel] [PATCH] spapr: clock should count only if vm is running

2017-01-27 Thread Laurent Vivier

On 27/01/2017 10:45, Thomas Huth wrote:
> On 26.01.2017 21:45, Laurent Vivier wrote:
>> This is a port to ppc of the i386 commit:
>> 00f4d64 kvmclock: clock should count only if vm is running
>>
>> We remove timebase_/pre_save/post_load/ functions,
>> and use the VM state change handler to save and restore
>> the guest_timebase (on stop and continue).
>>
>> Time base offset has originally been introduced by commit
>> 98a8b52 spapr: Add support for time base offset migration
>>
>> So while VM is paused, the time is stopped. This allows to have
>> the same result with date (based on Time Base Register) and
>> hwclock (based on "get-time-of-day" RTAS call).
>>
>> Moreover in TCG mode, the Time Base is always paused, so this
>> patch also adjust the behavior between TCG and KVM.
>>
>> VM state field "time_of_the_day_ns" is now useless but we keep
>> it to be able to migrate to older version of the machine.
> 
> Not sure, but the cpu_ppc_clock_vm_state_change() handler is only used
> with KVM, isn't it? So what happens if you migrate in TCG mode from a
> new QEMU to an older one? Don't you have to update time_of_the_day_ns
> here somewhere, too (e.g. in a pre_save handler)?

This will be fixed because I'm preparing a new version with the pre_save
function to answer to the comment of Paolo and to do like in:
6053a86 kvmclock: reduce kvmclock difference on migration

But originally the time_of_the_day_ns was to compensate the time
difference between two hosts (QEMU_CLOCK_HOST), and I think this is not
used in case of TCG because we use the virtual clock
(QEMU_CLOCK_VIRTUAL) that is stopped and migrated independently
(cpu_clock_offset in vmstate_timers).

Thanks,
Laurent

Re: [Qemu-devel] [PATCH v8 2/9] icount: exit cpu loop on expire

2017-01-27 Thread Paolo Bonzini



On 27/01/2017 07:09, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> On 26/01/2017 15:32, Pavel Dovgalyuk wrote:
 From: Paolo Bonzini [mailto:pbonz...@redhat.com]
 On 26/01/2017 14:37, Pavel Dovgalyuk wrote:
>> Simpler:
>>
>>  use_icount &&
>>  ((int32_t)cpu->icount_decr.u32 < 0 ||
>>   cpu->icount_decr.u16.low + cpu->icount_extra == 0)
> Right.
>
>> But I'm not sure that you need to test u32.  After all you're not
> Checking u32 is needed, because sometimes it is less than zero.

 If cpu->icount_decr.u32 is less than zero, the next translation block
 would immediately exit with TB_EXIT_ICOUNT_EXPIRED, causing

 cpu->exception_index = EXCP_INTERRUPT;
 *last_tb = NULL;   
 cpu_loop_exit(cpu);

 from cpu_loop_exec_tb's "case TB_EXIT_ICOUNT_EXPIRED".

 And the same is true for cpu->icount_decr.u16.low + cpu->icount_extra ==
 0, so I don't understand why this part of the patch is necessary.
>>>
>>> I removed that lines because we have to check icount=0 not only when it is 
>>> expired,
>>> but also when all instructions were executed successfully.
>>> If there are no instructions to execute, calling tb_find (and translation 
>>> then)
>>> may cause an exception at the wrong moment.
>>
>> Ok, that makes sense for cpu->icount_decr.u16.low + cpu->icount_extra == 0.
>>
>> But for decr.u32 < 0, the same reasoning of this comment is also true:
>>
>> /* Something asked us to stop executing
>>  * chained TBs; just continue round the main
>>  * loop. Whatever requested the exit will also
>>  * have set something else (eg exit_request or
>>  * interrupt_request) which we will handle
>>  * next time around the loop.  But we need to
>>  * ensure the tcg_exit_req read in generated code
>>  * comes before the next read of cpu->exit_request
>>  * or cpu->interrupt_request.
>>  */
> 
> Right. If the following lines will not be removed (as opposite to my patch) 
> then checking
> decr.u32 < 0 will not be needed.

That's what I'm not sure about.  u32 < 0 is only true if you have set
the interrupt_request as well, but interrupt requests are processed in
cpu_handle_interrupt and that doesn't require going back to the main loop.

Let me try some cleanups early next week and come back to you with a
patch to base your work on.

Paolo

> - cpu->exception_index = EXCP_INTERRUPT;
> - *last_tb = NULL;
> - cpu_loop_exit(cpu);
> 
> What is your point about the new version of that patch?
> 
> Pavel Dovgalyuk
> 
> 
>

Re: [Qemu-devel] [PATCH 2/2] migration: discard non-dirty ram pages after the start of postcopy

2017-01-27 Thread Dr. David Alan Gilbert

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:
> After the start of postcopy migration there are some non-dirty pages which 
> have
> already been migrated. These pages are no longer needed on the source vm so 
> that
> we can free them and it doen't hurt to complete the migration.
> 
> Signed-off-by: Pavel Butsykin 
> ---
>  include/migration/migration.h |  1 +
>  migration/migration.c |  2 ++
>  migration/ram.c   | 25 +
>  3 files changed, 28 insertions(+)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index d7bd404365..0d9b81545c 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -279,6 +279,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
>  int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
>uint64_t start, size_t length);
>  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
> +void ram_postcopy_migrated_memory_discard(MigrationState *ms);
>  
>  /**
>   * @migrate_add_blocker - prevent migration from proceeding
> diff --git a/migration/migration.c b/migration/migration.c
> index 391db6f28b..20490ed020 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1662,6 +1662,8 @@ static int postcopy_start(MigrationState *ms, bool 
> *old_vm_running)
>   */
>  qemu_savevm_send_ping(ms->to_dst_file, 4);
>  
> +ram_postcopy_migrated_memory_discard(ms);
> +

Did you intend this to be selected based on your capability flag?


>  ret = qemu_file_get_error(ms->to_dst_file);
>  if (ret) {
>  error_report("postcopy_start: Migration stream errored");
> diff --git a/migration/ram.c b/migration/ram.c
> index b0322a0b5c..8a6b614b0d 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1546,6 +1546,31 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool 
> expected)
>  
>  /*  functions for postcopy * */
>  
> +void ram_postcopy_migrated_memory_discard(MigrationState *ms)
> +{
> +struct RAMBlock *block;
> +unsigned long *bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
> +
> +QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> +unsigned long first = block->offset >> TARGET_PAGE_BITS;
> +unsigned long range = first + (block->used_length >> 
> TARGET_PAGE_BITS);
> +unsigned long run_start = find_next_zero_bit(bitmap, range, first);
> +
> +while (run_start < range) {
> +unsigned long run_end = find_next_bit(bitmap, range, run_start + 
> 1);
> +uint8_t *addr = block->host + (run_start << TARGET_PAGE_BITS);
> +size_t chunk_size = (run_end - run_start) << TARGET_PAGE_BITS;
> +
> +if (qemu_madvise(addr, chunk_size, QEMU_MADV_DONTNEED) < 0) {
> +error_report("migrate: madvise DONTNEED failed %p %ld: %s",
> + addr, chunk_size, strerror(errno));
> +}

can you use your ram_discard_page here, it keeps all the use of madvise 
together.
> +
> +run_start = find_next_zero_bit(bitmap, range, run_end + 1);
> +}
> +}
> +}

Dave

>  /*
>   * Callback from postcopy_each_ram_send_discard for each RAMBlock
>   * Note: At this point the 'unsentmap' is the processed bitmap combined
> -- 
> 2.11.0
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH v8 19/25] cputlb: introduce tlb_flush_*_all_cpus[_synced]

2017-01-27 Thread Alex Bennée

This introduces support to the cputlb API for flushing all CPUs TLBs
with one call. This avoids the need for target helpers to iterate
through the vCPUs themselves.

An additional variant of the API (_synced) do not return from the
caller and will cause the work to be scheduled as "safe work". The
result will be all the flush operations will be complete by the time
the originating vCPU starts executing again. It is up to the caller to
ensure enough state has been saved so execution can be restarted at
the next appropriate instruction.

Some guest architectures can defer completion of flush operations
until later. If they later schedule work using the async_safe_work
mechanism they can be sure other vCPUs will have flushed their TLBs by
the point execution returns from the safe work.

Signed-off-by: Alex Bennée 

---
v7
  - some checkpatch long line fixes
v8
  - change from varg to bitmap calling convention
  - add _synced variants, re-factored helper
---
 cputlb.c| 110 +++---
 include/exec/exec-all.h | 114 ++--
 2 files changed, 215 insertions(+), 9 deletions(-)

diff --git a/cputlb.c b/cputlb.c
index 65003350e3..7f9a54f253 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -73,6 +73,25 @@ QEMU_BUILD_BUG_ON(sizeof(target_ulong) > 
sizeof(run_on_cpu_data));
 QEMU_BUILD_BUG_ON(NB_MMU_MODES > 16);
 #define ALL_MMUIDX_BITS ((1 << NB_MMU_MODES) - 1)
 
+/* flush_all_helper: run fn across all cpus
+ *
+ * If the wait flag is set then the src cpu's helper will be queued as
+ * "safe" work and the loop exited creating a synchronisation point
+ * where all queued work will be finished before execution starts
+ * again.
+ */
+static void flush_all_helper(CPUState *src, run_on_cpu_func fn,
+ run_on_cpu_data d)
+{
+CPUState *cpu;
+
+CPU_FOREACH(cpu) {
+if (cpu != src) {
+async_run_on_cpu(cpu, fn, d);
+}
+}
+}
+
 /* statistics */
 int tlb_flush_count;
 
@@ -128,6 +147,19 @@ void tlb_flush(CPUState *cpu)
 }
 }
 
+void tlb_flush_all_cpus(CPUState *src_cpu)
+{
+flush_all_helper(src_cpu, tlb_flush_global_async_work, RUN_ON_CPU_NULL);
+tlb_flush_global_async_work(src_cpu, RUN_ON_CPU_NULL);
+}
+
+void QEMU_NORETURN tlb_flush_all_cpus_synced(CPUState *src_cpu)
+{
+flush_all_helper(src_cpu, tlb_flush_global_async_work, RUN_ON_CPU_NULL);
+tlb_flush_global_async_work(src_cpu, RUN_ON_CPU_NULL);
+cpu_loop_exit(src_cpu);
+}
+
 static void tlb_flush_by_mmuidx_async_work(CPUState *cpu, run_on_cpu_data data)
 {
 CPUArchState *env = cpu->env_ptr;
@@ -178,6 +210,30 @@ void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
 }
 }
 
+void tlb_flush_by_mmuidx_all_cpus(CPUState *src_cpu, uint16_t idxmap)
+{
+const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
+
+tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+
+flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
+fn(src_cpu, RUN_ON_CPU_HOST_INT(idxmap));
+}
+
+void QEMU_NORETURN tlb_flush_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
+   uint16_t idxmap)
+{
+const run_on_cpu_func fn = tlb_flush_by_mmuidx_async_work;
+
+tlb_debug("mmu_idx: 0x%"PRIx16"\n", idxmap);
+
+flush_all_helper(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
+async_safe_run_on_cpu(src_cpu, fn, RUN_ON_CPU_HOST_INT(idxmap));
+cpu_loop_exit(src_cpu);
+}
+
+
+
 static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
 {
 if (addr == (tlb_entry->addr_read &
@@ -317,14 +373,56 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong 
addr, uint16_t idxmap)
 }
 }
 
-void tlb_flush_page_all(target_ulong addr)
+void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, target_ulong addr,
+   uint16_t idxmap)
 {
-CPUState *cpu;
+const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+target_ulong addr_and_mmu_idx;
 
-CPU_FOREACH(cpu) {
-async_run_on_cpu(cpu, tlb_flush_page_async_work,
- RUN_ON_CPU_TARGET_PTR(addr));
-}
+tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+
+/* This should already be page aligned */
+addr_and_mmu_idx = addr & TARGET_PAGE_MASK;
+addr_and_mmu_idx |= idxmap;
+
+flush_all_helper(src_cpu, fn, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+fn(src_cpu, RUN_ON_CPU_TARGET_PTR(addr_and_mmu_idx));
+}
+
+void QEMU_NORETURN tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
+target_ulong addr,
+uint16_t idxmap)
+{
+const run_on_cpu_func fn = tlb_check_page_and_flush_by_mmuidx_async_work;
+target_ulong addr_and_mmu_idx;
+
+tlb_debug("addr: "TARGET_FMT_lx" mmu_idx:%"PRIx16"\n", addr, idxmap);
+
+/* This should alread

Re: [Qemu-devel] [PATCH v3 3/3] xen-platform: add missing disk unplug option

2017-01-27 Thread John Snow



On 01/26/2017 04:37 AM, Paul Durrant wrote:
> The Xen HVM unplug protocol [1] specifies a mechanism to allow guests to
> request unplug of 'aux' disks (which is stated to mean all IDE disks,
> except the primary master). This patch adds support for that unplug request.
> 
> NOTE: The semantics of what happens if unplug of all disks and 'aux' disks
>   is simultaneously requests is not clear. The patch makes that
>   assumption that an 'all' request overrides an 'aux' request.
> 
> [1] 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/hvm-emulated-unplug.markdown
> 
> Signed-off-by: Paul Durrant 
> Reviewed-by: Stefano Stabellini 
> 
> Cc: Anthony Perard 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: "Michael S. Tsirkin" 
> Cc: John Snow 
> ---
>  hw/i386/xen/xen_platform.c | 27 +++
>  hw/ide/piix.c  |  4 ++--
>  include/hw/ide.h   |  2 +-
>  3 files changed, 18 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> index 7d41ebb..6010f35 100644
> --- a/hw/i386/xen/xen_platform.c
> +++ b/hw/i386/xen/xen_platform.c
> @@ -107,8 +107,12 @@ static void pci_unplug_nics(PCIBus *bus)
>  pci_for_each_device(bus, 0, unplug_nic, NULL);
>  }
>  
> -static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
> +static void unplug_disks(PCIBus *b, PCIDevice *d, void *opaque)
>  {
> +uint32_t flags = *(uint32_t *)opaque;
> +bool aux = (flags & UNPLUG_AUX_IDE_DISKS) &&
> +!(flags & UNPLUG_ALL_DISKS);
> +
>  /* We have to ignore passthrough devices */
>  if (!strcmp(d->name, "xen-pci-passthrough")) {
>  return;
> @@ -116,12 +120,14 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void 
> *o)
>  
>  switch (pci_get_word(d->config + PCI_CLASS_DEVICE)) {
>  case PCI_CLASS_STORAGE_IDE:
> -pci_piix3_xen_ide_unplug(DEVICE(d));
> +pci_piix3_xen_ide_unplug(DEVICE(d), aux);
>  break;
>  
>  case PCI_CLASS_STORAGE_SCSI:
>  case PCI_CLASS_STORAGE_EXPRESS:
> -object_unparent(OBJECT(d));
> +if (!aux) {
> +object_unparent(OBJECT(d));
> +}
>  break;
>  
>  default:
> @@ -129,9 +135,9 @@ static void unplug_disks(PCIBus *b, PCIDevice *d, void *o)
>  }
>  }
>  
> -static void pci_unplug_disks(PCIBus *bus)
> +static void pci_unplug_disks(PCIBus *bus, uint32_t flags)
>  {
> -pci_for_each_device(bus, 0, unplug_disks, NULL);
> +pci_for_each_device(bus, 0, unplug_disks, &flags);
>  }
>  
>  static void platform_fixed_ioport_writew(void *opaque, uint32_t addr, 
> uint32_t val)
> @@ -144,17 +150,14 @@ static void platform_fixed_ioport_writew(void *opaque, 
> uint32_t addr, uint32_t v
>  /* Unplug devices.  Value is a bitmask of which devices to
> unplug, with bit 0 the disk devices, bit 1 the network
> devices, and bit 2 the non-primary-master IDE devices. */
> -if (val & UNPLUG_ALL_DISKS) {
> +if (val & (UNPLUG_ALL_DISKS | UNPLUG_AUX_IDE_DISKS)) {
>  DPRINTF("unplug disks\n");
> -pci_unplug_disks(pci_dev->bus);
> +pci_unplug_disks(pci_dev->bus, val);
>  }
>  if (val & UNPLUG_ALL_NICS) {
>  DPRINTF("unplug nics\n");
>  pci_unplug_nics(pci_dev->bus);
>  }
> -if (val & UNPLUG_AUX_IDE_DISKS) {
> -DPRINTF("unplug auxiliary disks not supported\n");
> -}
>  break;
>  }
>  case 2:
> @@ -335,14 +338,14 @@ static void xen_platform_ioport_writeb(void *opaque, 
> hwaddr addr,
>   * If VMDP was to control both disk and LAN it would use 4.
>   * If it controlled just disk or just LAN, it would use 8 below.
>   */
> -pci_unplug_disks(pci_dev->bus);
> +pci_unplug_disks(pci_dev->bus, UNPLUG_ALL_DISKS);
>  pci_unplug_nics(pci_dev->bus);
>  }
>  break;
>  case 8:
>  switch (val) {
>  case 1:
> -pci_unplug_disks(pci_dev->bus);
> +pci_unplug_disks(pci_dev->bus, UNPLUG_ALL_DISKS);
>  break;
>  case 2:
>  pci_unplug_nics(pci_dev->bus);
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index d5777fd..7e2d767 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -165,7 +165,7 @@ static void pci_piix_ide_realize(PCIDevice *dev, Error 
> **errp)
>  pci_piix_init_ports(d);
>  }
>  
> -int pci_piix3_xen_ide_unplug(DeviceState *dev)
> +int pci_piix3_xen_ide_unplug(DeviceState *dev, bool aux)
>  {
>  PCIIDEState *pci_ide;
>  DriveInfo *di;
> @@ -174,7 +174,7 @@ int pci_piix3_xen_ide_unplug(DeviceState *dev)
>  
>  pci_ide = PCI_IDE(dev);
>  
> -for (i = 0; i < 4; i++) {
> +for (i = aux ? 1 : 0; i < 4; i++) {
>  di = drive_get_by_index(IF_IDE, i);
>  if (di != NULL && !di->media_cd) {
>  BlockBackend *blk = b

Re: [Qemu-devel] [PATCH 1/2] add 'discard-ram' migrate capability

2017-01-27 Thread Dr. David Alan Gilbert

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:
> This feature frees the migrated memory on the source during postcopy-ram
> migration. In the second step of postcopy-ram migration when the source vm
> is put on pause we can free unnecessary memory. It will allow, in particular,
> to start relaxing the memory stress on the source host in a load-balancing
> scenario.
> 
> Signed-off-by: Pavel Butsykin 

Hi Pavel,
  Firstly a higher-level thing, can we use a different word than 'discard'
because I already use 'discard' in postcopy to mean the request from the source
to the destination to discard pages that are redirtied.  I suggest 'release-ram'
to just pick a different word that means the same thing.

Also, see patchew's build error it spotted.

> ---
>  include/migration/migration.h |  1 +
>  include/migration/qemu-file.h |  3 ++-
>  migration/migration.c |  9 +++
>  migration/qemu-file.c | 59 
> ++-
>  migration/ram.c   | 24 +-
>  qapi-schema.json  |  5 +++-
>  6 files changed, 91 insertions(+), 10 deletions(-)
> 
> diff --git a/include/migration/migration.h b/include/migration/migration.h
> index c309d23370..d7bd404365 100644
> --- a/include/migration/migration.h
> +++ b/include/migration/migration.h
> @@ -294,6 +294,7 @@ void migrate_add_blocker(Error *reason);
>   */
>  void migrate_del_blocker(Error *reason);
>  
> +bool migrate_discard_ram(void);
>  bool migrate_postcopy_ram(void);
>  bool migrate_zero_blocks(void);
>  
> diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
> index abedd466c9..0cd648a733 100644
> --- a/include/migration/qemu-file.h
> +++ b/include/migration/qemu-file.h
> @@ -132,7 +132,8 @@ void qemu_put_byte(QEMUFile *f, int v);
>   * put_buffer without copying the buffer.
>   * The buffer should be available till it is sent asynchronously.
>   */
> -void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size);
> +void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size,
> +   bool may_free);
>  bool qemu_file_mode_is_not_valid(const char *mode);
>  bool qemu_file_is_writable(QEMUFile *f);
>  
> diff --git a/migration/migration.c b/migration/migration.c
> index f498ab84f2..391db6f28b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1251,6 +1251,15 @@ void qmp_migrate_set_downtime(double value, Error 
> **errp)
>  qmp_migrate_set_parameters(&p, errp);
>  }
>  
> +bool migrate_discard_ram(void)
> +{
> +MigrationState *s;
> +
> +s = migrate_get_current();
> +
> +return s->enabled_capabilities[MIGRATION_CAPABILITY_DISCARD_RAM];
> +}
> +
>  bool migrate_postcopy_ram(void)
>  {
>  MigrationState *s;
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index e9fae31158..f85a0ecd9e 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -30,6 +30,7 @@
>  #include "qemu/coroutine.h"
>  #include "migration/migration.h"
>  #include "migration/qemu-file.h"
> +#include "sysemu/sysemu.h"
>  #include "trace.h"
>  
>  #define IO_BUF_SIZE 32768
> @@ -49,6 +50,7 @@ struct QEMUFile {
>  int buf_size; /* 0 when writing */
>  uint8_t buf[IO_BUF_SIZE];
>  
> +DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
>  struct iovec iov[MAX_IOV_SIZE];
>  unsigned int iovcnt;
>  
> @@ -132,6 +134,40 @@ bool qemu_file_is_writable(QEMUFile *f)
>  return f->ops->writev_buffer;
>  }
>  
> +static void qemu_iovec_discard_ram(QEMUFile *f)
> +{
> +struct iovec iov;
> +unsigned long idx;
> +
> +if (!migrate_discard_ram() || !runstate_check(RUN_STATE_FINISH_MIGRATE)) 
> {
> +return;
> +}

Can we split this out into a separate function please; so qemu_iovec_discard_ram
always does it, and then you have something that only calls it if enabled.

> +idx = find_next_bit(f->may_free, f->iovcnt, 0);
> +if (idx >= f->iovcnt) {
> +return;
> +}
> +iov = f->iov[idx];
> +
> +while ((idx = find_next_bit(f->may_free, f->iovcnt, idx + 1)) < 
> f->iovcnt) {
> +/* check for adjacent buffer and coalesce them */
> +if (iov.iov_base + iov.iov_len == f->iov[idx].iov_base) {
> +iov.iov_len += f->iov[idx].iov_len;
> +continue;
> +}
> +if (qemu_madvise(iov.iov_base, iov.iov_len, QEMU_MADV_DONTNEED) < 0) 
> {
> +error_report("migrate: madvise DONTNEED failed %p %ld: %s",
> + iov.iov_base, iov.iov_len, strerror(errno));
> +}
> +iov = f->iov[idx];

Can you add some comments in here please; it took me a while to understand it;
I think what you're doing is that the madvise in the loop is called for the last
iov within a continuous range and then you fall through to deal with the last 
one.

Also, see my 'postcopy: enhance ram_discard_range for hugepages' - these 
madvise's
get a bit more complex with hugepage.

> +}
> +if (qemu_madv

Re: [Qemu-devel] Commit 3a6c9 breaks QEMU on FreeBSD/Xen

2017-01-27 Thread Juergen Gross

On 24/01/17 17:42, Roger Pau Monné wrote:
> Hello,
> 
> The following commit:
> 
> commit 3a6c9172ac5951e6dac2b3f6cbce3cfccdec5894
> Author: Juergen Gross 
> Date:   Tue Nov 22 07:10:58 2016 +0100
> 
> xen: create qdev for each backend device
> 
> Prevents me from running QEMU on FreeBSD/Xen, the following is printed on the
> QEMU log:
> 
> char device redirected to /dev/pts/2 (label serial0)
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> 
> # xl create -c ~/domain.cfg
> Parsing config from /root/domain.cfg
> libxl: error: libxl_dm.c:2201:device_model_spawn_outcome: Domain 32:domain 32 
> device model: spawn failed (rc=-3)
> libxl: error: libxl_create.c:1506:domcreate_devmodel_started: Domain 
> 32:device model did not start: -3
> libxl: error: libxl_dm.c:2315:kill_device_model: Device Model already exited
> libxl: error: libxl.c:1572:libxl__destroy_domid: Domain 32:Non-existant domain
> libxl: error: libxl.c:1531:domain_destroy_callback: Domain 32:Unable to 
> destroy guest
> libxl: error: libxl.c:1458:domain_destroy_cb: Domain 32:Destruction of domain 
> failed
> # cat /var/log/xen/qemu-dm-domain.log
> char device redirected to /dev/pts/2 (label serial0)
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> xen be core: xen be core: can't open gnttab device
> can't open gnttab device
> 
> I'm not really familiar with any of that code, but I think that using
> qdev_init_nofail is wrong, since on FreeBSD/Xen for example we don't yet
> support the gnttab device, so initialization of the Xen Qdisk backend can fail
> (and possibly the same applies to Linux if someone decides to compile a kernel
> without the gnttab device). Yet QEMU can be used without the Qdisk backend.

I don't think this is due to qdev_init_nofail(). As the gnttab related
messages are printed _after_ calling qdev_init_nofail() I don't see how
it could be related.

I think the error exits of xen_be_get_xendev() are just wrong in using
g_free(xendev) instead of qdev_unplug(&xendev->qdev, NULL).

I'll do a test with that modification. Could you test it in parallel?


Juergen

Re: [Qemu-devel] [PATCH v2 8/8] hw: Drop superfluous special checks for orphaned -drive

2017-01-27 Thread Markus Armbruster

John Snow  writes:

> On 01/26/2017 10:09 AM, Markus Armbruster wrote:
>> We've traditionally rejected orphans here and there, but not
>> systematically.  For instance, the sun4m machines have an onboard SCSI
>> HBA (bus=0), and have always rejected bus>0.  Other machines with an
>> onboard SCSI HBA don't.
>> 
>> Commit a66c9dc made all orphans trigger a warning, and the previous
>> commit turned this into an error.  The checks "here and there" are now
>> redundant.  Drop them.
>> 
>> Note that the one in mips_jazz.c was wrong: it rejected bus > MAX_FD,
>> but MAX_FD is the number of floppy drives per bus.
>> 
>> Error messages change from
>> 
>> $ qemu-system-x86_64 -drive if=ide,bus=2
>> qemu-system-x86_64: Too many IDE buses defined (3 > 2)
>> $ qemu-system-mips64 -M magnum,accel=qtest -drive if=floppy,bus=2,id=fd1
>> qemu: too many floppy drives
>> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
>> qemu: too many SCSI bus
>> 
>> to
>> 
>> $ qemu-system-x86_64 -drive if=ide,bus=2
>> qemu-system-x86_64: -drive if=ide,bus=2: machine type does not support 
>> this drive
>> $ qemu-system-mips64 -M magnum,accel=qtest -drive if=floppy,bus=2,id=fd1
>> qemu-system-mips64: -drive if=floppy,bus=2,id=fd1: machine type does not 
>> support this drive
>> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
>> qemu-system-sparc: -drive if=scsi,bus=1: machine type does not support 
>> this drive
>> 
>
> Hm, that's a lot less helpful, isn't it? Can we augment with hints?

The message itself may be less specific, but it now comes with a precise
location.  Personally, I'd even find

qemu-system-sparc: -drive if=scsi,bus=1: *mumble* *mumble*

more helpful than

qemu: too many SCSI bus

because the former tells me *which* of the options is bad.  We tend to
have lots and lots of them.

The deleted special case errors cover only a minority of "orphan"
-drive.  If these cases need improvement, then so will the general case.
If you can come up with a hint that makes the general case message more
useful, I'm more than happy to squash it into PATCH 6.

Re: [Qemu-devel] [PULL 0/2] HBitmap patches

2017-01-27 Thread Peter Maydell

On 26 January 2017 at 02:48, Fam Zheng  wrote:
> The following changes since commit c7f1cf01b8245762ca5864e835d84f6677ae8b1f:
>
>   Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging 
> (2017-01-25 17:54:14 +)
>
> are available in the git repository at:
>
>   git://github.com/famz/qemu.git tags/for-upstream
>
> for you to fetch changes up to 7cdc49b9a2ee127b2403f6ce98ce3f96afb41733:
>
>   test-hbitmap: Add hbitmap_is_serializable() calls (2017-01-26 10:25:01 
> +0800)
>
> 
>
> Two patches by Max submitted back in 2.8 timeframe.
>
> 
>
> Max Reitz (2):
>   hbitmap: Add hbitmap_is_serializable()
>   test-hbitmap: Add hbitmap_is_serializable() calls
>
>  include/qemu/hbitmap.h | 13 +
>  tests/test-hbitmap.c   | 11 +++
>  util/hbitmap.c | 22 +++---
>  3 files changed, 43 insertions(+), 3 deletions(-)
>
> -


Applied, thanks.

-- PMM

[Qemu-devel] [PATCH] dma: omap: check dma channel data_type

2017-01-27 Thread P J P

From: Prasad J Pandit 

When setting dma channel 'data_type', if (value & 3) == 3,
the set 'data_type' is said to be bad. This also leads to an
OOB access in 'omap_dma_transfer_generic', while doing
cpu_physical_memory_r/w operations. Add check to avoid it.

Reported-by: Jiang Xin 
Signed-off-by: Prasad J Pandit 
---
 hw/dma/omap_dma.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/dma/omap_dma.c b/hw/dma/omap_dma.c
index f6f86f9..45dfe7a 100644
--- a/hw/dma/omap_dma.c
+++ b/hw/dma/omap_dma.c
@@ -878,15 +878,17 @@ static int omap_dma_ch_reg_write(struct omap_dma_s *s,
 ch->burst[0] = (value & 0x0180) >> 7;
 ch->pack[0] = (value & 0x0040) >> 6;
 ch->port[0] = (enum omap_dma_port) ((value & 0x003c) >> 2);
-ch->data_type = 1 << (value & 3);
 if (ch->port[0] >= __omap_dma_port_last)
 printf("%s: invalid DMA port %i\n", __FUNCTION__,
 ch->port[0]);
 if (ch->port[1] >= __omap_dma_port_last)
 printf("%s: invalid DMA port %i\n", __FUNCTION__,
 ch->port[1]);
-if ((value & 3) == 3)
+ch->data_type = 1 << (value & 3);
+if ((value & 3) == 3) {
 printf("%s: bad data_type for DMA channel\n", __FUNCTION__);
+ch->data_type >>= 1;
+}
 break;
 
 case 0x02: /* SYS_DMA_CCR_CH0 */
@@ -1988,8 +1990,10 @@ static void omap_dma4_write(void *opaque, hwaddr addr,
 fprintf(stderr, "%s: bad MReqAddressTranslate sideband signal\n",
 __FUNCTION__);
 ch->data_type = 1 << (value & 3);
-if ((value & 3) == 3)
+if ((value & 3) == 3) {
 printf("%s: bad data_type for DMA channel\n", __FUNCTION__);
+ch->data_type >>= 1;
+}
 break;
 
 case 0x14: /* DMA4_CEN */
-- 
2.9.3

Re: [Qemu-devel] [PATCH RFC] migration: set cpu throttle value by workload

2017-01-27 Thread Dr. David Alan Gilbert

* Chao Fan (fanc.f...@cn.fujitsu.com) wrote:
> Hi all,
> 
> This is a test for this RFC patch.
> 
> Start vm as following:
> cmdline="./x86_64-softmmu/qemu-system-x86_64 -m 2560 \
> -drive if=none,file=/nfs/img/fedora.qcow2,format=qcow2,id=foo \
> -netdev tap,id=hn0,queues=1 \
> -device virtio-net-pci,id=net-pci0,netdev=hn0 \
> -device virtio-blk,drive=foo \
> -enable-kvm -M pc -cpu host \
> -vnc :3 \
> -monitor stdio"
> 
> Continue running benchmark program named himeno[*](modified base on
> original source). The code is in the attach file, make it in MIDDLE.
> It costs much cpu calculation and memory. Then migrate the guest.
> The source host and target host are in one switch.
> 
> "before" means the upstream version, "after" means applying this patch.
> "idpr" means "inst_dirty_pages_rate", a new variable in this RFC PATCH.
> "count" is "dirty sync count" in "info migrate".
> "time" is "total time" in "info migrate".
> "ct pct" is "cpu throttle percentage" in "info migrate".
> 
>  
> | |before|after| 
> |-|--|-| 
> |count|time(s)|ct pct|time(s)| idpr |ct pct| 
> |-|---|--|---|--|--| 
> |  1  |3  |   0  |4  |   x  |   0  | 
> |  2  |   53  |   0  |   53  | 14237|   0  | 
> |  3  |   97  |   0  |   95  |  3142|   0  | 
> |  4  |  109  |   0  |  105  | 11085|   0  | 
> |  5  |  117  |   0  |  113  | 12894|   0  | 
> |  6  |  125  |  20  |  121  | 13549|  67  | 
> |  7  |  133  |  20  |  130  | 13550|  67  | 
> |  8  |  141  |  20  |  136  | 13587|  67  | 
> |  9  |  149  |  30  |  144  | 13553|  99  | 
> | 10  |  156  |  30  |  152  |  1474|  99  |  
> | 11  |  164  |  30  |  152  |  1706|  99  |  
> | 12  |  172  |  40  |  153  |   0  |  99  |  
> | 13  |  180  |  40  |  153  |   0  |   x  |  
> | 14  |  188  |  40  |-|
> | 15  |  195  |  50  |  completed  |  
> | 16  |  203  |  50  | |  
> | 17  |  211  |  50  | |  
> | 18  |  219  |  60  | |  
> | 19  |  227  |  60  | |  
> | 20  |  235  |  60  | |  
> | 21  |  242  |  70  | |  
> | 22  |  250  |  70  | |  
> | 23  |  258  |  70  | |  
> | 24  |  266  |  80  | |  
> | 25  |  274  |  80  | |  
> | 26  |  281  |  80  | |  
> | 27  |  289  |  90  | |  
> | 28  |  297  |  90  | |  
> | 29  |  305  |  90  | |  
> | 30  |  315  |  99  | |  
> | 31  |  320  |  99  | |  
> | 32  |  320  |  99  | |  
> | 33  |  321  |  99  | |  
> | 34  |  321  |  99  | |  
> || |
> |completed   | |
> 
> 
> And the "info migrate" when completed:
> 
> before:
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: on
> zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
> Migration status: completed
> total time: 321091 milliseconds
> downtime: 573 milliseconds
> setup: 40 milliseconds
> transferred ram: 10509346 kbytes
> throughput: 268.13 mbps
> remaining ram: 0 kbytes
> total ram: 2638664 kbytes
> duplicate: 362439 pages
> skipped: 0 pages
> normal: 2621414 pages
> normal bytes: 10485656 kbytes
> dirty sync count: 34
> 
> after:
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: on
> zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
> Migration status: completed
> total time: 152652 milliseconds
> downtime: 290 milliseconds
> setup: 47 milliseconds
> transferred ram: 4997452 kbytes
> throughput: 268.20 mbps
> remaining ram: 0 kbytes
> total ram: 2638664 kbytes
> duplicate: 359598 pages
> skipped: 0 pages
> normal: 1246136 pages
> normal bytes: 4984544 kbytes
> dirty sync count: 13
> 
> It's clear that the total time is much better(321s VS 153s).
> The guest began cpu throttle in the 6th dirty sync. But at this time,
> the dirty pages born too much in this guest. So the default
> cpu throttle percentage(20 and 10) is too small for this condition. I
> just use (inst_dirty_pages_rate / 200) to calculate the cpu throttle
> value. This is just an adhoc algorithm, not supported by any theories. 
> 
> Of course on the other hand, the cpu throttle percentage is higher, the
> guest runs more slowly. But in the result, after applying this patch,
> the guest spend 23s with the cpu throttle percentage is 67 (total time
> from 121 to 144), and 9s with cpu throttle percentage is 99 (total time
> from 144 to completed). But in the upstream version, the guest spend
> 73s with the cpu throttle percentage is 70.80.90 (total time from 21 to
> 30), 6s with the cpu throttle percentage is 99

[Qemu-devel] [PATCH v2] spapr: clock should count only if vm is running

2017-01-27 Thread Laurent Vivier

This is a port to ppc of the i386 commit:
00f4d64 kvmclock: clock should count only if vm is running

We remove timebase_post_load function, and use the VM state
change handler to save and restore the guest_timebase (on stop
and continue).

We keep timebase_pre_save to reduce the clock difference on
migration like in:
6053a86 kvmclock: reduce kvmclock difference on migration

Time base offset has originally been introduced by commit
98a8b52 spapr: Add support for time base offset migration

So while VM is paused, the time is stopped. This allows to have
the same result with date (based on Time Base Register) and
hwclock (based on "get-time-of-day" RTAS call).

Moreover in TCG mode, the Time Base is always paused, so this
patch also adjust the behavior between TCG and KVM.

VM state field "time_of_the_day_ns" is now useless but we keep
it to be able to migrate to older version of the machine.

As vmstate_ppc_timebase structure (with timebase_pre_save() and
timebase_post_load() functions) was only used by vmstate_spapr,
we register the VM state change handler only in ppc_spapr_init().

Signed-off-by: Laurent Vivier 
---
v2:
  keep timebase_pre_save()
  move save and load code to timebase_save() and timebase_load(),
  to use the timebase_save() in timebase_pre_save() and
  in cpu_ppc_clock_vm_state_change(), and this eases the
  patch review.
  put a "#if defined(CONFIG_KVM)" only around kvm_set_one_reg()
  don't remove the trace function

 hw/ppc/ppc.c | 66 ++--
 hw/ppc/spapr.c   |  6 +
 target/ppc/cpu-qom.h |  3 +++
 3 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 8945869..24d4392 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -847,9 +847,8 @@ static void cpu_ppc_set_tb_clk (void *opaque, uint32_t freq)
 cpu_ppc_store_purr(cpu, 0xULL);
 }
 
-static void timebase_pre_save(void *opaque)
+static void timebase_save(PPCTimebase *tb)
 {
-PPCTimebase *tb = opaque;
 uint64_t ticks = cpu_get_host_ticks();
 PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
 
@@ -858,43 +857,30 @@ static void timebase_pre_save(void *opaque)
 return;
 }
 
+/* not used anymore, we keep it for compatibility */
 tb->time_of_the_day_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
 /*
- * tb_offset is only expected to be changed by migration so
+ * tb_offset is only expected to be changed by QEMU so
  * there is no need to update it from KVM here
  */
 tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
 }
 
-static int timebase_post_load(void *opaque, int version_id)
+static void timebase_load(PPCTimebase *tb)
 {
-PPCTimebase *tb_remote = opaque;
 CPUState *cpu;
 PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
-int64_t tb_off_adj, tb_off, ns_diff;
-int64_t migration_duration_ns, migration_duration_tb, guest_tb, host_ns;
+int64_t tb_off_adj, tb_off;
 unsigned long freq;
 
 if (!first_ppc_cpu->env.tb_env) {
 error_report("No timebase object");
-return -1;
+return;
 }
 
 freq = first_ppc_cpu->env.tb_env->tb_freq;
-/*
- * Calculate timebase on the destination side of migration.
- * The destination timebase must be not less than the source timebase.
- * We try to adjust timebase by downtime if host clocks are not
- * too much out of sync (1 second for now).
- */
-host_ns = qemu_clock_get_ns(QEMU_CLOCK_HOST);
-ns_diff = MAX(0, host_ns - tb_remote->time_of_the_day_ns);
-migration_duration_ns = MIN(NANOSECONDS_PER_SECOND, ns_diff);
-migration_duration_tb = muldiv64(freq, migration_duration_ns,
- NANOSECONDS_PER_SECOND);
-guest_tb = tb_remote->guest_timebase + MIN(0, migration_duration_tb);
 
-tb_off_adj = guest_tb - cpu_get_host_ticks();
+tb_off_adj = tb->guest_timebase - cpu_get_host_ticks();
 
 tb_off = first_ppc_cpu->env.tb_env->tb_offset;
 trace_ppc_tb_adjust(tb_off, tb_off_adj, tb_off_adj - tb_off,
@@ -904,9 +890,44 @@ static int timebase_post_load(void *opaque, int version_id)
 CPU_FOREACH(cpu) {
 PowerPCCPU *pcpu = POWERPC_CPU(cpu);
 pcpu->env.tb_env->tb_offset = tb_off_adj;
+#if defined(CONFIG_KVM)
+kvm_set_one_reg(cpu, KVM_REG_PPC_TB_OFFSET,
+&pcpu->env.tb_env->tb_offset);
+#endif
 }
+}
 
-return 0;
+void cpu_ppc_clock_vm_state_change(void *opaque, int running,
+   RunState state)
+{
+PPCTimebase *tb = opaque;
+
+if (running) {
+timebase_load(tb);
+} else {
+timebase_save(tb);
+}
+}
+
+/*
+ * When migrating, read the clock just before migration,
+ * so that the guest clock counts during the events
+ * between:
+ *
+ *  * vm_stop()
+ *  *
+ *  * pre_save()
+ *
+ *  This reduces clock difference on migration from 5s
+ *  to 0.1s (when max_downtime

Re: [Qemu-devel] [PATCH 03/10] armv7m: add state for v7M CCR, CFSR, HFSR, DFSR, MMFAR, BFAR

2017-01-27 Thread Alex Bennée


Peter Maydell  writes:

> Add the structure fields, VMState fields, reset code and macros for
> the v7M system control registers CCR, CFSR, HFSR, DFSR, MMFAR and
> BFAR.
>
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/cpu.h | 54 
> 
>  target/arm/cpu.c |  7 +++
>  target/arm/machine.c | 10 --
>  3 files changed, 69 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index b2cc329..4b062d2 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -21,6 +21,7 @@
>  #define ARM_CPU_H
>
>  #include "kvm-consts.h"
> +#include "hw/registerfields.h"
>
>  #if defined(TARGET_AARCH64)
>/* AArch64 definitions */
> @@ -405,6 +406,12 @@ typedef struct CPUARMState {
>  uint32_t vecbase;
>  uint32_t basepri;
>  uint32_t control;
> +uint32_t ccr; /* Configuration and Control */
> +uint32_t cfsr; /* Configurable Fault Status */
> +uint32_t hfsr; /* HardFault Status */
> +uint32_t dfsr; /* Debug Fault Status Register */
> +uint32_t mmfar; /* MemManage Fault Address */
> +uint32_t bfar; /* BusFault Address */

Given the CPUARMState needs to be accessed via env do we need to start
getting concerned about its size?

>  int exception;
>  } v7m;
>
> @@ -1086,6 +1093,53 @@ enum arm_cpu_mode {
>  #define ARM_IWMMXT_wCGR2 10
>  #define ARM_IWMMXT_wCGR3 11
>
> +/* V7M CCR bits */
> +FIELD(V7M_CCR, NONBASETHRDENA, 0, 1)
> +FIELD(V7M_CCR, USERSETMPEND, 1, 1)
> +FIELD(V7M_CCR, UNALIGN_TRP, 3, 1)
> +FIELD(V7M_CCR, DIV_0_TRP, 4, 1)
> +FIELD(V7M_CCR, BFHFNMIGN, 8, 1)
> +FIELD(V7M_CCR, STKALIGN, 9, 1)
> +FIELD(V7M_CCR, DC, 16, 1)
> +FIELD(V7M_CCR, IC, 17, 1)
> +
> +/* V7M CFSR bits for MMFSR */
> +FIELD(V7M_CFSR, IACCVIOL, 0, 1)
> +FIELD(V7M_CFSR, DACCVIOL, 1, 1)
> +FIELD(V7M_CFSR, MUNSTKERR, 3, 1)
> +FIELD(V7M_CFSR, MSTKERR, 4, 1)
> +FIELD(V7M_CFSR, MLSPERR, 5, 1)
> +FIELD(V7M_CFSR, MMARVALID, 7, 1)
> +
> +/* V7M CFSR bits for BFSR */
> +FIELD(V7M_CFSR, IBUSERR, 8 + 0, 1)
> +FIELD(V7M_CFSR, PRECISERR, 8 + 1, 1)
> +FIELD(V7M_CFSR, IMPRECISERR, 8 + 2, 1)
> +FIELD(V7M_CFSR, UNSTKERR, 8 + 3, 1)
> +FIELD(V7M_CFSR, STKERR, 8 + 4, 1)
> +FIELD(V7M_CFSR, LSPERR, 8 + 5, 1)
> +FIELD(V7M_CFSR, BFARVALID, 8 + 7, 1)
> +
> +/* V7M CFSR bits for UFSR */
> +FIELD(V7M_CFSR, UNDEFINSTR, 16 + 0, 1)
> +FIELD(V7M_CFSR, INVSTATE, 16 + 1, 1)
> +FIELD(V7M_CFSR, INVPC, 16 + 2, 1)
> +FIELD(V7M_CFSR, NOCP, 16 + 3, 1)
> +FIELD(V7M_CFSR, UNALIGNED, 16 + 8, 1)
> +FIELD(V7M_CFSR, DIVBYZERO, 16 + 9, 1)
> +
> +/* V7M HFSR bits */
> +FIELD(V7M_HFSR, VECTTBL, 1, 1)
> +FIELD(V7M_HFSR, FORCED, 30, 1)
> +FIELD(V7M_HFSR, DEBUGEVT, 31, 1)
> +
> +/* V7M DFSR bits */
> +FIELD(V7M_DFSR, HALTED, 0, 1)
> +FIELD(V7M_DFSR, BKPT, 1, 1)
> +FIELD(V7M_DFSR, DWTTRAP, 2, 1)
> +FIELD(V7M_DFSR, VCATCH, 3, 1)
> +FIELD(V7M_DFSR, EXTERNAL, 4, 1)
> +
>  /* If adding a feature bit which corresponds to a Linux ELF
>   * HWCAP bit, remember to update the feature-bit-to-hwcap
>   * mapping in linux-user/elfload.c:get_elf_hwcap().
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 6395d5a..c804f59 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -188,6 +188,13 @@ static void arm_cpu_reset(CPUState *s)
>  uint8_t *rom;
>
>  env->daif &= ~PSTATE_I;
> +
> +/* The reset value of this bit is IMPDEF, but ARM recommends
> + * that it resets to 1, so QEMU always does that rather than making
> + * it dependent on CPU model.
> + */
> +env->v7m.ccr = R_V7M_CCR_STKALIGN_MASK;
> +
>  rom = rom_ptr(0);
>  if (rom) {
>  /* Address zero is covered by ROM which hasn't yet been
> diff --git a/target/arm/machine.c b/target/arm/machine.c
> index 8ed24bf..49e09a8 100644
> --- a/target/arm/machine.c
> +++ b/target/arm/machine.c
> @@ -96,13 +96,19 @@ static bool m_needed(void *opaque)
>
>  static const VMStateDescription vmstate_m = {
>  .name = "cpu/m",
> -.version_id = 2,
> -.minimum_version_id = 2,
> +.version_id = 3,
> +.minimum_version_id = 3,
>  .needed = m_needed,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32(env.v7m.vecbase, ARMCPU),
>  VMSTATE_UINT32(env.v7m.basepri, ARMCPU),
>  VMSTATE_UINT32(env.v7m.control, ARMCPU),
> +VMSTATE_UINT32(env.v7m.ccr, ARMCPU),
> +VMSTATE_UINT32(env.v7m.cfsr, ARMCPU),
> +VMSTATE_UINT32(env.v7m.hfsr, ARMCPU),
> +VMSTATE_UINT32(env.v7m.dfsr, ARMCPU),
> +VMSTATE_UINT32(env.v7m.mmfar, ARMCPU),
> +VMSTATE_UINT32(env.v7m.bfar, ARMCPU),
>  VMSTATE_INT32(env.v7m.exception, ARMCPU),
>  VMSTATE_END_OF_LIST()
>  }

Otherwise:

Reviewed-by: Alex Bennée 

--
Alex Bennée

Re: [Qemu-devel] [PATCH 01/10] target/arm: Drop IS_M() macro

2017-01-27 Thread Alex Bennée


Peter Maydell  writes:

> We only use the IS_M() macro in two places, and it's a bit of a
> namespace grab to put in cpu.h.  Drop it in favour of just explicitly
> calling arm_feature() in the places where it was used.
>
> Signed-off-by: Peter Maydell 

Reviewed-by: Alex Bennée 

> ---
>  target/arm/cpu.h| 6 --
>  target/arm/cpu.c| 2 +-
>  target/arm/helper.c | 2 +-
>  3 files changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 521c11b..b2cc329 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1762,12 +1762,6 @@ bool write_list_to_cpustate(ARMCPU *cpu);
>   */
>  bool write_cpustate_to_list(ARMCPU *cpu);
>
> -/* Does the core conform to the "MicroController" profile. e.g. Cortex-M3.
> -   Note the M in older cores (eg. ARM7TDMI) stands for Multiply. These are
> -   conventional cores (ie. Application or Realtime profile).  */
> -
> -#define IS_M(env) arm_feature(env, ARM_FEATURE_M)
> -
>  #define ARM_CPUID_TI915T  0x54029152
>  #define ARM_CPUID_TI925T  0x54029252
>
> diff --git a/target/arm/cpu.c b/target/arm/cpu.c
> index 9075989..6395d5a 100644
> --- a/target/arm/cpu.c
> +++ b/target/arm/cpu.c
> @@ -182,7 +182,7 @@ static void arm_cpu_reset(CPUState *s)
>  /* On ARMv7-M the CPSR_I is the value of the PRIMASK register, and is
>   * clear at reset. Initial SP and PC are loaded from ROM.
>   */
> -if (IS_M(env)) {
> +if (arm_feature(env, ARM_FEATURE_M)) {
>  uint32_t initial_msp; /* Loaded from 0x0 */
>  uint32_t initial_pc; /* Loaded from 0x4 */
>  uint8_t *rom;
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index cfbc622..ce7e43b 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -6695,7 +6695,7 @@ void arm_cpu_do_interrupt(CPUState *cs)
>  CPUARMState *env = &cpu->env;
>  unsigned int new_el = env->exception.target_el;
>
> -assert(!IS_M(env));
> +assert(!arm_feature(env, ARM_FEATURE_M));
>
>  arm_log_exception(cs->exception_index);
>  qemu_log_mask(CPU_LOG_INT, "...from EL%d to EL%d\n", arm_current_el(env),


--
Alex Bennée

Re: [Qemu-devel] [PATCH 02/10] armv7m_nvic: keep a pointer to the CPU

2017-01-27 Thread Alex Bennée


Peter Maydell  writes:

> From: Michael Davidsaver 
>
> Many NVIC operations access the CPU state, so store a pointer in
> struct nvic_state rather than fetching it via qemu_get_cpu() every
> time we need it.
>
> As with the arm_gicv3_common code, we currently just call
> qemu_get_cpu() in the NVIC's realize method, but in future we might
> want to use a QOM property to pass the CPU to the NVIC.
>
> This imposes an ordering requirement that the CPU is
> realized before the NVIC, but that is always true since
> both are dealt with in armv7m_init().
>
> Signed-off-by: Michael Davidsaver 
> [PMM: Use qemu_get_cpu(0) rather than first_cpu; expand
>  commit message]
> Reviewed-by: Peter Maydell 
> Signed-off-by: Peter Maydell 
> ---
>  hw/intc/armv7m_nvic.c | 11 +--
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/hw/intc/armv7m_nvic.c b/hw/intc/armv7m_nvic.c
> index 06d8db6..81dcb83 100644
> --- a/hw/intc/armv7m_nvic.c
> +++ b/hw/intc/armv7m_nvic.c
> @@ -23,6 +23,7 @@
>
>  typedef struct {
>  GICState gic;
> +ARMCPU *cpu;
>  struct {
>  uint32_t control;
>  uint32_t reload;
> @@ -155,7 +156,7 @@ void armv7m_nvic_complete_irq(void *opaque, int irq)
>
>  static uint32_t nvic_readl(nvic_state *s, uint32_t offset)
>  {
> -ARMCPU *cpu;
> +ARMCPU *cpu = s->cpu;
>  uint32_t val;
>  int irq;
>
> @@ -187,11 +188,9 @@ static uint32_t nvic_readl(nvic_state *s, uint32_t 
> offset)
>  case 0x1c: /* SysTick Calibration Value.  */
>  return 1;
>  case 0xd00: /* CPUID Base.  */
> -cpu = ARM_CPU(qemu_get_cpu(0));
>  return cpu->midr;
>  case 0xd04: /* Interrupt Control State.  */
>  /* VECTACTIVE */
> -cpu = ARM_CPU(qemu_get_cpu(0));
>  val = cpu->env.v7m.exception;
>  if (val == 1023) {
>  val = 0;
> @@ -222,7 +221,6 @@ static uint32_t nvic_readl(nvic_state *s, uint32_t offset)
>  val |= (1 << 31);
>  return val;
>  case 0xd08: /* Vector Table Offset.  */
> -cpu = ARM_CPU(qemu_get_cpu(0));
>  return cpu->env.v7m.vecbase;
>  case 0xd0c: /* Application Interrupt/Reset Control.  */
>  return 0xfa05;
> @@ -296,7 +294,7 @@ static uint32_t nvic_readl(nvic_state *s, uint32_t offset)
>
>  static void nvic_writel(nvic_state *s, uint32_t offset, uint32_t value)
>  {
> -ARMCPU *cpu;
> +ARMCPU *cpu = s->cpu;
>  uint32_t oldval;
>  switch (offset) {
>  case 0x10: /* SysTick Control and Status.  */
> @@ -349,7 +347,6 @@ static void nvic_writel(nvic_state *s, uint32_t offset, 
> uint32_t value)
>  }
>  break;
>  case 0xd08: /* Vector Table Offset.  */
> -cpu = ARM_CPU(qemu_get_cpu(0));
>  cpu->env.v7m.vecbase = value & 0xff80;

Given it is only used once here you could just indirect it:

   s->cpu->env.v7m.vecbase = value & 0xff80;

But I assume the compiler would DTRT if the load wasn't needed.

>  break;
>  case 0xd0c: /* Application Interrupt/Reset Control.  */
> @@ -495,6 +492,8 @@ static void armv7m_nvic_realize(DeviceState *dev, Error 
> **errp)
>  NVICClass *nc = NVIC_GET_CLASS(s);
>  Error *local_err = NULL;
>
> +s->cpu = ARM_CPU(qemu_get_cpu(0));
> +assert(s->cpu);
>  /* The NVIC always has only one CPU */
>  s->gic.num_cpu = 1;
>  /* Tell the common code we're an NVIC */

Anyway:

Reviewed-by: Alex Bennée 

--
Alex Bennée

Re: [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time.

2017-01-27 Thread Juan Quintela

Jitendra Kolhe  wrote:
> Using "-mem-prealloc" option for a very large guest leads to huge guest
> start-up and migration time. This is because with "-mem-prealloc" option
> qemu tries to map every guest page (create address translations), and
> make sure the pages are available during runtime. virsh/libvirt by
> default, seems to use "-mem-prealloc" option in case the guest is
> configured to use huge pages. The patch tries to map all guest pages
> simultaneously by spawning multiple threads. Given the problem is more
> prominent for large guests, the patch limits the changes to the guests
> of at-least 64GB of memory size. Currently limiting the change to QEMU
> library functions on POSIX compliant host only, as we are not sure if
> the problem exists on win32. Below are some stats with "-mem-prealloc"
> option for guest configured to use huge pages.
>
> 
> Idle Guest  | Start-up time | Migration time
> 
> Guest stats with 2M HugePage usage - single threaded (existing code)
> 
> 64 Core - 4TB   | 54m11.796s| 75m43.843s
^^

> 64 Core - 1TB   | 8m56.576s | 14m29.049s
> 64 Core - 256GB | 2m11.245s | 3m26.598s
> 
> Guest stats with 2M HugePage usage - map guest pages using 8 threads
> 
> 64 Core - 4TB   | 5m1.027s  | 34m10.565s
> 64 Core - 1TB   | 1m10.366s | 8m28.188s
> 64 Core - 256GB | 0m19.040s | 2m10.148s
> ---
> Guest stats with 2M HugePage usage - map guest pages using 16 threads
> ---
> 64 Core - 4TB   | 1m58.970s | 31m43.400s
^

Impressive, not everyday one get an speedup of 20 O:-)


> +static void *do_touch_pages(void *arg)
> +{
> +PageRange *range = (PageRange *)arg;
> +char *start_addr = range->addr;
> +uint64_t numpages = range->numpages;
> +uint64_t hpagesize = range->hpagesize;
> +uint64_t i = 0;
> +
> +for (i = 0; i < numpages; i++) {
> +memset(start_addr + (hpagesize * i), 0, 1);

I would use the range->addr and similar here directly, but it is just a
question of taste.

> -/* MAP_POPULATE silently ignores failures */
> -for (i = 0; i < numpages; i++) {
> -memset(area + (hpagesize * i), 0, 1);
> +/* touch pages simultaneously for memory >= 64G */
> +if (memory < (1ULL << 36)) {

64GB guest already took quite a bit of time, I think I would put it
always as min(num_vcpus, 16).  So, we always execute the multiple theard
codepath?

But very nice, thanks.

Later, Juan.

Re: [Qemu-devel] [PATCH] target/sparc: Restore ldstub of odd asis

2017-01-27 Thread Artyom Tarasenko

On Fri, Jan 27, 2017 at 9:15 AM, Richard Henderson  wrote:
> Fixes the booting of ss20 roms.

Mike, can you please test this fix?

> Reported-by: Mark Cave-Ayland 

Initially Reported-by: Michael Russo 

> Signed-off-by: Richard Henderson 
> ---
>  target/sparc/translate.c | 27 +--
>  1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/target/sparc/translate.c b/target/sparc/translate.c
> index 655060c..aa6734d 100644
> --- a/target/sparc/translate.c
> +++ b/target/sparc/translate.c
> @@ -2448,8 +2448,31 @@ static void gen_ldstub_asi(DisasContext *dc, TCGv dst, 
> TCGv addr, int insn)
>  gen_ldstub(dc, dst, addr, da.mem_idx);
>  break;
>  default:
> -/* ??? Should be DAE_invalid_asi.  */
> -gen_exception(dc, TT_DATA_ACCESS);
> +/* ??? In theory, this should be raise DAE_invalid_asi.
> +   But the SS-20 roms do ldstuba [%l0] #ASI_M_CTL, %o1.  */
> +if (parallel_cpus) {
> +gen_helper_exit_atomic(cpu_env);
> +} else {
> +TCGv_i32 r_asi = tcg_const_i32(da.asi);
> +TCGv_i32 r_mop = tcg_const_i32(MO_UB);
> +TCGv_i64 s64, t64;
> +
> +save_state(dc);
> +t64 = tcg_temp_new_i64();
> +gen_helper_ld_asi(t64, cpu_env, addr, r_asi, r_mop);
> +
> +s64 = tcg_const_i64(0xff);
> +gen_helper_st_asi(cpu_env, addr, s64, r_asi, r_mop);
> +tcg_temp_free_i64(s64);
> +tcg_temp_free_i32(r_mop);
> +tcg_temp_free_i32(r_asi);
> +
> +tcg_gen_trunc_i64_tl(dst, t64);
> +tcg_temp_free_i64(t64);
> +
> +/* End the TB.  */
> +dc->npc = DYNAMIC_PC;
> +}
>  break;
>  }
>  }
> --
> 2.9.3
>



-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu

Re: [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time.

2017-01-27 Thread Dr. David Alan Gilbert

* Jitendra Kolhe (jitendra.ko...@hpe.com) wrote:
> Using "-mem-prealloc" option for a very large guest leads to huge guest
> start-up and migration time. This is because with "-mem-prealloc" option
> qemu tries to map every guest page (create address translations), and
> make sure the pages are available during runtime. virsh/libvirt by
> default, seems to use "-mem-prealloc" option in case the guest is
> configured to use huge pages. The patch tries to map all guest pages
> simultaneously by spawning multiple threads. Given the problem is more
> prominent for large guests, the patch limits the changes to the guests
> of at-least 64GB of memory size. Currently limiting the change to QEMU
> library functions on POSIX compliant host only, as we are not sure if
> the problem exists on win32. Below are some stats with "-mem-prealloc"
> option for guest configured to use huge pages.
> 
> 
> Idle Guest  | Start-up time | Migration time
> 
> Guest stats with 2M HugePage usage - single threaded (existing code)
> 
> 64 Core - 4TB   | 54m11.796s| 75m43.843s
> 64 Core - 1TB   | 8m56.576s | 14m29.049s
> 64 Core - 256GB | 2m11.245s | 3m26.598s
> 
> Guest stats with 2M HugePage usage - map guest pages using 8 threads
> 
> 64 Core - 4TB   | 5m1.027s  | 34m10.565s
> 64 Core - 1TB   | 1m10.366s | 8m28.188s
> 64 Core - 256GB | 0m19.040s | 2m10.148s
> ---
> Guest stats with 2M HugePage usage - map guest pages using 16 threads
> ---
> 64 Core - 4TB   | 1m58.970s | 31m43.400s
> 64 Core - 1TB   | 0m39.885s | 7m55.289s
> 64 Core - 256GB | 0m11.960s | 2m0.135s
> ---

That's a nice improvement.

> Signed-off-by: Jitendra Kolhe 
> ---
>  util/oslib-posix.c | 64 
> +++---
>  1 file changed, 61 insertions(+), 3 deletions(-)
> 
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index f631464..a8bd7c2 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -55,6 +55,13 @@
>  #include "qemu/error-report.h"
>  #endif
>  
> +#define PAGE_TOUCH_THREAD_COUNT 8

It seems a shame to fix that number as a constant.

> +typedef struct {
> +char *addr;
> +uint64_t numpages;
> +uint64_t hpagesize;
> +} PageRange;
> +
>  int qemu_get_thread_id(void)
>  {
>  #if defined(__linux__)
> @@ -323,6 +330,52 @@ static void sigbus_handler(int signal)
>  siglongjmp(sigjump, 1);
>  }
>  
> +static void *do_touch_pages(void *arg)
> +{
> +PageRange *range = (PageRange *)arg;
> +char *start_addr = range->addr;
> +uint64_t numpages = range->numpages;
> +uint64_t hpagesize = range->hpagesize;
> +uint64_t i = 0;
> +
> +for (i = 0; i < numpages; i++) {
> +memset(start_addr + (hpagesize * i), 0, 1);
> +}
> +qemu_thread_exit(NULL);
> +
> +return NULL;
> +}
> +
> +static int touch_all_pages(char *area, size_t hpagesize, size_t numpages)
> +{
> +QemuThread page_threads[PAGE_TOUCH_THREAD_COUNT];
> +PageRange page_range[PAGE_TOUCH_THREAD_COUNT];
> +uint64_tnumpage_per_thread, size_per_thread;
> +int i = 0, tcount = 0;
> +
> +numpage_per_thread = (numpages / PAGE_TOUCH_THREAD_COUNT);
> +size_per_thread = (hpagesize * numpage_per_thread);
> +for (i = 0; i < (PAGE_TOUCH_THREAD_COUNT - 1); i++) {
> +page_range[i].addr = area;
> +page_range[i].numpages = numpage_per_thread;
> +page_range[i].hpagesize = hpagesize;
> +
> +qemu_thread_create(page_threads + i, "touch_pages",
> +   do_touch_pages, (page_range + i),
> +   QEMU_THREAD_JOINABLE);
> +tcount++;
> +area += size_per_thread;
> +numpages -= numpage_per_thread;
> +}
> +for (i = 0; i < numpages; i++) {
> +memset(area + (hpagesize * i), 0, 1);
> +}
> +for (i = 0; i < tcount; i++) {
> +qemu_thread_join(page_threads + i);
> +}
> +return 0;
> +}
> +
>  void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
>  {
>  int ret;
> @@ -353,9 +406,14 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
> Error **errp)
>  size_t hpagesize = qemu_fd_getpagesize(fd);
>  size_t numpages = DIV_ROUND_UP(memory, hpagesize);
>  
> -/* MAP_POPULATE silently ignores failures */
> -for (i = 0; i < numpages; i++) {
> -memset(area + (hpagesize * i), 0, 1);
> +/* touch pages simult

Re: [Qemu-devel] [PATCH v8 16/25] cputlb and arm/sparc targets: convert mmuidx flushes from varg to bitmap

2017-01-27 Thread Artyom Tarasenko

On Fri, Jan 27, 2017 at 11:34 AM, Alex Bennée  wrote:
> While the vargs approach was flexible the original MTTCG ended up
> having munge the bits to a bitmap so the data could be used in
> deferred work helpers. Instead of hiding that in cputlb we push the
> change to the API to make it take a bitmap of MMU indexes instead.
>
> This change is fairly mechanical but as storing the actual index is
> useful for cases like the current running context. As a result the
> constants are renamed to ARMMMUBit_foo and a couple of helper
> functions added to convert between a single bit and a scalar index.
>
> Signed-off-by: Alex Bennée 

The sparc part:
Reviewed-by: Artyom Tarasenko 

---
>  cputlb.c   |  60 +---
>  include/exec/exec-all.h|  13 +--
>  target/arm/cpu.h   |  41 +---
>  target/arm/helper.c| 227 
> ++---
>  target/arm/translate-a64.c |  14 +--
>  target/arm/translate.c |  24 +++--
>  target/arm/translate.h |   4 +-
>  target/sparc/ldst_helper.c |   8 +-
>  8 files changed, 194 insertions(+), 197 deletions(-)
>
> diff --git a/cputlb.c b/cputlb.c
> index 5dfd3c3ba9..97e5c12de8 100644
> --- a/cputlb.c
> +++ b/cputlb.c
> @@ -122,26 +122,25 @@ void tlb_flush(CPUState *cpu)
>  }
>  }
>
> -static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, va_list argp)
> +static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
>  {
>  CPUArchState *env = cpu->env_ptr;
> +unsigned long mmu_idx_bitmask = idxmap;
> +int mmu_idx;
>
>  assert_cpu_is_self(cpu);
>  tlb_debug("start\n");
>
>  tb_lock();
>
> -for (;;) {
> -int mmu_idx = va_arg(argp, int);
> -
> -if (mmu_idx < 0) {
> -break;
> -}
> +for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
>
> -tlb_debug("%d\n", mmu_idx);
> +if (test_bit(mmu_idx, &mmu_idx_bitmask)) {
> +tlb_debug("%d\n", mmu_idx);
>
> -memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
> -memset(env->tlb_v_table[mmu_idx], -1, sizeof(env->tlb_v_table[0]));
> +memset(env->tlb_table[mmu_idx], -1, sizeof(env->tlb_table[0]));
> +memset(env->tlb_v_table[mmu_idx], -1, 
> sizeof(env->tlb_v_table[0]));
> +}
>  }
>
>  memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
> @@ -149,12 +148,9 @@ static inline void v_tlb_flush_by_mmuidx(CPUState *cpu, 
> va_list argp)
>  tb_unlock();
>  }
>
> -void tlb_flush_by_mmuidx(CPUState *cpu, ...)
> +void tlb_flush_by_mmuidx(CPUState *cpu, uint16_t idxmap)
>  {
> -va_list argp;
> -va_start(argp, cpu);
> -v_tlb_flush_by_mmuidx(cpu, argp);
> -va_end(argp);
> +v_tlb_flush_by_mmuidx(cpu, idxmap);
>  }
>
>  static inline void tlb_flush_entry(CPUTLBEntry *tlb_entry, target_ulong addr)
> @@ -219,13 +215,11 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>  }
>  }
>
> -void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, ...)
> +void tlb_flush_page_by_mmuidx(CPUState *cpu, target_ulong addr, uint16_t 
> idxmap)
>  {
>  CPUArchState *env = cpu->env_ptr;
> -int i, k;
> -va_list argp;
> -
> -va_start(argp, addr);
> +unsigned long mmu_idx_bitmap = idxmap;
> +int i, page, mmu_idx;
>
>  assert_cpu_is_self(cpu);
>  tlb_debug("addr "TARGET_FMT_lx"\n", addr);
> @@ -236,31 +230,23 @@ void tlb_flush_page_by_mmuidx(CPUState *cpu, 
> target_ulong addr, ...)
>TARGET_FMT_lx "/" TARGET_FMT_lx ")\n",
>env->tlb_flush_addr, env->tlb_flush_mask);
>
> -v_tlb_flush_by_mmuidx(cpu, argp);
> -va_end(argp);
> +v_tlb_flush_by_mmuidx(cpu, idxmap);
>  return;
>  }
>
>  addr &= TARGET_PAGE_MASK;
> -i = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> -
> -for (;;) {
> -int mmu_idx = va_arg(argp, int);
> +page = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
>
> -if (mmu_idx < 0) {
> -break;
> -}
> -
> -tlb_debug("idx %d\n", mmu_idx);
> -
> -tlb_flush_entry(&env->tlb_table[mmu_idx][i], addr);
> +for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
> +if (test_bit(mmu_idx, &mmu_idx_bitmap)) {
> +tlb_flush_entry(&env->tlb_table[mmu_idx][page], addr);
>
> -/* check whether there are vltb entries that need to be flushed */
> -for (k = 0; k < CPU_VTLB_SIZE; k++) {
> -tlb_flush_entry(&env->tlb_v_table[mmu_idx][k], addr);
> +/* check whether there are vltb entries that need to be flushed 
> */
> +for (i = 0; i < CPU_VTLB_SIZE; i++) {
> +tlb_flush_entry(&env->tlb_v_table[mmu_idx][i], addr);
> +}
>  }
>  }
> -va_end(argp);
>
>  tb_flush_jmp_cache(cpu, addr);
>  }
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e43cb68355..a6c17ed74a 100644
> --- a/include/exec/exec-all.h

Re: [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time.

2017-01-27 Thread Paolo Bonzini



On 27/01/2017 13:53, Juan Quintela wrote:
>> +static void *do_touch_pages(void *arg)
>> +{
>> +PageRange *range = (PageRange *)arg;
>> +char *start_addr = range->addr;
>> +uint64_t numpages = range->numpages;
>> +uint64_t hpagesize = range->hpagesize;
>> +uint64_t i = 0;
>> +
>> +for (i = 0; i < numpages; i++) {
>> +memset(start_addr + (hpagesize * i), 0, 1);
> 
> I would use the range->addr and similar here directly, but it is just a
> question of taste.
> 
>> -/* MAP_POPULATE silently ignores failures */
>> -for (i = 0; i < numpages; i++) {
>> -memset(area + (hpagesize * i), 0, 1);
>> +/* touch pages simultaneously for memory >= 64G */
>> +if (memory < (1ULL << 36)) {
> 
> 64GB guest already took quite a bit of time, I think I would put it
> always as min(num_vcpus, 16).  So, we always execute the multiple theard
> codepath?

I too would like some kind of heuristic to choose the number of threads.
 Juan's suggested usage of the VCPUs (smp_cpus) is a good one.

Paolo

Re: [Qemu-devel] [PATCH 03/10] armv7m: add state for v7M CCR, CFSR, HFSR, DFSR, MMFAR, BFAR

2017-01-27 Thread Peter Maydell

On 27 January 2017 at 12:28, Alex Bennée  wrote:
>
> Peter Maydell  writes:
>
>> Add the structure fields, VMState fields, reset code and macros for
>> the v7M system control registers CCR, CFSR, HFSR, DFSR, MMFAR and
>> BFAR.
>>
>> Signed-off-by: Peter Maydell 
>> ---
>>  target/arm/cpu.h | 54 
>> 
>>  target/arm/cpu.c |  7 +++
>>  target/arm/machine.c | 10 --
>>  3 files changed, 69 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index b2cc329..4b062d2 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -21,6 +21,7 @@
>>  #define ARM_CPU_H
>>
>>  #include "kvm-consts.h"
>> +#include "hw/registerfields.h"
>>
>>  #if defined(TARGET_AARCH64)
>>/* AArch64 definitions */
>> @@ -405,6 +406,12 @@ typedef struct CPUARMState {
>>  uint32_t vecbase;
>>  uint32_t basepri;
>>  uint32_t control;
>> +uint32_t ccr; /* Configuration and Control */
>> +uint32_t cfsr; /* Configurable Fault Status */
>> +uint32_t hfsr; /* HardFault Status */
>> +uint32_t dfsr; /* Debug Fault Status Register */
>> +uint32_t mmfar; /* MemManage Fault Address */
>> +uint32_t bfar; /* BusFault Address */
>
> Given the CPUARMState needs to be accessed via env do we need to start
> getting concerned about its size?

We only care that accesses to the front of it are within easy
reach (specifically, accesses to fields via frequently used
TCG globals); it doesn't matter if more things are added at
the end.

thanks
-- PMM

Re: [Qemu-devel] [PATCH 02/10] armv7m_nvic: keep a pointer to the CPU

2017-01-27 Thread Peter Maydell

On 27 January 2017 at 12:41, Alex Bennée  wrote:
>> @@ -349,7 +347,6 @@ static void nvic_writel(nvic_state *s, uint32_t offset, 
>> uint32_t value)
>>  }
>>  break;
>>  case 0xd08: /* Vector Table Offset.  */
>> -cpu = ARM_CPU(qemu_get_cpu(0));
>>  cpu->env.v7m.vecbase = value & 0xff80;
>
> Given it is only used once here you could just indirect it:
>
>s->cpu->env.v7m.vecbase = value & 0xff80;

Two reasons not to do that:
(1) it makes this patch easier to review if all it's
doing is deleting lines that set cpu
(2) future patches improving the NVIC support are going
to add more cases that want to use the cpu pointer

thanks
-- PMM

1 2 3 >

1 - 100 of 273 matches

Mail list logo