date:20171110

Re: [Qemu-devel] [PATCH] thread-posix: fix qemu_rec_mutex_trylock macro

2017-11-10 Thread Paolo Bonzini

On 10/11/2017 01:30, Emilio G. Cota wrote:
> We never noticed because it has no users.
> 
> Signed-off-by: Emilio G. Cota 
> ---
>  include/qemu/thread-posix.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/qemu/thread-posix.h b/include/qemu/thread-posix.h
> index f4296d3..f3f47e4 100644
> --- a/include/qemu/thread-posix.h
> +++ b/include/qemu/thread-posix.h
> @@ -7,7 +7,7 @@
>  typedef QemuMutex QemuRecMutex;
>  #define qemu_rec_mutex_destroy qemu_mutex_destroy
>  #define qemu_rec_mutex_lock qemu_mutex_lock
> -#define qemu_rec_mutex_try_lock qemu_mutex_try_lock
> +#define qemu_rec_mutex_trylock qemu_mutex_trylock
>  #define qemu_rec_mutex_unlock qemu_mutex_unlock
>  
>  struct QemuMutex {
> 

Queued, thanks.

Paolo

Re: [Qemu-devel] [RFC PATCH 19/26] cpu-exec: reset exit flag before calling cpu_exec_nocache

2017-11-10 Thread Pavel Dovgalyuk

> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> On 03/11/2017 09:27, Pavel Dovgalyuk wrote:
> >> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >> On 02/11/2017 12:33, Paolo Bonzini wrote:
> >>> On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
> >>>>> I am not sure about this.  I think if instead you should return false
> >>>>> from here and EXCP_INTERRUPT from cpu_exec.
> >>>> The problem is inside the TB. It checks cpu->icount_decr.u16.high which
> >>>> is -1.
> >>>> And we have to enter the TB to cause an exception (because it exists in
> >>>> replay log).
> >>>> That is why we reset this flag and try to execute the TB.
> >>>
> >>> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
> >>> "Finally, check if we need to exit to the main loop" in
> >>> cpu_handle_interrupt)?  Then only cause the exception when that one is
> >>> processed.
> >>
> >> ... indeed, you probably need something like:
> >>
> >>     /* Clear the interrupt flag now since we're processing
> >>      * cpu->interrupt_request and cpu->exit_request.
> >>      */
> >>     insns_left = atomic_read(&cpu->icount_decr.u32);
> >>     atomic_set(&cpu->icount_decr.u16.high, 0);
> >>     if (unlikely(insns_left < 0)) {
> >>         /* Ensure the zeroing of icount_decr comes before the next read
> >>          * of cpu->exit_request or cpu->interrupt_request.
> >>          */
> >>         smp_mb();
> >>     }
> >>
> >> at the top of cpu_handle_interrupt.  Then you can remove the same
> >> atomic_set+smp_mb in cpu_loop_exec_tb, like
> >>
> >>     *last_tb = NULL;
> >>     insns_left = atomic_read(&cpu->icount_decr.u32);
> >>     if (insns_left < 0) {
> >>         /* Something asked us to stop executing chained TBs; just
> >>          * continue round the main loop. Whatever requested the exit
> >>          * will also have set something else (eg exit_request or
> >>          * interrupt_request) which will be handled by
> >>          * cpu_handle_interrupt.  cpu_handle_interrupt will also
> >>          * clear cpu->icount_decr.u16.high.
> >>          */
> >>         return;
> >>     }
> >
> > I tried this approach and it didn't work.
> > I think iothread sets u16.high flag after resetting it in 
> > cpu_handle_interrupt.
> 
> But why is this a problem?  The TB would exit immediately and go again
> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
> cpu_handle_exception causes the exception via cpu_exec_nocache.

I've tested your variant more thoroughly.
It seems that the iothread calls cpu_exec between
atomic_set(&cpu->icount_decr.u16.high, 0);
in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
I see no other reason, because this does not happen every time.
And cpu_handle_interrupt is not called again, because cpu_handle_exception 
returns true.
Therefore we have an infinite loop, because no other code here resets 
cpu->icount_decr.u16.high.

Pavel Dovgalyuk

Re: [Qemu-devel] [PATCH] target-i386: adds PV_TLB_FLUSH CPUID feature bit

2017-11-10 Thread Paolo Bonzini

On 10/11/2017 08:54, Wanpeng Li wrote:
> 2017-11-10 15:45 GMT+08:00 Wanpeng Li :
>> From: Wanpeng Li 
>>
>> Adds PV_TLB_FLUSH CPUID feature bit.
>>
>> Cc: Paolo Bonzini 
>> Cc: Radim Krčmář 
>> Cc: Richard Henderson 
>> Cc: Eduardo Habkost 
>> Signed-off-by: Wanpeng Li 
>> ---
>>  target/i386/cpu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 6f21a5e..ecebc5a 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -347,7 +347,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = {
>>  .feat_names = {
>>  "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
>>  "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
>> -NULL, NULL, NULL, NULL,
>> +NULL, "kvm-pv-tlb-flush", NULL, NULL,
> 
> Note: bit 8 is reserved for the PV_DEDICATED which is posted by
> another guy in kvm community.
> 
>>  NULL, NULL, NULL, NULL,
>>  NULL, NULL, NULL, NULL,
>>  NULL, NULL, NULL, NULL,
>> --
>> 2.7.4
>>

I'd like PV_DEDICATED to be bit 0 in another feature word (it's a
performance hint, not a feature), but it's okay to use bit 9.

Paolo
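
The relationship between the position of a name in `feat_names` and the CPUID bit can be sketched as follows. This is only an illustration: the array copies the names visible in the quoted patch, while `feature_bit()` and `feature_mask()` are hypothetical helpers, not functions from target/i386/cpu.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative copy of the names from the quoted patch; the array index
 * is the bit number within the 32-bit feature word. Unlisted entries
 * are NULL. */
static const char *const kvm_feat_names[32] = {
    "kvmclock", "kvm-nopiodelay", "kvm-mmu", "kvmclock",
    "kvm-asyncpf", "kvm-steal-time", "kvm-pv-eoi", "kvm-pv-unhalt",
    NULL, "kvm-pv-tlb-flush", NULL, NULL,
};

/* Hypothetical helper: bit number of the first entry matching name, or -1. */
static int feature_bit(const char *name)
{
    for (int i = 0; i < 32; i++) {
        if (kvm_feat_names[i] && strcmp(kvm_feat_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Hypothetical helper: mask for the named feature, or 0 if unknown. */
static uint32_t feature_mask(const char *name)
{
    int bit = feature_bit(name);
    return bit < 0 ? 0 : (uint32_t)1 << bit;
}
```

With this layout the new name sits at index 9, so `feature_mask("kvm-pv-tlb-flush")` is `1 << 9`, and index 8 stays free for the bit that the thread says is reserved.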

Re: [Qemu-devel] [RFC PATCH 19/26] cpu-exec: reset exit flag before calling cpu_exec_nocache

2017-11-10 Thread Paolo Bonzini

On 10/11/2017 09:20, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> On 03/11/2017 09:27, Pavel Dovgalyuk wrote:
>>>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>>>> On 02/11/2017 12:33, Paolo Bonzini wrote:
>>>>> On 02/11/2017 12:24, Pavel Dovgalyuk wrote:
>>>>>>> I am not sure about this.  I think if instead you should return false
>>>>>>> from here and EXCP_INTERRUPT from cpu_exec.
>>>>>> The problem is inside the TB. It checks cpu->icount_decr.u16.high which
>>>>>> is -1.
>>>>>> And we have to enter the TB to cause an exception (because it exists in
>>>>>> replay log).
>>>>>> That is why we reset this flag and try to execute the TB.
>>>>>
>>>>> But if u16.high is -1, shouldn't you return EXCP_INTERRUPT first (via
>>>>> "Finally, check if we need to exit to the main loop" in
>>>>> cpu_handle_interrupt)?  Then only cause the exception when that one is
>>>>> processed.
>>>>
>>>> ... indeed, you probably need something like:
>>>>
>>>>     /* Clear the interrupt flag now since we're processing
>>>>      * cpu->interrupt_request and cpu->exit_request.
>>>>      */
>>>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>>>     atomic_set(&cpu->icount_decr.u16.high, 0);
>>>>     if (unlikely(insns_left < 0)) {
>>>>         /* Ensure the zeroing of icount_decr comes before the next read
>>>>          * of cpu->exit_request or cpu->interrupt_request.
>>>>          */
>>>>         smp_mb();
>>>>     }
>>>>
>>>> at the top of cpu_handle_interrupt.  Then you can remove the same
>>>> atomic_set+smp_mb in cpu_loop_exec_tb, like
>>>>
>>>>     *last_tb = NULL;
>>>>     insns_left = atomic_read(&cpu->icount_decr.u32);
>>>>     if (insns_left < 0) {
>>>>         /* Something asked us to stop executing chained TBs; just
>>>>          * continue round the main loop. Whatever requested the exit
>>>>          * will also have set something else (eg exit_request or
>>>>          * interrupt_request) which will be handled by
>>>>          * cpu_handle_interrupt.  cpu_handle_interrupt will also
>>>>          * clear cpu->icount_decr.u16.high.
>>>>          */
>>>>         return;
>>>>     }
>>>
>>> I tried this approach and it didn't work.
>>> I think iothread sets u16.high flag after resetting it in 
>>> cpu_handle_interrupt.
>>
>> But why is this a problem?  The TB would exit immediately and go again
>> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
>> cpu_handle_exception causes the exception via cpu_exec_nocache.
> 
> I've tested your variant more thoroughly.
> It seems that the iothread calls cpu_exec between
> atomic_set(&cpu->icount_decr.u16.high, 0);
> in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
> I see no other reason, because this does not happen every time.
> And cpu_handle_interrupt is not called again, because cpu_handle_exception 
> returns true.
> Therefore we have an infinite loop, because no other code here resets 
> cpu->icount_decr.u16.high.

Then returning true unconditionally is wrong in the cpu_exec_nocache
case.  What if you do:

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 61297f8f4a..fb5446be3e 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -470,7 +470,19 @@ static inline void cpu_handle_debug_exception(CPUState *cpu)
 
 static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
 {
-    if (cpu->exception_index >= 0) {
+    if (cpu->exception_index < 0) {
+#ifndef CONFIG_USER_ONLY
+        if (replay_has_exception()
+            && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
+            /* try to cause an exception pending in the log */
+            cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
+        }
+#endif
+        if (cpu->exception_index < 0) {
+            return false;
+        }
+    }
+
         if (cpu->exception_index >= EXCP_INTERRUPT) {
             /* exit request from the cpu execution loop */
             *ret = cpu->exception_index;
@@ -505,16 +517,6 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
             }
 #endif
         }
-#ifndef CONFIG_USER_ONLY
-    } else if (replay_has_exception()
-               && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
-        /* try to cause an exception pending in the log */
-        cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
-        *ret = -1;
-        return true;
-#endif
-    }
-
     return false;
 }

Re: [Qemu-devel] [PATCH] build: Don't force preserving permissions on config-devices.mak.old

2017-11-10 Thread Markus Armbruster

Peter Maydell  writes:

> On 20 October 2017 at 20:08, Stefan Weil  wrote:
>> Am 20.10.2017 um 20:24 schrieb alind...@codeaurora.org:
>>> On 2017-10-20 05:27, Peter Maydell wrote:
>>>> Do we even need this code at all? As far as I can tell from
>>>> the git logs, the idea is to support users who hand-modify
>>>> config-devices.mak. But do we want to support that? I would
>>>> think of config-devices.mak as an internal part of the build
>>>> machinery, and the bit you can edit as a user is the stuff
>>>> in default-configs/.
>>
>> It's a long time since I wrote that code, but when I look at
>> the commit message for my commit 012f0879234, it was written
>> for users who do _not_ hand-modify config-devices.mak. They
>> had a problem when they updated the code from git and the
>> new version had changed some of the device configurations
>> which were used to build config-devices.mak.
>
> Right, but it's only this complicated set of conditions
> because we seem to be also trying to support the hand-modify
> case. Otherwise we could just generate the new version
> and copy it into place if it's changed, unconditionally...

So, do we need this patch?  If yes, who's going to merge it?  If no, do
we need some other patch?

Re: [Qemu-devel] [Qemu devel PATCH] MAINTAINERS: Add entries for Smartfusion2

2017-11-10 Thread sundeep subbaraya

Hi Guys,

On Fri, Nov 10, 2017 at 5:52 AM, Philippe Mathieu-Daudé 
wrote:

> On 11/09/2017 08:55 PM, Peter Maydell wrote:
> > On 9 November 2017 at 21:46, Philippe Mathieu-Daudé 
> wrote:
> >> Hi Subbaraya,
> >>
> >> On 11/09/2017 09:02 AM, Subbaraya Sundeep wrote:
> >>> add voluntarily myself as maintainer for Smartfusion2
> >>
> >> You need to share your GnuPG key signed, I couldn't find it using
> >> http://pgp.mit.edu/pks/lookup?search=Subbaraya+Sundeep
> >>
> >> from https://wiki.qemu.org/Contribute/SubmitAPullRequest :
> >
> > I don't in general expect to take pull requests from
> > everybody listed as a maintainer in the MAINTAINERS file.
> > That just means "I'm going to be reviewing and should
> > be cc'd on patches". Pull requests are sent by people
> > who are maintainers for a subsystem. Rule of thumb:
> > unless somebody asks you to send a pull request, you
> > don't need to do it.
>
> Ok, please excuse my misunderstanding. I still think the M: entry
> stands for 'Maintainer' instead of 'Mail', and still don't understand the
> difference with a "Designated reviewer" (R: entry):
>
> M: Mail patches to: FullName 
> R: Designated reviewer: FullName 
>These reviewers should be CCed on patches.
>
> "Designated reviewer" seems to duplicate the M: entry and is therefore
> confusing. Can we simply remove it instead?
>
> When introduced in fdf6fab4df4 the explanation was:
>
> --
> Some people are not content with the amount of mail they get, and would
> like to be CCed on patches for areas they do not maintain.  Let them
> satisfy their own appetite for qemu-devel messages.
>
> Seriously: the purpose here is a bit different from the Linux kernel.
> While Linux uses "R" to designate non-maintainers for reviewing patches
> in a given area, in QEMU I would also like to use "R" so that people can
> delegate sending pull requests while keeping some degree of oversight.
>

Do you want me to remove M: and put only R: ?

Thanks,
Sundeep

> --
>
> Regards,
>
> Phil.
>

[Qemu-devel] [PATCH] fix scripts/update-linux-headers.sh here document

2017-11-10 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 scripts/update-linux-headers.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 8b847e279b..e2b159aa3d 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -108,7 +108,7 @@ for arch in $ARCHLIST; do
 if [ $arch = x86 ]; then
 cat <<-EOF >"$output/include/standard-headers/asm-x86/hyperv.h"
 /* this is a temporary placeholder until kvm_para.h stops including it */
-EOF
+EOF
 cp "$tmpdir/include/asm/unistd_32.h" "$output/linux-headers/asm-x86/"
 cp "$tmpdir/include/asm/unistd_x32.h" "$output/linux-headers/asm-x86/"
 cp "$tmpdir/include/asm/unistd_64.h" "$output/linux-headers/asm-x86/"
-- 
2.9.3

Re: [Qemu-devel] [PATCH v6 6/6] tests: Add check-qobject for equality tests

2017-11-10 Thread Markus Armbruster

Max Reitz  writes:

> Add a new test file (check-qobject.c) for unit tests that concern
> QObjects as a whole.
>
> Its only purpose for now is to test the qobject_is_equal() function.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Markus Armbruster

Re: [Qemu-devel] [PATCH v6 0/6] block: Don't compare strings in bdrv_reopen_prepare()

2017-11-10 Thread Markus Armbruster

Max Reitz  writes:

> bdrv_reopen_prepare() assumes that all BDS options are strings, which is
> not necessarily correct. This series introduces a new qobject_is_equal()
> function which can be used to test whether any options have changed,
> independently of their type.

Series looks ready to me.  It touches QAPI to achieve its purpose in the
block layer; I'd be fine with merging it via a block tree.
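
The idea of comparing option values independently of their type can be sketched with a small tagged union. This is a toy model under stated assumptions: `Val`, `ValType` and `val_is_equal()` are hypothetical names, and the real `qobject_is_equal()` operates on QEMU's QObject types, whose exact semantics are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical tagged value standing in for an option of arbitrary type. */
typedef enum { VAL_INT, VAL_STR, VAL_BOOL } ValType;

typedef struct {
    ValType type;
    union {
        long long i;
        const char *s;
        bool b;
    } u;
} Val;

/* Compare two values: different types are never equal, and each type
 * uses its own comparison instead of assuming everything is a string. */
static bool val_is_equal(const Val *x, const Val *y)
{
    if (x->type != y->type) {
        return false;
    }
    switch (x->type) {
    case VAL_INT:
        return x->u.i == y->u.i;
    case VAL_STR:
        return strcmp(x->u.s, y->u.s) == 0;
    case VAL_BOOL:
        return x->u.b == y->u.b;
    }
    return false;
}
```

In this model an integer 1 and the string "1" compare unequal, which a string-only comparison could not express safely.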

Re: [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-10 Thread Stefan Hajnoczi

On Thu, Nov 09, 2017 at 09:43:15PM +0100, Max Reitz wrote:
> Draining a BDS may lead to graph modifications, which in turn may result
> in it and other BDS being stripped of their current references.  If
> bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
> references themselves, the BDS they are trying to drain (or undrain) may
> disappear right under their feet -- or, more specifically, under the
> feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().
> 
> This fixes an occasional hang of iotest 194.
> 
> Signed-off-by: Max Reitz 
> ---
>  block/io.c | 47 ---
>  1 file changed, 44 insertions(+), 3 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 3d5ef2cabe..a0a2833e8e 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -340,7 +340,10 @@ void bdrv_drain_all_begin(void)
>  bool waited = true;
>  BlockDriverState *bs;
>  BdrvNextIterator it;
> -GSList *aio_ctxs = NULL, *ctx;
> +GSList *aio_ctxs = NULL, *ctx, *bs_list = NULL, *bs_list_entry;
> +
> +/* Must be called from the main loop */
> +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>  
>  block_job_pause_all();
>  
> @@ -355,6 +358,12 @@ void bdrv_drain_all_begin(void)
>  if (!g_slist_find(aio_ctxs, aio_context)) {
>  aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
>  }
> +
> +/* Keep a strong reference to all root BDS and copy them into
> + * an own list because draining them may lead to graph
> + * modifications. */
> +bdrv_ref(bs);
> +bs_list = g_slist_prepend(bs_list, bs);
>  }
>  
>  /* Note that completion of an asynchronous I/O operation can trigger any
> @@ -370,7 +379,11 @@ void bdrv_drain_all_begin(void)
>  AioContext *aio_context = ctx->data;
>  
>  aio_context_acquire(aio_context);
> -for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
> +for (bs_list_entry = bs_list; bs_list_entry;
> + bs_list_entry = bs_list_entry->next)
> +{
> +bs = bs_list_entry->data;
> +
>  if (aio_context == bdrv_get_aio_context(bs)) {
>  waited |= bdrv_drain_recurse(bs, true);
>  }
> @@ -379,24 +392,52 @@ void bdrv_drain_all_begin(void)
>  }
>  }
>  
> +for (bs_list_entry = bs_list; bs_list_entry;
> + bs_list_entry = bs_list_entry->next)
> +{
> +bdrv_unref(bs_list_entry->data);
> +}
> +
>  g_slist_free(aio_ctxs);
> +g_slist_free(bs_list);
>  }

Which specific parts of this function access bs without a reference?

I see bdrv_next() may do QTAILQ_NEXT(bs, monitor_list) after
bdrv_drain_recurse() has returned.

Anything else?

If bdrv_next() is the only issue then I agree with Fam that it makes
sense to build the ref/unref into bdrv_next().
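
What "building the ref/unref into the iterator" could look like is sketched below with toy types. `Node`, `Iter`, `iter_first()` and `iter_next()` are hypothetical and are not QEMU's `BdrvNextIterator` API; the point is only that the iterator keeps a reference on the current element, so reading its `next` pointer stays valid even if the caller's work dropped every other reference.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted list node (hypothetical, not a BlockDriverState). */
typedef struct Node {
    int refcnt;
    struct Node *next;
} Node;

static void node_ref(Node *n)
{
    n->refcnt++;
}

static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    n->refcnt--;   /* a real implementation would free the node at zero */
}

typedef struct {
    Node *cur;
} Iter;

/* Start iterating: hold a reference on the first node while it is current. */
static Node *iter_first(Iter *it, Node *head)
{
    it->cur = head;
    if (it->cur) {
        node_ref(it->cur);
    }
    return it->cur;
}

/* Advance: read ->next while the old node is still referenced, take a
 * reference on the new node, and only then drop the old reference. */
static Node *iter_next(Iter *it)
{
    Node *prev = it->cur;

    it->cur = prev ? prev->next : NULL;
    if (it->cur) {
        node_ref(it->cur);
    }
    if (prev) {
        node_unref(prev);
    }
    return it->cur;
}
```

A caller would loop with `for (n = iter_first(&it, head); n; n = iter_next(&it))`; leaving the loop early would need an extra call to drop the reference still held on the current node.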



Re: [Qemu-devel] [PATCH 2/7] s390x/pci: rework PCI STORE

2017-11-10 Thread Yi Min Zhao




On 2017/11/10 12:50 AM, Cornelia Huck wrote:
> On Tue,  7 Nov 2017 18:24:34 +0100
> Pierre Morel  wrote:
>
>> Enhance the fault detection, correction of the fault reporting.
>>
>> Signed-off-by: Pierre Morel 
>> Reviewed-by: Yi Min Zhao 
>> ---
>>   hw/s390x/s390-pci-inst.c | 41 -
>>   1 file changed, 24 insertions(+), 17 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
>> index 8fcb02d..4a2f996 100644
>> --- a/hw/s390x/s390-pci-inst.c
>> +++ b/hw/s390x/s390-pci-inst.c
>> @@ -469,6 +469,12 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>>   pcias = (env->regs[r2] >> 16) & 0xf;
>>   len = env->regs[r2] & 0xf;
>>   offset = env->regs[r2 + 1];
>> +data = env->regs[r1];
>> +
>> +if (!(fh & FH_MASK_ENABLE)) {
> This covers the reserved/standby/disabled states, right?

yes

[...]

Re: [Qemu-devel] [PATCH V4] hw/pci-host: Fix x86 Host Bridges 64bit PCI hole

2017-11-10 Thread Laszlo Ersek

Hi Marcel,

On 11/09/17 18:27, Marcel Apfelbaum wrote:
> Currently there is no MMIO range over 4G
> reserved for PCI hotplug. Since the 32bit PCI hole
> depends on the number of cold-plugged PCI devices
> and other factors, it is very possible that it is too small
> to hotplug PCI devices with large BARs.
> 
> Fix it by reserving 2G for the I440FX chipset
> in order to comply with older Win32 Guest OSes
> and 32G for the Q35 chipset.
> 
> Even if the new defaults of pci-hole64-size will appear in
> "info qtree" also for older machines, the property was
> not implemented so no changes will be visible to guests.
> 
> Note this is a regression since prev QEMU versions had
> some range reserved for 64bit PCI hotplug.
> 
> Reviewed-by: Laszlo Ersek 
> Reviewed-by: Gerd Hoffmann 
> Signed-off-by: Marcel Apfelbaum 
> ---
> 
> V3 -> V4:
>  - Addressed Laszlo's comments:
> - Added defines for pci-hole64 default size props.
> - Rounded the hole64_end to 1G
> - Moved some info to commit message

Looks good to me, but a new variable's name is a bit misleading:

>  - Addressed Michael's comments:
> - Added more comments.
>  - I kept Gerd's "review-by" tag since no functional changes were made.
> 
> V2 -> V3:
>  - Addressed Gerd's and others comments and re-enabled the pci-hole64-size
>property defaulting it to 2G for I440FX and 32g for Q35.
>  - Even if the new defaults of pci-hole64-size will appear in "info qtree"
>also for older machines, the property was not implemented so
>no changes will be visible to guests.
> 
> V1 -> V2:
>  Addressed Igor's comments:
> - aligned the hole64 start to 1Gb
>  (I think all the computations took care of it already,
>   but it can't hurt)
> - Init compat props to "off" instead of "false"
> 
>  hw/i386/pc.c  | 22 ++
>  hw/pci-host/piix.c| 32 ++--
>  hw/pci-host/q35.c | 35 ---
>  include/hw/i386/pc.h  | 10 +-
>  include/hw/pci-host/q35.h |  1 +
>  5 files changed, 94 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index e11a65b545..fafe5ba5cd 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1448,6 +1448,28 @@ void pc_memory_init(PCMachineState *pcms,
>  pcms->ioapic_as = &address_space_memory;
>  }
>  
> +/*
> + * The 64bit pci hole starts after "above 4G RAM" and
> + * potentially the space reserved for memory hotplug.
> + */
> +uint64_t pc_pci_hole64_start(void)
> +{
> +PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> +uint64_t hole64_start = 0;
> +
> +if (pcmc->has_reserved_memory && pcms->hotplug_memory.base) {
> +hole64_start = pcms->hotplug_memory.base;
> +if (!pcmc->broken_reserved_end) {
> +hole64_start += memory_region_size(&pcms->hotplug_memory.mr);
> +}
> +} else {
> +hole64_start = 0x100000000ULL + pcms->above_4g_mem_size;
> +}
> +
> +return ROUND_UP(hole64_start, 1ULL << 30);
> +}
> +
>  qemu_irq pc_allocate_cpu_irq(void)
>  {
>  return qemu_allocate_irq(pic_irq_request, NULL, 0);
> diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
> index a7e2256870..f689c31d12 100644
> --- a/hw/pci-host/piix.c
> +++ b/hw/pci-host/piix.c
> @@ -50,6 +50,7 @@ typedef struct I440FXState {
>  PCIHostState parent_obj;
>  Range pci_hole;
>  uint64_t pci_hole64_size;
> +bool pci_hole64_fix;
>  uint32_t short_root_bus;
>  } I440FXState;
>  
> @@ -112,6 +113,9 @@ struct PCII440FXState {
>  #define I440FX_PAM_SIZE 7
>  #define I440FX_SMRAM0x72
>  
> +/* Keep it 2G to comply with older win32 guests */
> +#define I440FX_PCI_HOST_HOLE64_SIZE_DEFAULT (1ULL << 31)
> +
>  /* Older coreboot versions (4.0 and older) read a config register that 
> doesn't
>   * exist in real hardware, to get the RAM size from QEMU.
>   */
> @@ -238,29 +242,52 @@ static void i440fx_pcihost_get_pci_hole_end(Object 
> *obj, Visitor *v,
>  visit_type_uint32(v, name, &value, errp);
>  }
>  
> +/*
> + * The 64bit PCI hole start is set by the Guest firmware
> + * as the address of the first 64bit PCI MEM resource.
> + * If no PCI device has resources on the 64bit area,
> + * the 64bit PCI hole will start after "over 4G RAM" and the
> + * reserved space for memory hotplug if any.
> + */
>  static void i440fx_pcihost_get_pci_hole64_start(Object *obj, Visitor *v,
>  const char *name,
>  void *opaque, Error **errp)
>  {
>  PCIHostState *h = PCI_HOST_BRIDGE(obj);
> +I440FXState *s = I440FX_PCI_HOST_BRIDGE(obj);
>  Range w64;
>  uint64_t value;
>  
>  pci_bus_get_w64_range(h->bus, &w64);
>  value = range_is_empty(&w64) ? 0 : range_lob(&w64);
> +if (!value && s->pci_hole64_fix) {
> +value = pc_pci_hole64_start();
> +}
>  visit_type_uint64(v, name, &
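
The rounding done at the end of the quoted pc_pci_hole64_start() can be checked with a small sketch. It assumes the non-hotplug branch starts at the 4 GiB boundary, as the comment about "above 4G RAM" suggests; `round_up_pow2()` and `hole64_start_sketch()` are illustrative names, not QEMU functions, and the memory-hotplug branch is left out.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Round n up to the next multiple of d; d must be a power of two. */
static uint64_t round_up_pow2(uint64_t n, uint64_t d)
{
    assert(d != 0 && (d & (d - 1)) == 0);
    return (n + d - 1) & ~(d - 1);
}

/* Assumed model of the non-hotplug branch: 4 GiB plus the RAM mapped
 * above 4 GiB, aligned up to 1 GiB. */
static uint64_t hole64_start_sketch(uint64_t above_4g_mem_size)
{
    return round_up_pow2(4 * GiB + above_4g_mem_size, GiB);
}
```

Under these assumptions a guest with 1.5 GiB of RAM above 4 GiB gets a 64-bit hole start of 6 GiB, and a guest with none gets exactly 4 GiB.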

Re: [Qemu-devel] [PATCH 6/7] s390x/pci: move the memory region write from pcistg

2017-11-10 Thread Yi Min Zhao




On 2017/11/10 3:23 AM, Cornelia Huck wrote:
> On Tue,  7 Nov 2017 18:24:38 +0100
> Pierre Morel  wrote:
>
>> Let's move the memory region write from pcistg into a dedicated
>> function.
>> This allows us to prepare a later patch searching for subregions
>> inside of the memory region.
> OK, so here is the memory region write. Do we have any sleeping
> endianness bugs in there for when we wire up tcg? I'm not sure how this
> plays with the bswaps (see patch 1).
>
> But maybe I've just gotten lost somewhere.

I think there's no error. For the PCI BARs' MRs, we get little-endian data,
which exactly matches the byte ordering of the pcilg instruction. For PCI
config space, the data has been swapped according to the CPU byte ordering,
so we use zpci_swap_endian() to swap the data back to little-endian
ordering.



>> Signed-off-by: Pierre Morel 
>> Reviewed-by: Yi Min Zhao 
>> ---
>>   hw/s390x/s390-pci-inst.c | 27 +--
>>   1 file changed, 17 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
>> index 50135a0..97f62b5 100644
>> --- a/hw/s390x/s390-pci-inst.c
>> +++ b/hw/s390x/s390-pci-inst.c
>> @@ -455,12 +455,27 @@ static int trap_msix(S390PCIBusDevice *pbdev, uint64_t offset, uint8_t pcias)
>>   }
>>   }
>>   
>> +static MemTxResult zpci_write_bar(S390PCIBusDevice *pbdev, uint8_t pcias,
>> +  uint64_t offset, uint64_t data, uint8_t len)
>> +{
>> +MemoryRegion *mr;
>> +
>> +if (trap_msix(pbdev, offset, pcias)) {
>> +offset = offset - pbdev->msix.table_offset;
>> +mr = &pbdev->pdev->msix_table_mmio;
>> +} else {
>> +mr = pbdev->pdev->io_regions[pcias].memory;
>> +}
>> +
>> +return memory_region_dispatch_write(mr, offset, data, len,
>> +MEMTXATTRS_UNSPECIFIED);
>> +}
>> +
>>   int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>>   {
>>   CPUS390XState *env = &cpu->env;
>>   uint64_t offset, data;
>>   S390PCIBusDevice *pbdev;
>> -MemoryRegion *mr;
>>   MemTxResult result;
>>   uint8_t len;
>>   uint32_t fh;
>> @@ -517,15 +532,7 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2)
>>   return 0;
>>   }
>>   
>> -if (trap_msix(pbdev, offset, pcias)) {
>> -offset = offset - pbdev->msix.table_offset;
>> -mr = &pbdev->pdev->msix_table_mmio;
>> -} else {
>> -mr = pbdev->pdev->io_regions[pcias].memory;
>> -}
>> -
>> -result = memory_region_dispatch_write(mr, offset, data, len,
>> - MEMTXATTRS_UNSPECIFIED);
>> +result = zpci_write_bar(pbdev, pcias, offset, data, len);
>>   if (result != MEMTX_OK) {
>>   program_interrupt(env, PGM_OPERAND, 4);
>>   return 0;

Re: [Qemu-devel] [PATCH 6/7] s390x/pci: move the memory region write from pcistg

2017-11-10 Thread Cornelia Huck

On Fri, 10 Nov 2017 17:40:12 +0800
Yi Min Zhao  wrote:

> On 2017/11/10 3:23 AM, Cornelia Huck wrote:
> > On Tue,  7 Nov 2017 18:24:38 +0100
> > Pierre Morel  wrote:
> >  
> >> Let's move the memory region write from pcistg into a dedicated
> >> function.
> >> This allows us to prepare a later patch searching for subregions
> >> inside of the memory region.  
> > OK, so here is the memory region write. Do we have any sleeping
> > endianness bugs in there for when we wire up tcg? I'm not sure how this
> > plays with the bswaps (see patch 1).
> >
> > But maybe I've just gotten lost somewhere.  
> I think there's no error. For PCI bars' MRs, we got the little-endian data
> that is exactly fit to the byte ordering of pcilg instruction. For PCI 
> config
> space, the data has been swapped according to the cpu byte ordering.

Host or target cpu?

> So we use zpci_swap_endian() to swap the data back to the little-endian
> ordering.

That swap is unconditional. If we were running on a little-endian host,
it would be wrong, wouldn't it?

> >  
> >> Signed-off-by: Pierre Morel 
> >> Reviewed-by: Yi Min Zhao 
> >> ---
> >>   hw/s390x/s390-pci-inst.c | 27 +--
> >>   1 file changed, 17 insertions(+), 10 deletions(-)
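
The difference between an unconditional byte swap and one that depends on the host byte order, which is what the question above turns on, can be sketched as follows. This is only an illustration and does not settle which behaviour is right for the zPCI code; `bswap32_sketch()`, `host_is_big_endian()` and `host_to_le32_sketch()` are hypothetical names, not QEMU helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unconditional 32-bit byte swap. */
static uint32_t bswap32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Detect the host byte order at run time by inspecting the first byte
 * of a known 16-bit value. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Convert a host-order value to little-endian: swap only on a
 * big-endian host, leave the value alone on a little-endian one. */
static uint32_t host_to_le32_sketch(uint32_t v)
{
    return host_is_big_endian() ? bswap32_sketch(v) : v;
}
```

Calling `bswap32_sketch()` directly gives the little-endian layout only on a big-endian host; on a little-endian host it would produce big-endian data instead, whereas the conditional variant yields the same in-memory bytes on both.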

Re: [Qemu-devel] [Qemu-block] [PATCH] block: all I/O should be completed before removing throttle timers.

2017-11-10 Thread Alberto Garcia

On Sat 21 Oct 2017 07:34:00 AM CEST, Zhengui Li wrote:
> From: Zhengui 
>
> In blk_remove_bs, all I/O should be completed before removing throttle
> timers. If there is inflight I/O, removing the throttle timers here will
> cause the inflight I/O to never return.
> This patch adds bdrv_drained_begin before throttle_timers_detach_aio_context
> so that all I/O is completed before the throttle timers are removed.
>
> Signed-off-by: Zhengui 
> ---
>  block/block-backend.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 45d9101..9edc452 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -660,7 +660,11 @@ void blk_remove_bs(BlockBackend *blk)
>  notifier_list_notify(&blk->remove_bs_notifiers, blk);
>  if (blk->public.throttle_group_member.throttle_state) {
>  tt = &blk->public.throttle_group_member.throttle_timers;
> +BlockDriverState *bs;
> +bs = blk_bs(blk);
> +bdrv_drained_begin(bs);
>  throttle_timers_detach_aio_context(tt);
> +bdrv_drained_end(bs);
>  }
>  
>  blk_update_root_state(blk);

Reviewed-by: Alberto Garcia 

Berto

Re: [Qemu-devel] [PATCH] hw/vfio: improve error message when cannot init vfio event notifiers

2017-11-10 Thread Jim Quigley




On 16/10/2017 19:07, Michael Tokarev wrote:
> 10.10.2017 13:22, Jim Quigley wrote:
>> More information is required to assist trouble-shooting when
>> QEMU fails to initialise the event notifications for devices
>> assigned with VFIO-PCI. Instead of supplying the user with a cryptic
>> error number only, print out a proper error message with strerror()
>> so that the user has a better way to figure out what the problem is.
>>
>> Reviewed-by: Liam Merwick 
>> Signed-off-by: Jim Quigley 
>> ---
>> Cc: qemu-triv...@nongnu.org
>> Cc: m...@tls.msk.ru
>> Cc: laur...@vivier.eu
>> Cc: alex.william...@redhat.com
>> ---
>>   hw/vfio/pci.c | 35 ---
>>   1 file changed, 24 insertions(+), 11 deletions(-)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 31e1edf..3bffb93 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -430,13 +430,16 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>>   static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
>> int vector_n, bool msix)
>>   {
>> -int virq;
>> +int virq, ret;
>>   
>>   if ((msix && vdev->no_kvm_msix) || (!msix && vdev->no_kvm_msi)) {
>>   return;
>>   }
>>   
>> -if (event_notifier_init(&vector->kvm_interrupt, 0)) {
>> +ret = event_notifier_init(&vector->kvm_interrupt, 0);
>> +if (ret) {
>> +error_report("vfio (%s): Error: unable to init event notifier: %s (%d)",
>> + __func__, strerror(-ret), -ret);
>
> Since this pattern gets repeated again and again, maybe we can either
> use a common wrapper or move that error reporting into event_notifier_init()?
> Note there are other users of this function, besides hw/vfio, and maybe
> these, too, can benefit from better error reporting?

Ideally the strerror() would be included in the error_report() function
(as per the error_setg() function), which obviously would involve a more
extensive change to the code base. Would that be an acceptable solution?
Or I can move the reporting into the event_notifier_init() function if that
is the preferred approach?

thanks

regards

Jim Q.

> Thanks,
>
> /mjt
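
The "common wrapper" idea raised in this thread could look roughly like the sketch below. It is a standalone illustration under stated assumptions: `format_init_failure()` is a hypothetical helper that writes into a caller-supplied buffer instead of calling QEMU's `error_report()`, and it assumes, as the quoted patch does, that the init function returns a negative errno value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: if ret is a negative errno value, format a
 * message containing both the strerror() text and the number. The
 * return value is ret unchanged, so the helper can be used directly
 * in an if () condition. */
static int format_init_failure(char *buf, size_t len, const char *who, int ret)
{
    if (ret < 0) {
        snprintf(buf, len, "vfio (%s): unable to init event notifier: %s (%d)",
                 who, strerror(-ret), -ret);
    } else if (len > 0) {
        buf[0] = '\0';   /* success: leave an empty message */
    }
    return ret;
}
```

A call site would then shrink to a single `if (format_init_failure(buf, sizeof(buf), __func__, ret))` followed by whatever reporting function the code base uses, which keeps the message format in one place.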

Re: [Qemu-devel] [Qemu-block] [PATCH v2 4/5] iotests: Make 083 less flaky

2017-11-10 Thread Alberto Garcia

On Thu 09 Nov 2017 09:30:24 PM CET, Max Reitz wrote:
> +echo > "$TEST_DIR/nbd-fault-injector.out"
>   $PYTHON nbd-fault-injector.py $extra_args "$nbd_addr" 
> "$TEST_DIR/nbd-fault-injector.conf" >"$TEST_DIR/nbd-fault-injector.out" 2>&1 &

It seems that in this patch you're indenting with spaces but this file
uses tabs.

Berto

Re: [Qemu-devel] [RFC v3 00/27] QMP: out-of-band (OOB) execution support

2017-11-10 Thread no-reply

Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [RFC v3 00/27] QMP: out-of-band (OOB) execution support
Type: series
Message-id: 20171106094643.14881-1-pet...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20171106094643.14881-1-pet...@redhat.com -> 
patchew/20171106094643.14881-1-pet...@redhat.com
Switched to a new branch 'test'
5525c4e791 tests: qmp-test: add oob test
ccc9e4c399 tests: qmp-test: verify command batching
7f45b4c6c0 docs: update QMP documents for OOB commands
58cfe877d2 monitor: enable IO thread for (qmp & !mux) typed
5e1d56ce74 qmp: isolate responses into io thread
aef4275536 qmp: let migrate-incoming allow out-of-band
5e68beacf3 qmp: support out-of-band (oob) execution
43c7215a30 qapi: introduce new cmd option "allow-oob"
e11127ba4b monitor: send event when request queue full
45cef4f7e4 qmp: add new event "request-dropped"
485da28be1 monitor: separate QMP parser and dispatcher
4892fe9ca2 monitor: let monitor_{suspend|resume} thread safe
1b86166d9c monitor: introduce monitor_qmp_respond()
0f48093add qmp: introduce some capability helpers
8d3f33043d qmp: negociate QMP capabilities
023b386d0e qmp: introduce QMPCapability
2bde5ca8ce monitor: allow to use IO thread for parsing
f4cc112f80 monitor: create monitor dedicate iothread
3590fdc1d4 monitor: let mon_list be tail queue
11c818a9ac monitor: unify global init
36d3efb87d monitor: move the cur_mon hack deeper for QMP
bf3e493a86 qjson: add "opaque" field to JSONMessageParser
17367fe7a1 monitor: move skip_flush into monitor_data_init
0c98d4baa4 qobject: let object_property_get_str() use new API
aa4b973dd5 qobject: introduce qobject_get_try_str()
981ccebc1e qobject: introduce qstring_get_try_str()
d40ba38085 char-io: fix possible race on IOWatchPoll

=== OUTPUT BEGIN ===
Checking PATCH 1/27: char-io: fix possible race on IOWatchPoll...
Checking PATCH 2/27: qobject: introduce qstring_get_try_str()...
Checking PATCH 3/27: qobject: introduce qobject_get_try_str()...
Checking PATCH 4/27: qobject: let object_property_get_str() use new API...
Checking PATCH 5/27: monitor: move skip_flush into monitor_data_init...
Checking PATCH 6/27: qjson: add "opaque" field to JSONMessageParser...
Checking PATCH 7/27: monitor: move the cur_mon hack deeper for QMP...
Checking PATCH 8/27: monitor: unify global init...
Checking PATCH 9/27: monitor: let mon_list be tail queue...
Checking PATCH 10/27: monitor: create monitor dedicate iothread...
Checking PATCH 11/27: monitor: allow to use IO thread for parsing...
Checking PATCH 12/27: qmp: introduce QMPCapability...
Checking PATCH 13/27: qmp: negociate QMP capabilities...
Checking PATCH 14/27: qmp: introduce some capability helpers...
Checking PATCH 15/27: monitor: introduce monitor_qmp_respond()...
Checking PATCH 16/27: monitor: let monitor_{suspend|resume} thread safe...
ERROR: braces {} are necessary for all arms of this statement
#28: FILE: monitor.c:4014:
+if (atomic_dec_fetch(&mon->suspend_cnt) == 0)
[...]

ERROR: Missing Signed-off-by: line(s)

total: 2 errors, 0 warnings, 16 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 17/27: monitor: separate QMP parser and dispatcher...
Checking PATCH 18/27: qmp: add new event "request-dropped"...
Checking PATCH 19/27: monitor: send event when request queue full...
Checking PATCH 20/27: qapi: introduce new cmd option "allow-oob"...
Checking PATCH 21/27: qmp: support out-of-band (oob) execution...
Checking PATCH 22/27: qmp: let migrate-incoming allow out-of-band...
Checking PATCH 23/27: qmp: isolate responses into io thread...
Checking PATCH 24/27: monitor: enable IO thread for (qmp & !mux) typed...
Checking PATCH 25/27: docs: update QMP documents for OOB commands...
Checking PATCH 26/27: tests: qmp-test: verify command batching...
Checking PATCH 27/27: tests: qmp-test: add oob test...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field

2017-11-10 Thread Cornelia Huck

On Thu, 9 Nov 2017 18:02:35 -0200
Eduardo Habkost  wrote:

> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
> > On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
> > > On Mon, 6 Nov 2017 16:02:16 -0200
> > > Eduardo Habkost  wrote:
> > >   
> > > > On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
> > > > > On Thu, 19 Oct 2017 17:31:51 +1100
> > > > > David Gibson  wrote:
> > > > > 
> > > > > > On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
> > > > > > > For enabling early cpu to numa node configuration at runtime
> > > > > > > qmp_query_hotpluggable_cpus() should provide a list of available
> > > > > > > cpu slots at early stage, before machine_init() is called and
> > > > > > > the 1st cpu is created, so that mgmt might be able to call it
> > > > > > > and use output to set numa mapping.
> > > > > > > Use MachineClass::possible_cpu_arch_ids() callback to set
> > > > > > > cpu type info, along with the rest of possible cpu properties,
> > > > > > > to let machine define which cpu type* will be used.
> > > > > > > 
> > > > > > > * for SPAPR it will be a spapr core type and for ARM/s390x/x86
> > > > > > >   a respective descendant of CPUClass.
> > > > > > > 
> > > > > > > Move parse_numa_opts() in vl.c after cpu_model is parsed into
> > > > > > > cpu_type so that possible_cpu_arch_ids() would know which
> > > > > > > cpu_type to use during layout initialization.
> > > > > > > 
> > > > > > > Signed-off-by: Igor Mammedov   
> > > > > > 
> > > > > > Reviewed-by: David Gibson 
> > > > > > 
> > > > > > > ---
> > > > > > >   v2:
> > > > > > >  - fix NULL dereference caused by not initialized
> > > > > > >MachineState::cpu_type at the time parse_numa_opts()
> > > > > > >were called
> > > > > > > ---
> > > > > > >  include/hw/boards.h|  2 ++
> > > > > > >  hw/arm/virt.c  |  3 ++-
> > > > > > >  hw/core/machine.c  | 12 ++--
> > > > > > >  hw/i386/pc.c   |  4 +++-
> > > > > > >  hw/ppc/spapr.c | 13 -
> > > > > > >  hw/s390x/s390-virtio-ccw.c |  1 +
> > > > > > >  vl.c   |  3 +--
> > > > > > >  7 files changed, 23 insertions(+), 15 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/include/hw/boards.h b/include/hw/boards.h
> > > > > > > index 191a5b3..fa21758 100644
> > > > > > > --- a/include/hw/boards.h
> > > > > > > +++ b/include/hw/boards.h
> > > > > > > @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState 
> > > > > > > *machine,
> > > > > > >   * CPUArchId:
> > > > > > >   * @arch_id - architecture-dependent CPU ID of present or 
> > > > > > > possible CPU  
> > > > > > 
> > > > > > I know this isn't really in scope for this patch, but is @arch_id 
> > > > > > here
> > > > > > supposed to have meaning defined by the target, or by the machine?
> > > > > > 
> > > > > > > If it's the machine, it could do with a rename - "arch" means target
> > > > > > to most people (thanks to Linux).
> > > > > > 
> > > > > > If it's the target, it's kind of bogus, because it doesn't 
> > > > > > necessarily
> > > > > > have a clear meaning per target - get_arch_id in CPUClass has the 
> > > > > > same
> > > > > > problem, which is probably one reason it's basically only used by 
> > > > > > the
> > > > > > x86 code at present.
> > > > > > 
> > > > > > e.g. for target/ppc, what do we use?  There's the PIR, which is in 
> > > > > > the
> > > > > > CPU.. but only on some cpu models, not all.  There will generally be
> > > > > > some kind of master PIC id, but there are different PIC models on
> > > > > > different boards.  What goes in the devicetree?  Well only some
> > > > > > machines use devicetree, and they might define the cpu reg 
> > > > > > differently.
> > > > > > 
> > > > > > Board designs will generally try to make some if not all of those
> > > > > > possible values equal for simplicity, but there's still no real way 
> > > > > > of
> > > > > > defining a sensible arch_id independent of machine / board.
> > > > > I'd say arch_id is machine specific so far, it was introduced when we
> > > > > didn't have CpuInstanceProperties and at that time we considered only
> > > > > vcpus (threads) and doesn't really apply to spapr cores.
> > > > > 
> > > > > In general we could do away with arch_id and use CpuInstanceProperties
> > > > > instead, but arch_id also serves aux purpose, it allows machine to
> > > > > pre-calculate(cache) apic-id/mpidr values in one place and then they
> > > > > > are/(could be) used by arch-independent code to build acpi tables.
> > > > > So if we drop arch_id we would need to introduce a machine hook,
> > > > > which would translate CpuInstanceProperties into current arch_id.
> > > > 
> > > > > I think we need to do a better job documenting where exactly
> > > > we expect arch_id to be used and how, so people know what it's
> > > > supposed to return.
> > > > 
> > > > If the only place where it'

Re: [Qemu-devel] Yet another git submodule rant

2017-11-10 Thread Alexey Kardashevskiy

On 09/11/17 00:01, Daniel P. Berrange wrote:
> On Wed, Nov 08, 2017 at 09:26:01AM -0300, Philippe Mathieu-Daudé wrote:
>> On 11/08/2017 06:57 AM, Thomas Huth wrote:
>>>
>>> That automatic git submodule stuff now broke my workflow again. I
>>> usually keep the git repository on my laptop and then simply rsync the
>>> sources (without .git directories) to my target machine to compile it
>>> there. Used to work great for years. Now it's broken, the build process
>>> complains:
>>>
>>> GIT submodule checkout is out of date. Please run
>>>   scripts/git-submodule.sh update
>>> from the source directory checkout /home/thuth/devel/qemu
>>>
>>> Running "scripts/git-submodule.sh update" did not fix the issue at all -
>>> I first had to tinker with it for a while to find out that I simply have
>>> to delete ".git-submodule-status" in my git tree to fix the issue.
>>>
>>> I've got the feeling that all this submodule crap is constantly causing
>>> pain ... do we really need this? Can't we find another solution instead?
>>> Or at least stop modifying files automatically in the $SRC_PATH ?
>>
>> Also yesterday on IRC:
>>
>>  [...] I downloaded the qemu source from git and tried to compile
>> it. I am getting this:
>>
>> ./configure --static && make && sudo make install
>>  CC  ui/input-keymap.o
>> ui/input-keymap.c:8:10: fatal error: ui/input-keymap-linux-to-qcode.c:
>> No such file or directory
> 
> I had a pull request merged late yesterday afternoon which possibly
> would address that problem, though it's hard to say for certain.

wow, already? :(

I still wonder why we do not check out submodules into the build directory
and why .git-submodule-status is not there too...



-- 
Alexey

Re: [Qemu-devel] Yet another git submodule rant

2017-11-10 Thread Daniel P. Berrange

On Fri, Nov 10, 2017 at 09:35:54PM +1100, Alexey Kardashevskiy wrote:
> On 09/11/17 00:01, Daniel P. Berrange wrote:
> > On Wed, Nov 08, 2017 at 09:26:01AM -0300, Philippe Mathieu-Daudé wrote:
> >> On 11/08/2017 06:57 AM, Thomas Huth wrote:
> >>>
> >>> That automatic git submodule stuff now broke my workflow again. I
> >>> usually keep the git repository on my laptop and then simply rsync the
> >>> sources (without .git directories) to my target machine to compile it
> >>> there. Used to work great for years. Now it's broken, the build process
> >>> complains:
> >>>
> >>> GIT submodule checkout is out of date. Please run
> >>>   scripts/git-submodule.sh update
> >>> from the source directory checkout /home/thuth/devel/qemu
> >>>
> >>> Running "scripts/git-submodule.sh update" did not fix the issue at all -
> >>> I first had to tinker with it for a while to find out that I simply have
> >>> to delete ".git-submodule-status" in my git tree to fix the issue.
> >>>
> >>> I've got the feeling that all this submodule crap is constantly causing
> >>> pain ... do we really need this? Can't we find another solution instead?
> >>> Or at least stop modifying files automatically in the $SRC_PATH ?
> >>
> >> Also yesterday on IRC:
> >>
> >>  [...] I downloaded the qemu source from git and tried to compile
> >> it. I am getting this:
> >>
> >> ./configure --static && make && sudo make install
> >>  CC  ui/input-keymap.o
> >> ui/input-keymap.c:8:10: fatal error: ui/input-keymap-linux-to-qcode.c:
> >> No such file or directory
> > 
> > > I had a pull request merged late yesterday afternoon which possibly
> > > would address that problem, though it's hard to say for certain.
> 
> wow, already? :(
> 
> I still wonder why we do not check out submodules into the build directory
> and why .git-submodule-status is not there too...

That simply isn't the way submodules work, they are inherently part of
the source tree, and the status file reflects that too.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RfC PATCH 5/6] vfio/display: adding region support

2017-11-10 Thread Gerd Hoffmann

  Hi,

> The overhead of a VFIORegion seems to be that we setup a MemoryRegion
> for r/w access to the vfio region and overlap that with one or more
> MemoryRegions for the mmap(s).  That's a bit of structural overhead,
> but we'd simply never map those into a guest visible address space.
> OTOH, it saves you from dealing with the region info, and potentially
> sparse mmap (you could call vfio_region_setup() and check nr_mmaps = 1,
> mmaps[0].offset/size, then call vfio_region_mmap() if it checks out,
> otherwise error).  Just seems like duplicate code here even if the
> VFIORegion includes some things we don't need.  Thanks,

Update pushed to https://www.kraxel.org/cgit/qemu/log/?h=work/vgpu-vfio

Addressed review comments from Alex.  Updated vfio header to v17 patch
series.  Rebased to -rc0.  Incremental fixes not squashed (yet).

This series waits for:
  (1) vfio api update (linux kernel) landing upstream.
  (2) testing feedback from nvidia.
  (3) qemu 2.11 freeze being over.

If all goes well this should be able to land in the 2.12 devel cycle.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH V5] hw/pcie-pci-bridge: restrict to X86 and ARM

2017-11-10 Thread Cornelia Huck

On Thu,  9 Nov 2017 17:46:45 +0200
Marcel Apfelbaum  wrote:

> The PCIE-PCI bridge is specific to "pure" PCIe systems
> (on QEMU we have X86 and ARM), it does not make sense to
> have it in other archs.
> 
> Reported-by: Thomas Huth 
> Signed-off-by: Marcel Apfelbaum 
> ---
> 
> V4 -> V5
>   - Since all other tries failed, conditioned the
> device on the PCIe Root Port.
> 
> V3 -> V4:
>  - Move the config line to pci.mak  (Thomas)
> 
> V2 -> V3:
>  - Another tweak in subject s/if/it (Cornelia) 
> 
> V1 -> V2:
>  Addressed Thomas and Cornelia comments:
>  - Conditioned the pcie-pci-bridge compilation on
>the PCIe Root CONFIG_PCIE_PORT
>  - Tweaked subject PCI -> PCIe
> 
>  Thanks,
>  Marcel
> 
> 
>  hw/pci-bridge/Makefile.objs | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs
> index 666db37da2..1b05023662 100644
> --- a/hw/pci-bridge/Makefile.objs
> +++ b/hw/pci-bridge/Makefile.objs
> @@ -1,5 +1,5 @@
> -common-obj-y += pci_bridge_dev.o pcie_pci_bridge.o
> -common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o gen_pcie_root_port.o
> +common-obj-y += pci_bridge_dev.o
> +common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o gen_pcie_root_port.o 
> pcie_pci_bridge.o
>  common-obj-$(CONFIG_PXB) += pci_expander_bridge.o
>  common-obj-$(CONFIG_XIO3130) += xio3130_upstream.o xio3130_downstream.o
>  common-obj-$(CONFIG_IOH3420) += ioh3420.o

Reviewed-by: Cornelia Huck

Re: [Qemu-devel] QMP event missed during startup

2017-11-10 Thread Ross Lagerwall


On 11/09/2017 02:14 PM, Markus Armbruster wrote:
> "Dr. David Alan Gilbert"  writes:
>
>> * Ross Lagerwall (ross.lagerw...@citrix.com) wrote:
>>> Hi,
>>>
>>> I have found an issue where QEMU emits the RESUME event during startup when
>>> it starts VM execution, but it is not possible to receive this event.
>>>
>>> To repro this, run:
>>> qemu-system-i386 -m 256 -trace
>>> enable=monitor_protocol_event_emit,file=/tmp/out -qmp
>>> unix:/tmp/qmp,server,wait
>>>
>>> QEMU will not start execution of the VM until something connects to the QMP
>>> socket (e.g. qmp-shell). Once connected, no event is received on the QMP
>>> connection but the tracepoint is hit indicating that an event has been
>>> emitted. I suspect that the event is emitted while the QMP client is doing
>>> the initial negotiation.
>>>
>>> The reason I want to receive this event is that QEMU currently uses xenstore
>>> to communicate this information to the Xen toolstack (see
>>> xen-common.c:xen_change_state_handler) but we want to move to using QMP
>>> rather than xenstore for this kind of thing.
>>>
>>> Is this a known issue or just a bug that should be fixed?
>>
>> I'll leave it to Markus to say if it's a bug or not, but can't
>> you work around this by starting qemu with -S which leaves the guest
>> paused, and then continuing the guest when you have your QMP ?
>
> You can:
>
> <-- {"QMP": {"version": {"qemu": {"micro": 50, "minor": 10, "major": 2}, "package": " (v2.10.0-613-g10656079e1-dirty)"}, "capabilities": []}}
> --> { "execute": "qmp_capabilities" }
> <-- {"return": {}}
> --> { "execute": "cont" }
> <-- {"timestamp": {"seconds": 1510235984, "microseconds": 108550}, "event": "RESUME"}
> <-- {"return": {}}
>
> RESUME is sent in vm_prepare_start(), called from main() via vm_start(),
> but only if @autostart, i.e. no -S.
>
> The "wait" in the argument of -qmp makes QEMU wait for a QMP client to
> connect to the QMP socket, long before vm_start() gets called.  However,
> having connected is not sufficient for receiving events, you also have
> to exit capabilities negotiation mode.  Not possible until QEMU is
> running the main loop, which runs after the vm_start() quoted above.
>
> If QMP monitors became usable before entering main_loop(), we'd have a
> race condition instead.  The only reliable way to get the RESUME event
> is -S.
>
> This adds one minor item to the long list of reasons why management
> software should pass -S.
>
> All clear now?

Yeah, that makes sense, thanks. I've now tested with -S and it works fine.

Cheers,
--
Ross Lagerwall
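
The -S workflow described in this thread can be sketched from the client
side as follows. This is a minimal illustration, not QEMU code: it assumes
QEMU was started with -S and -qmp unix:/tmp/qmp,server,wait, all sketch_*
names are hypothetical, and replies are matched by naive substring search
rather than by a real JSON parser.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Read one '\n'-terminated line from fd into buf (without the '\n').
 * Returns the line length, or -1 on EOF or error. Over-long lines are
 * returned in chunks of at most size - 1 bytes. */
static int sketch_read_line(int fd, char *buf, size_t size)
{
    size_t n = 0;
    char ch;

    while (n + 1 < size) {
        if (read(fd, &ch, 1) != 1) {
            return -1;
        }
        if (ch == '\n') {
            break;
        }
        buf[n++] = ch;
    }
    buf[n] = '\0';
    return (int)n;
}

/* Consume lines until one contains needle; -1 if the stream ends first. */
static int sketch_expect(int fd, const char *needle)
{
    char line[4096];

    while (sketch_read_line(fd, line, sizeof(line)) >= 0) {
        if (strstr(line, needle)) {
            return 0;
        }
    }
    return -1;
}

/* Write one complete command string to the monitor. */
static int sketch_send(int fd, const char *cmd)
{
    size_t len = strlen(cmd);

    return write(fd, cmd, len) == (ssize_t)len ? 0 : -1;
}

/* Greeting -> qmp_capabilities -> cont -> wait for the RESUME event. */
int sketch_cont_and_wait_resume(int fd)
{
    if (sketch_expect(fd, "\"QMP\"") < 0 ||
        sketch_send(fd, "{\"execute\": \"qmp_capabilities\"}\n") < 0 ||
        sketch_expect(fd, "\"return\"") < 0 ||
        sketch_send(fd, "{\"execute\": \"cont\"}\n") < 0) {
        return -1;
    }
    return sketch_expect(fd, "\"RESUME\"");
}

/* Connect to a QMP UNIX socket; returns the fd or -1. */
int sketch_qmp_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would run sketch_cont_and_wait_resume(sketch_qmp_connect("/tmp/qmp"))
and treat a return value of 0 as "RESUME seen". Because the guest only starts
after "cont", the event cannot be emitted before capabilities negotiation has
finished.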

[Qemu-devel] [PATCH v12 01/12] ACPI: add related GHES structures and macros definition

2017-11-10 Thread Dongjiu Geng

Add the Generic Error Status Block structure and some macro
definitions, which follow ACPI 4.0 and ACPI 6.1. The HEST table
generation and CPER record code will use them.

Signed-off-by: Dongjiu Geng 
---

It has been suggested to get rid of most of the structures introduced in this
patch; the second patch mainly uses the build_append_int_noprefix() API to
compose the whole error status block and the APEI table.

https://lkml.org/lkml/2017/8/29/187

---
 include/hw/acpi/acpi-defs.h | 49 +
 1 file changed, 49 insertions(+)

diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 72be675..f955f1b 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -298,6 +298,25 @@ typedef struct AcpiMultipleApicTable AcpiMultipleApicTable;
 #define ACPI_APIC_RESERVED  16   /* 16 and greater are reserved */
 
 /*
+ * ACPI 4.0 spec, "17.3.2.7 Hardware Error Notification"
+ */
+enum AcpiHestNotifyType {
+ACPI_HEST_NOTIFY_POLLED = 0,
+ACPI_HEST_NOTIFY_EXTERNAL = 1,
+ACPI_HEST_NOTIFY_LOCAL = 2,
+ACPI_HEST_NOTIFY_SCI = 3,
+ACPI_HEST_NOTIFY_NMI = 4,
+ACPI_HEST_NOTIFY_CMCI = 5,  /* ACPI 5.0 */
+ACPI_HEST_NOTIFY_MCE = 6,   /* ACPI 5.0 */
+ACPI_HEST_NOTIFY_GPIO = 7,  /* ACPI 6.0 */
+ACPI_HEST_NOTIFY_SEA = 8,   /* ACPI 6.1 */
+ACPI_HEST_NOTIFY_SEI = 9,   /* ACPI 6.1 */
+ACPI_HEST_NOTIFY_GSIV = 10, /* ACPI 6.1 */
+ACPI_HEST_NOTIFY_SDEI = 11, /* ACPI 6.2 */
+ACPI_HEST_NOTIFY_RESERVED = 12 /* 12 and greater are reserved */
+};
+
+/*
  * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE)
  */
 #define ACPI_SUB_HEADER_DEF   /* Common ACPI sub-structure header */\
@@ -474,6 +493,36 @@ struct AcpiSystemResourceAffinityTable {
 } QEMU_PACKED;
 typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable;
 
+/*
+ * ACPI 4.0, "17.3.2.6.1 Generic Error Data"
+ */
+#define ACPI_GEBS_UNCORRECTABLE  (1)
+/*
+ * ACPI 6.1, "18.3.2.8 Generic Hardware Error
+ * Source version 2"
+ */
+#define ACPI_HEST_SOURCE_GENERIC_ERROR_V2(10)
+/*
+ * Table 17-12 Generic Error Status Block, ACPI 4.0,
+ * "17.3.2.6.1 Generic Error Data"
+ */
+struct AcpiGenericErrorStatus {
+/* It is a bitmask composed of ACPI_GEBS_xxx macros */
+uint32_t block_status;
+uint32_t raw_data_offset;
+uint32_t raw_data_length;
+uint32_t data_length;
+uint32_t error_severity;
+} QEMU_PACKED;
+typedef struct AcpiGenericErrorStatus AcpiGenericErrorStatus;
+
+enum AcpiGenericErrorSeverity {
+ACPI_CPER_SEV_RECOVERABLE,
+ACPI_CPER_SEV_FATAL,
+ACPI_CPER_SEV_CORRECTED,
+ACPI_CPER_SEV_NONE,
+};
+
 #define ACPI_SRAT_PROCESSOR_APIC 0
 #define ACPI_SRAT_MEMORY 1
 #define ACPI_SRAT_PROCESSOR_x2APIC   2
-- 
1.8.3.1
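
As a quick check of the layout that AcpiGenericErrorStatus describes, here
is a minimal sketch outside QEMU. The field order and widths are taken from
the patch above; the sketch_* names, the init helper, and the use of the
GCC/Clang packed attribute in place of QEMU_PACKED are assumptions for
illustration. Fields are stored in host byte order here, whereas ACPI tables
are little-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for QEMU_PACKED (assumption: GCC or Clang). */
#define SKETCH_PACKED __attribute__((packed))

/* Mirrors ACPI_GEBS_UNCORRECTABLE from the patch. */
#define SKETCH_GEBS_UNCORRECTABLE 1

/* Mirrors enum AcpiGenericErrorSeverity from the patch. */
enum sketch_severity {
    SKETCH_SEV_RECOVERABLE,   /* 0 */
    SKETCH_SEV_FATAL,         /* 1 */
    SKETCH_SEV_CORRECTED,     /* 2 */
    SKETCH_SEV_NONE           /* 3 */
};

/* Same five 32-bit fields, in the same order, as AcpiGenericErrorStatus. */
struct sketch_gesb {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;
    uint32_t error_severity;
} SKETCH_PACKED;

/* Fill a status block for one uncorrectable, recoverable error whose
 * data entries occupy data_length bytes. */
void sketch_gesb_init(struct sketch_gesb *g, uint32_t data_length)
{
    memset(g, 0, sizeof(*g));
    g->block_status = SKETCH_GEBS_UNCORRECTABLE;
    g->data_length = data_length;
    g->error_severity = SKETCH_SEV_RECOVERABLE;
}
```

The block is 5 x 4 = 20 bytes, so the first Generic Error Data Entry would
start at offset 20 when no raw data precedes it.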

[Qemu-devel] [PATCH v12 08/12] target-arm: kvm64: inject synchronous External Abort

2017-11-10 Thread Dongjiu Geng

Add synchronous external abort injection logic: set up spsr_elx,
esr_elx, PSTATE, elr_elx, etc. When switching to the guest, the
guest will jump to the synchronous external abort vector table
entry.

ESR_ELx.DFSC is set to synchronous external abort (0x10), and
ESR_ELx.FnV is set to not valid (0x1), which tells the guest
that FAR is not valid and holds an UNKNOWN value.

These values will be set in the KVM-related structures through
the KVM_SET_ONE_REG ioctl.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 

---
People are against KVM injecting the SEA, so userspace has to
inject it:

https://lkml.org/lkml/2017/3/2/110

Below is the log when QEMU injects the SEA and the guest takes a synchronous
external abort:

Taking exception 4 [Data Abort]
...from EL0 to EL1
...with ESR 0x24/0x92000410
...with FAR 0x0
...with ELR 0x40cf04
...to EL1 PC 0xffc84c00 PSTATE 0x3c5
after kvm_inject_arm_sea
Unhandled fault: synchronous external abort (0x92000410) at 0x007fa234c12c
CPU: 0 PID: 536 Comm: devmem Not tainted 4.1.0+ #20
Hardware name: linux,dummy-virt (DT)
task: ffc019ab2b00 ti: ffc008134000 task.ti: ffc008134000
PC is at 0x40cf04
LR is at 0x40cdec
pc : [<0040cf04>] lr : [<0040cdec>] pstate: 6000
sp : 007ff7b24130
x29: 007ff7b24260 x28: 
x27: 00ad x26: 0049c000
x25: 0048904b x24: 0049c000
x23: 4060 x22: 007ff7b243a0
x21: 0002 x20: 
x19: 0020 x18: 
x17: 0049c6d0 x16: 007fa22c85c0
x15: 5798 x14: 007fa2205f1c
x13: 007fa241ccb0 x12: 0137
x11:  x10: 
x9 :  x8 : 00de
x7 :  x6 : 2000
x5 : 4060 x4 : 0003
x3 : 0001 x2 : 
x1 :  x0 : 007fa2418000
---
 target/arm/kvm64.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 2d0eb32..7f662e9 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -582,6 +582,70 @@ int kvm_arm_cpreg_level(uint64_t regidx)
 return KVM_PUT_RUNTIME_STATE;
 }
 
+static int kvm_arm_cpreg_value(ARMCPU *cpu, ptrdiff_t fieldoffset)
+{
+int i;
+
+for (i = 0; i < cpu->cpreg_array_len; i++) {
+uint32_t regidx = kvm_to_cpreg_id(cpu->cpreg_indexes[i]);
+const ARMCPRegInfo *ri;
+ri = get_arm_cp_reginfo(cpu->cp_regs, regidx);
+if (!ri) {
+continue;
+}
+
+if (ri->type & ARM_CP_NO_RAW) {
+continue;
+}
+
+if (ri->fieldoffset == fieldoffset) {
+cpu->cpreg_values[i] = read_raw_cp_reg(&cpu->env, ri);
+return 0;
+}
+}
+return -EINVAL;
+}
+
+/* Inject synchronous external abort */
+static void kvm_inject_arm_sea(CPUState *c)
+{
+ARMCPU *cpu = ARM_CPU(c);
+CPUARMState *env = &cpu->env;
+unsigned long cpsr = pstate_read(env);
+uint32_t esr, ret;
+
+c->exception_index = EXCP_DATA_ABORT;
+/* Inject the exception to El1 */
+env->exception.target_el = 1;
+CPUClass *cc = CPU_GET_CLASS(c);
+
+/* Set the DFSC to Synchronous external abort and FnV to not valid,
+ * this will tell guest the FAR_EL1 is UNKNOWN.
+ */
+esr = (0x10 | (1 << 10));
+
+/* This exception is EL0 or EL1 fault. */
+if ((cpsr & 0xf) == PSTATE_MODE_EL0t) {
+esr |= (EC_DATAABORT << ARM_EL_EC_SHIFT);
+} else {
+esr |= (EC_DATAABORT_SAME_EL << ARM_EL_EC_SHIFT);
+}
+
+/* In the aarch64, there is only 32-bit instruction*/
+esr |= ARM_EL_IL;
+env->exception.syndrome = esr;
+
+cc->do_interrupt(c);
+
+/* set ESR_EL1 */
+ret = kvm_arm_cpreg_value(cpu, offsetof(CPUARMState, cp15.esr_el[1]));
+
+if (ret) {
+fprintf(stderr, "<%s> failed to set esr_el1\n", __func__);
+abort();
+}
+}
+
 #define AARCH64_CORE_REG(x)   (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \
  KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x))
 
-- 
1.8.3.1
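
The syndrome value built in kvm_inject_arm_sea() can be reproduced with a
few lines of standalone C. This is a sketch, not the patch's code: the
sketch_* names are hypothetical, and the exception class values 0x24 (data
abort from a lower EL) and 0x25 (data abort from the same EL) are the
architectural encodings that EC_DATAABORT and EC_DATAABORT_SAME_EL are
assumed to carry. The shift and IL values match ARM_EL_EC_SHIFT and
ARM_EL_IL.

```c
#include <stdint.h>

#define SKETCH_EC_SHIFT             26          /* ARM_EL_EC_SHIFT */
#define SKETCH_IL                   (1u << 25)  /* ARM_EL_IL: 32-bit insn */
#define SKETCH_EC_DATAABORT         0x24u       /* abort from a lower EL */
#define SKETCH_EC_DATAABORT_SAME_EL 0x25u       /* abort from the same EL */
#define SKETCH_DFSC_SEA             0x10u       /* synchronous external abort */
#define SKETCH_FNV                  (1u << 10)  /* FAR not valid */

/* Build the ESR for an injected synchronous external abort.
 * from_el0 is non-zero when the faulting context was EL0. */
uint32_t sketch_sea_syndrome(int from_el0)
{
    uint32_t ec = from_el0 ? SKETCH_EC_DATAABORT
                           : SKETCH_EC_DATAABORT_SAME_EL;

    return (ec << SKETCH_EC_SHIFT) | SKETCH_IL | SKETCH_FNV | SKETCH_DFSC_SEA;
}
```

For a fault taken from EL0 this yields 0x92000410, the value shown as
"ESR 0x24/0x92000410" in the log above; for a fault taken from EL1 it
yields 0x96000410.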

[Qemu-devel] [PATCH v12 07/12] target-arm: handle SError interrupt exception from the guest OS

2017-11-10 Thread Dongjiu Geng

When an SError interrupt (SEI) happens in the guest OS, it traps to
the host. The host checks the Asynchronous Error Type (ESR_ELx.AET).
If the error has not been propagated and has not (yet) been
architecturally consumed by the PE, the host returns to user space
with the error code KVM_SEI_SEV_RECOVERABLE.

QEMU receives this exception exit and checks whether KVM supports
setting the ESR (exception syndrome register) value. If it does,
QEMU sets the ESR value using a new ioctl.

This handling is only supported on AArch64 platforms, not on
AArch32 platforms.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 

---
Setting the ESR and injecting the SEI from userspace were suggested here:
https://lkml.org/lkml/2017/3/20/441
https://lkml.org/lkml/2017/3/20/516

Below is the log when QEMU injects an SError with a specified ESR and the guest
takes the exception:

 Bad mode in Error handler detected, code 0xbe000c11 -- SError
 CPU: 0 PID: 539 Comm: devmem Tainted: G  D 4.1.0+ #20
 Hardware name: linux,dummy-virt (DT)
 task: ffc019aad600 ti: ffc008134000 task.ti: ffc008134000
 PC is at 0x405cc0
 LR is at 0x40ce80
 pc : [<00405cc0>] lr : [<0040ce80>] pstate: 6000
 sp : ffc008137ff0
 x29: 007fd9e80790 x28: 
 x27: 00ad x26: 0049c000
 x25: 0048904b x24: 0049c000
 x23: 4060 x22: 007fd9e808d0
 x21: 0002 x20: 
 x19: 0020 x18: 
 x17: 00405cc0 x16: 0049c698
 x15: 5798 x14: 007f93875f1c
 x13: 007f93a8ccb0 x12: 0137
 x11:  x10: 
 x9 :  x8 : 00de
 x7 :  x6 : 2000
 x5 : 4060 x4 : 0003
 x3 : 0001 x2 : 000f123b
 x1 : 0008 x0 : 0047a048
---
 target/arm/internals.h |  4 
 target/arm/kvm.c   |  3 +++
 target/arm/kvm32.c |  6 ++
 target/arm/kvm64.c | 34 ++
 target/arm/kvm_arm.h   |  8 
 5 files changed, 55 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 1f6efef..cd26a9d 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -233,9 +233,13 @@ enum arm_exception_class {
 #define ARM_EL_EC_SHIFT 26
 #define ARM_EL_IL_SHIFT 25
 #define ARM_EL_ISV_SHIFT 24
+#define ARM_EL_AET_SHIFT 10
 #define ARM_EL_IL (1 << ARM_EL_IL_SHIFT)
 #define ARM_EL_ISV (1 << ARM_EL_ISV_SHIFT)
 
+/* Asynchronous Error Type */
+#define KVM_SEI_SEV_RECOVERABLE 1
+
 /* Utility functions for constructing various kinds of syndrome value.
  * Note that in general we follow the AArch64 syndrome values; in a
  * few cases the value in HSR for exceptions taken to AArch32 Hyp
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 7c17f0d..d85e36a 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -593,6 +593,9 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
 ret = EXCP_DEBUG;
 } /* otherwise return to guest */
 break;
+case KVM_EXIT_EXCEPTION:
+kvm_arm_handle_exception(cs, run);
+break;
 default:
 qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
   __func__, run->exit_reason);
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index 069da0c..8ce56fd 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -493,6 +493,12 @@ bool kvm_arm_handle_debug(CPUState *cs, struct 
kvm_debug_exit_arch *debug_exit)
 return false;
 }
 
+bool kvm_arm_handle_exception(CPUState *cs, struct kvm_run *run)
+{
+qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
+return false;
+}
+
 int kvm_arch_insert_hw_breakpoint(target_ulong addr,
   target_ulong len, int type)
 {
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index af8ebc9..2d0eb32 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -986,3 +986,37 @@ static bool kvm_can_set_vcpu_esr(struct KVMState *state)
 int ret = kvm_check_extension(state, KVM_CAP_ARM_INJECT_SERROR_ESR);
 return (ret) ? true : false;
 }
+
+static bool kvm_inject_arm_sei(CPUState *cs, unsigned int error_code)
+{
+int ret;
+/* IMPLEMENTATION DEFINED syndrome by default */
+uint32_t syndrome = ARM_EL_ISV;
+
+if (kvm_can_set_vcpu_esr(cs->kvm_state)) {
+if (error_code == KVM_SEI_SEV_RECOVERABLE) {
+/* Set Recoverable Asynchronous SError interrupt Type */
+syndrome = (3 << ARM_EL_AET_SHIFT) | 0x11;
+}
+ret = kvm_vcpu_ioctl(cs, KVM_ARM_INJECT_SERROR_ESR, &syndrome);
+if (ret < 0) {
+fprintf(stderr, "KVM_ARM_SET_SERROR_ESR failed: %s\n",
+strerror(-ret));
+abort();
+}
+
+return true;
+}
+
+return false;
+}
+
+bool kvm_arm_handle_exception(CPUState *cs, struct kvm_run *run)
+{
+int exception = run->ex.exception;
+unsign
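
The recoverable syndrome that kvm_inject_arm_sei() passes to KVM, and its
relation to the code printed in the guest log above, can be checked with a
small standalone sketch. The sketch_* names are hypothetical; the shift of
10 is ARM_EL_AET_SHIFT from the patch, and the field widths (AET in bits
[12:10], DFSC in bits [5:0], EC in bits [31:26]) are the architectural
ESR_ELx layout as understood here.

```c
#include <stdint.h>

#define SKETCH_AET_SHIFT         10     /* ARM_EL_AET_SHIFT from the patch */
#define SKETCH_AET_RECOVERABLE   3u     /* value used by the patch */
#define SKETCH_DFSC_ASYNC_SERROR 0x11u  /* low bits used by the patch */

/* ISS value the patch builds for a recoverable SError. */
uint32_t sketch_sei_iss_recoverable(void)
{
    return (SKETCH_AET_RECOVERABLE << SKETCH_AET_SHIFT) |
           SKETCH_DFSC_ASYNC_SERROR;
}

/* Exception class, ESR bits [31:26]. */
uint32_t sketch_esr_ec(uint32_t esr)
{
    return esr >> 26;
}

/* Asynchronous Error Type, ESR bits [12:10]. */
uint32_t sketch_esr_aet(uint32_t esr)
{
    return (esr >> SKETCH_AET_SHIFT) & 0x7u;
}

/* Fault status code, ESR bits [5:0]. */
uint32_t sketch_esr_dfsc(uint32_t esr)
{
    return esr & 0x3fu;
}
```

The ISS comes out as 0xc11. Decoding the guest's "code 0xbe000c11" gives
EC 0x2f (SError), AET 3 and DFSC 0x11, so the low bits the guest sees are
the ones QEMU supplied.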

[Qemu-devel] [PATCH v12 02/12] ACPI: Add APEI GHES table generation and CPER record support

2017-11-10 Thread Dongjiu Geng

This implements APEI GHES table generation at OS boot and
records CPER at runtime via fw_cfg blobs. After CPER info is
recorded into guest memory, an interrupt must be injected (or a
GPIO line asserted) to notify the guest. For the detailed design
and implementation, please see "hest_ghes.txt" in the doc folder.

For now we only support three types of GHESv2: GPIO-Signal,
ARMv8 SEA and ARMv8 SEI. The supported types can be extended
later if needed. The CPER section type is currently the memory
section, because the kernel mainly wants userspace to handle
memory section errors.

For a GHESv2 error source, the OSPM must acknowledge the error via
the Read Ack register, so user space must check the ack value before
recording a new CPER to avoid a read-write race condition.

Suggested-by: Laszlo Ersek 
Signed-off-by: Dongjiu Geng 
---
The basic solution was discussed in this mail:
https://lkml.org/lkml/2017/3/29/342

---
 hw/acpi/aml-build.c |   2 +
 hw/acpi/hest_ghes.c | 360 
 hw/arm/virt-acpi-build.c|   8 +
 include/hw/acpi/aml-build.h |   1 +
 include/hw/acpi/hest_ghes.h |  84 +++
 5 files changed, 455 insertions(+)
 create mode 100644 hw/acpi/hest_ghes.c
 create mode 100644 include/hw/acpi/hest_ghes.h

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 36a6cc4..6849e5f 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1561,6 +1561,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
 tables->table_data = g_array_new(false, true /* clear */, 1);
 tables->tcpalog = g_array_new(false, true /* clear */, 1);
 tables->vmgenid = g_array_new(false, true /* clear */, 1);
+tables->hardware_errors = g_array_new(false, true /* clear */, 1);
 tables->linker = bios_linker_loader_init();
 }
 
@@ -1571,6 +1572,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, 
bool mfre)
 g_array_free(tables->table_data, true);
 g_array_free(tables->tcpalog, mfre);
 g_array_free(tables->vmgenid, mfre);
+g_array_free(tables->hardware_errors, mfre);
 }
 
 /* Build rsdt table */
diff --git a/hw/acpi/hest_ghes.c b/hw/acpi/hest_ghes.c
new file mode 100644
index 000..9061e3c
--- /dev/null
+++ b/hw/acpi/hest_ghes.c
@@ -0,0 +1,360 @@
+/* Support for generating APEI tables and passing them to Guests
+ *
+ * Copyright (C) 2017 HuaWei Corporation.
+ *
+ * Author: Dongjiu Geng 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/acpi/hest_ghes.h"
+#include "hw/nvram/fw_cfg.h"
+#include "sysemu/sysemu.h"
+#include "qemu/error-report.h"
+
+/* Generic Error Status Block
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void build_append_gesb(GArray *table, uint32_t block_status,
+  uint32_t raw_data_offset, uint32_t raw_data_length,
+  uint32_t data_length, uint32_t error_severity)
+{
+build_append_int_noprefix(table, block_status, 4);
+build_append_int_noprefix(table, raw_data_offset, 4);
+build_append_int_noprefix(table, raw_data_length, 4);
+build_append_int_noprefix(table, data_length, 4);
+build_append_int_noprefix(table, error_severity, 4);
+}
+
+/* Generic Error Data Entry
+ * ACPI 6.1: 18.3.2.7.1 Generic Error Data
+ */
+static void build_append_gede(GArray *table, const char *section_type,
+  const uint32_t error_severity, const uint16_t revision,
+  const uint32_t error_data_length)
+{
+int i;
+
+for (i = 0; i < 16; i++) {
+build_append_int_noprefix(table, section_type[i], 1);
+}
+
+build_append_int_noprefix(table, error_severity, 4);
+build_append_int_noprefix(table, revision, 2);
+build_append_int_noprefix(table, 0, 2);
+build_append_int_noprefix(table, error_data_length, 4);
+build_append_int_noprefix(table, 0, 44);
+}
+
+/* Generic Address Structure (GAS)
+ * ACPI 2.0/3.0: 5.2.3.1 Generic Address Structure
+ * 2.0 compat note:
+ *@access_width must be 0, see ACPI 2.0:Table 5-1
+ */
+static void build_append_gas(GArray *table, AmlRegionSpace as,
+  uint8_t bit_width, uint8_t bit_offset,
+  uint8_t access_width, uint64_t address)
+{
+build_append_int_noprefix(table, as, 1);
+bu

[Qemu-devel] [PATCH v12 00/12] Add RAS virtualization support in QEMU

2017-11-10 Thread Dongjiu Geng

On the ARMv8 platform, CPU errors are reported as Synchronous External
Aborts (SEA) or SError Interrupts (SEI). When a guest takes such an
exception, it is sometimes better for the guest to do the recovery
itself, because the host does not know the guest's details. For example,
if a guest user-space application takes an exception, the guest can kill
that application, but the host cannot.

For an ARMv8 SEA/SEI, KVM or the host kernel delivers SIGBUS or uses
another interface to notify user space. After user space gets the
notification, it records a CPER in the guest's GHES buffer and injects
an exception or IRQ via KVM.

In the current implementation, if the SIGBUS code is BUS_MCEERR_AR, we
treat it as a synchronous exception and use the ARMv8 SEA notification
type to notify the guest after recording the CPER; if the SIGBUS code is
BUS_MCEERR_AO, we treat it as an asynchronous exception and use
GPIO-Signal to notify the guest after recording the CPER.
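
The mapping from SIGBUS code to notification type described above can be
sketched as follows. This is only an illustration: the enum and the
function name are hypothetical and do not appear in the series, and the
fallback values for BUS_MCEERR_AR and BUS_MCEERR_AO are the Linux ones,
supplied only in case the C library headers do not define them.

```c
#include <signal.h>

/* Fallbacks for C libraries that do not expose the Linux si_code values. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical names, used only for this illustration. */
typedef enum {
    NOTIFY_NONE,  /* not a memory-error SIGBUS code */
    NOTIFY_SEA,   /* synchronous error: ARMv8 SEA notification */
    NOTIFY_GPIO   /* asynchronous error: GPIO-Signal notification */
} NotifySketch;

/* Pick the guest notification type from the SIGBUS si_code. */
static NotifySketch pick_notification(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return NOTIFY_SEA;
    case BUS_MCEERR_AO:
        return NOTIFY_GPIO;
    default:
        return NOTIFY_NONE;
    }
}
```

In both memory-error cases the CPER is recorded first and the guest is
notified afterwards; the sketch covers only the choice of notification.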

If KVM wants user space to do the recovery for an SError, it returns an
error status to QEMU. QEMU then specifies the guest ESR value and injects
a virtual SError.

This patch series has three parts:
1. Generate the APEI/GHES table and record the CPER for the guest at runtime.
2. Handle the SIGBUS signal, record the CPER and write it into guest memory,
   then use a different ACPI notification type to notify the guest depending
   on the SIGBUS type (BUS_MCEERR_AR or BUS_MCEERR_AO).
3. Specify the guest SError ESR value and inject a virtual SError.


The overall solution was discussed previously here:
https://lkml.org/lkml/2017/2/27/246
https://patchwork.kernel.org/patch/9633105/
https://patchwork.kernel.org/patch/9925227/


---
1. How to test the ACPI table.
Note: UEFI firmware (QEMU_EFI.fd) is needed if the guest is to use ACPI tables.

After the guest boots, dump the APEI table to check whether it is correct.
(1) # iasl -p ./HEST -d /sys/firmware/acpi/tables/HEST
(2) # cat HEST.dsl
/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20170728 (64-bit version)
 * Copyright (c) 2000 - 2017 Intel Corporation
 *
 * Disassembly of /sys/firmware/acpi/tables/HEST, Mon Sep  5 07:59:17 2016
 *
 * ACPI Data Table [HEST]
 *
 * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
 */


..
[308h 0776   2]Subtable Type : 000A [Generic Hardware Error Source V2]
[30Ah 0778   2]Source Id : 0008
[30Ch 0780   2]Related Source Id : 
[30Eh 0782   1] Reserved : 00
[30Fh 0783   1]  Enabled : 01
[310h 0784   4]   Records To Preallocate : 0001
[314h 0788   4]  Max Sections Per Record : 0001
[318h 0792   4]  Max Raw Data Length : 1000

[31Ch 0796  12] Error Status Address : [Generic Address Structure]
[31Ch 0796   1] Space ID : 00 [SystemMemory]
[31Dh 0797   1]Bit Width : 40
[31Eh 0798   1]   Bit Offset : 00
[31Fh 0799   1] Encoded Access Width : 04 [QWord Access:64]
[320h 0800   8]  Address : 785D0040

[328h 0808  28]   Notify : [Hardware Error Notification Structure]
[328h 0808   1]  Notify Type : 08 [SEA]
[329h 0809   1]Notify Length : 1C
[32Ah 0810   2]   Configuration Write Enable : 
[32Ch 0812   4] PollInterval : 
[330h 0816   4]   Vector : 
[334h 0820   4]  Polling Threshold Value : 
[338h 0824   4] Polling Threshold Window : 
[33Ch 0828   4]Error Threshold Value : 
[340h 0832   4]   Error Threshold Window : 

[344h 0836   4]Error Status Block Length : 1000
[348h 0840  12]Read Ack Register : [Generic Address Structure]
[348h 0840   1] Space ID : 00 [SystemMemory]
[349h 0841   1]Bit Width : 40
[34Ah 0842   1]   Bit Offset : 00
[34Bh 0843   1] Encoded Access Width : 04 [QWord Access:64]
[34Ch 0844   8]  Address : 785D0098

[354h 0852   8]Read Ack Preserve : FFFE
[35Ch 0860   8]   Read Ack Write : 0001

[364h 0868   2]Subtable Type : 000A [Generic Hardware Error Source V2]
[366h 0870   2]Source Id : 0009
[368h 0872   2]Related Source Id : 
[36Ah 0874   1] Reserved : 00
[36Bh 0875   1]  Enabled : 01
[36Ch 0876   4]   Records To Preallocate : 0001
[370h 0880   4]  Max Sections Per Record : 0001
[374h 0884   4]  Max Raw Data Length : 1000

[378h 0888  1

[Qemu-devel] [PATCH v12 09/12] Move related hwpoison page function to accel/kvm/ folder

2017-11-10 Thread Dongjiu Geng

kvm_hwpoison_page_add() and kvm_unpoison_all() will be used
by both X86 and ARM platforms, so move them to a common accel/kvm/
folder to avoid duplicate code.

Signed-off-by: Dongjiu Geng 

---
Moving the related hwpoison page functions to the accel/kvm folder was suggested here:
https://lists.gnu.org/archive/html/qemu-arm/2017-09/msg00077.html
https://lists.gnu.org/archive/html/qemu-arm/2017-09/msg00152.html
---
 accel/kvm/kvm-all.c | 29 +
 include/exec/ram_addr.h | 10 ++
 target/i386/kvm.c   | 33 -
 3 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 46ce479..72ab615 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -564,6 +564,34 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension)
 return ret;
 }
 
+static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
+QLIST_HEAD_INITIALIZER(hwpoison_page_list);
+
+void kvm_unpoison_all(void *param)
+{
+HWPoisonPage *page, *next_page;
+
+QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
+QLIST_REMOVE(page, list);
+qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+g_free(page);
+}
+}
+
+void kvm_hwpoison_page_add(ram_addr_t ram_addr)
+{
+HWPoisonPage *page;
+
+QLIST_FOREACH(page, &hwpoison_page_list, list) {
+if (page->ram_addr == ram_addr) {
+return;
+}
+}
+page = g_new(HWPoisonPage, 1);
+page->ram_addr = ram_addr;
+QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
+}
+
 static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size)
 {
 #if defined(HOST_WORDS_BIGENDIAN) != defined(TARGET_WORDS_BIGENDIAN)
@@ -2279,6 +2307,7 @@ bool kvm_arm_supports_user_irq(void)
 return kvm_check_extension(kvm_state, KVM_CAP_ARM_USER_IRQ);
 }
 
+
 #ifdef KVM_CAP_SET_GUEST_DEBUG
 struct kvm_sw_breakpoint *kvm_find_sw_breakpoint(CPUState *cpu,
  target_ulong pc)
diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index d017639..afe34b1 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -49,6 +49,11 @@ struct RAMBlock {
 unsigned long *unsentmap;
 };
 
+typedef struct HWPoisonPage {
+ram_addr_t ram_addr;
+QLIST_ENTRY(HWPoisonPage) list;
+} HWPoisonPage;
+
 static inline bool offset_in_ramblock(RAMBlock *b, ram_addr_t offset)
 {
 return (b && b->host && offset < b->used_length) ? true : false;
@@ -80,6 +85,11 @@ void qemu_ram_free(RAMBlock *block);
 
 int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp);
 
+/* Free and remove all the poisoned pages in the list */
+void kvm_unpoison_all(void *param);
+/* Add a poisoned page to the list */
+void kvm_hwpoison_page_add(ram_addr_t ram_addr);
+
 #define DIRTY_CLIENTS_ALL ((1 << DIRTY_MEMORY_NUM) - 1)
 #define DIRTY_CLIENTS_NOCODE  (DIRTY_CLIENTS_ALL & ~(1 << DIRTY_MEMORY_CODE))
 
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 6db7783..3e1afb6 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -390,39 +390,6 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
 return ret;
 }
 
-typedef struct HWPoisonPage {
-ram_addr_t ram_addr;
-QLIST_ENTRY(HWPoisonPage) list;
-} HWPoisonPage;
-
-static QLIST_HEAD(, HWPoisonPage) hwpoison_page_list =
-QLIST_HEAD_INITIALIZER(hwpoison_page_list);
-
-static void kvm_unpoison_all(void *param)
-{
-HWPoisonPage *page, *next_page;
-
-QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
-QLIST_REMOVE(page, list);
-qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
-g_free(page);
-}
-}
-
-static void kvm_hwpoison_page_add(ram_addr_t ram_addr)
-{
-HWPoisonPage *page;
-
-QLIST_FOREACH(page, &hwpoison_page_list, list) {
-if (page->ram_addr == ram_addr) {
-return;
-}
-}
-page = g_new(HWPoisonPage, 1);
-page->ram_addr = ram_addr;
-QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
-}
-
 static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
  int *max_banks)
 {
-- 
1.8.3.1

[Qemu-devel] [PATCH v12 03/12] docs: APEI GHES generation description

2017-11-10 Thread Dongjiu Geng

Add APEI/GHES description document

Signed-off-by: Dongjiu Geng 
---
 docs/specs/acpi_hest_ghes.txt | 98 +++
 1 file changed, 98 insertions(+)
 create mode 100644 docs/specs/acpi_hest_ghes.txt
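
The blob size formula given in step (3) of the document below can be
checked with a small sketch. The macro and function names here are
hypothetical, and the ordering inside the blob (all status address
entries, then all ack value entries, then the data blocks) is an
assumption read off the diagram, not something the text states outright.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the sizes come from step (3) of the document. */
#define SKETCH_ADDRESS_ENTRY_SIZE 8       /* one address register entry */
#define SKETCH_DATA_BLOCK_SIZE    0x1000  /* one Error Status Data Block */

/* Offset of the i-th Error Status Address entry (0-based). */
static uint64_t status_address_offset(uint32_t n, uint32_t i)
{
    assert(i < n);
    return (uint64_t)i * SKETCH_ADDRESS_ENTRY_SIZE;
}

/* Offset of the i-th Read Ack entry, assumed to follow all status entries. */
static uint64_t ack_value_offset(uint32_t n, uint32_t i)
{
    assert(i < n);
    return (uint64_t)n * SKETCH_ADDRESS_ENTRY_SIZE +
           (uint64_t)i * SKETCH_ADDRESS_ENTRY_SIZE;
}

/* Offset of the i-th data block, assumed to follow both register tables. */
static uint64_t data_block_offset(uint32_t n, uint32_t i)
{
    assert(i < n);
    return (uint64_t)n * SKETCH_ADDRESS_ENTRY_SIZE * 2 +
           (uint64_t)i * SKETCH_DATA_BLOCK_SIZE;
}

/* Total blob size: N * 8 * 2 + N * 4096 bytes. */
static uint64_t hardware_errors_blob_size(uint32_t n)
{
    return (uint64_t)n * SKETCH_ADDRESS_ENTRY_SIZE * 2 +
           (uint64_t)n * SKETCH_DATA_BLOCK_SIZE;
}
```

For a single error source the formula gives 16 bytes of address
registers plus one 4096-byte data block.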

diff --git a/docs/specs/acpi_hest_ghes.txt b/docs/specs/acpi_hest_ghes.txt
new file mode 100644
index 000..816d7b9
--- /dev/null
+++ b/docs/specs/acpi_hest_ghes.txt
@@ -0,0 +1,98 @@
+Generating APEI tables and record CPER
+======================================
+
+Copyright (C) 2017 HuaWei Corporation.
+
+Design Details:
+---------------
+
+   [ASCII diagram garbled in the archive; recoverable structure follows]
+
+   etc/acpi/tables (HEST):
+     GHES1 .. GHESN, each with the fields error_status_address,
+     read_ack_register, read_ack_preserve and read_ack_write.
+
+   etc/hardware_errors:
+     address registers: status_address1 .. status_addressN and
+     ack_value1 .. ack_valueN.
+     Error Status Data Block 1 .. N, each holding CPER records.
+
+   Pointers:
+     GHESn.error_status_address -> status_addressn -> Error Status Data Block n
+     GHESn.read_ack_register    -> ack_valuen
+
+(1) QEMU generates the ACPI HEST table. This table goes in the current
+"etc/acpi/tables" fw_cfg blob. Each error source has a different
+notification type.
+
+(2) A new fw_cfg blob called "etc/hardware_errors" is introduced. QEMU
+also needs to populate this blob. The "etc/hardware_errors" fw_cfg blob
+contains one address registers table and one Error Status Data Block table,
+both of which are pre-allocated.
+
+(3) The address registers table contains N Error Status Address entries
+and N Read Ack Address entries; each entry is 8 bytes. The Error Status
+Data Block table contains N Error Status Data Block entries; each entry
+is 0x1000 (4096) bytes. The total size of the "etc/hardware_errors"
+fw_cfg blob is (N * 8 * 2 + N * 4096) bytes.
+
+(4) QEMU generates the ACPI linker/loader script for the firmware
+
+(4a) The HEST table is part of "etc/acpi/tables", for which the firmware
+already allocates memory and which it already downloads, because QEMU
+already generates an ALLOCATE linker/loader command for it.
+
+(4b) QEMU creates another ALLOCATE command for the "etc/hardware_errors"
+blob. The firmware allocates memory for this blob,
+and downloads it.
+
+(5) QEMU generates N ADD_POINTER commands, which patch the addresses in the
+"Error Status Address" fields of the HEST table with pointers to the
+corresponding address registers in the downloaded "etc/hardware_errors"
+blob.
+
+(6) QEMU generates N ADD_POI

[Qemu-devel] [PATCH v12 04/12] ACPI: enable APEI GHES in the configure file and build it

2017-11-10 Thread Dongjiu Geng

Add the CONFIG_ACPI_APEI option to arm-softmmu.mak and build
hest_ghes.o under it in Makefile.objs.

Signed-off-by: Dongjiu Geng 
---
 default-configs/arm-softmmu.mak | 1 +
 hw/acpi/Makefile.objs   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index bbdd3c1..c362113 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -129,3 +129,4 @@ CONFIG_ACPI=y
 CONFIG_SMBIOS=y
 CONFIG_ASPEED_SOC=y
 CONFIG_GPIO_KEY=y
+CONFIG_ACPI_APEI=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 11c35bc..bafb148 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -6,6 +6,7 @@ common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
+common-obj-$(CONFIG_ACPI_APEI) += hest_ghes.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
-- 
1.8.3.1

[Qemu-devel] [PATCH v12 11/12] hw/arm/virt: Add RAS platform version for migration

2017-11-10 Thread Dongjiu Geng

Enable this feature for machine versions 2.10 and later, and disable
it by default for older versions.

Signed-off-by: Dongjiu Geng 
---
Adding a platform version was suggested here:
https://lkml.org/lkml/2017/8/25/821
---
 hw/arm/virt-acpi-build.c | 14 +-
 hw/arm/virt.c|  4 
 include/hw/arm/virt.h|  1 +
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 92c8c38..961b67d 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -801,10 +801,11 @@ void virt_acpi_build(VirtMachineState *vms, AcpiBuildTables *tables)
 acpi_add_table(table_offsets, tables_blob);
 build_spcr(tables_blob, tables->linker, vms);
 
-acpi_add_table(table_offsets, tables_blob);
-build_error_block(tables->hardware_errors, tables->linker);
-build_apei_ghes(tables_blob, tables->hardware_errors, tables->linker);
-
+if (!vmc->no_ras) {
+acpi_add_table(table_offsets, tables_blob);
+build_error_block(tables->hardware_errors, tables->linker);
+build_apei_ghes(tables_blob, tables->hardware_errors, tables->linker);
+}
 
 if (nb_numa_nodes > 0) {
 acpi_add_table(table_offsets, tables_blob);
@@ -891,6 +892,7 @@ static const VMStateDescription vmstate_virt_acpi_build = {
 
 void virt_acpi_setup(VirtMachineState *vms)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(vms);
 AcpiBuildTables tables;
 AcpiBuildState *build_state;
 
@@ -922,7 +924,9 @@ void virt_acpi_setup(VirtMachineState *vms)
 fw_cfg_add_file(vms->fw_cfg, ACPI_BUILD_TPMLOG_FILE, tables.tcpalog->data,
 acpi_data_len(tables.tcpalog));
 
-ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
+if (!vmc->no_ras) {
+ghes_add_fw_cfg(vms->fw_cfg, tables.hardware_errors);
+}
 
 build_state->rsdp_mr = acpi_add_rom_blob(build_state, tables.rsdp,
   ACPI_BUILD_RSDP_FILE, 0);
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 68495c2..ab79988 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1732,8 +1732,12 @@ static void virt_2_9_instance_init(Object *obj)
 
 static void virt_machine_2_9_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_2_10_options(mc);
 SET_MACHINE_COMPAT(mc, VIRT_COMPAT_2_9);
+/* memory recovery feature was introduced with 2.10 */
+vmc->no_ras = true;
 }
 DEFINE_VIRT_MACHINE(2, 9)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 33b0ff3..8fbd664 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -84,6 +84,7 @@ typedef struct {
 bool disallow_affinity_adjustment;
 bool no_its;
 bool no_pmu;
+bool no_ras;
 bool claim_edge_triggered_timers;
 } VirtMachineClass;
 
-- 
1.8.3.1

[Qemu-devel] [PATCH v12 10/12] ARM: ACPI: Add _E04 for hardware error device

2017-11-10 Thread Dongjiu Geng

On the ARM platform we implement notification of error events via a
GPIO pin. For GPIO-signaled events, an _AEI object lists the
appropriate GPIO pin.

GPIO pin 4 is used for the hardware error device (PNP0C33), so add
_E04 to the ACPI DSDT table. When GPIO pin 4 signals an event, the
guest ACPI driver receives the notification and handles the error.

Signed-off-by: Dongjiu Geng 

---
1. Which notification type to use for a BUS_MCEERR_AO SIGBUS (Polled,
GPIO-Signal or ARMv8 SEI) was discussed here:
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03397.html
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03467.html
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03601.html
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg03775.html

2. ASL dump for the GPIO and hardware error device:


Device (GPO0)
{
Name (_AEI, ResourceTemplate ()  // _AEI: ACPI Event Interrupts
{
.
GpioInt (Edge, ActiveHigh, Exclusive, PullUp, 0x,
"GPO0", 0x00, ResourceConsumer, ,
)
{   // Pin list
0x0004
}
})
Method (_E04, 0, NotSerialized)  // _Exx: Edge-Triggered GPE
{
Notify (ERRD, 0x80) // Status Change
}
}
Device (ERRD)
{
Name (_HID, EisaId ("PNP0C33") /* Error Device */)  // _HID: Hardware ID
Name (_UID, Zero)  // _UID: Unique ID
Method (_STA, 0, NotSerialized)  // _STA: Status
{
Return (0x0F)
}
}

3. Guest log after QEMU records a CPER and notifies the guest with a GPIO-Signal:
[  504.164899] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 7
[  504.166970] {1}[Hardware Error]: event severity: recoverable
[  504.251650] {1}[Hardware Error]:  Error 0, type: recoverable
[  504.252974] {1}[Hardware Error]:   section_type: memory error
[  504.254380] {1}[Hardware Error]:   physical_address: 0x03ec
[  504.255879] {1}[Hardware Error]:   error_type: 3, multi-bit ECC
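
The link between GPIO pin 4 and the `_E04` method name in the commit
message follows the ACPI naming rule for edge-triggered GPIO-signaled
events: `_Exx`, where xx is the pin number in hexadecimal. A minimal
sketch of that rule, using a hypothetical helper that is not part of
this patch:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the "_Exx" method name for an edge-triggered
 * GPIO-signaled event. The two-digit form only covers pins 0x00..0xFF.
 */
static void gpio_edge_method_name(unsigned int pin, char name[5])
{
    assert(pin <= 0xff);
    (void)snprintf(name, 5, "_E%02X", pin);
}
```

With pin 3 (the power button) this yields `_E03`, and with pin 4 (the
hardware error device) it yields `_E04`, matching the methods in the
ASL dump above.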
---
 hw/arm/virt-acpi-build.c | 31 ++-
 hw/arm/virt.c| 18 ++
 include/sysemu/sysemu.h  |  3 +++
 vl.c | 12 
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 7b397c3..92c8c38 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -49,6 +49,7 @@
 
 #define ARM_SPI_BASE 32
 #define ACPI_POWER_BUTTON_DEVICE "PWRB"
+#define ACPI_HARDWARE_ERROR_DEVICE "ERRD"
 
 static void acpi_dsdt_add_cpus(Aml *scope, int smp_cpus)
 {
@@ -340,7 +341,13 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
 
 Aml *aei = aml_resource_template();
 /* Pin 3 for power button */
-const uint32_t pin_list[1] = {3};
+uint32_t pin_list[1] = {3};
+aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
+ AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
+ "GPO0", NULL, 0));
+
+/* Pin 4 for hardware error device */
+pin_list[0] = 4;
 aml_append(aei, aml_gpio_int(AML_CONSUMER, AML_EDGE, AML_ACTIVE_HIGH,
  AML_EXCLUSIVE, AML_PULL_UP, 0, pin_list, 1,
  "GPO0", NULL, 0));
@@ -351,6 +358,13 @@ static void acpi_dsdt_add_gpio(Aml *scope, const MemMapEntry *gpio_memmap,
 aml_append(method, aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
   aml_int(0x80)));
 aml_append(dev, method);
+
+/* _E04 is the handler for hardware errors */
+method = aml_method("_E04", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_notify(aml_name(ACPI_HARDWARE_ERROR_DEVICE),
+  aml_int(0x80)));
+aml_append(dev, method);
+
 aml_append(scope, dev);
 }
 
@@ -363,6 +377,20 @@ static void acpi_dsdt_add_power_button(Aml *scope)
 aml_append(scope, dev);
 }
 
+static void acpi_dsdt_add_error_device(Aml *scope)
+{
+Aml *dev = aml_device(ACPI_HARDWARE_ERROR_DEVICE);
+Aml *method;
+
+aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0C33")));
+aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+method = aml_method("_STA", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_int(0x0f)));
+aml_append(dev, method);
+aml_append(scope, dev);
+}
+
 /* RSDP */
 static GArray *
 build_rsdp(GArray *rsdp_table, BIOSLinker *linker, unsigned xsdt_tbl_offset)
@@ -716,6 +744,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 acpi_dsdt_add_gpio(scope, &memmap[VIRT_GPIO],
(irqmap[VIRT_GPIO] + ARM_SPI_BASE));
 acpi_dsdt_add_power_button(scope);
+acpi_dsdt_add_error_device(scope);
 
 aml_append(dsdt, scope);
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 6b7a0fe..68495c2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -701,16 +70

[Qemu-devel] [PATCH v12 05/12] linux-headers: sync against Linux v4.14-rc8

2017-11-10 Thread Dongjiu Geng

Signed-off-by: Dongjiu Geng 

---
Suggested here:
https://lkml.org/lkml/2017/9/5/575

---
 include/standard-headers/asm-s390/kvm_virtio.h | 1 +
 include/standard-headers/asm-s390/virtio-ccw.h | 1 +
 include/standard-headers/asm-x86/hyperv.h  | 1 +
 include/standard-headers/linux/input-event-codes.h | 1 +
 include/standard-headers/linux/input.h | 1 +
 include/standard-headers/linux/pci_regs.h  | 1 +
 linux-headers/asm-arm/kvm.h| 1 +
 linux-headers/asm-arm/kvm_para.h   | 1 +
 linux-headers/asm-arm/unistd.h | 1 +
 linux-headers/asm-arm64/kvm.h  | 1 +
 linux-headers/asm-arm64/unistd.h   | 1 +
 linux-headers/asm-powerpc/epapr_hcalls.h   | 1 +
 linux-headers/asm-powerpc/kvm.h| 1 +
 linux-headers/asm-powerpc/kvm_para.h   | 1 +
 linux-headers/asm-powerpc/unistd.h | 1 +
 linux-headers/asm-s390/kvm.h   | 1 +
 linux-headers/asm-s390/kvm_para.h  | 1 +
 linux-headers/asm-s390/unistd.h| 1 +
 linux-headers/asm-x86/kvm.h| 1 +
 linux-headers/asm-x86/kvm_para.h   | 1 +
 linux-headers/asm-x86/unistd.h | 1 +
 linux-headers/linux/kvm.h  | 4 
 linux-headers/linux/kvm_para.h | 1 +
 linux-headers/linux/psci.h | 1 +
 linux-headers/linux/userfaultfd.h  | 1 +
 linux-headers/linux/vfio.h | 1 +
 linux-headers/linux/vfio_ccw.h | 1 +
 linux-headers/linux/vhost.h| 1 +
 28 files changed, 31 insertions(+)

diff --git a/include/standard-headers/asm-s390/kvm_virtio.h b/include/standard-headers/asm-s390/kvm_virtio.h
index daad324..4af18ca 100644
--- a/include/standard-headers/asm-s390/kvm_virtio.h
+++ b/include/standard-headers/asm-s390/kvm_virtio.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * definition for virtio for kvm on s390
  *
diff --git a/include/standard-headers/asm-s390/virtio-ccw.h b/include/standard-headers/asm-s390/virtio-ccw.h
index a9a4ebf..967aad3 100644
--- a/include/standard-headers/asm-s390/virtio-ccw.h
+++ b/include/standard-headers/asm-s390/virtio-ccw.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * Definitions for virtio-ccw devices.
  *
diff --git a/include/standard-headers/asm-x86/hyperv.h b/include/standard-headers/asm-x86/hyperv.h
index fac7651..2c8a3ff 100644
--- a/include/standard-headers/asm-x86/hyperv.h
+++ b/include/standard-headers/asm-x86/hyperv.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 #ifndef _ASM_X86_HYPERV_H
 #define _ASM_X86_HYPERV_H
 
diff --git a/include/standard-headers/linux/input-event-codes.h b/include/standard-headers/linux/input-event-codes.h
index 2fa0f4e..7cc0fcb 100644
--- a/include/standard-headers/linux/input-event-codes.h
+++ b/include/standard-headers/linux/input-event-codes.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * Input event codes
  *
diff --git a/include/standard-headers/linux/input.h b/include/standard-headers/linux/input.h
index 666e201..bc3e6d3 100644
--- a/include/standard-headers/linux/input.h
+++ b/include/standard-headers/linux/input.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * Copyright (c) 1999-2002 Vojtech Pavlik
  *
diff --git a/include/standard-headers/linux/pci_regs.h b/include/standard-headers/linux/pci_regs.h
index c22d3eb..92500c3 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * pci_regs.h
  *
diff --git a/linux-headers/asm-arm/kvm.h b/linux-headers/asm-arm/kvm.h
index fa9fae8..17ffa87 100644
--- a/linux-headers/asm-arm/kvm.h
+++ b/linux-headers/asm-arm/kvm.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  * Copyright (C) 2012 - Virtual Open Systems and Columbia University
  * Author: Christoffer Dall 
diff --git a/linux-headers/asm-arm/kvm_para.h b/linux-headers/asm-arm/kvm_para.h
index 14fab8f..baacc49 100644
--- a/linux-headers/asm-arm/kvm_para.h
+++ b/linux-headers/asm-arm/kvm_para.h
@@ -1 +1,2 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 #include 
diff --git a/linux-headers/asm-arm/unistd.h b/linux-headers/asm-arm/unistd.h
index 155571b..8ee6d25 100644
--- a/linux-headers/asm-arm/unistd.h
+++ b/linux-headers/asm-arm/unistd.h
@@ -1,3 +1,4 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  *  arch/arm/include/asm/unistd.h
  *
diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index d254700..2c79c6f 100644
--- a/linux-headers/asm-arm64/kvm.h
+++

[Qemu-devel] [PATCH v12 12/12] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2017-11-10 Thread Dongjiu Geng

Add a SIGBUS signal handler. The handler checks the SIGBUS type,
translates the host VA delivered by the host into a guest PA, records
this PA in a CPER, writes the CPER into the guest APEI GHES memory, and
finally notifies the guest according to the SIGBUS type. QEMU needs to
handle two kinds of SIGBUS: BUS_MCEERR_AO and BUS_MCEERR_AR.

When the guest accesses device-type poisoned memory, an SError interrupt
is generated and reported to the host firmware. The host kernel gets an
APEI notification, memory_failure() unmaps the affected page from the
guest's stage 2, and a BUS_MCEERR_AO SIGBUS is sent to user space. QEMU
then creates a new CPER, adds it to the guest APEI GHES memory, and
notifies the guest with a GPIO-Signal notification.

When the guest hits a PG_hwpoison page, it traps to KVM as a stage 2
fault and a synchronous BUS_MCEERR_AR SIGBUS is delivered to user space.
QEMU records this error in the guest APEI GHES memory and notifies the
guest using a Synchronous External Abort (SEA).

Suggested-by: James Morse 
Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 

---
QEMU handling of SIGBUS is discussed here:
https://lkml.org/lkml/2017/2/27/246

Which error notification to use for the guest is discussed here:
https://lkml.org/lkml/2017/9/14/241
https://lkml.org/lkml/2017/9/22/499
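
The "host VA to guest PA" step from the commit message can be pictured
with a minimal sketch. The patch itself relies on
qemu_ram_addr_from_host() and kvm_physical_memory_addr_from_host(); the
code below is not those functions. It uses a hypothetical, simplified
slot table only to show the idea of the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified memory slot; not QEMU's KVMSlot. */
typedef struct {
    uintptr_t host_base;   /* start of the slot in the host address space */
    uint64_t  guest_base;  /* guest physical address the slot is mapped at */
    uint64_t  size;        /* slot length in bytes */
} MemSlotSketch;

/*
 * Find the slot containing host_addr and convert it to a guest physical
 * address. Returns false if the address is not backed by guest memory.
 */
static bool guest_paddr_from_host(const MemSlotSketch *slots, size_t nslots,
                                  uintptr_t host_addr, uint64_t *paddr)
{
    size_t i;

    assert(paddr != NULL);
    for (i = 0; i < nslots; i++) {
        const MemSlotSketch *s = &slots[i];

        if (host_addr >= s->host_base &&
            host_addr - s->host_base < s->size) {
            *paddr = s->guest_base + (host_addr - s->host_base);
            return true;
        }
    }
    return false;
}
```

A failed lookup corresponds to the case in the handler where the error
hit memory used by QEMU itself rather than by the guest.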
---
 include/sysemu/kvm.h |  2 +-
 target/arm/kvm.c |  2 ++
 target/arm/kvm64.c   | 34 ++
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 3a458f5..90c1605 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -361,7 +361,7 @@ bool kvm_vcpu_id_is_valid(int vcpu_id);
 /* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
 unsigned long kvm_arch_vcpu_id(CPUState *cpu);
 
-#ifdef TARGET_I386
+#if defined(TARGET_I386) || defined(TARGET_AARCH64)
 #define KVM_HAVE_MCE_INJECTION 1
 void kvm_arch_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
 #endif
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index d85e36a..8523158 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -26,6 +26,7 @@
 #include "exec/address-spaces.h"
 #include "hw/boards.h"
 #include "qemu/log.h"
+#include "exec/ram_addr.h"
 
 const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 KVM_CAP_LAST_INFO
@@ -182,6 +183,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
 cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
 
+qemu_register_reset(kvm_unpoison_all, NULL);
 type_register_static(&host_arm_cpu_type_info);
 
 return 0;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 7f662e9..3b532e1 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -27,6 +27,9 @@
 #include "kvm_arm.h"
 #include "internals.h"
 #include "hw/arm/arm.h"
+#include "exec/ram_addr.h"
+#include "hw/acpi/acpi-defs.h"
+#include "hw/acpi/hest_ghes.h"
 
 static bool have_guest_debug;
 
@@ -943,6 +946,37 @@ int kvm_arch_get_registers(CPUState *cs)
 return ret;
 }
 
+void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
+{
+ram_addr_t ram_addr;
+hwaddr paddr;
+
+assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
+if (addr) {
+ram_addr = qemu_ram_addr_from_host(addr);
+if (ram_addr != RAM_ADDR_INVALID &&
+kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
+kvm_cpu_synchronize_state(c);
+kvm_hwpoison_page_add(ram_addr);
+if (code == BUS_MCEERR_AR) {
+ghes_update_guest(ACPI_HEST_NOTIFY_SEA, paddr);
+kvm_inject_arm_sea(c);
+} else if (code == BUS_MCEERR_AO) {
+ghes_update_guest(ACPI_HEST_NOTIFY_GPIO, paddr);
+qemu_hardware_error_notify();
+}
+return;
+}
+fprintf(stderr, "Hardware memory error for memory used by "
+"QEMU itself instead of guest system!\n");
+}
+
+if (code == BUS_MCEERR_AR) {
+fprintf(stderr, "Hardware memory error!\n");
+exit(1);
+}
+}
+
 /* C6.6.29 BRK instruction */
 static const uint32_t brk_insn = 0xd420;
 
-- 
1.8.3.1

[Qemu-devel] [PATCH v12 06/12] target-arm: kvm64: detect whether can set vsesr_el2

2017-11-10 Thread Dongjiu Geng

Check whether KVM supports setting the vsesr_el2 value for a vCPU. When
the guest takes a virtual SError interrupt exception, this value
provides the syndrome value reported in the ESR_EL1 ISS field.

Signed-off-by: Dongjiu Geng 
Signed-off-by: Quanming Wu 

---
Detecting whether KVM can set the ESR, instead of detecting the CPU RAS
capability, is discussed here:

https://www.spinics.net/lists/kvm-arm/msg27150.html
https://www.spinics.net/lists/arm-kernel/msg604440.html
---
 target/arm/kvm64.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index a16abc8..af8ebc9 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -980,3 +980,9 @@ bool kvm_arm_handle_debug(CPUState *cs, struct kvm_debug_exit_arch *debug_exit)
 
 return false;
 }
+
+static bool kvm_can_set_vcpu_esr(struct KVMState *state)
+{
+int ret = kvm_check_extension(state, KVM_CAP_ARM_INJECT_SERROR_ESR);
+return (ret) ? true : false;
+}
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v12 09/12] Move related hwpoison page function to accel/kvm/ folder

2017-11-10 Thread Paolo Bonzini

On 10/11/2017 20:19, Dongjiu Geng wrote:
> +typedef struct HWPoisonPage {
> +ram_addr_t ram_addr;
> +QLIST_ENTRY(HWPoisonPage) list;
> +} HWPoisonPage;
> +

Is this actually needed outside accel/kvm/kvm-all.c?
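
To illustrate the question (a sketch with hypothetical names, not code
from the patch): if only one .c file walks the list, the header could
carry just the function prototypes while the node type stays private to
that file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* What a header would need to expose: prototypes only. */
void poison_page_add(uint64_t ram_addr);
size_t poison_page_count(void);
void poison_page_clear_all(void);

/* Private to the .c file: callers never see the node layout. */
typedef struct PoisonPageSketch {
    uint64_t ram_addr;
    struct PoisonPageSketch *next;
} PoisonPageSketch;

static PoisonPageSketch *poison_list;

/* Add a page to the list unless it is already recorded. */
void poison_page_add(uint64_t ram_addr)
{
    PoisonPageSketch *page;

    for (page = poison_list; page != NULL; page = page->next) {
        if (page->ram_addr == ram_addr) {
            return;
        }
    }
    page = malloc(sizeof(*page));
    assert(page != NULL);
    page->ram_addr = ram_addr;
    page->next = poison_list;
    poison_list = page;
}

/* Number of distinct pages currently recorded. */
size_t poison_page_count(void)
{
    size_t n = 0;
    const PoisonPageSketch *page;

    for (page = poison_list; page != NULL; page = page->next) {
        n++;
    }
    return n;
}

/* Free and remove every recorded page. */
void poison_page_clear_all(void)
{
    while (poison_list != NULL) {
        PoisonPageSketch *next = poison_list->next;

        free(poison_list);
        poison_list = next;
    }
}
```

The sketch uses a plain singly linked list instead of QLIST and leaves
out the qemu_ram_remap() call, so it shows only the visibility point.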

Thanks,

Paolo

Re: [Qemu-devel] [Qemu devel PATCH] MAINTAINERS: Add entries for Smartfusion2

2017-11-10 Thread Philippe Mathieu-Daudé

Hi Sundeep, Peter.

> On 11/09/2017 08:55 PM, Peter Maydell wrote:
> > On 9 November 2017 at 21:46, Philippe Mathieu-Daudé wrote:
> >> Hi Subbaraya,
> >>
> >> On 11/09/2017 09:02 AM, Subbaraya Sundeep wrote:
> >>> add voluntarily myself as maintainer for Smartfusion2
> >>
> >> You need to share your GnuPG key signed, I couldn't find it using
> >> http://pgp.mit.edu/pks/lookup?search=Subbaraya+Sundeep
> 
> >>
> >> from https://wiki.qemu.org/Contribute/SubmitAPullRequest :
> >
> > I don't in general expect to take pull requests from
> > everybody listed as a maintainer in the MAINTAINERS file.
> > That just means "I'm going to be reviewing and should
> > be cc'd on patches". Pull requests are sent by people
> > who are maintainers for a subsystem. Rule of thumb:
> > unless somebody asks you to send a pull request, you
> > don't need to do it.
> 
> Ok, please apologize my misunderstanding. I still think the M: entry

Peter: Oops :) read "I apologize"

> stand for 'Maintainer' instead of 'Mail', and still don't understand the
> difference with a "Designated reviewer" (R: entry):
> 
>         M: Mail patches to: FullName 
>         R: Designated reviewer: FullName 
>            These reviewers should be CCed on patches.
> 
> "Designated reviewer" seems to duplicate the M: entry and is therefore
> confusing. Can we simply remove it instead?
> 
> When introduced in fdf6fab4df4 the explanation was:
> 
> --
> Some people are not content with the amount of mail they get, and would
> like to be CCed on patches for areas they do not maintain.  Let them
> satisfy their own appetite for qemu-devel messages.
> 
> Seriously: the purpose here is a bit different from the Linux kernel.
> While Linux uses "R" to designate non-maintainers for reviewing patches
> in a given area, in QEMU I would also like to use "R" so that people can
> delegate sending pull requests while keeping some degree of oversight.
> --
> 
> Do you want me to remove M: and put only R: ?

No, it seems you are correct and I was wrong :)

If you agree changing the first section title to "SmartFusion2" and
eventually the second to "Emcraft M2S-FG484":

Reviewed-by: Philippe Mathieu-Daudé 

Regards,

Phil.

Re: [Qemu-devel] [RFC PATCH 19/26] cpu-exec: reset exit flag before calling cpu_exec_nocache

2017-11-10 Thread Pavel Dovgalyuk

> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> >>>
> >>> I tried this approach and it didn't work.
> >>> I think iothread sets u16.high flag after resetting it in 
> >>> cpu_handle_interrupt.
> >>
> >> But why is this a problem?  The TB would exit immediately and go again
> >> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
> >> cpu_handle_exception causes the exception via cpu_exec_nocache.
> >
> > I've tested your variant more thoroughly.
> > It seems that iothread calls cpu_exec between
> > atomic_set(&cpu->icount_decr.u16.high, 0);
> > in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
> > I see no other explanation, because this does not happen every time.
> > And cpu_handle_interrupt is not called again, because cpu_handle_exception
> > returns true.
> > Therefore we have an infinite loop, because no other code here resets
> > cpu->icount_decr.u16.high.
> 
> Then returning true unconditionally is wrong in the cpu_exec_nocache
> case.  What if you do:
> 
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 61297f8f4a..fb5446be3e 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -470,7 +470,19 @@ static inline void cpu_handle_debug_exception(CPUState 
> *cpu)
> 
>  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>  {
> -if (cpu->exception_index >= 0) {
> +if (cpu->exception_index < 0) {
> +#ifndef CONFIG_USER_ONLY
> +if (replay_has_exception()
> +&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> +/* try to cause an exception pending in the log */
> +cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), 
> true);
> +}
> +#endif
> +if (cpu->exception_index < 0) {
> +return;

return false, I guess?
This approach allows iterating in case of races,
and QEMU no longer hangs at replay.

> +}
> +}
> +
>  if (cpu->exception_index >= EXCP_INTERRUPT) {
>  /* exit request from the cpu execution loop */
>  *ret = cpu->exception_index;
> @@ -505,16 +517,6 @@ static inline bool cpu_handle_exception(CPUState *cpu, 
> int *ret)
>  }
>  #endif
>  }
> -#ifndef CONFIG_USER_ONLY
> -} else if (replay_has_exception()
> -   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
> -/* try to cause an exception pending in the log */
> -cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), true);
> -*ret = -1;
> -return true;
> -#endif
> -}
> -
>  return false;
>  }
> 


Pavel Dovgalyuk

Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field

2017-11-10 Thread David Hildenbrand

On 10.11.2017 11:14, Cornelia Huck wrote:
> On Thu, 9 Nov 2017 18:02:35 -0200
> Eduardo Habkost  wrote:
> 
>> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
>>> On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
 On Mon, 6 Nov 2017 16:02:16 -0200
 Eduardo Habkost  wrote:
   
> On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
>> On Thu, 19 Oct 2017 17:31:51 +1100
>> David Gibson  wrote:
>> 
>>> On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
 For enabling early cpu to numa node configuration at runtime
 qmp_query_hotpluggable_cpus() should provide a list of available
 cpu slots at early stage, before machine_init() is called and
 the 1st cpu is created, so that mgmt might be able to call it
 and use output to set numa mapping.
 Use MachineClass::possible_cpu_arch_ids() callback to set
 cpu type info, along with the rest of possible cpu properties,
 to let machine define which cpu type* will be used.

 * for SPAPR it will be a spapr core type and for ARM/s390x/x86
   a respective descendant of CPUClass.

 Move parse_numa_opts() in vl.c after cpu_model is parsed into
 cpu_type so that possible_cpu_arch_ids() would know which
 cpu_type to use during layout initialization.

 Signed-off-by: Igor Mammedov   
>>>
>>> Reviewed-by: David Gibson 
>>> 
 ---
   v2:
  - fix NULL dereference caused by not initialized
MachineState::cpu_type at the time parse_numa_opts()
were called
 ---
  include/hw/boards.h|  2 ++
  hw/arm/virt.c  |  3 ++-
  hw/core/machine.c  | 12 ++--
  hw/i386/pc.c   |  4 +++-
  hw/ppc/spapr.c | 13 -
  hw/s390x/s390-virtio-ccw.c |  1 +
  vl.c   |  3 +--
  7 files changed, 23 insertions(+), 15 deletions(-)

 diff --git a/include/hw/boards.h b/include/hw/boards.h
 index 191a5b3..fa21758 100644
 --- a/include/hw/boards.h
 +++ b/include/hw/boards.h
 @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
   * CPUArchId:
   * @arch_id - architecture-dependent CPU ID of present or possible 
 CPU  
>>>
>>> I know this isn't really in scope for this patch, but is @arch_id here
>>> supposed to have meaning defined by the target, or by the machine?
>>>
>>> If it's the machime, it could do with a rename - "arch" means target
>>> to most people (thanks to Linux).
>>>
>>> If it's the target, it's kind of bogus, because it doesn't necessarily
>>> have a clear meaning per target - get_arch_id in CPUClass has the same
>>> problem, which is probably one reason it's basically only used by the
>>> x86 code at present.
>>>
>>> e.g. for target/ppc, what do we use?  There's the PIR, which is in the
>>> CPU.. but only on some cpu models, not all.  There will generally be
>>> some kind of master PIC id, but there are different PIC models on
>>> different boards.  What goes in the devicetree?  Well only some
>>> machines use devicetree, and they might define the cpu reg 
>>> differently.
>>>
>>> Board designs will generally try to make some if not all of those
>>> possible values equal for simplicity, but there's still no real way of
>>> defining a sensible arch_id independent of machine / board.
>> I'd say arch_id is machine specific so far, it was introduced when we
>> didn't have CpuInstanceProperties and at that time we considered only
>> vcpus (threads) and doesn't really apply to spapr cores.
>>
>> In general we could do away with arch_id and use CpuInstanceProperties
>> instead, but arch_id also serves aux purpose, it allows machine to
>> pre-calculate(cache) apic-id/mpidr values in one place and then they
>> are/(could be) used by arch in-depended code to build acpi tables.
>> So if we drop arch_id we would need to introduce a machine hook,
>> which would translate CpuInstanceProperties into current arch_id.
>
> I think we need to do a better to job documenting where exactly
> we expect arch_id to be used and how, so people know what it's
> supposed to return.
>
> If the only place where it's useful now is ACPI code (is it?),
> should we rename it to something like get_acpi_id()?  

 It is also used in hw/s390x/sclp.c to fill out a control block, so acpi
 isn't the only user.  
>>>
>>> Yeah.. this is kind of bogus.  The s390 use is in machine specific
>>> code, so it's basically just re-using the field for an unrelated usage
>>> to the x86/arm one (ACPI).

as index == arch_id on s390x

Re: [Qemu-devel] [Qemu-block] [PATCH] block: all I/O should be completed before removing throttle timers.

2017-11-10 Thread Alberto Garcia

On Sat 21 Oct 2017 07:34:00 AM CEST, Zhengui Li wrote:
> From: Zhengui 
>
> In blk_remove_bs, all I/O should be completed before removing throttle
> timers. If there has inflight I/O, removing throttle timers here will
> cause the inflight I/O never return.
> This patch add bdrv_drained_begin before throttle_timers_detach_aio_context
> to let all I/O completed before removing throttle timers.
>
> Signed-off-by: Zhengui 
> ---
>  block/block-backend.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 45d9101..9edc452 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -660,7 +660,11 @@ void blk_remove_bs(BlockBackend *blk)
>  notifier_list_notify(&blk->remove_bs_notifiers, blk);
>  if (blk->public.throttle_group_member.throttle_state) {
>  tt = &blk->public.throttle_group_member.throttle_timers;
> +BlockDriverState *bs;
> +bs = blk_bs(blk);
> +bdrv_drained_begin(bs);
>  throttle_timers_detach_aio_context(tt);
> +bdrv_drained_end(bs);

You can keep my R-b, but seeing this in context I think you should
define BlockDriverState either at the beginning of the block or together
with ThrottleTimers at the beginning of the function.
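
A minimal sketch of that layout, with the declaration moved up next to the
other locals. Every type and helper here (BlockBackend, drained_begin(),
timers_detach(), remove_bs_sketch()) is a simplified stand-in invented for
illustration, not the real QEMU definition; only the ordering of the
drain/detach/undrain calls follows the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types, NOT the real QEMU structures. */
typedef struct BlockDriverState { int quiesce_counter; } BlockDriverState;
typedef struct ThrottleTimers { int attached; } ThrottleTimers;
typedef struct BlockBackend {
    BlockDriverState *bs;
    ThrottleTimers timers;
    int has_throttle_state;
} BlockBackend;

/* Stand-in helpers that only record what was called. */
static void drained_begin(BlockDriverState *bs) { bs->quiesce_counter++; }
static void drained_end(BlockDriverState *bs) { bs->quiesce_counter--; }
static void timers_detach(ThrottleTimers *tt) { tt->attached = 0; }

static void remove_bs_sketch(BlockBackend *blk)
{
    ThrottleTimers *tt;
    BlockDriverState *bs;   /* declared together with the other locals */

    if (blk->has_throttle_state) {
        tt = &blk->timers;
        bs = blk->bs;
        drained_begin(bs);  /* wait for in-flight I/O first */
        timers_detach(tt);
        drained_end(bs);
    }
    blk->bs = NULL;
}
```

The behaviour is the same as with the mid-block declaration; only the
placement of the variable changes.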

Berto

Re: [Qemu-devel] [Qemu devel PATCH] MAINTAINERS: Add entries for Smartfusion2

2017-11-10 Thread Peter Maydell

On 10 November 2017 at 00:22, Philippe Mathieu-Daudé  wrote:
> On 11/09/2017 08:55 PM, Peter Maydell wrote:
>> I don't in general expect to take pull requests from
>> everybody listed as a maintainer in the MAINTAINERS file.
>> That just means "I'm going to be reviewing and should
>> be cc'd on patches". Pull requests are sent by people
>> who are maintainers for a subsystem. Rule of thumb:
>> unless somebody asks you to send a pull request, you
>> don't need to do it.
>
> Ok, please apologize my misunderstanding. I still think the M: entry
> stand for 'Maintainer' instead of 'Mail', and still don't understand the
> difference with a "Designated reviewer" (R: entry):
>
> M: Mail patches to: FullName 
> R: Designated reviewer: FullName 
>These reviewers should be CCed on patches.
>
> "Designated reviewer" seems to duplicate the M: entry and is therefore
> confusing. Can we simply remove it instead?

I hadn't realized we had an 'R:' tag in MAINTAINERS...

> --
> Some people are not content with the amount of mail they get, and would
> like to be CCed on patches for areas they do not maintain.  Let them
> satisfy their own appetite for qemu-devel messages.
>
> Seriously: the purpose here is a bit different from the Linux kernel.
> While Linux uses "R" to designate non-maintainers for reviewing patches
> in a given area, in QEMU I would also like to use "R" so that people can
> delegate sending pull requests while keeping some degree of oversight.
> --

So, my view, based on what happens in practice:
 * "maintainer" means you are in effect accepting some responsibility
   for the continued maintenance of some bit of the codebase, ie
   you actually will review stuff
 * "reviewer" is a bit weird but I guess is just asking for cc:
   without promising to actually do anything
 * somebody who sends me pull requests is effectively somebody we've
   given the ability to make direct more-or-less unchecked commits
   to master, so that is given out more sparingly and for larger
   subsystems

But MAINTAINERS is mostly about what submitters need to do (ie
who to send patchmails to), so it doesn't particularly document
how patches flow onward into master, which varies. (For instance
the block layer folks have a two-level setup where some trees
get merged into others before they go to master. ARM devboards
go through me, and "maintainer" just means I let somebody else
deal with the device specifics if possible but am still the
reviewer of last resort.)

thanks
-- PMM

Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field

2017-11-10 Thread Eduardo Habkost

On Fri, Nov 10, 2017 at 01:34:42PM +0100, David Hildenbrand wrote:
> On 10.11.2017 11:14, Cornelia Huck wrote:
> > On Thu, 9 Nov 2017 18:02:35 -0200
> > Eduardo Habkost  wrote:
> > 
> >> On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
> >>> On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
>  On Mon, 6 Nov 2017 16:02:16 -0200
>  Eduardo Habkost  wrote:
>    
> > On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
> >> On Thu, 19 Oct 2017 17:31:51 +1100
> >> David Gibson  wrote:
> >> 
> >>> On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
>  For enabling early cpu to numa node configuration at runtime
>  qmp_query_hotpluggable_cpus() should provide a list of available
>  cpu slots at early stage, before machine_init() is called and
>  the 1st cpu is created, so that mgmt might be able to call it
>  and use output to set numa mapping.
>  Use MachineClass::possible_cpu_arch_ids() callback to set
>  cpu type info, along with the rest of possible cpu properties,
>  to let machine define which cpu type* will be used.
> 
>  * for SPAPR it will be a spapr core type and for ARM/s390x/x86
>    a respective descendant of CPUClass.
> 
>  Move parse_numa_opts() in vl.c after cpu_model is parsed into
>  cpu_type so that possible_cpu_arch_ids() would know which
>  cpu_type to use during layout initialization.
> 
>  Signed-off-by: Igor Mammedov   
> >>>
> >>> Reviewed-by: David Gibson 
> >>> 
>  ---
>    v2:
>   - fix NULL dereference caused by not initialized
> MachineState::cpu_type at the time parse_numa_opts()
> were called
>  ---
>   include/hw/boards.h|  2 ++
>   hw/arm/virt.c  |  3 ++-
>   hw/core/machine.c  | 12 ++--
>   hw/i386/pc.c   |  4 +++-
>   hw/ppc/spapr.c | 13 -
>   hw/s390x/s390-virtio-ccw.c |  1 +
>   vl.c   |  3 +--
>   7 files changed, 23 insertions(+), 15 deletions(-)
> 
>  diff --git a/include/hw/boards.h b/include/hw/boards.h
>  index 191a5b3..fa21758 100644
>  --- a/include/hw/boards.h
>  +++ b/include/hw/boards.h
>  @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState 
>  *machine,
>    * CPUArchId:
>    * @arch_id - architecture-dependent CPU ID of present or possible 
>  CPU  
> >>>
> >>> I know this isn't really in scope for this patch, but is @arch_id here
> >>> supposed to have meaning defined by the target, or by the machine?
> >>>
> >>> If it's the machime, it could do with a rename - "arch" means target
> >>> to most people (thanks to Linux).
> >>>
> >>> If it's the target, it's kind of bogus, because it doesn't necessarily
> >>> have a clear meaning per target - get_arch_id in CPUClass has the same
> >>> problem, which is probably one reason it's basically only used by the
> >>> x86 code at present.
> >>>
> >>> e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> >>> CPU.. but only on some cpu models, not all.  There will generally be
> >>> some kind of master PIC id, but there are different PIC models on
> >>> different boards.  What goes in the devicetree?  Well only some
> >>> machines use devicetree, and they might define the cpu reg 
> >>> differently.
> >>>
> >>> Board designs will generally try to make some if not all of those
> >>> possible values equal for simplicity, but there's still no real way of
> >>> defining a sensible arch_id independent of machine / board.
> >> I'd say arch_id is machine specific so far, it was introduced when we
> >> didn't have CpuInstanceProperties and at that time we considered only
> >> vcpus (threads) and doesn't really apply to spapr cores.
> >>
> >> In general we could do away with arch_id and use CpuInstanceProperties
> >> instead, but arch_id also serves aux purpose, it allows machine to
> >> pre-calculate(cache) apic-id/mpidr values in one place and then they
> >> are/(could be) used by arch in-depended code to build acpi tables.
> >> So if we drop arch_id we would need to introduce a machine hook,
> >> which would translate CpuInstanceProperties into current arch_id.
> >
> > I think we need to do a better to job documenting where exactly
> > we expect arch_id to be used and how, so people know what it's
> > supposed to return.
> >
> > If the only place where it's useful now is ACPI code (is it?),
> > should we rename it to something like get_acpi_id()?  
> 
>  It is also used in hw/s390x/sclp.c

Re: [Qemu-devel] [RFC v2 3/6] possible_cpus: add CPUArchId::type field

2017-11-10 Thread David Hildenbrand

On 10.11.2017 13:58, Eduardo Habkost wrote:
> On Fri, Nov 10, 2017 at 01:34:42PM +0100, David Hildenbrand wrote:
>> On 10.11.2017 11:14, Cornelia Huck wrote:
>>> On Thu, 9 Nov 2017 18:02:35 -0200
>>> Eduardo Habkost  wrote:
>>>
 On Thu, Nov 09, 2017 at 05:58:03PM +1100, David Gibson wrote:
> On Tue, Nov 07, 2017 at 04:04:04PM +0100, Cornelia Huck wrote:  
>> On Mon, 6 Nov 2017 16:02:16 -0200
>> Eduardo Habkost  wrote:
>>   
>>> On Tue, Oct 31, 2017 at 03:01:14PM +0100, Igor Mammedov wrote:  
 On Thu, 19 Oct 2017 17:31:51 +1100
 David Gibson  wrote:
 
> On Wed, Oct 18, 2017 at 01:12:12PM +0200, Igor Mammedov wrote:
>> For enabling early cpu to numa node configuration at runtime
>> qmp_query_hotpluggable_cpus() should provide a list of available
>> cpu slots at early stage, before machine_init() is called and
>> the 1st cpu is created, so that mgmt might be able to call it
>> and use output to set numa mapping.
>> Use MachineClass::possible_cpu_arch_ids() callback to set
>> cpu type info, along with the rest of possible cpu properties,
>> to let machine define which cpu type* will be used.
>>
>> * for SPAPR it will be a spapr core type and for ARM/s390x/x86
>>   a respective descendant of CPUClass.
>>
>> Move parse_numa_opts() in vl.c after cpu_model is parsed into
>> cpu_type so that possible_cpu_arch_ids() would know which
>> cpu_type to use during layout initialization.
>>
>> Signed-off-by: Igor Mammedov   
>
> Reviewed-by: David Gibson 
> 
>> ---
>>   v2:
>>  - fix NULL dereference caused by not initialized
>>MachineState::cpu_type at the time parse_numa_opts()
>>were called
>> ---
>>  include/hw/boards.h|  2 ++
>>  hw/arm/virt.c  |  3 ++-
>>  hw/core/machine.c  | 12 ++--
>>  hw/i386/pc.c   |  4 +++-
>>  hw/ppc/spapr.c | 13 -
>>  hw/s390x/s390-virtio-ccw.c |  1 +
>>  vl.c   |  3 +--
>>  7 files changed, 23 insertions(+), 15 deletions(-)
>>
>> diff --git a/include/hw/boards.h b/include/hw/boards.h
>> index 191a5b3..fa21758 100644
>> --- a/include/hw/boards.h
>> +++ b/include/hw/boards.h
>> @@ -80,6 +80,7 @@ void machine_set_cpu_numa_node(MachineState 
>> *machine,
>>   * CPUArchId:
>>   * @arch_id - architecture-dependent CPU ID of present or possible 
>> CPU  
>
> I know this isn't really in scope for this patch, but is @arch_id here
> supposed to have meaning defined by the target, or by the machine?
>
> If it's the machime, it could do with a rename - "arch" means target
> to most people (thanks to Linux).
>
> If it's the target, it's kind of bogus, because it doesn't necessarily
> have a clear meaning per target - get_arch_id in CPUClass has the same
> problem, which is probably one reason it's basically only used by the
> x86 code at present.
>
> e.g. for target/ppc, what do we use?  There's the PIR, which is in the
> CPU.. but only on some cpu models, not all.  There will generally be
> some kind of master PIC id, but there are different PIC models on
> different boards.  What goes in the devicetree?  Well only some
> machines use devicetree, and they might define the cpu reg 
> differently.
>
> Board designs will generally try to make some if not all of those
> possible values equal for simplicity, but there's still no real way of
> defining a sensible arch_id independent of machine / board.
 I'd say arch_id is machine specific so far, it was introduced when we
 didn't have CpuInstanceProperties and at that time we considered only
 vcpus (threads) and doesn't really apply to spapr cores.

 In general we could do away with arch_id and use CpuInstanceProperties
 instead, but arch_id also serves aux purpose, it allows machine to
 pre-calculate(cache) apic-id/mpidr values in one place and then they
 are/(could be) used by arch in-depended code to build acpi tables.
 So if we drop arch_id we would need to introduce a machine hook,
 which would translate CpuInstanceProperties into current arch_id.
>>>
>>> I think we need to do a better to job documenting where exactly
>>> we expect arch_id to be used and how, so people know what it's
>>> supposed to return.
>>>
>>> If the only place where it's useful now is ACPI code (is it?),
>>> should we rename it to something like get_acpi_id()?  
>>>

Re: [Qemu-devel] [RFC PATCH 19/26] cpu-exec: reset exit flag before calling cpu_exec_nocache

2017-11-10 Thread Paolo Bonzini

On 10/11/2017 13:29, Pavel Dovgalyuk wrote:
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>>>>>
>>>>> I tried this approach and it didn't work.
>>>>> I think iothread sets u16.high flag after resetting it in 
>>>>> cpu_handle_interrupt.
>>>>
>>>> But why is this a problem?  The TB would exit immediately and go again
>>>> to cpu_handle_interrupt.  cpu_handle_interrupt returns true and
>>>> cpu_handle_exception causes the exception via cpu_exec_nocache.
>>>
>>> I've tested your variant more thoroughly.
>>> It seems that iothread calls cpu_exec between
>>> atomic_set(&cpu->icount_decr.u16.high, 0);
>>> in cpu_handle_interrupt and cpu_exec_nocache in cpu_handle_exception.
>>> I see no other explanation, because this does not happen every time.
>>> And cpu_handle_interrupt is not called again, because cpu_handle_exception
>>> returns true.
>>> Therefore we have an infinite loop, because no other code here resets
>>> cpu->icount_decr.u16.high.
>>
>> Then returning true unconditionally is wrong in the cpu_exec_nocache
>> case.  What if you do:
>>
>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> index 61297f8f4a..fb5446be3e 100644
>> --- a/accel/tcg/cpu-exec.c
>> +++ b/accel/tcg/cpu-exec.c
>> @@ -470,7 +470,19 @@ static inline void cpu_handle_debug_exception(CPUState 
>> *cpu)
>>
>>  static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
>>  {
>> -if (cpu->exception_index >= 0) {
>> +if (cpu->exception_index < 0) {
>> +#ifndef CONFIG_USER_ONLY
>> +if (replay_has_exception()
>> +&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>> +/* try to cause an exception pending in the log */
>> +cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), 
>> true);
>> +}
>> +#endif
>> +if (cpu->exception_index < 0) {
>> +return;
> 
> return false, I guess?
> This approach allows iterating in case of races
> and QEMU does not hangs anymore at replay.

Great, can you include this change the next time you send your series?
There are some parts that can definitely go in for 2.11.

Thanks,

Paolo

>> +}
>> +}
>> +
>>  if (cpu->exception_index >= EXCP_INTERRUPT) {
>>  /* exit request from the cpu execution loop */
>>  *ret = cpu->exception_index;
>> @@ -505,16 +517,6 @@ static inline bool cpu_handle_exception(CPUState *cpu, 
>> int *ret)
>>  }
>>  #endif
>>  }
>> -#ifndef CONFIG_USER_ONLY
>> -} else if (replay_has_exception()
>> -   && cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
>> -/* try to cause an exception pending in the log */
>> -cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0, curr_cflags()), 
>> true);
>> -*ret = -1;
>> -return true;
>> -#endif
>> -}
>> -
>>  return false;
>>  }
>>
> 
> 
> Pavel Dovgalyuk
>

Re: [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-10 Thread Kevin Wolf

Am 10.11.2017 um 03:45 hat Fam Zheng geschrieben:
> On Thu, 11/09 21:43, Max Reitz wrote:
> > Draining a BDS may lead to graph modifications, which in turn may result
> > in it and other BDS being stripped of their current references.  If
> > bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
> > references themselves, the BDS they are trying to drain (or undrain) may
> > disappear right under their feet -- or, more specifically, under the
> > feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().
> > 
> > This fixes an occasional hang of iotest 194.
> > 
> > Signed-off-by: Max Reitz 
> > ---
> >  block/io.c | 47 ---
> >  1 file changed, 44 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/io.c b/block/io.c
> > index 3d5ef2cabe..a0a2833e8e 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -340,7 +340,10 @@ void bdrv_drain_all_begin(void)
> >  bool waited = true;
> >  BlockDriverState *bs;
> >  BdrvNextIterator it;
> > -GSList *aio_ctxs = NULL, *ctx;
> > +GSList *aio_ctxs = NULL, *ctx, *bs_list = NULL, *bs_list_entry;
> > +
> > +/* Must be called from the main loop */
> > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> >  
> >  block_job_pause_all();
> >  
> > @@ -355,6 +358,12 @@ void bdrv_drain_all_begin(void)
> >  if (!g_slist_find(aio_ctxs, aio_context)) {
> >  aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
> >  }
> > +
> > +/* Keep a strong reference to all root BDS and copy them into
> > + * an own list because draining them may lead to graph
> > + * modifications. */
> > +bdrv_ref(bs);
> > +bs_list = g_slist_prepend(bs_list, bs);
> >  }
> >  
> >  /* Note that completion of an asynchronous I/O operation can trigger 
> > any
> > @@ -370,7 +379,11 @@ void bdrv_drain_all_begin(void)
> >  AioContext *aio_context = ctx->data;
> >  
> >  aio_context_acquire(aio_context);
> > -for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
> > +for (bs_list_entry = bs_list; bs_list_entry;
> > + bs_list_entry = bs_list_entry->next)
> > +{
> > +bs = bs_list_entry->data;
> > +
> >  if (aio_context == bdrv_get_aio_context(bs)) {
> >  waited |= bdrv_drain_recurse(bs, true);
> >  }
> > @@ -379,24 +392,52 @@ void bdrv_drain_all_begin(void)
> >  }
> >  }
> >  
> > +for (bs_list_entry = bs_list; bs_list_entry;
> > + bs_list_entry = bs_list_entry->next)
> > +{
> > +bdrv_unref(bs_list_entry->data);
> > +}
> > +
> >  g_slist_free(aio_ctxs);
> > +g_slist_free(bs_list);
> >  }
> >  
> >  void bdrv_drain_all_end(void)
> >  {
> >  BlockDriverState *bs;
> >  BdrvNextIterator it;
> > +GSList *bs_list = NULL, *bs_list_entry;
> > +
> > +/* Must be called from the main loop */
> > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> >  
> > +/* Keep a strong reference to all root BDS and copy them into an
> > + * own list because draining them may lead to graph modifications.
> > + */
> >  for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
> > -AioContext *aio_context = bdrv_get_aio_context(bs);
> > +bdrv_ref(bs);
> > +bs_list = g_slist_prepend(bs_list, bs);
> > +}
> > +
> > +for (bs_list_entry = bs_list; bs_list_entry;
> > + bs_list_entry = bs_list_entry->next)
> > +{
> > +AioContext *aio_context;
> > +
> > +bs = bs_list_entry->data;
> > +aio_context = bdrv_get_aio_context(bs);
> >  
> >  aio_context_acquire(aio_context);
> >  aio_enable_external(aio_context);
> >  bdrv_parent_drained_end(bs);
> >  bdrv_drain_recurse(bs, false);
> >  aio_context_release(aio_context);
> > +
> > +bdrv_unref(bs);
> >  }
> >  
> > +g_slist_free(bs_list);
> > +
> >  block_job_resume_all();
> >  }
> 
> Is it better to put the references into BdrvNextIterator and introduce
> bdrv_next_iterator_destroy() to free them? You'll need to touch all callers
> because it is not C++, but it secures all the rest, which seems vulnerable
> in the same pattern, for example the aio_poll() in iothread_stop_all().

You could automatically free the references when bdrv_next() returns
NULL. Then you need an explicit bdrv_next_iterator_destroy() only for
callers that stop iterating halfway through the list.

Do you actually need to keep references to all BDSes in the whole list
while using the iterator or would it be enough to just keep a reference
to the current one?
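
To make the two ideas concrete, here is a rough sketch of an iterator that
pins only the current element and drops that reference automatically once
iteration reaches the end. Node, NodeIter and the iter_*() functions are toy
names made up for this illustration; they are not the BdrvNextIterator API,
and the sketch does not try to handle a successor pointer that changes while
the loop body runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted node; a stand-in for BlockDriverState. */
typedef struct Node {
    int refcnt;
    struct Node *next;
} Node;

static void node_ref(Node *n)   { n->refcnt++; }
static void node_unref(Node *n) { n->refcnt--; }

/* Hypothetical iterator: holds a reference to the current node only. */
typedef struct NodeIter {
    Node *cur;
} NodeIter;

static Node *iter_first(NodeIter *it, Node *head)
{
    it->cur = head;
    if (it->cur) {
        node_ref(it->cur);
    }
    return it->cur;
}

static Node *iter_next(NodeIter *it)
{
    Node *prev = it->cur;
    Node *next = prev ? prev->next : NULL;

    if (next) {
        node_ref(next);      /* pin the successor before letting go */
    }
    if (prev) {
        node_unref(prev);    /* at the end this releases the last node */
    }
    it->cur = next;
    return next;
}

/* Only needed by callers that stop before iter_next() returns NULL. */
static void iter_destroy(NodeIter *it)
{
    if (it->cur) {
        node_unref(it->cur);
        it->cur = NULL;
    }
}
```

A loop that runs to completion needs no cleanup call, while a caller that
breaks out early calls iter_destroy() once.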

Kevin

[Qemu-devel] [PULL 1/3] virtio-gpu: fix bug in host memory calculation.

2017-11-10 Thread Gerd Hoffmann

From: Tao Wu 

The old code treats bits as bytes when calculating host memory usage.
Change it to be consistent with allocation logic in pixman library.

Signed-off-by: Tao Wu 
Message-Id: <20171109181741.31318-1-lep...@google.com>
Reviewed-by: Marc-André Lureau 
Signed-off-by: Gerd Hoffmann 
---
 hw/display/virtio-gpu.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 43bbe09ea0..274e365713 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -322,6 +322,18 @@ static pixman_format_code_t get_pixman_format(uint32_t 
virtio_gpu_format)
 }
 }
 
+static uint32_t calc_image_hostmem(pixman_format_code_t pformat,
+   uint32_t width, uint32_t height)
+{
+/* Copied from pixman/pixman-bits-image.c, skip integer overflow check.
+ * pixman_image_create_bits will fail in case it overflow.
+ */
+
+int bpp = PIXMAN_FORMAT_BPP(pformat);
+int stride = ((width * bpp + 0x1f) >> 5) * sizeof(uint32_t);
+return height * stride;
+}
+
 static void virtio_gpu_resource_create_2d(VirtIOGPU *g,
   struct virtio_gpu_ctrl_command *cmd)
 {
@@ -366,7 +378,7 @@ static void virtio_gpu_resource_create_2d(VirtIOGPU *g,
 return;
 }
 
-res->hostmem = PIXMAN_FORMAT_BPP(pformat) * c2d.width * c2d.height;
+res->hostmem = calc_image_hostmem(pformat, c2d.width, c2d.height);
 if (res->hostmem + g->hostmem < g->conf.max_hostmem) {
 res->image = pixman_image_create_bits(pformat,
   c2d.width,
@@ -1087,7 +1099,7 @@ static int virtio_gpu_load(QEMUFile *f, void *opaque, 
size_t size,
 return -EINVAL;
 }
 
-res->hostmem = PIXMAN_FORMAT_BPP(pformat) * res->width * res->height;
+res->hostmem = calc_image_hostmem(pformat, res->width, res->height);
 
 res->addrs = g_new(uint64_t, res->iov_cnt);
 res->iov = g_new(struct iovec, res->iov_cnt);
-- 
2.9.3
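
As a worked example of the bits-versus-bytes difference described in the
commit message, the sketch below puts the old and new estimates side by
side. The function names hostmem_old() and hostmem_new() are made up for
this illustration; the stride expression is the one from the patch, which
rounds each row up to a multiple of 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Old estimate: bits per pixel times pixel count, which is a size in bits. */
static uint32_t hostmem_old(int bpp, uint32_t width, uint32_t height)
{
    return bpp * width * height;
}

/* New estimate: rows padded to 32-bit words, result in bytes. */
static uint32_t hostmem_new(int bpp, uint32_t width, uint32_t height)
{
    uint32_t stride = ((width * bpp + 0x1f) >> 5) * sizeof(uint32_t);

    return height * stride;
}
```

For a 1024x768 image at 32 bits per pixel the old formula gives 25165824
while the new one gives 4096 bytes per row and 3145728 bytes in total, a
factor of eight less.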

[Qemu-devel] [PULL 3/3] vmsvga: use ARRAY_SIZE macro

2017-11-10 Thread Gerd Hoffmann

From: Philippe Mathieu-Daudé 

Applied using the Coccinelle semantic patch scripts/coccinelle/use_osdep.cocci

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20170718061005.29518-23-f4...@amsat.org>
Signed-off-by: Gerd Hoffmann 
---
 hw/display/vmware_vga.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index cdc3fed6ca..0e6673a911 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -679,10 +679,9 @@ static void vmsvga_fifo_run(struct vmsvga_state_s *s)
 if (cursor.width > 256
 || cursor.height > 256
 || cursor.bpp > 32
-|| SVGA_BITMAP_SIZE(x, y)
-> sizeof(cursor.mask) / sizeof(cursor.mask[0])
+|| SVGA_BITMAP_SIZE(x, y) > ARRAY_SIZE(cursor.mask)
 || SVGA_PIXMAP_SIZE(x, y, cursor.bpp)
-> sizeof(cursor.image) / sizeof(cursor.image[0])) {
+> ARRAY_SIZE(cursor.image)) {
 goto badcmd;
 }
 
-- 
2.9.3
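
For readers unfamiliar with the idiom, the sketch below shows what the
replaced sizeof division computes. ARRAY_SIZE_SKETCH is a simplified
stand-in written for this note (QEMU ships its own ARRAY_SIZE macro), and
the struct with its buffer sizes is hypothetical, chosen only to make the
example compile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in: element count of a true array (not a pointer). */
#define ARRAY_SIZE_SKETCH(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical cursor buffers; the sizes are illustrative only. */
struct cursor_sketch {
    uint32_t mask[1024];
    uint32_t image[4096];
};

/* Returns 1 when the requested word counts fit both buffers. */
static int cursor_fits(size_t mask_words, size_t image_words)
{
    struct cursor_sketch c;   /* used only inside sizeof, never read */

    return mask_words <= ARRAY_SIZE_SKETCH(c.mask)
        && image_words <= ARRAY_SIZE_SKETCH(c.image);
}
```

The macro yields a count of elements, so it can be compared directly with
a size expressed in 32-bit words, which plain sizeof (a byte count) cannot.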

[Qemu-devel] [PULL 0/3] Vga 20171110 patches

2017-11-10 Thread Gerd Hoffmann

The following changes since commit b0fbe46ad82982b289a44ee2495b59b0bad8a842:

  Update version for v2.11.0-rc0 release (2017-11-07 16:05:28 +)

are available in the git repository at:

  git://git.kraxel.org/qemu tags/vga-20171110-pull-request

for you to fetch changes up to cf7040e284069fc235172c187551b268c66d8553:

  vmsvga: use ARRAY_SIZE macro (2017-11-10 14:25:56 +0100)


vga: bugfixes for 2.11



Gerd Hoffmann (1):
  vga: fix region checks in wraparound case

Philippe Mathieu-Daudé (1):
  vmsvga: use ARRAY_SIZE macro

Tao Wu (1):
  virtio-gpu: fix bug in host memory calculation.

 hw/display/vga.c|  4 ++--
 hw/display/virtio-gpu.c | 16 ++--
 hw/display/vmware_vga.c |  5 ++---
 3 files changed, 18 insertions(+), 7 deletions(-)

-- 
2.9.3

[Qemu-devel] [PULL 2/3] vga: fix region checks in wraparound case

2017-11-10 Thread Gerd Hoffmann

Cc: "Dr. David Alan Gilbert" 
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Dr. David Alan Gilbert 
Message-id: 20171030102830.4469-1-kra...@redhat.com
---
 hw/display/vga.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/display/vga.c b/hw/display/vga.c
index 1d19f6bc48..a64a0942da 100644
--- a/hw/display/vga.c
+++ b/hw/display/vga.c
@@ -1666,9 +1666,9 @@ static void vga_draw_graphic(VGACommonState *s, int full_update)
 /* scanline wraps from end of video memory to the start */
 assert(force_shadow);
 update = memory_region_snapshot_get_dirty(&s->vram, snap,
-  page0, 0);
+  page0, s->vbe_size - page0);
 update |= memory_region_snapshot_get_dirty(&s->vram, snap,
-   page1, 0);
+   0, page1);
 } else {
 update = memory_region_snapshot_get_dirty(&s->vram, snap,
   page0, page1 - page0);
-- 
2.9.3
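
A small sketch of the range arithmetic the hunk above corrects: when a region wraps past the end of video memory, it has to be checked as two ranges, the tail [page0, size) and the head [0, page1). The names below are hypothetical, and the "page1 < page0" wrap test is an assumption made for this illustration; the patch does not show how vga.c detects the wraparound.

```c
#include <assert.h>
#include <stddef.h>

/* One contiguous byte range inside video memory (hypothetical type). */
struct demo_range {
    size_t start;
    size_t len;
};

/*
 * Split the region running from page0 to page1 into contiguous ranges.
 * If page1 < page0 the region wraps, so two ranges are produced.
 * Returns the number of ranges written to out (1 or 2).
 */
static int demo_split_region(size_t page0, size_t page1, size_t mem_size,
                             struct demo_range out[2])
{
    if (page1 < page0) {
        out[0].start = page0;
        out[0].len = mem_size - page0;  /* tail of video memory */
        out[1].start = 0;
        out[1].len = page1;             /* wrapped part at the start */
        return 2;
    }
    out[0].start = page0;
    out[0].len = page1 - page0;
    return 1;
}
```

With a length of 0, as in the removed lines, neither range would cover any memory, so no dirty page could ever be reported for the wrapped case.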

Re: [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-10 Thread Fam Zheng

On Fri, 11/10 14:17, Kevin Wolf wrote:
> Am 10.11.2017 um 03:45 hat Fam Zheng geschrieben:
> > On Thu, 11/09 21:43, Max Reitz wrote:
> > > Draining a BDS may lead to graph modifications, which in turn may result
> > > in it and other BDS being stripped of their current references.  If
> > > bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
> > > references themselves, the BDS they are trying to drain (or undrain) may
> > > disappear right under their feet -- or, more specifically, under the
> > > feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().
> > > 
> > > This fixes an occasional hang of iotest 194.
> > > 
> > > Signed-off-by: Max Reitz 
> > > ---
> > >  block/io.c | 47 ---
> > >  1 file changed, 44 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/block/io.c b/block/io.c
> > > index 3d5ef2cabe..a0a2833e8e 100644
> > > --- a/block/io.c
> > > +++ b/block/io.c
> > > @@ -340,7 +340,10 @@ void bdrv_drain_all_begin(void)
> > >  bool waited = true;
> > >  BlockDriverState *bs;
> > >  BdrvNextIterator it;
> > > -GSList *aio_ctxs = NULL, *ctx;
> > > +GSList *aio_ctxs = NULL, *ctx, *bs_list = NULL, *bs_list_entry;
> > > +
> > > +/* Must be called from the main loop */
> > > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > >  
> > >  block_job_pause_all();
> > >  
> > > @@ -355,6 +358,12 @@ void bdrv_drain_all_begin(void)
> > >  if (!g_slist_find(aio_ctxs, aio_context)) {
> > >  aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
> > >  }
> > > +
> > > +/* Keep a strong reference to all root BDS and copy them into
> > > + * an own list because draining them may lead to graph
> > > + * modifications. */
> > > +bdrv_ref(bs);
> > > +bs_list = g_slist_prepend(bs_list, bs);
> > >  }
> > >  
> > >  /* Note that completion of an asynchronous I/O operation can trigger any
> > > @@ -370,7 +379,11 @@ void bdrv_drain_all_begin(void)
> > >  AioContext *aio_context = ctx->data;
> > >  
> > >  aio_context_acquire(aio_context);
> > > -for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
> > > +for (bs_list_entry = bs_list; bs_list_entry;
> > > + bs_list_entry = bs_list_entry->next)
> > > +{
> > > +bs = bs_list_entry->data;
> > > +
> > >  if (aio_context == bdrv_get_aio_context(bs)) {
> > >  waited |= bdrv_drain_recurse(bs, true);
> > >  }
> > > @@ -379,24 +392,52 @@ void bdrv_drain_all_begin(void)
> > >  }
> > >  }
> > >  
> > > +for (bs_list_entry = bs_list; bs_list_entry;
> > > + bs_list_entry = bs_list_entry->next)
> > > +{
> > > +bdrv_unref(bs_list_entry->data);
> > > +}
> > > +
> > >  g_slist_free(aio_ctxs);
> > > +g_slist_free(bs_list);
> > >  }
> > >  
> > >  void bdrv_drain_all_end(void)
> > >  {
> > >  BlockDriverState *bs;
> > >  BdrvNextIterator it;
> > > +GSList *bs_list = NULL, *bs_list_entry;
> > > +
> > > +/* Must be called from the main loop */
> > > +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > >  
> > > +/* Keep a strong reference to all root BDS and copy them into an
> > > + * own list because draining them may lead to graph modifications.
> > > + */
> > >  for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
> > > -AioContext *aio_context = bdrv_get_aio_context(bs);
> > > +bdrv_ref(bs);
> > > +bs_list = g_slist_prepend(bs_list, bs);
> > > +}
> > > +
> > > +for (bs_list_entry = bs_list; bs_list_entry;
> > > + bs_list_entry = bs_list_entry->next)
> > > +{
> > > +AioContext *aio_context;
> > > +
> > > +bs = bs_list_entry->data;
> > > +aio_context = bdrv_get_aio_context(bs);
> > >  
> > >  aio_context_acquire(aio_context);
> > >  aio_enable_external(aio_context);
> > >  bdrv_parent_drained_end(bs);
> > >  bdrv_drain_recurse(bs, false);
> > >  aio_context_release(aio_context);
> > > +
> > > +bdrv_unref(bs);
> > >  }
> > >  
> > > +g_slist_free(bs_list);
> > > +
> > >  block_job_resume_all();
> > >  }
> > 
> > Would it be better to put the references into BdrvNextIterator and introduce
> > bdrv_next_iterator_destroy() to free them? You'll need to touch all callers
> > because it is not C++, but it secures all the rest, which seems vulnerable
> > to the same pattern, for example the aio_poll() in iothread_stop_all().
> 
> You could automatically free the references when bdrv_next() returns
> NULL. Then you need an explicit bdrv_next_iterator_destroy() only for
> callers that stop iterating halfway through the list.

Yes, good idea.

> 
> Do you actually need to keep references to all BDSes in the whole list
> while using the iterator or woul
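
The pattern under discussion in this thread (pin every node, work from a private copy, then unpin) can be sketched with a toy refcounted list. Everything here is hypothetical: struct demo_node stands in for a BDS, a caller-supplied array stands in for the GSList, and "drained" is just a flag. It is not the block layer's code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted node standing in for a BDS (hypothetical type). */
struct demo_node {
    int refcnt;
    int drained;
    struct demo_node *next;   /* linkage in the live list */
};

static void demo_ref(struct demo_node *n)   { n->refcnt++; }
static void demo_unref(struct demo_node *n) { n->refcnt--; }

/*
 * Drain all nodes reachable from head. Returns the number of nodes
 * drained, or -1 if the snapshot array (capacity max) is too small.
 */
static int demo_drain_all(struct demo_node *head,
                          struct demo_node **snap, int max)
{
    struct demo_node *it;
    int n = 0;
    int i;

    /* Pass 1: pin every node and record it in the private snapshot. */
    for (it = head; it; it = it->next) {
        if (n == max) {
            for (i = 0; i < n; i++) {
                demo_unref(snap[i]);
            }
            return -1;
        }
        demo_ref(it);
        snap[n++] = it;
    }
    /* Pass 2: do the work using only the snapshot, never the live list. */
    for (i = 0; i < n; i++) {
        snap[i]->drained = 1;
    }
    /* Pass 3: drop the references taken in pass 1. */
    for (i = 0; i < n; i++) {
        demo_unref(snap[i]);
    }
    return n;
}
```

The cost of this approach is the extra pass and the temporary storage; the iterator-based alternative raised in the replies would instead hold references inside the iterator itself.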

Re: [Qemu-devel] [Qemu-arm] [PATCH] highbank: validate register offset before access

2017-11-10 Thread Philippe Mathieu-Daudé

Hi Prasad, Moguofang.

On 11/09/2017 08:58 AM, P J P wrote:
> From: Prasad J Pandit 
> 
> An 'offset' parameter sent to highbank register r/w functions
> could be greater than number(NUM_REGS=0x200) of hb registers,
> leading to an OOB access issue. Add check to avoid it.
> 
> Reported-by: Moguofang (Dennis mo) 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/arm/highbank.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
> index 354c6b25a8..94df151454 100644
> --- a/hw/arm/highbank.c
> +++ b/hw/arm/highbank.c
> @@ -117,6 +117,9 @@ static void hb_regs_write(void *opaque, hwaddr offset,
>  }
>  }
>  
> +if (offset / 4 >= NUM_REGS) {

I'd report that:

   qemu_log_mask(LOG_UNIMP, ...

Cc'ing Shawn & Rob since this might also be a LOG_GUEST_ERROR.

> +return;
> +}
>  regs[offset/4] = value;
>  }
>  
> @@ -124,6 +127,10 @@ static uint64_t hb_regs_read(void *opaque, hwaddr offset,
>   unsigned size)
>  {
>  uint32_t *regs = opaque;
> +
> +if (offset / 4 >= NUM_REGS) {

Ditto.

> +return 0;
> +}

From CODING_STYLE:

Mixed declarations (interleaving statements and declarations within
blocks) are generally not allowed; declarations should be at the
beginning of blocks.

>  uint32_t value = regs[offset/4];
>  
>  if ((offset == 0x100) || (offset == 0x108) || (offset == 0x10C)) {

Regards,

Phil.
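
To make the two review points concrete (range-check before indexing, and keep declarations at the start of the block), here is a rough standalone sketch. The names are hypothetical and the register file is a plain array; NUM_REGS=0x200 is the value quoted in the commit message. The logging suggested above is left as a comment rather than guessed at.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_REGS 0x200   /* value quoted in the commit message */

static uint64_t demo_regs_read(const uint32_t *regs, uint64_t offset)
{
    uint32_t value;   /* declaration first, statements afterwards */

    if (offset / 4 >= DEMO_NUM_REGS) {
        /* Out of range: a device model would log the bad access here. */
        return 0;
    }
    value = regs[offset / 4];
    return value;
}

static void demo_regs_write(uint32_t *regs, uint64_t offset, uint32_t value)
{
    if (offset / 4 >= DEMO_NUM_REGS) {
        /* Out of range: ignore the write instead of corrupting memory. */
        return;
    }
    regs[offset / 4] = value;
}
```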

Re: [Qemu-devel] [Qemu devel PATCH] MAINTAINERS: Add entries for Smartfusion2

2017-11-10 Thread Philippe Mathieu-Daudé

On 11/10/2017 09:56 AM, Peter Maydell wrote:
> On 10 November 2017 at 00:22, Philippe Mathieu-Daudé  wrote:
>> On 11/09/2017 08:55 PM, Peter Maydell wrote:
>>> I don't in general expect to take pull requests from
>>> everybody listed as a maintainer in the MAINTAINERS file.
>>> That just means "I'm going to be reviewing and should
>>> be cc'd on patches". Pull requests are sent by people
>>> who are maintainers for a subsystem. Rule of thumb:
>>> unless somebody asks you to send a pull request, you
>>> don't need to do it.
>>
>> Ok, please excuse my misunderstanding. I still think the M: entry
>> stands for 'Maintainer' instead of 'Mail', and still don't understand the
>> difference with a "Designated reviewer" (R: entry):
>>
>> M: Mail patches to: FullName 
>> R: Designated reviewer: FullName 
>>These reviewers should be CCed on patches.
>>
>> "Designated reviewer" seems to duplicate the M: entry and is therefore
>> confusing. Can we simply remove it instead?
> 
> I hadn't realized we had an 'R:' tag in MAINTAINERS...
> 
>> --
>> Some people are not content with the amount of mail they get, and would
>> like to be CCed on patches for areas they do not maintain.  Let them
>> satisfy their own appetite for qemu-devel messages.
>>
>> Seriously: the purpose here is a bit different from the Linux kernel.
>> While Linux uses "R" to designate non-maintainers for reviewing patches
>> in a given area, in QEMU I would also like to use "R" so that people can
>> delegate sending pull requests while keeping some degree of oversight.
>> --
> 
> So, my view, based on what happens in practice:
>  * "maintainer" means you are in effect accepting some responsibility
>for the continued maintenance of some bit of the codebase, ie
>you actually will review stuff
>  * "reviewer" is a bit weird but I guess is just asking for cc:
>without promising to actually do anything
>  * somebody who sends me pull requests is effectively somebody we've
>given the ability to make direct more-or-less unchecked commits
>to master, so that is given out more sparingly and for larger
>subsystems

Thanks for clarifying this!

This is more understandable (to me) than the "QEMU Maintainers" section
entries description.

> But MAINTAINERS is mostly about what submitters need to do (ie
> who to send patchmails to), so it doesn't particularly document
> how patches flow onward into master, which varies. (For instance
> the block layer folks have a two-level setup where some trees
> get merged into others before they go to master. ARM devboards
> go through me, and "maintainer" just means I let somebody else
> deal with the device specifics if possible but am still the
> reviewer of last resort.)

Ok :)

Regards,

Phil.

Re: [Qemu-devel] [Qemu-block] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete

2017-11-10 Thread Alberto Garcia

On Thu 09 Nov 2017 06:12:10 PM CET, Stefan Hajnoczi wrote:
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 45d9101..39c7cca 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -341,7 +341,7 @@ static void blk_delete(BlockBackend *blk)
>>  assert(!blk->name);
>>  assert(!blk->dev);
>>  if (blk->public.throttle_group_member.throttle_state) {
>> -blk_io_limits_disable(blk);
>> +throttle_group_unregister_tgm(&blk->public.throttle_group_member);
>
> The following assertions fail without the drain when there are pending
> requests:
>
>   void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
>   {
>   ThrottleState *ts = tgm->throttle_state;
>   ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
>   ThrottleGroupMember *token;
>   int i;
>
>   if (!ts) {
>   /* Discard already unregistered tgm */
>   return;
>   }
>
>   assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
>   assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
>   assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
>
> A safer approach is making blk_io_limits_disable(blk) skip the
> draining when blk_bs(blk) == NULL.  That is the only case where we are
> 100% sure that there are no pending requests.

I think so too.

And as I said in a previous e-mail I don't know if we should have a
valid blk->public.throttle_group_member.throttle_state with no timers at
all. I'd rather attach the default AioContext while the BDS is removed.

I'll try to prepare a patch with all of this together.

Berto

[Qemu-devel] [PULL 1/2] docker: Improved image checksum

2017-11-10 Thread Fam Zheng

When a base image locally defined by QEMU, such as in the debian images,
is updated, the dockerfile checksum mechanism in docker.py still skips
updating the derived image, because it only looks at the literal content
of the dockerfile, without considering changes to the base image.

For example we have a recent fix e58c1f9b35e81 that fixed
debian-win64-cross by updating its base image, debian8-mxe, but due to
the above "feature" of docker.py the image in question is NOT
automatically rebuilt unless you add NOCACHE=1. This was noticed on Shippable:

https://app.shippable.com/github/qemu/qemu/runs/541/2/console

because after the fix was merged, the error still occurred, and the log
shows the container image was, as explained above, not updated.

This is because at the time docker.py was written, there weren't any
dependencies between QEMU's docker images.

Now improve this to preprocess any "FROM qemu:*" directives in the
dockerfiles while computing the checksum, and inline the base image's
dockerfile content, recursively. This ensures any changes to the _QEMU_
images that an image depends on are taken into account.

This means for external images that we expect to retrieve from docker
registries, we still do it as before. It is not perfect, because
registry images can get updated too. Technically we could substitute the
image name with its hex ID as obtained with $(docker images $IMAGE
--format="{{.Id}}"), but --format is not supported by RHEL 7, so leave
it for now.

Reported-by: Philippe Mathieu-Daudé 
Signed-off-by: Fam Zheng 
Message-Id: <20171103131229.4737-1-f...@redhat.com>
Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alex Bennée 
Signed-off-by: Fam Zheng 
---
 tests/docker/docker.py | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/tests/docker/docker.py b/tests/docker/docker.py
index 08122ca17d..1246ba9578 100755
--- a/tests/docker/docker.py
+++ b/tests/docker/docker.py
@@ -105,6 +105,28 @@ def _copy_binary_with_libs(src, dest_dir):
 so_path = os.path.dirname(l)
 _copy_with_mkdir(l , dest_dir, so_path)
 
+def _read_qemu_dockerfile(img_name):
+df = os.path.join(os.path.dirname(__file__), "dockerfiles",
+  img_name + ".docker")
+return open(df, "r").read()
+
+def _dockerfile_preprocess(df):
+out = ""
+for l in df.splitlines():
+if len(l.strip()) == 0 or l.startswith("#"):
+continue
+from_pref = "FROM qemu:"
+if l.startswith(from_pref):
+# TODO: Alternatively we could replace this line with "FROM $ID"
+# where $ID is the image's hex id obtained with
+#$ docker images $IMAGE --format="{{.Id}}"
+# but unfortunately that's not supported by RHEL 7.
+inlining = _read_qemu_dockerfile(l[len(from_pref):])
+out += _dockerfile_preprocess(inlining)
+continue
+out += l + "\n"
+return out
+
 class Docker(object):
 """ Running Docker commands """
 def __init__(self):
@@ -196,7 +218,7 @@ class Docker(object):
 checksum = self.get_image_dockerfile_checksum(tag)
 except Exception:
 return False
-return checksum == _text_checksum(dockerfile)
+return checksum == _text_checksum(_dockerfile_preprocess(dockerfile))
 
 def run(self, cmd, keep, quiet):
 label = uuid.uuid1().hex
-- 
2.13.6

[Qemu-devel] [PULL 0/2] Docker patches

2017-11-10 Thread Fam Zheng

The following changes since commit b0fbe46ad82982b289a44ee2495b59b0bad8a842:

  Update version for v2.11.0-rc0 release (2017-11-07 16:05:28 +)

are available in the git repository at:

  git://github.com/famz/qemu.git tags/docker-pull-request

for you to fetch changes up to 6423795efc5b665c595d9a0bf93cfbbca00362e9:

  docker: correctly escape $BACKEND in the help output (2017-11-08 10:59:42 +0800)



Hi Peter,

Here are two fixes on docker testing (the one saying "improved" is to
make it actually "correct"). Thanks.

Fam



Fam Zheng (1):
  docker: Improved image checksum

Philippe Mathieu-Daudé (1):
  docker: correctly escape $BACKEND in the help output

 tests/docker/Makefile.include |  2 +-
 tests/docker/docker.py| 24 +++-
 2 files changed, 24 insertions(+), 2 deletions(-)

-- 
2.13.6

[Qemu-devel] [PULL 2/2] docker: correctly escape $BACKEND in the help output

2017-11-10 Thread Fam Zheng

From: Philippe Mathieu-Daudé 

In Makefiles the $ must be escaped as $$ in shell uses.

Since 8a2390a4f47:

 $ make docker
 [...]
 NETWORK=1Enable virtual network interface with default backend.
 NETWORK=ACKEND Enable virtual network interface with ACKEND.

Once escaped:

 $ make docker
 [...]
 NETWORK=1Enable virtual network interface with default backend.
 NETWORK=$BACKEND Enable virtual network interface with $BACKEND.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Fam Zheng 
Message-Id: <20171108024719.8389-1-f4...@amsat.org>
Signed-off-by: Fam Zheng 
---
 tests/docker/Makefile.include | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/docker/Makefile.include b/tests/docker/Makefile.include
index f1a398e9fa..de87341528 100644
--- a/tests/docker/Makefile.include
+++ b/tests/docker/Makefile.include
@@ -109,7 +109,7 @@ docker:
   @echo 'DEBUG=1  Stop and drop to shell in the created container'
   @echo ' before running the command.'
   @echo 'NETWORK=1Enable virtual network interface with default backend.'
-   @echo 'NETWORK=$BACKEND Enable virtual network interface with $BACKEND.'
+   @echo 'NETWORK=$$BACKEND Enable virtual network interface with $$BACKEND.'
   @echo 'NOUSER   Define to disable adding current user to containers passwd.'
@echo 'NOCACHE=1Ignore cache when build images.'
@echo 'EXECUTABLE=Include executable in image.'
-- 
2.13.6

Re: [Qemu-devel] Yet another git submodule rant

2017-11-10 Thread Alexey Kardashevskiy

On 10/11/17 21:41, Daniel P. Berrange wrote:
> On Fri, Nov 10, 2017 at 09:35:54PM +1100, Alexey Kardashevskiy wrote:
>> On 09/11/17 00:01, Daniel P. Berrange wrote:
>>> On Wed, Nov 08, 2017 at 09:26:01AM -0300, Philippe Mathieu-Daudé wrote:
 On 11/08/2017 06:57 AM, Thomas Huth wrote:
>
> That automatic git submodule stuff now broke my workflow again. I
> usually keep the git repository on my laptop and then simply rsync the
> sources (without .git directories) to my target machine to compile it
> there. Used to work great for years. Now it's broken, the build process
> complains:
>
> GIT submodule checkout is out of date. Please run
>   scripts/git-submodule.sh update
> from the source directory checkout /home/thuth/devel/qemu
>
> Running "scripts/git-submodule.sh update" did not fix the issue at all -
> I first had to tinker with it for a while to find out that I simply have
> to delete ".git-submodule-status" in my git tree to fix the issue.
>
> I've got the feeling that all this submodule crap is constantly causing
> pain ... do we really need this? Can't we find another solution instead?
> Or at least stop modifying files automatically in the $SRC_PATH ?

 Also yesterday on IRC:

  [...] I downloaded the qemu source from git and tried to compile
 it. I am getting this:

 ./configure --static && make && sudo make install
  CC  ui/input-keymap.o
 ui/input-keymap.c:8:10: fatal error: ui/input-keymap-linux-to-qcode.c:
 No such file or directory
>>>
>>> I had a pull request merged late yesterday afternoon which possibly
>>> would address that problem, though it is hard to say for certain.
>>
>> wow, already? :(
>>
>> I still wonder why we do not check out submodules into the build directory
>> and why .git-submodule-status is not there too...
> 
> That simply isn't the way submodules work, they are inherently part of
> the source tree, and the status file reflects that too.

Sorry, I am missing the point here. What precisely prevents us from
checking out the required modules into the build directory and building
them there? git provides a submodule repository url and sha1 for the current
qemu branch.

Yes, the submodule tree inside the qemu tree might be different from the
one in the build directory but the purpose of all of this is to always
build the correct code - so this requirement is met. And it will still be
better than changing the $SRC_PATH when a user specifically asked not to do
that by calling "./configure --source-path='.



-- 
Alexey

Re: [Qemu-devel] Yet another git submodule rant

2017-11-10 Thread Peter Maydell

On 10 November 2017 at 13:46, Alexey Kardashevskiy  wrote:
> And it will still be
> better than changing the $SRC_PATH when a user specifically asked not to do
> that by calling "./configure --source-path='.

I'm not terribly happy with the submodule stuff either, but
that configure rune is not making use of a documented
or supported feature. --source-path is for telling configure
where the source code is. Passing it an empty string should
probably be rejected as an error.

thanks
-- PMM

[Qemu-devel] [PULL 2/2] ui: use QEMU_IS_ALIGNED macro

2017-11-10 Thread Gerd Hoffmann

From: Philippe Mathieu-Daudé 

Applied using the Coccinelle semantic patch scripts/coccinelle/use_osdep.cocci

Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20170718061005.29518-9-f4...@amsat.org>
Signed-off-by: Gerd Hoffmann 
---
 ui/console-gl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ui/console-gl.c b/ui/console-gl.c
index 5b77e7aa88..a56e1cd8eb 100644
--- a/ui/console-gl.c
+++ b/ui/console-gl.c
@@ -48,7 +48,7 @@ void surface_gl_create_texture(QemuGLShader *gls,
DisplaySurface *surface)
 {
 assert(gls);
-assert(surface_stride(surface) % surface_bytes_per_pixel(surface) == 0);
+assert(QEMU_IS_ALIGNED(surface_stride(surface), surface_bytes_per_pixel(surface)));
 
 switch (surface->format) {
 case PIXMAN_BE_b8g8r8x8:
-- 
2.9.3
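
As a quick illustration of the check being rewritten above, an alignment predicate of this kind can be expressed as a remainder test. DEMO_IS_ALIGNED and demo_stride_ok are hypothetical names used only in this sketch; see QEMU's own headers for the real QEMU_IS_ALIGNED definition.

```c
#include <assert.h>

/* Hypothetical stand-in: true when n is a whole multiple of m (m != 0). */
#define DEMO_IS_ALIGNED(n, m) (((n) % (m)) == 0)

/* Returns 1 when each row of the surface holds a whole number of pixels. */
static int demo_stride_ok(int stride, int bytes_per_pixel)
{
    return DEMO_IS_ALIGNED(stride, bytes_per_pixel);
}
```

The behaviour is the same as the open-coded "%" comparison; the macro mainly makes the intent of the assertion readable at a glance.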

[Qemu-devel] [PULL 0/2] Ui 20171110 patches

2017-11-10 Thread Gerd Hoffmann

The following changes since commit b0fbe46ad82982b289a44ee2495b59b0bad8a842:

  Update version for v2.11.0-rc0 release (2017-11-07 16:05:28 +)

are available in the git repository at:

  git://git.kraxel.org/qemu tags/ui-20171110-pull-request

for you to fetch changes up to 2e9a8565701c22a0090876cb9e2293db4a6fb205:

  ui: use QEMU_IS_ALIGNED macro (2017-11-10 14:27:29 +0100)


ui: fixes for 2.11



Gerd Hoffmann (1):
  ui: fix dcl unregister

Philippe Mathieu-Daudé (1):
  ui: use QEMU_IS_ALIGNED macro

 ui/console-gl.c | 2 +-
 ui/console.c| 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

-- 
2.9.3

[Qemu-devel] [PULL 1/2] ui: fix dcl unregister

2017-11-10 Thread Gerd Hoffmann

register checks for dcl->ds being NULL, to avoid registering
the same dcl twice.

Therefore dcl->ds must be cleared on unregister, otherwise
un-registering and re-registering doesn't work.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1510809
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Marc-André Lureau 
Message-id: 20171109105154.29414-1-kra...@redhat.com
---
 ui/console.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ui/console.c b/ui/console.c
index eca854cbd5..c4c95abed7 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1471,6 +1471,7 @@ void unregister_displaychangelistener(DisplayChangeListener *dcl)
 dcl->con->dcls--;
 }
 QLIST_REMOVE(dcl, next);
+dcl->ds = NULL;
 gui_setup_refresh(ds);
 }
 
-- 
2.9.3
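
The failure mode described in the commit message can be reproduced with a toy listener whose back-pointer doubles as the "already registered" flag. The types and functions below are hypothetical and much simpler than ui/console.c; they only show why the pointer has to be cleared on unregister.

```c
#include <assert.h>
#include <stddef.h>

struct demo_display {
    int listeners;
};

struct demo_listener {
    struct demo_display *ds;   /* non-NULL means "currently registered" */
};

/* Returns 1 if newly registered, 0 if it was already registered. */
static int demo_register(struct demo_display *ds, struct demo_listener *l)
{
    if (l->ds) {
        return 0;
    }
    l->ds = ds;
    ds->listeners++;
    return 1;
}

static void demo_unregister(struct demo_listener *l)
{
    if (!l->ds) {
        return;
    }
    l->ds->listeners--;
    /* Without this reset, a later demo_register() would be a no-op. */
    l->ds = NULL;
}
```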

[Qemu-devel] [PATCH v3 00/13] tpm: Extend TPM with state migration support (not 2.11)

2017-11-10 Thread Stefan Berger

This set of patches implements support for migrating the state of the
external 'swtpm' TPM emulator as well as that of the emulated device
interfaces. I have primarily tested this with the TIS and TPM 1.2 so
far, but it also seems to work with TPM 2.

The TIS is simplified first by reducing the number of buffers and read
and write offsets into these buffers. Following the state machine of the
TIS, a single buffer and r/w offset is enough for all localities since
only one locality can ever be active.

This series applies on top of my tpm-next branch.

One of the challenges that is addressed by this set of patches is the fact
that the TPM emulator may be processing a command while the state
serialization of the devices is supposed to happen. A necessary first step
has been implemented here that ensures that a response has been received
from the external emulator and the bottom half function, which delivers the
response and adjusts device registers (TIS or CRB), has been executed,
before the device's state is serialized.

A subsequent extension may need to address the live migration loop and delay
the serialization of devices until the response from the external TPM has
been received, though it is certainly unlikely that someone executes a
long-lasting TPM command while this is occurring.

   Stefan

Stefan Berger (13):
  tpm_tis: convert uint32_t to size_t
  tpm_tis: limit size of buffer from backend
  tpm_tis: remove TPMSizeBuffer usage
  tpm_tis: move buffers from localities into common location
  tpm_tis: merge read and write buffer into single buffer
  tpm_tis: move r/w_offsets to TPMState
  tpm_tis: merge r/w_offset into rw_offset
  tpm: Implement tpm_sized_buffer_reset
  tpm: Introduce condition to notify waiters of completed command
  tpm: Introduce condition in TPM backend for notification
  tpm: implement tpm_backend_wait_cmd_completed
  tpm: extend TPM emulator with state migration support
  tpm_tis: extend TPM TIS with state migration support

 backends/tpm.c   |  29 +
 hw/tpm/tpm_emulator.c| 303 +--
 hw/tpm/tpm_tis.c | 216 +-
 hw/tpm/tpm_util.c|   7 +
 hw/tpm/tpm_util.h|   7 +
 include/sysemu/tpm_backend.h |  22 
 6 files changed, 483 insertions(+), 101 deletions(-)

-- 
2.5.5

[Qemu-devel] [Bug 1713825] Re: Booting Windows 2016 with qxl video crashes qemu

2017-11-10 Thread Thomas Huth

** Changed in: qemu
   Status: Incomplete => New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1713825

Title:
  Booting Windows 2016 with qxl video crashes qemu

Status in QEMU:
  New

Bug description:
  launched from libvirt.

  qemu version: 2.9.0
  host: Linux  4.9.34-gentoo #1 SMP Sat Jul 29 13:28:43 PDT 2017 
x86_64 Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz GenuineIntel GNU/Linux
  guest: Windows 2016 64 bit

  Thread 28 (Thread 0x7f0e2edff700 (LWP 29860)):
  #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
  set = {__val = {18446744067266837079, 139698892694944, 
139699853745096, 139700858749789, 4222451712, 139694281220640, 139694281220741, 
139694281220640, 139694281220640, 139694281220810, 
  139694281220940, 139694281220640, 139694281220940, 0, 0, 0}}
  pid = 
  tid = 
  #1  0x7f0ea40b644a in __GI_abort () at abort.c:89
  save_stage = 2
  act = {__sigaction_handler = {sa_handler = 0x7f0e2edfe5c0, 
sa_sigaction = 0x7f0e2edfe5c0}, sa_mask = {__val = {139694281219872, 
139698106269697, 139698892695344, 4, 2676511744, 0, 139698892695144, 0, 
139698892694912, 1, 4737316546111099904, 139700859888720, 
4737316546111099904, 139700862161824, 139700911349760, 94211934977482}}, 
sa_flags = 416, 
sa_restorer = 0x55af6ceb0500 <__PRETTY_FUNCTION__.36381>}
  sigs = {__val = {32, 0 }}
  #2  0x7f0ea40abab6 in __assert_fail_base (fmt=, 
assertion=assertion@entry=0x55af6ceafdca "offset < qxl->vga.vram_size", 
  file=file@entry=0x55af6ceaeaa0 
"/var/tmp/portage/app-emulation/qemu-2.9.0-r2/work/qemu-2.9.0/hw/display/qxl.c",
 line=line@entry=416, 
  function=function@entry=0x55af6ceb0500 <__PRETTY_FUNCTION__.36381> 
"qxl_ram_set_dirty") at assert.c:92
  str = 0x7f0d1c026220 "\340r\002\034\r\177"
  total = 4096
  #3  0x7f0ea40abb81 in __GI___assert_fail 
(assertion=assertion@entry=0x55af6ceafdca "offset < qxl->vga.vram_size", 
  file=file@entry=0x55af6ceaeaa0 
"/var/tmp/portage/app-emulation/qemu-2.9.0-r2/work/qemu-2.9.0/hw/display/qxl.c",
 line=line@entry=416, 
  function=function@entry=0x55af6ceb0500 <__PRETTY_FUNCTION__.36381> 
"qxl_ram_set_dirty") at assert.c:101
  No locals.
  #4  0x55af6cc58805 in qxl_ram_set_dirty (qxl=, 
ptr=) at 
/var/tmp/portage/app-emulation/qemu-2.9.0-r2/work/qemu-2.9.0/hw/display/qxl.c:416
  base = 
  offset = 
  qxl = 
  ptr = 
  base = 
  offset = 
  #5  0x55af6cc5b9e2 in interface_release_resource (sin=0x55af71a91ed0, 
ext=...) at 
/var/tmp/portage/app-emulation/qemu-2.9.0-r2/work/qemu-2.9.0/hw/display/qxl.c:767
  qxl = 0x55af71a91450
  ring = 
  item = 
  id = 18446690739814400920
  __func__ = "interface_release_resource"
  #6  0x7f0ea510afa8 in red_drawable_unref (red_drawable=0x7f0d1c026120) at 
red-worker.c:101
  No locals.
  #7  0x7f0ea510b609 in red_drawable_unref (red_drawable=) 
at red-worker.c:104
  No locals.
  #8  0x7f0ea510eae9 in drawable_unref 
(drawable=drawable@entry=0x7f0e68285ac0) at display-channel.c:1438
  display = 0x55af71dbd3c0
  __FUNCTION__ = "drawable_unref"
  #9  0x7f0ea51109f7 in draw_until (display=display@entry=0x55af71dbd3c0, 
surface=surface@entry=0x7f0e6828aae8, last=0x7f0e68285ac0) at 
display-channel.c:1637
  container = 0x0
  now = 0x7f0e68285ac0
  #10 0x7f0ea510f93f in display_channel_draw (display=0x55af71dbd3c0, 
area=0x7f0e2edfe8e0, surface_id=) at display-channel.c:1729
  surface = 0x7f0e6828aae8
  last = 
  __FUNCTION__ = "display_channel_draw"
  __func__ = "display_channel_draw"

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1713825/+subscriptions

[Qemu-devel] [PATCH v3 02/13] tpm_tis: limit size of buffer from backend

2017-11-10 Thread Stefan Berger

This is a preparatory patch for the subsequent ones where we
get rid of the flexibility of supporting any kind of buffer size
that the backend may support. We keep the size at 4096, which is
also the size the external emulator supports. So, limit the size
of the buffer we can support and pass it back to the backend.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 69fe531..90c6df2 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -1008,7 +1008,8 @@ static void tpm_tis_reset(DeviceState *dev)
 int c;
 
 s->be_tpm_version = tpm_backend_get_tpm_version(s->be_driver);
-s->be_buffer_size = tpm_backend_get_buffer_size(s->be_driver);
+s->be_buffer_size = MIN(tpm_backend_get_buffer_size(s->be_driver),
+TPM_TIS_BUFFER_MAX);
 
 tpm_backend_reset(s->be_driver);
 
@@ -1040,7 +1041,7 @@ static void tpm_tis_reset(DeviceState *dev)
 tpm_tis_realloc_buffer(&s->loc[c].r_buffer, s->be_buffer_size);
 }
 
-tpm_tis_do_startup_tpm(s, 0);
+tpm_tis_do_startup_tpm(s, s->be_buffer_size);
 }
 
 static const VMStateDescription vmstate_tpm_tis = {
-- 
2.5.5
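
The size negotiation in this patch boils down to a clamp: take whatever the backend advertises, but never more than the frontend's fixed buffer. A minimal sketch follows, with hypothetical names; the 4096 limit is the size stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUFFER_MAX 4096   /* size stated in the commit message */
#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Clamp the backend's advertised size to what the frontend can hold. */
static size_t demo_negotiate_size(size_t backend_size)
{
    return DEMO_MIN(backend_size, (size_t)DEMO_BUFFER_MAX);
}
```

Passing the clamped value back to the backend, as the second hunk does, keeps both sides in agreement about how many bytes a response may occupy.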

[Qemu-devel] [PATCH v3 04/13] tpm_tis: move buffers from localities into common location

2017-11-10 Thread Stefan Berger

One read buffer and one write buffer is sufficient for all localities.
The localities cannot all be active at the same time, and only the active
locality can use the r/w buffers. Inactive localities will require the
COMMAND_READY flag to be set on the STS register to move to the READY
state, which then enables access to using the buffer for writing of a
command, while all other localities are inactive.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 34 +++---
 1 file changed, 15 insertions(+), 19 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index ddfcfc9..f6f5f17 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -64,16 +64,14 @@ typedef struct TPMLocality {
 
 uint16_t w_offset;
 uint16_t r_offset;
-unsigned char w_buffer[TPM_TIS_BUFFER_MAX];
-unsigned char r_buffer[TPM_TIS_BUFFER_MAX];
 } TPMLocality;
 
 typedef struct TPMState {
 ISADevice busdev;
 MemoryRegion mmio;
 
-uint32_t offset;
-uint8_t buf[TPM_TIS_BUFFER_MAX];
+unsigned char w_buffer[TPM_TIS_BUFFER_MAX];
+unsigned char r_buffer[TPM_TIS_BUFFER_MAX];
 
 uint8_t active_locty;
 uint8_t aborting_locty;
@@ -259,7 +257,7 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 {
 TPMLocality *locty_data = &s->loc[locty];
 
-tpm_tis_show_buffer(s->loc[locty].w_buffer, s->be_buffer_size,
+tpm_tis_show_buffer(s->w_buffer, s->be_buffer_size,
 "tpm_tis: To TPM");
 
 /*
@@ -270,9 +268,9 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 
 s->cmd = (TPMBackendCmd) {
 .locty = locty,
-.in = locty_data->w_buffer,
+.in = s->w_buffer,
 .in_len = locty_data->w_offset,
-.out = locty_data->r_buffer,
+.out = s->r_buffer,
 .out_len = s->be_buffer_size,
 };
 
@@ -424,7 +422,7 @@ static void tpm_tis_request_completed(TPMIf *ti)
 s->loc[locty].r_offset = 0;
 s->loc[locty].w_offset = 0;
 
-tpm_tis_show_buffer(s->loc[locty].r_buffer, s->be_buffer_size,
+tpm_tis_show_buffer(s->r_buffer, s->be_buffer_size,
 "tpm_tis: From TPM");
 
 if (TPM_TIS_IS_VALID_LOCTY(s->next_locty)) {
@@ -444,10 +442,10 @@ static uint32_t tpm_tis_data_read(TPMState *s, uint8_t locty)
 uint16_t len;
 
 if ((s->loc[locty].sts & TPM_TIS_STS_DATA_AVAILABLE)) {
-len = MIN(tpm_cmd_get_size(&s->loc[locty].r_buffer),
+len = MIN(tpm_cmd_get_size(&s->r_buffer),
   s->be_buffer_size);
 
-ret = s->loc[locty].r_buffer[s->loc[locty].r_offset++];
+ret = s->r_buffer[s->loc[locty].r_offset++];
 if (s->loc[locty].r_offset >= len) {
 /* got last byte */
 tpm_tis_sts_set(&s->loc[locty], TPM_TIS_STS_VALID);
@@ -493,12 +491,11 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 "tpm_tis: result buffer : ",
 s->loc[locty].r_offset);
 for (idx = 0;
- idx < MIN(tpm_cmd_get_size(&s->loc[locty].r_buffer),
-   s->be_buffer_size);
+ idx < MIN(tpm_cmd_get_size(&s->r_buffer), s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
 s->loc[locty].r_offset == idx ? '>' : ' ',
-s->loc[locty].r_buffer[idx],
+s->r_buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
 DPRINTF("\n"
@@ -506,12 +503,11 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 "tpm_tis: request buffer: ",
 s->loc[locty].w_offset);
 for (idx = 0;
- idx < MIN(tpm_cmd_get_size(s->loc[locty].w_buffer),
-   s->be_buffer_size);
+ idx < MIN(tpm_cmd_get_size(s->w_buffer), s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
 s->loc[locty].w_offset == idx ? '>' : ' ',
-s->loc[locty].w_buffer[idx],
+s->w_buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
 DPRINTF("\n");
@@ -573,7 +569,7 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
 if (s->active_locty == locty) {
 if ((s->loc[locty].sts & TPM_TIS_STS_DATA_AVAILABLE)) {
 val = TPM_TIS_BURST_COUNT(
-   MIN(tpm_cmd_get_size(&s->loc[locty].r_buffer),
+   MIN(tpm_cmd_get_size(&s->r_buffer),
s->be_buffer_size)
- s->loc[locty].r_offset) | s->loc[locty].sts;
 } else {
@@ -926,7 +922,7 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 
 while ((s->loc[locty].sts & TPM_TIS_STS_EXPECT) && size > 0) {
 if (s->loc[locty].w_offset < s->be_buffer_size) {
-s->loc[locty].w_buffer[s->loc[locty].w_offset++] =
+s->w_buffer[s->loc[locty].w_offset++] =
 (uint8_t)val;

[Qemu-devel] [PATCH v3 01/13] tpm_tis: convert uint32_t to size_t

2017-11-10 Thread Stefan Berger

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index dd43630..69fe531 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -974,7 +974,7 @@ static const MemoryRegionOps tpm_tis_memory_ops = {
 },
 };
 
-static int tpm_tis_do_startup_tpm(TPMState *s, uint32_t buffersize)
+static int tpm_tis_do_startup_tpm(TPMState *s, size_t buffersize)
 {
 return tpm_backend_startup_tpm(s->be_driver, buffersize);
 }
-- 
2.5.5

[Qemu-devel] [PATCH v3 07/13] tpm_tis: merge r/w_offset into rw_offset

2017-11-10 Thread Stefan Berger

We can now merge the r_offset and w_offset into a single rw_offset.
This is possible because reads are ignored while the offset is used
for writing in RECEPTION state. Conversely, writes are ignored while
the offset is used for reading in COMPLETION state.
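
As a rough illustration of why one cursor is enough, here is a minimal
standalone sketch. It is not the tpm_tis code: DemoTis, demo_write(),
demo_read() and demo_complete() are hypothetical names, and the state
machine is reduced to the two states named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUFFER_MAX 16

/* Simplified stand-in for the two TIS states discussed above. */
typedef enum { DEMO_RECEPTION, DEMO_COMPLETION } DemoState;

typedef struct {
    DemoState state;
    uint8_t buffer[DEMO_BUFFER_MAX];
    uint16_t rw_offset; /* write cursor in RECEPTION, read cursor in COMPLETION */
    uint16_t resp_len;  /* number of valid response bytes in COMPLETION */
} DemoTis;

/* Store a byte only in RECEPTION; returns 1 if stored, 0 if ignored. */
static int demo_write(DemoTis *t, uint8_t val)
{
    if (t->state != DEMO_RECEPTION || t->rw_offset >= DEMO_BUFFER_MAX) {
        return 0;
    }
    t->buffer[t->rw_offset++] = val;
    return 1;
}

/* Return the next byte only in COMPLETION; -1 if the read is ignored. */
static int demo_read(DemoTis *t)
{
    if (t->state != DEMO_COMPLETION || t->rw_offset >= t->resp_len) {
        return -1;
    }
    return t->buffer[t->rw_offset++];
}

/* The response now sits in the same buffer; restart the shared cursor. */
static void demo_complete(DemoTis *t, uint16_t resp_len)
{
    assert(resp_len <= DEMO_BUFFER_MAX);
    t->state = DEMO_COMPLETION;
    t->resp_len = resp_len;
    t->rw_offset = 0;
}
```

Because each state uses the cursor in only one direction, resetting it
on the state change is the only coordination the two directions need.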

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 60 
 1 file changed, 21 insertions(+), 39 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index df3..7d7e2cd 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -68,8 +68,7 @@ typedef struct TPMState {
 MemoryRegion mmio;
 
 unsigned char buffer[TPM_TIS_BUFFER_MAX];
-uint16_t w_offset;
-uint16_t r_offset;
+uint16_t rw_offset;
 
 uint8_t active_locty;
 uint8_t aborting_locty;
@@ -257,7 +256,7 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 "tpm_tis: To TPM");
 
 /*
- * w_offset serves as length indicator for length of data;
+ * rw_offset serves as length indicator for length of data;
  * it's reset when the response comes back
  */
 s->loc[locty].state = TPM_TIS_STATE_EXECUTION;
@@ -265,7 +264,7 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 s->cmd = (TPMBackendCmd) {
 .locty = locty,
 .in = s->buffer,
-.in_len = s->w_offset,
+.in_len = s->rw_offset,
 .out = s->buffer,
 .out_len = s->be_buffer_size,
 };
@@ -347,8 +346,7 @@ static void tpm_tis_new_active_locality(TPMState *s, uint8_t new_active_locty)
 /* abort -- this function switches the locality */
 static void tpm_tis_abort(TPMState *s, uint8_t locty)
 {
-s->r_offset = 0;
-s->w_offset = 0;
+s->rw_offset = 0;
 
 DPRINTF("tpm_tis: tis_abort: new active locality is %d\n", s->next_locty);
 
@@ -415,8 +413,7 @@ static void tpm_tis_request_completed(TPMIf *ti)
 tpm_tis_sts_set(&s->loc[locty],
 TPM_TIS_STS_VALID | TPM_TIS_STS_DATA_AVAILABLE);
 s->loc[locty].state = TPM_TIS_STATE_COMPLETION;
-s->r_offset = 0;
-s->w_offset = 0;
+s->rw_offset = 0;
 
 tpm_tis_show_buffer(s->buffer, s->be_buffer_size,
 "tpm_tis: From TPM");
@@ -441,14 +438,14 @@ static uint32_t tpm_tis_data_read(TPMState *s, uint8_t locty)
 len = MIN(tpm_cmd_get_size(&s->buffer),
   s->be_buffer_size);
 
-ret = s->buffer[s->r_offset++];
-if (s->r_offset >= len) {
+ret = s->buffer[s->rw_offset++];
+if (s->rw_offset >= len) {
 /* got last byte */
 tpm_tis_sts_set(&s->loc[locty], TPM_TIS_STS_VALID);
 tpm_tis_raise_irq(s, locty, TPM_TIS_INT_STS_VALID);
 }
 DPRINTF("tpm_tis: tpm_tis_data_read byte 0x%02x   [%d]\n",
-ret, s->r_offset - 1);
+ret, s->rw_offset - 1);
 }
 
 return ret;
@@ -483,26 +480,14 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 (int)tpm_tis_mmio_read(opaque, base + regs[idx], 4));
 }
 
-DPRINTF("tpm_tis: read offset   : %d\n"
+DPRINTF("tpm_tis: r/w offset: %d\n"
 "tpm_tis: result buffer : ",
-s->r_offset);
+s->rw_offset);
 for (idx = 0;
  idx < MIN(tpm_cmd_get_size(&s->buffer), s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
-s->r_offset == idx ? '>' : ' ',
-s->buffer[idx],
-((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
-}
-DPRINTF("\n"
-"tpm_tis: write offset  : %d\n"
-"tpm_tis: request buffer: ",
-s->w_offset);
-for (idx = 0;
- idx < MIN(tpm_cmd_get_size(s->buffer), s->be_buffer_size);
- idx++) {
-DPRINTF("%c%02x%s",
-s->w_offset == idx ? '>' : ' ',
+s->rw_offset == idx ? '>' : ' ',
 s->buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
@@ -567,9 +552,9 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
 val = TPM_TIS_BURST_COUNT(
MIN(tpm_cmd_get_size(&s->buffer),
s->be_buffer_size)
-   - s->r_offset) | s->loc[locty].sts;
+   - s->rw_offset) | s->loc[locty].sts;
 } else {
-avail = s->be_buffer_size - s->w_offset;
+avail = s->be_buffer_size - s->rw_offset;
 /*
  * byte-sized reads should not return 0x00 for 0x100
  * available bytes.
@@ -833,8 +818,7 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 switch (s->loc[locty].state) {
 
 case TPM_TIS_STATE_READY:
-s->w_offset = 0;
-s->r_offset = 0;
+s->rw_offset = 0;
 break;
 
 case TPM_TIS_STATE_IDLE:
@@ -852

[Qemu-devel] [PATCH v3 05/13] tpm_tis: merge read and write buffer into single buffer

2017-11-10 Thread Stefan Berger

Since we can only be in read or write mode, we can merge the buffers
into a single buffer.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index f6f5f17..0b6dd7f 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -70,8 +70,7 @@ typedef struct TPMState {
 ISADevice busdev;
 MemoryRegion mmio;
 
-unsigned char w_buffer[TPM_TIS_BUFFER_MAX];
-unsigned char r_buffer[TPM_TIS_BUFFER_MAX];
+unsigned char buffer[TPM_TIS_BUFFER_MAX];
 
 uint8_t active_locty;
 uint8_t aborting_locty;
@@ -257,7 +256,7 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 {
 TPMLocality *locty_data = &s->loc[locty];
 
-tpm_tis_show_buffer(s->w_buffer, s->be_buffer_size,
+tpm_tis_show_buffer(s->buffer, s->be_buffer_size,
 "tpm_tis: To TPM");
 
 /*
@@ -268,9 +267,9 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 
 s->cmd = (TPMBackendCmd) {
 .locty = locty,
-.in = s->w_buffer,
+.in = s->buffer,
 .in_len = locty_data->w_offset,
-.out = s->r_buffer,
+.out = s->buffer,
 .out_len = s->be_buffer_size,
 };
 
@@ -422,7 +421,7 @@ static void tpm_tis_request_completed(TPMIf *ti)
 s->loc[locty].r_offset = 0;
 s->loc[locty].w_offset = 0;
 
-tpm_tis_show_buffer(s->r_buffer, s->be_buffer_size,
+tpm_tis_show_buffer(s->buffer, s->be_buffer_size,
 "tpm_tis: From TPM");
 
 if (TPM_TIS_IS_VALID_LOCTY(s->next_locty)) {
@@ -442,10 +441,10 @@ static uint32_t tpm_tis_data_read(TPMState *s, uint8_t locty)
 uint16_t len;
 
 if ((s->loc[locty].sts & TPM_TIS_STS_DATA_AVAILABLE)) {
-len = MIN(tpm_cmd_get_size(&s->r_buffer),
+len = MIN(tpm_cmd_get_size(&s->buffer),
   s->be_buffer_size);
 
-ret = s->r_buffer[s->loc[locty].r_offset++];
+ret = s->buffer[s->loc[locty].r_offset++];
 if (s->loc[locty].r_offset >= len) {
 /* got last byte */
 tpm_tis_sts_set(&s->loc[locty], TPM_TIS_STS_VALID);
@@ -491,11 +490,11 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 "tpm_tis: result buffer : ",
 s->loc[locty].r_offset);
 for (idx = 0;
- idx < MIN(tpm_cmd_get_size(&s->r_buffer), s->be_buffer_size);
+ idx < MIN(tpm_cmd_get_size(&s->buffer), s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
 s->loc[locty].r_offset == idx ? '>' : ' ',
-s->r_buffer[idx],
+s->buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
 DPRINTF("\n"
@@ -503,11 +502,11 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 "tpm_tis: request buffer: ",
 s->loc[locty].w_offset);
 for (idx = 0;
- idx < MIN(tpm_cmd_get_size(s->w_buffer), s->be_buffer_size);
+ idx < MIN(tpm_cmd_get_size(s->buffer), s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
 s->loc[locty].w_offset == idx ? '>' : ' ',
-s->w_buffer[idx],
+s->buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
 DPRINTF("\n");
@@ -569,7 +568,7 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
 if (s->active_locty == locty) {
 if ((s->loc[locty].sts & TPM_TIS_STS_DATA_AVAILABLE)) {
 val = TPM_TIS_BURST_COUNT(
-   MIN(tpm_cmd_get_size(&s->r_buffer),
+   MIN(tpm_cmd_get_size(&s->buffer),
s->be_buffer_size)
- s->loc[locty].r_offset) | s->loc[locty].sts;
 } else {
@@ -922,7 +921,7 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 
 while ((s->loc[locty].sts & TPM_TIS_STS_EXPECT) && size > 0) {
 if (s->loc[locty].w_offset < s->be_buffer_size) {
-s->w_buffer[s->loc[locty].w_offset++] =
+s->buffer[s->loc[locty].w_offset++] =
 (uint8_t)val;
 val >>= 8;
 size--;
@@ -937,7 +936,7 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 /* we have a packet length - see if we have all of it */
 bool need_irq = !(s->loc[locty].sts & TPM_TIS_STS_VALID);
 
-len = tpm_cmd_get_size(&s->w_buffer);
+len = tpm_cmd_get_size(&s->buffer);
 if (len > s->loc[locty].w_offset) {
 tpm_tis_sts_set(&s->loc[locty],
 TPM_TIS_STS_EXPECT | TPM_TIS_STS_VALID);
-- 
2.5.5

[Qemu-devel] [PATCH v3 08/13] tpm: Implement tpm_sized_buffer_reset

2017-11-10 Thread Stefan Berger

Move the definition of TPMSizedBuffer out of tpm_tis.c into tpm_util.h
and implement tpm_sized_buffer_reset() for the following patches to use.
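
For readers without the tree at hand, the reset semantics can be tried
out in isolation with a sketch like the following. It is only an
approximation: it uses plain free()/calloc() instead of GLib, and
DemoSizedBuffer, demo_sized_buffer_reset() and demo_sized_buffer_alloc()
are hypothetical names that are not part of this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as TPMSizedBuffer, renamed to mark it as a demo type. */
typedef struct DemoSizedBuffer {
    uint32_t size;
    uint8_t *buffer;
} DemoSizedBuffer;

/* Release the payload and return the struct to its empty state. */
static void demo_sized_buffer_reset(DemoSizedBuffer *tsb)
{
    assert(tsb != NULL);
    free(tsb->buffer); /* free(NULL) is a no-op, so repeated resets are safe */
    tsb->buffer = NULL;
    tsb->size = 0;
}

/* Hypothetical helper: drop the old payload, then allocate a zeroed one. */
static int demo_sized_buffer_alloc(DemoSizedBuffer *tsb, uint32_t size)
{
    demo_sized_buffer_reset(tsb);
    if (size == 0) {
        return 0;
    }
    tsb->buffer = calloc(size, 1);
    if (tsb->buffer == NULL) {
        return -1;
    }
    tsb->size = size;
    return 0;
}
```

Resetting before every refill means a caller can reuse one struct for
successive blobs without tracking whether it currently owns memory.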

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c  | 5 -
 hw/tpm/tpm_util.c | 7 +++
 hw/tpm/tpm_util.h | 7 +++
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 7d7e2cd..035c6ef 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -48,11 +48,6 @@ typedef enum {
 TPM_TIS_STATE_RECEPTION,
 } TPMTISState;
 
-typedef struct TPMSizedBuffer {
-uint32_t size;
-uint8_t  *buffer;
-} TPMSizedBuffer;
-
 /* locality data  -- all fields are persisted */
 typedef struct TPMLocality {
 TPMTISState state;
diff --git a/hw/tpm/tpm_util.c b/hw/tpm/tpm_util.c
index a317243..bf97811 100644
--- a/hw/tpm/tpm_util.c
+++ b/hw/tpm/tpm_util.c
@@ -288,3 +288,10 @@ int tpm_util_get_buffer_size(int tpm_fd, TPMVersion tpm_version,
 
 return 0;
 }
+
+void tpm_sized_buffer_reset(TPMSizedBuffer *tsb)
+{
+g_free(tsb->buffer);
+tsb->buffer = NULL;
+tsb->size = 0;
+}
diff --git a/hw/tpm/tpm_util.h b/hw/tpm/tpm_util.h
index 1c17e39..26c9613 100644
--- a/hw/tpm/tpm_util.h
+++ b/hw/tpm/tpm_util.h
@@ -39,4 +39,11 @@ static inline uint32_t tpm_cmd_get_size(const void *b)
 int tpm_util_get_buffer_size(int tpm_fd, TPMVersion tpm_version,
  size_t *buffersize);
 
+typedef struct TPMSizedBuffer {
+uint32_t size;
+uint8_t  *buffer;
+} TPMSizedBuffer;
+
+void tpm_sized_buffer_reset(TPMSizedBuffer *tsb);
+
 #endif /* TPM_TPM_UTIL_H */
-- 
2.5.5

[Qemu-devel] [PATCH v3 09/13] tpm: Introduce condition to notify waiters of completed command

2017-11-10 Thread Stefan Berger

Introduce a lock and a condition to notify anyone waiting for the completion
of the execution of a TPM command by the backend (thread). The backend
uses the condition to signal anyone waiting for command completion.
We need to place the condition in two locations: one is invoked by the
backend thread, the other by the bottom half thread.
We will use the signaling to wait for command completion before VM
suspend.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 035c6ef..86e9a92 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -80,6 +80,9 @@ typedef struct TPMState {
 TPMVersion be_tpm_version;
 
 size_t be_buffer_size;
+
+QemuMutex state_lock;
+QemuCond cmd_complete;
 } TPMState;
 
 #define TPM(obj) OBJECT_CHECK(TPMState, (obj), TYPE_TPM_TIS)
@@ -405,6 +408,8 @@ static void tpm_tis_request_completed(TPMIf *ti)
 }
 }
 
+qemu_mutex_lock(&s->state_lock);
+
 tpm_tis_sts_set(&s->loc[locty],
 TPM_TIS_STS_VALID | TPM_TIS_STS_DATA_AVAILABLE);
 s->loc[locty].state = TPM_TIS_STATE_COMPLETION;
@@ -419,6 +424,10 @@ static void tpm_tis_request_completed(TPMIf *ti)
 
 tpm_tis_raise_irq(s, locty,
   TPM_TIS_INT_DATA_AVAILABLE | TPM_TIS_INT_STS_VALID);
+
+/* notify of completed command */
+qemu_cond_signal(&s->cmd_complete);
+qemu_mutex_unlock(&s->state_lock);
 }
 
 /*
@@ -1046,6 +1055,9 @@ static void tpm_tis_initfn(Object *obj)
 memory_region_init_io(&s->mmio, OBJECT(s), &tpm_tis_memory_ops,
   s, "tpm-tis-mmio",
   TPM_TIS_NUM_LOCALITIES << TPM_TIS_LOCALITY_SHIFT);
+
+qemu_mutex_init(&s->state_lock);
+qemu_cond_init(&s->cmd_complete);
 }
 
 static void tpm_tis_class_init(ObjectClass *klass, void *data)
-- 
2.5.5

[Qemu-devel] [PATCH v3 13/13] tpm_tis: extend TPM TIS with state migration support

2017-11-10 Thread Stefan Berger

Extend the TPM TIS interface with state migration support.

We need to synchronize with the backend thread to make sure that a command
being processed by the external TPM emulator has completed and its
response has been received. In case the bottom half did not run, we run the
function it is supposed to run.

Since only one locality can be active at any time, we only need
to store the command buffer of that active locality.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 74 +++-
 1 file changed, 68 insertions(+), 6 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 86e9a92..c0a0204 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -393,12 +393,8 @@ static void tpm_tis_prep_abort(TPMState *s, uint8_t locty, uint8_t newlocty)
 tpm_tis_abort(s, locty);
 }
 
-/*
- * Callback from the TPM to indicate that the response was received.
- */
-static void tpm_tis_request_completed(TPMIf *ti)
+static void _tpm_tis_request_completed(TPMState *s)
 {
-TPMState *s = TPM(ti);
 uint8_t locty = s->cmd.locty;
 uint8_t l;
 
@@ -431,6 +427,14 @@ static void tpm_tis_request_completed(TPMIf *ti)
 }
 
 /*
+ * Callback from the TPM to indicate that the response was received.
+ */
+static void tpm_tis_request_completed(TPMIf *ti)
+{
+_tpm_tis_request_completed(TPM(ti));
+}
+
+/*
  * Read a byte of response data
  */
 static uint32_t tpm_tis_data_read(TPMState *s, uint8_t locty)
@@ -1012,9 +1016,67 @@ static void tpm_tis_reset(DeviceState *dev)
 tpm_tis_do_startup_tpm(s, s->be_buffer_size);
 }
 
+/* persistent state handling */
+
+static int tpm_tis_pre_save(void *opaque)
+{
+TPMState *s = opaque;
+uint8_t locty = s->active_locty;
+
+DPRINTF("tpm_tis: suspend: locty = %d : rw_offset = %u\n",
+locty, s->rw_offset);
+#ifdef DEBUG_TIS
+tpm_tis_dump_state(opaque, 0);
+#endif
+
+/*
+ * Synchronize with backend completion.
+ */
+tpm_backend_wait_cmd_completed(s->be_driver);
+
+if (TPM_TIS_IS_VALID_LOCTY(locty) &&
+s->loc[locty].state == TPM_TIS_STATE_EXECUTION) {
+/* bottom half did not run - run its function */
+_tpm_tis_request_completed(s);
+}
+
+return 0;
+}
+
+static const VMStateDescription vmstate_locty = {
+.name = "loc",
+.version_id = 1,
+.minimum_version_id = 0,
+.minimum_version_id_old = 0,
+.fields  = (VMStateField[]) {
+VMSTATE_UINT32(state, TPMLocality),
+VMSTATE_UINT32(inte, TPMLocality),
+VMSTATE_UINT32(ints, TPMLocality),
+VMSTATE_UINT8(access, TPMLocality),
+VMSTATE_UINT32(sts, TPMLocality),
+VMSTATE_UINT32(iface_id, TPMLocality),
+VMSTATE_END_OF_LIST(),
+}
+};
+
 static const VMStateDescription vmstate_tpm_tis = {
 .name = "tpm",
-.unmigratable = 1,
+.version_id = 1,
+.minimum_version_id = 0,
+.minimum_version_id_old = 0,
+.pre_save  = tpm_tis_pre_save,
+.fields = (VMStateField[]) {
+VMSTATE_BUFFER(buffer, TPMState),
+VMSTATE_UINT16(rw_offset, TPMState),
+VMSTATE_UINT8(active_locty, TPMState),
+VMSTATE_UINT8(aborting_locty, TPMState),
+VMSTATE_UINT8(next_locty, TPMState),
+
+VMSTATE_STRUCT_ARRAY(loc, TPMState, TPM_TIS_NUM_LOCALITIES, 1,
+ vmstate_locty, TPMLocality),
+
+VMSTATE_END_OF_LIST()
+}
 };
 
 static Property tpm_tis_properties[] = {
-- 
2.5.5

[Qemu-devel] [PATCH v3 06/13] tpm_tis: move r/w_offsets to TPMState

2017-11-10 Thread Stefan Berger

Now that we have a single buffer, we also only need a single set of
read/write offsets into that buffer. This works since only one
locality can be active.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 57 +++-
 1 file changed, 27 insertions(+), 30 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 0b6dd7f..df3 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -61,9 +61,6 @@ typedef struct TPMLocality {
 uint32_t iface_id;
 uint32_t inte;
 uint32_t ints;
-
-uint16_t w_offset;
-uint16_t r_offset;
 } TPMLocality;
 
 typedef struct TPMState {
@@ -71,6 +68,8 @@ typedef struct TPMState {
 MemoryRegion mmio;
 
 unsigned char buffer[TPM_TIS_BUFFER_MAX];
+uint16_t w_offset;
+uint16_t r_offset;
 
 uint8_t active_locty;
 uint8_t aborting_locty;
@@ -254,8 +253,6 @@ static void tpm_tis_sts_set(TPMLocality *l, uint32_t flags)
  */
 static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 {
-TPMLocality *locty_data = &s->loc[locty];
-
 tpm_tis_show_buffer(s->buffer, s->be_buffer_size,
 "tpm_tis: To TPM");
 
@@ -268,7 +265,7 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 s->cmd = (TPMBackendCmd) {
 .locty = locty,
 .in = s->buffer,
-.in_len = locty_data->w_offset,
+.in_len = s->w_offset,
 .out = s->buffer,
 .out_len = s->be_buffer_size,
 };
@@ -350,8 +347,8 @@ static void tpm_tis_new_active_locality(TPMState *s, uint8_t new_active_locty)
 /* abort -- this function switches the locality */
 static void tpm_tis_abort(TPMState *s, uint8_t locty)
 {
-s->loc[locty].r_offset = 0;
-s->loc[locty].w_offset = 0;
+s->r_offset = 0;
+s->w_offset = 0;
 
 DPRINTF("tpm_tis: tis_abort: new active locality is %d\n", s->next_locty);
 
@@ -418,8 +415,8 @@ static void tpm_tis_request_completed(TPMIf *ti)
 tpm_tis_sts_set(&s->loc[locty],
 TPM_TIS_STS_VALID | TPM_TIS_STS_DATA_AVAILABLE);
 s->loc[locty].state = TPM_TIS_STATE_COMPLETION;
-s->loc[locty].r_offset = 0;
-s->loc[locty].w_offset = 0;
+s->r_offset = 0;
+s->w_offset = 0;
 
 tpm_tis_show_buffer(s->buffer, s->be_buffer_size,
 "tpm_tis: From TPM");
@@ -444,14 +441,14 @@ static uint32_t tpm_tis_data_read(TPMState *s, uint8_t locty)
 len = MIN(tpm_cmd_get_size(&s->buffer),
   s->be_buffer_size);
 
-ret = s->buffer[s->loc[locty].r_offset++];
-if (s->loc[locty].r_offset >= len) {
+ret = s->buffer[s->r_offset++];
+if (s->r_offset >= len) {
 /* got last byte */
 tpm_tis_sts_set(&s->loc[locty], TPM_TIS_STS_VALID);
 tpm_tis_raise_irq(s, locty, TPM_TIS_INT_STS_VALID);
 }
 DPRINTF("tpm_tis: tpm_tis_data_read byte 0x%02x   [%d]\n",
-ret, s->loc[locty].r_offset - 1);
+ret, s->r_offset - 1);
 }
 
 return ret;
@@ -488,24 +485,24 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 
 DPRINTF("tpm_tis: read offset   : %d\n"
 "tpm_tis: result buffer : ",
-s->loc[locty].r_offset);
+s->r_offset);
 for (idx = 0;
  idx < MIN(tpm_cmd_get_size(&s->buffer), s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
-s->loc[locty].r_offset == idx ? '>' : ' ',
+s->r_offset == idx ? '>' : ' ',
 s->buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
 DPRINTF("\n"
 "tpm_tis: write offset  : %d\n"
 "tpm_tis: request buffer: ",
-s->loc[locty].w_offset);
+s->w_offset);
 for (idx = 0;
  idx < MIN(tpm_cmd_get_size(s->buffer), s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
-s->loc[locty].w_offset == idx ? '>' : ' ',
+s->w_offset == idx ? '>' : ' ',
 s->buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
@@ -570,9 +567,9 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
 val = TPM_TIS_BURST_COUNT(
MIN(tpm_cmd_get_size(&s->buffer),
s->be_buffer_size)
-   - s->loc[locty].r_offset) | s->loc[locty].sts;
+   - s->r_offset) | s->loc[locty].sts;
 } else {
-avail = s->be_buffer_size - s->loc[locty].w_offset;
+avail = s->be_buffer_size - s->w_offset;
 /*
  * byte-sized reads should not return 0x00 for 0x100
  * available bytes.
@@ -836,8 +833,8 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 switch (s->loc[locty].state) {
 
 case TPM_TIS_STATE_READY:
-s->loc[locty].w_of

[Qemu-devel] [PATCH v3 03/13] tpm_tis: remove TPMSizeBuffer usage

2017-11-10 Thread Stefan Berger

Remove usage of TPMSizedBuffer. The size of the buffers is now limited
by s->be_buffer_size, which is the size of the buffer the TIS has
negotiated with the backend.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_tis.c | 68 
 1 file changed, 29 insertions(+), 39 deletions(-)

diff --git a/hw/tpm/tpm_tis.c b/hw/tpm/tpm_tis.c
index 90c6df2..ddfcfc9 100644
--- a/hw/tpm/tpm_tis.c
+++ b/hw/tpm/tpm_tis.c
@@ -64,8 +64,8 @@ typedef struct TPMLocality {
 
 uint16_t w_offset;
 uint16_t r_offset;
-TPMSizedBuffer w_buffer;
-TPMSizedBuffer r_buffer;
+unsigned char w_buffer[TPM_TIS_BUFFER_MAX];
+unsigned char r_buffer[TPM_TIS_BUFFER_MAX];
 } TPMLocality;
 
 typedef struct TPMState {
@@ -215,23 +215,19 @@ static uint8_t tpm_tis_locality_from_addr(hwaddr addr)
 return (uint8_t)((addr >> TPM_TIS_LOCALITY_SHIFT) & 0x7);
 }
 
-static uint32_t tpm_tis_get_size_from_buffer(const TPMSizedBuffer *sb)
-{
-return tpm_cmd_get_size(sb->buffer);
-}
-
-static void tpm_tis_show_buffer(const TPMSizedBuffer *sb, const char *string)
+static void tpm_tis_show_buffer(const unsigned char *buffer,
+size_t buffer_size, const char *string)
 {
 #ifdef DEBUG_TIS
 uint32_t len, i;
 
-len = tpm_tis_get_size_from_buffer(sb);
+len = MIN(tpm_cmd_get_size(buffer), buffer_size);
 DPRINTF("tpm_tis: %s length = %d\n", string, len);
 for (i = 0; i < len; i++) {
 if (i && !(i % 16)) {
 DPRINTF("\n");
 }
-DPRINTF("%.2X ", sb->buffer[i]);
+DPRINTF("%.2X ", buffer[i]);
 }
 DPRINTF("\n");
 #endif
@@ -263,7 +259,8 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 {
 TPMLocality *locty_data = &s->loc[locty];
 
-tpm_tis_show_buffer(&s->loc[locty].w_buffer, "tpm_tis: To TPM");
+tpm_tis_show_buffer(s->loc[locty].w_buffer, s->be_buffer_size,
+"tpm_tis: To TPM");
 
 /*
  * w_offset serves as length indicator for length of data;
@@ -273,10 +270,10 @@ static void tpm_tis_tpm_send(TPMState *s, uint8_t locty)
 
 s->cmd = (TPMBackendCmd) {
 .locty = locty,
-.in = locty_data->w_buffer.buffer,
+.in = locty_data->w_buffer,
 .in_len = locty_data->w_offset,
-.out = locty_data->r_buffer.buffer,
-.out_len = locty_data->r_buffer.size
+.out = locty_data->r_buffer,
+.out_len = s->be_buffer_size,
 };
 
 tpm_backend_deliver_request(s->be_driver, &s->cmd);
@@ -427,7 +424,8 @@ static void tpm_tis_request_completed(TPMIf *ti)
 s->loc[locty].r_offset = 0;
 s->loc[locty].w_offset = 0;
 
-tpm_tis_show_buffer(&s->loc[locty].r_buffer, "tpm_tis: From TPM");
+tpm_tis_show_buffer(s->loc[locty].r_buffer, s->be_buffer_size,
+"tpm_tis: From TPM");
 
 if (TPM_TIS_IS_VALID_LOCTY(s->next_locty)) {
 tpm_tis_abort(s, locty);
@@ -446,9 +444,10 @@ static uint32_t tpm_tis_data_read(TPMState *s, uint8_t locty)
 uint16_t len;
 
 if ((s->loc[locty].sts & TPM_TIS_STS_DATA_AVAILABLE)) {
-len = tpm_tis_get_size_from_buffer(&s->loc[locty].r_buffer);
+len = MIN(tpm_cmd_get_size(&s->loc[locty].r_buffer),
+  s->be_buffer_size);
 
-ret = s->loc[locty].r_buffer.buffer[s->loc[locty].r_offset++];
+ret = s->loc[locty].r_buffer[s->loc[locty].r_offset++];
 if (s->loc[locty].r_offset >= len) {
 /* got last byte */
 tpm_tis_sts_set(&s->loc[locty], TPM_TIS_STS_VALID);
@@ -494,11 +493,12 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 "tpm_tis: result buffer : ",
 s->loc[locty].r_offset);
 for (idx = 0;
- idx < tpm_tis_get_size_from_buffer(&s->loc[locty].r_buffer);
+ idx < MIN(tpm_cmd_get_size(&s->loc[locty].r_buffer),
+   s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
 s->loc[locty].r_offset == idx ? '>' : ' ',
-s->loc[locty].r_buffer.buffer[idx],
+s->loc[locty].r_buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
 DPRINTF("\n"
@@ -506,11 +506,12 @@ static void tpm_tis_dump_state(void *opaque, hwaddr addr)
 "tpm_tis: request buffer: ",
 s->loc[locty].w_offset);
 for (idx = 0;
- idx < tpm_tis_get_size_from_buffer(&s->loc[locty].w_buffer);
+ idx < MIN(tpm_cmd_get_size(s->loc[locty].w_buffer),
+   s->be_buffer_size);
  idx++) {
 DPRINTF("%c%02x%s",
 s->loc[locty].w_offset == idx ? '>' : ' ',
-s->loc[locty].w_buffer.buffer[idx],
+s->loc[locty].w_buffer[idx],
 ((idx & 0xf) == 0xf) ? "\ntpm_tis: " : "");
 }
 DPRINTF("\n");
@@ -572,11 +573,11 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
 if (s->active

[Qemu-devel] [PATCH v3 10/13] tpm: Introduce condition in TPM backend for notification

2017-11-10 Thread Stefan Berger

TPM backends will suspend independently of the frontends. Here, too,
we need to be able to wait until the TPM command has been
completely processed.

Signed-off-by: Stefan Berger 
---
 backends/tpm.c   | 19 +++
 include/sysemu/tpm_backend.h | 14 ++
 2 files changed, 33 insertions(+)

diff --git a/backends/tpm.c b/backends/tpm.c
index 91222c5..bf0e120 100644
--- a/backends/tpm.c
+++ b/backends/tpm.c
@@ -20,6 +20,14 @@
 #include "qemu/thread.h"
 #include "qemu/main-loop.h"
 
+void tpm_backend_cmd_completed(TPMBackend *s)
+{
+qemu_mutex_lock(&s->state_lock);
+s->tpm_busy = false;
+qemu_cond_signal(&s->cmd_complete);
+qemu_mutex_unlock(&s->state_lock);
+}
+
 static void tpm_backend_request_completed_bh(void *opaque)
 {
 TPMBackend *s = TPM_BACKEND(opaque);
@@ -36,6 +44,9 @@ static void tpm_backend_worker_thread(gpointer data, gpointer user_data)
 k->handle_request(s, (TPMBackendCmd *)data);
 
 qemu_bh_schedule(s->bh);
+
+/* result delivered */
+tpm_backend_cmd_completed(s);
 }
 
 static void tpm_backend_thread_end(TPMBackend *s)
@@ -64,6 +75,10 @@ int tpm_backend_init(TPMBackend *s, TPMIf *tpmif, Error **errp)
 object_ref(OBJECT(tpmif));
 
 s->had_startup_error = false;
+s->tpm_busy = false;
+
+qemu_mutex_init(&s->state_lock);
+qemu_cond_init(&s->cmd_complete);
 
 return 0;
 }
@@ -93,6 +108,10 @@ bool tpm_backend_had_startup_error(TPMBackend *s)
 
 void tpm_backend_deliver_request(TPMBackend *s, TPMBackendCmd *cmd)
 {
+qemu_mutex_lock(&s->state_lock);
+s->tpm_busy = true;
+qemu_mutex_unlock(&s->state_lock);
+
 g_thread_pool_push(s->thread_pool, cmd, NULL);
 }
 
diff --git a/include/sysemu/tpm_backend.h b/include/sysemu/tpm_backend.h
index 0d6c994..39598e3 100644
--- a/include/sysemu/tpm_backend.h
+++ b/include/sysemu/tpm_backend.h
@@ -18,6 +18,7 @@
 #include "qapi-types.h"
 #include "qemu/option.h"
 #include "sysemu/tpm.h"
+#include "qemu/thread.h"
 
 #define TYPE_TPM_BACKEND "tpm-backend"
 #define TPM_BACKEND(obj) \
@@ -53,6 +54,10 @@ struct TPMBackend {
 char *id;
 
 QLIST_ENTRY(TPMBackend) list;
+
+QemuMutex state_lock;
+QemuCond cmd_complete; /* signaled once tpm_busy is false */
+bool tpm_busy;
 };
 
 struct TPMBackendClass {
@@ -206,6 +211,15 @@ size_t tpm_backend_get_buffer_size(TPMBackend *s);
  */
 TPMInfo *tpm_backend_query_tpm(TPMBackend *s);
 
+/**
+ * tpm_backend_cmd_completed:
+ * @s: the backend
+ *
+ * Mark the backend as not busy and notify anyone interested
+ * in the state changed
+ */
+void tpm_backend_cmd_completed(TPMBackend *s);
+
 TPMBackend *qemu_find_tpm_be(const char *id);
 
 #endif
-- 
2.5.5

[Qemu-devel] [PATCH v3 11/13] tpm: implement tpm_backend_wait_cmd_completed

2017-11-10 Thread Stefan Berger

Implement tpm_backend_wait_cmd_completed to synchronize with the
backend (thread) for the completion of a command.
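
The busy-flag handshake can be sketched outside of QEMU with plain
POSIX threads, as below. This is an illustration under assumptions, not
the code in this patch: DemoBackend and the demo_* functions are
hypothetical, pthread calls stand in for the qemu_mutex/qemu_cond
wrappers, and the waiter re-checks the flag in a loop because POSIX
allows condition waits to wake up spuriously.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the backend's lock, condition and busy flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t done; /* signaled once busy becomes false */
    bool busy;
} DemoBackend;

static void demo_init(DemoBackend *b)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->done, NULL);
    b->busy = false;
}

/* Mark the backend busy before handing a command to the worker. */
static void demo_submit(DemoBackend *b)
{
    pthread_mutex_lock(&b->lock);
    b->busy = true;
    pthread_mutex_unlock(&b->lock);
}

/* Called by the worker once the command has been processed. */
static void demo_completed(DemoBackend *b)
{
    pthread_mutex_lock(&b->lock);
    b->busy = false;
    pthread_cond_signal(&b->done);
    pthread_mutex_unlock(&b->lock);
}

/* Block until no command is in flight; returns at once when idle. */
static void demo_wait_completed(DemoBackend *b)
{
    pthread_mutex_lock(&b->lock);
    while (b->busy) {
        pthread_cond_wait(&b->done, &b->lock);
    }
    assert(!b->busy); /* holds here because the lock is still taken */
    pthread_mutex_unlock(&b->lock);
}

/* Worker thread body: pretend the command finished right away. */
static void *demo_worker(void *opaque)
{
    demo_completed(opaque);
    return NULL;
}
```

A caller would invoke demo_submit(), start demo_worker() with
pthread_create(), and then call demo_wait_completed() at the point
where it must be sure no command is still being processed.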

Signed-off-by: Stefan Berger 
---
 backends/tpm.c   | 10 ++
 include/sysemu/tpm_backend.h |  8 
 2 files changed, 18 insertions(+)

diff --git a/backends/tpm.c b/backends/tpm.c
index bf0e120..5e4de27 100644
--- a/backends/tpm.c
+++ b/backends/tpm.c
@@ -28,6 +28,16 @@ void tpm_backend_cmd_completed(TPMBackend *s)
 qemu_mutex_unlock(&s->state_lock);
 }
 
+void tpm_backend_wait_cmd_completed(TPMBackend *s)
+{
+qemu_mutex_lock(&s->state_lock);
+
+if (s->tpm_busy) {
+qemu_cond_wait(&s->cmd_complete, &s->state_lock);
+}
+qemu_mutex_unlock(&s->state_lock);
+}
+
 static void tpm_backend_request_completed_bh(void *opaque)
 {
 TPMBackend *s = TPM_BACKEND(opaque);
diff --git a/include/sysemu/tpm_backend.h b/include/sysemu/tpm_backend.h
index 39598e3..1170cb9 100644
--- a/include/sysemu/tpm_backend.h
+++ b/include/sysemu/tpm_backend.h
@@ -220,6 +220,14 @@ TPMInfo *tpm_backend_query_tpm(TPMBackend *s);
  */
 void tpm_backend_cmd_completed(TPMBackend *s);
 
+/**
+ * tpm_backend_wait_cmd_completed:
+ * @s: the backend
+ *
+ * Wait the backend to not be busy anymore
+ */
+void tpm_backend_wait_cmd_completed(TPMBackend *s);
+
 TPMBackend *qemu_find_tpm_be(const char *id);
 
 #endif
-- 
2.5.5

[Qemu-devel] [PATCH v3 12/13] tpm: extend TPM emulator with state migration support

2017-11-10 Thread Stefan Berger

Extend the TPM emulator backend device with state migration support.

The external TPM emulator 'swtpm' provides a protocol over
its control channel to retrieve its state blobs. We implement
functions for getting and setting the different state blobs.

Since we have an external TPM emulator, we need to make sure
that we do not migrate the state for as long as it is busy
processing a request. We need to wait for notification that
the request has completed processing.

Signed-off-by: Stefan Berger 
---
 hw/tpm/tpm_emulator.c | 303 --
 1 file changed, 293 insertions(+), 10 deletions(-)

diff --git a/hw/tpm/tpm_emulator.c b/hw/tpm/tpm_emulator.c
index 3ae8bf6..f05753f 100644
--- a/hw/tpm/tpm_emulator.c
+++ b/hw/tpm/tpm_emulator.c
@@ -61,6 +61,19 @@
 #define TPM_EMULATOR_IMPLEMENTS_ALL_CAPS(S, cap) (((S)->caps & (cap)) == (cap))
 
 /* data structures */
+
+/* blobs from the TPM; part of VM state when migrating */
+typedef struct TPMBlobBuffers {
+uint32_t permanent_flags;
+TPMSizedBuffer permanent;
+
+uint32_t volatil_flags;
+TPMSizedBuffer volatil;
+
+uint32_t savestate_flags;
+TPMSizedBuffer savestate;
+} TPMBlobBuffers;
+
 typedef struct TPMEmulator {
 TPMBackend parent;
 
@@ -73,6 +86,8 @@ typedef struct TPMEmulator {
 Error *migration_blocker;
 
 QemuMutex mutex;
+
+TPMBlobBuffers state_blobs;
 } TPMEmulator;
 
 
@@ -315,18 +330,24 @@ static int tpm_emulator_set_buffer_size(TPMBackend *tb,
 return 0;
 }
 
-static int tpm_emulator_startup_tpm(TPMBackend *tb, size_t buffersize)
+static int _tpm_emulator_startup_tpm(TPMBackend *tb, size_t buffersize,
+ bool is_resume)
 {
 TPMEmulator *tpm_emu = TPM_EMULATOR(tb);
 ptm_init init;
 ptm_res res;
 
+DPRINTF("%s   is_resume: %d", __func__, is_resume);
+
 if (buffersize != 0 &&
 tpm_emulator_set_buffer_size(tb, buffersize, NULL) < 0) {
 goto err_exit;
 }
 
-DPRINTF("%s", __func__);
+if (is_resume) {
+init.u.req.init_flags = cpu_to_be32(PTM_INIT_FLAG_DELETE_VOLATILE);
+}
+
 if (tpm_emulator_ctrlcmd(tpm_emu, CMD_INIT, &init, sizeof(init),
  sizeof(init)) < 0) {
 error_report("tpm-emulator: could not send INIT: %s",
@@ -345,6 +366,11 @@ err_exit:
 return -1;
 }
 
+static int tpm_emulator_startup_tpm(TPMBackend *tb, size_t buffersize)
+{
+return _tpm_emulator_startup_tpm(tb, buffersize, false);
+}
+
 static bool tpm_emulator_get_tpm_established_flag(TPMBackend *tb)
 {
 TPMEmulator *tpm_emu = TPM_EMULATOR(tb);
@@ -435,16 +461,21 @@ static size_t tpm_emulator_get_buffer_size(TPMBackend *tb)
 static int tpm_emulator_block_migration(TPMEmulator *tpm_emu)
 {
 Error *err = NULL;
+ptm_cap caps = PTM_CAP_GET_STATEBLOB | PTM_CAP_SET_STATEBLOB |
+   PTM_CAP_STOP;
 
-error_setg(&tpm_emu->migration_blocker,
-   "Migration disabled: TPM emulator not yet migratable");
-migrate_add_blocker(tpm_emu->migration_blocker, &err);
-if (err) {
-error_report_err(err);
-error_free(tpm_emu->migration_blocker);
-tpm_emu->migration_blocker = NULL;
+if (!TPM_EMULATOR_IMPLEMENTS_ALL_CAPS(tpm_emu, caps)) {
+error_setg(&tpm_emu->migration_blocker,
+   "Migration disabled: TPM emulator does not support "
+   "migration");
+migrate_add_blocker(tpm_emu->migration_blocker, &err);
+if (err) {
+error_report_err(err);
+error_free(tpm_emu->migration_blocker);
+tpm_emu->migration_blocker = NULL;
 
-return -1;
+return -1;
+}
 }
 
 return 0;
@@ -573,6 +604,253 @@ static const QemuOptDesc tpm_emulator_cmdline_opts[] = {
 { /* end of list */ },
 };
 
+/*
+ * Transfer a TPM state blob from the TPM into a provided buffer.
+ *
+ * @tpm_emu: TPMEmulator
+ * @type: the type of blob to transfer
+ * @tsb: the TPMSizedBuffer to fill with the blob
+ * @flags: the flags to return to the caller
+ */
+static int tpm_emulator_get_state_blob(TPMEmulator *tpm_emu,
+   uint8_t type,
+   TPMSizedBuffer *tsb,
+   uint32_t *flags)
+{
+ptm_getstate pgs;
+ptm_res res;
+ssize_t n;
+uint32_t totlength, length;
+
+tpm_sized_buffer_reset(tsb);
+
+pgs.u.req.state_flags = cpu_to_be32(PTM_STATE_FLAG_DECRYPTED);
+pgs.u.req.type = cpu_to_be32(type);
+pgs.u.req.offset = 0;
+
+if (tpm_emulator_ctrlcmd(tpm_emu, CMD_GET_STATEBLOB,
+ &pgs, sizeof(pgs.u.req),
+ offsetof(ptm_getstate, u.resp.data)) < 0) {
+error_report("tpm-emulator: could not get state blob type %d : %s",
+ type, strerror(errno));
+return -1;
+}
+
+res = be32_to_cpu(pgs.u.resp.tpm_result);
+if (res !=

Re: [Qemu-devel] Yet another git submodule rant

2017-11-10 Thread Daniel P. Berrange

On Sat, Nov 11, 2017 at 12:46:36AM +1100, Alexey Kardashevskiy wrote:
> On 10/11/17 21:41, Daniel P. Berrange wrote:
> > On Fri, Nov 10, 2017 at 09:35:54PM +1100, Alexey Kardashevskiy wrote:
> >> On 09/11/17 00:01, Daniel P. Berrange wrote:
> >>> On Wed, Nov 08, 2017 at 09:26:01AM -0300, Philippe Mathieu-Daudé wrote:
>  On 11/08/2017 06:57 AM, Thomas Huth wrote:
> >
> > That automatic git submodule stuff now broke my workflow again. I
> > usually keep the git repository on my laptop and then simply rsync the
> > sources (without .git directories) to my target machine to compile it
> > there. Used to work great for years. Now it's broken, the build process
> > complains:
> >
> > GIT submodule checkout is out of date. Please run
> >   scripts/git-submodule.sh update
> > from the source directory checkout /home/thuth/devel/qemu
> >
> > Running "scripts/git-submodule.sh update" did not fix the issue at all -
> > I first had to tinker with it for a while to find out that I simply have
> > to delete ".git-submodule-status" in my git tree to fix the issue.
> >
> > I've got the feeling that all this submodule crap is constantly causing
> > pain ... do we really need this? Can't we find another solution instead?
> > Or at least stop modifying files automatically in the $SRC_PATH ?
> 
>  Also yesterday on IRC:
> 
>   [...] I downloaded the qemu source from git and tried to compile
>  it. I am getting this:
> 
>  ./configure --static && make && sudo make install
>   CC  ui/input-keymap.o
>  ui/input-keymap.c:8:10: fatal error: ui/input-keymap-linux-to-qcode.c:
>  No such file or directory
> >>>
> >>> I had a pull request merged late yesterday afternoon which possibly
> >>> would address that problem, though it is hard to say for certain.
> >>
> >> wow, already? :(
> >>
> >> I still wonder why do not we checkout submodules into the build directory
> >> and why .git-submodule-status is not there too...
> > 
> > That simply isn't the way submodules work, they are inherently part of
> > the source tree, and the status file reflects that too.
> 
> Sorry, I am missing the point here. What precisely does prevent us from
> checking out the required modules to the build directory and build them
> there? git provides a submodule repository url and sha1 for the current
> qemu branch.

The build directory should never contain any of your version controlled
source, as that will get irretrievably lost when the build dir is purged.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH v3 0/6] Convert to realize and improve error handling

2017-11-10 Thread Kevin Wolf

Am 19.09.2017 um 01:59 hat John Snow geschrieben:
> On 09/18/2017 10:05 AM, Mao Zhongyi wrote:
> > This series mainly implements the conversions of ide, floppy and nvme
> > device to realize. Add some error handling messages and remove the local
> > variable local_err, use errp to propagate the error directly. Also
> > fix the unusual function name.
> 
> I've staged patches one and two here for my IDE pull request.
> 
> I think patches 3-6 here would belong to Kevin.

Sorry, I completely missed this.

Thanks, applied patch 3 (nvme) to the block-next branch. I did not take
patches 4 and 5 because patch 5 doesn't apply cleanly any more, and
honestly I think the result is uglier than before.

Patch 6 is for Gerd.

Kevin

Re: [Qemu-devel] [Qemu-block] [PATCH v2 4/5] iotests: Make 083 less flaky

2017-11-10 Thread Max Reitz

On 2017-11-10 11:02, Alberto Garcia wrote:
> On Thu 09 Nov 2017 09:30:24 PM CET, Max Reitz wrote:
>> +echo > "$TEST_DIR/nbd-fault-injector.out"
>>  $PYTHON nbd-fault-injector.py $extra_args "$nbd_addr" "$TEST_DIR/nbd-fault-injector.conf" >"$TEST_DIR/nbd-fault-injector.out" 2>&1 &
> 
> It seems that in this patch you're indenting with spaces but this file
> uses tabs.

Yes, but tabs are wrong. :-)

Max



[Qemu-devel] [PATCH v4] throttle-groups: drain before detaching ThrottleState

2017-11-10 Thread Stefan Hajnoczi

I/O requests hang after stop/cont commands at least since QEMU 2.10.0
with -drive iops=100:

  (guest)$ dd if=/dev/zero of=/dev/vdb oflag=direct count=1000
  (qemu) stop
  (qemu) cont
  ...I/O is stuck...

This happens because blk_set_aio_context() detaches the ThrottleState
while requests may still be in flight:

  if (tgm->throttle_state) {
  throttle_group_detach_aio_context(tgm);
  throttle_group_attach_aio_context(tgm, new_context);
  }

This patch encloses the detach/attach calls in a drained region so no
I/O request is left hanging.  Also add assertions so we don't make the
same mistake again in the future.
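
The drained-region idea described above can be modeled in isolation. The
following is only a toy sketch: ToyMember and the toy_* helpers are
hypothetical stand-ins, not the QEMU structures or the bdrv_drained_begin()
implementation, and "draining" is simulated by completing pending requests
on the spot. It shows the ordering the patch enforces: quiesce, detach,
attach, resume, with an assertion guarding the detach step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a throttle group member; not the QEMU struct. */
typedef struct ToyMember {
    unsigned pending_reqs[2];   /* in-flight requests: [0] reads, [1] writes */
    int quiesce;                /* > 0 while inside a drained region */
    const void *aio_context;    /* context the timers are attached to */
} ToyMember;

/* Begin a drained region: refuse new requests, wait for pending ones. */
static void toy_drained_begin(ToyMember *m)
{
    m->quiesce++;
    /* A real implementation would run the event loop until requests
     * finish; in this toy model they simply complete here. */
    m->pending_reqs[0] = 0;
    m->pending_reqs[1] = 0;
}

static void toy_drained_end(ToyMember *m)
{
    assert(m->quiesce > 0);
    m->quiesce--;
}

/* New requests are refused while the member is quiesced. */
static bool toy_submit(ToyMember *m, bool is_write)
{
    if (m->quiesce > 0) {
        return false;
    }
    m->pending_reqs[is_write]++;
    return true;
}

static void toy_detach(ToyMember *m)
{
    /* Requests must have been drained before detaching. */
    assert(m->pending_reqs[0] == 0 && m->pending_reqs[1] == 0);
    m->aio_context = NULL;
}

static void toy_attach(ToyMember *m, const void *ctx)
{
    m->aio_context = ctx;
}

/* Switch contexts inside a drained region so nothing is left hanging. */
static void toy_set_aio_context(ToyMember *m, const void *new_ctx)
{
    toy_drained_begin(m);
    toy_detach(m);
    toy_attach(m, new_ctx);
    toy_drained_end(m);
}
```

Without the toy_drained_begin() call, the assertion in toy_detach() would
fire whenever a request is still pending, which is the class of mistake the
added assertions are meant to catch early.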

Reported-by: Yongxue Hong 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Simplified patch in response to Berto's review
---
 block/block-backend.c   | 2 ++
 block/throttle-groups.c | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 45d9101be3..da2f6c0f8a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1748,8 +1748,10 @@ void blk_set_aio_context(BlockBackend *blk, AioContext *new_context)
 
 if (bs) {
 if (tgm->throttle_state) {
+bdrv_drained_begin(bs);
 throttle_group_detach_aio_context(tgm);
 throttle_group_attach_aio_context(tgm, new_context);
+bdrv_drained_end(bs);
 }
 bdrv_set_aio_context(bs, new_context);
 }
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index b291a88481..2587f19ca3 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -594,6 +594,12 @@ void throttle_group_attach_aio_context(ThrottleGroupMember *tgm,
 void throttle_group_detach_aio_context(ThrottleGroupMember *tgm)
 {
 ThrottleTimers *tt = &tgm->throttle_timers;
+
+/* Requests must have been drained */
+assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
+assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
+assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
+
 throttle_timers_detach_aio_context(tt);
 tgm->aio_context = NULL;
 }
-- 
2.13.6

[Qemu-devel] [PATCH for-2.12 v3 00/11] spapr: introduce an IRQ allocator at the machine level

2017-11-10 Thread Cédric Le Goater

Hello,

Currently, the ICSState 'ics' object of the sPAPR machine acts as the
global interrupt source handler and also as the IRQ number allocator
for the machine. Some IRQ numbers are allocated very early in the
machine initialization sequence to populate the device tree, which is
a problem for introducing the new POWER XIVE interrupt model, as it
needs to share the IRQ numbers with the older model.

To prepare the ground for XIVE, here is a proposal adding a set of new
XICSFabric operations that let the machine handle the IRQ number
allocation directly and decouple the allocation from the interrupt
source object:

bool (*irq_test)(XICSFabric *xi, int irq);
int (*irq_alloc_block)(XICSFabric *xi, int count, int align);
void (*irq_free_block)(XICSFabric *xi, int irq, int num);
bool (*irq_is_lsi)(XICSFabric *xi, int irq);

In these prototypes, the 'irq' parameter refers to a number in the
global IRQ number space.

On the latest pseries machines, these operations are simply backed by
a bitmap. To handle migration compatibility, older machine types keep
the old set of operations using the ICSIRQState array.
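
As an illustration of what "backed by a bitmap" can look like, here is a
minimal, self-contained sketch of the three allocator operations. It is not
the code from the series: ToyMachine, TOY_NR_IRQS and TOY_IRQ_BASE are
hypothetical, the bitmap is a single 64-bit word, and a plain linear scan
stands in for QEMU's bitmap helpers. As in the proposed prototypes, the
'irq' values are global IRQ numbers, while the bitmap is indexed by source
number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_IRQS   64       /* hypothetical size of the number space */
#define TOY_IRQ_BASE  0x1000   /* hypothetical first global IRQ number */

typedef struct ToyMachine {
    uint64_t irq_map;          /* bit n set => source number n is in use */
} ToyMachine;

/* True if the global IRQ number is currently allocated. */
static bool toy_irq_test(const ToyMachine *m, int irq)
{
    int srcno = irq - TOY_IRQ_BASE;

    if (srcno < 0 || srcno >= TOY_NR_IRQS) {
        return false;
    }
    return (m->irq_map >> srcno) & 1;
}

/*
 * Allocate 'count' consecutive IRQs whose first source number is a
 * multiple of 'align'. Returns the first global IRQ number, or -1.
 */
static int toy_irq_alloc_block(ToyMachine *m, int count, int align)
{
    if (count <= 0 || count > TOY_NR_IRQS || align <= 0) {
        return -1;
    }
    for (int first = 0; first + count <= TOY_NR_IRQS; first += align) {
        int i;

        for (i = first; i < first + count; i++) {
            if ((m->irq_map >> i) & 1) {
                break;              /* block is not entirely free */
            }
        }
        if (i == first + count) {
            for (i = first; i < first + count; i++) {
                m->irq_map |= UINT64_C(1) << i;
            }
            return first + TOY_IRQ_BASE;
        }
    }
    return -1;
}

/* Release 'num' IRQs starting at the global IRQ number 'irq'. */
static void toy_irq_free_block(ToyMachine *m, int irq, int num)
{
    int srcno = irq - TOY_IRQ_BASE;

    for (int i = srcno; i < srcno + num; i++) {
        if (i >= 0 && i < TOY_NR_IRQS) {
            m->irq_map &= ~(UINT64_C(1) << i);
        }
    }
}
```

Keeping the allocator state in one bitmap makes it independent of the
per-interrupt state array, which is the property the cover letter is after.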


To completely remove the use of the ICSState object (required to
introduce XIVE), we also need to change how the nature of an
interrupt, MSI or LSI, is stored. Today, this is done using the flags
attribute of the ICSIRQState array. We change that by splitting the
IRQ number space of the machine in two: first the LSIs and then the
MSIs. This has the benefit of keeping the LSI IRQ numbers in a
well-known range, which is useful for PHB hotplug.
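
The split of the number space can be sketched as follows. This is an
assumption-laden illustration, not the series' code: TOY_NR_LSIS (the size
of the LSI range) and TOY_IRQ_BASE are hypothetical values; only the total
of 1024 comes from XICS_IRQS_SPAPR. Once the LSIs occupy a fixed range at
the start, the nature of an interrupt follows from its number alone and no
per-interrupt flag is needed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_IRQ_BASE  0x1000   /* hypothetical first global IRQ number */
#define TOY_NR_LSIS   16       /* hypothetical size of the LSI range */
#define TOY_NR_IRQS   1024     /* total number of IRQs */

/* An IRQ is an LSI if its source number falls in the first range. */
static bool toy_irq_is_lsi(int irq)
{
    int srcno = irq - TOY_IRQ_BASE;

    return srcno >= 0 && srcno < TOY_NR_LSIS;
}

/*
 * Source-number window an allocator would search for a given kind:
 * [0, TOY_NR_LSIS) for LSIs, [TOY_NR_LSIS, TOY_NR_IRQS) for MSIs.
 */
static void toy_alloc_window(bool lsi, int *start, int *end)
{
    *start = lsi ? 0 : TOY_NR_LSIS;
    *end   = lsi ? TOY_NR_LSIS : TOY_NR_IRQS;
}
```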

The git repo for this patchset can be found here, along with the latest
XIVE model:

https://github.com/legoater/qemu/commits/xive

Thanks,

C.

Tests :

 - make check on each patch
 - migration :
 qemu-2.12 (pseries-2.12) <->  qemu-2.12 (pseries-2.12)
 qemu-2.12 (pseries-2.10) <->  qemu-2.12 (pseries-2.10)
 qemu-2.10 (pseries-2.10) <->  qemu-2.12 (pseries-2.10)

Changes since v2 :

 - introduced a second set of XICSFabric IRQ operations for older
   pseries machines

Changes since v1 :

 - reorganised patchset to introduce the XICSFabric operations before
   the major changes: bitmap and IRQ number space split   
 - introduced a reference bitmap to save some state in migration

Cédric Le Goater (11):
  spapr: add pseries 2.12 machine type
  ppc/xics: remove useless if condition
  spapr: introduce new XICSFabric operations for an IRQ allocator
  spapr: move current IRQ allocation under the machine
  spapr: introduce an IRQ allocator using a bitmap
  spapr: store a reference IRQ bitmap
  spapr: introduce an 'irq_base' number
  spapr: introduce a XICSFabric irq_is_lsi() operation
  spapr: split the IRQ number space for LSI interrupts
  sparp: merge ics_set_irq_type() in irq_alloc_block() operation
  spapr: use sPAPRMachineState in spapr_ics_ prototypes

 hw/intc/trace-events   |   2 -
 hw/intc/xics.c |  37 -
 hw/intc/xics_kvm.c |   4 +-
 hw/intc/xics_spapr.c   |  76 +++---
 hw/ppc/pnv.c   |  34 
 hw/ppc/pnv_psi.c   |   4 -
 hw/ppc/spapr.c | 209 -
 hw/ppc/spapr_events.c  |   4 +-
 hw/ppc/spapr_pci.c |   8 +-
 hw/ppc/spapr_vio.c |   2 +-
 hw/ppc/trace-events|   2 +
 include/hw/ppc/spapr.h |   5 ++
 include/hw/ppc/xics.h  |  20 +++--
 13 files changed, 301 insertions(+), 106 deletions(-)

-- 
2.13.6

[Qemu-devel] [PATCH for-2.12 v3 02/11] ppc/xics: remove useless if condition

2017-11-10 Thread Cédric Le Goater

The previous code section uses a 'first < 0' test and returns. Therefore,
there is no need to test the 'first' variable against '>= 0' afterwards.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics_spapr.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index d98ea8b13068..e8c0a1b3e903 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -329,10 +329,8 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
 return -1;
 }
 
-if (first >= 0) {
-for (i = first; i < first + num; ++i) {
-ics_set_irq_type(ics, i, lsi);
-}
+for (i = first; i < first + num; ++i) {
+ics_set_irq_type(ics, i, lsi);
 }
 first += ics->offset;
 
-- 
2.13.6

[Qemu-devel] [PATCH for-2.12 v3 07/11] spapr: introduce an 'irq_base' number

2017-11-10 Thread Cédric Le Goater

'irq_base' is a base IRQ number which lets us allocate only the subset
of the IRQ numbers used on the sPAPR platform. It is in sync with the
ICSState 'offset' attribute, which makes it slightly redundant. We
could also choose to waste some extra bytes (512) and allocate the
whole number space. To be discussed.

More importantly, it removes a dependency on the ICSState object of
the sPAPR machine, which is required for XIVE.
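
The trade-off mentioned above can be checked with a little arithmetic. The
sketch below assumes a base of 0x1000, which is consistent with the
512-byte figure (4096 bits / 8) but is an assumption here; the toy_* names
are hypothetical. It compares a bitmap that covers only the 1024 allocated
source numbers with one that covers the whole number space starting at
zero, and shows the irq/srcno translation that 'irq_base' enables.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IRQ_BASE  0x1000   /* assumed value of the base IRQ number */
#define TOY_NR_IRQS   1024     /* XICS_IRQS_SPAPR in the series */

/* Bytes needed for a bitmap of 'nbits' bits, rounded up. */
static size_t toy_bitmap_bytes(size_t nbits)
{
    return (nbits + 7) / 8;
}

/* Global IRQ number -> index in the bitmap. */
static int toy_irq_to_srcno(int irq)
{
    return irq - TOY_IRQ_BASE;
}

/* Index in the bitmap -> global IRQ number. */
static int toy_srcno_to_irq(int srcno)
{
    return srcno + TOY_IRQ_BASE;
}
```

With these values, the bitmap with a base needs 128 bytes, and dropping the
base in favour of the whole number space would cost 512 bytes more.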

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/spapr.c | 7 ---
 include/hw/ppc/spapr.h | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf0e5b4f815b..1cbbd7715a85 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2362,6 +2362,7 @@ static void ppc_spapr_init(MachineState *machine)
 /* Initialize the IRQ allocator */
 spapr->nr_irqs  = XICS_IRQS_SPAPR;
 spapr->irq_map  = bitmap_new(spapr->nr_irqs);
+spapr->irq_base = XICS_IRQ_BASE;
 
 /* Set up Interrupt Controller before we create the VCPUs */
 xics_system_init(machine, spapr->nr_irqs, &error_fatal);
@@ -3630,7 +3631,7 @@ static void spapr_irq_free_block_2_11(XICSFabric *xi, int irq, int num)
 static bool spapr_irq_test(XICSFabric *xi, int irq)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
-int srcno = irq - spapr->ics->offset;
+int srcno = irq - spapr->irq_base;
 
 return test_bit(srcno, spapr->irq_map);
 }
@@ -3656,13 +3657,13 @@ static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align)
 }
 
 bitmap_set(spapr->irq_map, srcno, count);
-return srcno + spapr->ics->offset;
+return srcno + spapr->irq_base;
 }
 
 static void spapr_irq_free_block(XICSFabric *xi, int irq, int num)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
-int srcno = irq - spapr->ics->offset;
+int srcno = irq - spapr->irq_base;
 
 bitmap_clear(spapr->irq_map, srcno, num);
 }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 023436c32b2a..200667dcff9d 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -82,6 +82,7 @@ struct sPAPRMachineState {
 int32_t nr_irqs;
 unsigned long *irq_map;
 unsigned long *irq_map_ref;
+uint32_t irq_base;
 ICSState *ics;
 sPAPRRTCState rtc;
 
-- 
2.13.6

[Qemu-devel] [PATCH for-2.12 v3 01/11] spapr: add pseries 2.12 machine type

2017-11-10 Thread Cédric Le Goater

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/spapr.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d682f013d422..a2dcbee07214 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3687,6 +3687,20 @@ static const TypeInfo spapr_machine_info = {
 type_init(spapr_machine_register_##suffix)
 
 /*
+ * pseries-2.12
+ */
+static void spapr_machine_2_12_instance_options(MachineState *machine)
+{
+}
+
+static void spapr_machine_2_12_class_options(MachineClass *mc)
+{
+/* Defaults for the latest behaviour inherited from the base class */
+}
+
+DEFINE_SPAPR_MACHINE(2_12, "2.12", true);
+
+/*
  * pseries-2.11
  */
 static void spapr_machine_2_11_instance_options(MachineState *machine)
@@ -3698,7 +3712,7 @@ static void spapr_machine_2_11_class_options(MachineClass *mc)
 /* Defaults for the latest behaviour inherited from the base class */
 }
 
-DEFINE_SPAPR_MACHINE(2_11, "2.11", true);
+DEFINE_SPAPR_MACHINE(2_11, "2.11", false);
 
 /*
  * pseries-2.10
-- 
2.13.6

[Qemu-devel] [PATCH for-2.12 v3 03/11] spapr: introduce new XICSFabric operations for an IRQ allocator

2017-11-10 Thread Cédric Le Goater

Currently, the ICSState 'ics' object of the sPAPR machine acts as the
global interrupt source handler and also as the IRQ number allocator
for the machine. Some IRQ numbers are allocated very early in the
machine initialization sequence to populate the device tree, which is
a problem for introducing the new POWER XIVE interrupt model, as it
needs to share the IRQ numbers with the older model.

To prepare the ground for XIVE, here is a set of new XICSFabric
operations that let the machine handle the IRQ number allocation
directly and decouple the allocation from the interrupt source object:

bool (*irq_test)(XICSFabric *xi, int irq);
int (*irq_alloc_block)(XICSFabric *xi, int count, int align);
void (*irq_free_block)(XICSFabric *xi, int irq, int num);

In these prototypes, the 'irq' parameter refers to a number in the
global IRQ number space. Indexes for arrays storing per-interrupt
state, like the ICSIRQState, are usually named 'srcno'.

Signed-off-by: Cédric Le Goater 
---
 hw/ppc/spapr.c| 19 +++
 include/hw/ppc/xics.h |  4 
 2 files changed, 23 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a2dcbee07214..84d68f2fdbae 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3536,6 +3536,21 @@ static ICPState *spapr_icp_get(XICSFabric *xi, int vcpu_id)
 return cpu ? ICP(cpu->intc) : NULL;
 }
 
+static bool spapr_irq_test(XICSFabric *xi, int irq)
+{
+return false;
+}
+
+static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align)
+{
+return -1;
+}
+
+static void spapr_irq_free_block(XICSFabric *xi, int irq, int num)
+{
+;
+}
+
 static void spapr_pic_print_info(InterruptStatsProvider *obj,
  Monitor *mon)
 {
@@ -3630,6 +3645,10 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 xic->ics_get = spapr_ics_get;
 xic->ics_resend = spapr_ics_resend;
 xic->icp_get = spapr_icp_get;
+xic->irq_test = spapr_irq_test;
+xic->irq_alloc_block = spapr_irq_alloc_block;
+xic->irq_free_block = spapr_irq_free_block;
+
 ispc->print_info = spapr_pic_print_info;
 /* Force NUMA node memory size to be a multiple of
  * SPAPR_MEMORY_BLOCK_SIZE (256M) since that's the granularity
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 28d248abad61..30e7f2e0a7dd 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -175,6 +175,10 @@ typedef struct XICSFabricClass {
 ICSState *(*ics_get)(XICSFabric *xi, int irq);
 void (*ics_resend)(XICSFabric *xi);
 ICPState *(*icp_get)(XICSFabric *xi, int server);
+/* IRQ allocator helpers */
+bool (*irq_test)(XICSFabric *xi, int irq);
+int (*irq_alloc_block)(XICSFabric *xi, int count, int align);
+void (*irq_free_block)(XICSFabric *xi, int irq, int num);
 } XICSFabricClass;
 
 #define XICS_IRQS_SPAPR   1024
-- 
2.13.6

[Qemu-devel] [PATCH for-2.12 v3 05/11] spapr: introduce an IRQ allocator using a bitmap

2017-11-10 Thread Cédric Le Goater

Let's define a new set of XICSFabric IRQ operations for the latest
pseries machine. These simply use a bitmap 'irq_map' as an IRQ number
allocator.

The previous pseries machines keep the old set of IRQ operations using
the ICSIRQState array.

Signed-off-by: Cédric Le Goater 
---

 Changes since v2 :

 - introduced a second set of XICSFabric IRQ operations for older
   pseries machines

 hw/ppc/spapr.c | 76 ++
 include/hw/ppc/spapr.h |  3 ++
 2 files changed, 74 insertions(+), 5 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4bdceb45a14f..4ef0b73559ca 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1681,6 +1681,22 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
 },
 };
 
+static bool spapr_irq_map_needed(void *opaque)
+{
+return true;
+}
+
+static const VMStateDescription vmstate_spapr_irq_map = {
+.name = "spapr_irq_map",
+.version_id = 0,
+.minimum_version_id = 0,
+.needed = spapr_irq_map_needed,
+.fields = (VMStateField[]) {
+VMSTATE_BITMAP(irq_map, sPAPRMachineState, 0, nr_irqs),
+VMSTATE_END_OF_LIST()
+},
+};
+
 static const VMStateDescription vmstate_spapr = {
 .name = "spapr",
 .version_id = 3,
@@ -1700,6 +1716,7 @@ static const VMStateDescription vmstate_spapr = {
 &vmstate_spapr_ov5_cas,
 &vmstate_spapr_patb_entry,
 &vmstate_spapr_pending_events,
+&vmstate_spapr_irq_map,
 NULL
 }
 };
@@ -2337,8 +2354,12 @@ static void ppc_spapr_init(MachineState *machine)
 /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */
 load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD;
 
+/* Initialize the IRQ allocator */
+spapr->nr_irqs  = XICS_IRQS_SPAPR;
+spapr->irq_map  = bitmap_new(spapr->nr_irqs);
+
 /* Set up Interrupt Controller before we create the VCPUs */
-xics_system_init(machine, XICS_IRQS_SPAPR, &error_fatal);
+xics_system_init(machine, spapr->nr_irqs, &error_fatal);
 
 /* Set up containers for ibm,client-architecture-support negotiated options
  */
@@ -3560,7 +3581,7 @@ static int ics_find_free_block(ICSState *ics, int num, int alignnum)
 return -1;
 }
 
-static bool spapr_irq_test(XICSFabric *xi, int irq)
+static bool spapr_irq_test_2_11(XICSFabric *xi, int irq)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
 ICSState *ics = spapr->ics;
@@ -3569,7 +3590,7 @@ static bool spapr_irq_test(XICSFabric *xi, int irq)
 return !ICS_IRQ_FREE(ics, srcno);
 }
 
-static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align)
+static int spapr_irq_alloc_block_2_11(XICSFabric *xi, int count, int align)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
 ICSState *ics = spapr->ics;
@@ -3583,7 +3604,7 @@ static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align)
 return srcno + ics->offset;
 }
 
-static void spapr_irq_free_block(XICSFabric *xi, int irq, int num)
+static void spapr_irq_free_block_2_11(XICSFabric *xi, int irq, int num)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
 ICSState *ics = spapr->ics;
@@ -3601,6 +3622,46 @@ static void spapr_irq_free_block(XICSFabric *xi, int irq, int num)
 }
 }
 
+static bool spapr_irq_test(XICSFabric *xi, int irq)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
+int srcno = irq - spapr->ics->offset;
+
+return test_bit(srcno, spapr->irq_map);
+}
+
+static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
+int start = 0;
+int srcno;
+
+/*
+ * The 'align_mask' parameter of bitmap_find_next_zero_area()
+ * should be one less than a power of 2; 0 means no
+ * alignment. Adapt the 'align' value of the former allocator to
+ * fit the requirements of bitmap_find_next_zero_area()
+ */
+align -= 1;
+
+srcno = bitmap_find_next_zero_area(spapr->irq_map, spapr->nr_irqs, start,
+   count, align);
+if (srcno == spapr->nr_irqs) {
+return -1;
+}
+
+bitmap_set(spapr->irq_map, srcno, count);
+return srcno + spapr->ics->offset;
+}
+
+static void spapr_irq_free_block(XICSFabric *xi, int irq, int num)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
+int srcno = irq - spapr->ics->offset;
+
+bitmap_clear(spapr->irq_map, srcno, num);
+}
+
 static void spapr_pic_print_info(InterruptStatsProvider *obj,
  Monitor *mon)
 {
@@ -3778,7 +3839,12 @@ static void spapr_machine_2_11_instance_options(MachineState *machine)
 
 static void spapr_machine_2_11_class_options(MachineClass *mc)
 {
-/* Defaults for the latest behaviour inherited from the base class */
+XICSFabricClass *xic = XICS_FABRIC_CLASS(mc);
+
+spapr_machine_2_12_class_options(mc);
+xic->irq_test = spapr_irq_test_2_11;
+xic->irq_alloc_block = spapr_irq_alloc_block_2_11;
+xic->

[Qemu-devel] [PATCH for-2.12 v3 08/11] spapr: introduce a XICSFabric irq_is_lsi() operation

2017-11-10 Thread Cédric Le Goater

It will be used later on to distinguish the allocation of an LSI
interrupt from an MSI and also to reduce the use of the ICSIRQState
array of the ICSState object, which stands in our way of introducing
XIVE.

The 'irq' parameter continues to refer to the global IRQ number space.

On PowerNV, only the PSI controller interrupts are handled and they
are all LSIs.
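
The dispatch pattern described above can be sketched with a plain table of
function pointers. This is a simplified model, not the QOM-based
XICSFabricClass: ToyFabric, ToyFabricOps, ToySource and the PSI range
fields are hypothetical. The source layer translates its local 'srcno'
into a global IRQ number before asking the machine, and a PowerNV-like
machine answers "LSI" for every IRQ that belongs to its PSI range.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyFabric ToyFabric;

/* Operations the machine provides to the interrupt source layer. */
typedef struct ToyFabricOps {
    bool (*irq_is_lsi)(ToyFabric *xi, int irq);   /* 'irq' is global */
} ToyFabricOps;

struct ToyFabric {
    const ToyFabricOps *ops;
    int psi_first;      /* first global IRQ of the PSI range (toy value) */
    int psi_count;      /* number of PSI interrupts (toy value) */
};

/* Toy interrupt source: knows its fabric and its offset. */
typedef struct ToySource {
    ToyFabric *xics;
    int offset;         /* global IRQ number of source number 0 */
} ToySource;

/* Source-layer helper: convert srcno to a global IRQ, then delegate. */
static bool toy_source_is_lsi(ToySource *ics, int srcno)
{
    ToyFabric *xi = ics->xics;

    return xi->ops->irq_is_lsi(xi, srcno + ics->offset);
}

/* PowerNV-like implementation: every valid PSI interrupt is an LSI. */
static bool toy_pnv_irq_is_lsi(ToyFabric *xi, int irq)
{
    return irq >= xi->psi_first && irq < xi->psi_first + xi->psi_count;
}

static const ToyFabricOps toy_pnv_ops = {
    .irq_is_lsi = toy_pnv_irq_is_lsi,
};
```

Because callers only go through toy_source_is_lsi(), the machine is free to
change how it records the LSI/MSI distinction without touching them.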

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics.c| 26 +-
 hw/intc/xics_kvm.c|  4 ++--
 hw/ppc/pnv.c  | 16 
 hw/ppc/spapr.c|  9 +
 include/hw/ppc/xics.h |  2 ++
 5 files changed, 46 insertions(+), 11 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 2c4899f278e2..42880e736697 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -33,6 +33,7 @@
 #include "trace.h"
 #include "qemu/timer.h"
 #include "hw/ppc/xics.h"
+#include "hw/ppc/spapr.h"
 #include "qemu/error-report.h"
 #include "qapi/visitor.h"
 #include "monitor/monitor.h"
@@ -70,8 +71,7 @@ void ics_pic_print_info(ICSState *ics, Monitor *mon)
 }
 monitor_printf(mon, "  %4x %s %02x %02x\n",
ics->offset + i,
-   (irq->flags & XICS_FLAGS_IRQ_LSI) ?
-   "LSI" : "MSI",
+   ics_is_lsi(ics, i) ? "LSI" : "MSI",
irq->priority, irq->status);
 }
 }
@@ -377,6 +377,14 @@ static const TypeInfo icp_info = {
 /*
  * ICS: Source layer
  */
+bool ics_is_lsi(ICSState *ics, int srcno)
+{
+XICSFabric *xi = ics->xics;
+XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(xi);
+
+return xic->irq_is_lsi(xi, srcno + ics->offset);
+}
+
 static void ics_simple_resend_msi(ICSState *ics, int srcno)
 {
 ICSIRQState *irq = ics->irqs + srcno;
@@ -435,7 +443,7 @@ static void ics_simple_set_irq(void *opaque, int srcno, int val)
 {
 ICSState *ics = (ICSState *)opaque;
 
-if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
+if (ics_is_lsi(ics, srcno)) {
 ics_simple_set_irq_lsi(ics, srcno, val);
 } else {
 ics_simple_set_irq_msi(ics, srcno, val);
@@ -472,7 +480,7 @@ void ics_simple_write_xive(ICSState *ics, int srcno, int server,
 trace_xics_ics_simple_write_xive(ics->offset + srcno, srcno, server,
  priority);
 
-if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
+if (ics_is_lsi(ics, srcno)) {
 ics_simple_write_xive_lsi(ics, srcno);
 } else {
 ics_simple_write_xive_msi(ics, srcno);
@@ -484,10 +492,10 @@ static void ics_simple_reject(ICSState *ics, uint32_t nr)
 ICSIRQState *irq = ics->irqs + nr - ics->offset;
 
 trace_xics_ics_simple_reject(nr, nr - ics->offset);
-if (irq->flags & XICS_FLAGS_IRQ_MSI) {
-irq->status |= XICS_STATUS_REJECTED;
-} else if (irq->flags & XICS_FLAGS_IRQ_LSI) {
+if (ics_is_lsi(ics, nr - ics->offset)) {
 irq->status &= ~XICS_STATUS_SENT;
+} else {
+irq->status |= XICS_STATUS_REJECTED;
 }
 }
 
@@ -497,7 +505,7 @@ static void ics_simple_resend(ICSState *ics)
 
 for (i = 0; i < ics->nr_irqs; i++) {
 /* FIXME: filter by server#? */
-if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
+if (ics_is_lsi(ics, i)) {
 ics_simple_resend_lsi(ics, i);
 } else {
 ics_simple_resend_msi(ics, i);
@@ -512,7 +520,7 @@ static void ics_simple_eoi(ICSState *ics, uint32_t nr)
 
 trace_xics_ics_simple_eoi(nr);
 
-if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
+if (ics_is_lsi(ics, srcno)) {
 irq->status &= ~XICS_STATUS_SENT;
 }
 }
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 3091ad3ac2c8..2f10637c9f7c 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -258,7 +258,7 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
 state |= KVM_XICS_MASKED;
 }
 
-if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
+if (ics_is_lsi(ics, i)) {
 state |= KVM_XICS_LEVEL_SENSITIVE;
 if (irq->status & XICS_STATUS_ASSERTED) {
 state |= KVM_XICS_PENDING;
@@ -293,7 +293,7 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int val)
 int rc;
 
 args.irq = srcno + ics->offset;
-if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
+if (!ics_is_lsi(ics, srcno)) {
 if (!val) {
 return;
 }
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 8288940ef9d7..958223376b4c 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1035,6 +1035,21 @@ static bool pnv_irq_test(XICSFabric *xi, int irq)
 return false;
 }
 
+static bool pnv_irq_is_lsi(XICSFabric *xi, int irq)
+{
+PnvMachineState *pnv = POWERNV_MACHINE(xi);
+int i;
+
+/* PowerNV machine only has PSI interrupts which are all LSIs */
+for (i = 0; i < pnv->num_chips; i++) {
+ICSState *ics = &pnv->chips[i]->psi.ics;
+if (ics_valid_irq(ics, irq)) {
+return true;
+}
+}
+

[Qemu-devel] [PATCH for-2.12 v3 04/11] spapr: move current IRQ allocation under the machine

2017-11-10 Thread Cédric Le Goater

Use the new XICSFabric operations to handle the IRQ number allocation
directly under the machine. These changes only move code and adapt it
to take into account the new API which uses IRQ numbers.

On PowerNV, only provide a basic irq_test() operation. For the moment,
there is no need for more.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/trace-events |  2 --
 hw/intc/xics.c   |  3 ++-
 hw/intc/xics_spapr.c | 57 +---
 hw/ppc/pnv.c | 18 +
 hw/ppc/spapr.c   | 56 ---
 hw/ppc/trace-events  |  2 ++
 6 files changed, 85 insertions(+), 53 deletions(-)

diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index b86f242b0fcf..e34ecf7a16e5 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -65,8 +65,6 @@ xics_ics_simple_reject(int nr, int srcno) "reject irq 0x%x [src %d]"
 xics_ics_simple_eoi(int nr) "ics_eoi: irq 0x%x"
 xics_alloc(int irq) "irq %d"
 xics_alloc_block(int first, int num, bool lsi, int align) "first irq %d, %d irqs, lsi=%d, alignnum %d"
-xics_ics_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
-xics_ics_free_warn(int src, int irq) "Source#%d, irq %d is already free"
 
 # hw/intc/s390_flic_kvm.c
 flic_create_device(int err) "flic: create device failed %d"
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index cc9816e7f204..2c4899f278e2 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -53,6 +53,7 @@ void icp_pic_print_info(ICPState *icp, Monitor *mon)
 void ics_pic_print_info(ICSState *ics, Monitor *mon)
 {
 uint32_t i;
+XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(ics->xics);
 
 monitor_printf(mon, "ICS %4x..%4x %p\n",
ics->offset, ics->offset + ics->nr_irqs - 1, ics);
@@ -64,7 +65,7 @@ void ics_pic_print_info(ICSState *ics, Monitor *mon)
 for (i = 0; i < ics->nr_irqs; i++) {
 ICSIRQState *irq = ics->irqs + i;
 
-if (!(irq->flags & XICS_FLAGS_IRQ_MASK)) {
+if (!xic->irq_test(ics->xics, i + ics->offset)) {
 continue;
 }
 monitor_printf(mon, "  %4x %s %02x %02x\n",
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index e8c0a1b3e903..de9e65d35247 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -245,50 +245,26 @@ void xics_spapr_init(sPAPRMachineState *spapr)
 spapr_register_hypercall(H_IPOLL, h_ipoll);
 }
 
-#define ICS_IRQ_FREE(ics, srcno)   \
-(!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
-
-static int ics_find_free_block(ICSState *ics, int num, int alignnum)
-{
-int first, i;
-
-for (first = 0; first < ics->nr_irqs; first += alignnum) {
-if (num > (ics->nr_irqs - first)) {
-return -1;
-}
-for (i = first; i < first + num; ++i) {
-if (!ICS_IRQ_FREE(ics, i)) {
-break;
-}
-}
-if (i == (first + num)) {
-return first;
-}
-}
-
-return -1;
-}
-
 int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp)
 {
 int irq;
+XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(ics->xics);
 
 if (!ics) {
 return -1;
 }
 if (irq_hint) {
-if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
+if (xic->irq_test(ics->xics, irq_hint)) {
 error_setg(errp, "can't allocate IRQ %d: already in use", irq_hint);
 return -1;
 }
 irq = irq_hint;
 } else {
-irq = ics_find_free_block(ics, 1, 1);
+irq = xic->irq_alloc_block(ics->xics, 1, 1);
 if (irq < 0) {
 error_setg(errp, "can't allocate IRQ: no IRQ left");
 return -1;
 }
-irq += ics->offset;
 }
 
 ics_set_irq_type(ics, irq - ics->offset, lsi);
@@ -305,6 +281,7 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
   bool align, Error **errp)
 {
 int i, first = -1;
+XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(ics->xics);
 
 if (!ics) {
 return -1;
@@ -320,9 +297,9 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
 if (align) {
 assert((num == 1) || (num == 2) || (num == 4) ||
(num == 8) || (num == 16) || (num == 32));
-first = ics_find_free_block(ics, num, num);
+first = xic->irq_alloc_block(ics->xics, num, num);
 } else {
-first = ics_find_free_block(ics, num, 1);
+first = xic->irq_alloc_block(ics->xics, num, 1);
 }
 if (first < 0) {
 error_setg(errp, "can't find a free %d-IRQ block", num);
@@ -330,33 +307,19 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool 
lsi,
 }
 
 for (i = first; i < first + num; ++i) {
-ics_set_irq_type(ics, i, lsi);
+ics_set_irq_type(ics, i - ics->offset, lsi);
 }
-first += ics->offset;
 
 trace_xics_alloc_block(first, num, lsi, align);
 
 return first;
 }
 
-static void ics_free(ICSState

[Qemu-devel] [PATCH for-2.12 v3 10/11] spapr: merge ics_set_irq_type() in irq_alloc_block() operation

2017-11-10 Thread Cédric Le Goater

Setting the XICS_FLAGS_IRQ_LSI (or XICS_FLAGS_IRQ_MSI) for older
pseries machines can now be done directly under the irq_alloc_block()
operation.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics.c|  8 
 hw/intc/xics_spapr.c  |  7 +--
 hw/ppc/pnv_psi.c  |  4 
 hw/ppc/spapr.c| 13 +
 include/hw/ppc/xics.h |  1 -
 5 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 42880e736697..237eed3c11f8 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -710,14 +710,6 @@ ICPState *xics_icp_get(XICSFabric *xi, int server)
 return xic->icp_get(xi, server);
 }
 
-void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
-{
-assert(!(ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MASK));
-
-ics->irqs[srcno].flags |=
-lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
-}
-
 static void xics_register_types(void)
 {
 type_register_static(&ics_simple_info);
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index b8e91aaf52bd..f28e9136f2f6 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -267,7 +267,6 @@ int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, 
Error **errp)
 }
 }
 
-ics_set_irq_type(ics, irq - ics->offset, lsi);
 trace_xics_alloc(irq);
 
 return irq;
@@ -280,7 +279,7 @@ int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, 
Error **errp)
 int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
   bool align, Error **errp)
 {
-int i, first = -1;
+int first = -1;
 XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(ics->xics);
 
 if (!ics) {
@@ -306,10 +305,6 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
 return -1;
 }
 
-for (i = first; i < first + num; ++i) {
-ics_set_irq_type(ics, i - ics->offset, lsi);
-}
-
 trace_xics_alloc_block(first, num, lsi, align);
 
 return first;
diff --git a/hw/ppc/pnv_psi.c b/hw/ppc/pnv_psi.c
index 9876c266223d..ee7fca311cbf 100644
--- a/hw/ppc/pnv_psi.c
+++ b/hw/ppc/pnv_psi.c
@@ -487,10 +487,6 @@ static void pnv_psi_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-for (i = 0; i < ics->nr_irqs; i++) {
-ics_set_irq_type(ics, i, true);
-}
-
 /* XSCOM region for PSI registers */
 pnv_xscom_region_init(&psi->xscom_regs, OBJECT(dev), &pnv_psi_xscom_ops,
 psi, "xscom-psi", PNV_XSCOM_PSIHB_SIZE);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f14eae6196cd..8c2cff93f933 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3596,18 +3596,31 @@ static bool spapr_irq_test_2_11(XICSFabric *xi, int irq)
 return !ICS_IRQ_FREE(ics, srcno);
 }
 
+static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
+{
+assert(!(ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MASK));
+
+ics->irqs[srcno].flags |=
+lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
+}
+
 static int spapr_irq_alloc_block_2_11(XICSFabric *xi, int count, int align,
   bool lsi)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
 ICSState *ics = spapr->ics;
 int srcno;
+int i;
 
 srcno = ics_find_free_block(ics, count, align);
 if (srcno == -1) {
 return -1;
 }
 
+for (i = srcno; i < srcno + count; ++i) {
+ics_set_irq_type(ics, i, lsi);
+}
+
 return srcno + ics->offset;
 }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 292b929e88eb..056cf37bc68f 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -203,7 +203,6 @@ void icp_eoi(ICPState *icp, uint32_t xirr);
 void ics_simple_write_xive(ICSState *ics, int nr, int server,
uint8_t priority, uint8_t saved_priority);
 
-void ics_set_irq_type(ICSState *ics, int srcno, bool lsi);
 void icp_pic_print_info(ICPState *icp, Monitor *mon);
 void ics_pic_print_info(ICSState *ics, Monitor *mon);
 bool ics_is_lsi(ICSState *ics, int srno);
-- 
2.13.6

[Qemu-devel] [PATCH for-2.12 v3 09/11] spapr: split the IRQ number space for LSI interrupts

2017-11-10 Thread Cédric Le Goater

The type of an interrupt, MSI or LSI, is stored in the flags
attribute of the ICSIRQState array. To reduce the use of this array,
and consequently of the ICSState object (this is needed to introduce
the new XIVE model), we choose to split the IRQ number space of the
machine in two: first the LSIs and then the MSIs.

This also has the benefit of keeping the LSI IRQ numbers in a
well-known range, which will be useful for PHB hotplug.

This change only applies to the latest pseries machines. Older
machines still use the ICSIRQState array to define the IRQ type.
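
The split described above can be sketched outside of QEMU as follows.
This is only an illustration under stated assumptions: NR_IRQS, MAX_LSI,
irq_map and the two functions are hypothetical stand-ins, not the names
or sizes used by the patch, and the sketch bounds the whole block to its
range whereas the patch only checks the first source number of an LSI
block.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IRQS 64   /* hypothetical size of the whole IRQ number space */
#define MAX_LSI 16   /* hypothetical size of the LSI range at the bottom */

static bool irq_map[NR_IRQS];   /* true means the source number is in use */

/*
 * Allocate 'count' consecutive source numbers aligned on 'align'.
 * LSIs are searched in [0, MAX_LSI), MSIs in [MAX_LSI, NR_IRQS).
 * Returns the first source number of the block, or -1 if none fits.
 */
static int irq_alloc_block(int count, int align, bool lsi)
{
    int start = lsi ? 0 : MAX_LSI;
    int limit = lsi ? MAX_LSI : NR_IRQS;
    int first, i;

    assert(count > 0 && align > 0);

    /* Round the start of the range up to the requested alignment. */
    first = (start + align - 1) / align * align;

    for (; first + count <= limit; first += align) {
        for (i = first; i < first + count; i++) {
            if (irq_map[i]) {
                break;
            }
        }
        if (i == first + count) {
            for (i = first; i < first + count; i++) {
                irq_map[i] = true;
            }
            return first;
        }
    }
    return -1;
}

/* With a split number space the type follows from the number alone. */
static bool irq_is_lsi(int srcno)
{
    return srcno >= 0 && srcno < MAX_LSI;
}
```

The point of the design is visible in irq_is_lsi(): no per-IRQ flag has
to be consulted once the range encodes the type.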

Signed-off-by: Cédric Le Goater 
---

 Changes since v2 :

 - introduced a second set of XICSFabric IRQ operations for older
   pseries machines

 hw/intc/xics_spapr.c  |  6 +++---
 hw/ppc/spapr.c| 33 +
 include/hw/ppc/xics.h |  2 +-
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index de9e65d35247..b8e91aaf52bd 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -260,7 +260,7 @@ int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, 
Error **errp)
 }
 irq = irq_hint;
 } else {
-irq = xic->irq_alloc_block(ics->xics, 1, 1);
+irq = xic->irq_alloc_block(ics->xics, 1, 1, lsi);
 if (irq < 0) {
 error_setg(errp, "can't allocate IRQ: no IRQ left");
 return -1;
@@ -297,9 +297,9 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
 if (align) {
 assert((num == 1) || (num == 2) || (num == 4) ||
(num == 8) || (num == 16) || (num == 32));
-first = xic->irq_alloc_block(ics->xics, num, num);
+first = xic->irq_alloc_block(ics->xics, num, num, lsi);
 } else {
-first = xic->irq_alloc_block(ics->xics, num, 1);
+first = xic->irq_alloc_block(ics->xics, num, 1, lsi);
 }
 if (first < 0) {
 error_setg(errp, "can't find a free %d-IRQ block", num);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ce314fcf38db..f14eae6196cd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3596,7 +3596,8 @@ static bool spapr_irq_test_2_11(XICSFabric *xi, int irq)
 return !ICS_IRQ_FREE(ics, srcno);
 }
 
-static int spapr_irq_alloc_block_2_11(XICSFabric *xi, int count, int align)
+static int spapr_irq_alloc_block_2_11(XICSFabric *xi, int count, int align,
+  bool lsi)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
 ICSState *ics = spapr->ics;
@@ -3628,7 +3629,7 @@ static void spapr_irq_free_block_2_11(XICSFabric *xi, int 
irq, int num)
 }
 }
 
-static bool spapr_irq_is_lsi(XICSFabric *xi, int irq)
+static bool spapr_irq_is_lsi_2_11(XICSFabric *xi, int irq)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
 int srcno = irq - spapr->ics->offset;
@@ -3644,10 +3645,21 @@ static bool spapr_irq_test(XICSFabric *xi, int irq)
 return test_bit(srcno, spapr->irq_map);
 }
 
-static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align)
+
+/*
+ * Let's provision 4 LSIs per PHBs
+ */
+#define SPAPR_MAX_LSI (SPAPR_MAX_PHBS * 4)
+
+/*
+ * Split the IRQ number space of the machine in two: first the LSIs
+ * and then the MSIs. This allows us to keep the LSI IRQ numbers in a
+ * well known range which is useful for PHB hotplug.
+ */
+static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align, bool 
lsi)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
-int start = 0;
+int start = lsi ? 0 : SPAPR_MAX_LSI;
 int srcno;
 
 /*
@@ -3664,6 +3676,10 @@ static int spapr_irq_alloc_block(XICSFabric *xi, int 
count, int align)
 return -1;
 }
 
+if (lsi && srcno >= SPAPR_MAX_LSI) {
+return -1;
+}
+
 bitmap_set(spapr->irq_map, srcno, count);
 return srcno + spapr->irq_base;
 }
@@ -3676,6 +3692,14 @@ static void spapr_irq_free_block(XICSFabric *xi, int 
irq, int num)
 bitmap_clear(spapr->irq_map, srcno, num);
 }
 
+static bool spapr_irq_is_lsi(XICSFabric *xi, int irq)
+{
+sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
+int srcno = irq - spapr->irq_base;
+
+return (srcno >= 0) && (srcno < SPAPR_MAX_LSI);
+}
+
 static void spapr_pic_print_info(InterruptStatsProvider *obj,
  Monitor *mon)
 {
@@ -3860,6 +3884,7 @@ static void spapr_machine_2_11_class_options(MachineClass 
*mc)
 xic->irq_test = spapr_irq_test_2_11;
 xic->irq_alloc_block = spapr_irq_alloc_block_2_11;
 xic->irq_free_block = spapr_irq_free_block_2_11;
+xic->irq_is_lsi = spapr_irq_is_lsi_2_11;
 }
 
 DEFINE_SPAPR_MACHINE(2_11, "2.11", false);
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 478f8e510179..292b929e88eb 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -177,7 +177,7 @@ typedef struct XICSFabricClass {
 ICPState *(*icp_get)(XICSFabric *xi, int server);
 /* IRQ allocator helpers */
 bool (*irq_test)(XICSFabric *xi, int irq);
-int (*irq_

[Qemu-devel] [PATCH for-2.12 v3 06/11] spapr: store a reference IRQ bitmap

2017-11-10 Thread Cédric Le Goater

To save some state when the guest is migrated, we capture the IRQ
bitmap after all devices have been reset and store it as a reference
for the machine.

Signed-off-by: Cédric Le Goater 
---

 We should probably merge this patch with the previous in the next
 versions of the patchset. For the moment, I thought it would be
 interesting to isolate the topic for discussion.

 hw/ppc/spapr.c | 7 ++-
 include/hw/ppc/spapr.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4ef0b73559ca..bf0e5b4f815b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1437,6 +1437,9 @@ static void ppc_spapr_reset(void)
 qemu_devices_reset();
 spapr_clear_pending_events(spapr);
 
+spapr->irq_map_ref = bitmap_new(spapr->nr_irqs);
+bitmap_copy(spapr->irq_map_ref, spapr->irq_map, spapr->nr_irqs);
+
 /*
  * We place the device tree and RTAS just below either the top of the RMA,
  * or just below 2GB, whichever is lowere, so that it can be
@@ -1683,7 +1686,9 @@ static const VMStateDescription vmstate_spapr_patb_entry 
= {
 
 static bool spapr_irq_map_needed(void *opaque)
 {
-return true;
+sPAPRMachineState *spapr = opaque;
+
+return !bitmap_equal(spapr->irq_map, spapr->irq_map_ref, spapr->nr_irqs);
 }
 
 static const VMStateDescription vmstate_spapr_irq_map = {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5835c694caff..023436c32b2a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -81,6 +81,7 @@ struct sPAPRMachineState {
 struct sPAPRNVRAM *nvram;
 int32_t nr_irqs;
 unsigned long *irq_map;
+unsigned long *irq_map_ref;
 ICSState *ics;
 sPAPRRTCState rtc;
 
-- 
2.13.6
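
For discussion, the idea of the patch above (snapshot the map after
reset, migrate it only if it has changed since) can be sketched in
plain C. The Machine type and the helper names are hypothetical, and one
byte per IRQ is used instead of a real bitmap to keep the sketch short.
Freeing the previous snapshot on repeated resets is a choice made for
this sketch, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct Machine {         /* hypothetical stand-in for the machine state */
    size_t nr_irqs;
    unsigned char *irq_map;      /* current allocation, one byte per IRQ */
    unsigned char *irq_map_ref;  /* snapshot taken at the end of reset */
} Machine;

static void machine_init(Machine *m, size_t nr_irqs)
{
    m->nr_irqs = nr_irqs;
    m->irq_map = calloc(nr_irqs, 1);
    m->irq_map_ref = NULL;
    assert(m->irq_map);
}

/* Called once all devices have been reset and have claimed their IRQs. */
static void machine_reset_done(Machine *m)
{
    free(m->irq_map_ref);        /* drop the snapshot of an earlier reset */
    m->irq_map_ref = malloc(m->nr_irqs);
    assert(m->irq_map_ref);
    memcpy(m->irq_map_ref, m->irq_map, m->nr_irqs);
}

/* The map only needs to be sent if it differs from the reference. */
static bool irq_map_needed(const Machine *m)
{
    return memcmp(m->irq_map, m->irq_map_ref, m->nr_irqs) != 0;
}
```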

Re: [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-10 Thread Max Reitz

On 2017-11-10 10:19, Stefan Hajnoczi wrote:
> On Thu, Nov 09, 2017 at 09:43:15PM +0100, Max Reitz wrote:
>> Draining a BDS may lead to graph modifications, which in turn may result
>> in it and other BDS being stripped of their current references.  If
>> bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
>> references themselves, the BDS they are trying to drain (or undrain) may
>> disappear right under their feet -- or, more specifically, under the
>> feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().
>>
>> This fixes an occasional hang of iotest 194.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block/io.c | 47 ---
>>  1 file changed, 44 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index 3d5ef2cabe..a0a2833e8e 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -340,7 +340,10 @@ void bdrv_drain_all_begin(void)
>>  bool waited = true;
>>  BlockDriverState *bs;
>>  BdrvNextIterator it;
>> -GSList *aio_ctxs = NULL, *ctx;
>> +GSList *aio_ctxs = NULL, *ctx, *bs_list = NULL, *bs_list_entry;
>> +
>> +/* Must be called from the main loop */
>> +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
>>  
>>  block_job_pause_all();
>>  
>> @@ -355,6 +358,12 @@ void bdrv_drain_all_begin(void)
>>  if (!g_slist_find(aio_ctxs, aio_context)) {
>>  aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
>>  }
>> +
>> +/* Keep a strong reference to all root BDS and copy them into
>> + * an own list because draining them may lead to graph
>> + * modifications. */
>> +bdrv_ref(bs);
>> +bs_list = g_slist_prepend(bs_list, bs);
>>  }
>>  
>>  /* Note that completion of an asynchronous I/O operation can trigger any
>> @@ -370,7 +379,11 @@ void bdrv_drain_all_begin(void)
>>  AioContext *aio_context = ctx->data;
>>  
>>  aio_context_acquire(aio_context);
>> -for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
>> +for (bs_list_entry = bs_list; bs_list_entry;
>> + bs_list_entry = bs_list_entry->next)
>> +{
>> +bs = bs_list_entry->data;
>> +
>>  if (aio_context == bdrv_get_aio_context(bs)) {
>>  waited |= bdrv_drain_recurse(bs, true);
>>  }
>> @@ -379,24 +392,52 @@ void bdrv_drain_all_begin(void)
>>  }
>>  }
>>  
>> +for (bs_list_entry = bs_list; bs_list_entry;
>> + bs_list_entry = bs_list_entry->next)
>> +{
>> +bdrv_unref(bs_list_entry->data);
>> +}
>> +
>>  g_slist_free(aio_ctxs);
>> +g_slist_free(bs_list);
>>  }
> 
> Which specific parts of this function access bs without a reference?
> 
> I see bdrv_next() may do QTAILQ_NEXT(bs, monitor_list) after
> bdrv_drain_recurse() has returned.
> 
> Anything else?
> 
> If bdrv_next() is the only issue then I agree with Fam that it makes
> sense to build the ref/unref into bdrv_next().

These don't.  It's BDRV_POLL_WHILE() in bdrv_drain_recurse(), as written
in the commit message.

You cannot add a bdrv_ref()/bdrv_unref() pair there because
bdrv_drain_recurse() is called from bdrv_close() with a refcount of 0,
so having a bdrv_unref() there would cause an infinite recursion.

I think it's reasonable to expect callers of any bdrv_* function (except
for bdrv_unref()) to make sure that the BDS isn't deleted over the
course of that function.  Therefore, I think that the actual issue is
here and we need to make sure here that we have a strong reference
before invoking bdrv_drain_recurse().
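
To illustrate the pattern only: a toy model, not QEMU code. Node,
node_drain() and drain_all() are hypothetical names, and "freeing" just
sets a flag so the result can be inspected. Draining one node drops the
graph's reference to another one, and the caller pins every node before
draining any of them.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int refcnt;
    int freed;            /* set instead of really freeing the node */
    struct Node *victim;  /* node that loses a graph reference when we drain */
} Node;

static void node_ref(Node *n)
{
    n->refcnt++;
}

static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        n->freed = 1;
    }
}

/* Stand-in for draining: it may modify the graph as a side effect. */
static void node_drain(Node *n)
{
    assert(!n->freed);    /* would fire if the caller held no reference */
    if (n->victim) {
        node_unref(n->victim);
        n->victim = NULL;
    }
}

static void drain_all(Node **nodes, size_t count)
{
    size_t i;

    /* Take a strong reference to every node before draining any. */
    for (i = 0; i < count; i++) {
        node_ref(nodes[i]);
    }
    for (i = 0; i < count; i++) {
        node_drain(nodes[i]);
    }
    /* Only now may nodes that lost their graph reference go away. */
    for (i = 0; i < count; i++) {
        node_unref(nodes[i]);
    }
}
```

Without the first loop, draining the first node would free the second
one and the assertion in node_drain() would trigger on it.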

Max




[Qemu-devel] [PATCH for-2.12 v3 11/11] spapr: use sPAPRMachineState in spapr_ics_ prototypes

2017-11-10 Thread Cédric Le Goater

The routines manipulating the IRQ numbers for the sPAPR machine no
longer have any relation to the ICSState, so use a sPAPRMachineState
parameter in their prototypes and prefix them with spapr_irq_.

Signed-off-by: Cédric Le Goater 
---
 hw/intc/xics_spapr.c  | 30 --
 hw/ppc/spapr.c|  5 +++--
 hw/ppc/spapr_events.c |  4 ++--
 hw/ppc/spapr_pci.c|  8 
 hw/ppc/spapr_vio.c|  2 +-
 include/hw/ppc/xics.h | 13 +++--
 6 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index f28e9136f2f6..b5c8b8fa0e89 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -245,22 +245,20 @@ void xics_spapr_init(sPAPRMachineState *spapr)
 spapr_register_hypercall(H_IPOLL, h_ipoll);
 }
 
-int spapr_ics_alloc(ICSState *ics, int irq_hint, bool lsi, Error **errp)
+int spapr_irq_alloc(sPAPRMachineState *spapr, int irq_hint, bool lsi,
+Error **errp)
 {
 int irq;
-XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(ics->xics);
+XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
 
-if (!ics) {
-return -1;
-}
 if (irq_hint) {
-if (xic->irq_test(ics->xics, irq_hint)) {
+if (xic->irq_test(XICS_FABRIC(spapr), irq_hint)) {
 error_setg(errp, "can't allocate IRQ %d: already in use", 
irq_hint);
 return -1;
 }
 irq = irq_hint;
 } else {
-irq = xic->irq_alloc_block(ics->xics, 1, 1, lsi);
+irq = xic->irq_alloc_block(XICS_FABRIC(spapr), 1, 1, lsi);
 if (irq < 0) {
 error_setg(errp, "can't allocate IRQ: no IRQ left");
 return -1;
@@ -276,15 +274,11 @@ int spapr_ics_alloc(ICSState *ics, int irq_hint, bool 
lsi, Error **errp)
  * Allocate block of consecutive IRQs, and return the number of the first IRQ 
in
  * the block. If align==true, aligns the first IRQ number to num.
  */
-int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
+int spapr_irq_alloc_block(sPAPRMachineState *spapr, int num, bool lsi,
   bool align, Error **errp)
 {
 int first = -1;
-XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(ics->xics);
-
-if (!ics) {
-return -1;
-}
+XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
 
 /*
  * MSIMesage::data is used for storing VIRQ so
@@ -296,9 +290,9 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool lsi,
 if (align) {
 assert((num == 1) || (num == 2) || (num == 4) ||
(num == 8) || (num == 16) || (num == 32));
-first = xic->irq_alloc_block(ics->xics, num, num, lsi);
+first = xic->irq_alloc_block(XICS_FABRIC(spapr), num, num, lsi);
 } else {
-first = xic->irq_alloc_block(ics->xics, num, 1, lsi);
+first = xic->irq_alloc_block(XICS_FABRIC(spapr), num, 1, lsi);
 }
 if (first < 0) {
 error_setg(errp, "can't find a free %d-IRQ block", num);
@@ -310,11 +304,11 @@ int spapr_ics_alloc_block(ICSState *ics, int num, bool 
lsi,
 return first;
 }
 
-void spapr_ics_free(ICSState *ics, int irq, int num)
+void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num)
 {
-XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(ics->xics);
+XICSFabricClass *xic = XICS_FABRIC_GET_CLASS(spapr);
 
-xic->irq_free_block(ics->xics, irq, num);
+xic->irq_free_block(XICS_FABRIC(spapr), irq, num);
 }
 
 void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8c2cff93f933..1ef09963519f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3669,7 +3669,8 @@ static bool spapr_irq_test(XICSFabric *xi, int irq)
  * and then the MSIs. This allows us to keep the LSI IRQ numbers in a
  * well known range which is useful for PHB hotplug.
  */
-static int spapr_irq_alloc_block(XICSFabric *xi, int count, int align, bool 
lsi)
+static int spapr_irq_alloc_block_xi(XICSFabric *xi, int count, int align,
+bool lsi)
 {
 sPAPRMachineState *spapr = SPAPR_MACHINE(xi);
 int start = lsi ? 0 : SPAPR_MAX_LSI;
@@ -3808,7 +3809,7 @@ static void spapr_machine_class_init(ObjectClass *oc, 
void *data)
 xic->ics_resend = spapr_ics_resend;
 xic->icp_get = spapr_icp_get;
 xic->irq_test = spapr_irq_test;
-xic->irq_alloc_block = spapr_irq_alloc_block;
+xic->irq_alloc_block = spapr_irq_alloc_block_xi;
 xic->irq_free_block = spapr_irq_free_block;
 xic->irq_is_lsi = spapr_irq_is_lsi;
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index e377fc7ddea2..cead596f3e7a 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -718,7 +718,7 @@ void spapr_events_init(sPAPRMachineState *spapr)
 spapr->event_sources = spapr_event_sources_new();
 
 spapr_event_sources_register(spapr->event_sources, EVENT_CLASS_EPOW,
- spapr_ics_alloc(spapr->ics, 0, false,
+

Re: [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-10 Thread Max Reitz

On 2017-11-10 14:32, Fam Zheng wrote:
> On Fri, 11/10 14:17, Kevin Wolf wrote:
>> On 10.11.2017 at 03:45, Fam Zheng wrote:
>>> On Thu, 11/09 21:43, Max Reitz wrote:
 Draining a BDS may lead to graph modifications, which in turn may result
 in it and other BDS being stripped of their current references.  If
 bdrv_drain_all_begin() and bdrv_drain_all_end() do not keep strong
 references themselves, the BDS they are trying to drain (or undrain) may
 disappear right under their feet -- or, more specifically, under the
 feet of BDRV_POLL_WHILE() in bdrv_drain_recurse().

 This fixes an occasional hang of iotest 194.

 Signed-off-by: Max Reitz 
 ---
  block/io.c | 47 ---
  1 file changed, 44 insertions(+), 3 deletions(-)

 diff --git a/block/io.c b/block/io.c
 index 3d5ef2cabe..a0a2833e8e 100644
 --- a/block/io.c
 +++ b/block/io.c
 @@ -340,7 +340,10 @@ void bdrv_drain_all_begin(void)
  bool waited = true;
  BlockDriverState *bs;
  BdrvNextIterator it;
 -GSList *aio_ctxs = NULL, *ctx;
 +GSList *aio_ctxs = NULL, *ctx, *bs_list = NULL, *bs_list_entry;
 +
 +/* Must be called from the main loop */
 +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
  
  block_job_pause_all();
  
 @@ -355,6 +358,12 @@ void bdrv_drain_all_begin(void)
  if (!g_slist_find(aio_ctxs, aio_context)) {
  aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
  }
 +
 +/* Keep a strong reference to all root BDS and copy them into
 + * an own list because draining them may lead to graph
 + * modifications. */
 +bdrv_ref(bs);
 +bs_list = g_slist_prepend(bs_list, bs);
  }
  
  /* Note that completion of an asynchronous I/O operation can trigger 
 any
 @@ -370,7 +379,11 @@ void bdrv_drain_all_begin(void)
  AioContext *aio_context = ctx->data;
  
  aio_context_acquire(aio_context);
 -for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
 +for (bs_list_entry = bs_list; bs_list_entry;
 + bs_list_entry = bs_list_entry->next)
 +{
 +bs = bs_list_entry->data;
 +
  if (aio_context == bdrv_get_aio_context(bs)) {
  waited |= bdrv_drain_recurse(bs, true);
  }
 @@ -379,24 +392,52 @@ void bdrv_drain_all_begin(void)
  }
  }
  
 +for (bs_list_entry = bs_list; bs_list_entry;
 + bs_list_entry = bs_list_entry->next)
 +{
 +bdrv_unref(bs_list_entry->data);
 +}
 +
  g_slist_free(aio_ctxs);
 +g_slist_free(bs_list);
  }
  
  void bdrv_drain_all_end(void)
  {
  BlockDriverState *bs;
  BdrvNextIterator it;
 +GSList *bs_list = NULL, *bs_list_entry;
 +
 +/* Must be called from the main loop */
 +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
  
 +/* Keep a strong reference to all root BDS and copy them into an
 + * own list because draining them may lead to graph modifications.
 + */
  for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
 -AioContext *aio_context = bdrv_get_aio_context(bs);
 +bdrv_ref(bs);
 +bs_list = g_slist_prepend(bs_list, bs);
 +}
 +
 +for (bs_list_entry = bs_list; bs_list_entry;
 + bs_list_entry = bs_list_entry->next)
 +{
 +AioContext *aio_context;
 +
 +bs = bs_list_entry->data;
 +aio_context = bdrv_get_aio_context(bs);
  
  aio_context_acquire(aio_context);
  aio_enable_external(aio_context);
  bdrv_parent_drained_end(bs);
  bdrv_drain_recurse(bs, false);
  aio_context_release(aio_context);
 +
 +bdrv_unref(bs);
  }
  
 +g_slist_free(bs_list);
 +
  block_job_resume_all();
  }
>>>
>>> Is it better to put the references into BdrvNextIterator and introduce
>>> bdrv_next_iterator_destroy() to free them? You'll need to touch all callers
>>> because it is not C++, but it secures all of the rest, which seems
>>> vulnerable in the same pattern, for example the aio_poll() in
>>> iothread_stop_all().
>>
>> You could automatically free the references when bdrv_next() returns
>> NULL. Then you need an explicit bdrv_next_iterator_destroy() only for
>> callers that stop iterating halfway through the list.
>
> Yes, good idea.
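
A rough sketch of what such an iterator could look like, as a toy model
only. Item, ItemIter and the iter_* helpers are hypothetical and are not
the BdrvNextIterator API; unref here merely decrements a counter, so the
sketch says nothing about which thread may call it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Item {
    int refcnt;
    struct Item *next;
} Item;

typedef struct ItemIter {
    Item *cur;            /* the iterator owns one reference on cur */
} ItemIter;

static void item_ref(Item *it)
{
    it->refcnt++;
}

static void item_unref(Item *it)
{
    assert(it->refcnt > 0);
    it->refcnt--;
}

static Item *iter_first(ItemIter *it, Item *head)
{
    it->cur = head;
    if (it->cur) {
        item_ref(it->cur);
    }
    return it->cur;
}

/* Moves the reference to the next item; at the end nothing is held. */
static Item *iter_next(ItemIter *it)
{
    Item *prev = it->cur;

    if (!prev) {
        return NULL;
    }
    it->cur = prev->next;          /* read while prev is still referenced */
    if (it->cur) {
        item_ref(it->cur);
    }
    item_unref(prev);
    return it->cur;
}

/* Only needed by callers that stop before iter_next() returns NULL. */
static void iter_destroy(ItemIter *it)
{
    if (it->cur) {
        item_unref(it->cur);
        it->cur = NULL;
    }
}
```

A loop that runs to completion needs no cleanup; a loop that breaks out
early calls iter_destroy() once.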

But bdrv_unref() is safe only in the main loop.  Without having checked,
I'm not sure whether all callers of bdrv_next() are running in the main
loop.

I'd rather introduce a bdr

Re: [Qemu-devel] [PATCH for-2.11] block: Keep strong reference when draining all BDS

2017-11-10 Thread Fam Zheng

On Fri, 11/10 16:23, Max Reitz wrote:
> But bdrv_unref() is safe only in the main loop.  Without having checked,
> I'm not sure whether all callers of bdrv_next() are running in the main
> loop.

They must be. The reasoning is simple:

1) one needs to acquire the ctx of all the BDSes for safe access;
2) only main loop can acquire any BDS' ctx;

So there is no way bdrv_next can work in an IOThread.

Fam

Re: [Qemu-devel] [PATCH] fix scripts/update-linux-headers.sh here document

2017-11-10 Thread Stefan Hajnoczi

On Fri, Nov 10, 2017 at 10:03:54AM +0100, Gerd Hoffmann wrote:
> Signed-off-by: Gerd Hoffmann 
> ---
>  scripts/update-linux-headers.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi 



Re: [Qemu-devel] [Qemu-block] [PATCH] block: all I/O should be completed before removing throttle timers.

2017-11-10 Thread Stefan Hajnoczi

On Sat, Oct 21, 2017 at 01:34:00PM +0800, Zhengui Li wrote:
> From: Zhengui 
> 
> In blk_remove_bs, all I/O should be completed before removing throttle
> timers. If there is in-flight I/O, removing the throttle timers here will
> cause that I/O never to return.
> This patch adds bdrv_drained_begin before throttle_timers_detach_aio_context
> so that all I/O completes before the throttle timers are removed.
> 
> Signed-off-by: Zhengui 
> ---
>  block/block-backend.c | 4 
>  1 file changed, 4 insertions(+)

Hi Zhengui,
Sorry it took so long to get this merged!

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

I moved the declaration of bs as suggested by Berto when I merged the
patch.

Stefan



Re: [Qemu-devel] [PATCH] virtio: fix descriptor counting in virtqueue_pop

2017-11-10 Thread Stefan Hajnoczi

On Thu, Oct 05, 2017 at 08:03:35PM +0200, Alexandre DERUMIER wrote:
> Hi,
> 
> Has somebody reviewed this patch?
> 
> I'm also able to reproduce the VM crash, like the Proxmox user.
> This patch fixes it for me too.

This patch should go through Michael Tsirkin's tree.  I have pinged him
separately in case this email thread got buried in his inbox.

Stefan

> 
> Regards,
> 
> Alexandre
> 
> 
> - Original Message -
> From: "Wolfgang Bumiller" 
> To: "qemu-devel" 
> Cc: "pbonzini" , "Michael S. Tsirkin" 
> Sent: Wednesday, 20 September 2017 08:09:33
> Subject: [Qemu-devel] [PATCH] virtio: fix descriptor counting in virtqueue_pop
> 
> While changing the s/g list allocation, commit 3b3b0628 
> also changed the descriptor counting to count iovec entries 
> as split by cpu_physical_memory_map(). Previously only the 
> actual descriptor entries were counted and the split into 
> the iovec happened afterwards in virtqueue_map(). 
> Count the entries again instead to avoid erroneous 
> "Looped descriptor" errors. 
> 
> Reported-by: Hans Middelhoek  
> Link: https://forum.proxmox.com/threads/vm-crash-with-memory-hotplug.35904/ 
> Fixes: 3b3b0628217e ("virtio: slim down allocation of VirtQueueElements") 
> Signed-off-by: Wolfgang Bumiller  
> --- 
> hw/virtio/virtio.c | 6 +++--- 
> 1 file changed, 3 insertions(+), 3 deletions(-) 
> 
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c 
> index 890b4d7eb7..33bb770177 100644 
> --- a/hw/virtio/virtio.c 
> +++ b/hw/virtio/virtio.c 
> @@ -834,7 +834,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) 
> int64_t len; 
> VirtIODevice *vdev = vq->vdev; 
> VirtQueueElement *elem = NULL; 
> - unsigned out_num, in_num; 
> + unsigned out_num, in_num, elem_entries; 
> hwaddr addr[VIRTQUEUE_MAX_SIZE]; 
> struct iovec iov[VIRTQUEUE_MAX_SIZE]; 
> VRingDesc desc; 
> @@ -852,7 +852,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) 
> smp_rmb(); 
> 
> /* When we start there are none of either input nor output. */ 
> - out_num = in_num = 0; 
> + out_num = in_num = elem_entries = 0; 
> 
> max = vq->vring.num; 
> 
> @@ -922,7 +922,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz) 
> } 
> 
> /* If we've got too many, that implies a descriptor loop. */ 
> - if ((in_num + out_num) > max) { 
> + if (++elem_entries > max) { 
> virtio_error(vdev, "Looped descriptor"); 
> goto err_undo_map; 
> } 
> -- 
> 2.11.0 
> 
> 
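
The difference between the two ways of counting can be shown with a
small model. This is a sketch under assumptions, not the virtio code:
Desc, its 'segments' field (how many iovec entries one descriptor is
split into when mapped) and walk_chain() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Desc {
    unsigned segments;  /* iovec entries this descriptor maps to (assumed) */
    int next;           /* index of the next descriptor, -1 ends the chain */
} Desc;

/*
 * Walk a descriptor chain starting at 'head'. Returns the total number
 * of iovec entries, or -1 if more than 'max' descriptors are visited,
 * which implies a loop. The loop check counts descriptors, not iovec
 * entries, so a short chain whose buffers are split is not rejected.
 */
static int walk_chain(const Desc *table, unsigned max, int head)
{
    unsigned descs_seen = 0;
    int iov_entries = 0;
    int i;

    for (i = head; i >= 0; i = table[i].next) {
        if (++descs_seen > max) {
            return -1;
        }
        iov_entries += (int)table[i].segments;
    }
    return iov_entries;
}
```

With two descriptors that each map to two iovec entries and max = 2, a
check on the iovec count (4 > 2) would report a loop, while the check on
the descriptor count does not.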


