date:20190403

Re: [Qemu-devel] [PATCH 01/26] tcg: Assert h2g_valid for 32-bit guest on 64-bit host

2019-04-03 Thread Richard Henderson

On 4/3/19 11:59 AM, Peter Maydell wrote:
>> +#if TARGET_LONG_BITS == 32 && HOST_LONG_BITS == 64
>> +g_assert(h2g_valid(address));
>> +#endif
> 
> I'm not sure this is right. h2g_valid() will check whether the guest address is
> below GUEST_ADDR_MAX. For architectures which set
> TARGET_VIRT_ADDR_SPACE_BITS to something less than 32 there are
> address values which aren't h2g_valid() but which we still want to cause
> a guest exception rather than asserting, aren't there ?

Hmm, you're right, this should test something else.
I'll drop this for now...


r~
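
To illustrate the objection, here is a minimal standalone sketch. It is not QEMU code: `model_guest_addr_max()` and `model_addr_valid()` are hypothetical stand-ins for GUEST_ADDR_MAX and h2g_valid(), assuming a guest base of zero and a limit of (1 << bits) - 1. With fewer than 32 address-space bits, a 32-bit guest can still form addresses that fail the check, and those are the ones that should raise a guest exception rather than trip an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GUEST_ADDR_MAX: highest address in a
 * guest virtual address space of 'bits' bits (1 <= bits <= 32). */
static uint64_t model_guest_addr_max(unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    return (UINT64_C(1) << bits) - 1;
}

/* Hypothetical stand-in for h2g_valid(), assuming guest base 0. */
static bool model_addr_valid(uint32_t addr, unsigned bits)
{
    return addr <= model_guest_addr_max(bits);
}
```

With 31 address-space bits, 0x7fffffff passes the check and 0x80000000 does not, even though both are ordinary 32-bit values a guest can compute.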

Re: [Qemu-devel] [Qemu-arm] [PATCH 04/26] target/arm: Convert to CPUClass::tlb_fill

2019-04-03 Thread Richard Henderson

On 4/3/19 12:14 PM, Peter Maydell wrote:
> On Wed, 3 Apr 2019 at 10:44, Richard Henderson
>  wrote:
> 
>> +bool arm_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
>> +  MMUAccessType access_type, int mmu_idx,
>> +  bool probe, uintptr_t retaddr)
>> +{
>> +ARMCPU *cpu = ARM_CPU(cs);
>> +
>> +#ifdef CONFIG_USER_ONLY
>> +cpu->env.exception.vaddress = address;
>> +if (access_type == MMU_INST_FETCH) {
>> +cs->exception_index = EXCP_PREFETCH_ABORT;
>> +} else {
>> +cs->exception_index = EXCP_DATA_ABORT;
>> +}
>> +cpu_loop_exit_restore(cs, retaddr);
>> +#else
>> +hwaddr phys_addr;
>> +target_ulong page_size;
>> +int prot, ret;
>> +MemTxAttrs attrs = {};
>> +ARMMMUFaultInfo fi = {};
>> +
>> +/*
>> + * Walk the page table and (if the mapping exists) add the page
>> + * to the TLB. Return false on success, or true on failure. Populate
>> + * fsr with ARM DFSR/IFSR fault register format value on failure.
>> + */
> 
> This comment about what we return doesn't seem to match what
> the code is doing.

Bah.  The perils of copying comments while changing the function signature.


r~

[Qemu-devel] Question: can we hot plug a PCIe switch on machine "virt"

2019-04-03 Thread Heyi Guo


Hi folks,

In the physical world, a PCIe switch, comprising one upstream port and several
downstream ports, is a single physical device; in QEMU, however, we treat each
port as a separate device. docs/pcie.txt in the QEMU tree contains the statements below:

Line 230: Be aware that PCI Express Downstream Ports can't be hot-plugged into
Line 231: an existing PCI Express Upstream Port.

To my understanding, this implies that PCIe downstream ports *can* be hot-plugged into
something other than an existing upstream port. If that is true, how can we do
it? AFAIK the monitor command device_add can only add one device at a time.

Could someone please clarify?

Thanks,

Heyi

Re: [Qemu-devel] [PATCH v5 1/2] target/mips: Optimize ILVOD. MSA instructions

2019-04-03 Thread Richard Henderson

On 4/2/19 10:15 PM, Mateja Marjanovic wrote:
> +static inline void gen_ilvod_w(CPUMIPSState *env, uint32_t wd,
> +   uint32_t ws, uint32_t wt)
> +{
> +TCGv_i64 t1 = tcg_temp_new_i64();
> +const uint64_t mask = 0xULL;
> +
> +tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
> +tcg_gen_shri_i64(t1, t1, 32);

The andi is useless.  The bits that it discards are also discarded by the shift.


r~
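
The redundancy can be checked on plain integers. This is a small sketch in ordinary C, not TCG; the mask value is an assumption (the upper 32 bits), since the constant is not legible in the quoted patch.

```c
#include <assert.h>
#include <stdint.h>

/* The sequence from the patch: mask off the low half, then shift.
 * The mask value (upper 32 bits) is an assumption. */
static uint64_t high_word_with_andi(uint64_t x)
{
    const uint64_t mask = UINT64_C(0xffffffff00000000);
    return (x & mask) >> 32;
}

/* The shift alone: the low 32 bits fall off the bottom anyway. */
static uint64_t high_word_shift_only(uint64_t x)
{
    return x >> 32;
}
```

Both functions return the same value for every input, so the extra AND only costs an operation.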

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Richard Henderson

On 4/2/19 11:19 PM, Philippe Mathieu-Daudé wrote:
>> +static inline void gen_ilvev_b(CPUMIPSState *env, uint32_t wd,
>> +   uint32_t ws, uint32_t wt)
>> +{
>> +TCGv_i64 t1 = tcg_temp_new_i64();
>> +TCGv_i64 t2 = tcg_temp_new_i64();
>> +const uint64_t mask = 0x00ff00ff00ff00ffULL;
>> +
>> +tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
>> +tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
>> +tcg_gen_shli_i64(t2, t2, 8);
>> +tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
>> +
> 
> Richard, is it cheaper to use another register to keep the constant mask
> (here reused 4x)?
> 
> Such:
> 
>TCGv_i64 mask = tcg_const_i64(0x00ff00ff00ff00ffULL);
> 
>tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask);
>tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask);
>tcg_gen_shli_i64(t2, t2, 8);
>tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);

With the current state of the tcg optimizer, yes.


r~
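
For reference, the mask/shift/or sequence in gen_ilvev_b can be compared against the per-element definition on plain integers. This is a sketch in ordinary C, not TCG; it assumes that element k of a 64-bit half sits at bits 8k..8k+7, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: per-byte semantics of the removed helper,
 * dst[2i] = wt[2i], dst[2i+1] = ws[2i], on one 64-bit half. */
static uint64_t ilvev_b_reference(uint64_t ws, uint64_t wt)
{
    uint64_t dst = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint64_t even_t = (wt >> (16 * i)) & 0xff;   /* wt byte 2i */
        uint64_t even_s = (ws >> (16 * i)) & 0xff;   /* ws byte 2i */
        dst |= even_t << (16 * i);                   /* dst byte 2i */
        dst |= even_s << (16 * i + 8);               /* dst byte 2i+1 */
    }
    return dst;
}

/* The mask/shift/or sequence from the patch. */
static uint64_t ilvev_b_bitwise(uint64_t ws, uint64_t wt)
{
    const uint64_t mask = UINT64_C(0x00ff00ff00ff00ff);
    return (wt & mask) | ((ws & mask) << 8);
}
```

Both functions agree for any pair of inputs; whether the mask lives in an immediate or in a temporary only affects the generated host code, not the result.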

Re: [Qemu-devel] [PATCH v2] vmstate: check subsection_found is enough

2019-04-03 Thread Stefano Garzarella

On Wed, Apr 03, 2019 at 09:10:16AM +0800, Wei Yang wrote:
> subsection_found being true implies vmdesc is not NULL.
> 
> This patch removes the additional check on vmdesc and renames
> subsection_found to vmdesc_has_subsections to make it more self-explanatory.
> 
> Signed-off-by: Wei Yang 
> 
> ---
> v2:
>   * rename it to vmdesc_has_subsections
> ---
>  migration/vmstate.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 

Acked-by: Stefano Garzarella

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Richard Henderson

On 4/2/19 10:15 PM, Mateja Marjanovic wrote:
> +static inline void gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
> +   uint32_t ws, uint32_t wt)
> +{
> +TCGv_i64 t1 = tcg_temp_new_i64();
> +const uint64_t mask = 0xULL;
> +
> +tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
> +tcg_gen_deposit_i64(msa_wr_d[wd * 2], t1, msa_wr_d[ws * 2], 32, 32);

The andi of mask is redundant with the deposit.  Remove it.
This should be just

tcg_gen_deposit_i64(msa_wr_d[wd * 2], msa_wr_d[wt * 2],
msa_wr_d[ws * 2], 32, 32);


r~
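
A plain-C model shows why the mask adds nothing. This is a sketch, not the TCG implementation: `model_deposit64()` is a hypothetical helper that follows the deposit semantics (replace a bit field of the first operand with the low bits of the second), and the mask value is an assumption (the lower 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 64-bit deposit: return 'base' with the 'len'-bit field at
 * bit offset 'ofs' replaced by the low 'len' bits of 'field'. */
static uint64_t model_deposit64(uint64_t base, uint64_t field,
                                unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    uint64_t mask = (len == 64 ? ~UINT64_C(0)
                               : ((UINT64_C(1) << len) - 1)) << ofs;
    return (base & ~mask) | ((field << ofs) & mask);
}

/* One 64-bit half of ILVEV.W as posted: mask wt first, then deposit. */
static uint64_t ilvev_w_with_andi(uint64_t wt, uint64_t ws)
{
    uint64_t t1 = wt & UINT64_C(0x00000000ffffffff); /* assumed mask */
    return model_deposit64(t1, ws, 32, 32);
}

/* The suggested form: the deposit alone. */
static uint64_t ilvev_w_deposit_only(uint64_t wt, uint64_t ws)
{
    return model_deposit64(wt, ws, 32, 32);
}
```

The deposit overwrites bits 32..63 of its first operand, which are exactly the bits the mask would have cleared, so both forms give the same result.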

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Richard Henderson

On 4/3/19 5:28 AM, Aleksandar Markovic wrote:
>> +    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
>> +    tcg_gen_deposit_i64(msa_wr_d[wd * 2], t1, msa_wr_d[ws * 2], 32, 32);
>> +
>> +    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
>> +    tcg_gen_deposit_i64(msa_wr_d[wd * 2 + 1], t1, msa_wr_d[ws * 2 + 1], 32, 32);
>> +
>> +    tcg_temp_free_i64(t1);
>> +}
>> +
> 
> This can be further optimized this way (double-check the accuracy, I am
> writing from home):
> 
> gen_ilvev_w(CPUMIPSState *env, uint32_t wd,
>                                 uint32_t ws, uint32_t wt)
> {
>     tcg_gen_shli_i64(msa_wr_d[wd * 2], msa_wr_d[ws * 2], 32);
>     tcg_gen_deposit_i64(msa_wr_d[wd * 2], t1, msa_wr_d[wt * 2], 0, 32);
> 

No, the shift can be performed by the deposit.
See my other reply in this thread.


r~

Re: [Qemu-devel] [PATCH v2 2/3] block/stream: refactor stream_run: drop goto

2019-04-03 Thread Andrey Shinkevich



On 02/04/2019 15:43, Alberto Garcia wrote:
> On Mon 01 Apr 2019 02:06:04 PM CEST, Andrey Shinkevich wrote:
>> From: Vladimir Sementsov-Ogievskiy 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: Andrey Shinkevich 
>> ---
> 
> You can also say in the log message that the goto is not necessary since
> the common exit code was removed in commit eb23654dbe43b549ea2a9ebff9d8e
> 
> Reviewed-by: Alberto Garcia 
> 
> Berto
> 

Alberto,
Thank you for the commit reference.
-- 
With the best regards,
Andrey Shinkevich

Re: [Qemu-devel] [PATCH 3/3] block/stream: introduce a bottom node

2019-04-03 Thread Andrey Shinkevich



On 29/03/2019 19:07, Alberto Garcia wrote:
> On Fri 29 Mar 2019 02:29:14 PM CET, Andrey Shinkevich wrote:
>> @@ -3237,7 +3238,14 @@ void qmp_block_stream(bool has_job_id, const char *job_id, const char *device,
>>   job_flags |= JOB_MANUAL_DISMISS;
>>   }
>>   
>> -stream_start(has_job_id ? job_id : NULL, bs, base_bs, base_name,
>> +/* Find the bottom node that has the base as its backing image */
>> +bottom_node = bs;
>> +while ((iter = backing_bs(bottom_node)) != base_bs) {
>> +bottom_node = iter;
>> +}
>> +assert(bottom_node);
>> +
>> +stream_start(has_job_id ? job_id : NULL, bs, bottom_node, base_name,
>>job_flags, has_speed ? speed : 0, on_error, &local_err);
> 
> Isn't it simpler to pass 'base' to stream_start() and find the bottom
> node there? (with bdrv_find_overlay()).
> 

I am going to move the bottom = bdrv_find_overlay() into the
stream_start() in v3, which is coming soon.

> I think bottom should be an internal implementation detail of the
> block-stream driver, callers don't need to know about it, or do they?
> 
> Berto
> 

-- 
With the best regards,
Andrey Shinkevich
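
The chain walk discussed above can be sketched on a toy linked list. This is not the block layer: `Node` and `find_overlay()` are hypothetical stand-ins, and the function only illustrates the idea of locating the node whose backing link is the base while tolerating a base that is not in the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node with a single backing link. */
typedef struct Node {
    struct Node *backing;
} Node;

/* Return the node in the chain starting at 'top' whose backing node is
 * 'base', or NULL if 'base' is not reachable from 'top'. */
static Node *find_overlay(Node *top, const Node *base)
{
    Node *n = top;
    while (n != NULL && n->backing != base) {
        n = n->backing;
    }
    return n;
}
```

Unlike the loop in the quoted hunk, this version stops at the end of the chain instead of following a NULL link, so the caller can check the result before using it.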

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Mateja Marjanovic

On 2.4.19. 20:37, Philippe Mathieu-Daudé wrote:

On 4/2/19 7:07 PM, Aleksandar Markovic wrote:

From: Philippe Mathieu-Daudé 
Subject: Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

Hi Mateja,

On 4/2/19 5:15 PM, Mateja Marjanovic wrote:

From: Mateja Marjanovic 

Optimize set of MSA instructions ILVEV, using directly
tcg registers and performing logic on them instead of
using helpers.

Maybe you can still keep this previous comment (if still valid):

   Performance measurement is done by executing the
   instructions large number of times on a computer
   with Intel Core i7-3770 CPU @ 3.40GHz×8.

Agreed.

I will add that in v6.

In the following table, the first column is the performance
before this patch. The second represents the performance,
after converting from helpers to tcg, but without using
tcg_gen_deposit function. The third one is the solution
which is implemented in this patch.

You are describing the "no-deposit" which refers to a previous series
but won't be accessible in the git repository.

I think this table is useful in the cover of this series, and in the
commit message you should use || before || after || and drop the
"no-deposit" case.

  instr||   before|| no-deposit ||  with-deposit

  ilvev.b  ||  126.92 ms  ||  24.52 ms  ||  24.43 ms
  ilvev.h  ||   93.67 ms  ||  23.92 ms  ||  23.86 ms
  ilvev.w  ||  117.86 ms  ||  23.83 ms  ||  22.17 ms
  ilvev.d  ||   45.49 ms  ||  19.74 ms  ||  19.71 ms

The solution with deposit is suggested by Richard Henderson.

I think the table should remain in the commit message, to keep it
visible in the git logs.

You could insert the "no-deposit" source code of gen_ilvev_w()
in the commit message, for reference - it is not too
large a function.

Clever :)

I agree, I will add the code in the commit message in v6.

Thanks,
Aleksandar

The gitdm parsable form is:

Suggested-by: Richard Henderson 

Signed-off-by: Mateja Marjanovic 
---
  target/mips/helper.h |   1 -
  target/mips/msa_helper.c |   9 
  target/mips/translate.c  | 105 ++-
  3 files changed, 104 insertions(+), 11 deletions(-)

diff --git a/target/mips/helper.h b/target/mips/helper.h
index 02e16c7..82f6a40 100644
--- a/target/mips/helper.h
+++ b/target/mips/helper.h
@@ -864,7 +864,6 @@ DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)
  DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
  DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
  DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
  DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
  DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
  DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
index a7ea6aa..d5c3842 100644
--- a/target/mips/msa_helper.c
+++ b/target/mips/msa_helper.c
@@ -1197,15 +1197,6 @@ MSA_FN_DF(ilvl_df)
  } while (0)
  MSA_FN_DF(ilvr_df)
  #undef MSA_DO
-
-#define MSA_DO(DF)  \
-do {\
-pwx->DF[2*i]   = pwt->DF[2*i];  \
-pwx->DF[2*i+1] = pws->DF[2*i];  \
-} while (0)
-MSA_FN_DF(ilvev_df)
-#undef MSA_DO
-
  #undef MSA_LOOP_COND

  #define MSA_LOOP_COND(DF) \
diff --git a/target/mips/translate.c b/target/mips/translate.c
index 04406d6..e26c6a6 100644
--- a/target/mips/translate.c
+++ b/target/mips/translate.c
@@ -28974,6 +28974,94 @@ static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,
  tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
  }

+/*
+ * [MSA] ILVEV.B wd, ws, wt
+ *
+ *   Vector Interleave Even (byte data elements)
+ *
+ */
+static inline void gen_ilvev_b(CPUMIPSState *env, uint32_t wd,
+   uint32_t ws, uint32_t wt)
+{
+TCGv_i64 t1 = tcg_temp_new_i64();
+TCGv_i64 t2 = tcg_temp_new_i64();
+const uint64_t mask = 0x00ff00ff00ff00ffULL;
+
+tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
+tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
+tcg_gen_shli_i64(t2, t2, 8);
+tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
+

Richard, is it cheaper to use another register to keep the constant mask
(here reused 4x)?

Such:

TCGv_i64 mask = tcg_const_i64(0x00ff00ff00ff00ffULL);

tcg_gen_and_i64(t1, msa_wr_d[wt * 2], mask);
tcg_gen_and_i64(t2, msa_wr_d[ws * 2], mask);
tcg_gen_shli_i64(t2, t2, 8);
tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);

+tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
+tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask);

Here use tcg_gen_and_i64() too.

+tcg_gen_shli_i64(t2, t2, 8);
+tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
+

tcg_temp_free_i64(mask);

+tcg_temp_free_i64(t1);
+tcg_temp_free_i64(t2);

Mateja: Can you test for perf easily?

I will test

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Mateja Marjanovic




On 2.4.19. 20:51, Aleksandar Markovic wrote:

+/*
+ * [MSA] ILVEV.D wd, ws, wt
+ *
+ *   Vector Interleave Even (Double data elements)
+ *
+ */

Double -> Doubleword


I will change it in v6.

Re: [Qemu-devel] [PATCH v4 5/5] target/mips: Refactor and fix INSERT. instructions

2019-04-03 Thread Mateja Marjanovic

On 2.4.19. 22:50, Aleksandar Markovic wrote:

From: Mateja Marjanovic 
Subject: [PATCH v4 5/5] target/mips: Refactor and fix INSERT. instructions

From: Mateja Marjanovic 

The old version of the helper for the INSERT. MSA instructions
has been replaced with four helpers that don't use switch, and change
the endianness of the given index, when executed on a big endian host.

Signed-off-by: Mateja Marjanovic 
---

...

+n %= 16;

Mateja, could you just clarify what is the purpose of this line (and
similar three lines involving "%=")? It looks to me that n is already
limited here to be between 0 and 15, isn't it? (that follows from the
source code of gen_msa_elm().) What made you insert this line,
as it stands?

It was

n %= DF_ELEMENTS(df);

but I deleted the df argument, so it had to be done like
this. I think it's a matter of precaution (in case a number
greater than 15, or 8... is passed as an argument).

Thanks,
Aleksandar

Thanks,
Mateja

[Qemu-devel] [PATCH] migration: fix migration shutdown

2019-04-03 Thread Yury Kotov

It fixes heap-use-after-free which was found by clang's ASAN.

Control flow of this use-after-free:
main_thread:
* Got SIGTERM and completes main loop
* Calls migration_shutdown
  - migrate_fd_cancel (so, migration_thread begins to complete)
  - object_unref(OBJECT(current_migration));

migration_thread:
* migration_iteration_finish -> schedule cleanup bh
* object_unref(OBJECT(s)); (Now, current_migration is freed)
* exits

main_thread:
* Calls vm_shutdown -> drain bdrvs -> main loop
  -> cleanup_bh -> use after free
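
The sequence above is the usual hazard of a deferred callback outliving the last reference to its object. The sketch below shows one general way to avoid it, namely letting the pending callback hold its own reference. It is a toy model with hypothetical names (`Obj`, `schedule_cleanup()`, `run_cleanup()`), not QEMU's object model, and it is not claimed to be what this patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the migration state. */
typedef struct Obj {
    int refcount;
    bool freed;
} Obj;

static void obj_ref(Obj *o)
{
    assert(!o->freed);
    o->refcount++;
}

static void obj_unref(Obj *o)
{
    assert(!o->freed && o->refcount > 0);
    if (--o->refcount == 0) {
        o->freed = true;   /* a real object would be freed here */
    }
}

/* Scheduling a deferred callback takes a reference for the callback... */
static void schedule_cleanup(Obj *o)
{
    obj_ref(o);
}

/* ...and the callback drops it once it has finished using the object. */
static void run_cleanup(Obj *o)
{
    assert(!o->freed);     /* safe: the callback's own reference pins it */
    obj_unref(o);
}
```

In this model the object survives both the main thread's and the worker thread's unref and is only released after the callback has run.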

If you want to reproduce, these couple of sleeps will help:
vl.c:4613:
 migration_shutdown();
+sleep(2);
migration.c:3269:
+sleep(1);
 trace_migration_thread_after_loop();
 migration_iteration_finish(s);

Original output:
qemu-system-x86_64: terminating on signal 15 from pid 31980 ()
=================================================================
==31958==ERROR: AddressSanitizer: heap-use-after-free on address 0x6191d210
  at pc 0x58a535ca bp 0x7fffb190 sp 0x7fffb188
READ of size 8 at 0x6191d210 thread T0 (qemu-vm-0)
#0 0x58a535c9 in migrate_fd_cleanup migration/migration.c:1502:23
#1 0x594fde0a in aio_bh_call util/async.c:90:5
#2 0x594fe522 in aio_bh_poll util/async.c:118:13
#3 0x59524783 in aio_poll util/aio-posix.c:725:17
#4 0x59504fb3 in aio_wait_bh_oneshot util/aio-wait.c:71:5
#5 0x573bddf6 in virtio_blk_data_plane_stop
  hw/block/dataplane/virtio-blk.c:282:5
#6 0x589d5c09 in virtio_bus_stop_ioeventfd hw/virtio/virtio-bus.c:246:9
#7 0x589e9917 in virtio_pci_stop_ioeventfd hw/virtio/virtio-pci.c:287:5
#8 0x589e22bf in virtio_pci_vmstate_change hw/virtio/virtio-pci.c:1072:9
#9 0x57628931 in virtio_vmstate_change hw/virtio/virtio.c:2257:9
#10 0x57c36713 in vm_state_notify vl.c:1605:9
#11 0x5716ef53 in do_vm_stop cpus.c:1074:9
#12 0x5716eeff in vm_shutdown cpus.c:1092:12
#13 0x57c4283e in main vl.c:4617:5
#14 0x7fffdfdb482f in __libc_start_main
  (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#15 0x56ecb118 in _start (x86_64-softmmu/qemu-system-x86_64+0x1977118)

0x6191d210 is located 144 bytes inside of 952-byte region
  [0x6191d180,0x6191d538)
freed by thread T6 (live_migration) here:
#0 0x56f76782 in __interceptor_free
  /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:124:3
#1 0x58d5fa94 in object_finalize qom/object.c:618:9
#2 0x58d57651 in object_unref qom/object.c:1068:9
#3 0x58a55588 in migration_thread migration/migration.c:3272:5
#4 0x595393f2 in qemu_thread_start util/qemu-thread-posix.c:502:9
#5 0x7fffe057f6b9 in start_thread
  (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

previously allocated by thread T0 (qemu-vm-0) here:
#0 0x56f76b03 in __interceptor_malloc
  /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146:3
#1 0x76ee37b8 in g_malloc
  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4f7b8)
#2 0x58d58031 in object_new qom/object.c:640:12
#3 0x58a31f21 in migration_object_init migration/migration.c:139:25
#4 0x57c41398 in main vl.c:4320:5
#5 0x7fffdfdb482f in __libc_start_main
  (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

Thread T6 (live_migration) created by T0 (qemu-vm-0) here:
#0 0x56f5f0dd in pthread_create
  /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:210:3
#1 0x59538cf9 in qemu_thread_create util/qemu-thread-posix.c:539:11
#2 0x58a53304 in migrate_fd_connect migration/migration.c:3332:5
#3 0x58a72bd8 in migration_channel_connect migration/channel.c:92:5
#4 0x58a6ef87 in exec_start_outgoing_migration migration/exec.c:42:5
#5 0x58a4f3c2 in qmp_migrate migration/migration.c:1922:9
#6 0x58bb4f6a in qmp_marshal_migrate
  qapi/qapi-commands-migration.c:607:5
#7 0x59363738 in do_qmp_dispatch qapi/qmp-dispatch.c:131:5
#8 0x59362a15 in qmp_dispatch qapi/qmp-dispatch.c:174:11
#9 0x571bac15 in monitor_qmp_dispatch monitor.c:4124:11
#10 0x5719a22d in monitor_qmp_bh_dispatcher monitor.c:4207:9
#11 0x594fde0a in aio_bh_call util/async.c:90:5
#12 0x594fe522 in aio_bh_poll util/async.c:118:13
#13 0x595201e0 in aio_dispatch util/aio-posix.c:460:5
#14 0x59503553 in aio_ctx_dispatch util/async.c:261:5
#15 0x76ede196 in g_main_context_dispatch
  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4a196)

SUMMARY: AddressSanitizer: heap-use-after-free migration/migration.c:1502:23
  in migrate_fd_cleanup
Shadow bytes around the buggy address:
  0x0c327fffb9f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c327fffba20: fa fa fa fa fa fa fa fa fa fa fa fa fa

Re: [Qemu-devel] [PATCH v1] exec: check the range in the address_space_unmap routine

2019-04-03 Thread Dima Stepanov

On Fri, Mar 22, 2019 at 01:35:57PM +, Peter Maydell wrote:
> On Fri, 22 Mar 2019 at 13:19, Dima Stepanov  wrote:
> >
> > With virtio-blk, a specifically crafted virtio packet can trigger the
> > following assertion:
> >   qemu-system-x86_64: exec.c:3725: address_space_unmap: Assertion `mr !=
> >   NULL' failed.
> > This assertion is triggered if the length of the first descriptor in the
> > block request chain (block command descriptor) is more than block command
> > size. In this case the hw/block/virtio-blk.c:virtio_blk_handle_request()
> > routine calls the iov_discard_front() function and the iov base and size
> > are changed. As a result the address can not be found during the
> > address_space_unmap() call.
> >
> > The fix is to check the whole address range in the address_space_unmap
> > function.
> >
> > Signed-off-by: Dima Stepanov 
> > ---
> >  exec.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/exec.c b/exec.c
> > index 86a38d3..0eeb018 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -3717,7 +3717,7 @@ void *address_space_map(AddressSpace *as,
> >  void address_space_unmap(AddressSpace *as, void *buffer, hwaddr len,
> >   int is_write, hwaddr access_len)
> >  {
> > -if (buffer != bounce.buffer) {
> > +if ((buffer < bounce.buffer) || (buffer + access_len > bounce.buffer + 
> > bounce.len)) {
> >  MemoryRegion *mr;
> >  ram_addr_t addr1;
> 
> A quick look at the xen_invalidate_map_cache_entry() implementation
> suggests that it also assumes that the address passed to
> address_space_unmap() must be the same address that was
> originally handed out via address_space_map().
It is hard for me to say whether that is needed, since we have no Xen
reproducer for this issue. Right now we are fuzzing the
virtio-blk devices and hit these asserts, which would be good to fix.

> 
> So I think we either need to also change the Xen code, or
> we need to fix this at the virtio level by having it keep
> track of the buffer it was handed so it can unmap it.
Maybe a fix at the virtio level would be better in general. What do you
think?

Thanks, Dima.

> 
> thanks
> -- PMM

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Mateja Marjanovic




On 3.4.19. 01:25, Aleksandar Markovic wrote:



On Apr 2, 2019 5:20 PM, "Mateja Marjanovic" <mateja.marjano...@rt-rk.com> wrote:

>
> From: Mateja Marjanovic

>
> Optimize set of MSA instructions ILVEV, using directly

Use full instruction names, with the only exception of possible
Backus-Naur forms... again.



I will, I didn't change it from some of the previous versions.


> tcg registers and performing logic on them instead of
> using helpers.
>
> In the following table, the first column is the performance
> before this patch. The second represents the performance,
> after converting from helpers to tcg, but without using
> tcg_gen_deposit function. The third one is the solution
> which is implemented in this patch.
>
>  instr    ||   before    || no-deposit ||  with-deposit
> 
>  ilvev.b  ||  126.92 ms  ||  24.52 ms  ||  24.43 ms
>  ilvev.h  ||   93.67 ms  ||  23.92 ms  ||  23.86 ms
>  ilvev.w  ||  117.86 ms  ||  23.83 ms  ||  22.17 ms
>  ilvev.d  ||   45.49 ms  ||  19.74 ms  ||  19.71 ms
>
> The solution with deposit is suggested by Richard Henderson.
>
> Signed-off-by: Mateja Marjanovic

> ---

The byte and halfword cases of this patch most likely produce highly 
unoptimized code for cases:


wd == wt == ws
wd == wt != ws
wd != ws == wt
wd == ws != wt

Please take these cases into account.

The same for patch 1/2.


Maybe, but if I add if statements that check whether the registers are the same,
it would affect the performance significantly in all cases. If some
registers were equal, it would be faster, but if not, those if statements
alone would slow things down.


Thanks,
Aleksandar


Thanks,
Mateja


>  target/mips/helper.h     |   1 -
>  target/mips/msa_helper.c |   9 
>  target/mips/translate.c  | 105 ++-

>  3 files changed, 104 insertions(+), 11 deletions(-)
>
> diff --git a/target/mips/helper.h b/target/mips/helper.h
> index 02e16c7..82f6a40 100644
> --- a/target/mips/helper.h
> +++ b/target/mips/helper.h
> @@ -864,7 +864,6 @@ DEF_HELPER_5(msa_pckev_df, void, env, i32, i32, i32, i32)

>  DEF_HELPER_5(msa_pckod_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_ilvl_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_ilvr_df, void, env, i32, i32, i32, i32)
> -DEF_HELPER_5(msa_ilvev_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_vshf_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_srar_df, void, env, i32, i32, i32, i32)
>  DEF_HELPER_5(msa_srlr_df, void, env, i32, i32, i32, i32)
> diff --git a/target/mips/msa_helper.c b/target/mips/msa_helper.c
> index a7ea6aa..d5c3842 100644
> --- a/target/mips/msa_helper.c
> +++ b/target/mips/msa_helper.c
> @@ -1197,15 +1197,6 @@ MSA_FN_DF(ilvl_df)
>      } while (0)
>  MSA_FN_DF(ilvr_df)
>  #undef MSA_DO
> -
> -#define MSA_DO(DF)                      \
> -    do {                                \
> -        pwx->DF[2*i]   = pwt->DF[2*i];  \
> -        pwx->DF[2*i+1] = pws->DF[2*i];  \
> -    } while (0)
> -MSA_FN_DF(ilvev_df)
> -#undef MSA_DO
> -
>  #undef MSA_LOOP_COND
>
>  #define MSA_LOOP_COND(DF) \
> diff --git a/target/mips/translate.c b/target/mips/translate.c
> index 04406d6..e26c6a6 100644
> --- a/target/mips/translate.c
> +++ b/target/mips/translate.c
> @@ -28974,6 +28974,94 @@ static inline void gen_ilvod_d(CPUMIPSState *env, uint32_t wd,

>      tcg_gen_mov_i64(msa_wr_d[wd * 2 + 1], msa_wr_d[ws * 2 + 1]);
>  }
>
> +/*
> + * [MSA] ILVEV.B wd, ws, wt
> + *
> + *   Vector Interleave Even (byte data elements)
> + *
> + */
> +static inline void gen_ilvev_b(CPUMIPSState *env, uint32_t wd,
> +                               uint32_t ws, uint32_t wt)
> +{
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    const uint64_t mask = 0x00ff00ff00ff00ffULL;
> +
> +    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
> +    tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
> +    tcg_gen_shli_i64(t2, t2, 8);
> +    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
> +
> +    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2 + 1], mask);
> +    tcg_gen_andi_i64(t2, msa_wr_d[ws * 2 + 1], mask);
> +    tcg_gen_shli_i64(t2, t2, 8);
> +    tcg_gen_or_i64(msa_wr_d[wd * 2 + 1], t1, t2);
> +
> +    tcg_temp_free_i64(t1);
> +    tcg_temp_free_i64(t2);
> +}
> +
> +/*
> + * [MSA] ILVEV.H wd, ws, wt
> + *
> + *   Vector Interleave Even (halfword data elements)
> + *
> + */
> +static inline void gen_ilvev_h(CPUMIPSState *env, uint32_t wd,
> +                               uint32_t ws, uint32_t wt)
> +{
> +    TCGv_i64 t1 = tcg_temp_new_i64();
> +    TCGv_i64 t2 = tcg_temp_new_i64();
> +    const uint64_t mask = 0xULL;
> +
> +    tcg_gen_andi_i64(t1, msa_wr_d[wt * 2], mask);
> +    tcg_gen_andi_i64(t2, msa_wr_d[ws * 2], mask);
> +    tcg_gen_shli_i64(t2, t2, 16);
> +    tcg_gen_or_i64(msa_wr_d[wd * 2], t1, t2);
> +
> +    tcg_gen_andi_i64(t1, msa_wr_

Re: [Qemu-devel] Issues around TYPE_INTERFACE

2019-04-03 Thread Markus Armbruster

I figure what I wrote was too long to read, so let me try again,
focusing on just the main issue, leaving the bonus messes for later.

Here's the main issue in a nutshell: (1) our interfaces can't have state
(but that's okay), (2) our conversions to interface types actually
convert to "an Object that has this interface" (and that's anything but
obvious).


Why (1) our interfaces can't have state:

Since interfaces are abstract, there's no such thing as an interface
instance.

Okay, but WTF does a conversion to an interface type *mean* then?
Recall, interface TYPE_MY_IFACE commonly defines struct type MyIface and
macro MY_IFACE(), which is a checked conversion from any instance of
Object to MyIface.  What *is* MyIface?


What (2) our conversions to interface types do:

MY_IFACE(obj) either fails its assertion or returns (MyIface *)obj.

When @obj points to an instance of MyFrob, then we can safely cast it to
any supertype of MyFrob.

MyIface is not a supertype of MyFrob, it's an interface.  WTF?

Turns out we only ever define MyIface in one of two ways:

typedef struct MyIface MyIface; // incomplete

typedef struct MyIface {        // fake subtype of Object
    Object parent;
} MyIface;

If we somehow know that only subtypes of SuperFrob have interface
MyIface, then we can exploit that and safely cast to SuperFrob * or any
of its supertypes.

If we don't (want to) know, then we can only cast to Object *.

The "fake subtype of Object" kind could save casts to Object *, but we
don't seem to use that.  Cargo cult?  Fear of not completing your types?

Regardless of how we cast or convert, we still know the object has
interface TYPE_MY_IFACE.


Example of how we actually use this: TYPE_MEMORY_DEVICE and unplug

pc_memory_unplug() converts DeviceState * to subtype PCDIMMDevice *,
checked.  Okay.

pc_dimm_unplug() converts PCDIMMDevice * to interface MemoryDeviceState
*.  Ooookay.  Note that MemoryDeviceState is incomplete.

memory_device_unplug() passes it on to its get_memory_region() method.
Since it's actually a PCDIMMDevice, this is
pc_dimm_md_get_memory_region().

pc_dimm_md_get_memory_region() converts the MemoryDeviceState * right
back to PCDIMMDevice *, checked.  This works even though
MemoryDeviceState is incomplete, because the checked conversion
PC_DIMM() accepts *any* pointer type.  It casts to Object * below the
hood, and if it isn't one, we get undefined behavior.
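
The round trip described above can be reproduced in miniature. This is a toy model, not QOM: `Object`, `MyFrob`, `my_iface_cast()` and `my_frob_from_iface()` are hypothetical, and the interface check is reduced to a string comparison. It shows that the "conversion to an interface type" returns the same address with a different static type, and that an incomplete MyIface is enough for that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the object model: every instance starts
 * with an Object header naming the interface it implements (if any). */
typedef struct Object {
    const char *iface;      /* NULL when no interface is implemented */
} Object;

typedef struct MyIface MyIface;     /* incomplete: no state, no layout */

typedef struct MyFrob {
    Object parent;          /* must be first so the casts below are valid */
    int frob_state;
} MyFrob;

static bool object_has_iface(const Object *obj, const char *name)
{
    return obj->iface != NULL && strcmp(obj->iface, name) == 0;
}

/* Checked conversion: asserts the interface, then returns the same
 * address with a different static type. Accepts any pointer, so a
 * pointer to something that is not an Object is undefined behavior. */
static MyIface *my_iface_cast(void *obj)
{
    assert(object_has_iface((Object *)obj, "my-iface"));
    return (MyIface *)obj;
}

/* Converting back: the MyIface pointer still addresses the MyFrob. */
static MyFrob *my_frob_from_iface(MyIface *iface)
{
    return (MyFrob *)(void *)iface;
}
```

Because MyIface carries no members, the only things a holder of a `MyIface *` can do are pass it along or convert it back, which mirrors the pc_dimm_unplug() path.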

Re: [Qemu-devel] [PATCH v2 0/2] intel_iommu: misc scalable mode fixes for 4.0

2019-04-03 Thread Peter Xu

On Fri, Mar 29, 2019 at 02:14:20PM +0800, Peter Xu wrote:
> v2:
> - patch 2: use "1" instead of "sizeof(bool)" for VMSTATE_UNUSED
>   because sizeof(bool) can be >1 depending on its definition [Dave]
> 
> The first patch is the important one.  It should fix up a migration
> issue that Dave reported between 3.1<->4.0.  The second patch is born
> only because I noticed it when drafting patch 1 and I think we can
> probably do that too together as a fixup to the scalable patchset for
> 4.0.

Ping - just want to make sure this series won't miss 4.0.  Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] hostmem: Disable add/del memory during migration

2019-04-03 Thread Yury Kotov

25.03.2019, 14:58, "Juan Quintela" :
> Yury Kotov  wrote:
>>  I found a bug in QEMU 2.12 with adding memory-backend while live migration
>>  thread is running.
>>
>>  But it seems that this bug was implicitly fixed in this commit (QEMU 3.0):
>>    b895de50: migration: discard non-migratable RAMBlocks
>>
>>  I think it's better to disallow add/del memory backends during migration to
>>  prevent other possible problems. Anyway, the user can't use this memory because
>>  of disabled hotplug/hotunplug devs.
>
> Hi
>
> My understanding is that we already disable memory hotplug/unplug during
> migration. At least, the idea of those patches was to disable all
> hotplug/unplug during migration. The only reason that I can think for
> using this patch is if anyone is planning to support some
> hotplug/unplug during migration (to my knowledge, nobody is working on
> that).
>

Sorry, what patches do you mean? It seems that I missed them...

> So, I think it is better to just disallow all hotplug
> operations at a high level, instead of in every backend.
>

Agreed. I also had this idea and took a quick look at it, but didn't find how
to do it without some refactoring.

> But will wait to see what everybody else think about it.
>
> BTW, if we plan to ship this for an old qemu, I will try to disable
> hotplug for all devices, not only memory, no?
>
> Later, Juan.
>
>>  The idea of this commit is the same as that:
>>    b06424de: migration: Disable hotplug/unplug memory during migration
>>
>>  Backtrace of this bug in QEMU 2.12:
>>  0 find_next_bit (addr=addr@entry=0x0, size=size@entry=262144, offset=offset@entry=0) at util/bitops.c:46
>>  1 migration_bitmap_find_dirty (rs=0x7f58f80008c0, start=0, rb=0x5557e66e3200) at migration/ram.c:816
>>  2 find_dirty_block (again=, pss=, rs=0x7f58f80008c0) at migration/ram.c:1243
>>  3 ram_find_and_save_block (rs=rs@entry=0x7f58f80008c0, last_stage=last_stage@entry=false) at migration/ram.c:1592
>>  4 ram_find_and_save_block (last_stage=false, rs=0x7f58f80008c0) at migration/ram.c:2335
>>  5 ram_save_iterate (f=0x5557e69f1000, opaque=) at migration/ram.c:2338
>>  6 qemu_savevm_state_iterate (f=0x5557e69f1000, postcopy=false) at migration/savevm.c:1191
>>  7 migration_iteration_run (s=0x5557e666b030) at migration/migration.c:2301
>>  8 migration_thread (opaque=0x5557e666b030) at migration/migration.c:2409
>>  9 start_thread (arg=0x7f59055d5700) at pthread_create.c:333
>>  10 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>>  Signed-off-by: Yury Kotov 
>>  ---
>>   backends/hostmem.c | 9 -
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>>  diff --git a/backends/hostmem.c b/backends/hostmem.c
>>  index f61093654e..5c71bd3f6b 100644
>>  --- a/backends/hostmem.c
>>  +++ b/backends/hostmem.c
>>  @@ -18,6 +18,7 @@
>>   #include "qapi/visitor.h"
>>   #include "qemu/config-file.h"
>>   #include "qom/object_interfaces.h"
>>  +#include "migration/misc.h"
>>
>>   #ifdef CONFIG_NUMA
>>   #include <numaif.h>
>>  @@ -271,6 +272,11 @@ host_memory_backend_memory_complete(UserCreatable *uc, 
>> Error **errp)
>>   void *ptr;
>>   uint64_t sz;
>>
>>  + if (!migration_is_idle()) {
>>  + error_setg(errp, "Adding memory-backend isn't allowed while migrating");
>>  + goto out;
>>  + }
>>  +
>>   if (bc->alloc) {
>>   bc->alloc(backend, &local_err);
>>   if (local_err) {
>>  @@ -344,7 +350,8 @@ out:
>>   static bool
>>   host_memory_backend_can_be_deleted(UserCreatable *uc)
>>   {
>>  - if (host_memory_backend_is_mapped(MEMORY_BACKEND(uc))) {
>>  + if (host_memory_backend_is_mapped(MEMORY_BACKEND(uc)) ||
>>  + !migration_is_idle()) {
>>   return false;
>>   } else {
>>   return true;

Regards,
Yury

Re: [Qemu-devel] [RFC PATCH] QEMU may write to system_memory before guest starts

2019-04-03 Thread Юрий Котов

Ping

21.03.2019, 19:27, "Yury Kotov" :
> Hi,
>
> 19.03.2019, 14:52, "Dr. David Alan Gilbert" :
>>  * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>   On Tue, 19 Mar 2019 at 11:03, Dr. David Alan Gilbert
>>>    wrote:
>>>   >
>>>   > * Peter Maydell (peter.mayd...@linaro.org) wrote:
>>>   > > I didn't think migration distinguished between "main memory"
>>>   > > and any other kind of RAMBlock-backed memory ?
>>>   >
>>>   > In Yury's case there's a distinction between RAMBlock's that are mapped
>>>   > with RAM_SHARED (which normally ends up as MAP_SHARED) and all others.
>>>   > You can set that for main memory by using -numa to specify a memdev
>>>   > that's backed by a file and has the share=on property.
>>>   >
>>>   > On x86 the ROMs end up as separate RAMBlock's that aren't affected
>>>   > by that -numa/share=on - so they don't fight Yury's trick.
>>>
>>>   You can use the generic loader on x86 to load an ELF file
>>>   into RAM if you want, which would I think also trigger this.
>>
>>  OK, although that doesn't worry me too much - since in the majority
>>  of cases Yury's trick still works well.
>>
>>  I wonder if there's a way to make Yury's code to detect these cases
>>  and not allow the feature; the best thing for the moment would seem to
>>  be to skip the aarch test that uses elf loading.
>
> Currently, I've no idea how to detect such cases, but it is possible to
> detect memory corruption. I want to update the RFC patch to let the user map
> some memory regions as read-only until incoming migration starts.
>
> E.g.
> 1) If x-ignore-shared is enabled in command line or memory region is marked
>    (something like ',readonly=on'),
> 2) Memory region is shared (,share=on),
> 3) And qemu is started with '-incoming' option
>
> Then map such regions as read-only until incoming migration finishes.
> Thus, the patch will be able to detect memory corruption and will not affect
> normal cases.
>
> What do you think, is it needed?
>
> I already have a cleaner version of the RFC patch, but I'm not sure about 1).
> Which way is better: enable a capability on the command line, add a new option
> for memory-backend, or something else?
>
>>  Dave
>>
>>>   thanks
>>>   -- PMM
>>  --
>>  Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
> Regards,
> Yury
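
The read-only mapping idea quoted above can be illustrated with a minimal
sketch. It assumes a POSIX host and uses an anonymous shared mapping as a
stand-in for a share=on memory backend; demo_map_guarded() and demo_unguard()
are hypothetical names, not QEMU functions, and the RFC patch may do this
differently.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map a shared region read-only, so that any write
 * before the incoming migration starts faults instead of silently
 * corrupting the memory.
 */
static void *demo_map_guarded(size_t len)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: lift the guard once writes are expected. */
static int demo_unguard(void *p, size_t len)
{
    return mprotect(p, len, PROT_READ | PROT_WRITE);
}
```

A stray early write then shows up as SIGSEGV at the faulting instruction,
which is what makes the corruption detectable.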

Re: [Qemu-devel] [PATCH 11/14] hw/vfio/ccw: avoid taking address members in packed structs

2019-04-03 Thread Cornelia Huck

On Tue, 2 Apr 2019 17:11:29 +0100
Daniel P. Berrangé  wrote:

> On Tue, Apr 02, 2019 at 06:00:33PM +0200, Cornelia Huck wrote:
> > On Fri, 29 Mar 2019 11:11:01 +
> > Daniel P. Berrangé  wrote:
> >   
> > > The GCC 9 compiler complains about many places in s390 code
> > > that take the address of members of the 'struct SCHIB' which
> > > is marked packed:
> > > 
> > > hw/vfio/ccw.c: In function ‘vfio_ccw_io_notifier_handler’:
> > > hw/vfio/ccw.c:133:15: warning: taking address of packed member of ‘struct 
> > > SCHIB’ may result in an unaligned pointer value \
> > > [-Waddress-of-packed-member]
> > >   133 | SCSW *s = &sch->curr_status.scsw;
> > >   |   ^~
> > > hw/vfio/ccw.c:134:15: warning: taking address of packed member of ‘struct 
> > > SCHIB’ may result in an unaligned pointer value \
> > > [-Waddress-of-packed-member]
> > >   134 | PMCW *p = &sch->curr_status.pmcw;
> > >   |   ^~
> > > 
> > > ...snip many more...
> > > 
> > > Almost all of these are just done for convenience to avoid
> > > typing out long variable/field names when referencing struct
> > > members. We can get most of this convenience by taking the
> > > address of the 'struct SCHIB' instead, avoiding triggering
> > > the compiler warnings.
> > > 
> > > In a couple of places we copy via a local variable which is
> > > a technique already applied elsewhere in s390 code for this
> > > problem.
> > > 
> > > Signed-off-by: Daniel P. Berrangé 
> > > ---
> > >  hw/vfio/ccw.c | 42 ++
> > >  1 file changed, 22 insertions(+), 20 deletions(-)  
> > 
> > I'm currently in the process of queuing this and the other three s390x
> > fixes, but I'm inclined to do so for 4.1 (it feels a bit late in the
> > cycle for 4.0.)
> > 
> > Other opinions?  
> 
> It would be nice to be warning free for 4.0, but I agree that it feels
> kind of late to be making these changes. They're not fixing real world
> bugs, and even if you queue the s390 bits we're unlikely to get all the
> others merged, especially the usb-mtp one is a nasty mess. So we'll
> not be 100% warning free.

Yeah, but OTOH, the s390x changes are straightforward and have been
reviewed by several people.

So I changed my mind and queued them to s390-fixes :)

Re: [Qemu-devel] [PATCH] hostmem: Disable add/del memory during migration

2019-04-03 Thread Daniel P . Berrangé

On Mon, Mar 25, 2019 at 12:52:06PM +0100, Juan Quintela wrote:
> Yury Kotov  wrote:
> > I found a bug in QEMU 2.12 with adding memory-backend while live migration
> > thread is running.
> >
> > But it seems that this bug was implicitly fixed in this commit (QEMU 3.0):
> >   b895de50: migration: discard non-migratable RAMBlocks
> >
> > I think it's better to disallow adding/deleting memory backends during
> > migration to prevent other possible problems. Anyway, the user can't use
> > this memory because device hotplug/hotunplug is disabled.
> 
> Hi
> 
> My understanding is that we already disable memory hotplug/unplug during
> migration. At least, the idea of those patches was to disable all
> hotplug/unplug during migration.  The only reason that I can think of for
> using this patch is if anyone is planning to support some
> hotplug/unplug during migration (to my knowledge, nobody is working on
> that).
> 
> So, I think it is better to just disallow all hotplug operations at a
> high level, instead of in every backend.
> 
> But will wait to see what everybody else thinks about it.

FWIW, libvirt already rejects any API call that would change guest
ABI while migration is taking place, which covers any hotplug or
hotunplug operation for devices or memory or anything else.

In fact libvirt whitelists the QMP commands it is willing to allow
during migration, restricting them to those it knows are safe to use.
Libvirt's whitelist is quite narrow since there are many QMP commands
it'll never run at any time.

> BTW,   if we plan to ship this for an old qemu, I will try to disable
> hotplug for all devices, not only memory, no?

Perhaps QMP could have some concept of a dynamic whitelist of commands:
at the start of migration, the migration code would register a whitelist
with QMP, which would then reject everything else upfront. This would avoid
needing to put migration checks into every QMP command handler that can
cause problems?
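
As a rough sketch of that idea (not an existing QEMU interface; every demo_*
name is hypothetical, and the command list is only an illustrative subset, not
libvirt's actual whitelist), the dispatcher could consult one table before
running any handler:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of commands assumed safe during migration. */
static const char *const demo_migration_whitelist[] = {
    "query-migrate",
    "migrate_cancel",
    "query-status",
};

/* Would be set by the migration code at start and cleared at the end. */
static bool demo_migration_active;

/* Single check in the dispatcher instead of one per command handler. */
static bool demo_qmp_command_allowed(const char *name)
{
    size_t i;
    size_t n = sizeof(demo_migration_whitelist) /
               sizeof(demo_migration_whitelist[0]);

    assert(name != NULL);
    if (!demo_migration_active) {
        return true;
    }
    for (i = 0; i < n; i++) {
        if (strcmp(name, demo_migration_whitelist[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

With this shape, device_add would be rejected while demo_migration_active is
set, and no individual handler needs to know about migration.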

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt"

2019-04-03 Thread Igor Mammedov

On Tue, 2 Apr 2019 17:38:26 +0200
Laszlo Ersek  wrote:

> On 04/02/19 17:29, Auger Eric wrote:
> > Hi Laszlo,
> > 
> > On 4/1/19 3:07 PM, Laszlo Ersek wrote:  
> >> On 03/29/19 14:56, Auger Eric wrote:  
> >>> Hi Ard,
> >>>
> >>> On 3/29/19 2:14 PM, Ard Biesheuvel wrote:  
>  On Fri, 29 Mar 2019 at 14:12, Auger Eric  wrote:  
> >
> > Hi Shameer,
> >
> > On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote:  
> >>
> >>  
> >>> -Original Message-
> >>> From: Auger Eric [mailto:eric.au...@redhat.com]
> >>> Sent: 29 March 2019 09:32
> >>> To: Shameerali Kolothum Thodi ;
> >>> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com;
> >>> peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> >>> sa...@linux.intel.com; sebastien.bo...@intel.com
> >>> Cc: Linuxarm ; xuwei (O) ;
> >>> Laszlo Ersek ; Ard Biesheuvel
> >>> ; Leif Lindholm 
> >>> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature 
> >>> "fdt"
> >>>
> >>> Hi Shameer,
> >>>
> >>> [ + Laszlo, Ard, Leif ]
> >>>
> >>> On 3/21/19 11:47 AM, Shameer Kolothum wrote:  
>  This is to disable/enable populating DT nodes in case
>  any conflict with acpi tables. The default is "off".  
> >>> The name of the option sounds misleading to me. Also we don't really
> >>> know the scope of the disablement. At the moment this just aims to
> >>> prevent the hotpluggable dt nodes from being added if we boot in ACPI 
> >>> mode.
> >>>  
> 
>  This will be used in subsequent patch where cold plug
>  device-memory support is added for DT boot.  
> >>> I am concerned about the fact that in dt mode, by default, you won't 
> >>> see
> >>> any PCDIMM nodes.  
> 
>  If DT memory node support is added for cold-plugged device
>  memory, those memory will be visible to Guest kernel via
>  UEFI GetMemoryMap() and gets treated as early boot memory.  
> >>> Don't we have an issue in UEFI then. Normally the SRAT indicates 
> >>> whether
> >>> the slots are hotpluggable or not. Shouldn't the UEFI code look at 
> >>> this
> >>> info.  
> >>
> >> Sorry I missed this part. Yes, that will be a cleaner solution.
> >>
> >> Also, to be more clear on what happens,
> >>
> >> Guest ACPI boot with "fdt=on" ,
> >>
> >> From kernel log,
> >>
> >> [0.00] Early memory node ranges
> >> [0.00]   node   0: [mem 0x4000-0xbbf5]
> >> [0.00]   node   0: [mem 0xbbf6-0xbbff]
> >> [0.00]   node   0: [mem 0xbc00-0xbc02]
> >> [0.00]   node   0: [mem 0xbc03-0xbc36]
> >> [0.00]   node   0: [mem 0xbc37-0xbf64]
> >> [0.00]   node   0: [mem 0xbf65-0xbf6d]
> >> [0.00]   node   0: [mem 0xbf6e-0xbf6e]
> >> [0.00]   node   0: [mem 0xbf6f-0xbf80]
> >> [0.00]   node   0: [mem 0xbf81-0xbfff]
> >> [0.00]   node   0: [mem 0xc000-0x] 
> >>  --> This is the hotpluggable memory node from DT.
> >> [0.00] Zeroed struct page in unavailable ranges: 1040 pages
> >> [0.00] Initmem setup node 0 [mem 
> >> 0x4000-0x]
> >>
> >>
> >> Guest ACPI boot with "fdt=off" ,
> >>
> >> [0.00] Movable zone start for each node
> >> [0.00] Early memory node ranges
> >> [0.00]   node   0: [mem 0x4000-0xbbf5]
> >> [0.00]   node   0: [mem 0xbbf6-0xbbff]
> >> [0.00]   node   0: [mem 0xbc00-0xbc02]
> >> [0.00]   node   0: [mem 0xbc03-0xbc36]
> >> [0.00]   node   0: [mem 0xbc37-0xbf64]
> >> [0.00]   node   0: [mem 0xbf65-0xbf6d]
> >> [0.00]   node   0: [mem 0xbf6e-0xbf6e]
> >> [0.00]   node   0: [mem 0xbf6f-0xbf80]
> >> [0.00]   node   0: [mem 0xbf81-0xbfff]
> >> [0.00] Zeroed struct page in unavailable ranges: 1040 pages
> >> [0.00] Initmem setup node 0 [mem 
> >> 0x4000-0xbfff
> >>
> >> The hotpluggable memory node is absent from early memory nodes here.  
> >
> > OK thank you for the example illustrating the concern.  
> >>
> >> As you said, it could be possible to detect this node using SRAT in 
> >> UEFI.  
> >
> > Let's wait for EDK2 experts on this.
> >  
> 
>  Happy to chime in,

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Richard Henderson

On 4/3/19 6:25 AM, Aleksandar Markovic wrote:
> 
> On Apr 2, 2019 5:20 PM, "Mateja Marjanovic" wrote:
>>
>> From: Mateja Marjanovic
>>
>> Optimize set of MSA instructions ILVEV, using directly
> 
> Use full instruction names, with the only exception of possible Backus-Naur
> forms... again.
> 
>> tcg registers and performing logic on them instead of
>> using helpers.
>>
>> In the following table, the first column is the performance
>> before this patch. The second represents the performance,
>> after converting from helpers to tcg, but without using
>> tcg_gen_deposit function. The third one is the solution
>> which is implemented in this patch.
>>
>>  instr    ||   before    || no-deposit ||  with-deposit
>> 
>>  ilvev.b  ||  126.92 ms  ||  24.52 ms  ||  24.43 ms
>>  ilvev.h  ||   93.67 ms  ||  23.92 ms  ||  23.86 ms
>>  ilvev.w  ||  117.86 ms  ||  23.83 ms  ||  22.17 ms
>>  ilvev.d  ||   45.49 ms  ||  19.74 ms  ||  19.71 ms
>>
>> The solution with deposit is suggested by Richard Henderson.
>>
>> Signed-off-by: Mateja Marjanovic
>> ---
> 
> The byte and halfword cases of this patch most likely produce highly
> unoptimized code for cases:
> 
> wd == wt == ws
> wd == wt != ws
> wd != ws == wt
> wd == ws != wt
> 
> Please take these cases into account.

I beg to differ.  We want to minimize the number of special cases.

If you multiply the different cases like this you also multiply the maintenance
overhead.  You force future maintainers to wonder if the cases are truly
distinct or if they are merely an optimization.

The only special cases that I advocate that you add are driven by standard
macros that the assembler generates -- e.g. register move (via or), register
negate (via nor), etc.


r~

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Aleksandar Markovic

> > From: Mateja Marjanovic 
> > Subject: Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize
> > ILVEV. MSA instructions
> > 
> > 
> > On 3.4.19. 01:25, Aleksandar Markovic wrote:
> > 
> > On Apr 2, 2019 5:20 PM, "Mateja Marjanovic"
> > <mateja.marjano...@rt-rk.com> wrote:
> > >
> > > From: Mateja Marjanovic <mateja.marjano...@rt-rk.com>
> > >
> > > Optimize set of MSA instructions ILVEV
> > >
> > > ...
> > 
> > The byte and halfword cases of this patch most likely produce highly
> > unoptimized code for cases:
> > 
> > wd == wt == ws
> > wd == wt != ws
> > wd != ws == wt
> > wd == ws != wt
> > 
> > Please take these cases into account.
> > 
> > The same for patch 1/2.
> 
> Maybe, but if I add if statements checking whether the registers are the
> same, it would affect the performance significantly in all cases. If some
> registers were equal, it would be faster, but if not, just those if statements
> would slow things down.

Mateja,

It won't affect the performance significantly at all. Distinguish between
the code executed at translate time (rarely) and at execute time (often).
The if statements you mention are executed at translate time.

Thanks,
Aleksandar

[Qemu-devel] [PULL for-4.0 1/4] hw/vfio/ccw: avoid taking address members in packed structs

2019-04-03 Thread Cornelia Huck

From: Daniel P. Berrangé 

The GCC 9 compiler complains about many places in s390 code
that take the address of members of the 'struct SCHIB' which
is marked packed:

hw/vfio/ccw.c: In function ‘vfio_ccw_io_notifier_handler’:
hw/vfio/ccw.c:133:15: warning: taking address of packed member of ‘struct 
SCHIB’ may result in an unaligned pointer value \
[-Waddress-of-packed-member]
  133 | SCSW *s = &sch->curr_status.scsw;
  |   ^~
hw/vfio/ccw.c:134:15: warning: taking address of packed member of ‘struct 
SCHIB’ may result in an unaligned pointer value \
[-Waddress-of-packed-member]
  134 | PMCW *p = &sch->curr_status.pmcw;
  |   ^~

...snip many more...

Almost all of these are just done for convenience to avoid
typing out long variable/field names when referencing struct
members. We can get most of this convenience by taking the
address of the 'struct SCHIB' instead, avoiding triggering
the compiler warnings.

In a couple of places we copy via a local variable which is
a technique already applied elsewhere in s390 code for this
problem.
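
The two techniques described above can be sketched in isolation. The types
below are hypothetical stand-ins with an illustrative layout, not the real
SCSW/SCHIB definitions, and the attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inner struct with natural 4-byte alignment. */
typedef struct {
    uint32_t ctrl;
    uint32_t cpa;
} DemoScsw;

/* Hypothetical packed outer struct: 'scsw' lands at offset 1. */
typedef struct {
    uint8_t pad;
    DemoScsw scsw;
} __attribute__((packed)) DemoSchib;

/* A callee that expects a normally aligned DemoScsw pointer. */
static void demo_set_pending(DemoScsw *s)
{
    s->ctrl |= 0x1;
}

/*
 * Technique 1: keep a pointer to the packed struct itself and name the
 * member on each access, so the compiler knows it may be unaligned.
 */
static void demo_mark_direct(DemoSchib *schib)
{
    assert(schib != NULL);
    schib->scsw.ctrl |= 0x2;
}

/*
 * Technique 2: copy the member into an aligned local, pass the local's
 * address to the callee, then copy the result back.
 */
static void demo_mark_via_copy(DemoSchib *schib)
{
    DemoScsw s = schib->scsw;

    demo_set_pending(&s);
    schib->scsw = s;
}
```

Writing `DemoScsw *s = &schib->scsw;` instead is the pattern that GCC 9
reports with -Waddress-of-packed-member.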

Signed-off-by: Daniel P. Berrangé 
Message-Id: <2019032904.17223-12-berra...@redhat.com>
Reviewed-by: Eric Farman 
Reviewed-by: Halil Pasic 
Reviewed-by: Farhan Ali 
Signed-off-by: Cornelia Huck 
---
 hw/vfio/ccw.c | 42 ++
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 9246729a75d6..c44d13cc5081 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -130,8 +130,8 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
 S390CCWDevice *cdev = S390_CCW_DEVICE(vcdev);
 CcwDevice *ccw_dev = CCW_DEVICE(cdev);
 SubchDev *sch = ccw_dev->sch;
-SCSW *s = &sch->curr_status.scsw;
-PMCW *p = &sch->curr_status.pmcw;
+SCHIB *schib = &sch->curr_status;
+SCSW s;
 IRB irb;
 int size;
 
@@ -145,33 +145,33 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
 switch (errno) {
 case ENODEV:
 /* Generate a deferred cc 3 condition. */
-s->flags |= SCSW_FLAGS_MASK_CC;
-s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
-s->ctrl |= (SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND);
+schib->scsw.flags |= SCSW_FLAGS_MASK_CC;
+schib->scsw.ctrl &= ~SCSW_CTRL_MASK_STCTL;
+schib->scsw.ctrl |= (SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND);
 goto read_err;
 case EFAULT:
 /* Memory problem, generate channel data check. */
-s->ctrl &= ~SCSW_ACTL_START_PEND;
-s->cstat = SCSW_CSTAT_DATA_CHECK;
-s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
-s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+schib->scsw.ctrl &= ~SCSW_ACTL_START_PEND;
+schib->scsw.cstat = SCSW_CSTAT_DATA_CHECK;
+schib->scsw.ctrl &= ~SCSW_CTRL_MASK_STCTL;
+schib->scsw.ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
 goto read_err;
 default:
 /* Error, generate channel program check. */
-s->ctrl &= ~SCSW_ACTL_START_PEND;
-s->cstat = SCSW_CSTAT_PROG_CHECK;
-s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
-s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+schib->scsw.ctrl &= ~SCSW_ACTL_START_PEND;
+schib->scsw.cstat = SCSW_CSTAT_PROG_CHECK;
+schib->scsw.ctrl &= ~SCSW_CTRL_MASK_STCTL;
+schib->scsw.ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
 goto read_err;
 }
 } else if (size != vcdev->io_region_size) {
 /* Information transfer error, generate channel-control check. */
-s->ctrl &= ~SCSW_ACTL_START_PEND;
-s->cstat = SCSW_CSTAT_CHN_CTRL_CHK;
-s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
-s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+schib->scsw.ctrl &= ~SCSW_ACTL_START_PEND;
+schib->scsw.cstat = SCSW_CSTAT_CHN_CTRL_CHK;
+schib->scsw.ctrl &= ~SCSW_CTRL_MASK_STCTL;
+schib->scsw.ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
 goto read_err;
 }
@@ -179,11 +179,13 @@ static void vfio_ccw_io_notifier_handler(void *opaque)
 memcpy(&irb, region->irb_area, sizeof(IRB));
 
 /* Update control block via irb. */
-copy_scsw_to_guest(s, &irb.scsw);
+s = schib->scsw;
+copy_scsw_to_guest(&s, &irb.scsw);
+schib->scsw = s;
 
 /* If a uint check is pending, copy sense data. */
-if ((s->dstat & SCSW_DSTAT_UNIT_CHECK) &&
-(p->chars & PMCW_CHARS_MASK_CSENSE)) {
+if ((schib->scsw.dstat & SCSW_DSTAT_UNIT_CHECK) &&
+(schib->pmcw.chars & PMCW_CHARS_MASK_CSENSE)) {
 memcpy(sch->sense_da

[Qemu-devel] [PULL for-4.0 4/4] hw/s390x/3270-ccw: avoid taking address of fields in packed struct

2019-04-03 Thread Cornelia Huck

From: Daniel P. Berrangé 

Compiling with GCC 9 complains

hw/s390x/3270-ccw.c: In function ‘emulated_ccw_3270_cb’:
hw/s390x/3270-ccw.c:81:19: error: taking address of packed member of ‘struct 
SCHIB’ may result in an unaligned pointer value 
[-Werror=address-of-packed-member]
   81 | SCSW *s = &sch->curr_status.scsw;
  |   ^~

This local variable is only present to save a little bit of
typing when setting the field later. Get rid of this to avoid
the warning about unaligned accesses.

Signed-off-by: Daniel P. Berrangé 
Message-Id: <2019032904.17223-15-berra...@redhat.com>
Reviewed-by: David Hildenbrand 
Reviewed-by: Thomas Huth 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/3270-ccw.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/3270-ccw.c b/hw/s390x/3270-ccw.c
index 2c8d16ccf7e2..14882242c3de 100644
--- a/hw/s390x/3270-ccw.c
+++ b/hw/s390x/3270-ccw.c
@@ -78,13 +78,13 @@ static int emulated_ccw_3270_cb(SubchDev *sch, CCW1 ccw)
 
 if (rc == -EIO) {
 /* I/O error, specific devices generate specific conditions */
-SCSW *s = &sch->curr_status.scsw;
+SCHIB *schib = &sch->curr_status;
 
 sch->curr_status.scsw.dstat = SCSW_DSTAT_UNIT_CHECK;
 sch->sense_data[0] = 0x40;/* intervention-req */
-s->ctrl &= ~SCSW_ACTL_START_PEND;
-s->ctrl &= ~SCSW_CTRL_MASK_STCTL;
-s->ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+schib->scsw.ctrl &= ~SCSW_ACTL_START_PEND;
+schib->scsw.ctrl &= ~SCSW_CTRL_MASK_STCTL;
+schib->scsw.ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
 }
 
-- 
2.17.2

[Qemu-devel] [PULL for-4.0 0/4] s390x gcc 9 warning fixes

2019-04-03 Thread Cornelia Huck

The following changes since commit 061b51e9195670e9d190cdec46fabcb3c77763fb:

  Update version for v4.0.0-rc2 release (2019-04-02 17:01:20 +0100)

are available in the Git repository at:

  https://github.com/cohuck/qemu tags/s390x-20190403

for you to fetch changes up to 7357b2215978debf2fd17b525ba745d3c69272a3:

  hw/s390x/3270-ccw: avoid taking address of fields in packed struct 
(2019-04-03 11:19:57 +0200)


Fix taking address of fields in packed structs warnings
by gcc 9



Daniel P. Berrangé (4):
  hw/vfio/ccw: avoid taking address members in packed structs
  hw/s390/css: avoid taking address members in packed structs
  hw/s390x/ipl: avoid taking address of fields in packed struct
  hw/s390x/3270-ccw: avoid taking address of fields in packed struct

 hw/s390x/3270-ccw.c |   8 +-
 hw/s390x/css.c  | 388 +---
 hw/s390x/ipl.c  |  12 +-
 hw/vfio/ccw.c   |  42 ++---
 4 files changed, 218 insertions(+), 232 deletions(-)

-- 
2.17.2

[Qemu-devel] [PULL for-4.0 3/4] hw/s390x/ipl: avoid taking address of fields in packed struct

2019-04-03 Thread Cornelia Huck

From: Daniel P. Berrangé 

Compiling with GCC 9 complains

hw/s390x/ipl.c: In function ‘s390_ipl_set_boot_menu’:
hw/s390x/ipl.c:256:25: warning: taking address of packed member of ‘struct 
QemuIplParameters’ may result in an unaligned pointer value 
[-Waddress-of-packed-member]
  256 | uint32_t *timeout = &ipl->qipl.boot_menu_timeout;
  | ^~~~

This local variable is only present to save a little bit of
typing when setting the field later. Get rid of this to avoid
the warning about unaligned accesses.

Signed-off-by: Daniel P. Berrangé 
Message-Id: <2019032904.17223-14-berra...@redhat.com>
Reviewed-by: David Hildenbrand 
Reviewed-by: Thomas Huth 
Reviewed-by: Farhan Ali 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/ipl.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c
index 896888bf8f00..51b272e190a9 100644
--- a/hw/s390x/ipl.c
+++ b/hw/s390x/ipl.c
@@ -252,8 +252,6 @@ static void s390_ipl_set_boot_menu(S390IPLState *ipl)
 {
 QemuOptsList *plist = qemu_find_opts("boot-opts");
 QemuOpts *opts = QTAILQ_FIRST(&plist->head);
-uint8_t *flags = &ipl->qipl.qipl_flags;
-uint32_t *timeout = &ipl->qipl.boot_menu_timeout;
 const char *tmp;
 unsigned long splash_time = 0;
 
@@ -269,7 +267,7 @@ static void s390_ipl_set_boot_menu(S390IPLState *ipl)
 case S390_IPL_TYPE_CCW:
 /* In the absence of -boot menu, use zipl parameters */
 if (!qemu_opt_get(opts, "menu")) {
-*flags |= QIPL_FLAG_BM_OPTS_ZIPL;
+ipl->qipl.qipl_flags |= QIPL_FLAG_BM_OPTS_ZIPL;
 return;
 }
 break;
@@ -286,23 +284,23 @@ static void s390_ipl_set_boot_menu(S390IPLState *ipl)
 return;
 }
 
-*flags |= QIPL_FLAG_BM_OPTS_CMD;
+ipl->qipl.qipl_flags |= QIPL_FLAG_BM_OPTS_CMD;
 
 tmp = qemu_opt_get(opts, "splash-time");
 
 if (tmp && qemu_strtoul(tmp, NULL, 10, &splash_time)) {
 error_report("splash-time is invalid, forcing it to 0");
-*timeout = 0;
+ipl->qipl.boot_menu_timeout = 0;
 return;
 }
 
 if (splash_time > 0x) {
 error_report("splash-time is too large, forcing it to max value");
-*timeout = 0x;
+ipl->qipl.boot_menu_timeout = 0x;
 return;
 }
 
-*timeout = cpu_to_be32(splash_time);
+ipl->qipl.boot_menu_timeout = cpu_to_be32(splash_time);
 }
 
 static CcwDevice *s390_get_ccw_device(DeviceState *dev_st)
-- 
2.17.2

[Qemu-devel] [PULL for-4.0 2/4] hw/s390/css: avoid taking address members in packed structs

2019-04-03 Thread Cornelia Huck

From: Daniel P. Berrangé 

The GCC 9 compiler complains about many places in s390 code
that take the address of members of the 'struct SCHIB' which
is marked packed:

hw/s390x/css.c: In function ‘sch_handle_clear_func’:
hw/s390x/css.c:698:15: warning: taking address of packed member of ‘struct 
SCHIB’ may result in an unaligned pointer value \
[-Waddress-of-packed-member]
  698 | PMCW *p = &sch->curr_status.pmcw;
  |   ^~
hw/s390x/css.c:699:15: warning: taking address of packed member of ‘struct 
SCHIB’ may result in an unaligned pointer value \
[-Waddress-of-packed-member]
  699 | SCSW *s = &sch->curr_status.scsw;
  |   ^~

...snip many more...

Almost all of these are just done for convenience to avoid
typing out long variable/field names when referencing struct
members. We can get most of this convenience by taking the
address of the 'struct SCHIB' instead, avoiding triggering
the compiler warnings.

In a couple of places we copy via a local variable which is
a technique already applied elsewhere in s390 code for this
problem.

Signed-off-by: Daniel P. Berrangé 
Message-Id: <2019032904.17223-13-berra...@redhat.com>
Reviewed-by: Thomas Huth 
Reviewed-by: Halil Pasic 
Signed-off-by: Cornelia Huck 
---
 hw/s390x/css.c | 388 -
 1 file changed, 187 insertions(+), 201 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index f92b046cd33e..8fc9e35ba5d3 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -695,35 +695,32 @@ void css_adapter_interrupt(CssIoAdapterType type, uint8_t 
isc)
 
 static void sch_handle_clear_func(SubchDev *sch)
 {
-PMCW *p = &sch->curr_status.pmcw;
-SCSW *s = &sch->curr_status.scsw;
+SCHIB *schib = &sch->curr_status;
 int path;
 
 /* Path management: In our simple css, we always choose the only path. */
 path = 0x80;
 
 /* Reset values prior to 'issuing the clear signal'. */
-p->lpum = 0;
-p->pom = 0xff;
-s->flags &= ~SCSW_FLAGS_MASK_PNO;
+schib->pmcw.lpum = 0;
+schib->pmcw.pom = 0xff;
+schib->scsw.flags &= ~SCSW_FLAGS_MASK_PNO;
 
 /* We always 'attempt to issue the clear signal', and we always succeed. */
 sch->channel_prog = 0x0;
 sch->last_cmd_valid = false;
-s->ctrl &= ~SCSW_ACTL_CLEAR_PEND;
-s->ctrl |= SCSW_STCTL_STATUS_PEND;
+schib->scsw.ctrl &= ~SCSW_ACTL_CLEAR_PEND;
+schib->scsw.ctrl |= SCSW_STCTL_STATUS_PEND;
 
-s->dstat = 0;
-s->cstat = 0;
-p->lpum = path;
+schib->scsw.dstat = 0;
+schib->scsw.cstat = 0;
+schib->pmcw.lpum = path;
 
 }
 
 static void sch_handle_halt_func(SubchDev *sch)
 {
-
-PMCW *p = &sch->curr_status.pmcw;
-SCSW *s = &sch->curr_status.scsw;
+SCHIB *schib = &sch->curr_status;
 hwaddr curr_ccw = sch->channel_prog;
 int path;
 
@@ -733,20 +730,22 @@ static void sch_handle_halt_func(SubchDev *sch)
 /* We always 'attempt to issue the halt signal', and we always succeed. */
 sch->channel_prog = 0x0;
 sch->last_cmd_valid = false;
-s->ctrl &= ~SCSW_ACTL_HALT_PEND;
-s->ctrl |= SCSW_STCTL_STATUS_PEND;
+schib->scsw.ctrl &= ~SCSW_ACTL_HALT_PEND;
+schib->scsw.ctrl |= SCSW_STCTL_STATUS_PEND;
 
-if ((s->ctrl & (SCSW_ACTL_SUBCH_ACTIVE | SCSW_ACTL_DEVICE_ACTIVE)) ||
-!((s->ctrl & SCSW_ACTL_START_PEND) ||
-  (s->ctrl & SCSW_ACTL_SUSP))) {
-s->dstat = SCSW_DSTAT_DEVICE_END;
+if ((schib->scsw.ctrl & (SCSW_ACTL_SUBCH_ACTIVE |
+ SCSW_ACTL_DEVICE_ACTIVE)) ||
+!((schib->scsw.ctrl & SCSW_ACTL_START_PEND) ||
+  (schib->scsw.ctrl & SCSW_ACTL_SUSP))) {
+schib->scsw.dstat = SCSW_DSTAT_DEVICE_END;
 }
-if ((s->ctrl & (SCSW_ACTL_SUBCH_ACTIVE | SCSW_ACTL_DEVICE_ACTIVE)) ||
-(s->ctrl & SCSW_ACTL_SUSP)) {
-s->cpa = curr_ccw + 8;
+if ((schib->scsw.ctrl & (SCSW_ACTL_SUBCH_ACTIVE |
+ SCSW_ACTL_DEVICE_ACTIVE)) ||
+(schib->scsw.ctrl & SCSW_ACTL_SUSP)) {
+schib->scsw.cpa = curr_ccw + 8;
 }
-s->cstat = 0;
-p->lpum = path;
+schib->scsw.cstat = 0;
+schib->pmcw.lpum = path;
 
 }
 
@@ -,9 +1110,7 @@ static int css_interpret_ccw(SubchDev *sch, hwaddr 
ccw_addr,
 
 static void sch_handle_start_func_virtual(SubchDev *sch)
 {
-
-PMCW *p = &sch->curr_status.pmcw;
-SCSW *s = &sch->curr_status.scsw;
+SCHIB *schib = &sch->curr_status;
 int path;
 int ret;
 bool suspend_allowed;
@@ -1121,27 +1118,27 @@ static void sch_handle_start_func_virtual(SubchDev *sch)
 /* Path management: In our simple css, we always choose the only path. */
 path = 0x80;
 
-if (!(s->ctrl & SCSW_ACTL_SUSP)) {
+if (!(schib->scsw.ctrl & SCSW_ACTL_SUSP)) {
 /* Start Function triggered via ssch, i.e. we have an ORB */
 ORB *orb = &sch->orb;
-s->cstat = 0;
-s->dstat = 0;
+schib->scsw.cstat = 0;
+

Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize ILVEV. MSA instructions

2019-04-03 Thread Aleksandar Markovic

> From: Richard Henderson 
> Subject: Re: [Qemu-devel] [PATCH v5 2/2] target/mips: Optimize
> ILVEV. MSA instructions
> 
> On 4/3/19 6:25 AM, Aleksandar Markovic wrote:
> >
> > On Apr 2, 2019 5:20 PM, "Mateja Marjanovic" wrote:
> >>
> >> From: Mateja Marjanovic
> >>
> >> Optimize set of MSA instructions ILVEV, using directly
> >
> > Use full instruction names, with the only exception of possible Backus-Naur
> > forms... again.
> >
> >> tcg registers and performing logic on them instead of
> >> using helpers.
> >>
> >> In the following table, the first column is the performance
> >> before this patch. The second represents the performance,
> >> after converting from helpers to tcg, but without using
> >> tcg_gen_deposit function. The third one is the solution
> >> which is implemented in this patch.
> >>
> >>  instr    ||   before    || no-deposit ||  with-deposit
> >> 
> >>  ilvev.b  ||  126.92 ms  ||  24.52 ms  ||  24.43 ms
> >>  ilvev.h  ||   93.67 ms  ||  23.92 ms  ||  23.86 ms
> >>  ilvev.w  ||  117.86 ms  ||  23.83 ms  ||  22.17 ms
> >>  ilvev.d  ||   45.49 ms  ||  19.74 ms  ||  19.71 ms
> >>
> >> The solution with deposit is suggested by Richard Henderson.
> >>
> >> Signed-off-by: Mateja Marjanovic
> >> ---
> >
> > The byte and halfword cases of this patch most likely produce highly
> > unoptimized code for cases:
> >
> > wd == wt == ws
> > wd == wt != ws
> > wd != ws == wt
> > wd == ws != wt
> >
> > Please take these cases into account.
> 
> I beg to differ.  We want to minimize the number of special cases.
> 
> If you multiply the different cases like this you also multiply the
> maintenance overhead.  You force future maintainers to wonder if the cases
> are truly distinct or if they are merely an optimization.
> 

I find your objection hard to understand.

The subject and the goal of the patch is obviously optimization. If there is
a risk that the resulting source code becomes unclear, this is easily alleviated
by, for example, inserting informative comments, as is routinely done in
many other areas of QEMU or any software product.

Sincerely,
Aleksandar

> The only special cases that I advocate that you add are driven by standard
> macros that the assembler generates -- e.g. register move (via or), register
> negate (via nor), etc.
> 
> 
> r~

Re: [Qemu-devel] [PATCH v2] vmstate: check subsection_found is enough

2019-04-03 Thread Dr. David Alan Gilbert

* Wei Yang (richardw.y...@linux.intel.com) wrote:
> subsection_found being true implies that vmdesc is not NULL.
> 
> This patch removes the additional check on vmdesc and renames
> subsection_found to vmdesc_has_subsections to make it more self-explanatory.
> 
> Signed-off-by: Wei Yang 

Thanks,

Reviewed-by: Dr. David Alan Gilbert 

> 
> ---
> v2:
>   * rename it to vmdesc_has_subsections
> ---
>  migration/vmstate.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index e2bbb7b5f7..1305d1a528 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -496,7 +496,7 @@ static int vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
> void *opaque, QJSON *vmdesc)
>  {
>  const VMStateDescription **sub = vmsd->subsections;
> -bool subsection_found = false;
> +bool vmdesc_has_subsections = false;
>  int ret = 0;
>  
>  trace_vmstate_subsection_save_top(vmsd->name);
> @@ -508,9 +508,9 @@ static int vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
>  trace_vmstate_subsection_save_loop(vmsd->name, vmsdsub->name);
>  if (vmdesc) {
>  /* Only create subsection array when we have any */
> -if (!subsection_found) {
> +if (!vmdesc_has_subsections) {
>  json_start_array(vmdesc, "subsections");
> -subsection_found = true;
> +vmdesc_has_subsections = true;
>  }
>  
>  json_start_object(vmdesc, NULL);
> @@ -533,7 +533,7 @@ static int vmstate_subsection_save(QEMUFile *f, const VMStateDescription *vmsd,
>  sub++;
>  }
>  
> -if (vmdesc && subsection_found) {
> +if (vmdesc_has_subsections) {
>  json_end_array(vmdesc);
>  }
>  
> -- 
> 2.19.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
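
The invariant the patch relies on can be shown with a small model. This is a sketch only, with hypothetical names (`desc_sketch`, `save_subsections_sketch`) standing in for the QJSON description and the save loop: the flag is set exclusively inside the `if (desc)` branch, so a true flag already proves the pointer is non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the JSON description; counts array open/close calls. */
struct desc_sketch {
    int arrays_opened;
    int arrays_closed;
};

static void save_subsections_sketch(struct desc_sketch *desc, int nr_needed)
{
    bool desc_has_subsections = false;

    for (int i = 0; i < nr_needed; i++) {
        if (desc) {
            /* Open the array lazily, only when a subsection exists. */
            if (!desc_has_subsections) {
                desc->arrays_opened++;
                desc_has_subsections = true;
            }
        }
    }

    /* No "desc &&" needed: the flag is only ever set when desc != NULL. */
    if (desc_has_subsections) {
        desc->arrays_closed++;
    }
}
```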

[Qemu-devel] [PATCH v4 0/5] virtio pmem driver

2019-04-03 Thread Pankaj Gupta

 This patch series implements "virtio pmem".
 "virtio pmem" is fake persistent memory (nvdimm) in the guest
 which allows bypassing the guest page cache. It also
 implements a VIRTIO based asynchronous flush mechanism.
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v3. Tested with Qemu side device 
 emulation [6] for virtio-pmem. 

 We have incorporated all the suggestions in V3. Documented 
 the impact of possible page cache side channel attacks with 
 suggested countermeasures.
 
 Details of the project idea for the 'virtio pmem' flushing interface
 are shared in [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
----------------------------------
   - Reads persistent memory range from paravirt device and 
 registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
 persistent memory region and setup filesystem operations 
 to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
 interface to flush from guest to host.

2. Qemu virtio-pmem device
--------------------------
   - Creates virtio pmem device and exposes a memory range to 
 KVM guest. 
   - At host side this is file backed memory which acts as 
 persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
 for asynchronous guest multi request handling. 

   David Hildenbrand (CCed) also posted a modified version [7] of
   the qemu virtio-pmem code based on the updated Qemu memory device API.

 Virtio-pmem security implications and countermeasures:
 ------------------------------------------------------

 In the previous posting of the kernel driver, there was discussion [9]
 on possible implications of page cache side channel attacks with
 virtio pmem. After thorough analysis of the details of known side
 channel attacks, below are the suggestions:

 - The exposure depends entirely on how the host backing image file
   is mapped into the guest address space.

 - In virtio-pmem device emulation, a shared mapping is used by default
   to map the host backing file. It is recommended to use a separate
   backing file at host side for every guest. This will prevent
   any possibility of executing common code from multiple guests
   and any chance of inferring guest local data based on
   execution time.

 - If the backing file is required to be shared among multiple guests,
   it is recommended not to support host page cache eviction
   commands from the guest driver. This will avoid any possibility
   of inferring guest local data or host data from another guest.

 - Proposed device specification [8] for virtio-pmem device with 
   details of possible security implications and suggested 
   countermeasures for device emulation.

 Virtio-pmem error handling:
 
  Checked the behaviour of virtio-pmem for the types of errors below.
  Suggestions are needed on the expected behaviour for handling these errors.

  - Hardware Errors: Uncorrectable recoverable Errors:
  a] virtio-pmem:
    - As per current logic, if the error page belongs to the Qemu process,
      the host MCE handler isolates (hwpoison) that page and sends SIGBUS.
      The Qemu SIGBUS handler injects an exception into the KVM guest.
    - The KVM guest then isolates the page and sends SIGBUS to the guest
      userspace process which has mapped the page.

  b] Existing implementation for ACPI pmem driver:
    - Handles such errors with an MCE notifier and creates a list
      of bad blocks. Read/direct access DAX operations return EIO
      if the accessed memory page falls in the bad block list.
    - It also starts background scrubbing.
    - Similar functionality can be reused in virtio-pmem with an MCE
      notifier but without scrubbing (no ACPI/ARS)? Inputs are needed to
      confirm whether this behaviour is ok or needs any change.

Changes from PATCH v3: [1]

- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: [2]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Replace BSD license text with SPDX license text

[Qemu-devel] [PATCH v4 5/5] xfs: disable map_sync for async flush

2019-04-03 Thread Pankaj Gupta

Virtio pmem provides an asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem
and xfs.

Signed-off-by: Pankaj Gupta 
---
 fs/xfs/xfs_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 1f2e2845eb76..dced2eb8c91a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,6 +1203,14 @@ xfs_file_mmap(
if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with DAX files if
+* dax_device is not synchronous.
+*/
+   if (IS_DAX(file_inode(filp)) && !dax_synchronous(
+   xfs_find_daxdev_for_inode(file_inode(filp))) &&
+   (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(filp);
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(file_inode(filp)))
-- 
2.20.1
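
The mmap-time decision added here (and in the matching ext4 patch) can be modelled in userspace as below. This is a sketch under stated assumptions: `dax_device_sketch`, `DAXDEV_SYNC_SKETCH`, `VM_SYNC_SKETCH` and `mmap_check_sketch` are hypothetical stand-ins for the kernel's `struct dax_device`, `DAXDEV_SYNC`, `VM_SYNC` and the checks in the filesystem mmap handlers; the flag values are arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DAXDEV_SYNC_SKETCH 0x1ul  /* device flushes synchronously */
#define VM_SYNC_SKETCH     0x1ul  /* mapping was requested with MAP_SYNC */

struct dax_device_sketch {
    unsigned long flags;
};

static bool dax_synchronous_sketch(const struct dax_device_sketch *dax_dev)
{
    return (dax_dev->flags & DAXDEV_SYNC_SKETCH) != 0;
}

/* Returns 0 if the mapping may proceed, -EOPNOTSUPP otherwise. */
static int mmap_check_sketch(bool is_dax,
                             const struct dax_device_sketch *dax_dev,
                             unsigned long vm_flags)
{
    bool wants_sync = (vm_flags & VM_SYNC_SKETCH) != 0;

    if (!wants_sync) {
        return 0;               /* ordinary mappings are unaffected */
    }
    if (!is_dax) {
        return -EOPNOTSUPP;     /* pre-existing rule: MAP_SYNC needs DAX */
    }
    if (!dax_synchronous_sketch(dax_dev)) {
        return -EOPNOTSUPP;     /* new rule: async devices such as virtio pmem */
    }
    return 0;
}
```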

[Qemu-devel] [PATCH v4 2/5] virtio-pmem: Add virtio pmem driver

2019-04-03 Thread Pankaj Gupta

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta 
---
 drivers/nvdimm/virtio_pmem.c |  84 +
 drivers/virtio/Kconfig   |  10 +++
 drivers/virtio/Makefile  |   1 +
 drivers/virtio/pmem.c| 125 +++
 include/linux/virtio_pmem.h  |  60 +++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 +++
 7 files changed, 291 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index ..2a1b1ba2c1ff
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,84 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include 
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+   unsigned int len;
+   unsigned long flags;
+   struct virtio_pmem_request *req, *req_buf;
+   struct virtio_pmem *vpmem = vq->vdev->priv;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+   req->done = true;
+   wake_up(&req->host_acked);
+
+   if (!list_empty(&vpmem->req_list)) {
+   req_buf = list_first_entry(&vpmem->req_list,
+   struct virtio_pmem_request, list);
+   list_del(&vpmem->req_list);
+   req_buf->wq_buf_avail = true;
+   wake_up(&req_buf->wq_buf);
+   }
+   }
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+   int err;
+   unsigned long flags;
+   struct scatterlist *sgs[2], sg, ret;
+   struct virtio_device *vdev = nd_region->provider_data;
+   struct virtio_pmem *vpmem = vdev->priv;
+   struct virtio_pmem_request *req;
+
+   might_sleep();
+   req = kmalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   req->done = req->wq_buf_avail = false;
+   strcpy(req->name, "FLUSH");
+   init_waitqueue_head(&req->host_acked);
+   init_waitqueue_head(&req->wq_buf);
+   sg_init_one(&sg, req->name, strlen(req->name));
+   sgs[0] = &sg;
+   sg_init_one(&ret, &req->ret, sizeof(req->ret));
+   sgs[1] = &ret;
+
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+   if (err) {
+   dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+   list_add_tail(&vpmem->req_list, &req->list);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->wq_buf, req->wq_buf_avail);
+   spin_lock_irqsave(&vpmem->pmem_lock, flags);
+   }
+   virtqueue_kick(vpmem->req_vq);
+   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+   /* When host has read buffer, this completes via host_ack */
+   wait_event(req->host_acked, req->done);
+   err = req->ret;
+   kfree(req);
+
+   return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 35897649c24f..9f634a2ed638 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
  If unsure, say Y.
 
+config VIRTIO_PMEM
+   tristate "Support for virtio pmem driver"
+   depends on VIRTIO
+   depends on LIBNVDIMM
+   help
+   This driver provides support for virtio based flushing interface
+   for persistent memory range.
+
+   If unsure, say M.
+
 config VIRTIO_BALLOON
tristate "Virtio balloon driver"
depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..143ce91eabe9 100644
--- a/drivers/virtio/Makefile

[Qemu-devel] [PATCH v4 3/5] libnvdimm: add dax_dev sync flag

2019-04-03 Thread Pankaj Gupta

This patch adds the 'DAXDEV_SYNC' flag, which is set
for an nd_region doing synchronous flush. It is later
used to disable MAP_SYNC functionality on the
ext4 & xfs filesystems for devices that don't support
synchronous flush.

Signed-off-by: Pankaj Gupta 
---
 drivers/dax/bus.c|  2 +-
 drivers/dax/super.c  | 13 -
 drivers/md/dm.c  |  2 +-
 drivers/nvdimm/pmem.c|  3 ++-
 drivers/nvdimm/region_devs.c |  7 +++
 include/linux/dax.h  |  9 +++--
 include/linux/libnvdimm.h|  1 +
 7 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..431bf7d2a7f9 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL);
+   dax_dev = alloc_dax(dev_dax, NULL, NULL, true);
if (!dax_dev)
goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0a339b85133e..bd6509308d05 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+   /* flag to check if device supports synchronous flush */
+   DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+   return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
lockdep_assert_held(&dax_srcu);
@@ -511,7 +519,7 @@ static void dax_add_host(struct dax_device *dax_dev, const char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-   const struct dax_operations *ops)
+   const struct dax_operations *ops, bool sync)
 {
struct dax_device *dax_dev;
const char *host;
@@ -534,6 +542,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
+   if (sync)
+   set_bit(DAXDEV_SYNC, &dax_dev->flags);
+
return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 68d24056d0b1..534e12ca6329 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1965,7 +1965,7 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);
 
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+   dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops, true);
if (!dax_dev)
goto bad;
}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5a5b3ea4d073..78f71ba0e7cf 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -466,7 +466,8 @@ static int pmem_attach_disk(struct device *dev,
nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
disk->bb = &pmem->bb;
 
-   dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+   dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops,
+   is_nvdimm_sync(nd_region));
 
if (!dax_dev) {
put_disk(disk);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fb1041ab32a6..8c7aa047fe2b 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1231,6 +1231,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+   return is_nd_pmem(&nd_region->dev) &&
+   !test_bit(ND_REGION_ASYNC, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
struct nd_region *nd_region;
resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..9bdd50d06ef6 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -32,18 +32,19 @@ extern struct attribute_group dax_attribute_group;
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *dax_get_by_host(const char *host);
 struct dax_device *alloc_dax(void *private, const char *host,
-   const struct dax_operations *ops);
+   const struct dax_operations *ops, bool sync);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+bool dax_synchronous(struct dax_device *dax_dev);
 #

[Qemu-devel] [PATCH v4 1/5] libnvdimm: nd_region flush callback support

2019-04-03 Thread Pankaj Gupta

This patch adds functionality to perform flush from guest
to host over VIRTIO. We register a callback based on the
'nd_region' type. The virtio_pmem driver requires this special
flush function. For the rest of the region types we register the
existing flush function. Errors returned by a host fsync
failure are reported to userspace.

This also handles asynchronous flush requests from the block layer
by creating a child bio and chaining it with parent bio.

Signed-off-by: Pankaj Gupta 
---
 drivers/acpi/nfit/core.c |  4 ++--
 drivers/nvdimm/claim.c   |  6 --
 drivers/nvdimm/nd.h  |  1 +
 drivers/nvdimm/pmem.c| 14 -
 drivers/nvdimm/region_devs.c | 38 ++--
 include/linux/libnvdimm.h|  8 +++-
 6 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 5a389a4f4f65..567017a2190e 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
offset = to_interleave_offset(offset, mmio);
 
writeq(cmd, mmio->addr.base + offset);
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
}
 
if (rw)
-   nvdimm_flush(nfit_blk->nd_region);
+   nvdimm_flush(nfit_blk->nd_region, NULL, false);
 
rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..a1dfa066786b 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
sector_t sector = offset >> 9;
-   int rc = 0;
+   int rc = 0, ret = 0;
 
if (unlikely(!size))
return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
}
 
memcpy_flushcache(nsio->addr + offset, buf, size);
-   nvdimm_flush(to_nd_region(ndns->dev.parent));
+   ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL, false);
+   if (ret)
+   rc = ret;
 
return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..916cd6c5451a 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
struct badblocks bb;
struct nd_interleave_set *nd_set;
struct nd_percpu_lane __percpu *lane;
+   int (*flush)(struct nd_region *nd_region);
struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..5a5b3ea4d073 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+   int ret = 0;
blk_status_t rc = 0;
bool do_acct;
unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
 
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
nd_iostat_end(bio, start);
 
if (bio->bi_opf & REQ_FUA)
-   nvdimm_flush(nd_region);
+   ret = nvdimm_flush(nd_region, bio, true);
+
+   if (ret)
+   bio->bi_status = errno_to_blk_status(ret);
 
bio_endio(bio);
return BLK_QC_T_NONE;
@@ -463,13 +467,13 @@ static int pmem_attach_disk(struct device *dev,
disk->bb = &pmem->bb;
 
dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+
if (!dax_dev) {
put_disk(disk);
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;
-
gendev = disk_to_dev(disk);
gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +531,14 @@ static int nd_pmem_remove(struct device *dev)
sysfs_put(pmem->bb_state);
pmem->bb_state = NULL;
}
-   nvdimm_flush(to_nd_region(dev->parent));
+   nvdimm_flush(to_nd_region(dev->parent), NULL, false);
 
return 0;
 }
 
 static void nd_pmem_shutdown(struct device
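
The per-region flush callback described in this patch's commit message can be sketched in userspace as follows. The model is an illustration, not the kernel code: `region_sketch`, `region_init_sketch` and the two flush functions are hypothetical names, the bio chaining for asynchronous requests is left out, and a simple `host_fsync_fails` field stands in for a failed host fsync.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct region_sketch;
typedef int (*flush_fn_sketch)(struct region_sketch *region);

struct region_sketch {
    flush_fn_sketch flush;   /* chosen once, by region type */
    int host_fsync_fails;    /* simulates a host-side fsync error */
    int flush_calls;
};

/* Existing behaviour for ordinary regions: cannot fail in this model. */
static int generic_flush_sketch(struct region_sketch *region)
{
    region->flush_calls++;
    return 0;
}

/* virtio pmem behaviour: the host may report an fsync failure. */
static int virtio_flush_sketch(struct region_sketch *region)
{
    region->flush_calls++;
    return region->host_fsync_fails ? -EIO : 0;
}

static void region_init_sketch(struct region_sketch *region, int is_virtio_pmem)
{
    region->flush = is_virtio_pmem ? virtio_flush_sketch : generic_flush_sketch;
    region->host_fsync_fails = 0;
    region->flush_calls = 0;
}

/* Caller pattern from the patch: propagate a flush error to the result. */
static int write_then_flush_sketch(struct region_sketch *region)
{
    int rc = 0;
    int ret = region->flush(region);

    if (ret) {
        rc = ret;
    }
    return rc;
}
```

Routing every flush through one function pointer keeps callers such as the write path identical for both region types; only the registration step knows which implementation is in use.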

[Qemu-devel] [PATCH v4 4/5] ext4: disable map_sync for async flush

2019-04-03 Thread Pankaj Gupta

Virtio pmem provides asynchronous host page cache flush
mechanism. We don't support 'MAP_SYNC' with virtio pmem 
and ext4. 

Signed-off-by: Pankaj Gupta 
---
 fs/ext4/file.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 69d65d49837b..86e4bf464320 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
struct inode *inode = file->f_mapping->host;
+   struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+   struct dax_device *dax_dev = sbi->s_daxdev;
 
-   if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+   if (unlikely(ext4_forced_shutdown(sbi)))
return -EIO;
 
/*
@@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
return -EOPNOTSUPP;
 
+   /* We don't support synchronous mappings with DAX files if
+* dax_device is not synchronous.
+*/
+   if (IS_DAX(file_inode(file)) && !dax_synchronous(dax_dev)
+   && (vma->vm_flags & VM_SYNC))
+   return -EOPNOTSUPP;
+
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
-- 
2.20.1

Re: [Qemu-devel] [PATCH 17/26] target/s390x: Convert to CPUClass::tlb_fill

2019-04-03 Thread David Hildenbrand

>  #endif /* CONFIG_USER_ONLY */
> +
> +bool s390_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
> +   MMUAccessType access_type, int mmu_idx,
> +   bool probe, uintptr_t retaddr)
> +{
> +S390CPU *cpu = S390_CPU(cs);
> +
> +#ifndef CONFIG_USER_ONLY
> +CPUS390XState *env = &cpu->env;
> +target_ulong vaddr, raddr;
> +uint64_t asc;
> +int prot, fail;
> +
> +qemu_log_mask(CPU_LOG_MMU, "%s: addr 0x%" VADDR_PRIx " rw %d mmu_idx %d\n",
> +  __func__, address, access_type, mmu_idx);
> +
> +vaddr = address;
> +
> +if (mmu_idx < MMU_REAL_IDX) {
> +asc = cpu_mmu_idx_to_asc(mmu_idx);
> +/* 31-Bit mode */
> +if (!(env->psw.mask & PSW_MASK_64)) {
> +vaddr &= 0x7fffffff;
> +}
> +fail = mmu_translate(env, vaddr, access_type, asc, &raddr, &prot, true);
> +} else if (mmu_idx == MMU_REAL_IDX) {
> +/* 31-Bit mode */
> +if (!(env->psw.mask & PSW_MASK_64)) {
> +vaddr &= 0x7fffffff;
> +}
> +fail = mmu_translate_real(env, vaddr, access_type, &raddr, &prot);
> +} else {
> +g_assert_not_reached();
> +}
> +
> +/* check out of RAM access */
> +if (!fail &&
> +!address_space_access_valid(&address_space_memory, raddr,
> +TARGET_PAGE_SIZE, access_type,
> +MEMTXATTRS_UNSPECIFIED)) {
> +qemu_log_mask(CPU_LOG_MMU,
> +  "%s: raddr %" PRIx64 " > ram_size %" PRIx64 "\n",
> +  __func__, (uint64_t)raddr, (uint64_t)ram_size);
> +trigger_pgm_exception(env, PGM_ADDRESSING, ILEN_AUTO);
> +fail = 1;
> +}
> +
> +if (!fail) {
> +qemu_log_mask(CPU_LOG_MMU,
> +  "%s: set tlb %" PRIx64 " -> %" PRIx64 " (%x)\n",
> +  __func__, (uint64_t)vaddr, (uint64_t)raddr, prot);
> +tlb_set_page(cs, address & TARGET_PAGE_MASK, raddr, prot,
> + mmu_idx, TARGET_PAGE_SIZE);
> +return true;
> +}
> +if (probe) {
> +return false;
> +}
> +#else
> +trigger_pgm_exception(&cpu->env, PGM_ADDRESSING, ILEN_AUTO);
> +/*
> + * On real machines this value is dropped into LowMem.  Since this
> + * is userland, simply put this someplace that cpu_loop can find it.
> + */
> +cpu->env.__excp_addr = address;
> +#endif
> +
> +cpu_restore_state(cs, retaddr, true);
> +
> +/*
> + * Note that handle_mmu_fault sets ilen to either 2 (for code)

This comment no longer matches.

> + * or AUTO (for data).  We can resolve AUTO now, as if it was
> + * set to UNWIND -- that will have been done via assignment
> + * in cpu_restore_state.  Otherwise re-examine access_type.
> + */
> +if (access_type == MMU_INST_FETCH) {
> +CPUS390XState *env = cs->env_ptr;
> +env->int_pgm_ilen = 2;
> +}
> +
> +cpu_loop_exit(cs);
> +}
> +

Apart from that, looks good to me.

-- 

Thanks,

David / dhildenb

Re: [Qemu-devel] [PATCH v4 4/5] ext4: disable map_sync for async flush

2019-04-03 Thread Jan Kara

On Wed 03-04-19 16:10:17, Pankaj Gupta wrote:
> Virtio pmem provides asynchronous host page cache flush
> mechanism. We don't support 'MAP_SYNC' with virtio pmem 
> and ext4. 
> 
> Signed-off-by: Pankaj Gupta 

The patch looks good to me. You can add:

Reviewed-by: Jan Kara 

Honza

> ---
>  fs/ext4/file.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 69d65d49837b..86e4bf464320 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -360,8 +360,10 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
>  static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>   struct inode *inode = file->f_mapping->host;
> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> + struct dax_device *dax_dev = sbi->s_daxdev;
>  
> - if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
> + if (unlikely(ext4_forced_shutdown(sbi)))
>   return -EIO;
>  
>   /*
> @@ -371,6 +373,13 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
>   if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
>   return -EOPNOTSUPP;
>  
> + /* We don't support synchronous mappings with DAX files if
> +  * dax_device is not synchronous.
> +  */
> + if (IS_DAX(file_inode(file)) && !dax_synchronous(dax_dev)
> + && (vma->vm_flags & VM_SYNC))
> + return -EOPNOTSUPP;
> +
>   file_accessed(file);
>   if (IS_DAX(file_inode(file))) {
>   vma->vm_ops = &ext4_dax_vm_ops;
> -- 
> 2.20.1
> 
-- 
Jan Kara 
SUSE Labs, CR

[Qemu-devel] [PATCH v3 0/4] pvrdma: Add support for SRQ

2019-04-03 Thread Kamal Heib

This series implements SRQ (Shared Receive Queue) support for the pvrdma
device. It also includes all the functions and definitions needed to
support SRQ in the backend and resource management layers.

Changes from v2->v3:
- Patch #1:
-- Fix commit message.
-- Remove initialization of backend_qp from rdma_backend_post_srq_recv().
-- Add rx_srq counter.
- Patch #2:
-- Add checks for srq attrs.
- Patch #3:
-- Move initialization of recv_cq_handle to be under is_srq.
-- Rearrange destroy_qp() to avoid use after free.
- Patch #4:
-- Avoid use after free.
-- Fix indentation.

Changes from v1->v2:
- Handle checkpatch.pl warnings. 

Kamal Heib (4):
  hw/rdma: Add SRQ support to backend layer
  hw/rdma: Add support for managing SRQ resource
  hw/rdma: Modify create/destroy QP to support SRQ
  hw/pvrdma: Add support for SRQ

 hw/rdma/rdma_backend.c  | 125 +-
 hw/rdma/rdma_backend.h  |  18 +++-
 hw/rdma/rdma_backend_defs.h |   5 +
 hw/rdma/rdma_rm.c   | 117 +++-
 hw/rdma/rdma_rm.h   |  13 ++-
 hw/rdma/rdma_rm_defs.h  |  10 ++
 hw/rdma/vmw/pvrdma_cmd.c| 206 
 hw/rdma/vmw/pvrdma_main.c   |  16 +++
 hw/rdma/vmw/pvrdma_qp_ops.c |  46 +++-
 hw/rdma/vmw/pvrdma_qp_ops.h |   1 +
 10 files changed, 521 insertions(+), 36 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH v3 1/4] hw/rdma: Add SRQ support to backend layer

2019-04-03 Thread Kamal Heib

Add the required functions and definitions to support shared receive
queues (SRQs) in the backend layer.

Signed-off-by: Kamal Heib 
---
 hw/rdma/rdma_backend.c  | 116 +++-
 hw/rdma/rdma_backend.h  |  12 
 hw/rdma/rdma_backend_defs.h |   5 ++
 hw/rdma/rdma_rm.c   |   2 +
 hw/rdma/rdma_rm_defs.h  |   1 +
 5 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index d1660b6474fa..04dfd63a573b 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -40,6 +40,7 @@ typedef struct BackendCtx {
 void *up_ctx;
 struct ibv_sge sge; /* Used to save MAD recv buffer */
 RdmaBackendQP *backend_qp; /* To maintain recv buffers */
+RdmaBackendSRQ *backend_srq;
 } BackendCtx;
 
 struct backend_umad {
@@ -99,6 +100,7 @@ static int rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 int i, ne, total_ne = 0;
 BackendCtx *bctx;
 struct ibv_wc wc[2];
+RdmaProtectedGSList *cqe_ctx_list;
 
 qemu_mutex_lock(&rdma_dev_res->lock);
 do {
@@ -116,8 +118,13 @@ static int rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, struct ibv_cq *ibcq)
 
 comp_handler(bctx->up_ctx, &wc[i]);
 
-rdma_protected_gslist_remove_int32(&bctx->backend_qp->cqe_ctx_list,
-   wc[i].wr_id);
+if (bctx->backend_qp) {
+cqe_ctx_list = &bctx->backend_qp->cqe_ctx_list;
+} else {
+cqe_ctx_list = &bctx->backend_srq->cqe_ctx_list;
+}
+
+rdma_protected_gslist_remove_int32(cqe_ctx_list, wc[i].wr_id);
 rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
 g_free(bctx);
 }
@@ -662,6 +669,60 @@ err_free_bctx:
 g_free(bctx);
 }
 
+void rdma_backend_post_srq_recv(RdmaBackendDev *backend_dev,
+RdmaBackendSRQ *srq, struct ibv_sge *sge,
+uint32_t num_sge, void *ctx)
+{
+BackendCtx *bctx;
+struct ibv_sge new_sge[MAX_SGE];
+uint32_t bctx_id;
+int rc;
+struct ibv_recv_wr wr = {}, *bad_wr;
+
+bctx = g_malloc0(sizeof(*bctx));
+bctx->up_ctx = ctx;
+bctx->backend_srq = srq;
+
+rc = rdma_rm_alloc_cqe_ctx(backend_dev->rdma_dev_res, &bctx_id, bctx);
+if (unlikely(rc)) {
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_NOMEM, ctx);
+goto err_free_bctx;
+}
+
+rdma_protected_gslist_append_int32(&srq->cqe_ctx_list, bctx_id);
+
+rc = build_host_sge_array(backend_dev->rdma_dev_res, new_sge, sge, num_sge,
+  &backend_dev->rdma_dev_res->stats.rx_bufs_len);
+if (rc) {
+complete_work(IBV_WC_GENERAL_ERR, rc, ctx);
+goto err_dealloc_cqe_ctx;
+}
+
+wr.num_sge = num_sge;
+wr.sg_list = new_sge;
+wr.wr_id = bctx_id;
+rc = ibv_post_srq_recv(srq->ibsrq, &wr, &bad_wr);
+if (rc) {
+rdma_error_report("ibv_post_srq_recv fail, srqn=0x%x, rc=%d, errno=%d",
+  srq->ibsrq->handle, rc, errno);
+complete_work(IBV_WC_GENERAL_ERR, VENDOR_ERR_FAIL_BACKEND, ctx);
+goto err_dealloc_cqe_ctx;
+}
+
+atomic_inc(&backend_dev->rdma_dev_res->stats.missing_cqe);
+backend_dev->rdma_dev_res->stats.rx_bufs++;
+backend_dev->rdma_dev_res->stats.rx_srq++;
+
+return;
+
+err_dealloc_cqe_ctx:
+backend_dev->rdma_dev_res->stats.rx_bufs_err++;
+rdma_rm_dealloc_cqe_ctx(backend_dev->rdma_dev_res, bctx_id);
+
+err_free_bctx:
+g_free(bctx);
+}
+
 int rdma_backend_create_pd(RdmaBackendDev *backend_dev, RdmaBackendPD *pd)
 {
 pd->ibpd = ibv_alloc_pd(backend_dev->context);
@@ -938,6 +999,55 @@ void rdma_backend_destroy_qp(RdmaBackendQP *qp, RdmaDeviceResources *dev_res)
 rdma_protected_gslist_destroy(&qp->cqe_ctx_list);
 }
 
+int rdma_backend_create_srq(RdmaBackendSRQ *srq, RdmaBackendPD *pd,
+uint32_t max_wr, uint32_t max_sge,
+uint32_t srq_limit)
+{
+struct ibv_srq_init_attr srq_init_attr = {};
+
+srq_init_attr.attr.max_wr = max_wr;
+srq_init_attr.attr.max_sge = max_sge;
+srq_init_attr.attr.srq_limit = srq_limit;
+
+srq->ibsrq = ibv_create_srq(pd->ibpd, &srq_init_attr);
+if (!srq->ibsrq) {
+rdma_error_report("ibv_create_srq failed, errno=%d", errno);
+return -EIO;
+}
+
+rdma_protected_gslist_init(&srq->cqe_ctx_list);
+
+return 0;
+}
+
+int rdma_backend_query_srq(RdmaBackendSRQ *srq, struct ibv_srq_attr *srq_attr)
+{
+if (!srq->ibsrq) {
+return -EINVAL;
+}
+
+return ibv_query_srq(srq->ibsrq, srq_attr);
+}
+
+int rdma_backend_modify_srq(RdmaBackendSRQ *srq, struct ibv_srq_attr *srq_attr,
+int srq_attr_mask)
+{
+if (!srq->ibsrq) {
+return -EINVAL;
+}
+
+return ibv_modify_srq(srq->ibsrq, srq_attr, srq_attr_mask);
+}
+
+void rdma_backend_destroy

[Qemu-devel] [PATCH 2/4] hw/rdma: Add support for managing SRQ resource

2019-04-03 Thread Kamal Heib

Add the required functions and definitions to support managing
shared receive queues (SRQs).

Signed-off-by: Kamal Heib 
---
 hw/rdma/rdma_rm.c  | 93 ++
 hw/rdma/rdma_rm.h  | 10 +
 hw/rdma/rdma_rm_defs.h |  8 
 3 files changed, 111 insertions(+)

diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index b683506b8616..c4fb140dcd96 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -544,6 +544,96 @@ void rdma_rm_dealloc_qp(RdmaDeviceResources *dev_res, 
uint32_t qp_handle)
 rdma_res_tbl_dealloc(&dev_res->qp_tbl, qp->qpn);
 }
 
+RdmaRmSRQ *rdma_rm_get_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle)
+{
+return rdma_res_tbl_get(&dev_res->srq_tbl, srq_handle);
+}
+
+int rdma_rm_alloc_srq(RdmaDeviceResources *dev_res, uint32_t pd_handle,
+  uint32_t max_wr, uint32_t max_sge, uint32_t srq_limit,
+  uint32_t *srq_handle, void *opaque)
+{
+RdmaRmSRQ *srq;
+RdmaRmPD *pd;
+int rc;
+
+pd = rdma_rm_get_pd(dev_res, pd_handle);
+if (!pd) {
+return -EINVAL;
+}
+
+srq = rdma_res_tbl_alloc(&dev_res->srq_tbl, srq_handle);
+if (!srq) {
+return -ENOMEM;
+}
+
+rc = rdma_backend_create_srq(&srq->backend_srq, &pd->backend_pd,
+ max_wr, max_sge, srq_limit);
+if (rc) {
+rc = -EIO;
+goto out_dealloc_srq;
+}
+
+srq->opaque = opaque;
+
+return 0;
+
+out_dealloc_srq:
+rdma_res_tbl_dealloc(&dev_res->srq_tbl, *srq_handle);
+
+return rc;
+}
+
+int rdma_rm_query_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle,
+  struct ibv_srq_attr *srq_attr)
+{
+RdmaRmSRQ *srq;
+
+srq = rdma_rm_get_srq(dev_res, srq_handle);
+if (!srq) {
+return -EINVAL;
+}
+
+return rdma_backend_query_srq(&srq->backend_srq, srq_attr);
+}
+
+int rdma_rm_modify_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle,
+   struct ibv_srq_attr *srq_attr, int srq_attr_mask)
+{
+RdmaRmSRQ *srq;
+
+srq = rdma_rm_get_srq(dev_res, srq_handle);
+if (!srq) {
+return -EINVAL;
+}
+
+if ((srq_attr_mask & IBV_SRQ_LIMIT) &&
+(srq_attr->srq_limit == 0)) {
+return -EINVAL;
+}
+
+if ((srq_attr_mask & IBV_SRQ_MAX_WR) &&
+(srq_attr->max_wr == 0)) {
+return -EINVAL;
+}
+
+return rdma_backend_modify_srq(&srq->backend_srq, srq_attr,
+   srq_attr_mask);
+}
+
+void rdma_rm_dealloc_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle)
+{
+RdmaRmSRQ *srq;
+
+srq = rdma_rm_get_srq(dev_res, srq_handle);
+if (!srq) {
+return;
+}
+
+rdma_backend_destroy_srq(&srq->backend_srq, dev_res);
+rdma_res_tbl_dealloc(&dev_res->srq_tbl, srq_handle);
+}
+
 void *rdma_rm_get_cqe_ctx(RdmaDeviceResources *dev_res, uint32_t cqe_ctx_id)
 {
 void **cqe_ctx;
@@ -673,6 +763,8 @@ int rdma_rm_init(RdmaDeviceResources *dev_res, struct 
ibv_device_attr *dev_attr)
 res_tbl_init("CQE_CTX", &dev_res->cqe_ctx_tbl, dev_attr->max_qp *
dev_attr->max_qp_wr, sizeof(void *));
 res_tbl_init("UC", &dev_res->uc_tbl, MAX_UCS, sizeof(RdmaRmUC));
+res_tbl_init("SRQ", &dev_res->srq_tbl, dev_attr->max_srq,
+ sizeof(RdmaRmSRQ));
 
 init_ports(dev_res);
 
@@ -691,6 +783,7 @@ void rdma_rm_fini(RdmaDeviceResources *dev_res, 
RdmaBackendDev *backend_dev,
 
 fini_ports(dev_res, backend_dev, ifname);
 
+res_tbl_free(&dev_res->srq_tbl);
 res_tbl_free(&dev_res->uc_tbl);
 res_tbl_free(&dev_res->cqe_ctx_tbl);
 res_tbl_free(&dev_res->qp_tbl);
diff --git a/hw/rdma/rdma_rm.h b/hw/rdma/rdma_rm.h
index 4f03f9b8c5f1..e88ab95e264b 100644
--- a/hw/rdma/rdma_rm.h
+++ b/hw/rdma/rdma_rm.h
@@ -65,6 +65,16 @@ int rdma_rm_query_qp(RdmaDeviceResources *dev_res, 
RdmaBackendDev *backend_dev,
  int attr_mask, struct ibv_qp_init_attr *init_attr);
 void rdma_rm_dealloc_qp(RdmaDeviceResources *dev_res, uint32_t qp_handle);
 
+RdmaRmSRQ *rdma_rm_get_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle);
+int rdma_rm_alloc_srq(RdmaDeviceResources *dev_res, uint32_t pd_handle,
+  uint32_t max_wr, uint32_t max_sge, uint32_t srq_limit,
+  uint32_t *srq_handle, void *opaque);
+int rdma_rm_query_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle,
+  struct ibv_srq_attr *srq_attr);
+int rdma_rm_modify_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle,
+   struct ibv_srq_attr *srq_attr, int srq_attr_mask);
+void rdma_rm_dealloc_srq(RdmaDeviceResources *dev_res, uint32_t srq_handle);
+
 int rdma_rm_alloc_cqe_ctx(RdmaDeviceResources *dev_res, uint32_t *cqe_ctx_id,
   void *ctx);
 void *rdma_rm_get_cqe_ctx(RdmaDeviceResources *dev_res, uint32_t cqe_ctx_id);
diff --git a/hw/rdma/rdma_rm_defs.h b/hw/rdma/rdma_rm_defs.h
index

[Qemu-devel] [PATCH v3 3/4] hw/rdma: Modify create/destroy QP to support SRQ

2019-04-03 Thread Kamal Heib

Modify create/destroy QP to support shared receive queue and rearrange
the destroy_qp() code to avoid touching the QP after calling
rdma_rm_dealloc_qp().

Signed-off-by: Kamal Heib 
---
 hw/rdma/rdma_backend.c   |  9 --
 hw/rdma/rdma_backend.h   |  6 ++--
 hw/rdma/rdma_rm.c| 22 +--
 hw/rdma/rdma_rm.h|  3 +-
 hw/rdma/rdma_rm_defs.h   |  1 +
 hw/rdma/vmw/pvrdma_cmd.c | 59 
 6 files changed, 67 insertions(+), 33 deletions(-)

diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 04dfd63a573b..cf34874e9d2f 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -794,9 +794,9 @@ void rdma_backend_destroy_cq(RdmaBackendCQ *cq)
 
 int rdma_backend_create_qp(RdmaBackendQP *qp, uint8_t qp_type,
RdmaBackendPD *pd, RdmaBackendCQ *scq,
-   RdmaBackendCQ *rcq, uint32_t max_send_wr,
-   uint32_t max_recv_wr, uint32_t max_send_sge,
-   uint32_t max_recv_sge)
+   RdmaBackendCQ *rcq, RdmaBackendSRQ *srq,
+   uint32_t max_send_wr, uint32_t max_recv_wr,
+   uint32_t max_send_sge, uint32_t max_recv_sge)
 {
 struct ibv_qp_init_attr attr = {};
 
@@ -824,6 +824,9 @@ int rdma_backend_create_qp(RdmaBackendQP *qp, uint8_t 
qp_type,
 attr.cap.max_recv_wr = max_recv_wr;
 attr.cap.max_send_sge = max_send_sge;
 attr.cap.max_recv_sge = max_recv_sge;
+if (srq) {
+attr.srq = srq->ibsrq;
+}
 
 qp->ibqp = ibv_create_qp(pd->ibpd, &attr);
 if (!qp->ibqp) {
diff --git a/hw/rdma/rdma_backend.h b/hw/rdma/rdma_backend.h
index cad7956d98e8..7c1a19a2b5ff 100644
--- a/hw/rdma/rdma_backend.h
+++ b/hw/rdma/rdma_backend.h
@@ -89,9 +89,9 @@ void rdma_backend_poll_cq(RdmaDeviceResources *rdma_dev_res, 
RdmaBackendCQ *cq);
 
 int rdma_backend_create_qp(RdmaBackendQP *qp, uint8_t qp_type,
RdmaBackendPD *pd, RdmaBackendCQ *scq,
-   RdmaBackendCQ *rcq, uint32_t max_send_wr,
-   uint32_t max_recv_wr, uint32_t max_send_sge,
-   uint32_t max_recv_sge);
+   RdmaBackendCQ *rcq, RdmaBackendSRQ *srq,
+   uint32_t max_send_wr, uint32_t max_recv_wr,
+   uint32_t max_send_sge, uint32_t max_recv_sge);
 int rdma_backend_qp_state_init(RdmaBackendDev *backend_dev, RdmaBackendQP *qp,
uint8_t qp_type, uint32_t qkey);
 int rdma_backend_qp_state_rtr(RdmaBackendDev *backend_dev, RdmaBackendQP *qp,
diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index c0bb27cb6b90..96279e8d6561 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -386,12 +386,14 @@ int rdma_rm_alloc_qp(RdmaDeviceResources *dev_res, 
uint32_t pd_handle,
  uint8_t qp_type, uint32_t max_send_wr,
  uint32_t max_send_sge, uint32_t send_cq_handle,
  uint32_t max_recv_wr, uint32_t max_recv_sge,
- uint32_t recv_cq_handle, void *opaque, uint32_t *qpn)
+ uint32_t recv_cq_handle, void *opaque, uint32_t *qpn,
+ uint8_t is_srq, uint32_t srq_handle)
 {
 int rc;
 RdmaRmQP *qp;
 RdmaRmCQ *scq, *rcq;
 RdmaRmPD *pd;
+RdmaRmSRQ *srq = NULL;
 uint32_t rm_qpn;
 
 pd = rdma_rm_get_pd(dev_res, pd_handle);
@@ -408,6 +410,16 @@ int rdma_rm_alloc_qp(RdmaDeviceResources *dev_res, 
uint32_t pd_handle,
 return -EINVAL;
 }
 
+if (is_srq) {
+srq = rdma_rm_get_srq(dev_res, srq_handle);
+if (!srq) {
+rdma_error_report("Invalid srqn %d", srq_handle);
+return -EINVAL;
+}
+
+srq->recv_cq_handle = recv_cq_handle;
+}
+
 if (qp_type == IBV_QPT_GSI) {
 scq->notify = CNT_SET;
 rcq->notify = CNT_SET;
@@ -424,10 +436,14 @@ int rdma_rm_alloc_qp(RdmaDeviceResources *dev_res, 
uint32_t pd_handle,
 qp->send_cq_handle = send_cq_handle;
 qp->recv_cq_handle = recv_cq_handle;
 qp->opaque = opaque;
+qp->is_srq = is_srq;
 
 rc = rdma_backend_create_qp(&qp->backend_qp, qp_type, &pd->backend_pd,
-&scq->backend_cq, &rcq->backend_cq, 
max_send_wr,
-max_recv_wr, max_send_sge, max_recv_sge);
+&scq->backend_cq, &rcq->backend_cq,
+is_srq ? &srq->backend_srq : NULL,
+max_send_wr, max_recv_wr, max_send_sge,
+max_recv_sge);
+
 if (rc) {
 rc = -EIO;
 goto out_dealloc_qp;
diff --git a/hw/rdma/rdma_rm.h b/hw/rdma/rdma_rm.h
index e88ab95e264b..e8639909cd34 100644
--- a/hw/rdma/rdma_rm.h
+++ b/hw/rdma/rdma_rm.h
@@ -53,7 +53,8 @@ int rdma_rm_alloc_qp(RdmaDeviceResources *dev_res, uint32_t 
pd_handle,

[Qemu-devel] [PATCH v3 4/4] hw/pvrdma: Add support for SRQ

2019-04-03 Thread Kamal Heib

Implement the pvrdma device commands for supporting SRQ.

Signed-off-by: Kamal Heib 
---
 hw/rdma/vmw/pvrdma_cmd.c| 147 
 hw/rdma/vmw/pvrdma_main.c   |  16 
 hw/rdma/vmw/pvrdma_qp_ops.c |  46 ++-
 hw/rdma/vmw/pvrdma_qp_ops.h |   1 +
 4 files changed, 209 insertions(+), 1 deletion(-)

diff --git a/hw/rdma/vmw/pvrdma_cmd.c b/hw/rdma/vmw/pvrdma_cmd.c
index b931bb6dc9d4..8d70c0d23de4 100644
--- a/hw/rdma/vmw/pvrdma_cmd.c
+++ b/hw/rdma/vmw/pvrdma_cmd.c
@@ -609,6 +609,149 @@ static int destroy_uc(PVRDMADev *dev, union 
pvrdma_cmd_req *req,
 return 0;
 }
 
+static int create_srq_ring(PCIDevice *pci_dev, PvrdmaRing **ring,
+   uint64_t pdir_dma, uint32_t max_wr,
+   uint32_t max_sge, uint32_t nchunks)
+{
+uint64_t *dir = NULL, *tbl = NULL;
+PvrdmaRing *r;
+int rc = -EINVAL;
+char ring_name[MAX_RING_NAME_SZ];
+uint32_t wqe_sz;
+
+if (!nchunks || nchunks > PVRDMA_MAX_FAST_REG_PAGES) {
+rdma_error_report("Got invalid page count for SRQ ring: %d",
+  nchunks);
+return rc;
+}
+
+dir = rdma_pci_dma_map(pci_dev, pdir_dma, TARGET_PAGE_SIZE);
+if (!dir) {
+rdma_error_report("Failed to map to SRQ page directory");
+goto out;
+}
+
+tbl = rdma_pci_dma_map(pci_dev, dir[0], TARGET_PAGE_SIZE);
+if (!tbl) {
+rdma_error_report("Failed to map to SRQ page table");
+goto out;
+}
+
+r = g_malloc(sizeof(*r));
+*ring = r;
+
+r->ring_state = (struct pvrdma_ring *)
+rdma_pci_dma_map(pci_dev, tbl[0], TARGET_PAGE_SIZE);
+if (!r->ring_state) {
+rdma_error_report("Failed to map to SRQ ring state");
+goto out_free_ring_mem;
+}
+
+wqe_sz = pow2ceil(sizeof(struct pvrdma_rq_wqe_hdr) +
+  sizeof(struct pvrdma_sge) * max_sge - 1);
+sprintf(ring_name, "srq_ring_%" PRIx64, pdir_dma);
+rc = pvrdma_ring_init(r, ring_name, pci_dev, &r->ring_state[1], max_wr,
+  wqe_sz, (dma_addr_t *)&tbl[1], nchunks - 1);
+if (rc) {
+goto out_unmap_ring_state;
+}
+
+goto out;
+
+out_unmap_ring_state:
+rdma_pci_dma_unmap(pci_dev, r->ring_state, TARGET_PAGE_SIZE);
+
+out_free_ring_mem:
+g_free(r);
+
+out:
+rdma_pci_dma_unmap(pci_dev, tbl, TARGET_PAGE_SIZE);
+rdma_pci_dma_unmap(pci_dev, dir, TARGET_PAGE_SIZE);
+
+return rc;
+}
+
+static void destroy_srq_ring(PvrdmaRing *ring)
+{
+pvrdma_ring_free(ring);
+rdma_pci_dma_unmap(ring->dev, ring->ring_state, TARGET_PAGE_SIZE);
+g_free(ring);
+}
+
+static int create_srq(PVRDMADev *dev, union pvrdma_cmd_req *req,
+  union pvrdma_cmd_resp *rsp)
+{
+struct pvrdma_cmd_create_srq *cmd = &req->create_srq;
+struct pvrdma_cmd_create_srq_resp *resp = &rsp->create_srq_resp;
+PvrdmaRing *ring = NULL;
+int rc;
+
+memset(resp, 0, sizeof(*resp));
+
+rc = create_srq_ring(PCI_DEVICE(dev), &ring, cmd->pdir_dma,
+ cmd->attrs.max_wr, cmd->attrs.max_sge,
+ cmd->nchunks);
+if (rc) {
+return rc;
+}
+
+rc = rdma_rm_alloc_srq(&dev->rdma_dev_res, cmd->pd_handle,
+   cmd->attrs.max_wr, cmd->attrs.max_sge,
+   cmd->attrs.srq_limit, &resp->srqn, ring);
+if (rc) {
+destroy_srq_ring(ring);
+return rc;
+}
+
+return 0;
+}
+
+static int query_srq(PVRDMADev *dev, union pvrdma_cmd_req *req,
+ union pvrdma_cmd_resp *rsp)
+{
+struct pvrdma_cmd_query_srq *cmd = &req->query_srq;
+struct pvrdma_cmd_query_srq_resp *resp = &rsp->query_srq_resp;
+
+memset(resp, 0, sizeof(*resp));
+
+return rdma_rm_query_srq(&dev->rdma_dev_res, cmd->srq_handle,
+ (struct ibv_srq_attr *)&resp->attrs);
+}
+
+static int modify_srq(PVRDMADev *dev, union pvrdma_cmd_req *req,
+  union pvrdma_cmd_resp *rsp)
+{
+struct pvrdma_cmd_modify_srq *cmd = &req->modify_srq;
+
+/* Only support SRQ limit */
+if (!(cmd->attr_mask & IBV_SRQ_LIMIT) ||
+(cmd->attr_mask & IBV_SRQ_MAX_WR))
+return -EINVAL;
+
+return rdma_rm_modify_srq(&dev->rdma_dev_res, cmd->srq_handle,
+  (struct ibv_srq_attr *)&cmd->attrs,
+  cmd->attr_mask);
+}
+
+static int destroy_srq(PVRDMADev *dev, union pvrdma_cmd_req *req,
+   union pvrdma_cmd_resp *rsp)
+{
+struct pvrdma_cmd_destroy_srq *cmd = &req->destroy_srq;
+RdmaRmSRQ *srq;
+PvrdmaRing *ring;
+
+srq = rdma_rm_get_srq(&dev->rdma_dev_res, cmd->srq_handle);
+if (!srq) {
+return -EINVAL;
+}
+
+ring = (PvrdmaRing *)srq->opaque;
+destroy_srq_ring(ring);
+rdma_rm_dealloc_srq(&dev->rdma_dev_res, cmd->srq_handle);
+
+return 0;
+}
+
 struct cmd_handler {
 uint32_t cmd;
 uint32_t ack;
@

Re: [Qemu-devel] [PATCH 2/9] tcg: Add INDEX_op_extract2_{i32,i64}

2019-04-03 Thread Richard Henderson

On 3/26/19 8:35 PM, Peter Maydell wrote:
> On Thu, 7 Mar 2019 at 14:47, Richard Henderson
>  wrote:
>>
>> This will let backends implement the double-word shift operation.
>>
>> Signed-off-by: Richard Henderson 
>> diff --git a/tcg/README b/tcg/README
>> index 603f4df659..ddabf33017 100644
>> --- a/tcg/README
>> +++ b/tcg/README
>> @@ -343,6 +343,11 @@ at bit 8.  This operation would be equivalent to
>>
>>  (using an arithmetic right shift).
>>
>> +* extract2_i64 dest, t1, t2, pos
>> +
>> +Extract a 64-bit quantity from the concatenation of t2:t1,
>> +beginning at pos.
>> +
> 
> I think we should document the valid values of 'pos'.
> My guess is "0 <= pos <= 63".

How about


* extract2_i32/i64 dest, t1, t2, pos

For N = {32,64}, extract an N-bit quantity from the concatenation
of t2:t1, beginning at pos.  The tcg_gen_extract2_* expander allows
values 0 <= pos <= N, but will expand 0 and N with mov, so only
1 <= pos <= N-1 will be seen by the host tcg_out_op.


?


r~
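
To make the proposed wording concrete, here is a minimal reference sketch of the semantics described above. It is not the TCG implementation; the function names extract2_i32_ref and extract2_i64_ref are made up for illustration, and treating pos == 0 and pos == N as plain copies is an assumption taken from the "will expand 0 and N with mov" remark.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reference model (not QEMU code): return the 32-bit field
 * that starts at bit `pos` of the 64-bit concatenation t2:t1, with t1 as
 * the low half.  Valid for 0 <= pos <= 32.
 */
static uint32_t extract2_i32_ref(uint32_t t1, uint32_t t2, unsigned pos)
{
    assert(pos <= 32);
    if (pos == 0) {
        return t1;              /* field is exactly the low word */
    }
    if (pos == 32) {
        return t2;              /* field is exactly the high word */
    }
    /* 1 <= pos <= 31: both shift counts are in range, so no UB. */
    return (t1 >> pos) | (t2 << (32 - pos));
}

/* Same model for N = 64; valid for 0 <= pos <= 64. */
static uint64_t extract2_i64_ref(uint64_t t1, uint64_t t2, unsigned pos)
{
    assert(pos <= 64);
    if (pos == 0) {
        return t1;
    }
    if (pos == 64) {
        return t2;
    }
    return (t1 >> pos) | (t2 << (64 - pos));
}
```

The boundary cases are split out because shifting an N-bit value by N is undefined in C, which is one reason a backend would prefer never to see them.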

Re: [Qemu-devel] [PATCH v4 2/5] virtio-pmem: Add virtio pmem driver

2019-04-03 Thread Yuval Shaia

On Wed, Apr 03, 2019 at 04:10:15PM +0530, Pankaj Gupta wrote:
> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
> 
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta 
> ---
>  drivers/nvdimm/virtio_pmem.c |  84 +
>  drivers/virtio/Kconfig   |  10 +++
>  drivers/virtio/Makefile  |   1 +
>  drivers/virtio/pmem.c| 125 +++
>  include/linux/virtio_pmem.h  |  60 +++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 291 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index ..2a1b1ba2c1ff
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,84 @@
> +// SPDX-License-Identifier: GPL-2.0

Is this comment style (//) acceptable?

> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include 
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> + unsigned int len;
> + unsigned long flags;
> + struct virtio_pmem_request *req, *req_buf;
> + struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> + spin_lock_irqsave(&vpmem->pmem_lock, flags);
> + while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> + req->done = true;
> + wake_up(&req->host_acked);
> +
> + if (!list_empty(&vpmem->req_list)) {
> + req_buf = list_first_entry(&vpmem->req_list,
> + struct virtio_pmem_request, list);
> + list_del(&vpmem->req_list);
> + req_buf->wq_buf_avail = true;
> + wake_up(&req_buf->wq_buf);
> + }
> + }
> + spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> + int err;
> + unsigned long flags;
> + struct scatterlist *sgs[2], sg, ret;
> + struct virtio_device *vdev = nd_region->provider_data;
> + struct virtio_pmem *vpmem = vdev->priv;
> + struct virtio_pmem_request *req;
> +
> + might_sleep();

[1]

> + req = kmalloc(sizeof(*req), GFP_KERNEL);
> + if (!req)
> + return -ENOMEM;
> +
> + req->done = req->wq_buf_avail = false;
> + strcpy(req->name, "FLUSH");
> + init_waitqueue_head(&req->host_acked);
> + init_waitqueue_head(&req->wq_buf);
> + sg_init_one(&sg, req->name, strlen(req->name));
> + sgs[0] = &sg;
> + sg_init_one(&ret, &req->ret, sizeof(req->ret));
> + sgs[1] = &ret;
> +
> + spin_lock_irqsave(&vpmem->pmem_lock, flags);
> + err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);

Is it okay to use GFP_ATOMIC in a might-sleep ([1]) function?

> + if (err) {
> + dev_err(&vdev->dev, "failed to send command to virtio pmem 
> device\n");
> +
> + list_add_tail(&vpmem->req_list, &req->list);
> + spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> + /* When host has read buffer, this completes via host_ack */
> + wait_event(req->wq_buf, req->wq_buf_avail);
> + spin_lock_irqsave(&vpmem->pmem_lock, flags);
> + }
> + virtqueue_kick(vpmem->req_vq);

You probably want to check return value here.

> + spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> + /* When host has read buffer, this completes via host_ack */
> + wait_event(req->host_acked, req->done);
> + err = req->ret;
> + kfree(req);
> +
> + return err;
> +};
> +EXPORT_SYMBOL_GPL(virtio_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>  
> If unsure, say Y.
>  
> +config VIRTIO_PMEM
> + tristate "Support for virtio pmem driver"
> + depends on VIRTIO
> + depends on LIBNVDIMM

Re: [Qemu-devel] [PATCH v3] s390: diagnose 318 info reset and migration support

2019-04-03 Thread David Hildenbrand

On 01.04.19 23:48, Collin Walling wrote:
> DIAGNOSE 0x318 (diag318) is a privileged s390x instruction that must
> be intercepted by SIE and handled via KVM. Let's introduce some
> functions to communicate between QEMU and KVM via ioctls. These
> will be used to get/set the diag318 related information (also known
> as the "Control Program Code" or "CPC"), as well as check the system
> if KVM supports handling this instruction.
> 
> Diag318 must also be reset on a load normal and modified clear, so
> we use the set function (wrapped in a reset function) to explicitly
> set the diag318 info to 0 for these cases.
> 
> Lastly, we want to ensure the diag318 info is migrated. The diag318
> info migration is handled via a VMStateDescription. This feature is
> only supported on QEMU machines 4.0 and later.

This has to become 4.1

> 
> Signed-off-by: Collin Walling 
> ---
> 
> This version is posted in tandem with a new kernel patch that changes
> how the execution of the diag 0x318 instruction is handled. A link to
> this series will be attached as a reply to this series for convenience.
> 
> Changelog:
> 
> v3
> - removed CPU model code
> - removed RSCPI and SCLP code
> - reverted max cpus back to 248 (previous patches limited this
> to 247)
> - introduced VMStateDescription handlers for migration
> - disabled migration of diag318 info for machines 3.1 and older
> - a warning is printed if migration is disabled and KVM
>   supports handling this instruction
> 
> ---
>  hw/s390x/s390-virtio-ccw.c   |  6 
>  linux-headers/asm-s390/kvm.h |  4 +++
>  target/s390x/diag.c  | 63 
>  target/s390x/internal.h  |  5 ++-
>  target/s390x/kvm-stub.c  | 15 +
>  target/s390x/kvm.c   | 32 ++
>  target/s390x/kvm_s390x.h |  3 ++
>  7 files changed, 127 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index d11069b860..2a50868496 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -36,6 +36,7 @@
>  #include "cpu_models.h"
>  #include "hw/nmi.h"
>  #include "hw/s390x/tod.h"
> +#include "internal.h"
>  
>  S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
>  {
> @@ -302,6 +303,8 @@ static void ccw_init(MachineState *machine)
>  
>  /* init the TOD clock */
>  s390_init_tod();
> +
> +diag318_register_migration();
>  }
>  
>  static void s390_cpu_plug(HotplugHandler *hotplug_dev,
> @@ -352,6 +355,7 @@ static void s390_machine_reset(void)
>  }
>  subsystem_reset();
>  s390_crypto_reset();
> +diag318_reset();

Shouldn't this go into subsystem_reset?

Aren't you missing resets during external/reipl resets?

Also, I was wondering if it would be worth creating a fake device like
diag288. The resets can be handled similarly to diag288. Resets during
external/reipl reset would come naturally.
>  
>  static void ccw_machine_3_1_class_options(MachineClass *mc)
> diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
> index 0265482f8f..735f5a46e8 100644
> --- a/linux-headers/asm-s390/kvm.h
> +++ b/linux-headers/asm-s390/kvm.h
> @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
>  #define KVM_S390_VM_CRYPTO   2
>  #define KVM_S390_VM_CPU_MODEL3
>  #define KVM_S390_VM_MIGRATION4
> +#define KVM_S390_VM_MISC 5
>  
>  /* kvm attributes for mem_ctrl */
>  #define KVM_S390_VM_MEM_ENABLE_CMMA  0
> @@ -168,6 +169,9 @@ struct kvm_s390_vm_cpu_subfunc {
>  #define KVM_S390_VM_MIGRATION_START  1
>  #define KVM_S390_VM_MIGRATION_STATUS 2
>  
> +/* kvm attributes for KVM_S390_VM_MISC */
> +#define KVM_S390_VM_MISC_CPC 0
> +
>  /* for KVM_GET_REGS and KVM_SET_REGS */
>  struct kvm_regs {
>   /* general purpose regs for s390 */
> diff --git a/target/s390x/diag.c b/target/s390x/diag.c
> index aafa740f61..bbb151e3eb 100644
> --- a/target/s390x/diag.c
> +++ b/target/s390x/diag.c
> @@ -20,6 +20,8 @@
>  #include "sysemu/cpus.h"
>  #include "hw/s390x/ipl.h"
>  #include "hw/s390x/s390-virtio-ccw.h"
> +#include "kvm_s390x.h"
> +#include "sysemu/kvm.h"
>  
>  int handle_diag_288(CPUS390XState *env, uint64_t r1, uint64_t r3)
>  {
> @@ -134,3 +136,64 @@ out:
>  break;
>  }
>  }
> +
> +typedef struct Diag318Data {
> +uint64_t cpc;
> +} Diag318Data;
> +static Diag318Data diag318data;
> +
> +void diag318_reset(void)
> +{
> +if (kvm_s390_has_diag318()) {
> +kvm_s390_set_cpc(0);
> +}
> +}
> +
> +static int diag318_post_load(void *opaque, int version_id)
> +{
> +Diag318Data *d = opaque;
> +
> +kvm_s390_set_cpc(d->cpc);
> +return 0;
> +}
> +
> +static int diag318_pre_save(void *opaque)
> +{
> +Diag318Data *d = opaque;
> +
> +kvm_s390_get_cpc(&d->cpc);
> +return 0;
> +}
> +
> +static bool diag318_needed(void *opaque)
> +{
> +return kvm_s390_has_dia

[Qemu-devel] [PATCH v2 0/8] WIP: Multifd compression support

2019-04-03 Thread Juan Quintela

v2:
- improve the code left and right
- Split the zlib code better
- rename everything to v4.1
- Add tests for multifd-compress zlib
- Parameter is now an enum (soon will see zstd)

ToDo:
- Make operations for different methods:
  * multifd_prepare_send_none/zlib
  * multifd_send_none/zlib
  * multifd_recv_none/zlib
- Use the MULTIFD_FLAG_ZLIB (it is unused so far).

Please review and comment.

v1:

This series creates compression code on top of multifd.  It is still
WIP, but it is already:
- faster than the current compression code
- it does the minimum number of copies possible
- it allows support for other compression codes
- it passes the multifd test sent in my previous series

The test for the existing code didn't work because that code is too slow;
I need to make the downtime 10 times bigger for it to converge on my test
machine.  This code works with the same limits as multifd no-

ToDo:
- move printf's  to traces
- move code to a struct instead of if (zlib) inside the main threads.
- improve error handling.

Please review and comment.

Juan Quintela (8):
  migration: Fix migrate_set_parameter
  migration: fix multifd_recv event typo
  migration-test: rename parameter to parameter_int
  tests: Add migration multifd test
  migration-test: introduce functions to handle string parameters
  migration: Add multifd-compress parameter
  multifd: Add zlib compression support
  multifd: rest of zlib compression (WIP)

 hmp.c|  23 +-
 hw/core/qdev-properties.c|  11 +++
 include/hw/qdev-properties.h |   1 +
 migration/migration.c|  25 ++
 migration/migration.h|   1 +
 migration/ram.c  | 140 +++--
 migration/trace-events   |   2 +-
 qapi/common.json |  15 
 qapi/migration.json  |  19 -
 tests/migration-test.c   | 147 +--
 10 files changed, 348 insertions(+), 36 deletions(-)

-- 
2.20.1

Re: [Qemu-devel] [PATCH 2/9] tcg: Add INDEX_op_extract2_{i32,i64}

2019-04-03 Thread Peter Maydell

On Wed, 3 Apr 2019 at 18:37, Richard Henderson
 wrote:
>
> On 3/26/19 8:35 PM, Peter Maydell wrote:
> > On Thu, 7 Mar 2019 at 14:47, Richard Henderson
> >  wrote:
> >>
> >> This will let backends implement the double-word shift operation.
> >>
> >> Signed-off-by: Richard Henderson 
> >> diff --git a/tcg/README b/tcg/README
> >> index 603f4df659..ddabf33017 100644
> >> --- a/tcg/README
> >> +++ b/tcg/README
> >> @@ -343,6 +343,11 @@ at bit 8.  This operation would be equivalent to
> >>
> >>  (using an arithmetic right shift).
> >>
> >> +* extract2_i64 dest, t1, t2, pos
> >> +
> >> +Extract a 64-bit quantity from the concatenation of t2:t1,
> >> +beginning at pos.
> >> +
> >
> > I think we should document the valid values of 'pos'.
> > My guess is "0 <= pos <= 63".
>
> How about
>
> 
> * extract2_i32/i64 dest, t1, t2, pos
>
> For N = {32,64}, extract an N-bit quantity from the concatenation
> of t2:t1, beginning at pos.  The tcg_gen_extract2_* expander allows
> values 0 <= pos <= N, but will expand 0 and N with mov, so only
> 1 <= pos <= N-1 will be seen by the host tcg_out_op.

If I'm reading that correctly, it seems to be combining in one sentence
the behaviour of the TCG API exposed to the front-end (pos can be
between 0 and N inclusive) with a detail of the API that a backend
needs to care about (that it can assume it never sees 0 or N).
I think we should be more careful to keep those separate, because
a reader of this document is almost always going to care only about
one or the other, never both at the same time. Perhaps things that
apply only to the backend end of the interface should go in section 4
of tcg/README? At any rate I think they should at least be in
different sentences :-)

thanks
-- PMM
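
The split being asked for here, between what the front-end may pass and what a backend may assume, can be sketched as two layers. This is only an illustration under stated assumptions: frontend_extract2_i32 and backend_extract2_i32 are hypothetical names, not the real tcg_gen_extract2_i32 expander or a real tcg_out_op case, and the code computes values directly rather than emitting ops.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical backend side: documents and enforces the narrower contract,
 * 1 <= pos <= 31, so the double-word shift never sees a shift count of 0
 * or 32.
 */
static uint32_t backend_extract2_i32(uint32_t t1, uint32_t t2, unsigned pos)
{
    assert(pos >= 1 && pos <= 31);
    return (t1 >> pos) | (t2 << (32 - pos));
}

/*
 * Hypothetical front-end side: accepts the wider range 0 <= pos <= 32 and
 * turns the two boundary values into plain moves before reaching the
 * backend.
 */
static uint32_t frontend_extract2_i32(uint32_t t1, uint32_t t2, unsigned pos)
{
    assert(pos <= 32);
    if (pos == 0) {
        return t1;              /* "mov dest, t1" */
    }
    if (pos == 32) {
        return t2;              /* "mov dest, t2" */
    }
    return backend_extract2_i32(t1, t2, pos);
}
```

Each function carries only the range that its own audience needs to know, which mirrors the suggestion to document the two constraints in separate places.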

[Qemu-devel] [PATCH v2 8/8] multifd: rest of zlib compression (WIP)

2019-04-03 Thread Juan Quintela

This is still a work in progress, but it gets everything sent as expected
and is faster than the code that is already there.

Signed-off-by: Juan Quintela 
---
 migration/ram.c | 93 ++---
 1 file changed, 88 insertions(+), 5 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 06b25ac66d..1b3b88d711 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1118,7 +1118,41 @@ static void *multifd_send_thread(void *opaque)
 uint64_t packet_num = p->packet_num;
 uint32_t flags = p->flags;
 
-p->next_packet_size = used * qemu_target_page_size();
+if (used) {
+if (migrate_use_multifd_zlib()) {
+struct iovec *iov = p->pages->iov;
+z_stream *zs = &p->zs;
+uint32_t out_size = 0;
+int i;
+
+for (i = 0; i < used; i++ ) {
+uint32_t available = p->zbuff_len - out_size;
+int flush = Z_NO_FLUSH;
+
+if (i == used  - 1) {
+flush = Z_SYNC_FLUSH;
+}
+
+zs->avail_in = iov[i].iov_len;
+zs->next_in = iov[i].iov_base;
+
+zs->avail_out = available;
+zs->next_out = p->zbuff + out_size;
+
+ret = deflate(zs, flush);
+if (ret != Z_OK) {
+printf("problem with deflate? %d\n", ret);
+qemu_mutex_unlock(&p->mutex);
+break;
+}
+out_size += available - zs->avail_out;
+}
+p->next_packet_size = out_size;
+} else {
+p->next_packet_size = used * qemu_target_page_size();
+}
+}
+
 multifd_send_fill_packet(p);
 p->flags = 0;
 p->num_packets++;
@@ -1136,8 +1170,13 @@ static void *multifd_send_thread(void *opaque)
 }
 
 if (used) {
-ret = qio_channel_writev_all(p->c, p->pages->iov,
- used, &local_err);
+if (migrate_use_multifd_zlib()) {
+ret = qio_channel_write_all(p->c, (void *)p->zbuff,
+   p->next_packet_size, 
&local_err);
+} else {
+ret = qio_channel_writev_all(p->c, p->pages->iov,
+ used, &local_err);
+}
 if (ret != 0) {
 break;
 }
@@ -1384,8 +1423,52 @@ static void *multifd_recv_thread(void *opaque)
 qemu_mutex_unlock(&p->mutex);
 
 if (used) {
-ret = qio_channel_readv_all(p->c, p->pages->iov,
-used, &local_err);
+uint32_t in_size = p->next_packet_size;
+uint32_t out_size = 0;
+uint32_t expected_size = used * qemu_target_page_size();
+int i;
+
+if (migrate_use_multifd_zlib()) {
+z_stream *zs = &p->zs;
+
+ret = qio_channel_read_all(p->c, (void *)p->zbuff,
+   in_size, &local_err);
+
+if (ret != 0) {
+break;
+}
+
+zs->avail_in = in_size;
+zs->next_in = p->zbuff;
+
+for (i = 0; i < used; i++ ) {
+struct iovec *iov = &p->pages->iov[i];
+int flush = Z_NO_FLUSH;
+
+if (i == used  - 1) {
+flush = Z_SYNC_FLUSH;
+}
+
+zs->avail_out = iov->iov_len;
+zs->next_out = iov->iov_base;
+
+ret = inflate(zs, flush);
+if (ret != Z_OK) {
+printf("%d: problem with inflate? %d\n", p->id, ret);
+qemu_mutex_unlock(&p->mutex);
+break;
+}
+out_size += iov->iov_len;
+}
+if (out_size != expected_size) {
+printf("out size %d expected size %d\n",
+   out_size, expected_size);
+break;
+}
+} else {
+ret = qio_channel_readv_all(p->c, p->pages->iov,
+used, &local_err);
+}
 if (ret != 0) {
 break;
 }
-- 
2.20.1

[Qemu-devel] [PATCH v2 3/8] migration-test: rename parameter to parameter_int

2019-04-03 Thread Juan Quintela

We will need the _str variants in the next patch.

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 49 +-
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index bd3f5c3125..0b25aa3d6c 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -392,7 +392,8 @@ static char *migrate_get_socket_address(QTestState *who, 
const char *parameter)
 return result;
 }
 
-static long long migrate_get_parameter(QTestState *who, const char *parameter)
+static long long migrate_get_parameter_int(QTestState *who,
+   const char *parameter)
 {
 QDict *rsp;
 long long result;
@@ -403,17 +404,17 @@ static long long migrate_get_parameter(QTestState *who, 
const char *parameter)
 return result;
 }
 
-static void migrate_check_parameter(QTestState *who, const char *parameter,
-long long value)
+static void migrate_check_parameter_int(QTestState *who, const char *parameter,
+long long value)
 {
 long long result;
 
-result = migrate_get_parameter(who, parameter);
+result = migrate_get_parameter_int(who, parameter);
 g_assert_cmpint(result, ==, value);
 }
 
-static void migrate_set_parameter(QTestState *who, const char *parameter,
-  long long value)
+static void migrate_set_parameter_int(QTestState *who, const char *parameter,
+  long long value)
 {
 QDict *rsp;
 
@@ -423,7 +424,7 @@ static void migrate_set_parameter(QTestState *who, const 
char *parameter,
 parameter, value);
 g_assert(qdict_haskey(rsp, "return"));
 qobject_unref(rsp);
-migrate_check_parameter(who, parameter, value);
+migrate_check_parameter_int(who, parameter, value);
 }
 
 static void migrate_pause(QTestState *who)
@@ -672,7 +673,7 @@ static void deprecated_set_downtime(QTestState *who, const double value)
 " 'arguments': { 'value': %f } }", value);
 g_assert(qdict_haskey(rsp, "return"));
 qobject_unref(rsp);
-migrate_check_parameter(who, "downtime-limit", value * 1000);
+migrate_check_parameter_int(who, "downtime-limit", value * 1000);
 }
 
 static void deprecated_set_speed(QTestState *who, long long value)
@@ -683,7 +684,7 @@ static void deprecated_set_speed(QTestState *who, long long value)
   "'arguments': { 'value': %lld } }", value);
 g_assert(qdict_haskey(rsp, "return"));
 qobject_unref(rsp);
-migrate_check_parameter(who, "max-bandwidth", value);
+migrate_check_parameter_int(who, "max-bandwidth", value);
 }
 
 static void deprecated_set_cache_size(QTestState *who, long long value)
@@ -694,7 +695,7 @@ static void deprecated_set_cache_size(QTestState *who, long long value)
  "'arguments': { 'value': %lld } }", value);
 g_assert(qdict_haskey(rsp, "return"));
 qobject_unref(rsp);
-migrate_check_parameter(who, "xbzrle-cache-size", value);
+migrate_check_parameter_int(who, "xbzrle-cache-size", value);
 }
 
 static void test_deprecated(void)
@@ -729,8 +730,8 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
  * quickly, but that it doesn't complete precopy even on a slow
  * machine, so also set the downtime.
  */
-migrate_set_parameter(from, "max-bandwidth", 1);
-migrate_set_parameter(from, "downtime-limit", 1);
+migrate_set_parameter_int(from, "max-bandwidth", 1);
+migrate_set_parameter_int(from, "downtime-limit", 1);
 
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
@@ -781,7 +782,7 @@ static void test_postcopy_recovery(void)
 }
 
 /* Turn postcopy speed down, 4K/s is slow enough on any machines */
-migrate_set_parameter(from, "max-postcopy-bandwidth", 4096);
+migrate_set_parameter_int(from, "max-postcopy-bandwidth", 4096);
 
 /* Now we start the postcopy */
 migrate_postcopy_start(from, to);
@@ -822,7 +823,7 @@ static void test_postcopy_recovery(void)
 g_free(uri);
 
 /* Restore the postcopy bandwidth to unlimited */
-migrate_set_parameter(from, "max-postcopy-bandwidth", 0);
+migrate_set_parameter_int(from, "max-postcopy-bandwidth", 0);
 
 migrate_postcopy_complete(from, to);
 }
@@ -868,9 +869,9 @@ static void test_precopy_unix(void)
  * machine, so also set the downtime.
  */
 /* 1 ms should make it not converge*/
-migrate_set_parameter(from, "downtime-limit", 1);
+migrate_set_parameter_int(from, "downtime-limit", 1);
 /* 1GB/s */
-migrate_set_parameter(from, "max-bandwidth", 10);
+migrate_set_parameter_int(from, "max-bandwidth", 10);
 
 /* Wait for the first serial output from the source */
 wait_for_serial("src_serial");
@@ -880,7 +881,7 @@ static void test_precopy_unix

[Qemu-devel] [PATCH v2 5/8] migration-test: introduce functions to handle string parameters

2019-04-03 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index ff480e0682..65d5e256a7 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -427,6 +427,43 @@ static void migrate_set_parameter_int(QTestState *who, const char *parameter,
 migrate_check_parameter_int(who, parameter, value);
 }
 
+static char *migrate_get_parameter_str(QTestState *who,
+   const char *parameter)
+{
+QDict *rsp;
+char *result;
+
+rsp = wait_command(who, "{ 'execute': 'query-migrate-parameters' }");
+result = g_strdup(qdict_get_str(rsp, parameter));
+qobject_unref(rsp);
+return result;
+}
+
+static void migrate_check_parameter_str(QTestState *who, const char *parameter,
+const char *value)
+{
+char *result;
+
+result = migrate_get_parameter_str(who, parameter);
+g_assert_cmpstr(result, ==, value);
+g_free(result);
+}
+
+__attribute__((unused))
+static void migrate_set_parameter_str(QTestState *who, const char *parameter,
+  const char *value)
+{
+QDict *rsp;
+
+rsp = qtest_qmp(who,
+"{ 'execute': 'migrate-set-parameters',"
+"'arguments': { %s: %s } }",
+parameter, value);
+g_assert(qdict_haskey(rsp, "return"));
+qobject_unref(rsp);
+migrate_check_parameter_str(who, parameter, value);
+}
+
 static void migrate_pause(QTestState *who)
 {
 QDict *rsp;
-- 
2.20.1

[Qemu-devel] [PATCH v2 2/8] migration: fix multifd_recv event typo

2019-04-03 Thread Juan Quintela

The multifd_send() event prints "packet_num".  Make multifd_recv() consistent with it.

Signed-off-by: Juan Quintela 
---
 migration/trace-events | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/trace-events b/migration/trace-events
index de2e136e57..cd50a1e659 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -80,7 +80,7 @@ get_queued_page_not_dirty(const char *block_name, uint64_t tmp_offset, unsigned
 migration_bitmap_sync_start(void) ""
 migration_bitmap_sync_end(uint64_t dirty_pages) "dirty_pages %" PRIu64
 migration_throttle(void) ""
-multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet number %" PRIu64 " pages %d flags 0x%x next packet size %d"
+multifd_recv(uint8_t id, uint64_t packet_num, uint32_t used, uint32_t flags, uint32_t next_packet_size) "channel %d packet_num %" PRIu64 " pages %d flags 0x%x next packet size %d"
 multifd_recv_sync_main(long packet_num) "packet num %ld"
 multifd_recv_sync_main_signal(uint8_t id) "channel %d"
 multifd_recv_sync_main_wait(uint8_t id) "channel %d"
-- 
2.20.1

[Qemu-devel] [PATCH v2 7/8] multifd: Add zlib compression support

2019-04-03 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/migration.c  |  9 
 migration/migration.h  |  1 +
 migration/ram.c| 47 ++
 qapi/common.json   |  4 +++-
 tests/migration-test.c |  6 ++
 5 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index d6f8ef342a..69d85cbe5e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2141,6 +2141,15 @@ bool migrate_use_multifd(void)
 return s->enabled_capabilities[MIGRATION_CAPABILITY_MULTIFD];
 }
 
+bool migrate_use_multifd_zlib(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->parameters.multifd_compress == MULTIFD_COMPRESS_ZLIB;
+}
+
 bool migrate_pause_before_switchover(void)
 {
 MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 438f17edad..fc4fb841d4 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -269,6 +269,7 @@ bool migrate_ignore_shared(void);
 
 bool migrate_auto_converge(void);
 bool migrate_use_multifd(void);
+bool migrate_use_multifd_zlib(void);
 bool migrate_pause_before_switchover(void);
 int migrate_multifd_channels(void);
 
diff --git a/migration/ram.c b/migration/ram.c
index d7f8fe45a8..06b25ac66d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -582,6 +582,7 @@ exit:
 #define MULTIFD_VERSION 1
 
 #define MULTIFD_FLAG_SYNC (1 << 0)
+#define MULTIFD_FLAG_ZLIB (1 << 1)
 
 /* This value needs to be a multiple of qemu_target_page_size() */
 #define MULTIFD_PACKET_SIZE (512 * 1024)
@@ -663,6 +664,12 @@ typedef struct {
 uint64_t num_pages;
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
+/* stream for compression */
+z_stream zs;
+/* compressed buffer */
+uint8_t *zbuff;
+/* size of compressed buffer */
+uint32_t zbuff_len;
 }  MultiFDSendParams;
 
 typedef struct {
@@ -698,6 +705,12 @@ typedef struct {
 uint64_t num_pages;
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
+/* stream for compression */
+z_stream zs;
+/* compressed buffer */
+uint8_t *zbuff;
+/* size of compressed buffer */
+uint32_t zbuff_len;
 } MultiFDRecvParams;
 
 static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp)
@@ -1035,6 +1048,9 @@ void multifd_save_cleanup(void)
 p->packet_len = 0;
 g_free(p->packet);
 p->packet = NULL;
+deflateEnd(&p->zs);
+g_free(p->zbuff);
+p->zbuff = NULL;
 }
 qemu_sem_destroy(&multifd_send_state->channels_ready);
 qemu_sem_destroy(&multifd_send_state->sem_sync);
@@ -1198,6 +1214,7 @@ int multifd_save_setup(void)
 
 for (i = 0; i < thread_count; i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
+z_stream *zs = &p->zs;
 
 qemu_mutex_init(&p->mutex);
 qemu_sem_init(&p->sem, 0);
@@ -1211,6 +1228,17 @@ int multifd_save_setup(void)
 p->packet = g_malloc0(p->packet_len);
 p->name = g_strdup_printf("multifdsend_%d", i);
 socket_send_channel_create(multifd_new_send_channel_async, p);
+zs->zalloc = Z_NULL;
+zs->zfree = Z_NULL;
+zs->opaque = Z_NULL;
+if (deflateInit(zs, migrate_compress_level()) != Z_OK) {
+printf("deflate init failed\n");
+return -1;
+}
+/* We will never have more than page_count pages */
+p->zbuff_len = page_count * qemu_target_page_size();
+p->zbuff_len *= 2;
+p->zbuff = g_malloc0(p->zbuff_len);
 }
 return 0;
 }
@@ -1278,6 +1306,9 @@ int multifd_load_cleanup(Error **errp)
 p->packet_len = 0;
 g_free(p->packet);
 p->packet = NULL;
+inflateEnd(&p->zs);
+g_free(p->zbuff);
+p->zbuff = NULL;
 }
 qemu_sem_destroy(&multifd_recv_state->sem_sync);
 g_free(multifd_recv_state->params);
@@ -1396,6 +1427,7 @@ int multifd_load_setup(void)
 
 for (i = 0; i < thread_count; i++) {
 MultiFDRecvParams *p = &multifd_recv_state->params[i];
+z_stream *zs = &p->zs;
 
 qemu_mutex_init(&p->mutex);
 qemu_sem_init(&p->sem_sync, 0);
@@ -1405,6 +1437,21 @@ int multifd_load_setup(void)
   + sizeof(ram_addr_t) * page_count;
 p->packet = g_malloc0(p->packet_len);
 p->name = g_strdup_printf("multifdrecv_%d", i);
+
+zs->zalloc = Z_NULL;
+zs->zfree = Z_NULL;
+zs->opaque = Z_NULL;
+zs->avail_in = 0;
+zs->next_in = Z_NULL;
+if (inflateInit(zs) != Z_OK) {
+printf("inflate init failed\n");
+return -1;
+}
+/* We will never have more than page_count pages */
+p->zbuff_len = page_count * qemu_target_page_size();
+/* We know compression "could" use more space */
+p->zbuff_len *= 2;
+p->zbuff = g_malloc0(p->zbuff_len);
 }
 return 0;
 }
diff --git a/qapi/common.j

[Qemu-devel] [PATCH v2 1/8] migration: Fix migrate_set_parameter

2019-04-03 Thread Juan Quintela

Otherwise we set err twice, which is wrong and causes an abort.

Signed-off-by: Juan Quintela 
---
 hmp.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hmp.c b/hmp.c
index 92941142af..8eec768088 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1825,8 +1825,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
 p->has_xbzrle_cache_size = true;
 visit_type_size(v, param, &cache_size, &err);
-if (err || cache_size > INT64_MAX
-|| (size_t)cache_size != cache_size) {
+if (err) {
+break;
+}
+if (cache_size > INT64_MAX || (size_t)cache_size != cache_size) {
 error_setg(&err, "Invalid size %s", valuestr);
 break;
 }
-- 
2.20.1
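
The abort described in the commit message follows from a common convention
for error pointers: the destination must still be empty when an error is
stored into it. The sketch below models that convention and the fixed
control flow in isolation. DemoError, demo_error_set(), demo_visit_size()
and demo_set_cache_size() are hypothetical names made up for illustration;
they are not QEMU's real Error API or visitor code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; this is not QEMU's Error. */
typedef struct DemoError {
    int code;
} DemoError;

enum { DEMO_OK = 0, DEMO_PARSE_FAILED = 1, DEMO_OUT_OF_RANGE = 2 };

/* Store an error. The destination must still be empty, so a second
 * store into the same pointer trips the assertion and aborts. */
static void demo_error_set(DemoError **errp, int code)
{
    assert(*errp == NULL);
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->code = code;
}

/* Hypothetical visitor: parse a decimal size, report failure via errp. */
static void demo_visit_size(const char *str, uint64_t *out, DemoError **errp)
{
    char *end = NULL;
    unsigned long long v = strtoull(str, &end, 10);

    if (end == str || *end != '\0') {
        demo_error_set(errp, DEMO_PARSE_FAILED);
        return;
    }
    *out = v;
}

/* Same shape as the fixed hunk: stop at the first error, and only
 * then run the range check that may set an error of its own. */
static int demo_set_cache_size(const char *str)
{
    DemoError *err = NULL;
    uint64_t cache_size = 0;
    int code = DEMO_OK;

    demo_visit_size(str, &cache_size, &err);
    if (err) {
        goto out;
    }
    if (cache_size > INT64_MAX) {
        demo_error_set(&err, DEMO_OUT_OF_RANGE);
    }
out:
    if (err) {
        code = err->code;
        free(err);
    }
    return code;
}
```

If the two checks were merged into one condition, as in the removed lines,
a parse failure would reach demo_error_set() a second time with err already
set, and the assertion would fire.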

Re: [Qemu-devel] [PATCH v2 0/2] util/readline: errors clean-ups

2019-04-03 Thread Stefan Hajnoczi

On Mon, Apr 01, 2019 at 03:44:04AM +0100, Jules Irenge wrote:
> This v2 version combines two fix of errors into one  and replace tab
> indent by four spaces
> 
> Jules Irenge (2):
>   util/readline: add a space to fix errors by checkpatch tool
>   util: readline: replace tab indent by four spaces to fix checkpatch
> errors
> 
>  util/readline.c | 132 
>  1 file changed, 66 insertions(+), 66 deletions(-)

This file is unmaintained, I'll take this patch through my tree.

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan



Re: [Qemu-devel] [PATCH] Add coroutine_fn to bdrv_check_co_entry

2019-04-03 Thread Stefan Hajnoczi

On Mon, Apr 01, 2019 at 12:30:51PM +0300, n.alekseev2...@gmail.com wrote:
> From: Nikita Alekseev 
> 
> bdrv_check_co_entry calls bdrv_co_check, which is a coroutine function.
> Thus, it also needs to be marked as a coroutine.
> 
> Signed-off-by: Nikita Alekseev 
> ---
>  block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thank you!  I prefixed the commit message with "block: ".  Most commits
have a subsystem prefix to make browsing/searching git log easier.  In
the future you can add it yourself - look at git log for a file to
discover what prefix other people recently used.

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan


[Qemu-devel] [PATCH v2 4/8] tests: Add migration multifd test

2019-04-03 Thread Juan Quintela

We set multifd-channels.

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Thomas Huth 
Signed-off-by: Juan Quintela 
---
 tests/migration-test.c | 48 ++
 1 file changed, 48 insertions(+)

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 0b25aa3d6c..ff480e0682 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -1028,6 +1028,53 @@ static void test_precopy_tcp(void)
 g_free(uri);
 }
 
+static void test_multifd_tcp(void)
+{
+char *uri;
+QTestState *from, *to;
+
+if (test_migrate_start(&from, &to, "tcp:127.0.0.1:0", false, false)) {
+return;
+}
+
+/*
+ * We want to pick a speed slow enough that the test completes
+ * quickly, but that it doesn't complete precopy even on a slow
+ * machine, so also set the downtime.
+ */
+/* 1 ms should make it not converge*/
+migrate_set_parameter_int(from, "downtime-limit", 1);
+/* 1GB/s */
+migrate_set_parameter_int(from, "max-bandwidth", 10);
+
+migrate_set_parameter_int(from, "multifd-channels", 2);
+migrate_set_parameter_int(to, "multifd-channels", 2);
+
+migrate_set_capability(from, "multifd", "true");
+migrate_set_capability(to, "multifd", "true");
+/* Wait for the first serial output from the source */
+wait_for_serial("src_serial");
+
+uri = migrate_get_socket_address(to, "socket-address");
+
+migrate(from, uri, "{}");
+
+wait_for_migration_pass(from);
+
+/* 300ms it should converge */
+migrate_set_parameter_int(from, "downtime-limit", 600);
+
+if (!got_stop) {
+qtest_qmp_eventwait(from, "STOP");
+}
+qtest_qmp_eventwait(to, "RESUME");
+
+wait_for_serial("dest_serial");
+wait_for_migration_complete(from);
+
+test_migrate_end(from, to, true);
+}
+
 int main(int argc, char **argv)
 {
 char template[] = "/tmp/migration-test-XX";
@@ -1082,6 +1129,7 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/precopy/tcp", test_precopy_tcp);
 /* qtest_add_func("/migration/ignore_shared", test_ignore_shared); */
 qtest_add_func("/migration/xbzrle/unix", test_xbzrle_unix);
+qtest_add_func("/migration/multifd/tcp", test_multifd_tcp);
 
 ret = g_test_run();
 
-- 
2.20.1

[Qemu-devel] [PATCH v2 6/8] migration: Add multifd-compress parameter

2019-04-03 Thread Juan Quintela

Signed-off-by: Juan Quintela 

---
Rename it to NONE
---
 hmp.c| 17 +
 hw/core/qdev-properties.c| 11 +++
 include/hw/qdev-properties.h |  1 +
 migration/migration.c| 16 
 qapi/common.json | 13 +
 qapi/migration.json  | 19 +++
 tests/migration-test.c   | 13 ++---
 7 files changed, 83 insertions(+), 7 deletions(-)

diff --git a/hmp.c b/hmp.c
index 8eec768088..02fbe27757 100644
--- a/hmp.c
+++ b/hmp.c
@@ -435,6 +435,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "%s: %u\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_CHANNELS),
 params->multifd_channels);
+monitor_printf(mon, "%s: %s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_MULTIFD_COMPRESS),
+MultifdCompress_str(params->multifd_compress));
 monitor_printf(mon, "%s: %" PRIu64 "\n",
 MigrationParameter_str(MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE),
 params->xbzrle_cache_size);
@@ -1737,6 +1740,7 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 MigrateSetParameters *p = g_new0(MigrateSetParameters, 1);
 uint64_t valuebw = 0;
 uint64_t cache_size;
+int compress_type;
 Error *err = NULL;
 int val, ret;
 
@@ -1822,6 +1826,19 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 p->has_multifd_channels = true;
 visit_type_int(v, param, &p->multifd_channels, &err);
 break;
+case MIGRATION_PARAMETER_MULTIFD_COMPRESS:
+p->has_multifd_compress = true;
+visit_type_enum(v, param, &compress_type,
+&MultifdCompress_lookup, &err);
+if (err) {
+break;
+}
+if (compress_type < 0 || compress_type > MULTIFD_COMPRESS__MAX) {
+error_setg(&err, "Invalid multifd_compress option %s", valuestr);
+break;
+}
+p->multifd_compress = compress_type;
+break;
 case MIGRATION_PARAMETER_XBZRLE_CACHE_SIZE:
 p->has_xbzrle_cache_size = true;
 visit_type_size(v, param, &cache_size, &err);
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 5da1439a8b..7c8e71532f 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -645,6 +645,17 @@ const PropertyInfo qdev_prop_fdc_drive_type = {
 .set_default_value = set_default_value_enum,
 };
 
+/* --- MultifdCompress --- */
+
+const PropertyInfo qdev_prop_multifd_compress = {
+.name = "MultifdCompress",
+.description = "multifd_compress values",
+.enum_table = &MultifdCompress_lookup,
+.get = get_enum,
+.set = set_enum,
+.set_default_value = set_default_value_enum,
+};
+
 /* --- pci address --- */
 
 /*
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index b6758c852e..ac452d8f2c 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -23,6 +23,7 @@ extern const PropertyInfo qdev_prop_tpm;
 extern const PropertyInfo qdev_prop_ptr;
 extern const PropertyInfo qdev_prop_macaddr;
 extern const PropertyInfo qdev_prop_on_off_auto;
+extern const PropertyInfo qdev_prop_multifd_compress;
 extern const PropertyInfo qdev_prop_losttickpolicy;
 extern const PropertyInfo qdev_prop_blockdev_on_error;
 extern const PropertyInfo qdev_prop_bios_chs_trans;
diff --git a/migration/migration.c b/migration/migration.c
index 609e0df5d0..d6f8ef342a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -82,6 +82,7 @@
 /* The delay time (in ms) between two COLO checkpoints */
 #define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
 #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
+#define DEFAULT_MIGRATE_MULTIFD_COMPRESS MULTIFD_COMPRESS_NONE
 
 /* Background transfer rate for postcopy, 0 means unlimited, note
  * that page requests can still exceed this limit.
@@ -769,6 +770,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
 params->block_incremental = s->parameters.block_incremental;
 params->has_multifd_channels = true;
 params->multifd_channels = s->parameters.multifd_channels;
+params->has_multifd_compress = true;
+params->multifd_compress = s->parameters.multifd_compress;
 params->has_xbzrle_cache_size = true;
 params->xbzrle_cache_size = s->parameters.xbzrle_cache_size;
 params->has_max_postcopy_bandwidth = true;
@@ -1268,6 +1271,9 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_multifd_channels) {
 dest->multifd_channels = params->multifd_channels;
 }
+if (params->has_multifd_compress) {
+dest->multifd_compress = params->multifd_compress;
+}
 if (params->has_xbzrle_cache_size) {
 dest->xbzrle_cache_size = params->xbzrle_cache_size;
 }
@@ -1364,6 +1370,9 @@ static void migrate_params_apply(MigrateSetParamete

Re: [Qemu-devel] [PATCH] util/readline: Add braces to fix checkpatch errors

2019-04-03 Thread Stefan Hajnoczi

On Sat, Mar 30, 2019 at 11:21:42AM +, Jules Irenge wrote:
> Add braces to fix errors issued by checkpatch.pl tool
> "ERROR: braces {} are necessary for all arms of this statement"
> Within "util/readline.c" file
> ---
>  util/readline.c | 50 -
>  1 file changed, 33 insertions(+), 17 deletions(-)

This file is unmaintained, I'll take this patch through my tree.

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan



Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt"

2019-04-03 Thread Shameerali Kolothum Thodi




> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: 03 April 2019 10:49
> To: Laszlo Ersek 
> Cc: Auger Eric ; Ard Biesheuvel
> ; peter.mayd...@linaro.org;
> sa...@linux.intel.com; qemu-devel@nongnu.org; Shameerali Kolothum Thodi
> ; Linuxarm
> ; shannon.zha...@gmail.com;
> qemu-...@nongnu.org; xuwei (O) ;
> sebastien.bo...@intel.com; Leif Lindholm 
> Subject: Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in
> feature "fdt"
> 
> On Tue, 2 Apr 2019 17:38:26 +0200
> Laszlo Ersek  wrote:

[...]

> > >>> Sure, sorry.
> > >>>
> > >>> This series:
> > >>> - [PATCH v3 00/10] ARM virt: ACPI memory hotplug support,
> > >>> https://patchwork.kernel.org/cover/10863301/
> > >>>
> > >>> aims to introduce PCDIMM support in qemu. In ACPI mode, it builds the
> > >>> SRAT and DSDT parts and relies on GED to trigger the hotplug.
> > >>>
> > >>> We noticed that if we build the hotpluggable memory dt nodes on top of
> > >>> the above ACPI tables, the DIMM slots are interpreted as not
> > >>> hotpluggable memory slots (at least we think so).
> > >>>
> > >>> We think the EDK2 GetMemoryMap() uses the dt node info and ignores the
> > >>> fact that those slots are exposed as hotpluggable in the SRAT for example.
> > >>>
> > >>> So in this series, we are forced to not generate the hotpluggable memory
> > >>> dt nodes if we want the DIMM slots to be effectively recognized as
> > >>> hotpluggable.
> > >>>
> > >>> Could you confirm we have a correct understanding of the EDK2 behaviour
> > >>> and if so, would there be any solution for EDK2 to absorb both the DT
> > >>> nodes and the relevant SRAT/DSDT tables and make the slots hotpluggable.
> > >>>
> > >>> At qemu level, detecting we are booting in ACPI mode and purposely
> > >>> removing the above mentioned DT nodes does not look straightforward.
> > >>
> > >> The firmware is not enlightened about the ACPI content that comes from
> > >> QEMU / fw_cfg. That ACPI content is *blindly* processed by the firmware,
> > >> as instructed through the ACPI linker/loader script, in order to install
> > >> the ACPI content for the OS. No actual information is consumed by the
> > >> firmware from the ACPI payload -- and that's a feature.
> > >>
> > >> The firmware does consume DT:
> > >>
> > >> - If you start QEMU *with* "-no-acpi", then the DT is both consumed by
> > >> the firmware (for its own information needs), and passed on to the OS.
> > >>
> > >> - If you start QEMU *without* "-no-acpi" (the default), then the DT is
> > >> consumed only by the firmware (for its own information needs), and the
> > >> DT is hidden from the OS. The OS gets only the ACPI content
> > >> (processed/prepared as described above).
> > >>
> > >>
> > >> In the firmware, the "ArmVirtPkg/HighMemDxe" driver iterates over the
> > >> base/size pairs in all the memory nodes in the DT. For each such base
> > >> address that is currently tracked as "nonexistent" in the GCD memory
> > >> space map, the driver currently adds the base/size range as "system
> > >> memory". This in turn is reflected by the UEFI memmap that the OS gets
> > >> to see as "conventional memory".
> > >>
> > >> If you need some memory ranges to show up as "special" in the UEFI
> > >> memmap, then you need to distinguish them somehow from the "regular"
> > >> memory areas, in the DT. And then extend "ArmVirtPkg/HighMemDxe" in the
> > >> firmware, so that it act upon the discriminator that you set in the DT.
> > >>
> > >>
> > >> Now... from a brief look at the Platform Init and UEFI specs, my
> > >> impression is that the hotpluggable (but presently not plugged) DIMM
> > >> ranges should simply be *absent* from the UEFI memmap; is that correct?
> > >> (I didn't check the ACPI spec, maybe it specifies the expected behavior
> > >> in full.) If my impression is correct, then two options (alternatives)
> > >> exist:
> > >>
> > >> (1) Hide the affected memory nodes -- or at least the affected base/size
> > >> pairs -- from the DT, in case you boot without "-no-acpi" but with an
> > >> external firmware loaded. Then the firmware will not expose those ranges
> > >> as "conventional memory" in the UEFI memmap. This approach requires no
> > >> changes to edk2.
> > >>
> > >> This option is precisely what Eric described up-thread, at
> > >>
>  at.com>:
> > >>
> > >>> in machvirt_init, there is firmware_loaded that tells you whether you
> > >>> have a FW image. If this one is not set, you can induce dt. But if
> > >>> there is a FW it can be either DT or ACPI booted. You also have the
> > >>> acpi_enabled knob.
> > >>
> > >> (The "-no-acpi" cmdline option clears the "acpi_enabled" variable in
> > >> "vl.c").
> > >>
> > >> So, the condition for hiding the hotpluggable memory nodes in question
> > >> from the DT is:
> > >
> > >>
> > >>   (aarch64 && firmware_loaded && acpi_enabled)
> > >
> > > Thanks a lot for all those inputs!
> >
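
The condition quoted above can be written as a small predicate. This is only
a sketch: demo_hide_hotplug_mem_nodes() is a hypothetical name, and the three
flags are passed in explicitly here instead of being read from machine state
as they would be in QEMU.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate for option (1) in the quoted text: hide the
 * hotpluggable memory nodes from the DT only when an external firmware
 * is loaded on an aarch64 guest and ACPI has not been disabled.
 */
static bool demo_hide_hotplug_mem_nodes(bool aarch64,
                                        bool firmware_loaded,
                                        bool acpi_enabled)
{
    return aarch64 && firmware_loaded && acpi_enabled;
}
```

With "-no-acpi" (acpi_enabled false) or without a firmware image, the
predicate is false and the nodes stay in the DT.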

Re: [Qemu-devel] [PATCH v4 5/5] target/mips: Refactor and fix INSERT. instructions

2019-04-03 Thread Aleksandar Markovic

> From: Mateja Marjanovic 
> Subject: Re: [PATCH v4 5/5] target/mips: Refactor and fix INSERT. instructions
> 
> On 2.4.19. 22:50, Aleksandar Markovic wrote:
> >> From: Mateja Marjanovic 
> >> Subject: [PATCH v4 5/5] target/mips: Refactor and fix INSERT. instructions
> >>
> >> From: Mateja Marjanovic 
> >>
> >> The old version of the helper for the INSERT. MSA instructions
> >> has been replaced with four helpers that don't use switch, and change
> >> the endianness of the given index, when executed on a big endian host.
> >>
> >> Signed-off-by: Mateja Marjanovic 
> >> ---
> > ...
> >
> >
> >> +n %= 16;
> > Mateja, could you just clarify what is the purpose of this line (and
> > similar three lines involving "%=")? It looks to me that n is already
> > limited here to be between 0 and 15, isn't it? (that follows from the
> > source code of gen_msa_elm().) What made you insert this line,
> > as it stands?
> 
> It was
> 
> n %= DF_ELEMENTS(df);
> 
> but when I deleted the df argument, so it had to be done like
> this. I think it's a matter of precaution (in case a number
> greater than 15, or 8... is passed as an argument).

Oh, I see.

At this moment, I think a more appropriate code would be:

assert(n < 16)

The whole set of functions decoding this group of instructions is
too complicated (several layers, repeated calculations), and could
be further simplified. But let's say that would be outside of the
scope of this patch and series. Your series, in fact, begins that
simplification by removing redundant decoding of "df" in helpers,
so it is a good step towards better code organization, and we'll
leave more detailed cleanup for some future endeavors.

Yours,
Aleksandar
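
To make the difference between the two approaches concrete, here is a
minimal sketch. DemoVec, demo_insert_b_wrap() and demo_insert_b_checked()
are hypothetical names; the real helpers operate on the MSA register file
and also adjust the index on big-endian hosts, which is left out here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BYTE_LANES 16   /* 128-bit vector, 8-bit elements */

/* Hypothetical 128-bit register viewed as 16 byte lanes. */
typedef struct DemoVec {
    int8_t b[DEMO_BYTE_LANES];
} DemoVec;

/* Wraps an out-of-range index silently, which can hide a decoder bug. */
static void demo_insert_b_wrap(DemoVec *v, uint32_t n, int8_t value)
{
    n %= DEMO_BYTE_LANES;
    v->b[n] = value;
}

/* States the precondition instead: the caller must pass n < 16. */
static void demo_insert_b_checked(DemoVec *v, uint32_t n, int8_t value)
{
    assert(n < DEMO_BYTE_LANES);
    v->b[n] = value;
}
```

With the modulo version, an index of 17 quietly writes lane 1. With the
assert version, the same call stops immediately and points at the caller.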

Re: [Qemu-devel] [PATCH v2 0/8] WIP: Multifd compression support

2019-04-03 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20190403114958.3705-1-quint...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20190403114958.3705-1-quint...@redhat.com
Subject: [Qemu-devel] [PATCH v2 0/8] WIP: Multifd compression support
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag]   patchew/20190403114958.3705-1-quint...@redhat.com -> patchew/20190403114958.3705-1-quint...@redhat.com
Switched to a new branch 'test'
2e0702511c multifd: rest of zlib compression (WIP)
1f67f96a63 multifd: Add zlib compression support
f5c3ad30bc migration: Add multifd-compress parameter
c5c77f97be migration-test: introduce functions to handle string parameters
94ab8bb4f3 tests: Add migration multifd test
d73c9d31bb migration-test: rename parameter to parameter_int
5c0dbda2f7 migration: fix multifd_recv event typo
227dcab34a migration: Fix migrate_set_parameter

=== OUTPUT BEGIN ===
1/8 Checking commit 227dcab34a61 (migration: Fix migrate_set_parameter)
2/8 Checking commit 5c0dbda2f7b4 (migration: fix multifd_recv event typo)
3/8 Checking commit d73c9d31bb62 (migration-test: rename parameter to parameter_int)
4/8 Checking commit 94ab8bb4f3c6 (tests: Add migration multifd test)
5/8 Checking commit c5c77f97be9e (migration-test: introduce functions to handle string parameters)
6/8 Checking commit f5c3ad30bc62 (migration: Add multifd-compress parameter)
WARNING: line over 80 characters
#133: FILE: migration/migration.c:3366:
+DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_multifd_compress, MultifdCompress)

total: 0 errors, 1 warnings, 231 lines checked

Patch 6/8 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/8 Checking commit 1f67f96a636a (multifd: Add zlib compression support)
8/8 Checking commit 2e0702511c82 (multifd: rest of zlib compression (WIP))
ERROR: space prohibited before that close parenthesis ')'
#29: FILE: migration/ram.c:1128:
+for (i = 0; i < used; i++ ) {

ERROR: space prohibited before that close parenthesis ')'
#100: FILE: migration/ram.c:1444:
+for (i = 0; i < used; i++ ) {

total: 2 errors, 0 warnings, 111 lines checked

Patch 8/8 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190403114958.3705-1-quint...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Qemu-devel] [PATCH] sockets: Fix stringop-truncation warning

2019-04-03 Thread Philippe Mathieu-Daudé

Compiling with clang-8 fails with:

CC  util/qemu-sockets.o
  util/qemu-sockets.c: In function 'unix_connect_saddr':
  util/qemu-sockets.c:925:5: error: 'strncpy' specified bound 108 equals destination size [-Werror=stringop-truncation]
   strncpy(un.sun_path, saddr->path, sizeof(un.sun_path));
   ^~
  util/qemu-sockets.c: In function 'unix_listen_saddr':
  util/qemu-sockets.c:880:5: error: 'strncpy' specified bound 108 equals destination size [-Werror=stringop-truncation]
   strncpy(un.sun_path, path, sizeof(un.sun_path));
   ^~~

Per the unix socket manpage:

  UNIX(7)

  Pathname sockets
  When binding a socket to a pathname, a few rules should be observed for
  maximum portability and ease of coding:
  *  The pathname in sun_path should be null-terminated.
  *  The length of the pathname, including the terminating null byte, should
     not exceed the size of sun_path.

Reduce the length of the unix socket path by 1 to hold the NUL byte.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/qemu-sockets.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 97050516900..935271d58c0 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -845,10 +845,10 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
 path = pathbuf = g_strdup_printf("%s/qemu-socket-XX", tmpdir);
 }
 
-if (strlen(path) > sizeof(un.sun_path)) {
+if (strlen(path) > sizeof(un.sun_path) - 1) {
 error_setg(errp, "UNIX socket path '%s' is too long", path);
 error_append_hint(errp, "Path must be less than %zu bytes\n",
-  sizeof(un.sun_path));
+  sizeof(un.sun_path) - 1);
 goto err;
 }
 
@@ -877,7 +877,7 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
 
 memset(&un, 0, sizeof(un));
 un.sun_family = AF_UNIX;
-strncpy(un.sun_path, path, sizeof(un.sun_path));
+strncpy(un.sun_path, path, sizeof(un.sun_path) - 1);
 
 if (bind(sock, (struct sockaddr*) &un, sizeof(un)) < 0) {
 error_setg_errno(errp, errno, "Failed to bind socket to %s", path);
@@ -913,16 +913,16 @@ static int unix_connect_saddr(UnixSocketAddress *saddr, Error **errp)
 return -1;
 }
 
-if (strlen(saddr->path) > sizeof(un.sun_path)) {
+if (strlen(saddr->path) > sizeof(un.sun_path) - 1) {
 error_setg(errp, "UNIX socket path '%s' is too long", saddr->path);
 error_append_hint(errp, "Path must be less than %zu bytes\n",
-  sizeof(un.sun_path));
+  sizeof(un.sun_path) - 1);
 goto err;
 }
 
 memset(&un, 0, sizeof(un));
 un.sun_family = AF_UNIX;
-strncpy(un.sun_path, saddr->path, sizeof(un.sun_path));
+strncpy(un.sun_path, saddr->path, sizeof(un.sun_path) - 1);
 
 /* connect to peer */
 do {
-- 
2.20.1

[Qemu-devel] [Bug 1815889] Re: qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()

2019-04-03 Thread Christian Ehrhardt 

Thank you Daniel,
we will most likely keep Disco as-is for now and merge this in 19.10, at which
point mesa can drop the revert. I tagged it for 19.10 to be revisited.

** Tags added: qemu-19.10

** Also affects: mesa (Ubuntu Ee-series)
   Importance: Undecided
   Status: New

** Also affects: qemu (Ubuntu Ee-series)
   Importance: Undecided
   Status: New

** Changed in: qemu (Ubuntu Ee-series)
   Status: New => Triaged

** Changed in: qemu (Ubuntu)
   Status: Triaged => Won't Fix

** Changed in: qemu (Ubuntu)
   Status: Won't Fix => Invalid

** Changed in: mesa (Ubuntu Ee-series)
   Status: New => Triaged

** Changed in: qemu (Ubuntu Ee-series)
 Assignee: (unassigned) => Christian Ehrhardt  (paelzer)

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1815889

Title:
  qemu-system-x86_64 crashed with signal 31 in
  __pthread_setaffinity_new()

Status in Mesa:
  Confirmed
Status in QEMU:
  Fix Committed
Status in mesa package in Ubuntu:
  Fix Released
Status in qemu package in Ubuntu:
  Invalid
Status in mesa source package in Disco:
  Fix Released
Status in mesa source package in EE-Series:
  Triaged
Status in qemu source package in EE-Series:
  Triaged

Bug description:
  Unable to launch Default Fedora 29 images in gnome-boxes

  ProblemType: Crash
  DistroRelease: Ubuntu 19.04
  Package: qemu-system-x86 1:3.1+dfsg-2ubuntu1
  ProcVersionSignature: Ubuntu 4.19.0-12.13-generic 4.19.18
  Uname: Linux 4.19.0-12-generic x86_64
  ApportVersion: 2.20.10-0ubuntu20
  Architecture: amd64
  Date: Thu Feb 14 11:00:45 2019
  ExecutablePath: /usr/bin/qemu-system-x86_64
  KvmCmdLine: COMMAND STAT  EUID  RUID   PID  PPID %CPU COMMAND
  MachineType: Dell Inc. Precision T3610
  ProcEnviron: PATH=(custom, user)
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.19.0-12-generic 
root=UUID=939b509b-d627-4642-a655-979b44972d17 ro splash quiet vt.handoff=1
  Signal: 31
  SourcePackage: qemu
  StacktraceTop:
   __pthread_setaffinity_new (th=, cpusetsize=128, 
cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
   () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
   () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
   start_thread (arg=) at pthread_create.c:486
   clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  Title: qemu-system-x86_64 crashed with signal 31 in 
__pthread_setaffinity_new()
  UpgradeStatus: Upgraded to disco on 2018-11-14 (91 days ago)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo video
  dmi.bios.date: 11/14/2018
  dmi.bios.vendor: Dell Inc.
  dmi.bios.version: A18
  dmi.board.name: 09M8Y8
  dmi.board.vendor: Dell Inc.
  dmi.board.version: A01
  dmi.chassis.type: 7
  dmi.chassis.vendor: Dell Inc.
  dmi.modalias: 
dmi:bvnDellInc.:bvrA18:bd11/14/2018:svnDellInc.:pnPrecisionT3610:pvr00:rvnDellInc.:rn09M8Y8:rvrA01:cvnDellInc.:ct7:cvr:
  dmi.product.name: Precision T3610
  dmi.product.sku: 05D2
  dmi.product.version: 00
  dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/mesa/+bug/1815889/+subscriptions

Re: [Qemu-devel] [PATCH] sockets: Fix stringop-truncation warning

2019-04-03 Thread Daniel P . Berrangé

On Wed, Apr 03, 2019 at 02:16:20PM +0200, Philippe Mathieu-Daudé wrote:
> Compiling with clang-8 fails with:
> 
> CC  util/qemu-sockets.o
>   util/qemu-sockets.c: In function 'unix_connect_saddr':
>   util/qemu-sockets.c:925:5: error: 'strncpy' specified bound 108 equals 
> destination size [-Werror=stringop-truncation]
>strncpy(un.sun_path, saddr->path, sizeof(un.sun_path));
>^~
>   util/qemu-sockets.c: In function 'unix_listen_saddr':
>   util/qemu-sockets.c:880:5: error: 'strncpy' specified bound 108 equals 
> destination size [-Werror=stringop-truncation]
>strncpy(un.sun_path, path, sizeof(un.sun_path));
>^~~
> 
> Per the unix socket manpage:
> 
>   UNIX(7)
> 
>   Pathname sockets
>   When binding a socket to a pathname, a few rules should be observed for 
> maximum portability and ease of coding:
>   *  The pathname in sun_path should be null-terminated.
>   *  The length of the pathname, including the terminating null byte, should 
> not exceed the size of sun_path.
> 
> Reduce the length of the unix socket path by 1 to hold the NUL byte.

Note it just says "should", not "must" here. IOW, there is no requirement
to NUL terminate and so we should not artificially require that at QEMU
level either. If mgmt apps want to have NUL termination then they can
just pass a shorter path to QEMU to start with.

I've proposed the fix for the warning you mention here:

  https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg07759.html
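
For illustration only, here is a minimal sketch of filling sun_path without
requiring NUL termination, by copying exactly strlen(path) bytes with memcpy()
instead of strncpy(). The helper name fill_unix_addr() is hypothetical (it is
not a QEMU function), and this is an assumption about one possible approach,
not necessarily what the linked patch does.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper (not part of QEMU): fill a sockaddr_un from a path.
 * A path of exactly sizeof(sun_path) bytes is accepted and stored without
 * a terminating NUL; anything longer is rejected.
 * Returns 0 on success, -1 if the path does not fit.
 */
static int fill_unix_addr(struct sockaddr_un *un, const char *path)
{
    size_t pathlen;

    assert(un != NULL && path != NULL);

    pathlen = strlen(path);
    if (pathlen > sizeof(un->sun_path)) {
        return -1;
    }

    /* Zero the whole struct so shorter paths end up NUL terminated. */
    memset(un, 0, sizeof(*un));
    un->sun_family = AF_UNIX;
    /* Copy the real length: nothing is truncated and no NUL is forced. */
    memcpy(un->sun_path, path, pathlen);
    return 0;
}
```

The caller would then pass sizeof(struct sockaddr_un) as the address length to
bind() or connect(), as the existing code already does.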


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH v2] Replace calls to object_child_foreach() with object_child_foreach_recursive()

2019-04-03 Thread Stefan Hajnoczi

On Mon, Apr 01, 2019 at 04:40:28PM +0100, Ernest Esene wrote:
> Replace calls to object_child_foreach() with object_child_foreach_recursive()
> when applicable: nvdimm_device_list, nmi_monitor_handle,
> find_sysbus_device,
> pc_dimm_slot2bitmap, build_dimm_list.
> 
> Signed-off-by: Ernest Esene 
> 
> ---
> v2:
>   * applied changes suggested by Paolo
> ---
>  hw/acpi/nvdimm.c   |  4 +---
>  hw/core/sysbus.c   | 11 ---
>  hw/mem/pc-dimm.c   |  3 +--
>  hw/virtio/virtio-balloon.c |  3 +--
>  4 files changed, 7 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3] s390: diagnose 318 info reset and migration support

2019-04-03 Thread Cornelia Huck

On Wed, 3 Apr 2019 13:46:07 +0200
David Hildenbrand  wrote:

> On 01.04.19 23:48, Collin Walling wrote:
> > DIAGNOSE 0x318 (diag318) is a privileged s390x instruction that must
> > be intercepted by SIE and handled via KVM. Let's introduce some
> > functions to communicate between QEMU and KVM via ioctls. These
> > will be used to get/set the diag318 related information (also known
> > as the "Control Program Code" or "CPC"), as well as check the system
> > if KVM supports handling this instruction.
> > 
> > Diag318 must also be reset on a load normal and modified clear, so
> > we use the set function (wrapped in a reset function) to explicitly
> > set the diag318 info to 0 for these cases.
> > 
> > Lastly, we want to ensure the diag318 info is migrated. The diag318
> > info migration is handled via a VMStateDescription. This feature is
> > only supported on QEMU machines 4.0 and later.  
> 
> This has to become 4.1
> 
> > 
> > Signed-off-by: Collin Walling 
> > ---
> > 
> > This version is posted in tandem with a new kernel patch that changes
> > how the execution of the diag 0x318 instruction is handled. A link to
> > this series will be attached as a reply to this series for convenience.
> > 
> > Changelog:
> > 
> > v3
> > - removed CPU model code
> > - removed RSCPI and SCLP code
> > - reverted max cpus back to 248 (previous patches limited this
> > to 247)
> > - introduced VMStateDescription handlers for migration
> > - disabled migration of diag318 info for machines 3.1 and older
> > - a warning is printed if migration is disabled and KVM
> >   supports handling this instruction
> > 
> > ---
> >  hw/s390x/s390-virtio-ccw.c   |  6 
> >  linux-headers/asm-s390/kvm.h |  4 +++
> >  target/s390x/diag.c  | 63 
> >  target/s390x/internal.h  |  5 ++-
> >  target/s390x/kvm-stub.c  | 15 +
> >  target/s390x/kvm.c   | 32 ++
> >  target/s390x/kvm_s390x.h |  3 ++
> >  7 files changed, 127 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> > index d11069b860..2a50868496 100644
> > --- a/hw/s390x/s390-virtio-ccw.c
> > +++ b/hw/s390x/s390-virtio-ccw.c
> > @@ -36,6 +36,7 @@
> >  #include "cpu_models.h"
> >  #include "hw/nmi.h"
> >  #include "hw/s390x/tod.h"
> > +#include "internal.h"
> >  
> >  S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
> >  {
> > @@ -302,6 +303,8 @@ static void ccw_init(MachineState *machine)
> >  
> >  /* init the TOD clock */
> >  s390_init_tod();
> > +
> > +diag318_register_migration();
> >  }
> >  
> >  static void s390_cpu_plug(HotplugHandler *hotplug_dev,
> > @@ -352,6 +355,7 @@ static void s390_machine_reset(void)
> >  }
> >  subsystem_reset();
> >  s390_crypto_reset();
> > +diag318_reset();  
> 
> Shouldn't this go into subsystem_reset?
> 
> Aren't you missing resets during external/reipl resets?
> 
> Also, I was wondering if this would be worth creating a fake device like
> diag288. The resets can be handled similar to diag288. Resets during
> external/reipl reset would come natural.

I like the idea of adding a new device. Would also make the migration
code nicer (as you suggested below).

> >  
> >  static void ccw_machine_3_1_class_options(MachineClass *mc)
> > diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
> > index 0265482f8f..735f5a46e8 100644
> > --- a/linux-headers/asm-s390/kvm.h
> > +++ b/linux-headers/asm-s390/kvm.h

Updates of linux headers should always go into a separate patch that
can be replaced by a proper headers update when applying.

> > @@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
> >  #define KVM_S390_VM_CRYPTO 2
> >  #define KVM_S390_VM_CPU_MODEL  3
> >  #define KVM_S390_VM_MIGRATION  4
> > +#define KVM_S390_VM_MISC   5
> >  
> >  /* kvm attributes for mem_ctrl */
> >  #define KVM_S390_VM_MEM_ENABLE_CMMA0
> > @@ -168,6 +169,9 @@ struct kvm_s390_vm_cpu_subfunc {
> >  #define KVM_S390_VM_MIGRATION_START1
> >  #define KVM_S390_VM_MIGRATION_STATUS   2
> >  
> > +/* kvm attributes for KVM_S390_VM_MISC */
> > +#define KVM_S390_VM_MISC_CPC   0
> > +
> >  /* for KVM_GET_REGS and KVM_SET_REGS */
> >  struct kvm_regs {
> > /* general purpose regs for s390 */

Re: [Qemu-devel] VSOCK benchmark and optimizations

2019-04-03 Thread Stefan Hajnoczi

On Mon, Apr 01, 2019 at 06:32:40PM +0200, Stefano Garzarella wrote:
> Hi Alex,
> I'm sending you some benchmarks and information about VSOCK CCing qemu-devel
> and linux-netdev (maybe this info could be useful for others :))
> 
> One of the VSOCK advantages is the simple configuration: you don't need to set
> up IP addresses for guest/host, and it can be used with the standard POSIX
> socket API. [1]
> 
> I'm currently working on it, so the "optimized" values are still work in
> progress and I'll send the patches upstream (Linux) as soon as possible.
> (I hope in 1 or 2 weeks)
> 
> Optimizations:
> + reducing the number of credit update packets
>   - RX side sent, on every packet received, an empty packet only to inform the
> TX side about the space in the RX buffer.
> + increase RX buffers size to 64 KB (from 4 KB)
> + merge packets to fill RX buffers
> 
> As benchmark tool I used iperf3 [2] modified with VSOCK support:
> 
>              host -> guest [Gbps]       guest -> host [Gbps]
> pkt_size    before opt.   optimized    before opt.   optimized
>   1K            0.5          1.6           1.4          1.4

This is a "large" small packet size.  I think 64 bytes is a common
"small" packet size and is worth benchmarking too.

>   2K            1.1          3.1           2.3          2.5
>   4K            2.0          5.6           4.2          4.4
>   8K            3.2         10.2           7.2          7.5
>   16K           6.4         14.2           9.4         11.3
>   32K           9.8         18.9           9.2         17.8
>   64K          13.8         22.9           8.8         25.0
>   128K         17.6         24.5           7.7         25.7
>   256K         19.0         24.8           8.1         25.6
>   512K         20.8         25.1           8.1         25.4

Nice improvements!

Stefan



Re: [Qemu-devel] VSOCK benchmark and optimizations

2019-04-03 Thread Stefan Hajnoczi

On Tue, Apr 02, 2019 at 09:37:06AM +0200, Stefano Garzarella wrote:
> On Tue, Apr 02, 2019 at 04:19:25AM +, Alex Bennée wrote:
> > 
> > Stefano Garzarella  writes:
> > 
> > > Hi Alex,
> > > I'm sending you some benchmarks and information about VSOCK CCing 
> > > qemu-devel
> > > and linux-netdev (maybe this info could be useful for others :))
> > >
> > > One of the VSOCK advantages is the simple configuration: you don't need 
> > > to set
> > > up IP addresses for guest/host, and it can be used with the standard POSIX
> > > socket API. [1]
> > >
> > > I'm currently working on it, so the "optimized" values are still work in
> > > progress and I'll send the patches upstream (Linux) as soon as possible.
> > > (I hope in 1 or 2 weeks)
> > >
> > > Optimizations:
> > > + reducing the number of credit update packets
> > >   - RX side sent, on every packet received, an empty packet only to 
> > > inform the
> > > TX side about the space in the RX buffer.
> > > + increase RX buffers size to 64 KB (from 4 KB)
> > > + merge packets to fill RX buffers
> > >
> > > As benchmark tool I used iperf3 [2] modified with VSOCK support:
> > >
> > >              host -> guest [Gbps]       guest -> host [Gbps]
> > > pkt_size    before opt.   optimized    before opt.   optimized
> > >   1K            0.5          1.6           1.4          1.4
> > >   2K            1.1          3.1           2.3          2.5
> > >   4K            2.0          5.6           4.2          4.4
> > >   8K            3.2         10.2           7.2          7.5
> > >   16K           6.4         14.2           9.4         11.3
> > >   32K           9.8         18.9           9.2         17.8
> > >   64K          13.8         22.9           8.8         25.0
> > >   128K         17.6         24.5           7.7         25.7
> > >   256K         19.0         24.8           8.1         25.6
> > >   512K         20.8         25.1           8.1         25.4
> > >
> > >
> > > How to reproduce:
> > >
> > > host$ modprobe vhost_vsock
> > > host$ qemu-system-x86_64 ... -device vhost-vsock-pci,guest-cid=3
> > >   # Note: Guest CID should be >= 3
> > >   # (0, 1 are reserved and 2 identify the host)
> > >
> > > guest$ iperf3 --vsock -s
> > >
> > > host$ iperf3 --vsock -c 3 -l ${pkt_size}  # host -> guest
> > > host$ iperf3 --vsock -c 3 -l ${pkt_size} -R   # guest -> host
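
As a sketch of what "standard POSIX socket API" means for the commands above,
an application addresses a VSOCK peer by a (CID, port) pair instead of an IP
address. The helper name vsock_addr_init() is hypothetical, and port 5201 in
the usage note is only assumed here (it is iperf3's default TCP port); the
sockaddr_vm layout comes from the Linux <linux/vm_sockets.h> header.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/*
 * Hypothetical helper: fill a VSOCK address for the given context ID
 * and port. CIDs 0 and 1 are reserved, 2 (VMADDR_CID_HOST) is the host,
 * so guests use 3 or higher.
 */
static void vsock_addr_init(struct sockaddr_vm *sa,
                            unsigned int cid, unsigned int port)
{
    assert(sa != NULL);

    memset(sa, 0, sizeof(*sa));
    sa->svm_family = AF_VSOCK;
    sa->svm_cid = cid;
    sa->svm_port = port;
}
```

A host-side client would call vsock_addr_init(&sa, 3, 5201), create a socket
with socket(AF_VSOCK, SOCK_STREAM, 0) and then connect() to
(struct sockaddr *)&sa exactly as it would for a TCP socket.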
> > >
> > >
> > > If you want, I can do a similar benchmark (with iperf3) using a networking
> > > card (do you have a specific configuration?).
> > 
> > My main interest is how it stacks up against:
> > 
> >   --device virtio-net-pci and I guess the vhost equivalent
> 
> I'll do some tests with virtio-net and vhost.
> 
> > 
> > AIUI one of the motivators was being able to run something like NFS for
> > a guest FS over vsock instead of the overhead from UDP and having to
> > deal with the additional complication of having a working network setup.
> > 
> 
> CCing Stefan.
> 
> I know he is working on virtio-fs, which may suit your use cases better.
> He also worked on VSOCK support for NFS, but I think it is not merged 
> upstream.

Hi Alex,
David Gilbert, Vivek Goyal, Miklos Szeredi, and I are working on
virtio-fs for host<->guest file sharing.  It performs better than
virtio-9p and we're currently working on getting it upstream (first the
VIRTIO device spec, then Linux and QEMU patches).

You can read about it and try it here:

https://virtio-fs.gitlab.io/

Stefan



Re: [Qemu-devel] [PATCH v4 2/5] virtio-pmem: Add virtio pmem driver

2019-04-03 Thread Pankaj Gupta



> Subject: Re: [Qemu-devel] [PATCH v4 2/5] virtio-pmem: Add virtio pmem driver
> 
> On Wed, Apr 03, 2019 at 04:10:15PM +0530, Pankaj Gupta wrote:
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta 
> > ---
> >  drivers/nvdimm/virtio_pmem.c |  84 +
> >  drivers/virtio/Kconfig   |  10 +++
> >  drivers/virtio/Makefile  |   1 +
> >  drivers/virtio/pmem.c| 125 +++
> >  include/linux/virtio_pmem.h  |  60 +++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 291 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index ..2a1b1ba2c1ff
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,84 @@
> > +// SPDX-License-Identifier: GPL-2.0
> 
> Is this comment style (//) acceptable?

In existing code, I can see the same comment
pattern for the license in some places.

> 
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include 
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +   unsigned int len;
> > +   unsigned long flags;
> > +   struct virtio_pmem_request *req, *req_buf;
> > +   struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +   while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +   req->done = true;
> > +   wake_up(&req->host_acked);
> > +
> > +   if (!list_empty(&vpmem->req_list)) {
> > +   req_buf = list_first_entry(&vpmem->req_list,
> > +   struct virtio_pmem_request, list);
> > +   list_del(&vpmem->req_list);
> > +   req_buf->wq_buf_avail = true;
> > +   wake_up(&req_buf->wq_buf);
> > +   }
> > +   }
> > +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +   int err;
> > +   unsigned long flags;
> > +   struct scatterlist *sgs[2], sg, ret;
> > +   struct virtio_device *vdev = nd_region->provider_data;
> > +   struct virtio_pmem *vpmem = vdev->priv;
> > +   struct virtio_pmem_request *req;
> > +
> > +   might_sleep();
> 
> [1]
> 
> > +   req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +   if (!req)
> > +   return -ENOMEM;
> > +
> > +   req->done = req->wq_buf_avail = false;
> > +   strcpy(req->name, "FLUSH");
> > +   init_waitqueue_head(&req->host_acked);
> > +   init_waitqueue_head(&req->wq_buf);
> > +   sg_init_one(&sg, req->name, strlen(req->name));
> > +   sgs[0] = &sg;
> > +   sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +   sgs[1] = &ret;
> > +
> > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +   err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> 
> Is it okay to use GFP_ATOMIC in a might-sleep ([1]) function?

might_sleep() will give us a warning if we try to sleep from a non-sleepable
context.

We are doing it the other way around, i.e. might_sleep() is not called
inside the GFP_ATOMIC section.

> 
> > +   if (err) {
> > +   dev_err(&vdev->dev, "failed to send command to virtio pmem 
> > device\n");
> > +
> > +   list_add_tail(&vpmem->req_list, &req->list);
> > +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +   /* When host has read buffer, this completes via host_ack */
> > +   wait_event(req->wq_buf, req->wq_buf_avail);
> > +   spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +   }
> > +   virtqueue_kick(vpmem->req_vq);
> 
> You probably want to check return value here.

Don't think it will matter in this case?

> 
> > +   spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +   /* When host has read buffer, this completes via host_ack */
> > +   wait_event(req->host_acked, req->done);

[Qemu-devel] [PATCH] configure: Relax check for libseccomp

2019-04-03 Thread Helge Deller

On a non-release architecture, the configure program aborts if the
--enable-seccomp flag was given (with no way to work around it on the
command line):

ERROR: User requested feature libseccomp
configure was not able to find it.
libseccomp is not supported for host cpu parisc64

Instead of aborting, fall back to require libseccomp version 2.2.0
(or any higher version currently installed) which should be OK for
non-release architectures.

Signed-off-by: Helge Deller 

diff --git a/configure b/configure
index 1c563a7027..8632267049 100755
--- a/configure
+++ b/configure
@@ -2389,7 +2389,6 @@ if test "$seccomp" != "no" ; then
 libseccomp_minver="2.3.0"
 ;;
 *)
-libseccomp_minver=""
 ;;
 esac

Re: [Qemu-devel] [PATCH] sun4m: obey -vga none

2019-04-03 Thread Philippe Mathieu-Daudé

Hi Paolo,

On 3/19/19 3:40 PM, Paolo Bonzini wrote:
> Do not create a TCX if "-vga none" was passed on the command line.
> Remove some dead code along the way to avoid big reindentation.

Can you add:

This fixes the following abort when configuring with --without-default-devices:

  $ sparc-softmmu/qemu-system-sparc
  qemu-system-sparc: Unknown device 'SUNW,tcx' for default sysbus
  Aborted (core dumped)

  (gdb) bt
  #1  0x7fc78d17d895 in __GI_abort () at abort.c:79
  #2  0x560beaf637f3 in qdev_create (bus=bus@entry=0x0,
name=name@entry=0x560beb1be36b "SUNW,tcx") at hw/core/qdev.c:131
  #3  0x560beaf1392d in tcx_init (vram_size=1048576, width=1024,
height=768, depth=8, irq=0x560bed1a0230, addr=1342177280) at
hw/sparc/sun4m.c:477
  #4  0x560beaf1392d in sun4m_hw_init (hwdef=0x560beb1be780
, machine=0x560becf65f00) at hw/sparc/sun4m.c:943
  #5  0x560beaf6b15b in machine_run_board_init
(machine=0x560becf65f00) at hw/core/machine.c:1030
  #6  0x560beae86692 in main (argc=, argv=, envp=) at vl.c:4463

> Signed-off-by: Paolo Bonzini 

Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Philippe Mathieu-Daudé 

> ---
>  hw/sparc/sun4m.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
> index ca1e3825d5..07d126aea8 100644
> --- a/hw/sparc/sun4m.c
> +++ b/hw/sparc/sun4m.c
> @@ -850,7 +850,6 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
>  uint32_t initrd_size;
>  DriveInfo *fd[MAX_FD];
>  FWCfgState *fw_cfg;
> -unsigned int num_vsimms;
>  DeviceState *dev;
>  SysBusDevice *s;
>  
> @@ -909,8 +908,7 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
>  error_report("Unsupported depth: %d", graphic_depth);
>  exit (1);
>  }
> -num_vsimms = 0;
> -if (num_vsimms == 0) {
> +if (vga_interface_type != VGA_NONE) {
>  if (vga_interface_type == VGA_CG3) {
>  if (graphic_depth != 8) {
>  error_report("Unsupported depth: %d", graphic_depth);
> @@ -945,7 +943,7 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
>  }
>  }
>  
> -for (i = num_vsimms; i < MAX_VSIMMS; i++) {
> +for (i = 0; i < MAX_VSIMMS; i++) {
>  /* vsimm registers probed by OBP */
>  if (hwdef->vsimm[i].reg_base) {
>  empty_slot_init(hwdef->vsimm[i].reg_base, 0x2000);
>

[Qemu-devel] [PATCH 3/5] target/mips: replace indentation with space to fix checkpatch errors

2019-04-03 Thread Jules Irenge

Replace indentation with space to fix errors issued by checkpatch.pl tool
"ERROR: code indent should never use tabs"
 within "target/mips/cpu.h" file.

Signed-off-by: Jules Irenge 
---
 target/mips/cpu.h | 136 +++---
 1 file changed, 68 insertions(+), 68 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index bfa595c8a9..c4278b3ffe 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -96,25 +96,25 @@ struct CPUMIPSFPUContext {
 typedef struct CPUMIPSMVPContext CPUMIPSMVPContext;
 struct CPUMIPSMVPContext {
 int32_t CP0_MVPControl;
-#define CP0MVPCo_CPA   3
-#define CP0MVPCo_STLB  2
-#define CP0MVPCo_VPC   1
-#define CP0MVPCo_EVP   0
+#define CP0MVPCo_CPA 3
+#define CP0MVPCo_STLB2
+#define CP0MVPCo_VPC 1
+#define CP0MVPCo_EVP 0
 int32_t CP0_MVPConf0;
-#define CP0MVPC0_M 31
-#define CP0MVPC0_TLBS  29
-#define CP0MVPC0_GS28
-#define CP0MVPC0_PCP   27
-#define CP0MVPC0_PTLBE 16
-#define CP0MVPC0_TCA   15
-#define CP0MVPC0_PVPE  10
-#define CP0MVPC0_PTC   0
+#define CP0MVPC0_M  31
+#define CP0MVPC0_TLBS   29
+#define CP0MVPC0_GS 28
+#define CP0MVPC0_PCP27
+#define CP0MVPC0_PTLBE  16
+#define CP0MVPC0_TCA15
+#define CP0MVPC0_PVPE   10
+#define CP0MVPC0_PTC0
 int32_t CP0_MVPConf1;
-#define CP0MVPC1_CIM   31
-#define CP0MVPC1_CIF   30
-#define CP0MVPC1_PCX   20
-#define CP0MVPC1_PCP2  10
-#define CP0MVPC1_PCP1  0
+#define CP0MVPC1_CIM31
+#define CP0MVPC1_CIF30
+#define CP0MVPC1_PCX20
+#define CP0MVPC1_PCP2   10
+#define CP0MVPC1_PCP1   0
 };
 
 typedef struct mips_def_t mips_def_t;
@@ -482,44 +482,44 @@ struct CPUMIPSState {
  */
 int32_t CP0_Random;
 int32_t CP0_VPEControl;
-#define CP0VPECo_YSI   21
-#define CP0VPECo_GSI   20
-#define CP0VPECo_EXCPT 16
-#define CP0VPECo_TE15
-#define CP0VPECo_TargTC0
+#define CP0VPECo_YSI21
+#define CP0VPECo_GSI20
+#define CP0VPECo_EXCPT  16
+#define CP0VPECo_TE 15
+#define CP0VPECo_TargTC 0
 int32_t CP0_VPEConf0;
-#define CP0VPEC0_M 31
-#define CP0VPEC0_XTC   21
-#define CP0VPEC0_TCS   19
-#define CP0VPEC0_SCS   18
-#define CP0VPEC0_DSC   17
-#define CP0VPEC0_ICS   16
-#define CP0VPEC0_MVP   1
-#define CP0VPEC0_VPA   0
+#define CP0VPEC0_M  31
+#define CP0VPEC0_XTC21
+#define CP0VPEC0_TCS19
+#define CP0VPEC0_SCS18
+#define CP0VPEC0_DSC17
+#define CP0VPEC0_ICS16
+#define CP0VPEC0_MVP1
+#define CP0VPEC0_VPA0
 int32_t CP0_VPEConf1;
-#define CP0VPEC1_NCX   20
-#define CP0VPEC1_NCP2  10
-#define CP0VPEC1_NCP1  0
+#define CP0VPEC1_NCX20
+#define CP0VPEC1_NCP2   10
+#define CP0VPEC1_NCP1   0
 target_ulong CP0_YQMask;
 target_ulong CP0_VPESchedule;
 target_ulong CP0_VPEScheFBack;
 int32_t CP0_VPEOpt;
-#define CP0VPEOpt_IWX7 15
-#define CP0VPEOpt_IWX6 14
-#define CP0VPEOpt_IWX5 13
-#define CP0VPEOpt_IWX4 12
-#define CP0VPEOpt_IWX3 11
-#define CP0VPEOpt_IWX2 10
-#define CP0VPEOpt_IWX1 9
-#define CP0VPEOpt_IWX0 8
-#define CP0VPEOpt_DWX7 7
-#define CP0VPEOpt_DWX6 6
-#define CP0VPEOpt_DWX5 5
-#define CP0VPEOpt_DWX4 4
-#define CP0VPEOpt_DWX3 3
-#define CP0VPEOpt_DWX2 2
-#define CP0VPEOpt_DWX1 1
-#define CP0VPEOpt_DWX0 0
+#define CP0VPEOpt_IWX7  15
+#define CP0VPEOpt_IWX6  14
+#define CP0VPEOpt_IWX5  13
+#define CP0VPEOpt_IWX4  12
+#define CP0VPEOpt_IWX3  11
+#define CP0VPEOpt_IWX2  10
+#define CP0VPEOpt_IWX1  9
+#define CP0VPEOpt_IWX0  8
+#define CP0VPEOpt_DWX7  7
+#define CP0VPEOpt_DWX6  6
+#define CP0VPEOpt_DWX5  5
+#define CP0VPEOpt_DWX4  4
+#define CP0VPEOpt_DWX3  3
+#define CP0VPEOpt_DWX2  2
+#define CP0VPEOpt_DWX1  1
+#define CP0VPEOpt_DWX0  0
 /*
  * CP0 Register 2
  */
@@ -626,33 +626,33 @@ struct CPUMIPSState {
 #define CP0PC_PSN   0 /*  5..0  */
 int32_t CP0_SRSConf0_rw_bitmask;
 int32_t CP0_SRSConf0;
-#define CP0SRSC0_M 31
-#define CP0SRSC0_SRS3  20
-#define CP0SRSC0_SRS2  10
-#define CP0SRSC0_SRS1  0
+#define CP0SRSC0_M 31
+#define CP0SRSC0_SRS3  20
+#define CP0SRSC0_SRS2  10
+#define CP0SRSC0_SRS1  0
 int32_t CP0_SRSConf1_rw_bitmask;
 int32_t CP0_SRSConf1;
-#define CP0SRSC1_M 31
-#define CP0SRSC1_SRS6  20
-#define CP0SRSC1_SRS5  10
-#define CP0SRSC1_SRS4  0
+#define CP0SRSC1_M 31
+#define CP0SRSC1_SRS6  20
+#define CP0SRSC1_SRS5  10
+#define CP0SRSC1_SRS4  0
 int32_t CP0_SRSConf2_rw_bitmask;
 int32_t CP0_SRSConf2;
-#define CP0SRSC2_M 31
-#define CP0SRSC2_SRS9  20
-#define CP0SRSC2_SRS8  10
-#define CP0SRSC2_SRS7  0
+#define CP0SRSC2_M 31
+#define CP0SRSC2_SRS9  20
+#define CP0SRSC2_SRS8  10
+#define CP0SRSC2_SRS7  0
 int32_t CP0_SRSConf3_rw_bitmask;
 int32_t CP0_SRSConf3;
-#define CP0SRSC3_M 31
-#define CP0SRSC3_SRS12 20
-#define CP0SRSC3_SRS11 10
-#define CP0SRSC3_SRS10 0
+#define CP0SRSC3_M 31
+#define CP0SRSC3_SRS12 20
+#define CP0SRSC3_SRS11 10
+#define CP0SRSC3_SRS10 0
 int32_t CP0_SRSConf4_rw_bitmask;
 int32_t CP0_SRSConf4;
-#define CP0SRSC4_SRS15 20
-#define CP0SRSC4_SRS14 1

[Qemu-devel] [PATCH 4/5] target/mips: remove space to fix checkpatch errors

2019-04-03 Thread Jules Irenge

Remove space to fix errors issued by checkpatch.pl tool
"ERROR: space prohibited between function name and open parenthesis"
"ERROR: trailing white space"
 within "target/mips/cpu.h" file.

Signed-off-by: Jules Irenge 
---
 target/mips/cpu.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index c4278b3ffe..238a67c405 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -992,7 +992,7 @@ struct CPUMIPSState {
  * If translation is interrupted between the branch instruction and
  * the delay slot, record what type of branch it is so that we can
  * resume translation properly.  It might be possible to reduce
- * this from three bits to two.  
+ * this from three bits to two.
  */
 #define MIPS_HFLAG_BMASK_BASE  0x803800
 #define MIPS_HFLAG_B  0x00800 /* Unconditional branch   */
@@ -1072,7 +1072,7 @@ static inline MIPSCPU *mips_env_get_cpu(CPUMIPSState *env)
 
 #define ENV_OFFSET offsetof(MIPSCPU, env)
 
-void mips_cpu_list (FILE *f, fprintf_function cpu_fprintf);
+void mips_cpu_list(FILE *f, fprintf_function cpu_fprintf);
 
 #define cpu_signal_handler cpu_mips_signal_handler
 #define cpu_list mips_cpu_list
@@ -1099,14 +1099,14 @@ static inline int hflags_mmu_index(uint32_t hflags)
 }
 }
 
-static inline int cpu_mmu_index (CPUMIPSState *env, bool ifetch)
+static inline int cpu_mmu_index(CPUMIPSState *env, bool ifetch)
 {
 return hflags_mmu_index(env->hflags);
 }
 
 #include "exec/cpu-all.h"
 
-/* 
+/*
  * Memory access type :
  * may be needed for precise access rights control and precise exceptions.
  */
@@ -1192,7 +1192,7 @@ void cpu_mips_soft_irq(CPUMIPSState *env, int irq, int 
level);
 void itc_reconfigure(struct MIPSITUState *tag);
 
 /* helper.c */
-target_ulong exception_resume_pc (CPUMIPSState *env);
+target_ulong exception_resume_pc(CPUMIPSState *env);
 
 static inline void restore_snan_bit_mode(CPUMIPSState *env)
 {
-- 
2.20.1

[Qemu-devel] [PATCH 0/5] target/mips/cpu: errors and warnings coding style cleanup

2019-04-03 Thread Jules Irenge

This v1 series cleans up all coding style warnings and errors within the
cpu.h file.

Jules Irenge (5):
  target/mips: add space to fix checkpatch errors
  target/mips: realign comments to fix checkpatch warnings
  target/mips: replace indentation with space to fix checkpatch errors
  target/mips: remove space to fix checkpatch errors
  target/mips: wrap line into multiple lines to fix checkpatch errors

 target/mips/cpu.h | 211 +-
 1 file changed, 117 insertions(+), 94 deletions(-)

-- 
2.20.1

[Qemu-devel] [PATCH 1/5] target/mips: add space to fix checkpatch errors

2019-04-03 Thread Jules Irenge

Add space to fix errors reported by checkpatch.pl tool
"ERROR: spaces required around that ..."
"ERROR: space required before the open parenthesis"
"ERROR: space required after that ..."

Signed-off-by: Jules Irenge 
---
 target/mips/cpu.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index a10eeb0de3..2429fe80ac 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -22,10 +22,10 @@ typedef struct CPUMIPSTLBContext CPUMIPSTLBContext;
 
 typedef union wr_t wr_t;
 union wr_t {
-int8_t  b[MSA_WRLEN/8];
-int16_t h[MSA_WRLEN/16];
-int32_t w[MSA_WRLEN/32];
-int64_t d[MSA_WRLEN/64];
+int8_t  b[MSA_WRLEN / 8];
+int16_t h[MSA_WRLEN / 16];
+int32_t w[MSA_WRLEN / 32];
+int64_t d[MSA_WRLEN / 64];
 };
 
 typedef union fpr_t fpr_t;
@@ -71,16 +71,16 @@ struct CPUMIPSFPUContext {
 #define FCR31_FS 24
 #define FCR31_ABS2008 19
 #define FCR31_NAN2008 18
-#define SET_FP_COND(num,env) do { ((env).fcr31) |= ((num) ? (1 << ((num) + 
24)) : (1 << 23)); } while(0)
-#define CLEAR_FP_COND(num,env)   do { ((env).fcr31) &= ~((num) ? (1 << ((num) 
+ 24)) : (1 << 23)); } while(0)
+#define SET_FP_COND(num, env)do { ((env).fcr31) |= ((num) ? (1 << ((num) + 
24)) : (1 << 23)); } while (0)
+#define CLEAR_FP_COND(num, env)   do { ((env).fcr31) &= ~((num) ? (1 << ((num) 
+ 24)) : (1 << 23)); } while (0)
 #define GET_FP_COND(env)         ((((env).fcr31 >> 24) & 0xfe) | (((env).fcr31 >> 23) & 0x1))
 #define GET_FP_CAUSE(reg)(((reg) >> 12) & 0x3f)
 #define GET_FP_ENABLE(reg)   (((reg) >>  7) & 0x1f)
 #define GET_FP_FLAGS(reg)(((reg) >>  2) & 0x1f)
-#define SET_FP_CAUSE(reg,v)  do { (reg) = ((reg) & ~(0x3f << 12)) | ((v & 
0x3f) << 12); } while(0)
-#define SET_FP_ENABLE(reg,v) do { (reg) = ((reg) & ~(0x1f <<  7)) | ((v & 
0x1f) << 7); } while(0)
-#define SET_FP_FLAGS(reg,v)  do { (reg) = ((reg) & ~(0x1f <<  2)) | ((v & 
0x1f) << 2); } while(0)
-#define UPDATE_FP_FLAGS(reg,v)   do { (reg) |= ((v & 0x1f) << 2); } while(0)
+#define SET_FP_CAUSE(reg, v)  do { (reg) = ((reg) & ~(0x3f << 12)) | ((v & 
0x3f) << 12); } while (0)
+#define SET_FP_ENABLE(reg, v) do { (reg) = ((reg) & ~(0x1f <<  7)) | ((v & 
0x1f) << 7); } while (0)
+#define SET_FP_FLAGS(reg, v)  do { (reg) = ((reg) & ~(0x1f <<  2)) | ((v & 
0x1f) << 2); } while (0)
+#define UPDATE_FP_FLAGS(reg, v)   do { (reg) |= ((v & 0x1f) << 2); } while (0)
 #define FP_INEXACT1
 #define FP_UNDERFLOW  2
 #define FP_OVERFLOW   4
-- 
2.20.1

[Qemu-devel] [PATCH 2/5] target/mips: realign comments to fix checkpatch warnings

2019-04-03 Thread Jules Irenge

Realign comments to fix warnings issued by checkpatch.pl tool
"WARNING: Block comments use a leading /* on a separate line"
 within "target/mips/cpu.h" file.

Signed-off-by: Jules Irenge 
---
 target/mips/cpu.h | 34 ++
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index 2429fe80ac..bfa595c8a9 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -37,7 +37,8 @@ union fpr_t {
 /* FPU/MSA register mapping is not tested on big-endian hosts. */
 wr_t wr;   /* vector data */
 };
-/* define FP_ENDIAN_IDX to access the same location
+/*
+ * define FP_ENDIAN_IDX to access the same location
  * in the fpr_t union regardless of the host endianness
  */
 #if defined(HOST_WORDS_BIGENDIAN)
@@ -963,9 +964,11 @@ struct CPUMIPSState {
 /* TMASK defines different execution modes */
 #define MIPS_HFLAG_TMASK  0x1F5807FF
 #define MIPS_HFLAG_MODE   0x7 /* execution modes*/
-/* The KSU flags must be the lowest bits in hflags. The flag order
-   must be the same as defined for CP0 Status. This allows to use
-   the bits as the value of mmu_idx. */
+/*
+ * The KSU flags must be the lowest bits in hflags. The flag order
+ * must be the same as defined for CP0 Status. This allows to use
+ * the bits as the value of mmu_idx.
+ */
 #define MIPS_HFLAG_KSU0x3 /* kernel/supervisor/user mode mask   */
 #define MIPS_HFLAG_UM 0x2 /* user mode flag */
 #define MIPS_HFLAG_SM 0x1 /* supervisor mode flag   */
@@ -975,18 +978,22 @@ struct CPUMIPSState {
 #define MIPS_HFLAG_CP00x00010 /* CP0 enabled*/
 #define MIPS_HFLAG_FPU0x00020 /* FPU enabled*/
 #define MIPS_HFLAG_F640x00040 /* 64-bit FPU enabled */
-/* True if the MIPS IV COP1X instructions can be used.  This also
-   controls the non-COP1X instructions RECIP.S, RECIP.D, RSQRT.S
-   and RSQRT.D.  */
+/*
+ * True if the MIPS IV COP1X instructions can be used.  This also
+ * controls the non-COP1X instructions RECIP.S, RECIP.D, RSQRT.S
+ * and RSQRT.D.
+ */
 #define MIPS_HFLAG_COP1X  0x00080 /* COP1X instructions enabled */
 #define MIPS_HFLAG_RE 0x00100 /* Reversed endianness*/
 #define MIPS_HFLAG_AWRAP  0x00200 /* 32-bit compatibility address wrapping */
 #define MIPS_HFLAG_M160x00400 /* MIPS16 mode flag   */
 #define MIPS_HFLAG_M16_SHIFT 10
-/* If translation is interrupted between the branch instruction and
+/*
+ * If translation is interrupted between the branch instruction and
  * the delay slot, record what type of branch it is so that we can
  * resume translation properly.  It might be possible to reduce
- * this from three bits to two.  */
+ * this from three bits to two.  
+ */
 #define MIPS_HFLAG_BMASK_BASE  0x803800
 #define MIPS_HFLAG_B  0x00800 /* Unconditional branch   */
 #define MIPS_HFLAG_BC 0x01000 /* Conditional branch */
@@ -1073,8 +1080,10 @@ void mips_cpu_list (FILE *f, fprintf_function 
cpu_fprintf);
 extern void cpu_wrdsp(uint32_t rs, uint32_t mask_num, CPUMIPSState *env);
 extern uint32_t cpu_rddsp(uint32_t mask_num, CPUMIPSState *env);
 
-/* MMU modes definitions. We carefully match the indices with our
-   hflags layout. */
+/*
+ * MMU modes definitions. We carefully match the indices with our
+ * hflags layout.
+ */
 #define MMU_MODE0_SUFFIX _kernel
 #define MMU_MODE1_SUFFIX _super
 #define MMU_MODE2_SUFFIX _user
@@ -1097,7 +1106,8 @@ static inline int cpu_mmu_index (CPUMIPSState *env, bool 
ifetch)
 
 #include "exec/cpu-all.h"
 
-/* Memory access type :
+/* 
+ * Memory access type :
  * may be needed for precise access rights control and precise exceptions.
  */
 enum {
-- 
2.20.1

[Qemu-devel] [PATCH 5/5] target/mips: wrap line into multiple lines to fix checkpatch errors

2019-04-03 Thread Jules Irenge

Wrap long lines into multiple lines to fix errors issued by the checkpatch.pl tool
"ERROR: line over 90 characters"
within the "target/mips/cpu.h" file.

Signed-off-by: Jules Irenge 
---
 target/mips/cpu.h | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/target/mips/cpu.h b/target/mips/cpu.h
index 238a67c405..56b0105574 100644
--- a/target/mips/cpu.h
+++ b/target/mips/cpu.h
@@ -72,15 +72,28 @@ struct CPUMIPSFPUContext {
 #define FCR31_FS 24
 #define FCR31_ABS2008 19
 #define FCR31_NAN2008 18
-#define SET_FP_COND(num, env)do { ((env).fcr31) |= ((num) ? (1 << ((num) + 
24)) : (1 << 23)); } while (0)
-#define CLEAR_FP_COND(num, env)   do { ((env).fcr31) &= ~((num) ? (1 << ((num) 
+ 24)) : (1 << 23)); } while (0)
-#define GET_FP_COND(env)         ((((env).fcr31 >> 24) & 0xfe) | (((env).fcr31 >> 23) & 0x1))
+#define SET_FP_COND(num, env)do { ((env).fcr31) |=  \
+((num) ? (1 << ((num) + 24)) : \
+ (1 << 23)); \
+} while (0)
+#define CLEAR_FP_COND(num, env)   do { ((env).fcr31) &= \
+~((num) ? (1 << ((num) + 24)) : \
+ (1 << 23)); \
+ } while (0)
+#define GET_FP_COND(env)         ((((env).fcr31 >> 24) & 0xfe) | \
+  (((env).fcr31 >> 23) & 0x1))
 #define GET_FP_CAUSE(reg)(((reg) >> 12) & 0x3f)
 #define GET_FP_ENABLE(reg)   (((reg) >>  7) & 0x1f)
 #define GET_FP_FLAGS(reg)(((reg) >>  2) & 0x1f)
-#define SET_FP_CAUSE(reg, v)  do { (reg) = ((reg) & ~(0x3f << 12)) | ((v & 
0x3f) << 12); } while (0)
-#define SET_FP_ENABLE(reg, v) do { (reg) = ((reg) & ~(0x1f <<  7)) | ((v & 
0x1f) << 7); } while (0)
-#define SET_FP_FLAGS(reg, v)  do { (reg) = ((reg) & ~(0x1f <<  2)) | ((v & 
0x1f) << 2); } while (0)
+#define SET_FP_CAUSE(reg, v)  do { (reg) = ((reg) & ~(0x3f << 12)) | \
+   ((v & 0x3f) << 12); \
+ } while (0)
+#define SET_FP_ENABLE(reg, v) do { (reg) = ((reg) & ~(0x1f <<  7)) | \
+   ((v & 0x1f) << 7);  \
+ } while (0)
+#define SET_FP_FLAGS(reg, v)  do { (reg) = ((reg) & ~(0x1f <<  2)) | \
+   ((v & 0x1f) << 2); \
+ } while (0)
 #define UPDATE_FP_FLAGS(reg, v)   do { (reg) |= ((v & 0x1f) << 2); } while (0)
 #define FP_INEXACT1
 #define FP_UNDERFLOW  2
-- 
2.20.1
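
For readers following the FCR31 condition macros that this patch rewraps, here is a minimal standalone sketch of what they compute. The `struct fpu_ctx` type, the `FP_COND_BIT` helper and the unsigned `1u` literals are assumptions made for this example only (the patch uses plain `1` and QEMU's own FPU context structure); the bit positions are taken from the macros in the diff above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the FPU context; only fcr31 matters here. */
struct fpu_ctx {
    uint32_t fcr31;
};

/* Condition code 0 lives in bit 23; codes 1..7 live in bits 25..31. */
#define FP_COND_BIT(num)  ((num) ? (1u << ((num) + 24)) : (1u << 23))

#define SET_FP_COND(num, env) \
    do { \
        (env).fcr31 |= FP_COND_BIT(num); \
    } while (0)

#define CLEAR_FP_COND(num, env) \
    do { \
        (env).fcr31 &= ~FP_COND_BIT(num); \
    } while (0)

/* Pack the eight condition bits into one byte; bit 0 is condition code 0. */
#define GET_FP_COND(env) \
    ((((env).fcr31 >> 24) & 0xfe) | (((env).fcr31 >> 23) & 0x1))

/* Small self-check exercising the three macros. */
static void fp_cond_selfcheck(void)
{
    struct fpu_ctx ctx = { 0 };

    SET_FP_COND(0, ctx);
    SET_FP_COND(1, ctx);
    assert(GET_FP_COND(ctx) == 0x3);

    CLEAR_FP_COND(0, ctx);
    assert(GET_FP_COND(ctx) == 0x2);
}
```

Wrapping the body in `do { ... } while (0)` keeps each macro usable as a single statement, which is why the line breaks can be added without changing behaviour.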

Re: [Qemu-devel] [PATCH 5/7] block: Fix BDRV_BLOCK_RAW status to honor alignment

2019-04-03 Thread Kevin Wolf

Am 03.04.2019 um 05:05 hat Eric Blake geschrieben:
> Previous patches mentioned how the blkdebug filter driver demonstrates
> a bug present in our NBD server; the bug is also present with the raw
> format driver when probing occurs. Basically, if we specify a
> particular alignment > 1, but defer the actual block status to the
> underlying file, and the underlying file has a smaller alignment, then
> the use of BDRV_BLOCK_RAW to defer to the underlying file can end up
> with status split at an alignment unacceptable to our layer.  Many
> callers don't care, but NBD does - it is a violation of the NBD
> protocol to send status or read results split on an unaligned boundary
> (we've taught our client to be tolerant of such violations because of
> qemu 3.1; but we still need to fix our server for the sake of other
> stricter clients).
> 
> This patch lays the groundwork - it adjusts bdrv_block_status to
> ensure that any time one layer defers to another via BDRV_BLOCK_RAW,
> the deferral is either truncated down to an aligned boundary, or
> multiple sub-aligned requests are coalesced into a single
> representative answer (using an implicit hole beyond EOF as
> needed). Iotest 241 exposes the effect (when format probing occurred,
> we don't want block status to subdivide the initial sector, and thus
> not any other sector either). Similarly, iotest 221 is a good
> candidate to amend to specifically track the effects; a change to a
> hole at EOF is not visible if an upper layer does not want alignment
> smaller than 512. However, this patch alone is not a complete fix - it
> is still possible to have an active layer with large alignment
> constraints backed by another layer with smaller constraints; so the
> next patch will complete the task.
> 
> Signed-off-by: Eric Blake 
> ---
>  block/io.c | 108 +++--
>  tests/qemu-iotests/221 |  10 
>  tests/qemu-iotests/221.out |   6 +++
>  tests/qemu-iotests/241.out |   3 +-
>  4 files changed, 122 insertions(+), 5 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index dfc153b8d82..936877d3745 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2021,6 +2021,97 @@ int coroutine_fn 
> bdrv_co_block_status_from_backing(BlockDriverState *bs,
>  return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
>  }
> 
> +static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
> + bool want_zero,
> + int64_t offset, int64_t bytes,
> + int64_t *pnum, int64_t *map,
> + BlockDriverState **file);
> +
> +/*
> + * Returns an aligned allocation status of the specified sectors.
> + * Wrapper around bdrv_co_block_status() which requires the initial
> + * offset and count to be aligned, and guarantees the result will also
> + * be aligned (even if it requires piecing together multiple
> + * sub-aligned queries into an appropriate coalesced answer).
> + */

I think this comment should specify the result of the operation when the
aligned region consists of multiple subregions with different status.
Probably something like this:

- BDRV_BLOCK_DATA is set if the flag is set for at least one subregion
- BDRV_BLOCK_ZERO is set if the flag is set for all subregions
- BDRV_BLOCK_OFFSET_VALID is set if the flag is set for all subregions,
  the provided offsets are contiguous and file is the same for all
  subregions.
- BDRV_BLOCK_ALLOCATED is never set here (the caller sets it)
- BDRV_BLOCK_EOF is set if the last subregion sets it; assert that it's
  not set for any other subregion
- BDRV_BLOCK_RAW is never set

- *map is only set if BDRV_BLOCK_OFFSET_VALID is set. It contains
  the offset of the first subregion then.

- *file is also only set if BDRV_BLOCK_OFFSET_VALID is set. It contains
  the *file of the subregions, which must be the same for all of them
  (otherwise, we wouldn't have set BDRV_BLOCK_OFFSET_VALID).

- *pnum: The sum of all subregions
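
The coalescing rules proposed in this list can be sketched as a small standalone function. This is only an illustration under stated assumptions: the `ST_*` flag values, `struct subregion` and `coalesce_status()` are hypothetical names invented for the example, not QEMU's definitions, and the ALLOCATED and RAW flags are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; not copied from QEMU's headers. */
enum {
    ST_DATA         = 1 << 0,
    ST_ZERO         = 1 << 1,
    ST_OFFSET_VALID = 1 << 2,
    ST_EOF          = 1 << 3,
};

struct subregion {
    int flags;
    int64_t len;       /* bytes covered by this subregion */
    int64_t map;       /* host offset, meaningful with ST_OFFSET_VALID */
    const void *file;  /* backing object, meaningful with ST_OFFSET_VALID */
};

static int coalesce_status(const struct subregion *r, size_t n,
                           int64_t *pnum, int64_t *map, const void **file)
{
    int any = 0;                          /* set if any subregion sets it */
    int all = ST_ZERO | ST_OFFSET_VALID;  /* set only if all subregions do */
    int64_t total = 0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        /* EOF may only be reported by the last subregion. */
        assert(i == n - 1 || !(r[i].flags & ST_EOF));

        any |= r[i].flags & (ST_DATA | ST_EOF);
        all &= r[i].flags;

        /* Offsets must continue where the previous subregion ended. */
        if ((all & ST_OFFSET_VALID) && i > 0 &&
            (r[i].file != r[0].file || r[i].map != r[0].map + total)) {
            all &= ~ST_OFFSET_VALID;
        }
        total += r[i].len;
    }

    *pnum = total;
    if (all & ST_OFFSET_VALID) {
        /* Only report map/file when the whole region is one mapping. */
        *map = r[0].map;
        *file = r[0].file;
    }
    return any | all;
}
```

DATA behaves like a logical OR across subregions, while ZERO and OFFSET_VALID behave like a logical AND, with OFFSET_VALID additionally requiring contiguous offsets in the same file.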

> +static int coroutine_fn bdrv_co_block_status_aligned(BlockDriverState *bs,
> + uint32_t align,
> + bool want_zero,
> + int64_t offset,
> + int64_t bytes,
> + int64_t *pnum,
> + int64_t *map,
> + BlockDriverState **file)
> +{
> +int ret;
> +
> +assert(is_power_of_2(align) && QEMU_IS_ALIGNED(offset | bytes, align));
> +ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, 
> file);
> +if (ret < 0) {
> +return ret;
> +}
> +if (!*pnum) {
> +assert(!bytes || ret & BDRV_BLOCK_EOF);
> +return ret;
>

Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt"

2019-04-03 Thread Laszlo Ersek

On 04/03/19 11:49, Igor Mammedov wrote:
> On Tue, 2 Apr 2019 17:38:26 +0200
> Laszlo Ersek  wrote:
> 
>> On 04/02/19 17:29, Auger Eric wrote:
>>> Hi Laszlo,
>>>
>>> On 4/1/19 3:07 PM, Laszlo Ersek wrote:  
 On 03/29/19 14:56, Auger Eric wrote:  
> Hi Ard,
>
> On 3/29/19 2:14 PM, Ard Biesheuvel wrote:  
>> On Fri, 29 Mar 2019 at 14:12, Auger Eric  wrote:  
>>>
>>> Hi Shameer,
>>>
>>> On 3/29/19 10:59 AM, Shameerali Kolothum Thodi wrote:  

  
> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 29 March 2019 09:32
> To: Shameerali Kolothum Thodi ;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; imamm...@redhat.com;
> peter.mayd...@linaro.org; shannon.zha...@gmail.com;
> sa...@linux.intel.com; sebastien.bo...@intel.com
> Cc: Linuxarm ; xuwei (O) ;
> Laszlo Ersek ; Ard Biesheuvel
> ; Leif Lindholm 
> Subject: Re: [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature 
> "fdt"
>
> Hi Shameer,
>
> [ + Laszlo, Ard, Leif ]
>
> On 3/21/19 11:47 AM, Shameer Kolothum wrote:  
>> This is to disable/enable populating DT nodes in case
>> any conflict with acpi tables. The default is "off".  
> The name of the option sounds misleading to me. Also we don't really
> know the scope of the disablement. At the moment this just aims to
> prevent the hotpluggable dt nodes from being added if we boot in ACPI 
> mode.
>  
>>
>> This will be used in subsequent patch where cold plug
>> device-memory support is added for DT boot.  
> I am concerned about the fact that in dt mode, by default, you won't 
> see
> any PCDIMM nodes.  
>>
>> If DT memory node support is added for cold-plugged device
>> memory, those memory will be visible to Guest kernel via
>> UEFI GetMemoryMap() and gets treated as early boot memory.  
> Don't we have an issue in UEFI then. Normally the SRAT indicates 
> whether
> the slots are hotpluggable or not. Shouldn't the UEFI code look at 
> this
> info.  

 Sorry I missed this part. Yes, that will be a more cleaner solution.

 Also, to be more clear on what happens,

 Guest ACPI boot with "fdt=on" ,

 From kernel log,

 [0.00] Early memory node ranges
 [0.00]   node   0: [mem 0x4000-0xbbf5]
 [0.00]   node   0: [mem 0xbbf6-0xbbff]
 [0.00]   node   0: [mem 0xbc00-0xbc02]
 [0.00]   node   0: [mem 0xbc03-0xbc36]
 [0.00]   node   0: [mem 0xbc37-0xbf64]
 [0.00]   node   0: [mem 0xbf65-0xbf6d]
 [0.00]   node   0: [mem 0xbf6e-0xbf6e]
 [0.00]   node   0: [mem 0xbf6f-0xbf80]
 [0.00]   node   0: [mem 0xbf81-0xbfff]
 [0.00]   node   0: [mem 0xc000-0x] 
  --> This is the hotpluggable memory node from DT.
 [0.00] Zeroed struct page in unavailable ranges: 1040 pages
 [0.00] Initmem setup node 0 [mem 
 0x4000-0x]


 Guest ACPI boot with "fdt=off" ,

 [0.00] Movable zone start for each node
 [0.00] Early memory node ranges
 [0.00]   node   0: [mem 0x4000-0xbbf5]
 [0.00]   node   0: [mem 0xbbf6-0xbbff]
 [0.00]   node   0: [mem 0xbc00-0xbc02]
 [0.00]   node   0: [mem 0xbc03-0xbc36]
 [0.00]   node   0: [mem 0xbc37-0xbf64]
 [0.00]   node   0: [mem 0xbf65-0xbf6d]
 [0.00]   node   0: [mem 0xbf6e-0xbf6e]
 [0.00]   node   0: [mem 0xbf6f-0xbf80]
 [0.00]   node   0: [mem 0xbf81-0xbfff]
 [0.00] Zeroed struct page in unavailable ranges: 1040 pages
 [0.00] Initmem setup node 0 [mem 
 0x4000-0xbfff

 The hotpluggable memory node is absent from early memory nodes here.  
>>>
>>> OK thank you for the example illustrating the concern.  

 As you said, it could be possible to detect this node using SRAT in 
 UEFI.  
>>>
>>> Let's wait for EDK2 experts on thi

Re: [Qemu-devel] [PATCH] sockets: Fix stringop-truncation warning

2019-04-03 Thread Philippe Mathieu-Daudé

On Wed, Apr 3, 2019 at 2:23 PM Daniel P. Berrangé  wrote:
> On Wed, Apr 03, 2019 at 02:16:20PM +0200, Philippe Mathieu-Daudé wrote:
> > Compiling with clang-8 fails with:
> >
> > CC  util/qemu-sockets.o
> >   util/qemu-sockets.c: In function 'unix_connect_saddr':
> >   util/qemu-sockets.c:925:5: error: 'strncpy' specified bound 108 equals 
> > destination size [-Werror=stringop-truncation]
> >strncpy(un.sun_path, saddr->path, sizeof(un.sun_path));
> >^~
> >   util/qemu-sockets.c: In function 'unix_listen_saddr':
> >   util/qemu-sockets.c:880:5: error: 'strncpy' specified bound 108 equals 
> > destination size [-Werror=stringop-truncation]
> >strncpy(un.sun_path, path, sizeof(un.sun_path));
> >^~~
> >
> > Per the unix socket manpage:
> >
> >   UNIX(7)
> >
> >   Pathname sockets
> >   When binding a socket to a pathname, a few rules should be observed for 
> > maximum portability and ease of coding:
> >   *  The pathname in sun_path should be null-terminated.
> >   *  The length of the pathname, including the terminating null byte, 
> > should not exceed the size of sun_path.
> >
> > Reduce the length of the unix socket path by 1 to hold the NUL byte.
>
> Note it just says "should", not "must" here. IOW, there is no requirement
> to NUL terminate and so we should not artificially require that at QEMU
> level either. If mgmt apps want to have NUL termination then they can
> just pass a shorter path to QEMU to start with.
>
> I've proposed the fix for the warning you mention here:
>
>   https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg07759.html

Oh, I missed it; thanks for pointing it out.

Regards,

Phil.
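
To illustrate the point quoted above, that `sun_path` is not required to be NUL-terminated, here is a minimal sketch that copies a path with an explicit length check and `memcpy()` instead of `strncpy()`. The helper name `fill_unix_addr` is hypothetical, and this is not necessarily what the linked fix does; it only shows one way to accept a path that exactly fills `sun_path`.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill a sockaddr_un from a path.  A path that exactly fills sun_path
 * (leaving no room for a terminating NUL) is accepted; anything longer
 * is rejected.  Returns 0 on success, -1 if the path is too long.
 */
static int fill_unix_addr(struct sockaddr_un *un, const char *path)
{
    size_t len;

    assert(un != NULL && path != NULL);
    len = strlen(path);
    if (len > sizeof(un->sun_path)) {
        return -1;
    }

    memset(un, 0, sizeof(*un));
    un->sun_family = AF_UNIX;
    /* memcpy with a checked length cannot silently truncate the path. */
    memcpy(un->sun_path, path, len);
    return 0;
}
```

Because the structure is zeroed first, shorter paths still end up NUL-terminated; only a path of exactly `sizeof(un->sun_path)` bytes is stored without a terminator.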

Re: [Qemu-devel] [PATCH v3 07/10] hw/arm/virt: Introduce opt-in feature "fdt"

2019-04-03 Thread Laszlo Ersek

On 04/03/19 14:10, Shameerali Kolothum Thodi wrote:

> So, the condition for hiding the hotpluggable memory nodes in question
> from the DT is:

>
>   (aarch64 && firmware_loaded && acpi_enabled)

>>> While UEFI has bindings for both 32-bit and 64-bit ARM, ACPI has a
>>> 64-bit-only binding for ARM. (And you can have UEFI without ACPI, but
>>> not the reverse, on ARM.) So if you run the 32-bit build of the
>>> ArmVirtQemu firmware, you get no ACPI at all; all you can rely on with
>>> the OS is the DT.
> 
> Just to confirm, does that mean that with the 32-bit build of UEFI, the OS cannot
> boot with ACPI and uses DT only?

Indeed.

> So,
> 
> if (aarch64 && firmware_loaded && acpi_enabled) {
>    hide_hotpluggable_memory_nodes();
> } else {
>    add_hotpluggable_memory_nodes();
> }
> 
> should work for all cases?

Yes.

Here's what happens when any one of the subconditions evaluates to false:

- ARM32 has no ACPI bindings, so the guest kernel can only use DT.

- On AARCH64, if you don't "load the firmware" (= don't use UEFI), then
  there won't be an ACPI entry point for the OS to locate (the RSD PTR
  is defined by the ACPI spec in UEFI terms, for AARCH64). So the guest
  kernel can only use DT.

- When on AARCH64 and using UEFI, but asking QEMU not to generate ACPI
  content, the firmware will not install any ACPI tables, so the guest
  kernel can only use DT.
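
The three cases above amount to a single predicate, sketched here as a standalone function. The function name is hypothetical and the three booleans simply mirror the names used in this thread; this is not code from the patch series.

```c
#include <stdbool.h>

/*
 * Returns true when the hotpluggable memory nodes should be hidden from
 * the DT, i.e. only when the guest can actually boot with ACPI.  If any
 * subcondition is false, the guest kernel can only use DT and must see
 * the nodes there.
 */
static bool hide_hotpluggable_memory_nodes(bool aarch64,
                                           bool firmware_loaded,
                                           bool acpi_enabled)
{
    return aarch64 && firmware_loaded && acpi_enabled;
}
```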

>>> This "bitness distinction" is implemented in the firmware already. If
>>> you hid the memory nodes from the DT under the condition
>>>
>>>   (!aarch64 && firmware_loaded && acpi_enabled)
>>>
>>> then the nodes would not be seen by the OS at all (because
>>> "acpi_enabled" is irrelevant for the 32-bit build of ArmVirtQemu, and
>>> all the OS can ever get is DT).
>>
>> It's getting tricky and I don't like one bit that we are trying to cater to a
>> 64-bit-only UEFI build (or any other build) on the QEMU side. Also Peter has
>> a valid point about guessing on the QEMU side (that's usually a source of
>> problems in the future).
> 
> If the above is correct (with the 32-bit variant of UEFI, the OS cannot boot
> with ACPI), then do we really have the issue of memory becoming
> non-hot-unpluggable? Maybe I am missing something.

I think Igor and Peter dislike adding complex logic to QEMU that
reflects the behavior of a specific firmware. AIUI their objection isn't
that it wouldn't work, but that it's not the right thing to do, from a
design perspective.

Thanks,
Laszlo

Re: [Qemu-devel] [PATCH 0/5] target/mips/cpu: errors and warnings coding style cleanup

2019-04-03 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20190403125055.26564-1-jbi.oct...@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Message-id: 20190403125055.26564-1-jbi.oct...@gmail.com
Subject: [Qemu-devel] [PATCH 0/5] target/mips/cpu: errors and warnings coding 
style cleanup
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]patchew/20190319144013.26584-2-pbonz...@redhat.com 
-> patchew/20190319144013.26584-2-pbonz...@redhat.com
 t [tag update]patchew/20190401154028.GA10574@erokenlabserver -> 
patchew/20190401154028.GA10574@erokenlabserver
 * [new tag]   patchew/20190403125055.26564-1-jbi.oct...@gmail.com 
-> patchew/20190403125055.26564-1-jbi.oct...@gmail.com
Switched to a new branch 'test'
45a6212b4a target/mips: wrap line into multiple lines to to fix checkpatch 
errors
71f1867094 target/mips: remove space to fix checkpatch errors
68b4463c72 target/mips: replace indentation with space to fix checkpatch errors
b265859837 target/mips: realign comments to fix checkpatch warnings
fbf61f7591 target/mips: add space to fix checkpatch errors

=== OUTPUT BEGIN ===
1/5 Checking commit fbf61f759167 (target/mips: add space to fix checkpatch 
errors)
ERROR: line over 90 characters
#40: FILE: target/mips/cpu.h:74:
+#define SET_FP_COND(num, env)do { ((env).fcr31) |= ((num) ? (1 << ((num) + 
24)) : (1 << 23)); } while (0)

ERROR: line over 90 characters
#41: FILE: target/mips/cpu.h:75:
+#define CLEAR_FP_COND(num, env)   do { ((env).fcr31) &= ~((num) ? (1 << ((num) 
+ 24)) : (1 << 23)); } while (0)

ERROR: line over 90 characters
#50: FILE: target/mips/cpu.h:80:
+#define SET_FP_CAUSE(reg, v)  do { (reg) = ((reg) & ~(0x3f << 12)) | ((v & 
0x3f) << 12); } while (0)

ERROR: line over 90 characters
#51: FILE: target/mips/cpu.h:81:
+#define SET_FP_ENABLE(reg, v) do { (reg) = ((reg) & ~(0x1f <<  7)) | ((v & 
0x1f) << 7); } while (0)

ERROR: line over 90 characters
#52: FILE: target/mips/cpu.h:82:
+#define SET_FP_FLAGS(reg, v)  do { (reg) = ((reg) & ~(0x1f <<  2)) | ((v & 
0x1f) << 2); } while (0)

total: 5 errors, 0 warnings, 36 lines checked

Patch 1/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

2/5 Checking commit b265859837f5 (target/mips: realign comments to fix 
checkpatch warnings)
ERROR: trailing whitespace
#66: FILE: target/mips/cpu.h:995:
+ * this from three bits to two.  $

ERROR: trailing whitespace
#89: FILE: target/mips/cpu.h:1109:
+/* $

total: 2 errors, 0 warnings, 71 lines checked

Patch 2/5 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

3/5 Checking commit 68b4463c727f (target/mips: replace indentation with space 
to fix checkpatch errors)
4/5 Checking commit 71f1867094cf (target/mips: remove space to fix checkpatch 
errors)
5/5 Checking commit 45a6212b4ad9 (target/mips: wrap line into multiple lines to 
to fix checkpatch errors)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20190403125055.26564-1-jbi.oct...@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
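
The two kinds of errors in the log above ("line over 90 characters" and "trailing whitespace") can be approximated with two small checks. This is a simplified sketch with hypothetical helper names; the real checkpatch.pl is a Perl script with more nuanced rules (for example, how it measures line width), and the 90-character limit is taken from the log output above.

```c
#include <stdbool.h>
#include <string.h>

enum { MAX_LINE = 90 };  /* limit quoted in the log above */

/* 'line' is expected without its trailing newline. */
static bool line_too_long(const char *line)
{
    return strlen(line) > MAX_LINE;
}

/* True if the line ends in a space or tab. */
static bool has_trailing_whitespace(const char *line)
{
    size_t len = strlen(line);

    return len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t');
}
```

Running checks like these on each added line before sending makes it easier to ensure that every patch in a series is clean on its own, not only the series as a whole.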

Re: [Qemu-devel] [PATCH 5/7] block: Fix BDRV_BLOCK_RAW status to honor alignment

2019-04-03 Thread Eric Blake

On 4/3/19 8:03 AM, Kevin Wolf wrote:
> Am 03.04.2019 um 05:05 hat Eric Blake geschrieben:
>> Previous patches mentioned how the blkdebug filter driver demonstrates
>> a bug present in our NBD server; the bug is also present with the raw
>> format driver when probing occurs. Basically, if we specify a
>> particular alignment > 1, but defer the actual block status to the
>> underlying file, and the underlying file has a smaller alignment, then
>> the use of BDRV_BLOCK_RAW to defer to the underlying file can end up
>> with status split at an alignment unacceptable to our layer.  Many
>> callers don't care, but NBD does - it is a violation of the NBD
>> protocol to send status or read results split on an unaligned boundary
>> (we've taught our client to be tolerant of such violations because of
>> qemu 3.1; but we still need to fix our server for the sake of other
>> stricter clients).
>>
>> This patch lays the groundwork - it adjusts bdrv_block_status to
>> ensure that any time one layer defers to another via BDRV_BLOCK_RAW,
>> the deferral is either truncated down to an aligned boundary, or
>> multiple sub-aligned requests are coalesced into a single
>> representative answer (using an implicit hole beyond EOF as
>> needed). Iotest 241 exposes the effect (when format probing occurred,
>> we don't want block status to subdivide the initial sector, and thus
>> not any other sector either). Similarly, iotest 221 is a good
>> candidate to amend to specifically track the effects; a change to a
>> hole at EOF is not visible if an upper layer does not want alignment
>> smaller than 512. However, this patch alone is not a complete fix - it
>> is still possible to have an active layer with large alignment
>> constraints backed by another layer with smaller constraints; so the
>> next patch will complete the task.

I should probably update this text to call out that the next patch
introduces some additional mutual recursion (that is, this patch in
isolation may appear to have dead paths, as the caller
bdrv_co_block_status always passes non-NULL 'map' and 'file'; but the
next patch adjusts bdrv_co_block_status_above to call this instead of
directly into bdrv_co_block_status, hence exposing those paths).

>>
>> Signed-off-by: Eric Blake 
>> ---
>>  block/io.c | 108 +++--
>>  tests/qemu-iotests/221 |  10 
>>  tests/qemu-iotests/221.out |   6 +++
>>  tests/qemu-iotests/241.out |   3 +-
>>  4 files changed, 122 insertions(+), 5 deletions(-)
>>

>> +/*
>> + * Returns an aligned allocation status of the specified sectors.
>> + * Wrapper around bdrv_co_block_status() which requires the initial
>> + * offset and count to be aligned, and guarantees the result will also
>> + * be aligned (even if it requires piecing together multiple
>> + * sub-aligned queries into an appropriate coalesced answer).
>> + */
> 
> I think this comment should specify the result of the operation when the
> aligned region consists of multiple subregions with different status.

Good idea.

> Probably something like this:
> 
> - BDRV_BLOCK_DATA is set if the flag is set for at least one subregion
> - BDRV_BLOCK_ZERO is set if the flag is set for all subregions
> - BDRV_BLOCK_OFFSET_VALID is set if the flag is set for all subregions,
>   the provided offsets are contiguous and file is the same for all
>   subregions.

Correct.

> - BDRV_BLOCK_ALLOCATED is never set here (the caller sets it)

Not true, bdrv_co_block_status can set BDRV_BLOCK_ALLOCATED.

> - BDRV_BLOCK_EOF is set if the last subregion sets it; assert that it's
>   not set for any other subregion

With the additional caveat that status beyond BDRV_BLOCK_EOF up to the
alignment boundary is treated as an implicit hole.

> - BDRV_BLOCK_RAW is never set

That should be true.

> 
> - *map is only set if BDRV_BLOCK_OFFSET_VALID is set. It contains
>   the offset of the first subregion then.

Correct.  Since the subregions had to be contiguous, it is the correct
offset for the entire aligned region.

> 
> - *file is also only set if BDRV_BLOCK_OFFSET_VALID is set. It contains
>   the *file of the subregions, which must be the same for all of them
>   (otherwise, we wouldn't have set BDRV_BLOCK_OFFSET_VALID).
> 
> - *pnum: The sum of all subregions

And is guaranteed to be aligned, as well as being non-zero unless
'bytes' was 0 on input or if the entire status request is beyond EOF.
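
The alignment guarantee discussed here rests on two small pieces of arithmetic, sketched below with hypothetical helper names (they are stand-ins for illustration, not QEMU's own macros). For a power-of-two alignment, checking `offset | bytes` tests both values at once, and masking truncates a length down to an aligned boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_2_u32(uint32_t v)
{
    return v && !(v & (v - 1));
}

/* offset and bytes are both multiples of align iff their bitwise OR is. */
static bool both_aligned(int64_t offset, int64_t bytes, uint32_t align)
{
    assert(is_power_of_2_u32(align));
    return ((offset | bytes) & ((int64_t)align - 1)) == 0;
}

/* Truncate a sub-query length down to a multiple of align. */
static int64_t align_down(int64_t n, uint32_t align)
{
    assert(is_power_of_2_u32(align));
    return n & ~((int64_t)align - 1);
}
```

When a sub-query returns a length that truncates to zero, truncation alone cannot make progress; that is the case where several sub-aligned answers have to be coalesced into one aligned answer instead.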

> 
>> +static int coroutine_fn bdrv_co_block_status_aligned(BlockDriverState *bs,
>> + uint32_t align,
>> + bool want_zero,
>> + int64_t offset,
>> + int64_t bytes,
>> + int64_t *pnum,
>> + int64_t *map,
>> +

Re: [Qemu-devel] [PATCH 1/5] target/mips: add space to fix checkpatch errors

2019-04-03 Thread Aleksandar Markovic

> From: Jules Irenge 
> Subject: [PATCH 1/5] target/mips: add space to fix checkpatch errors
> 
> Add space to fix errors reported by checkpatch.pl tool
> "ERROR: spaces required around that ..."
> "ERROR: space required before the open parenthesis"
> "ERROR: space required after that ..."
> 
> Signed-off-by: Jules Irenge 
> ---
>  target/mips/cpu.h | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
> 

Hello, Jules.

I appreciate this and all other patches in the series.

It looks like you have additional types of errors
in the same code lines that you change in this patch
("line over 90 characters"), and this causes the script
checkpatch.pl to report errors for this very patch,
which is not allowed.

I think your best option is to blend (squash) existing
patches 1 and 5 into a single patch.

> diff --git a/target/mips/cpu.h b/target/mips/cpu.h
> index a10eeb0de3..2429fe80ac 100644
> --- a/target/mips/cpu.h
> +++ b/target/mips/cpu.h
> @@ -22,10 +22,10 @@ typedef struct CPUMIPSTLBContext CPUMIPSTLBContext;
> 
>  typedef union wr_t wr_t;
>  union wr_t {
> -int8_t  b[MSA_WRLEN/8];
> -int16_t h[MSA_WRLEN/16];
> -int32_t w[MSA_WRLEN/32];
> -int64_t d[MSA_WRLEN/64];
> +int8_t  b[MSA_WRLEN / 8];
> +int16_t h[MSA_WRLEN / 16];
> +int32_t w[MSA_WRLEN / 32];
> +int64_t d[MSA_WRLEN / 64];
>  };
> 
>  typedef union fpr_t fpr_t;
> @@ -71,16 +71,16 @@ struct CPUMIPSFPUContext {
>  #define FCR31_FS 24
>  #define FCR31_ABS2008 19
>  #define FCR31_NAN2008 18
> -#define SET_FP_COND(num,env) do { ((env).fcr31) |= ((num) ? (1 << ((num) 
> + 24)) : (1 << 23)); } while(0)
> -#define CLEAR_FP_COND(num,env)   do { ((env).fcr31) &= ~((num) ? (1 << 
> ((num) + 24)) : (1 << 23)); } while(0)
> +#define SET_FP_COND(num, env)do { ((env).fcr31) |= ((num) ? (1 << ((num) 
> + 24)) : (1 << 23)); } while (0)
> +#define CLEAR_FP_COND(num, env)   do { ((env).fcr31) &= ~((num) ? (1 << 
> ((num) + 24)) : (1 << 23)); } while (0)

There is a misalignment here for the last two lines, after
this patch is applied ("do" is misaligned). That should be
corrected.

Thanks a lot!
Aleksandar

>  #define GET_FP_COND(env)         ((((env).fcr31 >> 24) & 0xfe) |
> (((env).fcr31 >> 23) & 0x1))
>  #define GET_FP_CAUSE(reg)(((reg) >> 12) & 0x3f)
>  #define GET_FP_ENABLE(reg)   (((reg) >>  7) & 0x1f)
>  #define GET_FP_FLAGS(reg)(((reg) >>  2) & 0x1f)
> -#define SET_FP_CAUSE(reg,v)  do { (reg) = ((reg) & ~(0x3f << 12)) | ((v 
> & 0x3f) << 12); } while(0)
> -#define SET_FP_ENABLE(reg,v) do { (reg) = ((reg) & ~(0x1f <<  7)) | ((v 
> & 0x1f) << 7); } while(0)
> -#define SET_FP_FLAGS(reg,v)  do { (reg) = ((reg) & ~(0x1f <<  2)) | ((v 
> & 0x1f) << 2); } while(0)
> -#define UPDATE_FP_FLAGS(reg,v)   do { (reg) |= ((v & 0x1f) << 2); } while(0)
> +#define SET_FP_CAUSE(reg, v)  do { (reg) = ((reg) & ~(0x3f << 12)) | ((v 
> & 0x3f) << 12); } while (0)
> +#define SET_FP_ENABLE(reg, v) do { (reg) = ((reg) & ~(0x1f <<  7)) | ((v 
> & 0x1f) << 7); } while (0)
> +#define SET_FP_FLAGS(reg, v)  do { (reg) = ((reg) & ~(0x1f <<  2)) | ((v 
> & 0x1f) << 2); } while (0)
> +#define UPDATE_FP_FLAGS(reg, v)   do { (reg) |= ((v & 0x1f) << 2); } while 
> (0)
>  #define FP_INEXACT1
>  #define FP_UNDERFLOW  2
>  #define FP_OVERFLOW   4
> --
> 2.20.1

Re: [Qemu-devel] [PATCH 2/5] target/mips: realign comments to fix checkpatch warnings

2019-04-03 Thread Aleksandar Markovic

> From: Jules Irenge 
> Subject: [PATCH 2/5] target/mips: realign comments to fix checkpatch warnings
> 
> Realign comments to fix warnings issued by checkpatch.pl tool
> "WARNING: Block comments use a leading /* on a separate line"
>  within "target/mips/cpu.h" file.
> 
> Signed-off-by: Jules Irenge 
> ---
>  target/mips/cpu.h | 34 ++
>  1 file changed, 22 insertions(+), 12 deletions(-)
> 
> diff --git a/target/mips/cpu.h b/target/mips/cpu.h
> index 2429fe80ac..bfa595c8a9 100644
> --- a/target/mips/cpu.h
> +++ b/target/mips/cpu.h
> @@ -37,7 +37,8 @@ union fpr_t {
>  /* FPU/MSA register mapping is not tested on big-endian hosts. */
>  wr_t wr;   /* vector data */
>  };
> -/* define FP_ENDIAN_IDX to access the same location
> +/*
> + * define FP_ENDIAN_IDX to access the same location
>   * in the fpr_t union regardless of the host endianness
>   */
>  #if defined(HOST_WORDS_BIGENDIAN)
> @@ -963,9 +964,11 @@ struct CPUMIPSState {
>  /* TMASK defines different execution modes */
>  #define MIPS_HFLAG_TMASK  0x1F5807FF
>  #define MIPS_HFLAG_MODE   0x7 /* execution modes*/
> -/* The KSU flags must be the lowest bits in hflags. The flag order
> -   must be the same as defined for CP0 Status. This allows to use
> -   the bits as the value of mmu_idx. */
> +/*
> + * The KSU flags must be the lowest bits in hflags. The flag order
> + * must be the same as defined for CP0 Status. This allows to use
> + * the bits as the value of mmu_idx.
> + */
>  #define MIPS_HFLAG_KSU0x3 /* kernel/supervisor/user mode mask   */
>  #define MIPS_HFLAG_UM 0x2 /* user mode flag */
>  #define MIPS_HFLAG_SM 0x1 /* supervisor mode flag   */
> @@ -975,18 +978,22 @@ struct CPUMIPSState {
>  #define MIPS_HFLAG_CP00x00010 /* CP0 enabled*/
>  #define MIPS_HFLAG_FPU0x00020 /* FPU enabled*/
>  #define MIPS_HFLAG_F640x00040 /* 64-bit FPU enabled */
> -/* True if the MIPS IV COP1X instructions can be used.  This also
> -   controls the non-COP1X instructions RECIP.S, RECIP.D, RSQRT.S
> -   and RSQRT.D.  */
> +/*
> + * True if the MIPS IV COP1X instructions can be used.  This also
> + * controls the non-COP1X instructions RECIP.S, RECIP.D, RSQRT.S
> + * and RSQRT.D.
> + */
>  #define MIPS_HFLAG_COP1X  0x00080 /* COP1X instructions enabled */
>  #define MIPS_HFLAG_RE 0x00100 /* Reversed endianness*/
>  #define MIPS_HFLAG_AWRAP  0x00200 /* 32-bit compatibility address wrapping */
>  #define MIPS_HFLAG_M160x00400 /* MIPS16 mode flag   */
>  #define MIPS_HFLAG_M16_SHIFT 10
> -/* If translation is interrupted between the branch instruction and
> +/*
> + * If translation is interrupted between the branch instruction and
>   * the delay slot, record what type of branch it is so that we can
>   * resume translation properly.  It might be possible to reduce
> - * this from three bits to two.  */
> + * this from three bits to two.  

The last line contains two trailing spaces. Just delete them.

> + */
>  #define MIPS_HFLAG_BMASK_BASE  0x803800
>  #define MIPS_HFLAG_B  0x00800 /* Unconditional branch   */
>  #define MIPS_HFLAG_BC 0x01000 /* Conditional branch */
> @@ -1073,8 +1080,10 @@ void mips_cpu_list (FILE *f, fprintf_function 
> cpu_fprintf);
>  extern void cpu_wrdsp(uint32_t rs, uint32_t mask_num, CPUMIPSState *env);
>  extern uint32_t cpu_rddsp(uint32_t mask_num, CPUMIPSState *env);
> 
> -/* MMU modes definitions. We carefully match the indices with our
> -   hflags layout. */
> +/*
> + * MMU modes definitions. We carefully match the indices with our
> + * hflags layout.
> + */
>  #define MMU_MODE0_SUFFIX _kernel
>  #define MMU_MODE1_SUFFIX _super
>  #define MMU_MODE2_SUFFIX _user
> @@ -1097,7 +1106,8 @@ static inline int cpu_mmu_index (CPUMIPSState *env, 
> bool ifetch)
> 
>  #include "exec/cpu-all.h"
> 
> -/* Memory access type :
> +/* 

The last line contains a trailing space. Just remove it.

> + * Memory access type :
>   * may be needed for precise access rights control and precise exceptions.
>   */
>  enum {
> --
> 2.20.1
> 
> 

Thanks,
Aleksandar

Re: [Qemu-devel] [PATCH 3/5] target/mips: replace indentation with space to fix checkpatch errors

2019-04-03 Thread Aleksandar Markovic

> From: Jules Irenge 
> Subject: [PATCH 3/5] target/mips: replace indentation with space to fix 
> checkpatch errors
>
> Replace indentation with space to fix errors issued by checkpatch.pl tool
> "ERROR: code indent should never use tabs"
> within "target/mips/cpu.h" file.
> 
> Signed-off-by: Jules Irenge 
> ---

This patch is fine.

Reviewed-by: Aleksandar Markovic

Re: [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory

2019-04-03 Thread Xiang Zheng

Hi Laszlo and Markus,

Thanks for your useful suggestions and comments! :)

On 2019/3/27 2:36, Markus Armbruster wrote:
> Laszlo Ersek  writes:
> 
>> On 03/26/19 17:39, Markus Armbruster wrote:
>>> Laszlo Ersek  writes:
>>
>>>> With the dynamic sizing in QEMU (which, IIRC, I had originally
>>>> introduced still in the 1MB times, due to the split between the
>>>> executable and varstore parts), both the 1MB->2MB switch, and the
>>>> 2MB->4MB switch in the firmware caused zero pain in QEMU. And right now,
>>>> 4MB looks like a "sweet spot", with some elbow room left.
>>>
>>> Explicit configuration would've been exactly as painless.  Even with
>>> pflash sizes restricted to powers of two.
>>
>> I wrote the patch that ended up as commit 637a5acb46b3 -- with your R-b
>> -- in 2013. I'm unsure if machine type properties existed back then, but
>> even if they did, do you think I knew about them? :)
>>
>> You are right, of course; it's just that we can't tell the future.
> 
> True!  All we can do is continue to design as well as we can given the
> information, experience and resources we have, and when the inevitable
> design mistakes become apparent, limit their impact.
> 
> Some of the things we now consider mistakes we just didn't see.  Others
> we saw (e.g. multiple pflash devices, unlike physical hardware), but
> underestimated their impact.
> 

I thought about your comments and wrote the following patch (just for test)
which uses a file mapping to replace the anonymous mapping. UEFI seems to work
fine. So why not use a file mapping to read or write a pflash device?

Other than this approach, I don't know how to share the pflash memory among
VMs or reduce resource consumption. :(
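
To illustrate the idea outside QEMU, here is a minimal sketch of a file-backed mapping in plain POSIX C. The helper names `map_image()` and `demo_roundtrip()` are hypothetical and not part of QEMU, and the use of `MAP_SHARED` is an assumption made for the illustration. The point is only that several processes mapping the same file this way can be served from the same page cache pages, whereas an anonymous mapping filled by copying is private to each process.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): map the first `len` bytes of
 * the file at `path`. Returns NULL on any failure.
 */
static void *map_image(const char *path, size_t len, int writable)
{
    struct stat st;
    void *p;
    int fd = open(path, writable ? O_RDWR : O_RDONLY);

    if (fd < 0) {
        return NULL;
    }
    /* Refuse to map beyond the end of the file. */
    if (fstat(fd, &st) < 0 || st.st_size < 0 || (size_t)st.st_size < len) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, len, writable ? PROT_READ | PROT_WRITE : PROT_READ,
             MAP_SHARED, fd, 0);
    close(fd);              /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : p;
}

/* Create a small temporary "image", map it read-only and read it back. */
static int demo_roundtrip(void)
{
    char path[] = "/tmp/pflash-demo-XXXXXX";
    const char msg[] = "varstore";
    size_t len = 4096;
    void *p = NULL;
    int ok, rc;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    ok = ftruncate(fd, (off_t)len) == 0 &&
         write(fd, msg, sizeof(msg)) == (ssize_t)sizeof(msg);
    close(fd);
    if (ok) {
        p = map_image(path, len, 0);
    }
    rc = (p != NULL && memcmp(p, msg, sizeof(msg)) == 0) ? 0 : -1;
    if (p != NULL) {
        munmap(p, len);
    }
    unlink(path);
    return rc;
}
```

Calling `demo_roundtrip()` returns 0 when the mapped bytes match what was written to the file.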

---8>---

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ce2664a..12c78f2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -898,6 +898,7 @@ static void create_one_flash(const char *name, hwaddr 
flashbase,
 qdev_prop_set_uint16(dev, "id2", 0x00);
 qdev_prop_set_uint16(dev, "id3", 0x00);
 qdev_prop_set_string(dev, "name", name);
+qdev_prop_set_bit(dev, "prealloc", false);
 qdev_init_nofail(dev);

 memory_region_add_subregion(sysmem, flashbase,
diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 16dfae1..23a85bc 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -92,6 +92,7 @@ struct PFlashCFI01 {
 void *storage;
 VMChangeStateEntry *vmstate;
 bool old_multiple_chip_handling;
+bool prealloc;
 };

 static int pflash_post_load(void *opaque, int version_id);
@@ -731,11 +732,21 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 }
 device_len = sector_len_per_device * blocks_per_device;

-memory_region_init_rom_device(
-&pfl->mem, OBJECT(dev),
-&pflash_cfi01_ops,
-pfl,
-pfl->name, total_len, &local_err);
+if (pfl->blk && !pfl->prealloc) {
+memory_region_init_rom_device_from_file(
+&pfl->mem, OBJECT(dev),
+&pflash_cfi01_ops,
+pfl,
+pfl->name, total_len,
+blk_is_read_only(pfl->blk) ? RAM_SHARED : RAM_PMEM,
+blk_bs(pfl->blk)->filename, &local_err);
+} else {
+memory_region_init_rom_device(
+&pfl->mem, OBJECT(dev),
+&pflash_cfi01_ops,
+pfl,
+pfl->name, total_len, &local_err);
+}
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -899,6 +910,7 @@ static Property pflash_cfi01_properties[] = {
 DEFINE_PROP_STRING("name", PFlashCFI01, name),
 DEFINE_PROP_BOOL("old-multiple-chip-handling", PFlashCFI01,
  old_multiple_chip_handling, false),
+DEFINE_PROP_BOOL("prealloc", PFlashCFI01, prealloc, true),
 DEFINE_PROP_END_OF_LIST(),
 };

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1625913..1b16d3b 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -804,6 +804,16 @@ void memory_region_init_rom_device_nomigrate(MemoryRegion 
*mr,
  uint64_t size,
  Error **errp);

+void memory_region_init_rom_device_from_file(MemoryRegion *mr,
+ struct Object *owner,
+ const MemoryRegionOps *ops,
+ void *opaque,
+ const char *name,
+ uint64_t size,
+ uint32_t ram_flags,
+ const char *path,
+ Error **errp);
+
 /**
  * memory_region_init_iommu: Initialize a memory region of a custom type
  * that translates addresses
diff --git a/memory.c b/memory.c
index 9fbca52..905956b 100644
--- a/memory.c
+++ b/memory.c
@@ -1719,6 +1719,36 @@ void 
memory_region_init_r

Re: [Qemu-devel] [PATCH v3] s390: diagnose 318 info reset and migration support

2019-04-03 Thread Collin Walling


On 4/3/19 8:30 AM, Cornelia Huck wrote:

On Wed, 3 Apr 2019 13:46:07 +0200
David Hildenbrand  wrote:


On 01.04.19 23:48, Collin Walling wrote:

DIAGNOSE 0x318 (diag318) is a privileged s390x instruction that must
be intercepted by SIE and handled via KVM. Let's introduce some
functions to communicate between QEMU and KVM via ioctls. These
will be used to get/set the diag318 related information (also known
as the "Control Program Code" or "CPC"), as well as check the system
if KVM supports handling this instruction.

Diag318 must also be reset on a load normal and modified clear, so
we use the set function (wrapped in a reset function) to explicitly
set the diag318 info to 0 for these cases.

Lastly, we want to ensure the diag318 info is migrated. The diag318
info migration is handled via a VMStateDescription. This feature is
only supported on QEMU machines 4.0 and later.


This has to become 4.1



Signed-off-by: Collin Walling 
---

This version is posted in tandem with a new kernel patch that changes
how the execution of the diag 0x318 instruction is handled. A link to
this series will be attached as a reply to this series for convenience.

Changelog:

 v3
 - removed CPU model code
 - removed RSCPI and SCLP code
 - reverted max cpus back to 248 (previous patches limited this
 to 247)
 - introduced VMStateDescription handlers for migration
 - disabled migration of diag318 info for machines 3.1 and older
 - a warning is printed if migration is disabled and KVM
   supports handling this instruction

---
  hw/s390x/s390-virtio-ccw.c   |  6 
  linux-headers/asm-s390/kvm.h |  4 +++
  target/s390x/diag.c  | 63 
  target/s390x/internal.h  |  5 ++-
  target/s390x/kvm-stub.c  | 15 +
  target/s390x/kvm.c   | 32 ++
  target/s390x/kvm_s390x.h |  3 ++
  7 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index d11069b860..2a50868496 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -36,6 +36,7 @@
  #include "cpu_models.h"
  #include "hw/nmi.h"
  #include "hw/s390x/tod.h"
+#include "internal.h"
  
  S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)

  {
@@ -302,6 +303,8 @@ static void ccw_init(MachineState *machine)
  
  /* init the TOD clock */

  s390_init_tod();
+
+diag318_register_migration();
  }
  
  static void s390_cpu_plug(HotplugHandler *hotplug_dev,

@@ -352,6 +355,7 @@ static void s390_machine_reset(void)
  }
  subsystem_reset();
  s390_crypto_reset();
+diag318_reset();


Shouldn't this go into subsystem_reset?

Aren't you missing resets during external/reipl resets?

Also, I was wondering if this would be worth creating a fake device like
diag288. The resets can be handled similar to diag288. Resets during
external/reipl reset would come natural.


I'll look into it.



I like the idea of adding a new device. Would also make the migration
code nicer (as you suggested below).



Right. I always forget this step.

  
  static void ccw_machine_3_1_class_options(MachineClass *mc)

diff --git a/linux-headers/asm-s390/kvm.h b/linux-headers/asm-s390/kvm.h
index 0265482f8f..735f5a46e8 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h


Updates of linux headers should always go into a separate patch that
can be replaced by a proper headers update when applying.


@@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
  #define KVM_S390_VM_CRYPTO2
  #define KVM_S390_VM_CPU_MODEL 3
  #define KVM_S390_VM_MIGRATION 4
+#define KVM_S390_VM_MISC   5
  
  /* kvm attributes for mem_ctrl */

  #define KVM_S390_VM_MEM_ENABLE_CMMA   0
@@ -168,6 +169,9 @@ struct kvm_s390_vm_cpu_subfunc {
  #define KVM_S390_VM_MIGRATION_START   1
  #define KVM_S390_VM_MIGRATION_STATUS  2
  
+/* kvm attributes for KVM_S390_VM_MISC */

+#define KVM_S390_VM_MISC_CPC   0
+
  /* for KVM_GET_REGS and KVM_SET_REGS */
  struct kvm_regs {
/* general purpose regs for s390 */

Re: [Qemu-devel] [PATCH 0/5] target/mips/cpu: errors and warnings coding style cleanup

2019-04-03 Thread Aleksandar Markovic

> From: Jules Irenge 
> Subject: [PATCH 0/5] target/mips/cpu: errors and warnings coding style cleanup
>
> This v1 series cleans up all warnings and errors of coding style within cpu.h
> file
>

Hi, Jules!

There are a couple of minor problems that I described in my comments
to other patches. Otherwise I like the series.

May I ask you to send v2 of the series, with some needed modifications?

Regards,
Aleksandar

> Jules Irenge (5):
>   target/mips: add space to fix checkpatch errors
>   target/mips: realign comments to fix checkpatch warnings
>   target/mips: replace indentation with space to fix checkpatch errors
>   target/mips: remove space to fix checkpatch errors
>   target/mips: wrap line into multiple lines to to fix checkpatch errors
> 
>  target/mips/cpu.h | 211 +-
>  1 file changed, 117 insertions(+), 94 deletions(-)

Re: [Qemu-devel] [PATCH v3] s390: diagnose 318 info reset and migration support

2019-04-03 Thread Collin Walling


On 4/3/19 10:16 AM, Collin Walling wrote:

On 4/3/19 8:30 AM, Cornelia Huck wrote:

On Wed, 3 Apr 2019 13:46:07 +0200
David Hildenbrand  wrote:


On 01.04.19 23:48, Collin Walling wrote:

DIAGNOSE 0x318 (diag318) is a privileged s390x instruction that must
be intercepted by SIE and handled via KVM. Let's introduce some
functions to communicate between QEMU and KVM via ioctls. These
will be used to get/set the diag318 related information (also known
as the "Control Program Code" or "CPC"), as well as check the system
if KVM supports handling this instruction.

Diag318 must also be reset on a load normal and modified clear, so
we use the set function (wrapped in a reset function) to explicitly
set the diag318 info to 0 for these cases.

Lastly, we want to ensure the diag318 info is migrated. The diag318
info migration is handled via a VMStateDescription. This feature is
only supported on QEMU machines 4.0 and later.


This has to become 4.1



Signed-off-by: Collin Walling 
---

This version is posted in tandem with a new kernel patch that changes
how the execution of the diag 0x318 instruction is handled. A link to
this series will be attached as a reply to this series for convenience.

Changelog:

 v3
 - removed CPU model code
 - removed RSCPI and SCLP code
 - reverted max cpus back to 248 (previous patches limited this
 to 247)
 - introduced VMStateDescription handlers for migration
 - disabled migration of diag318 info for machines 3.1 and 
older

 - a warning is printed if migration is disabled and KVM
   supports handling this instruction

---
  hw/s390x/s390-virtio-ccw.c   |  6 
  linux-headers/asm-s390/kvm.h |  4 +++
  target/s390x/diag.c  | 63 


  target/s390x/internal.h  |  5 ++-
  target/s390x/kvm-stub.c  | 15 +
  target/s390x/kvm.c   | 32 ++
  target/s390x/kvm_s390x.h |  3 ++
  7 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index d11069b860..2a50868496 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -36,6 +36,7 @@
  #include "cpu_models.h"
  #include "hw/nmi.h"
  #include "hw/s390x/tod.h"
+#include "internal.h"
  S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
  {
@@ -302,6 +303,8 @@ static void ccw_init(MachineState *machine)
  /* init the TOD clock */
  s390_init_tod();
+
+    diag318_register_migration();
  }
  static void s390_cpu_plug(HotplugHandler *hotplug_dev,
@@ -352,6 +355,7 @@ static void s390_machine_reset(void)
  }
  subsystem_reset();
  s390_crypto_reset();
+    diag318_reset();


Shouldn't this go into subsystem_reset?

Aren't you missing resets during external/reipl resets?



Certainly makes sense to do this during reipl as well. The diag318 info 
is to be reset during a clear, power-on, and load normal. I'll look into

it as I investigate this "fake device" route.


Also, I was wondering if this would be worth creating a fake device like
diag288. The resets can be handled similar to diag288. Resets during
external/reipl reset would come natural.


I'll look into it.



I like the idea of adding a new device. Would also make the migration
code nicer (as you suggested below).



Right. I always forget this step.



Whoops -- Meant that response to be under the "linux header" comment.


  static void ccw_machine_3_1_class_options(MachineClass *mc)
diff --git a/linux-headers/asm-s390/kvm.h 
b/linux-headers/asm-s390/kvm.h

index 0265482f8f..735f5a46e8 100644
--- a/linux-headers/asm-s390/kvm.h
+++ b/linux-headers/asm-s390/kvm.h


Updates of linux headers should always go into a separate patch that
can be replaced by a proper headers update when applying.


@@ -74,6 +74,7 @@ struct kvm_s390_io_adapter_req {
  #define KVM_S390_VM_CRYPTO    2
  #define KVM_S390_VM_CPU_MODEL    3
  #define KVM_S390_VM_MIGRATION    4
+#define KVM_S390_VM_MISC    5
  /* kvm attributes for mem_ctrl */
  #define KVM_S390_VM_MEM_ENABLE_CMMA    0
@@ -168,6 +169,9 @@ struct kvm_s390_vm_cpu_subfunc {
  #define KVM_S390_VM_MIGRATION_START    1
  #define KVM_S390_VM_MIGRATION_STATUS    2
+/* kvm attributes for KVM_S390_VM_MISC */
+#define KVM_S390_VM_MISC_CPC    0
+
  /* for KVM_GET_REGS and KVM_SET_REGS */
  struct kvm_regs {
  /* general purpose regs for s390 */

[Qemu-devel] [PATCH for-4.1] commit: Use bdrv_append() in commit_start()

2019-04-03 Thread Alberto Garcia

This function combines bdrv_set_backing_hd() and bdrv_replace_node()
so we can use it to simplify the code a bit in commit_start().

Signed-off-by: Alberto Garcia 
---
 block/commit.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index ba60fef58a..a0beb7d265 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -304,23 +304,14 @@ void commit_start(const char *job_id, BlockDriverState 
*bs,
 commit_top_bs->total_sectors = top->total_sectors;
 bdrv_set_aio_context(commit_top_bs, bdrv_get_aio_context(top));
 
-bdrv_set_backing_hd(commit_top_bs, top, &local_err);
+bdrv_append(commit_top_bs, top, &local_err);
 if (local_err) {
-bdrv_unref(commit_top_bs);
-commit_top_bs = NULL;
-error_propagate(errp, local_err);
-goto fail;
-}
-bdrv_replace_node(top, commit_top_bs, &local_err);
-if (local_err) {
-bdrv_unref(commit_top_bs);
 commit_top_bs = NULL;
 error_propagate(errp, local_err);
 goto fail;
 }
 
 s->commit_top_bs = commit_top_bs;
-bdrv_unref(commit_top_bs);
 
 /* Block all nodes between top and base, because they will
  * disappear from the chain after this operation. */
-- 
2.11.0

[Qemu-devel] [Bug 1818483] Re: qemu user mode does not support binfmt_misc config with flags include "P"

2019-04-03 Thread YunQiang Su

This patch is for linux kernel.

It will set the 3rd bit of AT_FLAGS, if P is set for binfmt_misc.

The major concern is that AT_FLAGS has never been used before, so should we
use it

** Patch added: "binfmt_preserve_argv0.patch"
   
https://bugs.launchpad.net/qemu/+bug/1818483/+attachment/5252516/+files/binfmt_preserve_argv0.patch

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1818483

Title:
  qemu user mode does not support binfmt_misc config with flags include
  "P"

Status in QEMU:
  New

Bug description:
  Hi Sir:
  During our test in chroot environment with qemu-user-static, we got some test 
cases failed because of program output warning with unexpected full path name.
  For example in test module "Devscripts"
  the test item for broken tarball expected the warning info:
  
  but the output was:
  
  the cause is the config file of binfmt_misc was set not to send argv0, for 
example:
  type command "tar" after chroot:
  ==
  lpeng@lpeng-VirtualBox:~/projects_lpeng/qemu/mips_2/sid$ sudo chroot .
  [sudo] password for lpeng: 
  root@lpeng-VirtualBox:/# tar
  /bin/tar: You must specify one of the '-Acdtrux', '--delete' or 
'--test-label' options
  Try '/bin/tar --help' or '/bin/tar --usage' for more information.
  root@lpeng-VirtualBox:/# 
  ===

  by adding output log in main()@qemu/Linux-user/main.c
  we found the original input command was changed, and qemu do not know that, 
we got the input args:
  argv_0/usr/bin/qemu-mips64el-static---
  argv_1/bin/tar---
  argv_2NULL---

  Next step we modified the flags=P in the corresponding config under folder 
/proc/sys/fs/binfmt_misc, then binfmt_misc sent argv[0] to qemu.
  But chroot could not start bash because current qemu does not account 
for this unexpected additional "argv[0]"

  
  After modified qemu code temporary to handle the new argv list we got the 
input args, and from argv[2] is the original input command
  argv_0/usr/bin/qemu-mips64el-static---
  argv_1/bin/tar---
  argv_2tar---

  We need the original input from command line, so is it possible that let 
binfmt_misc to pass one more additional args or env to qemu as a token of the 
binfmt_misc flag, then qemu can judge how to parse the input args by it?
  looking forward your suggestions.

  Thanks
  luyou

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1818483/+subscriptions

Re: [Qemu-devel] [PULL for-4.0 0/4] s390x gcc 9 warning fixes

2019-04-03 Thread Peter Maydell

On Wed, 3 Apr 2019 at 17:05, Cornelia Huck  wrote:
>
> The following changes since commit 061b51e9195670e9d190cdec46fabcb3c77763fb:
>
>   Update version for v4.0.0-rc2 release (2019-04-02 17:01:20 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/cohuck/qemu tags/s390x-20190403
>
> for you to fetch changes up to 7357b2215978debf2fd17b525ba745d3c69272a3:
>
>   hw/s390x/3270-ccw: avoid taking address of fields in packed struct 
> (2019-04-03 11:19:57 +0200)
>
> 
> Fix taking address of fields in packed structs warnings
> by gcc 9
>
> 

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.0
for any user-visible changes.

-- PMM

Re: [Qemu-devel] VSOCK benchmark and optimizations

2019-04-03 Thread Stefano Garzarella

On Wed, Apr 03, 2019 at 01:34:38PM +0100, Stefan Hajnoczi wrote:
> On Mon, Apr 01, 2019 at 06:32:40PM +0200, Stefano Garzarella wrote:
> > Hi Alex,
> > I'm sending you some benchmarks and information about VSOCK CCing qemu-devel
> > and linux-netdev (maybe this info could be useful for others :))
> > 
> > One of the VSOCK advantages is the simple configuration: you don't need to 
> > set
> > up IP addresses for guest/host, and it can be used with the standard POSIX
> > socket API. [1]
> > 
> > I'm currently working on it, so the "optimized" values are still work in
> > progress and I'll send the patches upstream (Linux) as soon as possible.
> > (I hope in 1 or 2 weeks)
> > 
> > Optimizations:
> > + reducing the number of credit update packets
> >   - RX side sent, on every packet received, an empty packet only to inform 
> > the
> > TX side about the space in the RX buffer.
> > + increase RX buffers size to 64 KB (from 4 KB)
> > + merge packets to fill RX buffers
> > 
> > As benchmark tool I used iperf3 [2] modified with VSOCK support:
> > 
> >             host -> guest [Gbps]      guest -> host [Gbps]
> > pkt_size  before opt.   optimized   before opt.   optimized
> >   1K              0.5         1.6           1.4         1.4
> 
> This is a "large" small packet size.  I think 64 bytes is a common
> "small" packet size and is worth benchmarking too.
> 

Okay, I'll add more small packet sizes for the benchmark.

> >   2K              1.1         3.1           2.3         2.5
> >   4K              2.0         5.6           4.2         4.4
> >   8K              3.2        10.2           7.2         7.5
> >   16K             6.4        14.2           9.4        11.3
> >   32K             9.8        18.9           9.2        17.8
> >   64K            13.8        22.9           8.8        25.0
> >   128K           17.6        24.5           7.7        25.7
> >   256K           19.0        24.8           8.1        25.6
> >   512K           20.8        25.1           8.1        25.4
> 
> Nice improvements!

Thanks :)

I'm cleaning up the patches and running step-by-step benchmarks, and I hope
to send the series upstream in the next few days.

Stefano

Re: [Qemu-devel] [PATCH] configure: Relax check for libseccomp

2019-04-03 Thread Peter Maydell

On Wed, 3 Apr 2019 at 19:51, Helge Deller  wrote:

[cc'ing Eduardo as the seccomp submaintainer]

> On a non-release architecture, the configure program aborts if the
> --enable-seccomp flag was given (with no way to work around it on the
> command line):
>
> ERROR: User requested feature libseccomp
> configure was not able to find it.
> libseccomp is not supported for host cpu parisc64

Surely the workaround is "don't pass --enable-seccomp on
the configure command line" ?

Our general approach with configure arguments is:
 --disable-foo means "don't try to look for or use foo"
 --enable-foo means "use foo, and stop with an error if we can't use
 foo for any reason (eg not found, version too old)"
passing nothing means "look for foo, use it if we can,
 but if we can't then just silently don't use foo"

So I think if the user specifically asks us to use seccomp on a
host architecture where it won't work then configure should fail.
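
The three-way policy above can be written down as a small decision function. This is only an illustrative sketch in C (configure itself is a shell script); the enum and function names are made up for the example.

```c
#include <assert.h>

/* What the user asked for on the command line. */
enum feature_request {
    FEATURE_AUTO,       /* nothing passed */
    FEATURE_ENABLED,    /* --enable-foo */
    FEATURE_DISABLED    /* --disable-foo */
};

/* What the build ends up doing. */
enum feature_result {
    FEATURE_OFF,        /* build without foo */
    FEATURE_ON,         /* build with foo */
    FEATURE_ERROR       /* stop with an error */
};

/*
 * `usable` is nonzero when the probe succeeded (library found, version new
 * enough, host architecture supported, ...).
 */
static enum feature_result resolve_feature(enum feature_request req, int usable)
{
    switch (req) {
    case FEATURE_DISABLED:
        return FEATURE_OFF;                         /* never even look */
    case FEATURE_ENABLED:
        return usable ? FEATURE_ON : FEATURE_ERROR; /* explicit request must hold */
    case FEATURE_AUTO:
    default:
        return usable ? FEATURE_ON : FEATURE_OFF;   /* silently fall back */
    }
}
```

Under this model the reported failure is `resolve_feature(FEATURE_ENABLED, 0)`, which is an error by design; dropping the flag turns it into `resolve_feature(FEATURE_AUTO, 0)`, which just builds without the feature.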

Is the underlying problem here:
 * we use a whitelist of host architectures to enable seccomp for
   and we should not do that (eg blacklist instead, or just allow it
   for any host architecture)?
 * using a whitelist is ok, but we should add some more host archs to it?
 * something else?

What particular host arch are you using?

thanks
-- PMM

Re: [Qemu-devel] [PATCH] configure: Relax check for libseccomp

2019-04-03 Thread Peter Maydell

On Wed, 3 Apr 2019 at 22:16, Peter Maydell  wrote:
>
> On Wed, 3 Apr 2019 at 19:51, Helge Deller  wrote:
>
> [cc'ing Eduardo as the seccomp submaintainer]
>
> > On a non-release architecture, the configure program aborts if the
> > --enable-seccomp flag was given (with no way to work around it on the
> > command line):
> >
> > ERROR: User requested feature libseccomp
> > configure was not able to find it.
> > libseccomp is not supported for host cpu parisc64

> What particular host arch are you using?

Doh, just noticed that the error message has the answer to this!

thanks
- PMM

Re: [Qemu-devel] [PATCH v2] migration: avoid filling ignore-shared ramblock when in incoming migration

2019-04-03 Thread Catherine Ho

Hi Peter Xu

On Wed, 3 Apr 2019 at 10:25, Peter Xu  wrote:

> On Tue, Apr 02, 2019 at 11:30:01AM -0400, Catherine Ho wrote:
> > Commit 18269069c310 ("migration: Introduce ignore-shared capability")
> > adds the ignore-shared capability to bypass the shared ramblock (e.g.,
> > membackend + numa node). It does good to live migration.
> >
> > This commit expects that QEMU doesn't write to guest RAM until
> > VM starts, but it does on aarch64 qemu:
> > Backtrace:
> > 1  0x55f4a296dd84 in address_space_write_rom_internal () at
> exec.c:3458
> > 2  0x55f4a296de3a in address_space_write_rom () at exec.c:3479
> > 3  0x55f4a2d519ff in rom_reset () at hw/core/loader.c:1101
> > 4  0x55f4a2d475ec in qemu_devices_reset () at hw/core/reset.c:69
> > 5  0x55f4a2c90a28 in qemu_system_reset () at vl.c:1675
> > 6  0x55f4a2c9851d in main () at vl.c:4552
> >
> > Actually, on arm64 virt machine, ramblock "dtb" will be filled into ram
> > during rom_reset. In ignore-shared incoming case, this rom filling
> > is not required since all the data has been stored in memory backend
> file.
> >
> > Fixes: commit 18269069c310 ("migration: Introduce ignore-shared
> capability")
> > Signed-off-by: Catherine Ho 
> > Suggested-by: Yury Kotov 
>
> (note: IIUC normally you should have your signed-off to be the last
>  line before the suggested-by :)
>
> About the patch content, I have had a question on whether we should
> need to check ignore-shared at all... That question lies in:
>
> https://patchwork.kernel.org/patch/10859889/#22546487
>
> And if my understanding was correct above, IMHO the patch could be as
> simply be as "if (runstate_check(RUN_STATE_INMIGRATE)) return;" at [1]
> below.
>
>
Thanks, but I thought this method would break the x86 rom_reset logic during
RUN_STATE_INMIGRATE.
Please see the debugging patch and log lines below:
diff --git a/hw/core/loader.c b/hw/core/loader.c
index fe5cb24122..b0c871af26 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -1086,8 +1086,9 @@ int rom_add_option(const char *file, int32_t
bootindex)
 static void rom_reset(void *unused)
 {
 Rom *rom;
-
 QTAILQ_FOREACH(rom, &roms, next) {
+if (runstate_check(RUN_STATE_INMIGRATE))
+   printf("rom name=%s\n",rom->name);
 if (rom->fw_file) {
 continue;
 }

rom name=kvmvapic.bin
rom name=linuxboot_dma.bin
rom name=bios-256k.bin
rom name=etc/acpi/tables
rom name=etc/table-loader
rom name=etc/acpi/rsdp

B.R.
Catherine

> Thanks,
>
> > ---
> >  hw/core/loader.c  | 15 +++
> >  include/exec/cpu-common.h |  1 +
> >  migration/ram.c   |  2 +-
> >  3 files changed, 17 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/core/loader.c b/hw/core/loader.c
> > index fe5cb24122..861a03335b 100644
> > --- a/hw/core/loader.c
> > +++ b/hw/core/loader.c
> > @@ -53,6 +53,7 @@
> >  #include "hw/nvram/fw_cfg.h"
> >  #include "exec/memory.h"
> >  #include "exec/address-spaces.h"
> > +#include "exec/cpu-common.h"
> >  #include "hw/boards.h"
> >  #include "qemu/cutils.h"
> >
> > @@ -1086,6 +1087,9 @@ int rom_add_option(const char *file, int32_t
> bootindex)
> >  static void rom_reset(void *unused)
> >  {
> >  Rom *rom;
> > +MemoryRegion *mr;
> > +hwaddr hw_addr;
> > +hwaddr l;
>
> [1]
>
> >
> >  QTAILQ_FOREACH(rom, &roms, next) {
> >  if (rom->fw_file) {
> > @@ -1094,6 +1098,17 @@ static void rom_reset(void *unused)
> >  if (rom->data == NULL) {
> >  continue;
> >  }
> > +
> > +/* bypass the rom blob in ignore-shared migration case*/
> > +if (runstate_check(RUN_STATE_INMIGRATE)) {
> > +rcu_read_lock();
> > +mr = address_space_translate(rom->as, rom->addr, &hw_addr,
> &l,
> > + true, MEMTXATTRS_UNSPECIFIED);
> > +rcu_read_unlock();
> > +if (mr->ram_block != NULL &&
> ramblock_is_ignored(mr->ram_block))
> > +continue;
> > +}
> > +
> >  if (rom->mr) {
> >  void *host = memory_region_get_ram_ptr(rom->mr);
> >  memcpy(host, rom->data, rom->datasize);
> > diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
> > index cef8b88a2a..c80b7248a6 100644
> > --- a/include/exec/cpu-common.h
> > +++ b/include/exec/cpu-common.h
> > @@ -76,6 +76,7 @@ void *qemu_ram_get_host_addr(RAMBlock *rb);
> >  ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
> >  ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
> >  bool qemu_ram_is_shared(RAMBlock *rb);
> > +bool ramblock_is_ignored(RAMBlock *block);
> >  bool qemu_ram_is_uf_zeroable(RAMBlock *rb);
> >  void qemu_ram_set_uf_zeroable(RAMBlock *rb);
> >  bool qemu_ram_is_migratable(RAMBlock *rb);
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 35bd6213e9..d6de9d335d 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -159,7 +159,7 @@ out:
> >  return ret;
> >  }
> >
> > -static bool ramblock_is_ign
