date:20201014

Re: [RFC PATCH v6 1/2] hw/misc/sifive_u_otp: Add write function and write-once protection

2020-10-14 Thread Green Wan

On Wed, Oct 14, 2020 at 1:37 PM Bin Meng  wrote:
>
> Hi Green,
>
> On Mon, Sep 28, 2020 at 6:12 PM Green Wan  wrote:
> >
> >  - Add write operation to update fuse data bit when PWE bit is on.
> >  - Add array, fuse_wo, to store the 'written' status for all bits
> >of OTP to block the write operation.
> >
> > Signed-off-by: Green Wan 
> > Reviewed-by: Alistair Francis 
> > ---
> >  hw/misc/sifive_u_otp.c | 30 +-
> >  include/hw/misc/sifive_u_otp.h |  3 +++
> >  2 files changed, 32 insertions(+), 1 deletion(-)
> >
>
> I am not sure how you tested this. I wrote a simple U-Boot command to
> call U-Boot sifive-otp driver to test the write functionality, but it
> failed.
>
> => misc write otp@1007 0 8020 10
  ^
Quick ask, how about 'md 8020'?

I didn't use 'misc write' command. I can check afterward.

> => misc read  otp@1007 0 8040 10
> => md 8040
> 8040:    
> 80400010:    
> 80400020:    
> 80400030:    
> 80400040:    
> 80400050:    
> 80400060:    
> 80400070:    
> 80400080:    
> 80400090:    
> 804000a0:    
> 804000b0:    
> 804000c0:    
> 804000d0:    
> 804000e0:    
> 804000f0:    
> => misc write otp@1007 0 80200010 10
> => misc read  otp@1007 0 80400010 10
> => md 8040
> 8040:    
> 80400010:    
> 80400020:    
> 80400030:    
> 80400040:    
> 80400050:    
> 80400060:    
> 80400070:    
> 80400080:    
> 80400090:    
> 804000a0:    
> 804000b0:    
> 804000c0:    
> 804000d0:    
> 804000e0:    
> 804000f0:    
>
> But it can read the serial number at offset 0x3f0
>
> => misc read  otp@1007 3f0 80400010 10
> => md 8040
> 8040:    
> 80400010: 0001 fffe  
> 80400020:    
> 80400030:    
> 80400040:    
> 80400050:    
> 80400060:    
> 80400070:    
> 80400080:    
> 80400090:    
> 804000a0:    
> 804000b0:    
> 804000c0:    
> 804000d0:    
> 804000e0:    
> 804000f0:    
>
> Regards,
> Bin

Re: [PATCH] vhost-user: add separate memslot counter for vhost-user

2020-10-14 Thread Michael S. Tsirkin

On Tue, Oct 13, 2020 at 08:58:59PM -0400, Raphael Norwitz wrote:
> On Tue, Oct 6, 2020 at 5:48 AM Igor Mammedov  wrote:
> >
> > On Mon, 28 Sep 2020 21:17:31 +0800
> > Jiajun Chen  wrote:
> >
> > > Used_memslots is equal to dev->mem->nregions now, it is true for
> > > vhost kernel, but not for vhost user, which uses the memory regions
> > > that have file descriptor. In fact, not all of the memory regions
> > > have file descriptor.
> > > It is usefully in some scenarios, e.g. used_memslots is 8, and only
> > > 5 memory slots can be used by vhost user, it is failed to hot plug
> > > a new memory RAM because vhost_has_free_slot just returned false,
> > > but we can hot plug it safely in fact.
> >
> > I had an impression that all guest RAM has to be shared with vhost,
> > so combination of anon and fd based RAM couldn't work.
> > Am I wrong?
> 
> I'm not sure about the kernel backend, but I've tested adding anon
> memory to a VM with a vhost-user-scsi device and it works (eventually
> the VM crashed, but I could see the guest recognized the anon RAM).
> The vhost-user code is designed to work with both. I'm not sure I see
> a use case, but if there is one, this would be a valid issue. Maybe
> Jiajun or Jianjay can elaborate.

Hmm does not vhost-user skip all regions that do not have an fd?


mr = vhost_user_get_mr_data(reg->userspace_addr, &offset, &fd);
if (fd > 0) {
if (track_ramblocks) {
assert(*fd_num < VHOST_MEMORY_BASELINE_NREGIONS);
trace_vhost_user_set_mem_table_withfd(*fd_num, mr->name,
  reg->memory_size,
  reg->guest_phys_addr,
  reg->userspace_addr,
  offset);
u->region_rb_offset[i] = offset;
u->region_rb[i] = mr->ram_block;
} else if (*fd_num == VHOST_MEMORY_BASELINE_NREGIONS) {
error_report("Failed preparing vhost-user memory table msg");
return -1;
}
vhost_user_fill_msg_region(®ion_buffer, reg, offset);
msg->payload.memory.regions[*fd_num] = region_buffer;
fds[(*fd_num)++] = fd;
} else if (track_ramblocks) {
u->region_rb_offset[i] = 0;
u->region_rb[i] = NULL;
}



In your test, is it possible that you were lucky and guest did not send
any data from anon memory to the device?



> >
> > >
> > > --
> > > ChangeList:
> > > v3:
> > > -make used_memslots a member of struct vhost_dev instead of a global 
> > > static value
> > it's global resource, so why?
> 
> I suggested it because I thought it made the code a little cleaner.
> I'm not opposed to changing it back, or having it stored at the
> vhost_user level.

Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property

2020-10-14 Thread Richard Purdie

On Wed, 2020-10-14 at 01:36 +, Victor Kamensky (kamensky) wrote:
> Thank you very much for looking at this. I gave a spin to
> your 3 patch series in original setup, and as expected with
> '-cpu 34Kf,tlb-entries=64' option it works great.
> 
> If nobody objects, and your patches could be merged, we
> would greatly appreciate it.

Speaking as one of the Yocto Project maintainers, this is really
helpful for us, thanks!

qemumips is one of our slowest platforms for automated testing so this
performance improvement helps a lot.

Cheers,

Richard

Re: [RFC PATCH v6 1/2] hw/misc/sifive_u_otp: Add write function and write-once protection

2020-10-14 Thread Bin Meng

Hi Green,

On Wed, Oct 14, 2020 at 3:02 PM Green Wan  wrote:
>
> On Wed, Oct 14, 2020 at 1:37 PM Bin Meng  wrote:
> >
> > Hi Green,
> >
> > On Mon, Sep 28, 2020 at 6:12 PM Green Wan  wrote:
> > >
> > >  - Add write operation to update fuse data bit when PWE bit is on.
> > >  - Add array, fuse_wo, to store the 'written' status for all bits
> > >of OTP to block the write operation.
> > >
> > > Signed-off-by: Green Wan 
> > > Reviewed-by: Alistair Francis 
> > > ---
> > >  hw/misc/sifive_u_otp.c | 30 +-
> > >  include/hw/misc/sifive_u_otp.h |  3 +++
> > >  2 files changed, 32 insertions(+), 1 deletion(-)
> > >
> >
> > I am not sure how you tested this. I wrote a simple U-Boot command to
> > call U-Boot sifive-otp driver to test the write functionality, but it
> > failed.
> >
> > => misc write otp@1007 0 8020 10
>   ^
> Quick ask, how about 'md 8020'?
>
> I didn't use 'misc write' command. I can check afterward.

Note 'misc write' is a new U-Boot command I just added for testing
this QEMU functionality. Please use the U-Boot patch below:
http://patchwork.ozlabs.org/project/uboot/patch/1602657292-82815-1-git-send-email-bmeng...@gmail.com/

>
> > => misc read  otp@1007 0 8040 10
> > => md 8040
> > 8040:    
> > 80400010:    
> > 80400020:    
> > 80400030:    
> > 80400040:    
> > 80400050:    
> > 80400060:    
> > 80400070:    
> > 80400080:    
> > 80400090:    
> > 804000a0:    
> > 804000b0:    
> > 804000c0:    
> > 804000d0:    
> > 804000e0:    
> > 804000f0:    
> > => misc write otp@1007 0 80200010 10
> > => misc read  otp@1007 0 80400010 10
> > => md 8040
> > 8040:    
> > 80400010:    
> > 80400020:    
> > 80400030:    
> > 80400040:    
> > 80400050:    
> > 80400060:    
> > 80400070:    
> > 80400080:    
> > 80400090:    
> > 804000a0:    
> > 804000b0:    
> > 804000c0:    
> > 804000d0:    
> > 804000e0:    
> > 804000f0:    
> >
> > But it can read the serial number at offset 0x3f0
> >
> > => misc read  otp@1007 3f0 80400010 10
> > => md 8040
> > 8040:    
> > 80400010: 0001 fffe  
> > 80400020:    
> > 80400030:    
> > 80400040:    
> > 80400050:    
> > 80400060:    
> > 80400070:    
> > 80400080:    
> > 80400090:    
> > 804000a0:    
> > 804000b0:    
> > 804000c0:    
> > 804000d0:    
> > 804000e0:    
> > 804000f0:    

Regards,
Bin

Re: [RFC PATCH 0/3] target/mips: Make the number of TLB entries a CPU property

2020-10-14 Thread Richard Purdie

On Tue, 2020-10-13 at 19:22 -0700, Richard Henderson wrote:
> On 10/13/20 4:11 PM, Richard Henderson wrote:
> > On 10/13/20 6:25 AM, Philippe Mathieu-Daudé wrote:
> > > Yocto developers have expressed interest in running MIPS32
> > > CPU with custom number of TLB:
> > > https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html
> > > 
> > > Help them by making the number of TLB entries a CPU property,
> > > keeping our set of CPU definitions in sync with real hardware.
> > 
> > You mean keeping the 34kf model within qemu in sync, rather than
> > creating a
> > nonsense model that doesn't exist.
> > 
> > Question: is this cpu parameter useful for anything else?
> > 
> > Because the ideal solution for a CI loop is to use one of the mips
> > cpu models
> > that has the hw page table walker (CP0C3_PW).  Having qemu being
> > able to refill
> > the tlb itself is massively faster.
> > 
> > We do not currently implement a mips cpu that has the PW.  When I
> > downloaded
> 
> Bah, "mips32 cpu".
> 
> We do have the P5600 that does has it, though the code is wrapped up
> in TARGET_MIPS64.  I'll also note that the code could be better
> placed [*]
> 
> > (1) anyone know if the PW incompatible with mips32?
> 
> I've since found a copy of the mips32-pra in the wayback machine and
> have answered this as "no" -- PW is documented for mips32.
> 
> > (2) if not, was there any mips32 hw built with PW
> > that we could model?
> 
> But I still don't know about this.
> 
> A further question for the Yocto folks: could you make use of a 64-
> bit kernel in testing a 32-bit userspace?

We run testing of 64 bit kernel with 64 bit userspace and 32 bit kernel
with 32 bit userspace, we've tested that for years. I know some of our
users do use 64 bit kernels with 32 bit userspace and we do limited
testing of that for multilib support.

I think we did try testing an R2 setup but found little performance
change and I think it may have been unreliable so we didn't make the
switch. We did move to 34kf relatively recently as that did perform
marginally better and matched qemu's recommendations.

We've also run into a lot of problems with 32 bit mips in general if we
go over 256MB memory since that seems to trigger highmem and hangs
regularly for us. We're working on infrastructure to save out those
hung VMs to help us debug such issues but don't have that yet. Its not
regular enough and we don't have the expertise to debug it properly as
yet unfortunately.

There is a question of how valid a 32 bit kernel is these days,
particularly given the memory issues we seem to run into there with
larger images.

Cheers,

Richard

[PATCH 03/10] Reduce the time of checkpoint for COLO

2020-10-14 Thread Zhang Chen

From: "Rao, Lei" 

we should set ram_bulk_stage to false after ram_state_init,
otherwise the bitmap will be unused in migration_bitmap_find_dirty.
all pages in ram cache will be flushed to the ram of secondary guest
for each checkpoint.

Signed-off-by: leirao 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
Reviewed-by: Zhang Chen 
---
 migration/ram.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 433489d633..9cfac3d9ba 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3009,6 +3009,18 @@ static void decompress_data_with_multi_threads(QEMUFile 
*f,
 qemu_mutex_unlock(&decomp_done_lock);
 }
 
+ /*
+  * we must set ram_bulk_stage to false, otherwise in
+  * migation_bitmap_find_dirty the bitmap will be unused and
+  * all the pages in ram cache wil be flushed to the ram of
+  * secondary VM.
+  */
+static void colo_init_ram_state(void)
+{
+ram_state_init(&ram_state);
+ram_state->ram_bulk_stage = false;
+}
+
 /*
  * colo cache: this is for secondary VM, we cache the whole
  * memory of the secondary VM, it is need to hold the global lock
@@ -3052,7 +3064,7 @@ int colo_init_ram_cache(void)
 }
 }
 
-ram_state_init(&ram_state);
+colo_init_ram_state();
 return 0;
 }
 
-- 
2.17.1

[PATCH 01/10] net/filter-rewriter: destroy g_hash_table in colo_rewriter_cleanup

2020-10-14 Thread Zhang Chen

From: Pan Nengyuan 

s->connection_track_table forgot to destroy in colo_rewriter_cleanup. Fix it.

Reported-by: Euler Robot 
Signed-off-by: Pan Nengyuan 
Signed-off-by: Zhang Chen 
Reviewed-by: Zhang Chen 
Reviewed-by: Li Qiang 
---
 net/filter-rewriter.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index dc3c27a489..e063a818b7 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -381,6 +381,8 @@ static void colo_rewriter_cleanup(NetFilterState *nf)
 filter_rewriter_flush(nf);
 g_free(s->incoming_queue);
 }
+
+g_hash_table_destroy(s->connection_track_table);
 }
 
 static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
-- 
2.17.1

[PATCH 00/10] COLO project queued patches 20-Oct

2020-10-14 Thread Zhang Chen

From: Zhang Chen 

Hi Jason, this series include latest COLO related patches.
please check and merge it.

Thanks
Zhang Chen

Li Zhijian (2):
  colo-compare: fix missing compare_seq initialization
  colo-compare: check mark in mutual exclusion

Pan Nengyuan (1):
  net/filter-rewriter: destroy g_hash_table in colo_rewriter_cleanup

Rao, Lei (3):
  Optimize seq_sorter function for colo-compare
  Reduce the time of checkpoint for COLO
  Fix the qemu crash when guest shutdown in COLO mode

Zhang Chen (4):
  net/colo-compare.c: Fix compare_timeout format issue
  net/colo-compare.c: Change the timer clock type
  net/colo-compare.c: Add secondary old packet detection
  net/colo-compare.c: Increase default queued packet scan frequency

 migration/ram.c   | 14 ++-
 net/colo-compare.c| 57 ++-
 net/colo.c|  5 +---
 net/filter-rewriter.c |  2 ++
 softmmu/vl.c  |  1 +
 5 files changed, 46 insertions(+), 33 deletions(-)

-- 
2.17.1

[PATCH 05/10] colo-compare: fix missing compare_seq initialization

2020-10-14 Thread Zhang Chen

From: Li Zhijian 

Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
Reviewed-by: Zhang Chen 
---
 net/colo.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index a6c66d829a..ef00609848 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -133,14 +133,11 @@ void reverse_connection_key(ConnectionKey *key)
 
 Connection *connection_new(ConnectionKey *key)
 {
-Connection *conn = g_slice_new(Connection);
+Connection *conn = g_slice_new0(Connection);
 
 conn->ip_proto = key->ip_proto;
 conn->processing = false;
-conn->offset = 0;
 conn->tcp_state = TCPS_CLOSED;
-conn->pack = 0;
-conn->sack = 0;
 g_queue_init(&conn->primary_list);
 g_queue_init(&conn->secondary_list);
 
-- 
2.17.1

[PATCH 04/10] Fix the qemu crash when guest shutdown in COLO mode

2020-10-14 Thread Zhang Chen

From: "Rao, Lei" 

In COLO mode, if the startup parameters of QEMU include "no-shutdown",
QEMU will crash when the guest shutdown. The root cause is when the
guest shutdown, the state of VM will switch COLO to SHUTDOWN. When do
checkpoint again, the state will be changed to COLO. But the state
switch is undefined in runstate_transitions_def, we should add it.
This patch fixes the following:
qemu-system-x86_64: invalid runstate transition: 'shutdown' -> 'colo'
Aborted

Signed-off-by: leirao 
Signed-off-by: Zhang Chen 
Reviewed-by: Zhang Chen 
---
 softmmu/vl.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5a11a62f78..bafe7c5b70 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -632,6 +632,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_SHUTDOWN, RUN_STATE_PAUSED },
 { RUN_STATE_SHUTDOWN, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_SHUTDOWN, RUN_STATE_PRELAUNCH },
+{ RUN_STATE_SHUTDOWN, RUN_STATE_COLO },
 
 { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED },
 { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED },
-- 
2.17.1

[PATCH 02/10] Optimize seq_sorter function for colo-compare

2020-10-14 Thread Zhang Chen

From: "Rao, Lei" 

The seq of tcp has been filled in fill_pkt_tcp_info, it
can be used directly here.

Signed-off-by: leirao 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
Reviewed-by: Zhang Chen 
---
 net/colo-compare.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 3a45d64175..86980cef5e 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -196,11 +196,7 @@ static void colo_compare_inconsistency_notify(CompareState 
*s)
 
 static gint seq_sorter(Packet *a, Packet *b, gpointer data)
 {
-struct tcp_hdr *atcp, *btcp;
-
-atcp = (struct tcp_hdr *)(a->transport_header);
-btcp = (struct tcp_hdr *)(b->transport_header);
-return ntohl(atcp->th_seq) - ntohl(btcp->th_seq);
+return a->tcp_seq - b->tcp_seq;
 }
 
 static void fill_pkt_tcp_info(void *data, uint32_t *max_ack)
-- 
2.17.1

[PATCH 07/10] net/colo-compare.c: Fix compare_timeout format issue

2020-10-14 Thread Zhang Chen

From: Zhang Chen 

This parameter need compare with the return of qemu_clock_get_ms(),
it is uinit64_t. So we need fix this issue here.

Reported-by: Derek Su 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
---
 net/colo-compare.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 24c366eec0..f4814c5f09 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -120,7 +120,7 @@ struct CompareState {
 SendCo out_sendco;
 SendCo notify_sendco;
 bool vnet_hdr;
-uint32_t compare_timeout;
+uint64_t compare_timeout;
 uint32_t expired_scan_cycle;
 
 /*
@@ -1075,9 +1075,9 @@ static void compare_get_timeout(Object *obj, Visitor *v,
 Error **errp)
 {
 CompareState *s = COLO_COMPARE(obj);
-uint32_t value = s->compare_timeout;
+uint64_t value = s->compare_timeout;
 
-visit_type_uint32(v, name, &value, errp);
+visit_type_uint64(v, name, &value, errp);
 }
 
 static void compare_set_timeout(Object *obj, Visitor *v,
@@ -1140,9 +1140,9 @@ static void set_max_queue_size(Object *obj, Visitor *v,
Error **errp)
 {
 Error *local_err = NULL;
-uint32_t value;
+uint64_t value;
 
-visit_type_uint32(v, name, &value, &local_err);
+visit_type_uint64(v, name, &value, &local_err);
 if (local_err) {
 goto out;
 }
@@ -1390,7 +1390,7 @@ static void colo_compare_init(Object *obj)
 object_property_add_str(obj, "notify_dev",
 compare_get_notify_dev, compare_set_notify_dev);
 
-object_property_add(obj, "compare_timeout", "uint32",
+object_property_add(obj, "compare_timeout", "uint64",
 compare_get_timeout,
 compare_set_timeout, NULL, NULL);
 
-- 
2.17.1

[PATCH 09/10] net/colo-compare.c: Add secondary old packet detection

2020-10-14 Thread Zhang Chen

From: Zhang Chen 

Detect queued secondary packet to sync VM state in time.

Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
---
 net/colo-compare.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 61c95fe7e9..c22f2af941 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -635,19 +635,26 @@ void colo_compare_unregister_notifier(Notifier *notify)
 static int colo_old_packet_check_one_conn(Connection *conn,
   CompareState *s)
 {
-GList *result = NULL;
-
-result = g_queue_find_custom(&conn->primary_list,
- &s->compare_timeout,
- (GCompareFunc)colo_old_packet_check_one);
+if (!g_queue_is_empty(&conn->primary_list)) {
+if (g_queue_find_custom(&conn->primary_list,
+&s->compare_timeout,
+(GCompareFunc)colo_old_packet_check_one))
+goto out;
+}
 
-if (result) {
-/* Do checkpoint will flush old packet */
-colo_compare_inconsistency_notify(s);
-return 0;
+if (!g_queue_is_empty(&conn->secondary_list)) {
+if (g_queue_find_custom(&conn->secondary_list,
+&s->compare_timeout,
+(GCompareFunc)colo_old_packet_check_one))
+goto out;
 }
 
 return 1;
+
+out:
+/* Do checkpoint will flush old packet */
+colo_compare_inconsistency_notify(s);
+return 0;
 }
 
 /*
-- 
2.17.1

Re: [PATCH v1 1/2] fuzz: add virtio-blk fuzz target

2020-10-14 Thread Dima Stepanov

On Tue, Oct 13, 2020 at 11:30:52AM -0400, Alexander Bulekov wrote:
> On 201007 1647, Dima Stepanov wrote:
> > The virtio-blk fuzz target sets up and fuzzes the available virtio-blk
> > queues. The implementation is based on two files:
> >   - tests/qtest/fuzz/virtio_scsi_fuzz.c
> >   - tests/qtest/virtio_blk_test.c
> > 
> > Signed-off-by: Dima Stepanov 
> > ---
> >  tests/qtest/fuzz/meson.build   |   1 +
> >  tests/qtest/fuzz/virtio_blk_fuzz.c | 234 
> > +
> >  2 files changed, 235 insertions(+)
> >  create mode 100644 tests/qtest/fuzz/virtio_blk_fuzz.c
> > 
> > diff --git a/tests/qtest/fuzz/meson.build b/tests/qtest/fuzz/meson.build
> > index b31ace7..3b923dc 100644
> > --- a/tests/qtest/fuzz/meson.build
> > +++ b/tests/qtest/fuzz/meson.build
> > @@ -5,6 +5,7 @@ specific_fuzz_ss.add(files('fuzz.c', 'fork_fuzz.c', 
> > 'qos_fuzz.c',
> >  specific_fuzz_ss.add(when: 'CONFIG_I440FX', if_true: 
> > files('i440fx_fuzz.c'))
> >  specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: 
> > files('virtio_net_fuzz.c'))
> >  specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_SCSI', if_true: 
> > files('virtio_scsi_fuzz.c'))
> > +specific_fuzz_ss.add(files('virtio_blk_fuzz.c'))
> 
> Hi Dima,
> For consistency, maybe
> specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: 
> files('virtio_blk_fuzz.c'))
Good point, will update it.

> 
...
> > +
> > +static char *drive_create(void)
> > +{
> > +int fd, ret;
> > +char *t_path = g_strdup("/tmp/qtest.XX");
> > +
> > +/* Create a temporary raw image */
> > +fd = mkstemp(t_path);
> > +g_assert_cmpint(fd, >=, 0);
> > +ret = ftruncate(fd, TEST_IMAGE_SIZE);
> > +g_assert_cmpint(ret, ==, 0);
> > +close(fd);
> > +
> > +g_test_queue_destroy(drive_destroy, t_path);
> > +return t_path;
> > +}
> > +
> 
> I tested this out and it works with multi-process fuzzing under -jobs=4
> -workers=4 (this initialization happens after libfuzzer has already
> forked the processes). This seems like an interesting alternative to
> using fake null-co:// files. 
> I wonder if some state might leak as these disks are filled with fuzzer
> data.
Yes, i've also chosen between the fake null device and temporary file.
Tried this approach, just to see what will happen ). It seems to me that
slightly different paths can be triggered in this case and it is closer
to real usage.
But indeed, mb some state can leak, this is interesting.

> 
> Nit: these disk files remain after the fuzzer exists. It looks
> like the libfuzzer people suggest simply using atexit() to perform
> cleanup: https://reviews.llvm.org/D45762
> The is that the only way I have found to terminate the fuzzer is with
> SIGKILL, where atexit is skipped. QEMU installs some signal handlers in
> os-posix.c:os_setup_signal_handling to notify the main_loop that the
> qemu was killed. Since we replace qemu_main_loop by manually running
> main_loop_wait, we don't check main_loop_should_exit().
Got it! Thanks for sharing this is good to know ).

No other comments mixed in below.

Dima.
> 
> I sent a patch to disable QEMU's signal handlers for the fuzzer.
> Message-Id: <20201013152920.448335-1-alx...@bu.edu>
> 
> With an atexit() call to clean up the temporary images:
> Reviewed-by: Alexander Bulekov 
> 
> > +static void *virtio_blk_test_setup(GString *cmd_line, void *arg)
> > +{
> > +char *tmp_path = drive_create();
> > +
> > +g_string_append_printf(cmd_line,
> > +   " -drive if=none,id=drive0,file=%s,"
> > +   "format=raw,auto-read-only=off ",
> > +   tmp_path);
> > +
> > +return arg;
> > +}
> > +
> > +static void register_virtio_blk_fuzz_targets(void)
> > +{
> > +fuzz_add_qos_target(&(FuzzTarget){
> > +.name = "virtio-blk-fuzz",
> > +.description = "Fuzz the virtio-blk virtual queues, 
> > forking "
> > +"for each fuzz run",
> > +.pre_vm_init = &counter_shm_init,
> > +.pre_fuzz = &virtio_blk_pre_fuzz,
> > +.fuzz = virtio_blk_fork_fuzz,},
> > +"virtio-blk",
> > +&(QOSGraphTestOptions){.before = virtio_blk_test_setup}
> > +);
> > +
> > +fuzz_add_qos_target(&(FuzzTarget){
> > +.name = "virtio-blk-flags-fuzz",
> > +.description = "Fuzz the virtio-blk virtual queues, 
> > forking "
> > +"for each fuzz run (also fuzzes the virtio flags)",
> > +.pre_vm_init = &counter_shm_init,
> > +.pre_fuzz = &virtio_blk_pre_fuzz,
> > +.fuzz = virtio_blk_with_flag_fuzz,},
> > +"virtio-blk",
> > +&(QOSGraphTestOptions){.before = virtio_blk_test_setup}
> > +);
> > +}
> > +
> > +fuzz_target_init(register_virtio_blk_fuzz_targets);
> > -- 
> > 2.7.4
> >

[PATCH 06/10] colo-compare: check mark in mutual exclusion

2020-10-14 Thread Zhang Chen

From: Li Zhijian 

Signed-off-by: Li Zhijian 
Signed-off-by: Zhang Chen 
Reviewed-by: Zhang Chen 
---
 net/colo-compare.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 86980cef5e..24c366eec0 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -476,13 +476,11 @@ sec:
 colo_release_primary_pkt(s, ppkt);
 g_queue_push_head(&conn->secondary_list, spkt);
 goto pri;
-}
-if (mark == COLO_COMPARE_FREE_SECONDARY) {
+} else if (mark == COLO_COMPARE_FREE_SECONDARY) {
 conn->compare_seq = spkt->seq_end;
 packet_destroy(spkt, NULL);
 goto sec;
-}
-if (mark == (COLO_COMPARE_FREE_PRIMARY | COLO_COMPARE_FREE_SECONDARY)) 
{
+} else if (mark == (COLO_COMPARE_FREE_PRIMARY | 
COLO_COMPARE_FREE_SECONDARY)) {
 conn->compare_seq = ppkt->seq_end;
 colo_release_primary_pkt(s, ppkt);
 packet_destroy(spkt, NULL);
-- 
2.17.1

[RFC v1 2/2] accel/tcg: split tcg_start_vcpu_thread

2020-10-14 Thread Claudio Fontana

after the initial split into 3 tcg variants, we proceed to also
split tcg_start_vcpu_thread.

We actually split it in 2 this time, since the icount variant
just uses the round robin function.

Signed-off-by: Claudio Fontana 
---
 accel/tcg/tcg-all.c |  5 
 accel/tcg/tcg-cpus-icount.c |  2 +-
 accel/tcg/tcg-cpus-mttcg.c  | 26 -
 accel/tcg/tcg-cpus-rr.c | 37 +++-
 accel/tcg/tcg-cpus-rr.h |  3 ++
 accel/tcg/tcg-cpus.c| 56 -
 6 files changed, 70 insertions(+), 59 deletions(-)

diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index e42a028043..1ac0b76515 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -105,6 +105,11 @@ static int tcg_init(MachineState *ms)
 tcg_exec_init(s->tb_size * 1024 * 1024);
 mttcg_enabled = s->mttcg_enabled;
 
+/*
+ * Initialize TCG regions
+ */
+tcg_region_init();
+
 if (mttcg_enabled) {
 cpus_register_accel(&tcg_cpus_mttcg);
 } else if (icount_enabled()) {
diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c
index 43505e8f1f..8859e384d7 100644
--- a/accel/tcg/tcg-cpus-icount.c
+++ b/accel/tcg/tcg-cpus-icount.c
@@ -136,7 +136,7 @@ static void icount_handle_interrupt(CPUState *cpu, int mask)
 }
 
 const CpusAccel tcg_cpus_icount = {
-.create_vcpu_thread = tcg_start_vcpu_thread,
+.create_vcpu_thread = rr_start_vcpu_thread,
 .kick_vcpu_thread = qemu_cpu_kick_rr_cpus,
 
 .handle_interrupt = icount_handle_interrupt,
diff --git a/accel/tcg/tcg-cpus-mttcg.c b/accel/tcg/tcg-cpus-mttcg.c
index 2f5317d767..1f34bbb625 100644
--- a/accel/tcg/tcg-cpus-mttcg.c
+++ b/accel/tcg/tcg-cpus-mttcg.c
@@ -110,8 +110,32 @@ static void mttcg_kick_vcpu_thread(CPUState *cpu)
 cpu_exit(cpu);
 }
 
+static void mttcg_start_vcpu_thread(CPUState *cpu)
+{
+char thread_name[VCPU_THREAD_NAME_SIZE];
+
+g_assert(tcg_enabled());
+
+parallel_cpus = (current_machine->smp.max_cpus > 1);
+
+cpu->thread = g_malloc0(sizeof(QemuThread));
+cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+qemu_cond_init(cpu->halt_cond);
+
+/* create a thread per vCPU with TCG (MTTCG) */
+snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+ cpu->cpu_index);
+
+qemu_thread_create(cpu->thread, thread_name, tcg_cpu_thread_fn,
+   cpu, QEMU_THREAD_JOINABLE);
+
+#ifdef _WIN32
+cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+}
+
 const CpusAccel tcg_cpus_mttcg = {
-.create_vcpu_thread = tcg_start_vcpu_thread,
+.create_vcpu_thread = mttcg_start_vcpu_thread,
 .kick_vcpu_thread = mttcg_kick_vcpu_thread,
 
 .handle_interrupt = tcg_handle_interrupt,
diff --git a/accel/tcg/tcg-cpus-rr.c b/accel/tcg/tcg-cpus-rr.c
index b8fd33d9d3..d3a1d8d626 100644
--- a/accel/tcg/tcg-cpus-rr.c
+++ b/accel/tcg/tcg-cpus-rr.c
@@ -262,8 +262,43 @@ void *tcg_rr_cpu_thread_fn(void *arg)
 return NULL;
 }
 
+void rr_start_vcpu_thread(CPUState *cpu)
+{
+char thread_name[VCPU_THREAD_NAME_SIZE];
+static QemuCond *single_tcg_halt_cond;
+static QemuThread *single_tcg_cpu_thread;
+
+g_assert(tcg_enabled());
+parallel_cpus = false;
+
+if (!single_tcg_cpu_thread) {
+cpu->thread = g_malloc0(sizeof(QemuThread));
+cpu->halt_cond = g_malloc0(sizeof(QemuCond));
+qemu_cond_init(cpu->halt_cond);
+
+/* share a single thread for all cpus with TCG */
+snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+qemu_thread_create(cpu->thread, thread_name,
+   tcg_rr_cpu_thread_fn,
+   cpu, QEMU_THREAD_JOINABLE);
+
+single_tcg_halt_cond = cpu->halt_cond;
+single_tcg_cpu_thread = cpu->thread;
+#ifdef _WIN32
+cpu->hThread = qemu_thread_get_handle(cpu->thread);
+#endif
+} else {
+/* we share the thread */
+cpu->thread = single_tcg_cpu_thread;
+cpu->halt_cond = single_tcg_halt_cond;
+cpu->thread_id = first_cpu->thread_id;
+cpu->can_do_io = 1;
+cpu->created = true;
+}
+}
+
 const CpusAccel tcg_cpus_rr = {
-.create_vcpu_thread = tcg_start_vcpu_thread,
+.create_vcpu_thread = rr_start_vcpu_thread,
 .kick_vcpu_thread = qemu_cpu_kick_rr_cpus,
 
 .handle_interrupt = tcg_handle_interrupt,
diff --git a/accel/tcg/tcg-cpus-rr.h b/accel/tcg/tcg-cpus-rr.h
index 155510cfd4..94dc85c1c5 100644
--- a/accel/tcg/tcg-cpus-rr.h
+++ b/accel/tcg/tcg-cpus-rr.h
@@ -17,6 +17,9 @@ extern const CpusAccel tcg_cpus_rr;
 /* Kick all RR vCPUs. */
 void qemu_cpu_kick_rr_cpus(CPUState *unused);
 
+/* start the round robin vcpu thread */
+void rr_start_vcpu_thread(CPUState *cpu);
+
 /* Single-threaded TCG */
 void *tcg_rr_cpu_thread_fn(void *arg);
 
diff --git a/accel/tcg/tcg-cpus.c b/accel/tcg/tcg-cpus.c
index e518cc7ed4..3d2bb7a76a 100644
--- a/accel/tcg/tcg-cpus.c
+++ b/accel/tcg/tcg-cpus.c
@@ -36,62 +36,6 @@
 
 /* common f

[PATCH 08/10] net/colo-compare.c: Change the timer clock type

2020-10-14 Thread Zhang Chen

From: Zhang Chen 

The virtual clock only runs during the emulation. It stops
when the virtual machine is stopped.
The host clock should be used for device models that emulate accurate
real time sources. It will continue to run when the virtual machine
is suspended. COLO need to know the host time here.

Reported-by: Derek Su 
Signed-off-by: Zhang Chen 
Reviewed-by: Li Zhijian 
---
 net/colo-compare.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index f4814c5f09..61c95fe7e9 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -899,7 +899,7 @@ static void check_old_packet_regular(void *opaque)
 
 /* if have old packet we will notify checkpoint */
 colo_old_packet_check(s);
-timer_mod(s->packet_check_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+timer_mod(s->packet_check_timer, qemu_clock_get_ms(QEMU_CLOCK_HOST) +
   s->expired_scan_cycle);
 }
 
@@ -933,10 +933,10 @@ static void colo_compare_timer_init(CompareState *s)
 {
 AioContext *ctx = iothread_get_aio_context(s->iothread);
 
-s->packet_check_timer = aio_timer_new(ctx, QEMU_CLOCK_VIRTUAL,
+s->packet_check_timer = aio_timer_new(ctx, QEMU_CLOCK_HOST,
 SCALE_MS, check_old_packet_regular,
 s);
-timer_mod(s->packet_check_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+timer_mod(s->packet_check_timer, qemu_clock_get_ms(QEMU_CLOCK_HOST) +
   s->expired_scan_cycle);
 }
 
-- 
2.17.1

[RFC v1 0/2] tcg-cpus: split into 3 tcg variants

2020-10-14 Thread Claudio Fontana

The purpose of this series is to split the tcg-cpus into
3 variants:

tcg_cpus_mttcg(multithreaded tcg vcpus)
tcg_cpus_rr   (single threaded round robin vcpus)
tcg_cpus_icount   (same as RR, but using icount)

Alex, I read the comment in tcg_start_vcpu_thread saying:

/*
 * Initialize TCG regions--once. Now is a good time, because:
 * (1) TCG's init context, prologue and target globals have been set up.
 * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
 * -accel flag is processed, so the check doesn't work then).
 */

Is this actually current?

I tried to refactor this (see patch 2), and it seems to work to do
the init of regions in tcg_init, and it seems that mttcg_enabled is known
already at that point..

Ciao,

Claudio

Claudio Fontana (2):
  accel/tcg: split CpusAccel into three TCG variants
  accel/tcg: split tcg_start_vcpu_thread

 accel/tcg/meson.build   |   9 +-
 accel/tcg/tcg-all.c |  13 +-
 accel/tcg/tcg-cpus-icount.c | 145 +++
 accel/tcg/tcg-cpus-icount.h |  20 ++
 accel/tcg/tcg-cpus-mttcg.c  | 142 ++
 accel/tcg/tcg-cpus-mttcg.h  |  25 ++
 accel/tcg/tcg-cpus-rr.c | 305 ++
 accel/tcg/tcg-cpus-rr.h |  26 ++
 accel/tcg/tcg-cpus.c| 500 +---
 accel/tcg/tcg-cpus.h|   9 +-
 softmmu/icount.c|   2 +-
 11 files changed, 697 insertions(+), 499 deletions(-)
 create mode 100644 accel/tcg/tcg-cpus-icount.c
 create mode 100644 accel/tcg/tcg-cpus-icount.h
 create mode 100644 accel/tcg/tcg-cpus-mttcg.c
 create mode 100644 accel/tcg/tcg-cpus-mttcg.h
 create mode 100644 accel/tcg/tcg-cpus-rr.c
 create mode 100644 accel/tcg/tcg-cpus-rr.h

-- 
2.26.2

[RFC v1 1/2] accel/tcg: split CpusAccel into three TCG variants

2020-10-14 Thread Claudio Fontana

split up the CpusAccel tcg_cpus into three TCG variants:

tcg_cpus_rr (single threaded, round robin cpus)
tcg_cpus_icount (same as rr, but with instruction counting enabled)
tcg_cpus_mttcg (multi-threaded cpus)

Signed-off-by: Claudio Fontana 
---
 accel/tcg/meson.build   |   9 +-
 accel/tcg/tcg-all.c |   8 +-
 accel/tcg/tcg-cpus-icount.c | 145 +++
 accel/tcg/tcg-cpus-icount.h |  20 ++
 accel/tcg/tcg-cpus-mttcg.c  | 118 +
 accel/tcg/tcg-cpus-mttcg.h  |  25 ++
 accel/tcg/tcg-cpus-rr.c | 270 
 accel/tcg/tcg-cpus-rr.h |  23 ++
 accel/tcg/tcg-cpus.c| 478 ++--
 accel/tcg/tcg-cpus.h|   9 +-
 softmmu/icount.c|   2 +-
 11 files changed, 647 insertions(+), 460 deletions(-)
 create mode 100644 accel/tcg/tcg-cpus-icount.c
 create mode 100644 accel/tcg/tcg-cpus-icount.h
 create mode 100644 accel/tcg/tcg-cpus-mttcg.c
 create mode 100644 accel/tcg/tcg-cpus-mttcg.h
 create mode 100644 accel/tcg/tcg-cpus-rr.c
 create mode 100644 accel/tcg/tcg-cpus-rr.h

diff --git a/accel/tcg/meson.build b/accel/tcg/meson.build
index 19b9343d5b..f39aab0a0c 100644
--- a/accel/tcg/meson.build
+++ b/accel/tcg/meson.build
@@ -12,4 +12,11 @@ tcg_ss.add(when: 'CONFIG_SOFTMMU', if_false: 
files('user-exec-stub.c'))
 tcg_ss.add(when: 'CONFIG_PLUGIN', if_true: [files('plugin-gen.c'), libdl])
 specific_ss.add_all(when: 'CONFIG_TCG', if_true: tcg_ss)
 
-specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TCG'], if_true: 
files('tcg-all.c', 'cputlb.c', 'tcg-cpus.c'))
+specific_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_TCG'], if_true: files(
+  'tcg-all.c',
+  'cputlb.c',
+  'tcg-cpus.c',
+  'tcg-cpus-mttcg.c',
+  'tcg-cpus-icount.c',
+  'tcg-cpus-rr.c'
+))
diff --git a/accel/tcg/tcg-all.c b/accel/tcg/tcg-all.c
index fa1208158f..e42a028043 100644
--- a/accel/tcg/tcg-all.c
+++ b/accel/tcg/tcg-all.c
@@ -104,8 +104,14 @@ static int tcg_init(MachineState *ms)
 
 tcg_exec_init(s->tb_size * 1024 * 1024);
 mttcg_enabled = s->mttcg_enabled;
-cpus_register_accel(&tcg_cpus);
 
+if (mttcg_enabled) {
+cpus_register_accel(&tcg_cpus_mttcg);
+} else if (icount_enabled()) {
+cpus_register_accel(&tcg_cpus_icount);
+} else {
+cpus_register_accel(&tcg_cpus_rr);
+}
 return 0;
 }
 
diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c
new file mode 100644
index 00..43505e8f1f
--- /dev/null
+++ b/accel/tcg/tcg-cpus-icount.c
@@ -0,0 +1,145 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "sysemu/tcg.h"
+#include "sysemu/replay.h"
+#include "qemu/main-loop.h"
+#include "qemu/guest-random.h"
+#include "exec/exec-all.h"
+#include "hw/boards.h"
+
+#include "tcg-cpus.h"
+
+int64_t tcg_get_icount_limit(void)
+{
+int64_t deadline;
+
+if (replay_mode != REPLAY_MODE_PLAY) {
+/*
+ * Include all the timers, because they may need an attention.
+ * Too long CPU execution may create unnecessary delay in UI.
+ */
+deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL,
+  QEMU_TIMER_ATTR_ALL);
+/* Check realtime timers, because they help with input processing */
+deadline = qemu_soonest_timeout(deadline,
+qemu_clock_deadline_ns_all(QEMU_CLOCK_REALTIME,
+   QEMU_TIMER_ATTR_ALL));
+
+/*
+ * Maintain prior (possibly buggy) behaviour where if no deadline
+ * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
+ * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+ * nanoseconds.
+ */
+if ((deadline < 0) || (deadline > INT32_MAX)) {
+deadline = INT32_

Re: [PATCH 2/2] qga: add ssh-{add,remove}-authorized-keys

2020-10-14 Thread Marc-André Lureau

Hi

On Wed, Oct 14, 2020 at 1:14 AM Philippe Mathieu-Daudé
 wrote:
>
> Hi Marc-André,
>
> On 10/13/20 10:25 PM, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> >
> > Add new commands to add and remove SSH public keys from
> > ~/.ssh/authorized_keys.
> >
> > I took a different approach for testing, including the unit tests right
> > with the code. I wanted to overwrite the function to get the user
> > details, I couldn't easily do that over QMP. Furthermore, I prefer
> > having unit tests very close to the code, and unit files that are domain
> > specific (commands-posix is too crowded already). Fwiw, that
>
> FWIW

ok

>
> > coding/testing style is Rust-style (where tests can or should even be
> > part of the documentation!).
> >
> > Fixes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1885332
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> ...
>
> > diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
> > index cec98c7e06..50e2854b45 100644
> > --- a/qga/qapi-schema.json
> > +++ b/qga/qapi-schema.json
> > @@ -1306,3 +1306,35 @@
> >   ##
> >   { 'command': 'guest-get-devices',
> > 'returns': ['GuestDeviceInfo'] }
> > +
> > +##
> > +# @guest-ssh-add-authorized-keys:
> > +#
> > +# @username: the user account to add the authorized key
> > +# @keys: the public keys to add (in OpenSSH format)
>
> You use plural but the code only seems to add (remove) one key
> at a time.

Uh, what makes you believe that?

>
> 'OpenSSH format' is confusing. From sshd(8):
>
>Each line of the file contains one key (empty lines and lines
>starting with a ‘#’ are ignored as comments).
>
>Public keys consist of the following space-separated fields:
>
>  options, keytype, base64-encoded key, comment.
>
>The options field is optional.
>
>Note that lines in this file can be several hundred bytes long
>(because of the size of the public key encoding) up to a limit
>of 8 kilobytes, which permits RSA keys up to 16 kilobits.
>
>The options (if present) consist of comma-separated option
>specifications.  No spaces are permitted, except within double
>quotes.
>
> @openssh_authorized_key_line is ugly, maybe use @authorized_key
> to make it clearer?

Why? the name of the function already implies we are talking about
authorized keys. The documentation says it's a public key in openssh
format (the ones you expect in ~/.ssh/authorized_keys files)

Yes the format isn't very well defined, so I did simple sanity checks.
After all, people usually append keys with shell >>. I can't find a
common command to do it with some checking.

> > +#
> > +# Append a public key to user $HOME/.ssh/authorized_keys on Unix systems 
> > (not
> > +# implemented for other systems).
>
> Here "a key" singular, good.

bad. it should be plural (everywhere else is plural, afaict)

>
> > +#
> > +# Returns: Nothing on success.
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'guest-ssh-add-authorized-keys',
>
> Here "keys" plural :/
>
> > +  'data': { 'username': 'str', 'keys': ['str'] } }
> > +
> > +##
> > +# @guest-ssh-remove-authorized-keys:
> > +#
> > +# @username: the user account to add the authorized key
> > +# @keys: the public keys to remove (in OpenSSH format)
> > +#
> > +# Remove public keys from the user $HOME/.ssh/authorized_keys on Unix 
> > systems
> > +# (not implemented for other systems).
> > +#
> > +# Returns: Nothing on success.
> > +#
> > +# Since: 5.2
> > +##
> > +{ 'command': 'guest-ssh-remove-authorized-keys',
> > +  'data': { 'username': 'str', 'keys': ['str'] } }
> >
>

Re: [PATCH v1 1/2] fuzz: add virtio-blk fuzz target

2020-10-14 Thread Dima Stepanov

On Wed, Oct 14, 2020 at 10:29:41AM +0300, Dima Stepanov wrote:
> On Tue, Oct 13, 2020 at 11:30:52AM -0400, Alexander Bulekov wrote:
> > On 201007 1647, Dima Stepanov wrote:
> > > The virtio-blk fuzz target sets up and fuzzes the available virtio-blk
> > > queues. The implementation is based on two files:
> > >   - tests/qtest/fuzz/virtio_scsi_fuzz.c
> > >   - tests/qtest/virtio_blk_test.c
> > > 
> > > Signed-off-by: Dima Stepanov 
> > > ---
> > >  tests/qtest/fuzz/meson.build   |   1 +
> > >  tests/qtest/fuzz/virtio_blk_fuzz.c | 234 
> > > +
> > >  2 files changed, 235 insertions(+)
> > >  create mode 100644 tests/qtest/fuzz/virtio_blk_fuzz.c
> > > 
> > > diff --git a/tests/qtest/fuzz/meson.build b/tests/qtest/fuzz/meson.build
> > > index b31ace7..3b923dc 100644
> > > --- a/tests/qtest/fuzz/meson.build
> > > +++ b/tests/qtest/fuzz/meson.build
> > > @@ -5,6 +5,7 @@ specific_fuzz_ss.add(files('fuzz.c', 'fork_fuzz.c', 
> > > 'qos_fuzz.c',
> > >  specific_fuzz_ss.add(when: 'CONFIG_I440FX', if_true: 
> > > files('i440fx_fuzz.c'))
> > >  specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: 
> > > files('virtio_net_fuzz.c'))
> > >  specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_SCSI', if_true: 
> > > files('virtio_scsi_fuzz.c'))
> > > +specific_fuzz_ss.add(files('virtio_blk_fuzz.c'))
> > 
> > Hi Dima,
> > For consistency, maybe
> > specific_fuzz_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: 
> > files('virtio_blk_fuzz.c'))
> Good point, will update it.
> 
> > 
> ...
> > > +
> > > +static char *drive_create(void)
> > > +{
> > > +int fd, ret;
> > > +char *t_path = g_strdup("/tmp/qtest.XX");
> > > +
> > > +/* Create a temporary raw image */
> > > +fd = mkstemp(t_path);
> > > +g_assert_cmpint(fd, >=, 0);
> > > +ret = ftruncate(fd, TEST_IMAGE_SIZE);
> > > +g_assert_cmpint(ret, ==, 0);
> > > +close(fd);
> > > +
> > > +g_test_queue_destroy(drive_destroy, t_path);
> > > +return t_path;
> > > +}
> > > +
> > 
> > I tested this out and it works with multi-process fuzzing under -jobs=4
> > -workers=4 (this initialization happens after libfuzzer has already
> > forked the processes). This seems like an interesting alternative to
> > using fake null-co:// files. 
> > I wonder if some state might leak as these disks are filled with fuzzer
> > data.
> Yes, i've also chosen between the fake null device and temporary file.
> Tried this approach, just to see what will happen ). It seems to me that
> slightly different paths can be triggered in this case and it is closer
> to real usage.
> But indeed, mb some state can leak, this is interesting.
> 
> > 
> > Nit: these disk files remain after the fuzzer exists. It looks
> > like the libfuzzer people suggest simply using atexit() to perform
> > cleanup: https://reviews.llvm.org/D45762
> > The is that the only way I have found to terminate the fuzzer is with
> > SIGKILL, where atexit is skipped. QEMU installs some signal handlers in
> > os-posix.c:os_setup_signal_handling to notify the main_loop that the
> > qemu was killed. Since we replace qemu_main_loop by manually running
> > main_loop_wait, we don't check main_loop_should_exit().
> Got it! Thanks for sharing this is good to know ).
> 
> No other comments mixed in below.
> 
> Dima.
> > 
> > I sent a patch to disable QEMU's signal handlers for the fuzzer.
> > Message-Id: <20201013152920.448335-1-alx...@bu.edu>
Sorry, i couldn't find a patch you've pointed out above. Could you share
some link to it? Also, am i correct that it is a general change for the
QEMU fuzzing, so all the fuzzing targets will automatically reuse it?

> > 
> > With an atexit() call to clean up the temporary images:
> > Reviewed-by: Alexander Bulekov 
> > 
> > > +static void *virtio_blk_test_setup(GString *cmd_line, void *arg)
> > > +{
> > > +char *tmp_path = drive_create();
> > > +
> > > +g_string_append_printf(cmd_line,
> > > +   " -drive if=none,id=drive0,file=%s,"
> > > +   "format=raw,auto-read-only=off ",
> > > +   tmp_path);
> > > +
> > > +return arg;
> > > +}
> > > +
> > > +static void register_virtio_blk_fuzz_targets(void)
> > > +{
> > > +fuzz_add_qos_target(&(FuzzTarget){
> > > +.name = "virtio-blk-fuzz",
> > > +.description = "Fuzz the virtio-blk virtual queues, 
> > > forking "
> > > +"for each fuzz run",
> > > +.pre_vm_init = &counter_shm_init,
> > > +.pre_fuzz = &virtio_blk_pre_fuzz,
> > > +.fuzz = virtio_blk_fork_fuzz,},
> > > +"virtio-blk",
> > > +&(QOSGraphTestOptions){.before = virtio_blk_test_setup}
> > > +);
> > > +
> > > +fuzz_add_qos_target(&(FuzzTarget){
> > > +.name = "virtio-blk-flags-fuzz",
> > > +.description = "Fuzz the virtio-blk virtual queues, 
> > > forking "
> > > +

Re: [RFC v1 1/2] accel/tcg: split CpusAccel into three TCG variants

2020-10-14 Thread Philippe Mathieu-Daudé


On 10/14/20 9:36 AM, Claudio Fontana wrote:

split up the CpusAccel tcg_cpus into three TCG variants:

tcg_cpus_rr (single threaded, round robin cpus)
tcg_cpus_icount (same as rr, but with instruction counting enabled)
tcg_cpus_mttcg (multi-threaded cpus)

Signed-off-by: Claudio Fontana 
---
  accel/tcg/meson.build   |   9 +-
  accel/tcg/tcg-all.c |   8 +-
  accel/tcg/tcg-cpus-icount.c | 145 +++
  accel/tcg/tcg-cpus-icount.h |  20 ++
  accel/tcg/tcg-cpus-mttcg.c  | 118 +
  accel/tcg/tcg-cpus-mttcg.h  |  25 ++
  accel/tcg/tcg-cpus-rr.c | 270 
  accel/tcg/tcg-cpus-rr.h |  23 ++
  accel/tcg/tcg-cpus.c| 478 ++--
  accel/tcg/tcg-cpus.h|   9 +-
  softmmu/icount.c|   2 +-
  11 files changed, 647 insertions(+), 460 deletions(-)
  create mode 100644 accel/tcg/tcg-cpus-icount.c
  create mode 100644 accel/tcg/tcg-cpus-icount.h
  create mode 100644 accel/tcg/tcg-cpus-mttcg.c
  create mode 100644 accel/tcg/tcg-cpus-mttcg.h
  create mode 100644 accel/tcg/tcg-cpus-rr.c
  create mode 100644 accel/tcg/tcg-cpus-rr.h

...


diff --git a/accel/tcg/tcg-cpus-icount.c b/accel/tcg/tcg-cpus-icount.c
new file mode 100644
index 00..43505e8f1f
--- /dev/null
+++ b/accel/tcg/tcg-cpus-icount.c
@@ -0,0 +1,145 @@
+/*
+ * QEMU System Emulator


"QEMU single threaded vCPUs implementation using instruction counting"?


+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */

...


diff --git a/accel/tcg/tcg-cpus-icount.h b/accel/tcg/tcg-cpus-icount.h
new file mode 100644
index 00..e2e25674c2
--- /dev/null
+++ b/accel/tcg/tcg-cpus-icount.h
@@ -0,0 +1,20 @@
+/*
+ * Accelerator CPUS Interface


Ditto.


+ *
+ * Copyright 2020 SUSE LLC
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef TCG_CPUS_ICOUNT_H
+#define TCG_CPUS_ICOUNT_H
+
+extern const CpusAccel tcg_cpus_icount;
+
+int64_t tcg_get_icount_limit(void);
+void handle_icount_deadline(void);
+void prepare_icount_for_run(CPUState *cpu);
+void process_icount_data(CPUState *cpu);
+
+#endif /* TCG_CPUS_ICOUNT_H */
diff --git a/accel/tcg/tcg-cpus-mttcg.c b/accel/tcg/tcg-cpus-mttcg.c
new file mode 100644
index 00..2f5317d767
--- /dev/null
+++ b/accel/tcg/tcg-cpus-mttcg.c
@@ -0,0 +1,118 @@
+/*
+ * QEMU System Emulator


"QEMU multi-threaded vCPUs implementation"?


+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2014 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */

...


diff --git a/accel/tcg/tcg-cpus-mttcg.h b/accel/tcg/tcg-cpus-mttcg.h
new file mode 100644
index 00..c8966b2f8a
--- /dev/null
+++ b/accel/tcg/tcg-cpus-mttcg.h
@@ -0,0 +1,25 @@
+/*
+ * Accelerator CPUS Interface


Ditto.

Re: [RFC v1 0/2] tcg-cpus: split into 3 tcg variants

2020-10-14 Thread Philippe Mathieu-Daudé


On 10/14/20 9:36 AM, Claudio Fontana wrote:

The purpose of this series is to split the tcg-cpus into
3 variants:

tcg_cpus_mttcg(multithreaded tcg vcpus)
tcg_cpus_rr   (single threaded round robin vcpus)
tcg_cpus_icount   (same as RR, but using icount)


Good idea!

[Bug 1899082] Re: ReplayKernel.test_x86_64_pc fails intermittently

2020-10-14 Thread Pavel Dovgalyuk

I traced this bug to hw/char/serial.c/serial_ioport_read

Bug disappears when I add qemu_log("serial_ioport_read %x %x\n",
(int)addr, ret); into the end of this function.

I suppose that there is avocado (or socket) io synchronization problem,
because running the same test without avocado works normally.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1899082

Title:
  ReplayKernel.test_x86_64_pc fails intermittently

Status in QEMU:
  New

Bug description:
  Even though this acceptance test is already skipped on GitLab CI, the
  intermittent failures can be seen on other environments too.

  The record phase works fine, but during the replay phase fail to
  finish booting the kernel (until the expected place):

  16:34:47 DEBUG| [0.034498] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 
1GB 0
  16:34:47 DEBUG| [0.034790] Spectre V2 : Spectre mitigation: LFENCE not 
serializing, switching to generic retpoline
  16:34:47 DEBUG| [0.035093] Spectre V2 : Mitigation: Full generic retpoline
  16:34:47 DEBUG| [0.035347] Spectre V2 : Spectre v2 / SpectreRSB 
mitigation: Filling RSB on context switch
  16:34:47 DEBUG| [0.035667]
  16:36:02 ERROR| 
  16:36:02 ERROR| Reproduced traceback from: 
/home/cleber/src/avocado/avocado/avocado/core/test.py:767
  16:36:02 ERROR| Traceback (most recent call last):
  16:36:02 ERROR|   File 
"/var/lib/users/cleber/build/qemu/tests/acceptance/replay_kernel.py", line 92, 
in test_x86_64_pc
  16:36:02 ERROR| self.run_rr(kernel_path, kernel_command_line, 
console_pattern, shift=5)
  16:36:02 ERROR|   File 
"/var/lib/users/cleber/build/qemu/tests/acceptance/replay_kernel.py", line 73, 
in run_rr
  16:36:02 ERROR| False, shift, args, replay_path)
  16:36:02 ERROR|   File 
"/var/lib/users/cleber/build/qemu/tests/acceptance/replay_kernel.py", line 55, 
in run_vm
  16:36:02 ERROR| self.wait_for_console_pattern(console_pattern, vm)
  16:36:02 ERROR|   File 
"/var/lib/users/cleber/build/qemu/tests/acceptance/boot_linux_console.py", line 
53, in wait_for_console_pattern
  16:36:02 ERROR| vm=vm)
  16:36:02 ERROR|   File 
"/var/lib/users/cleber/build/qemu/tests/acceptance/avocado_qemu/__init__.py", 
line 130, in wait_for_console_pattern
  16:36:02 ERROR| _console_interaction(test, success_message, 
failure_message, None, vm=vm)
  16:36:02 ERROR|   File 
"/var/lib/users/cleber/build/qemu/tests/acceptance/avocado_qemu/__init__.py", 
line 82, in _console_interaction
  16:36:02 ERROR| msg = console.readline().strip()
  16:36:02 ERROR|   File "/usr/lib64/python3.7/socket.py", line 575, in readinto
  16:36:02 ERROR| def readinto(self, b):
  16:36:02 ERROR|   File 
"/home/cleber/src/avocado/avocado/avocado/plugins/runner.py", line 77, in 
sigterm_handler
  16:36:02 ERROR| raise RuntimeError("Test interrupted by SIGTERM")
  16:36:02 ERROR| RuntimeError: Test interrupted by SIGTERM
  16:36:02 ERROR| 

  On my workstation, I can replicate the failure roughly once every 50
  runs.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1899082/+subscriptions

[PULL 6/9] vnc-stubs: Allow -vnc none

2020-10-14 Thread Gerd Hoffmann

From: Jason Andryuk 

Currently `-vnc none` is fatal when built with `--disable-vnc`.  Make
vnc_parse accept "none", so QEMU still run without using vnc.

Signed-off-by: Jason Andryuk 
Message-id: 20201009014032.3507-1-jandr...@gmail.com
Signed-off-by: Gerd Hoffmann 
---
 ui/vnc-stubs.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/ui/vnc-stubs.c b/ui/vnc-stubs.c
index 06c4ac6296eb..c6b737dcec67 100644
--- a/ui/vnc-stubs.c
+++ b/ui/vnc-stubs.c
@@ -12,6 +12,9 @@ int vnc_display_pw_expire(const char *id, time_t expires)
 };
 QemuOpts *vnc_parse(const char *str, Error **errp)
 {
+if (strcmp(str, "none") == 0) {
+return NULL;
+}
 error_setg(errp, "VNC support is disabled");
 return NULL;
 }
-- 
2.27.0

[PULL 8/9] input-linux: Reset il->fd handler before closing it

2020-10-14 Thread Gerd Hoffmann

From: Colin Xu 

If object-del input-linux object on-the-fly, instance finalize will
close evdev fd without resetting it. However the main thread is still
trying to lock_acquire/lock_release during ppoll, which leads to a very
high CPU utilization.

Signed-off-by: Colin Xu 
Reviewed-by: Li Qiang 
Message-id: 20200925021808.26471-1-colin...@intel.com
Signed-off-by: Gerd Hoffmann 
---
 ui/input-linux.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ui/input-linux.c b/ui/input-linux.c
index ab351a418701..34cc531190f9 100644
--- a/ui/input-linux.c
+++ b/ui/input-linux.c
@@ -418,6 +418,7 @@ static void input_linux_instance_finalize(Object *obj)
 
 if (il->initialized) {
 QTAILQ_REMOVE(&inputs, il, next);
+qemu_set_fd_handler(il->fd, NULL, NULL, NULL);
 close(il->fd);
 }
 g_free(il->evdev);
-- 
2.27.0

[PULL 5/9] configure: Fixes ncursesw detection under msys2/mingw by convert them to meson

2020-10-14 Thread Gerd Hoffmann

From: Yonggang Luo 

The mingw pkg-config are showing following absolute path and contains : as the 
separator,

-D_XOPEN_SOURCE=600 -D_POSIX_C_SOURCE=199506L 
-IC:/CI-Tools/msys64/mingw64/include/ncursesw:-I/usr/include/ncursesw:
-DNCURSES_WIDECHAR -D_XOPEN_SOURCE=600 -D_POSIX_C_SOURCE=199506L -IC -pipe 
-lncursesw -lgnurx -ltre -lintl -liconv
-DNCURSES_WIDECHAR -D_XOPEN_SOURCE=600 -D_POSIX_C_SOURCE=199506L -IC -lncursesw
-DNCURSES_WIDECHAR -D_XOPEN_SOURCE=600 -D_POSIX_C_SOURCE=199506L -IC -lcursesw
-DNCURSES_WIDECHAR /CI-Tools/msys64/mingw64/include/ncursesw -pipe -lncursesw 
-lgnurx -ltre -lintl -liconv
-DNCURSES_WIDECHAR /CI-Tools/msys64/mingw64/include/ncursesw -lncursesw
-DNCURSES_WIDECHAR /CI-Tools/msys64/mingw64/include/ncursesw -lcursesw
-DNCURSES_WIDECHAR -I/usr/include/ncursesw -pipe -lncursesw -lgnurx -ltre 
-lintl -liconv
-DNCURSES_WIDECHAR -I/usr/include/ncursesw -lncursesw
-DNCURSES_WIDECHAR -I/usr/include/ncursesw -lcursesw

Signed-off-by: Yonggang Luo 
Reviewed-by: Gerd Hoffmann 
Message-id: 20201012234348.1427-6-luoyongg...@gmail.com
Signed-off-by: Gerd Hoffmann 
---
 configure | 118 +++---
 meson_options.txt |   4 ++
 meson.build   |  83 +++-
 ui/meson.build|   2 +-
 4 files changed, 83 insertions(+), 124 deletions(-)

diff --git a/configure b/configure
index 9a87685517ee..f839c2a557c3 100755
--- a/configure
+++ b/configure
@@ -295,7 +295,8 @@ unset target_list_exclude
 
 brlapi=""
 curl=""
-curses=""
+iconv="auto"
+curses="auto"
 docs=""
 fdt="auto"
 netmap="no"
@@ -1173,13 +1174,13 @@ for opt do
   ;;
   --disable-safe-stack) safe_stack="no"
   ;;
-  --disable-curses) curses="no"
+  --disable-curses) curses="disabled"
   ;;
-  --enable-curses) curses="yes"
+  --enable-curses) curses="enabled"
   ;;
-  --disable-iconv) iconv="no"
+  --disable-iconv) iconv="disabled"
   ;;
-  --enable-iconv) iconv="yes"
+  --enable-iconv) iconv="enabled"
   ;;
   --disable-curl) curl="no"
   ;;
@@ -3440,102 +3441,6 @@ EOF
   fi
 fi
 
-##
-# iconv probe
-if test "$iconv" != "no" ; then
-  cat > $TMPC << EOF
-#include 
-int main(void) {
-  iconv_t conv = iconv_open("WCHAR_T", "UCS-2");
-  return conv != (iconv_t) -1;
-}
-EOF
-  iconv_prefix_list="/usr/local:/usr"
-  iconv_lib_list=":-liconv"
-  IFS=:
-  for iconv_prefix in $iconv_prefix_list; do
-IFS=:
-iconv_cflags="-I$iconv_prefix/include"
-iconv_ldflags="-L$iconv_prefix/lib"
-for iconv_link in $iconv_lib_list; do
-  unset IFS
-  iconv_lib="$iconv_ldflags $iconv_link"
-  echo "looking at iconv in '$iconv_cflags' '$iconv_lib'" >> config.log
-  if compile_prog "$iconv_cflags" "$iconv_lib" ; then
-iconv_found=yes
-break
-  fi
-done
-if test "$iconv_found" = yes ; then
-  break
-fi
-  done
-  if test "$iconv_found" = "yes" ; then
-iconv=yes
-  else
-if test "$iconv" = "yes" ; then
-  feature_not_found "iconv" "Install iconv devel"
-fi
-iconv=no
-  fi
-fi
-
-##
-# curses probe
-if test "$iconv" = "no" ; then
-  # curses will need iconv
-  curses=no
-fi
-if test "$curses" != "no" ; then
-  if test "$mingw32" = "yes" ; then
-curses_inc_list="$($pkg_config --cflags ncurses 2>/dev/null):"
-curses_lib_list="$($pkg_config --libs ncurses 2>/dev/null):-lpdcurses"
-  else
-curses_inc_list="$($pkg_config --cflags ncursesw 
2>/dev/null):-I/usr/include/ncursesw:"
-curses_lib_list="$($pkg_config --libs ncursesw 
2>/dev/null):-lncursesw:-lcursesw"
-  fi
-  curses_found=no
-  cat > $TMPC << EOF
-#include 
-#include 
-#include 
-int main(void) {
-  wchar_t wch = L'w';
-  setlocale(LC_ALL, "");
-  resize_term(0, 0);
-  addwstr(L"wide chars\n");
-  addnwstr(&wch, 1);
-  add_wch(WACS_DEGREE);
-  return 0;
-}
-EOF
-  IFS=:
-  for curses_inc in $curses_inc_list; do
-# Make sure we get the wide character prototypes
-curses_inc="-DNCURSES_WIDECHAR $curses_inc"
-IFS=:
-for curses_lib in $curses_lib_list; do
-  unset IFS
-  if compile_prog "$curses_inc" "$curses_lib" ; then
-curses_found=yes
-break
-  fi
-done
-if test "$curses_found" = yes ; then
-  break
-fi
-  done
-  unset IFS
-  if test "$curses_found" = "yes" ; then
-curses=yes
-  else
-if test "$curses" = "yes" ; then
-  feature_not_found "curses" "Install ncurses devel"
-fi
-curses=no
-  fi
-fi
-
 ##
 # curl probe
 if test "$curl" != "no" ; then
@@ -6200,16 +6105,6 @@ if test "$have_x11" = "yes" && test "$need_x11" = "yes"; 
then
   echo "X11_CFLAGS=$x11_cflags" >> $config_host_mak
   echo "X11_LIBS=$x11_libs" >> $config_host_mak
 fi
-if test "$iconv" = "yes" ; then
-  echo "CONFIG_ICONV=y" >> $config_host_mak
-  echo "ICONV_CFLAGS=$iconv_cflags" >> $config_host_mak
-  echo "ICONV_LIBS=$iconv_lib" >> $config_host_mak
-fi
-if test "$curses" = "ye

[PULL 3/9] curses: Fixes curses compiling errors.

2020-10-14 Thread Gerd Hoffmann

From: Yonggang Luo 

This is the compiling error:
../ui/curses.c: In function 'curses_refresh':
../ui/curses.c:256:5: error: 'next_maybe_keycode' may be used uninitialized in 
this function [-Werror=maybe-uninitialized]
  256 | curses2foo(_curses2keycode, _curseskey2keycode, chr, maybe_keycode)
  | ^~
../ui/curses.c:302:32: note: 'next_maybe_keycode' was declared here
  302 | enum maybe_keycode next_maybe_keycode;
  |^~
../ui/curses.c:256:5: error: 'maybe_keycode' may be used uninitialized in this 
function [-Werror=maybe-uninitialized]
  256 | curses2foo(_curses2keycode, _curseskey2keycode, chr, maybe_keycode)
  | ^~
../ui/curses.c:265:24: note: 'maybe_keycode' was declared here
  265 | enum maybe_keycode maybe_keycode;
  |^
cc1.exe: all warnings being treated as errors

gcc version 10.2.0 (Rev1, Built by MSYS2 project)

Signed-off-by: Yonggang Luo 
Reviewed-by: Gerd Hoffmann 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20201012234348.1427-4-luoyongg...@gmail.com
Signed-off-by: Gerd Hoffmann 
---
 ui/curses.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ui/curses.c b/ui/curses.c
index 12bc682cf9b0..e4f9588c3e8a 100644
--- a/ui/curses.c
+++ b/ui/curses.c
@@ -262,7 +262,7 @@ static int curses2foo(const int _curses2foo[], const int 
_curseskey2foo[],
 static void curses_refresh(DisplayChangeListener *dcl)
 {
 int chr, keysym, keycode, keycode_alt;
-enum maybe_keycode maybe_keycode;
+enum maybe_keycode maybe_keycode = CURSES_KEYCODE;
 
 curses_winch_check();
 
@@ -299,7 +299,7 @@ static void curses_refresh(DisplayChangeListener *dcl)
 
 /* alt or esc key */
 if (keycode == 1) {
-enum maybe_keycode next_maybe_keycode;
+enum maybe_keycode next_maybe_keycode = CURSES_KEYCODE;
 int nextchr = console_getch(&next_maybe_keycode);
 
 if (nextchr != -1) {
-- 
2.27.0

[PULL 1/9] qemu-edid: drop cast

2020-10-14 Thread Gerd Hoffmann

Not needed and makes some compilers error out with:

qemu-edid.c:15:1: error: initializer element is not constant

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20201013091615.14166-1-kra...@redhat.com
---
 qemu-edid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-edid.c b/qemu-edid.c
index 1db3372b982c..1cd6a9517238 100644
--- a/qemu-edid.c
+++ b/qemu-edid.c
@@ -9,7 +9,7 @@
 #include "qemu/cutils.h"
 #include "hw/display/edid.h"
 
-static qemu_edid_info info = (qemu_edid_info) {
+static qemu_edid_info info = {
 .prefx = 1024,
 .prefy = 768,
 };
-- 
2.27.0

[PATCH] buildsys: Help git-diff adding .gitattributes config file

2020-10-14 Thread Philippe Mathieu-Daudé

Since commits 0979ed017f0 ("meson: rename .inc.h files to .h.inc")
and 139c1837db7 ("meson: rename included C source files to .c.inc")
'git-diff --function-context' stopped displaying C function context
correctly.
We can help git-diff by providing attributes to the .[ch].inc path
names. See:
https://git-scm.com/docs/gitattributes#_generating_diff_text

Signed-off-by: Philippe Mathieu-Daudé 
---
 .gitattributes | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 .gitattributes

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 000..3d2fe2ecda8
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,2 @@
+*.c.inc diff=c
+*.h.inc diff=c
-- 
2.26.2

[PULL 9/9] ui: Fix default window_id value

2020-10-14 Thread Gerd Hoffmann

From: Samuel Thibault 

./chardev/baum.c expects the default window_id value to be -1, and not 0
which could be confused with a proper window id (when numbered from 0 by
the ui backend).

This fixes getting Braille output with the curses and gtk frontends.

Fixes: f29b3431f62 ("console: move window ID code from baum to sdl")
Signed-off-by: Samuel Thibault 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20200914100637.eeommoflirxrgaeh@function>
Signed-off-by: Gerd Hoffmann 
---
 ui/console.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ui/console.c b/ui/console.c
index 54a74c0b16c9..820e4081709d 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1310,6 +1310,7 @@ static QemuConsole *new_console(DisplayState *ds, 
console_type_t console_type,
 }
 s->ds = ds;
 s->console_type = console_type;
+s->window_id = -1;
 
 if (QTAILQ_EMPTY(&consoles)) {
 s->index = 0;
-- 
2.27.0

[PULL 0/9] Ui 20201014 patches

2020-10-14 Thread Gerd Hoffmann

The following changes since commit 96292515c07e3a99f5a29540ed2f257b1ff75111:

  Merge remote-tracking branch 
'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging 
(2020-10-13 14:06:22 +0100)

are available in the Git repository at:

  git://git.kraxel.org/qemu tags/ui-20201014-pull-request

for you to fetch changes up to 41d004d8af59885da2c21460a73898b1aa09690f:

  ui: Fix default window_id value (2020-10-14 10:20:26 +0200)


ui: fixes for sdl, curses, vnc, input-linux.



Colin Xu (1):
  input-linux: Reset il->fd handler before closing it

Gerd Hoffmann (1):
  qemu-edid: drop cast

Jan Henrik Weinstock (1):
  SDL: enable OpenGL context creation

Jason Andryuk (1):
  vnc-stubs: Allow -vnc none

Samuel Thibault (1):
  ui: Fix default window_id value

Yonggang Luo (4):
  curses: Fixes compiler error that complain don't have langinfo.h on
msys2/mingw
  curses: Fixes curses compiling errors.
  win32: Simplify gmtime_r detection not depends on if _POSIX_C_SOURCE
are defined on msys2/mingw
  configure: Fixes ncursesw detection under msys2/mingw by convert them
to meson

 configure | 155 ++
 meson_options.txt |   4 +
 include/sysemu/os-win32.h |   4 +-
 qemu-edid.c   |   2 +-
 ui/console.c  |   1 +
 ui/curses.c   |  14 ++--
 ui/input-linux.c  |   1 +
 ui/sdl2.c |   5 ++
 ui/vnc-stubs.c|   3 +
 util/oslib-win32.c|   4 +-
 meson.build   |  83 +---
 ui/meson.build|   2 +-
 12 files changed, 105 insertions(+), 173 deletions(-)

-- 
2.27.0

[PULL 2/9] curses: Fixes compiler error that complain don't have langinfo.h on msys2/mingw

2020-10-14 Thread Gerd Hoffmann

From: Yonggang Luo 

msys2/mingw lacks the POSIX-required langinfo.h.

gcc test.c -DNCURSES_WIDECHAR -I/mingw64/include/ncursesw -pipe -lncursesw 
-lgnurx -ltre -lintl -liconv
test.c:4:10: fatal error: langinfo.h: No such file or directory
4 | #include 
  |  ^~~~
compilation terminated.

So we using g_get_codeset instead of nl_langinfo(CODESET)

Signed-off-by: Yonggang Luo 
Reviewed-by: Gerd Hoffmann 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: 20201012234348.1427-3-luoyongg...@gmail.com
Signed-off-by: Gerd Hoffmann 
---
 configure   |  5 +
 ui/curses.c | 10 +-
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 1b6348852171..71a574f718fa 100755
--- a/configure
+++ b/configure
@@ -3530,17 +3530,14 @@ if test "$curses" != "no" ; then
 #include 
 #include 
 #include 
-#include 
 int main(void) {
-  const char *codeset;
   wchar_t wch = L'w';
   setlocale(LC_ALL, "");
   resize_term(0, 0);
   addwstr(L"wide chars\n");
   addnwstr(&wch, 1);
   add_wch(WACS_DEGREE);
-  codeset = nl_langinfo(CODESET);
-  return codeset != 0;
+  return 0;
 }
 EOF
   IFS=:
diff --git a/ui/curses.c b/ui/curses.c
index a59b23a9cf63..12bc682cf9b0 100644
--- a/ui/curses.c
+++ b/ui/curses.c
@@ -30,7 +30,6 @@
 #endif
 #include 
 #include 
-#include 
 #include 
 
 #include "qapi/error.h"
@@ -526,6 +525,7 @@ static void font_setup(void)
 iconv_t nativecharset_to_ucs2;
 iconv_t font_conv;
 int i;
+g_autofree gchar *local_codeset = g_get_codeset();
 
 /*
  * Control characters are normally non-printable, but VGA does have
@@ -566,14 +566,14 @@ static void font_setup(void)
   0x25bc
 };
 
-ucs2_to_nativecharset = iconv_open(nl_langinfo(CODESET), "UCS-2");
+ucs2_to_nativecharset = iconv_open(local_codeset, "UCS-2");
 if (ucs2_to_nativecharset == (iconv_t) -1) {
 fprintf(stderr, "Could not convert font glyphs from UCS-2: '%s'\n",
 strerror(errno));
 exit(1);
 }
 
-nativecharset_to_ucs2 = iconv_open("UCS-2", nl_langinfo(CODESET));
+nativecharset_to_ucs2 = iconv_open("UCS-2", local_codeset);
 if (nativecharset_to_ucs2 == (iconv_t) -1) {
 iconv_close(ucs2_to_nativecharset);
 fprintf(stderr, "Could not convert font glyphs to UCS-2: '%s'\n",
@@ -581,7 +581,7 @@ static void font_setup(void)
 exit(1);
 }
 
-font_conv = iconv_open(nl_langinfo(CODESET), font_charset);
+font_conv = iconv_open(local_codeset, font_charset);
 if (font_conv == (iconv_t) -1) {
 iconv_close(ucs2_to_nativecharset);
 iconv_close(nativecharset_to_ucs2);
@@ -602,7 +602,7 @@ static void font_setup(void)
 /* DEL */
 convert_ucs(0x7F, 0x2302, ucs2_to_nativecharset);
 
-if (strcmp(nl_langinfo(CODESET), "UTF-8")) {
+if (strcmp(local_codeset, "UTF-8")) {
 /* Non-Unicode capable, use termcap equivalents for those available */
 for (i = 0; i <= 0xFF; i++) {
 wchar_t wch[CCHARW_MAX];
-- 
2.27.0

[PULL 4/9] win32: Simplify gmtime_r detection not depends on if _POSIX_C_SOURCE are defined on msys2/mingw

2020-10-14 Thread Gerd Hoffmann

From: Yonggang Luo 

We remove the CONFIG_LOCALTIME_R detection option in configure, and move the 
check
existence of gmtime_r from configure into C header and source directly by using 
macro
`_POSIX_THREAD_SAFE_FUNCTIONS`.
Before this patch, the configure script are always assume the compiler doesn't 
define
_POSIX_C_SOURCE macro at all, but that's not true, because thirdparty library 
such
as ncursesw may define -D_POSIX_C_SOURCE in it's pkg-config file. And that C 
Flags will
added -D_POSIX_C_SOURCE into each QEMU_CFLAGS. And that's causing the following 
compiling error:
n file included from C:/work/xemu/qemu/include/qemu/osdep.h:119,
 from ../softmmu/main.c:25:
C:/work/xemu/qemu/include/sysemu/os-win32.h:53:12: error: redundant 
redeclaration of 'gmtime_r' [-Werror=redundant-decls]
   53 | struct tm *gmtime_r(const time_t *timep, struct tm *result);
  |^~~~
In file included from C:/work/xemu/qemu/include/qemu/osdep.h:94,
 from ../softmmu/main.c:25:
C:/CI-Tools/msys64/mingw64/x86_64-w64-mingw32/include/time.h:284:36: note: 
previous definition of 'gmtime_r' was here
  284 | __forceinline struct tm *__CRTDECL gmtime_r(const time_t *_Time, struct 
tm *_Tm) {
  |^~~~
In file included from C:/work/xemu/qemu/include/qemu/osdep.h:119,
 from ../softmmu/main.c:25:
C:/work/xemu/qemu/include/sysemu/os-win32.h:55:12: error: redundant 
redeclaration of 'localtime_r' [-Werror=redundant-decls]
   55 | struct tm *localtime_r(const time_t *timep, struct tm *result);
  |^~~
In file included from C:/work/xemu/qemu/include/qemu/osdep.h:94,
 from ../softmmu/main.c:25:
C:/CI-Tools/msys64/mingw64/x86_64-w64-mingw32/include/time.h:281:36: note: 
previous definition of 'localtime_r' was here
  281 | __forceinline struct tm *__CRTDECL localtime_r(const time_t *_Time, 
struct tm *_Tm) {
  |^~~
Compiling C object libcommon.fa.p/hw_gpio_zaurus.c.obj
In file included from C:/work/xemu/qemu/include/qemu/osdep.h:119,
 from ../hw/i2c/smbus_slave.c:16:
C:/work/xemu/qemu/include/sysemu/os-win32.h:53:12: error: redundant 
redeclaration of 'gmtime_r' [-Werror=redundant-decls]
   53 | struct tm *gmtime_r(const time_t *timep, struct tm *result);
  |^~~~
In file included from C:/work/xemu/qemu/include/qemu/osdep.h:94,
 from ../hw/i2c/smbus_slave.c:16:
C:/CI-Tools/msys64/mingw64/x86_64-w64-mingw32/include/time.h:284:36: note: 
previous definition of 'gmtime_r' was here
  284 | __forceinline struct tm *__CRTDECL gmtime_r(const time_t *_Time, struct 
tm *_Tm) {
  |^~~~
In file included from C:/work/xemu/qemu/include/qemu/osdep.h:119,
 from ../hw/i2c/smbus_slave.c:16:
C:/work/xemu/qemu/include/sysemu/os-win32.h:55:12: error: redundant 
redeclaration of 'localtime_r' [-Werror=redundant-decls]
   55 | struct tm *localtime_r(const time_t *timep, struct tm *result);
  |^~~
In file included from C:/work/xemu/qemu/include/qemu/osdep.h:94,
 from ../hw/i2c/smbus_slave.c:16:
C:/CI-Tools/msys64/mingw64/x86_64-w64-mingw32/include/time.h:281:36: note: 
previous definition of 'localtime_r' was here
  281 | __forceinline struct tm *__CRTDECL localtime_r(const time_t *_Time, 
struct tm *_Tm) {
  |^~~
Compiling C object libcommon.fa.p/hw_dma_xilinx_axidma.c.obj

After this patch, whenever ncursesw or other thirdparty libraries tried to 
define or not
define  _POSIX_C_SOURCE, the source will building properly. Because now, we 
don't make any
assumption if _POSIX_C_SOURCE are defined. We solely relied on if the macro 
`_POSIX_THREAD_SAFE_FUNCTIONS`
are defined in msys2/mingw header.

The _POSIX_THREAD_SAFE_FUNCTIONS are defined in mingw header like this:

```
#if defined(_POSIX_C_SOURCE) && !defined(_POSIX_THREAD_SAFE_FUNCTIONS)
#define _POSIX_THREAD_SAFE_FUNCTIONS 200112L
#endif

#ifdef _POSIX_THREAD_SAFE_FUNCTIONS
__forceinline struct tm *__CRTDECL localtime_r(const time_t *_Time, struct tm 
*_Tm) {
  return localtime_s(_Tm, _Time) ? NULL : _Tm;
}
__forceinline struct tm *__CRTDECL gmtime_r(const time_t *_Time, struct tm 
*_Tm) {
  return gmtime_s(_Tm, _Time) ? NULL : _Tm;
}
__forceinline char *__CRTDECL ctime_r(const time_t *_Time, char *_Str) {
  return ctime_s(_Str, 0x7fff, _Time) ? NULL : _Str;
}
__forceinline char *__CRTDECL asctime_r(const struct tm *_Tm, char * _Str) {
  return asctime_s(_Str, 0x7fff, _Tm) ? NULL : _Str;
}
#endif
```

Signed-off-by: Yonggang Luo 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20201012234348.1427-5-luoyongg...@gmail.com
Signed-off-by: Gerd Hoffmann 
---
 configure | 34 --
 include/sysemu/os-win32.h |  4 ++--
 util/oslib-win32.c|  4 ++--
 3 files changed, 4 insertions(

[PULL 7/9] SDL: enable OpenGL context creation

2020-10-14 Thread Gerd Hoffmann

From: Jan Henrik Weinstock 

We need to specify SDL_WINDOW_OPENGL if we want to create an OpenGL context on 
it, i.e. when using '-device virtio-gpu-pci,virgl=on'

Signed-off-by: Jan Henrik Weinstock 
Message-id: b2ba98b3-2975-0d4d-1c56-f659923c7...@rwth-aachen.de
Signed-off-by: Gerd Hoffmann 
---
 ui/sdl2.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/ui/sdl2.c b/ui/sdl2.c
index abad7f981e50..189d26e2a951 100644
--- a/ui/sdl2.c
+++ b/ui/sdl2.c
@@ -84,6 +84,11 @@ void sdl2_window_create(struct sdl2_console *scon)
 if (scon->hidden) {
 flags |= SDL_WINDOW_HIDDEN;
 }
+#ifdef CONFIG_OPENGL
+if (scon->opengl) {
+flags |= SDL_WINDOW_OPENGL;
+}
+#endif
 
 scon->real_window = SDL_CreateWindow("", SDL_WINDOWPOS_UNDEFINED,
  SDL_WINDOWPOS_UNDEFINED,
-- 
2.27.0

Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

2020-10-14 Thread Paolo Bonzini

On 13/10/20 22:36, harry harry wrote:
> Hi Paolo and Sean,
> 
> Thanks much for your prompt replies and clear explanations.
> 
> On Tue, Oct 13, 2020 at 2:43 AM Paolo Bonzini  wrote:
>>
>> No, the logic to find the HPA with a given HVA is the same as the
>> hardware logic to translate HVA -> HPA.  That is it uses the host
>> "regular" page tables, not the nested page tables.
>>
>> In order to translate GPA to HPA, instead, KVM does not use the nested
>> page tables.
> 
> I am curious why KVM does not directly use GPAs as HVAs and leverage
> nested page tables to translate HVAs (i.e., GPAs) to HPAs?

GPAs and HVAs are different things.  In fact I'm not aware of any
hypervisor that uses HVA==GPA.  On 32-bit x86 systems HVAs are 32-bit
(obviously) but GPAs are 36-bit.

In the case of KVM, HVAs are controlled by the rest of Linux; for
example, when you do "mmap" to allocate guest memory you cannot ask the
OS to return the guest memory at the exact HVA that is needed by the
guest.  There could be something else at that HVA (or you don't want
anything at that HVA: GPA 0 is valid, but HVA 0 is the NULL pointer!).
There's also cases where the same memory appears in multiple places in
the guest memory map (aliasing).

Paolo

Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

2020-10-14 Thread Paolo Bonzini

On 14/10/20 00:40, harry harry wrote:
> Q1: Is there any file like ``/proc/pid/pagemap'' to record the
> mappings between GPAs and HVAs in the host OS?

No, there isn't.

> Q2: Seems that there might be extra overhead (e.g., synchronization
> between EPT tables and host regular page tables; maintaining extra
> regular page tables and data structures), which is caused by the extra
> translation between GPAs to HVAs via memslots. Why doesn't KVM
> directly use GPAs as HVAs and leverage extended/nested page tables to
> translate HVAs (i.e., GPAs) to HPAs?

See my other answer.  What you are saying is simply not possible.

Paolo

Re: Why guest physical addresses are not the same as the corresponding host virtual addresses in QEMU/KVM? Thanks!

2020-10-14 Thread Maxim Levitsky

On Tue, 2020-10-13 at 16:36 -0400, harry harry wrote:
> Hi Paolo and Sean,
> 
> Thanks much for your prompt replies and clear explanations.
> 
> On Tue, Oct 13, 2020 at 2:43 AM Paolo Bonzini  wrote:
> > No, the logic to find the HPA with a given HVA is the same as the
> > hardware logic to translate HVA -> HPA.  That is it uses the host
> > "regular" page tables, not the nested page tables.
> > 
> > In order to translate GPA to HPA, instead, KVM does not use the nested
> > page tables.
> 
> I am curious why KVM does not directly use GPAs as HVAs and leverage
> nested page tables to translate HVAs (i.e., GPAs) to HPAs? Is that
> because 1) the hardware logic of ``GPA -> [extended/nested page
> tables] -> HPA[*]'' is different[**] from the hardware logic of ``HVA
> -> [host regular page tables] -> HPA''; 2) if 1) is true, it is
> natural to reuse Linux's original functionality to translate HVAs to
> HPAs through regular page tables.
I would like to emphisise again. The HVA space is not fully free when a guest 
starts,
since it contains qemu's heap, code, data, and whatever qemu needs. However 
guest't
GPA address space must be allocated fully. E.g if qemu heap starts at 0x4,
then guest can't have physical memory at 0x4 following your suggestion, 
which
is wrong. It can be in theory done by blacklisting these areas via ACPI/BIOS 
provided
memory map, but that would be very very difficult to maintain and not worth it.

Best regards,
Maxim Levitsky

> 
> [*]: Here, the translation means the last step for MMU to translate a
> GVA's corresponding GPA to an HPA through the extended/nested page
> tables.
> [**]: To my knowledge, the hardware logic of ``GPA -> [extended/nested
> page tables] -> HPA'' seems to be the same as the hardware logic of
> ``HVA -> [host regular page tables] -> HPA''. I appreciate it if you
> could point out the differences I ignored. Thanks!
> 
> Best,
> Harry
>

[PATCH v2] hw/block/nvme: add dulbe support

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

This adds support for reporting the Deallocated or Unwritten Logical
Block error (DULBE).

Rely on the block status reported by the block layer. Here, we consider
blocks with the BDRV_BLOCK_ZERO status to deallocated. This is because a
pdiscard or write zeroes with unmap won't result in the block having the
BDRV_BLOCK_ALLOCATED status removed in all cases, but the
BDRV_BLOCK_ZERO status will most likely be set.

This works best with the raw driver and 4K LBAs since holes can then be
punched on a block basis. For a qcow2 backing file, the cluster size is
far larger and block will only be marked with BDRV_BLOCK_ZERO if a full
cluster is zeroes or discarded. This is consistent with the spec since
Write Zeroes "should" deallocate the block if the Deallocate attribute
is set and "may" deallocate if the Deallocate attribute is not set.
Here, we always try to unmap (discard) the block and it may or may not
succeed.

While DSM is not implemented yet, the "advisory" nature of it would also
allow this behavior to be consistent with the spec.

Signed-off-by: Klaus Jensen 
---
v2: rely on bdrv_block_status instead of bitmap tracking (Keith).
---
 hw/block/nvme-ns.h   |  4 +++
 include/block/nvme.h |  5 +++
 hw/block/nvme-ns.c   |  4 +++
 hw/block/nvme.c  | 73 
 4 files changed, 86 insertions(+)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 83734f4606e1..44bf6271b744 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -31,6 +31,10 @@ typedef struct NvmeNamespace {
 NvmeIdNs id_ns;
 
 NvmeNamespaceParams params;
+
+struct {
+uint32_t err_rec;
+} features;
 } NvmeNamespace;
 
 static inline uint32_t nvme_nsid(NvmeNamespace *ns)
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 6de2d5aa75a9..2249d77c2129 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -686,6 +686,7 @@ enum NvmeStatusCodes {
 NVME_E2E_REF_ERROR  = 0x0284,
 NVME_CMP_FAILURE= 0x0285,
 NVME_ACCESS_DENIED  = 0x0286,
+NVME_DULB   = 0x0287,
 NVME_MORE   = 0x2000,
 NVME_DNR= 0x4000,
 NVME_NO_COMPLETE= 0x,
@@ -902,6 +903,9 @@ enum NvmeIdCtrlLpa {
 #define NVME_AEC_NS_ATTR(aec)   ((aec >> 8) & 0x1)
 #define NVME_AEC_FW_ACTIVATION(aec) ((aec >> 9) & 0x1)
 
+#define NVME_ERR_REC_TLER(err_rec)  (err_rec & 0x)
+#define NVME_ERR_REC_DULBE(err_rec) (err_rec & 0x1)
+
 enum NvmeFeatureIds {
 NVME_ARBITRATION= 0x1,
 NVME_POWER_MANAGEMENT   = 0x2,
@@ -1022,6 +1026,7 @@ enum NvmeNsIdentifierType {
 
 
 #define NVME_ID_NS_NSFEAT_THIN(nsfeat)  ((nsfeat & 0x1))
+#define NVME_ID_NS_NSFEAT_DULBE(nsfeat) ((nsfeat >> 2) & 0x1)
 #define NVME_ID_NS_FLBAS_EXTENDED(flbas)((flbas >> 4) & 0x1)
 #define NVME_ID_NS_FLBAS_INDEX(flbas)   ((flbas & 0xf))
 #define NVME_ID_NS_MC_SEPARATE(mc)  ((mc >> 1) & 0x1)
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 31c80cdf5b5f..475c6fe44084 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -44,6 +44,9 @@ static void nvme_ns_init(NvmeNamespace *ns)
 /* no thin provisioning */
 id_ns->ncap = id_ns->nsze;
 id_ns->nuse = id_ns->ncap;
+
+/* support DULBE */
+id_ns->nsfeat |= 0x4;
 }
 
 static int nvme_ns_init_blk(NvmeCtrl *n, NvmeNamespace *ns, Error **errp)
@@ -92,6 +95,7 @@ int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error 
**errp)
 }
 
 nvme_ns_init(ns);
+
 if (nvme_register_namespace(n, ns, errp)) {
 return -1;
 }
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9d30ca69dcf1..b4f47448575b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -105,6 +105,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
 
 static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_TEMPERATURE_THRESHOLD]= NVME_FEAT_CAP_CHANGE,
+[NVME_ERROR_RECOVERY]   = NVME_FEAT_CAP_CHANGE | NVME_FEAT_CAP_NS,
 [NVME_VOLATILE_WRITE_CACHE] = NVME_FEAT_CAP_CHANGE,
 [NVME_NUMBER_OF_QUEUES] = NVME_FEAT_CAP_CHANGE,
 [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
@@ -888,6 +889,36 @@ static inline uint16_t nvme_check_bounds(NvmeCtrl *n, 
NvmeNamespace *ns,
 return NVME_SUCCESS;
 }
 
+static uint16_t nvme_check_dulbe(NvmeNamespace *ns, uint64_t slba,
+ uint32_t nlb)
+{
+BlockDriverState *bs = blk_bs(ns->blkconf.blk);
+
+int64_t pnum = 0, bytes = nvme_l2b(ns, nlb);
+int64_t offset = nvme_l2b(ns, slba);
+int ret;
+
+/*
+ * `pnum` holds the number of bytes after offset that shares the same
+ * allocation status as the byte at offset. If `pnum` is different from
+ * `bytes`, we should check the allocation status of the next range and
+ * continue this until all bytes have been checked.
+ */
+do {
+bytes -= pnum;
+
+ret = bdrv_blo

Re: [PATCH] hw/block/nvme: add block utilization tracking

2020-10-14 Thread Klaus Jensen

On Oct 13 14:06, Keith Busch wrote:
> On Tue, Oct 13, 2020 at 09:08:46PM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > This adds support for reporting the Deallocated or Unwritten Logical
> > Block error (DULBE). This requires tracking the allocated/deallocated
> > status of all logical blocks.
> > 
> > Introduce a bitmap that does this. The bitmap is always intialized to
> > all ones (aka, all blocks are allocated) on boot up. Blocks can then be
> > specifically deallocated using Write Zeroes. This ensures that we can
> > always guarantee zeroes to be returned from deallocated blocks.
> > 
> > When the device gains support for persistent state, the intention is to
> > remove the "allocated by default" behavior.
> 
> I think this is a rather odd feature for this device to support. The
> implementation is slow can end up allocating quite a bit of memory.

Totally true - but the bitmap was the best way I could find to do this
on a per LBA basis.

> If we were going to support it here, wouldn't it make more sense to
> tie it to the filesystem's ability to support fallocate hole punch for
> the backing namespace, and check if the range is allocated with
> SEEK_HOLE?  Then you don't even need to track persistent state, and
> we're actually getting the genuine capability.
> 

Yes, this would be optimal, definitely. I think we have to rely on what
we can do with the QEMU block layer, so please see my v2 that uses
bdrv_block_status. I did do something like this initially, but was
unsure if we could live with the fact that block drivers such as qcow2
cannot honor a discard unless an entire cluster is discard/zeroed.

But see my commit message, I think we can work with it and still be in
compliance with the spec.


signature.asc
Description: PGP signature

Re: [PATCH v1 1/2] fuzz: add virtio-blk fuzz target

2020-10-14 Thread Darren Kenny

Hi Dima,

On Wednesday, 2020-10-14 at 10:39:01 +03, Dima Stepanov wrote:
> On Wed, Oct 14, 2020 at 10:29:41AM +0300, Dima Stepanov wrote:
>> On Tue, Oct 13, 2020 at 11:30:52AM -0400, Alexander Bulekov wrote:
>> > On 201007 1647, Dima Stepanov wrote:
...

>> > 
>> > I sent a patch to disable QEMU's signal handlers for the fuzzer.
>> > Message-Id: <20201013152920.448335-1-alx...@bu.edu>
> Sorry, i couldn't find a patch you've pointed out above. Could you share
> some link to it? Also, am i correct that it is a general change for the
> QEMU fuzzing, so all the fuzzing targets will automatically reuse it?
>

The patch Alex is referring to is:

- https://lore.kernel.org/qemu-devel/20201013152920.448335-1-alx...@bu.edu/

Thanks,

Darren.

Re: [PATCH] fuzz: Disable QEMU's signal handlers

2020-10-14 Thread Darren Kenny

On Tuesday, 2020-10-13 at 17:52:46 +01, Daniel P. Berrangé wrote:
> On Tue, Oct 13, 2020 at 05:50:37PM +0100, Darren Kenny wrote:
>> Hi Alex,
>> 
>> This mentions the use of atexit() to perform some cleanup, but I'm not
>> seeing that being added here, should it be?
>
> The reference to atexit is strange, because it says the only way to
> kill the fuzzer is SIGKILL, and that won't let atexit handlers run
> anyway.
>

OK, I understand the context now, it is in reference to Dima's patchset:

- 
https://lore.kernel.org/qemu-devel/cover.1602078083.git.dimas...@yandex-team.ru/

Where Alex suggested using atexit() to clean up the left over files from
the test.

And with regard to SIGKILL, I believe it is that today, before Alex's
patch that is the only way to stop the fuzzer running - which I can
attest to since I've found it hard to stop in the past :)

Resetting these signal handlers to the default behaviour would allow the
process to be terminated and an atexit() used, as Alex mentioned.

Alex, if you could clarify the commit message, then I feel this does
make sent to change in the fuzz testing code, so:

Reviewed-by: Darren Kenny 

Thanks,

Darren.

>> 
>> Thanks,
>> 
>> Darren.
>> 
>> On Tuesday, 2020-10-13 at 11:29:20 -04, Alexander Bulekov wrote:
>> > With the fuzzer, we never call main_loop_should_exit, since we manually
>> > call main_loop_wait. This means that the only way to terminate the
>> > fuzzer is with SIGKILL. Disable the signal handlers, so there are
>> > reasonable ways to terminate the fuzzer and use atexit() to clean-up
>> > after the fuzzer.
>> >
>> > Signed-off-by: Alexander Bulekov 
>> > ---
>> >  tests/qtest/fuzz/fuzz.c | 8 
>> >  1 file changed, 8 insertions(+)
>> >
>> > diff --git a/tests/qtest/fuzz/fuzz.c b/tests/qtest/fuzz/fuzz.c
>> > index d926c490c5..eb0070437f 100644
>> > --- a/tests/qtest/fuzz/fuzz.c
>> > +++ b/tests/qtest/fuzz/fuzz.c
>> > @@ -217,5 +217,13 @@ int LLVMFuzzerInitialize(int *argc, char ***argv, 
>> > char ***envp)
>> >  /* re-enable the rcu atfork, which was previously disabled in 
>> > qemu_init */
>> >  rcu_enable_atfork();
>> >  
>> > +/*
>> > + * Disable QEMU's signal handlers, since we manually control the 
>> > main_loop,
>> > + * and don't check for main_loop_should_exit
>> > + */
>> > +signal(SIGINT, SIG_DFL);
>> > +signal(SIGHUP, SIG_DFL);
>> > +signal(SIGTERM, SIG_DFL);
>> > +
>> >  return 0;
>> >  }
>> > -- 
>> > 2.28.0
>> 
>
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v5 2/2] hw/arm/sbsa-ref: add SBSA watchdog device

2020-10-14 Thread Graeme Gregory

On Tue, Oct 13, 2020 at 11:16:31AM -0400, Shashi Mallela wrote:
> Included the newly implemented SBSA generic watchdog device model into
> SBSA platform
> 
> Signed-off-by: Shashi Mallela 
> ---
>  hw/arm/sbsa-ref.c | 50 +++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/hw/arm/sbsa-ref.c b/hw/arm/sbsa-ref.c
> index 9c3a893bedfd..97ed41607119 100644
> --- a/hw/arm/sbsa-ref.c
> +++ b/hw/arm/sbsa-ref.c
> @@ -30,6 +30,7 @@
>  #include "exec/hwaddr.h"
>  #include "kvm_arm.h"
>  #include "hw/arm/boot.h"
> +#include "hw/arm/fdt.h"
>  #include "hw/block/flash.h"
>  #include "hw/boards.h"
>  #include "hw/ide/internal.h"
> @@ -40,6 +41,7 @@
>  #include "hw/qdev-properties.h"
>  #include "hw/usb.h"
>  #include "hw/char/pl011.h"
> +#include "hw/watchdog/wdt_sbsa_gwdt.h"
>  #include "net/net.h"
>  #include "qom/object.h"
>  
> @@ -64,6 +66,9 @@ enum {
>  SBSA_GIC_DIST,
>  SBSA_GIC_REDIST,
>  SBSA_SECURE_EC,
> +SBSA_GWDT,
> +SBSA_GWDT_REFRESH,
> +SBSA_GWDT_CONTROL,
>  SBSA_SMMU,
>  SBSA_UART,
>  SBSA_RTC,
> @@ -104,6 +109,8 @@ static const MemMapEntry sbsa_ref_memmap[] = {
>  [SBSA_GIC_DIST] =   { 0x4006, 0x0001 },
>  [SBSA_GIC_REDIST] = { 0x4008, 0x0400 },
>  [SBSA_SECURE_EC] =  { 0x5000, 0x1000 },
> +[SBSA_GWDT_REFRESH] =   { 0x5001, 0x1000 },
> +[SBSA_GWDT_CONTROL] =   { 0x50011000, 0x1000 },
>  [SBSA_UART] =   { 0x6000, 0x1000 },
>  [SBSA_RTC] ={ 0x6001, 0x1000 },
>  [SBSA_GPIO] =   { 0x6002, 0x1000 },
> @@ -133,6 +140,8 @@ static const int sbsa_ref_irqmap[] = {
>  [SBSA_SECURE_UART_MM] = 9,
>  [SBSA_AHCI] = 10,
>  [SBSA_EHCI] = 11,
> +[SBSA_SMMU] = 12, /* ... to 15 */
> +[SBSA_GWDT] = 16,
>  };
>  
>  static uint64_t sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
> @@ -141,6 +150,30 @@ static uint64_t 
> sbsa_ref_cpu_mp_affinity(SBSAMachineState *sms, int idx)
>  return arm_cpu_mp_affinity(idx, clustersz);
>  }
>  
> +static void create_wdt_fdt(SBSAMachineState *sms)
> +{
> +char *nodename;
> +const char compat[] = "arm,sbsa-gwdt";
> +
> +hwaddr rbase = sbsa_ref_memmap[SBSA_GWDT_REFRESH].base;
> +hwaddr cbase = sbsa_ref_memmap[SBSA_GWDT_CONTROL].base;
> +int irq = sbsa_ref_irqmap[SBSA_GWDT];
> +
> +nodename = g_strdup_printf("/watchdog@%" PRIx64, rbase);
> +qemu_fdt_add_subnode(sms->fdt, nodename);
> +
> +qemu_fdt_setprop(sms->fdt, nodename, "compatible",
> + compat, sizeof(compat));
> +qemu_fdt_setprop_sized_cells(sms->fdt, nodename, "reg",
> + 2, rbase, 2, SBSA_GWDT_RMMIO_SIZE,
> + 2, cbase, 2, SBSA_GWDT_CMMIO_SIZE);
> +qemu_fdt_setprop_cells(sms->fdt, nodename, "interrupts",
> +GIC_FDT_IRQ_TYPE_PPI, irq,
> +GIC_FDT_IRQ_FLAGS_LEVEL_HI);
> +qemu_fdt_setprop_cell(sms->fdt, nodename, "timeout-sec", 30);
> +g_free(nodename);
> +}
> +

Is this actually used anywhere? I ask because SBSA-ref is not a FDT
booting machine and only uses FDT to transfer some dynamic info to
arm-tf/edk2 and is not a full description tree. Your ACPI patch in
edk2 certainly does not use this info.

Graeme


>  /*
>   * Firmware on this machine only uses ACPI table to load OS, these limited
>   * device tree nodes are just to let firmware know the info which varies from
> @@ -219,6 +252,7 @@ static void create_fdt(SBSAMachineState *sms)
>  
>  g_free(nodename);
>  }
> +create_wdt_fdt(sms);
>  }
>  
>  #define SBSA_FLASH_SECTOR_SIZE (256 * KiB)
> @@ -447,6 +481,20 @@ static void create_rtc(const SBSAMachineState *sms)
>  sysbus_create_simple("pl031", base, qdev_get_gpio_in(sms->gic, irq));
>  }
>  
> +static void create_wdt(const SBSAMachineState *sms)
> +{
> +hwaddr rbase = sbsa_ref_memmap[SBSA_GWDT_REFRESH].base;
> +hwaddr cbase = sbsa_ref_memmap[SBSA_GWDT_CONTROL].base;
> +DeviceState *dev = qdev_new(TYPE_WDT_SBSA_GWDT);
> +SysBusDevice *s = SYS_BUS_DEVICE(dev);
> +int irq = sbsa_ref_irqmap[SBSA_GWDT];
> +
> +sysbus_realize_and_unref(s, &error_fatal);
> +sysbus_mmio_map(s, 0, rbase);
> +sysbus_mmio_map(s, 1, cbase);
> +sysbus_connect_irq(s, 0, qdev_get_gpio_in(sms->gic, irq));
> +}
> +
>  static DeviceState *gpio_key_dev;
>  static void sbsa_ref_powerdown_req(Notifier *n, void *opaque)
>  {
> @@ -730,6 +778,8 @@ static void sbsa_ref_init(MachineState *machine)
>  
>  create_rtc(sms);
>  
> +create_wdt(sms);
> +
>  create_gpio(sms);
>  
>  create_ahci(sms);
> -- 
> 2.18.4
> 
>

Re: [PATCH v4 2/3] qga: add implementation of guest-get-disks for Linux

2020-10-14 Thread Tomáš Golembiovský

On Tue, Oct 13, 2020 at 11:26:45AM +0400, Marc-André Lureau wrote:
> Hi
> 
> On Mon, Oct 12, 2020 at 12:36 PM Tomáš Golembiovský 
> wrote:
> 
> > The command lists all disks (real and virtual) as well as disk
> > partitions. For each disk the list of dependent disks is also listed and
> > /dev path is used as a handle so it can be matched with "name" field of
> > other returned disk entries. For disk partitions the "dependents" list
> > is populated with the the parent device for easier tracking of
> > hierarchy.
> >
> > Example output:
> > {
> >   "return": [
> > ...
> > {
> >   "name": "/dev/dm-0",
> >   "partition": false,
> >   "dependents": [
> > "/dev/sda2"
> >   ],
> >   "alias": "luks-7062202e-5b9b-433e-81e8-6628c40da9f7"
> > },
> > {
> >   "name": "/dev/sda2",
> >   "partition": true,
> >   "dependents": [
> > "/dev/sda"
> >   ]
> > },
> > {
> >   "name": "/dev/sda",
> >   "partition": false,
> >   "address": {
> > "serial": "SAMSUNG_MZ7LN512HCHP-000L1_S1ZKNXAG822493",
> > "bus-type": "sata",
> > ...
> > "dev": "/dev/sda",
> > "target": 0
> >   },
> >   "dependents": []
> > },
> > ...
> >   ]
> > }
> >
> > Signed-off-by: Tomáš Golembiovský 
> > ---
> >  qga/commands-posix.c | 296 +--
> >  1 file changed, 285 insertions(+), 11 deletions(-)
> >
> > diff --git a/qga/commands-posix.c b/qga/commands-posix.c
> > index 422144bcff..14683dfbd5 100644
> > --- a/qga/commands-posix.c
> > +++ b/qga/commands-posix.c
> > @@ -1150,13 +1150,27 @@ static void
> > build_guest_fsinfo_for_virtual_device(char const *syspath,
> >  closedir(dir);
> >  }
> >
> > +static bool is_disk_virtual(const char *devpath, Error **errp)
> > +{
> > +g_autofree char *syspath = realpath(devpath, NULL);
> > +
> > +if (!syspath) {
> > +error_setg_errno(errp, errno, "realpath(\"%s\")", devpath);
> > +return false;
> > +}
> > +return strstr(syspath, "/devices/virtual/block/") != NULL;
> > +}
> > +
> >  /* Dispatch to functions for virtual/real device */
> >  static void build_guest_fsinfo_for_device(char const *devpath,
> >GuestFilesystemInfo *fs,
> >Error **errp)
> >  {
> > -char *syspath = realpath(devpath, NULL);
> > +ERRP_GUARD();
> > +g_autofree char *syspath = NULL;
> > +bool is_virtual = false;
> >
> > +syspath = realpath(devpath, NULL);
> >  if (!syspath) {
> >  error_setg_errno(errp, errno, "realpath(\"%s\")", devpath);
> >  return;
> > @@ -1167,16 +1181,281 @@ static void build_guest_fsinfo_for_device(char
> > const *devpath,
> >  }
> >
> >  g_debug("  parse sysfs path '%s'", syspath);
> > -
> > -if (strstr(syspath, "/devices/virtual/block/")) {
> > +is_virtual = is_disk_virtual(syspath, errp);
> > +if (*errp != NULL) {
> > +return;
> > +}
> > +if (is_virtual) {
> >  build_guest_fsinfo_for_virtual_device(syspath, fs, errp);
> >  } else {
> >  build_guest_fsinfo_for_real_device(syspath, fs, errp);
> >  }
> > +}
> > +
> > +#ifdef CONFIG_LIBUDEV
> > +
> > +/*
> > + * Wrapper around build_guest_fsinfo_for_device() for getting just
> > + * the disk address.
> > + */
> > +static GuestDiskAddress *get_disk_address(const char *syspath, Error
> > **errp)
> > +{
> > +g_autoptr(GuestFilesystemInfo) fs = NULL;
> > +
> > +fs = g_new0(GuestFilesystemInfo, 1);
> > +build_guest_fsinfo_for_device(syspath, fs, errp);
> > +if (fs->disk != NULL) {
> > +return g_steal_pointer(&fs->disk->value);
> > +}
> > +return NULL;
> > +}
> > +
> > +static char *get_alias_for_syspath(const char *syspath)
> > +{
> > +struct udev *udev = NULL;
> > +struct udev_device *udevice = NULL;
> > +char *ret = NULL;
> > +
> > +udev = udev_new();
> > +if (udev == NULL) {
> > +g_debug("failed to query udev");
> > +goto out;
> > +}
> > +udevice = udev_device_new_from_syspath(udev, syspath);
> > +if (udevice == NULL) {
> > +g_debug("failed to query udev for path: %s", syspath);
> > +goto out;
> > +} else {
> > +const char *alias = udev_device_get_property_value(
> > +udevice, "DM_NAME");
> > +/*
> > + * NULL means there was an error and empty string means there is
> > no
> > + * alias. In case of no alias we return NULL instead of empty
> > string.
> > + */
> > +if (alias == NULL) {
> > +g_debug("failed to query udev for device alias for: %s",
> > +syspath);
> > +} else if (*alias != 0) {
> > +ret = g_strdup(alias);
> > +}
> > +}
> > +
> > +out:
> > +udev_unref(udev);
> > +udev_device_unref(udevice);
> > +return ret;
> > +}
> > +
> > +static char *get_

Re: [PATCH v2] migration/block-dirty-bitmap: fix uninitialized variable warning

2020-10-14 Thread Max Reitz

On 14.10.20 03:03, Chenqun (kuhn) wrote:
> 
> 
>> -Original Message-
>> From: Max Reitz [mailto:mre...@redhat.com]
>> Sent: Tuesday, October 13, 2020 10:47 PM
>> To: Chenqun (kuhn) ; qemu-devel@nongnu.org;
>> qemu-triv...@nongnu.org
>> Cc: vsement...@virtuozzo.com; stefa...@redhat.com; f...@euphon.net;
>> ebl...@redhat.com; js...@redhat.com; quint...@redhat.com;
>> dgilb...@redhat.com; Zhanghailiang ;
>> ganqixin ; qemu-bl...@nongnu.org; Euler Robot
>> ; Laurent Vivier ; Li Qiang
>> 
>> Subject: Re: [PATCH v2] migration/block-dirty-bitmap: fix uninitialized 
>> variable
>> warning
>>
>> On 13.10.20 14:33, Chen Qun wrote:
>>> A default value is provided for the variable 'bitmap_name' to avoid compiler
>> warning.
>>>
>>> The compiler show warning:
>>> migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’
>>> may be used uninitialized in this function [-Wmaybe-uninitialized]
>>>g_strlcpy(s->bitmap_name, bitmap_name,
>> sizeof(s->bitmap_name));
>>>
>> ^~
>>>
>>> Reported-by: Euler Robot 
>>> Signed-off-by: Chen Qun 
>>> ---
>>> Cc: Vladimir Sementsov-Ogievskiy 
>>> Cc: Laurent Vivier 
>>> Cc: Li Qiang 
>>> ---
>>>  migration/block-dirty-bitmap.c | 9 ++---
>>>  1 file changed, 2 insertions(+), 7 deletions(-)
>>
>> No objections, semantically, but...
>>
>>> diff --git a/migration/block-dirty-bitmap.c
>>> b/migration/block-dirty-bitmap.c index 5bef793ac0..bcb79c04ce 100644
>>> --- a/migration/block-dirty-bitmap.c
>>> +++ b/migration/block-dirty-bitmap.c
>>> @@ -1064,15 +1064,13 @@ static int dirty_bitmap_load_header(QEMUFile
>> *f, DBMLoadState *s,
>>>  assert(nothing || s->cancelled || !!alias_map ==
>>> !!bitmap_alias_map);
>>>
>>>  if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
>>> -const char *bitmap_name;
>>> -
>>>  if (!qemu_get_counted_string(f, s->bitmap_alias)) {
>>>  error_report("Unable to read bitmap alias string");
>>>  return -EINVAL;
>>>  }
>>>
>>> -if (!s->cancelled) {
>>> -if (bitmap_alias_map) {
>>> +const char *bitmap_name = s->bitmap_alias;
>>
>> qemu’s coding style mandates declarations to be placed at the beginning of
>> their block, so the declaration has to stay where it is.  (Putting the 
>> assignment
>> here looks reasonable.)
>>
> Hi Max,
>   Declaration variables here to ensure that the above exceptions(Unable to 
> read bitmap alias string) are avoided.
> If the declaration has to stay where it is, is there a possibility that the 
> assignment fails?

I don’t understand what you mean.  A declaration without initialization
isn’t and doesn’t contain an expression, it isn’t even a statement, so
it has no side effects.[1]

Placing the declaration (without an initialization) at the top of the
block makes no semantic difference.

(As I said, I’d keep the assignment “bitmap_name = s->bitmap_alias”
where you put it.  I think it would technically actually be correct to
put it into the declaration at the start of the block as an initializer,
but that would look weird.)

Max

[1] I suppose exceptions apply for types with constructors, but those
don’t exist in plain C.



signature.asc
Description: OpenPGP digital signature

RE: [PATCH V2] target/riscv: raise exception to HS-mode at get_physical_address

2020-10-14 Thread Jiangyifei


> -Original Message-
> From: Richard Henderson [mailto:richard.hender...@linaro.org]
> Sent: Friday, October 9, 2020 10:34 PM
> To: Jiangyifei ; qemu-devel@nongnu.org;
> qemu-ri...@nongnu.org
> Cc: Zhanghailiang ;
> sag...@eecs.berkeley.edu; kbast...@mail.uni-paderborn.de; Zhangxiaofeng
> (F) ; alistair.fran...@wdc.com; yinyipeng
> ; pal...@dabbelt.com; Wubin (H)
> ; dengkai (A) 
> Subject: Re: [PATCH V2] target/riscv: raise exception to HS-mode at
> get_physical_address
> 
> On 10/9/20 2:57 AM, Yifei Jiang wrote:
> >  #define TRANSLATE_FAIL 1
> >  #define TRANSLATE_SUCCESS 0
> >  #define MMU_USER_IDX 3
> > +#define TRANSLATE_G_STAGE_FAIL 4
> 
> Note that you're interleaving TRANSLATE_* around an unrelated define.
> Perhaps rearrange to
> 
> enum {
> TRANSLATE_SUCCESS = 0,
> TRANSLATE_FAIL,
> TRANSLATE_PMP_FAIL,
> TRANSLATE_G_STAGE_FAIL,
> };
> 

OK

> 
> > +++ b/target/riscv/cpu_helper.c
> > @@ -451,7 +451,10 @@ restart:
> >   mmu_idx,
> false,
> > true);
> >
> >  if (vbase_ret != TRANSLATE_SUCCESS) {
> > -return vbase_ret;
> > +env->guest_phys_fault_addr = (base |
> > +  (addr &
> > +
> (TARGET_PAGE_SIZE - 1))) >> 2;
> > +return TRANSLATE_G_STAGE_FAIL;
> >  }
> 
> I don't think you can make this change to cpu state, as this function is also 
> used
> by gdb.  I think you'll need to add a new (target_ulong *) parameter to
> get_physical_address to return this.
> 
> The usage in riscv_cpu_tlb_fill could pass &env->guest_phys_fault_addr, and
> the usage in riscv_cpu_get_phys_page_debug could pass the address of a local
> variable (which it then ignores).
> 

OK

> Also, isn't the offset more naturally written idx * ptesize, as seen just a 
> few
> lines below?

OK

> 
> > +if (ret != TRANSLATE_FAIL && ret != TRANSLATE_G_STAGE_FAIL) {
> 
> Should this not be ret == TRANSLATE_SUCCESS?
> This looks buggy with TRANSLATE_PMP_FAIL...

On TRANSLATE_PMP_FAIL, it should not execute G-stage translation.
So I think it is ok for 'ret == TRANSLATE_SUCCESS'

I will send V3.

Yifei

> 
> 
> r~

Re: [RFC v1 0/2] tcg-cpus: split into 3 tcg variants

2020-10-14 Thread Alex Bennée



Claudio Fontana  writes:

> The purpose of this series is to split the tcg-cpus into
> 3 variants:
>
> tcg_cpus_mttcg(multithreaded tcg vcpus)
> tcg_cpus_rr   (single threaded round robin vcpus)
> tcg_cpus_icount   (same as RR, but using icount)

I've no objection to the cosmetic clean-up but I assume the 3 modes will
still be available in TCG enabled binaries.

>
> Alex, I read the comment in tcg_start_vcpu_thread saying:
>
> /*
>  * Initialize TCG regions--once. Now is a good time, because:
>  * (1) TCG's init context, prologue and target globals have been set up.
>  * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
>  * -accel flag is processed, so the check doesn't work then).
>  */
>
> Is this actually current?

Hmm probably not. Now everything is tied to the order of class
initialisation and realisation. AIUI all properties set by the command
line should be complete by the time an object realizes and parent
classes should be processed before their children.

>
> I tried to refactor this (see patch 2), and it seems to work to do
> the init of regions in tcg_init, and it seems that mttcg_enabled is known
> already at that point..
>
> Ciao,
>
> Claudio
>
> Claudio Fontana (2):
>   accel/tcg: split CpusAccel into three TCG variants
>   accel/tcg: split tcg_start_vcpu_thread
>
>  accel/tcg/meson.build   |   9 +-
>  accel/tcg/tcg-all.c |  13 +-
>  accel/tcg/tcg-cpus-icount.c | 145 +++
>  accel/tcg/tcg-cpus-icount.h |  20 ++
>  accel/tcg/tcg-cpus-mttcg.c  | 142 ++
>  accel/tcg/tcg-cpus-mttcg.h  |  25 ++
>  accel/tcg/tcg-cpus-rr.c | 305 ++
>  accel/tcg/tcg-cpus-rr.h |  26 ++
>  accel/tcg/tcg-cpus.c| 500 +---
>  accel/tcg/tcg-cpus.h|   9 +-
>  softmmu/icount.c|   2 +-
>  11 files changed, 697 insertions(+), 499 deletions(-)
>  create mode 100644 accel/tcg/tcg-cpus-icount.c
>  create mode 100644 accel/tcg/tcg-cpus-icount.h
>  create mode 100644 accel/tcg/tcg-cpus-mttcg.c
>  create mode 100644 accel/tcg/tcg-cpus-mttcg.h
>  create mode 100644 accel/tcg/tcg-cpus-rr.c
>  create mode 100644 accel/tcg/tcg-cpus-rr.h


-- 
Alex Bennée

[PATCH V3] target/riscv: raise exception to HS-mode at get_physical_address

2020-10-14 Thread Yifei Jiang

VS-stage translation at get_physical_address needs to translate pte
address by G-stage translation. But the G-stage translation error
can not be distinguished from VS-stage translation error in
riscv_cpu_tlb_fill. On migration, destination needs to rebuild pte,
and this G-stage translation error must be handled by HS-mode. So
introduce TRANSLATE_STAGE2_FAIL so that riscv_cpu_tlb_fill could
distinguish and raise it to HS-mode.

Signed-off-by: Yifei Jiang 
Signed-off-by: Yipeng Yin 
---
 target/riscv/cpu.h| 10 +++---
 target/riscv/cpu_helper.c | 35 ++-
 2 files changed, 33 insertions(+), 12 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index de275782e6..de4705bb57 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -82,9 +82,13 @@ enum {
 
 #define VEXT_VERSION_0_07_1 0x0701
 
-#define TRANSLATE_PMP_FAIL 2
-#define TRANSLATE_FAIL 1
-#define TRANSLATE_SUCCESS 0
+enum {
+TRANSLATE_SUCCESS,
+TRANSLATE_FAIL,
+TRANSLATE_PMP_FAIL,
+TRANSLATE_G_STAGE_FAIL
+};
+
 #define MMU_USER_IDX 3
 
 #define MAX_RISCV_PMPS (16)
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 904899054d..ae66722d32 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -316,6 +316,8 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong 
newpriv)
  * @physical: This will be set to the calculated physical address
  * @prot: The returned protection attributes
  * @addr: The virtual address to be translated
+ * @fault_pte_addr: If not NULL, this will be set to fault pte address
+ *  when a error occurs on pte address translation.
  * @access_type: The type of MMU access
  * @mmu_idx: Indicates current privilege level
  * @first_stage: Are we in first stage translation?
@@ -324,6 +326,7 @@ void riscv_cpu_set_mode(CPURISCVState *env, target_ulong 
newpriv)
  */
 static int get_physical_address(CPURISCVState *env, hwaddr *physical,
 int *prot, target_ulong addr,
+target_ulong *fault_pte_addr,
 int access_type, int mmu_idx,
 bool first_stage, bool two_stage)
 {
@@ -447,11 +450,14 @@ restart:
 
 /* Do the second stage translation on the base PTE address. */
 int vbase_ret = get_physical_address(env, &vbase, &vbase_prot,
- base, MMU_DATA_LOAD,
+ base, NULL, MMU_DATA_LOAD,
  mmu_idx, false, true);
 
 if (vbase_ret != TRANSLATE_SUCCESS) {
-return vbase_ret;
+if (fault_pte_addr) {
+*fault_pte_addr = (base + idx * ptesize) >> 2;
+}
+return TRANSLATE_G_STAGE_FAIL;
 }
 
 pte_addr = vbase + idx * ptesize;
@@ -632,13 +638,13 @@ hwaddr riscv_cpu_get_phys_page_debug(CPUState *cs, vaddr 
addr)
 int prot;
 int mmu_idx = cpu_mmu_index(&cpu->env, false);
 
-if (get_physical_address(env, &phys_addr, &prot, addr, 0, mmu_idx,
+if (get_physical_address(env, &phys_addr, &prot, addr, NULL, 0, mmu_idx,
  true, riscv_cpu_virt_enabled(env))) {
 return -1;
 }
 
 if (riscv_cpu_virt_enabled(env)) {
-if (get_physical_address(env, &phys_addr, &prot, phys_addr,
+if (get_physical_address(env, &phys_addr, &prot, phys_addr, NULL,
  0, mmu_idx, false, true)) {
 return -1;
 }
@@ -727,19 +733,30 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int 
size,
 if (riscv_cpu_virt_enabled(env) ||
 (riscv_cpu_two_stage_lookup(env) && access_type != MMU_INST_FETCH)) {
 /* Two stage lookup */
-ret = get_physical_address(env, &pa, &prot, address, access_type,
+ret = get_physical_address(env, &pa, &prot, address,
+   &env->guest_phys_fault_addr, access_type,
mmu_idx, true, true);
 
+/*
+ * A G-stage exception may be triggered during two state lookup.
+ * And the env->guest_phys_fault_addr has already been set in
+ * get_physical_address().
+ */
+if (ret == TRANSLATE_G_STAGE_FAIL) {
+first_stage_error = false;
+access_type = MMU_DATA_LOAD;
+}
+
 qemu_log_mask(CPU_LOG_MMU,
   "%s 1st-stage address=%" VADDR_PRIx " ret %d physical "
   TARGET_FMT_plx " prot %d\n",
   __func__, address, ret, pa, prot);
 
-if (ret != TRANSLATE_FAIL) {
+if (ret == TRANSLATE_SUCCESS) {
 /* Second stage lookup */
 im_address = pa;
 
-ret = get_physical_address(env, &pa, &prot2, im_address,
+ret = get_physical_address(env

Re: [RFC PATCH 3/3] target/mips: Make the number of TLB entries a CPU property

2020-10-14 Thread Jiaxun Yang





在 2020/10/13 21:25, Philippe Mathieu-Daudé 写道:

Allow changing the number of TLB entries for
testing/tunning purpose.

Example to force a 34Kf cpu with 64 TLB:

   $ qemu-system-mipsel -cpu 34Kf,tlb-entries=64 ...

This is helpful for developers of the Yocto Project [*]:

   Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit
   MIPS CI loop. It was observed that in this case CI test execution
   time was almost twice longer than 64bit MIPS variant that runs
   under MIPS64R2-generic model. It was investigated and concluded
   that the difference in number of TLBs 16 in 34Kf case vs 64 in
   MIPS64R2-generic is responsible for most of CI real time execution
   difference. Because with 16 TLBs linux user-land trashes TLB more
   and it needs to execute more instructions in TLB refill handler
   calls, as result it runs much longer.

[*] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html

Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
Reported-by: Victor Kamensky 
Signed-off-by: Philippe Mathieu-Daudé 
---

Hi Philippe,

I think name can this property vtlb-entries?

MIPS R2 had introduced dual-TLB feature and the entries specified here
is the number of VTLB, while FTLB is another set of entries with fixed 
pagesize.


Although MIPS TCG haven't implemented dual-TLB but it can prevent future
confusion.

Thanks.

- Jiaxun

Re: [RFC v1 0/2] tcg-cpus: split into 3 tcg variants

2020-10-14 Thread Claudio Fontana

On 10/14/20 12:14 PM, Alex Bennée wrote:
> 
> Claudio Fontana  writes:
> 
>> The purpose of this series is to split the tcg-cpus into
>> 3 variants:
>>
>> tcg_cpus_mttcg(multithreaded tcg vcpus)
>> tcg_cpus_rr   (single threaded round robin vcpus)
>> tcg_cpus_icount   (same as RR, but using icount)
> 
> I've no objection to the cosmetic clean-up but I assume the 3 modes will
> still be available in TCG enabled binaries.

Hi Alex,

yes, all three will be available in the code when enabling TCG.

> 
>>
>> Alex, I read the comment in tcg_start_vcpu_thread saying:
>>
>> /*
>>  * Initialize TCG regions--once. Now is a good time, because:
>>  * (1) TCG's init context, prologue and target globals have been set up.
>>  * (2) qemu_tcg_mttcg_enabled() works now (TCG init code runs before the
>>  * -accel flag is processed, so the check doesn't work then).
>>  */
>>
>> Is this actually current?
> 
> Hmm probably not. Now everything is tied to the order of class
> initialisation and realisation. AIUI all properties set by the command
> line should be complete by the time an object realizes and parent
> classes should be processed before their children.

This is great news, as it allows more refactoring as showed on patch 2,

thanks a lot!

Ciao,

Claudio

> 
>>
>> I tried to refactor this (see patch 2), and it seems to work to do
>> the init of regions in tcg_init, and it seems that mttcg_enabled is known
>> already at that point..
>>
>> Ciao,
>>
>> Claudio
>>
>> Claudio Fontana (2):
>>   accel/tcg: split CpusAccel into three TCG variants
>>   accel/tcg: split tcg_start_vcpu_thread
>>
>>  accel/tcg/meson.build   |   9 +-
>>  accel/tcg/tcg-all.c |  13 +-
>>  accel/tcg/tcg-cpus-icount.c | 145 +++
>>  accel/tcg/tcg-cpus-icount.h |  20 ++
>>  accel/tcg/tcg-cpus-mttcg.c  | 142 ++
>>  accel/tcg/tcg-cpus-mttcg.h  |  25 ++
>>  accel/tcg/tcg-cpus-rr.c | 305 ++
>>  accel/tcg/tcg-cpus-rr.h |  26 ++
>>  accel/tcg/tcg-cpus.c| 500 +---
>>  accel/tcg/tcg-cpus.h|   9 +-
>>  softmmu/icount.c|   2 +-
>>  11 files changed, 697 insertions(+), 499 deletions(-)
>>  create mode 100644 accel/tcg/tcg-cpus-icount.c
>>  create mode 100644 accel/tcg/tcg-cpus-icount.h
>>  create mode 100644 accel/tcg/tcg-cpus-mttcg.c
>>  create mode 100644 accel/tcg/tcg-cpus-mttcg.h
>>  create mode 100644 accel/tcg/tcg-cpus-rr.c
>>  create mode 100644 accel/tcg/tcg-cpus-rr.h
> 
>

RE: [PATCH V2 1/5] target/riscv: Add basic vmstate description of CPU

2020-10-14 Thread Jiangyifei


> -Original Message-
> From: Richard Henderson [mailto:richard.hender...@linaro.org]
> Sent: Saturday, October 10, 2020 9:23 PM
> To: Jiangyifei ; qemu-devel@nongnu.org;
> qemu-ri...@nongnu.org
> Cc: pal...@dabbelt.com; alistair.fran...@wdc.com;
> sag...@eecs.berkeley.edu; kbast...@mail.uni-paderborn.de; Zhangxiaofeng
> (F) ; Wubin (H) ;
> Zhanghailiang ; dengkai (A)
> ; yinyipeng 
> Subject: Re: [PATCH V2 1/5] target/riscv: Add basic vmstate description of CPU
> 
> On 10/10/20 3:06 AM, Yifei Jiang wrote:
> > +++ b/target/riscv/cpu.h
> > @@ -311,6 +311,10 @@ extern const char * const riscv_fpr_regnames[];
> > extern const char * const riscv_excp_names[];  extern const char *
> > const riscv_intr_names[];
> >
> > +#ifndef CONFIG_USER_ONLY
> > +extern const VMStateDescription vmstate_riscv_cpu; #endif
> > +
> 
> This is not part of the public interface to RISCVCPU, so it should go in
> internals.h.
> 
> Not that there aren't other things in cpu.h that don't belong.  No target has
> been perfect in differentiating what's interface and what's implementation.
> 

Yes, I think it should go in internals.h, although most architectures declare 
it in cpu.h.
I would move it to internals.h in the next series.

> > +
> > +#ifdef TARGET_RISCV32
> > +VMSTATE_UINTTL(env.mstatush, RISCVCPU), #endif
> 
> Would this be a good time to expand mstatus to uint64_t instead of
> target_ulong so that it can be saved as one unit and reduce some ifdefs in the
> code base?
> 
> Similarly with some of the other status registers that are two halved for
> riscv32.

I agree with you that it should be rearranged.
But I hope this series will focus on achieving migration.
Can I send another patch to rearrange it later?

Yifei
> 
> 
> r~

[PATCH] usb/hcd-ehci: Fix error handling on missing device for iTD

2020-10-14 Thread Anthony PERARD via

The EHCI Host Controller emulation attempt to locate the device
associated with a periodic isochronous transfer description (iTD) and
when this fail the host controller is reset.

But according the EHCI spec 1.0 section 5.15.2.4 Host System
Error, the host controller is supposed to reset itself only when it
failed to communicate with the Host (Operating System), like when
there's an error on the PCI bus. If a transaction fails, there's
nothing in the spec that say to reset the host controller.

This patch rework the error path so that the host controller can keep
working when the OS setup a bogus transaction, it also revert to the
behavior of the EHCI emulation to before commits:
e94682f1fe ("ehci: check device is not NULL before calling usb_ep_get()")
7011baece2 ("usb: remove unnecessary NULL device check from usb_ep_get()")

The issue has been found while trying to passthrough a USB device to a
Windows Server 2012 Xen guest via "usb-ehci", which prevent the USB
device from working in Windows. ("usb-ehci" alone works, windows only
setup this weird periodic iTD to device 127 endpoint 15 when the USB
device is passthrough.)

Signed-off-by: Anthony PERARD 
---

CCing the author of e94682f1fe which changed the behavior of EHCI
emulation.
Cc: Liam Merwick 
---
 hw/usb/hcd-ehci.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 0b5534ac3a32..b46df501ff63 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1447,24 +1447,25 @@ static int ehci_process_itd(EHCIState *ehci,
 dev = ehci_find_device(ehci, devaddr);
 if (dev == NULL) {
 ehci_trace_guest_bug(ehci, "no device found");
-qemu_sglist_destroy(&ehci->isgl);
-return -1;
-}
-pid = dir ? USB_TOKEN_IN : USB_TOKEN_OUT;
-ep = usb_ep_get(dev, pid, endp);
-if (ep && ep->type == USB_ENDPOINT_XFER_ISOC) {
-usb_packet_setup(&ehci->ipacket, pid, ep, 0, addr, false,
- (itd->transact[i] & ITD_XACT_IOC) != 0);
-if (usb_packet_map(&ehci->ipacket, &ehci->isgl)) {
-qemu_sglist_destroy(&ehci->isgl);
-return -1;
-}
-usb_handle_packet(dev, &ehci->ipacket);
-usb_packet_unmap(&ehci->ipacket, &ehci->isgl);
-} else {
-DPRINTF("ISOCH: attempt to addess non-iso endpoint\n");
-ehci->ipacket.status = USB_RET_NAK;
+ehci->ipacket.status = USB_RET_NODEV;
 ehci->ipacket.actual_length = 0;
+} else {
+pid = dir ? USB_TOKEN_IN : USB_TOKEN_OUT;
+ep = usb_ep_get(dev, pid, endp);
+if (ep && ep->type == USB_ENDPOINT_XFER_ISOC) {
+usb_packet_setup(&ehci->ipacket, pid, ep, 0, addr, false,
+ (itd->transact[i] & ITD_XACT_IOC) != 0);
+if (usb_packet_map(&ehci->ipacket, &ehci->isgl)) {
+qemu_sglist_destroy(&ehci->isgl);
+return -1;
+}
+usb_handle_packet(dev, &ehci->ipacket);
+usb_packet_unmap(&ehci->ipacket, &ehci->isgl);
+} else {
+DPRINTF("ISOCH: attempt to addess non-iso endpoint\n");
+ehci->ipacket.status = USB_RET_NAK;
+ehci->ipacket.actual_length = 0;
+}
 }
 qemu_sglist_destroy(&ehci->isgl);
 
-- 
Anthony PERARD

Re: [PATCH 0/2] qemu-ga: add ssh-{add,remove}-authorized-keys

2020-10-14 Thread Michal Privoznik


On 10/13/20 10:25 PM, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

Hi,

Add two new commands to help modify ~/.ssh/authorized_keys.


Apart from Philippe's comments, this path is configurable in 
sshd.config. But I doubt we should care as ssh-copy-id doesn't care.




Although it's possible already to modify the authorized_keys files via
file-{read,write} or exec, the commands are often denied by default, and the
logic is left to the client. Let's add specific commands for this job.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1885332

Marc-André Lureau (2):
   glib-compat: add g_unix_get_passwd_entry_qemu()
   qga: add ssh-{add,remove}-authorized-keys

  include/glib-compat.h|  24 +++
  qga/commands-posix-ssh.c | 394 +++
  qga/commands-win32.c |  10 +
  qga/meson.build  |  17 +-
  qga/qapi-schema.json |  32 
  5 files changed, 476 insertions(+), 1 deletion(-)
  create mode 100644 qga/commands-posix-ssh.c



Reviewed-by: Michal Privoznik 

Michal

Re: [PATCH v11 02/13] copy-on-read: add filter append/drop functions

2020-10-14 Thread Max Reitz

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Provide API for the COR-filter insertion/removal.
> Also, drop the filter child permissions for an inactive state when the
> filter node is being removed.
> 
> Signed-off-by: Andrey Shinkevich 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/copy-on-read.c | 88 
> 
>  block/copy-on-read.h | 35 +
>  2 files changed, 123 insertions(+)
>  create mode 100644 block/copy-on-read.h
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index cb03e0f..bcccf0f 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c

[...]

> @@ -159,4 +188,63 @@ static void bdrv_copy_on_read_init(void)
>  bdrv_register(&bdrv_copy_on_read);
>  }
>  
> +
> +BlockDriverState *bdrv_cor_filter_append(BlockDriverState *bs,
> + QDict *node_options,
> + int flags, Error **errp)

I had hoped you could make this a generic block layer function. :(

(Because it really is rather generic)

*shrug*

Reviewed-by: Max Reitz 

> +{
> +BlockDriverState *cor_filter_bs;
> +Error *local_err = NULL;
> +
> +cor_filter_bs = bdrv_open(NULL, NULL, node_options, flags, errp);
> +if (cor_filter_bs == NULL) {
> +error_prepend(errp, "Could not create COR-filter node: ");
> +return NULL;
> +}
> +
> +if (!qdict_get_try_str(node_options, "node-name")) {
> +cor_filter_bs->implicit = true;
> +}
> +
> +bdrv_drained_begin(bs);
> +bdrv_replace_node(bs, cor_filter_bs, &local_err);
> +bdrv_drained_end(bs);
> +
> +if (local_err) {
> +bdrv_unref(cor_filter_bs);
> +error_propagate(errp, local_err);
> +return NULL;
> +}
> +
> +return cor_filter_bs;
> +}



signature.asc
Description: OpenPGP digital signature

Re: [PATCH 0/2] qemu-ga: add ssh-{add,remove}-authorized-keys

2020-10-14 Thread Daniel P . Berrangé

On Wed, Oct 14, 2020 at 12:25:00AM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> Hi,
> 
> Add two new commands to help modify ~/.ssh/authorized_keys.
> 
> Although it's possible already to modify the authorized_keys files via
> file-{read,write} or exec, the commands are often denied by default, and the
> logic is left to the client. Let's add specific commands for this job.

More importantly the mgmt app has no idea what file location the keys
need to be saved in. Knowing the user isn't sufficient as you cannot
assume that $HOME is /home/$USERNAME - it could be in an arbitrarily
different location. So having dedicated commands for this which can
use getpwent in the guest to find $HOME is mcuh saner.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC PATCH 3/3] target/mips: Make the number of TLB entries a CPU property

2020-10-14 Thread Philippe Mathieu-Daudé


On 10/14/20 12:20 PM, Jiaxun Yang wrote:

在 2020/10/13 21:25, Philippe Mathieu-Daudé 写道:

Allow changing the number of TLB entries for
testing/tunning purpose.

Example to force a 34Kf cpu with 64 TLB:

   $ qemu-system-mipsel -cpu 34Kf,tlb-entries=64 ...

This is helpful for developers of the Yocto Project [*]:

   Yocto Project uses qemu-system-mips 34Kf cpu model, to run 32bit
   MIPS CI loop. It was observed that in this case CI test execution
   time was almost twice longer than 64bit MIPS variant that runs
   under MIPS64R2-generic model. It was investigated and concluded
   that the difference in number of TLBs 16 in 34Kf case vs 64 in
   MIPS64R2-generic is responsible for most of CI real time execution
   difference. Because with 16 TLBs linux user-land trashes TLB more
   and it needs to execute more instructions in TLB refill handler
   calls, as result it runs much longer.

[*] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg03428.html

Buglink: https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
Reported-by: Victor Kamensky 
Signed-off-by: Philippe Mathieu-Daudé 
---

Hi Philippe,

I think name can this property vtlb-entries?

MIPS R2 had introduced dual-TLB feature and the entries specified here
is the number of VTLB, while FTLB is another set of entries with fixed 
pagesize.


Although MIPS TCG haven't implemented dual-TLB but it can prevent future
confusion.


Sure, good idea.

I'll follow Richard suggestion first, having a look at the P5600.



Thanks.

- Jiaxun

Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Max Reitz

On 12.10.20 19:43, Andrey Shinkevich wrote:
> We are going to use the COR-filter for a block-stream job.
> To limit COR operations by the base node in the backing chain during
> stream job, pass the name of overlay base node to the copy-on-read
> driver as base node itself may change due to possible concurrent jobs.
> The rest of the functionality will be implemented in the patch that
> follows.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/copy-on-read.c | 14 ++
>  1 file changed, 14 insertions(+)

Is there a reason why you didn’t add this option to QAPI (as part of a
yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.

> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index bcccf0f..c578b1b 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -24,19 +24,24 @@
>  #include "block/block_int.h"
>  #include "qemu/module.h"
>  #include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
>  #include "qapi/qmp/qdict.h"
>  #include "block/copy-on-read.h"
>  
>  
>  typedef struct BDRVStateCOR {
>  bool active;
> +BlockDriverState *base_overlay;
>  } BDRVStateCOR;
>  
>  
>  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>  Error **errp)
>  {
> +BlockDriverState *base_overlay = NULL;
>  BDRVStateCOR *state = bs->opaque;
> +/* We need the base overlay node rather than the base itself */
> +const char *base_overlay_node = qdict_get_try_str(options, "base");

Shouldn’t it be called base-overlay or above-base then?

>  
>  bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
> @@ -52,7 +57,16 @@ static int cor_open(BlockDriverState *bs, QDict *options, 
> int flags,
>  ((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
>  bs->file->bs->supported_zero_flags);
>  
> +if (base_overlay_node) {
> +qdict_del(options, "base");
> +base_overlay = bdrv_lookup_bs(NULL, base_overlay_node, errp);

I think this is a use-after-free.  The storage @base_overlay_node points
to belongs to a QString, which is referenced only by @options; so
deleting that element of @options should free that string.

Max

> +if (!base_overlay) {
> +error_setg(errp, QERR_BASE_NOT_FOUND, base_overlay_node);
> +return -EINVAL;
> +}
> +}
>  state->active = true;
> +state->base_overlay = base_overlay;
>  
>  /*
>   * We don't need to call bdrv_child_refresh_perms() now as the 
> permissions
> 




signature.asc
Description: OpenPGP digital signature

Re: [PATCH v4 1/7] nbd: Utilize QAPI_CLONE for type conversion

2020-10-14 Thread Vladimir Sementsov-Ogievskiy


10.10.2020 00:55, Eric Blake wrote:

Rather than open-coding the translation from the deprecated
NbdServerAddOptions type to the preferred BlockExportOptionsNbd, it's
better to utilize QAPI_CLONE_MEMBERS.  This solves a couple of issues:
first, if we do any more refactoring of the base type (which an
upcoming patch plans to do), we don't have to revisit the open-coding.
Second, our assignment to arg->name is fishy: the generated QAPI code
currently does not visit it if arg->has_name is false, but if it DID
visit it, we would have introduced a double-free situation when arg is
finally freed.

Signed-off-by: Eric Blake


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: [PATCH 2/2] qga: add ssh-{add,remove}-authorized-keys

2020-10-14 Thread Philippe Mathieu-Daudé


On 10/14/20 9:37 AM, Marc-André Lureau wrote:

On Wed, Oct 14, 2020 at 1:14 AM Philippe Mathieu-Daudé
 wrote:

On 10/13/20 10:25 PM, marcandre.lur...@redhat.com wrote:

diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index cec98c7e06..50e2854b45 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1306,3 +1306,35 @@
   ##
   { 'command': 'guest-get-devices',
 'returns': ['GuestDeviceInfo'] }
+
+##
+# @guest-ssh-add-authorized-keys:
+#
+# @username: the user account to add the authorized key
+# @keys: the public keys to add (in OpenSSH format)


You use plural but the code only seems to add (remove) one key
at a time.


Uh, what makes you believe that?


The code in your patch:

+static bool
+check_openssh_pub_keys(strList *keys, size_t *nkeys, Error **errp)
+{
+size_t n = 0;
+strList *k;
+
+ERRP_GUARD();
+
+for (k = keys; k != NULL; k = k->next) {
+if (!check_openssh_pub_key(k->value, errp)) {
+return false;
+}
+n++;
+}
+
+if (nkeys) {
+*nkeys = n;
+}
+return true;
+}





'OpenSSH format' is confusing. From sshd(8):

Each line of the file contains one key (empty lines and lines
starting with a ‘#’ are ignored as comments).

Public keys consist of the following space-separated fields:

  options, keytype, base64-encoded key, comment.

The options field is optional.

Note that lines in this file can be several hundred bytes long
(because of the size of the public key encoding) up to a limit
of 8 kilobytes, which permits RSA keys up to 16 kilobits.

The options (if present) consist of comma-separated option
specifications.  No spaces are permitted, except within double
quotes.

@openssh_authorized_key_line is ugly, maybe use @authorized_key
to make it clearer?


Why? the name of the function already implies we are talking about
authorized keys. The documentation says it's a public key in openssh
format (the ones you expect in ~/.ssh/authorized_keys files)


OK then.



Yes the format isn't very well defined, so I did simple sanity checks.
After all, people usually append keys with shell >>. I can't find a
common command to do it with some checking.

RE: [PATCH v2] migration/block-dirty-bitmap: fix uninitialized variable warning

2020-10-14 Thread Chenqun (kuhn)

> -Original Message-
> From: Qemu-devel
> [mailto:qemu-devel-bounces+kuhn.chenqun=huawei@nongnu.org] On
> Behalf Of Max Reitz
> Sent: Wednesday, October 14, 2020 5:36 PM
> To: Chenqun (kuhn) ; qemu-devel@nongnu.org;
> qemu-triv...@nongnu.org
> Cc: f...@euphon.net; ganqixin ;
> vsement...@virtuozzo.com; Zhanghailiang
> ; qemu-bl...@nongnu.org;
> quint...@redhat.com; Li Qiang ; dgilb...@redhat.com;
> Laurent Vivier ; stefa...@redhat.com; Euler Robot
> ; js...@redhat.com
> Subject: Re: [PATCH v2] migration/block-dirty-bitmap: fix uninitialized 
> variable
> warning
> 
> On 14.10.20 03:03, Chenqun (kuhn) wrote:
> >
> >
> >> -Original Message-
> >> From: Max Reitz [mailto:mre...@redhat.com]
> >> Sent: Tuesday, October 13, 2020 10:47 PM
> >> To: Chenqun (kuhn) ;
> qemu-devel@nongnu.org;
> >> qemu-triv...@nongnu.org
> >> Cc: vsement...@virtuozzo.com; stefa...@redhat.com; f...@euphon.net;
> >> ebl...@redhat.com; js...@redhat.com; quint...@redhat.com;
> >> dgilb...@redhat.com; Zhanghailiang ;
> >> ganqixin ; qemu-bl...@nongnu.org; Euler Robot
> >> ; Laurent Vivier ; Li
> >> Qiang 
> >> Subject: Re: [PATCH v2] migration/block-dirty-bitmap: fix
> >> uninitialized variable warning
> >>
> >> On 13.10.20 14:33, Chen Qun wrote:
> >>> A default value is provided for the variable 'bitmap_name' to avoid
> >>> compiler
> >> warning.
> >>>
> >>> The compiler show warning:
> >>> migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’
> >>> may be used uninitialized in this function [-Wmaybe-uninitialized]
> >>>g_strlcpy(s->bitmap_name, bitmap_name,
> >> sizeof(s->bitmap_name));
> >>>
> >>
> ^~
> >>>
> >>> Reported-by: Euler Robot 
> >>> Signed-off-by: Chen Qun 
> >>> ---
> >>> Cc: Vladimir Sementsov-Ogievskiy 
> >>> Cc: Laurent Vivier 
> >>> Cc: Li Qiang 
> >>> ---
> >>>  migration/block-dirty-bitmap.c | 9 ++---
> >>>  1 file changed, 2 insertions(+), 7 deletions(-)
> >>
> >> No objections, semantically, but...
> >>
> >>> diff --git a/migration/block-dirty-bitmap.c
> >>> b/migration/block-dirty-bitmap.c index 5bef793ac0..bcb79c04ce 100644
> >>> --- a/migration/block-dirty-bitmap.c
> >>> +++ b/migration/block-dirty-bitmap.c
> >>> @@ -1064,15 +1064,13 @@ static int
> dirty_bitmap_load_header(QEMUFile
> >> *f, DBMLoadState *s,
> >>>  assert(nothing || s->cancelled || !!alias_map ==
> >>> !!bitmap_alias_map);
> >>>
> >>>  if (s->flags & DIRTY_BITMAP_MIG_FLAG_BITMAP_NAME) {
> >>> -const char *bitmap_name;
> >>> -
> >>>  if (!qemu_get_counted_string(f, s->bitmap_alias)) {
> >>>  error_report("Unable to read bitmap alias string");
> >>>  return -EINVAL;
> >>>  }
> >>>
> >>> -if (!s->cancelled) {
> >>> -if (bitmap_alias_map) {
> >>> +const char *bitmap_name = s->bitmap_alias;
> >>
> >> qemu’s coding style mandates declarations to be placed at the
> >> beginning of their block, so the declaration has to stay where it is.
> >> (Putting the assignment here looks reasonable.)
> >>
> > Hi Max,
> >   Declaration variables here to ensure that the above exceptions(Unable to
> read bitmap alias string) are avoided.
> > If the declaration has to stay where it is, is there a possibility that the
> assignment fails?
> 
> I don’t understand what you mean.  

I think my description is not accurate. Forgive me for being a non-native 
English speaker.
The variable 'bitmap_name' assignment maybe failed at the beginning of the 
block, because reading the 's->bitmap_alias' maybe failed.

>A declaration without initialization isn’t
> and doesn’t contain an expression, it isn’t even a statement, so it has no 
> side
> effects.[1]
> 
> Placing the declaration (without an initialization) at the top of the block 
> makes
> no semantic difference.
>
I see what you mean. Separate variable declarations from variable assignments.
You're right!  I will update it later.

Thanks,
Chen Qun
> (As I said, I’d keep the assignment “bitmap_name = s->bitmap_alias”
> where you put it.  I think it would technically actually be correct to put it 
> into
> the declaration at the start of the block as an initializer, but that would 
> look
> weird.)
> 
> Max
> 
> [1] I suppose exceptions apply for types with constructors, but those don’t
> exist in plain C.

Re: [RFC v1 0/2] tcg-cpus: split into 3 tcg variants

2020-10-14 Thread Philippe Mathieu-Daudé


On 10/14/20 12:14 PM, Alex Bennée wrote:


Claudio Fontana  writes:


The purpose of this series is to split the tcg-cpus into
3 variants:

tcg_cpus_mttcg(multithreaded tcg vcpus)
tcg_cpus_rr   (single threaded round robin vcpus)
tcg_cpus_icount   (same as RR, but using icount)


I've no objection to the cosmetic clean-up but I assume the 3 modes will
still be available in TCG enabled binaries.


Yes, I think so too, no point in disabling some.
However it seems to me it is now easier to review
for a newcomer.

Code easily understandable/reviewable is easier to
maintain and less bug prone :)

Phil.

[PATCH v3 2/9] hw/block/nvme: add uuid namespace parameter

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Add the 'uuid' nvme-ns device parameter such that users who requires a
persistent namespace UUID can explicitly specify it.

If not specified, the property will autogenerate an UUID for each QEMU
invocation.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h |  1 +
 hw/block/nvme-ns.c |  1 +
 hw/block/nvme.c| 12 
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 44bf6271b744..8951fc5e86b8 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+QemuUUID uuid;
 } NvmeNamespaceParams;
 
 typedef struct NvmeNamespace {
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 475c6fe44084..dff3e308e31b 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -130,6 +130,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 static Property nvme_ns_props[] = {
 DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
 DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
+DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 8d0d96f42e61..aaea0436fd05 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1632,6 +1632,7 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, 
NvmeRequest *req)
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
 uint32_t nsid = le32_to_cpu(c->nsid);
 uint8_t list[NVME_IDENTIFY_DATA_SIZE];
+NvmeNamespace *ns;
 
 struct data {
 struct {
@@ -1648,21 +1649,16 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
*n, NvmeRequest *req)
 return NVME_INVALID_NSID | NVME_DNR;
 }
 
-if (unlikely(!nvme_ns(n, nsid))) {
+ns = nvme_ns(n, nsid);
+if (unlikely(!ns)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
 memset(list, 0x0, sizeof(list));
 
-/*
- * Because the NGUID and EUI64 fields are 0 in the Identify Namespace data
- * structure, a Namespace UUID (nidt = 0x3) must be reported in the
- * Namespace Identification Descriptor. Add a very basic Namespace UUID
- * here.
- */
 ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID;
 ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN;
-stl_be_p(&ns_descrs->uuid.v, nsid);
+memcpy(ns_descrs->uuid.v, ns->params.uuid.data, 16);
 
 return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE,
 DMA_DIRECTION_FROM_DEVICE, req);
-- 
2.28.0

[PATCH v3 1/9] hw/block/nvme: add commands supported and effects log page

2020-10-14 Thread Klaus Jensen

From: Gollu Appalanaidu 

This is to support for the Commands Supported and Effects log page. See
NVM Express Spec 1.3d, sec. 5.14.1.5 ("Commands Supported and Effects")

Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 25 +--
 hw/block/nvme.c  | 74 +++-
 2 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 2249d77c2129..f71376072762 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -737,6 +737,24 @@ typedef struct QEMU_PACKED NvmeSmartLog {
 uint8_t reserved2[320];
 } NvmeSmartLog;
 
+typedef struct QEMU_PACKED NvmeEffectsLog {
+uint32_t acs[256];
+uint32_t iocs[256];
+uint8_t  rsvd2048[2048];
+} NvmeEffectsLog;
+
+enum {
+NVME_EFFECTS_CSUPP  = 1 <<  0,
+NVME_EFFECTS_LBCC   = 1 <<  1,
+NVME_EFFECTS_NCC= 1 <<  2,
+NVME_EFFECTS_NIC= 1 <<  3,
+NVME_EFFECTS_CCC= 1 <<  4,
+NVME_EFFECTS_CSE_SINGLE = 1 << 16,
+NVME_EFFECTS_CSE_MULTI  = 1 << 17,
+NVME_EFFECTS_CSE_MASK   = 3 << 16,
+NVME_EFFECTS_UUID_SEL   = 1 << 19,
+};
+
 enum NvmeSmartWarn {
 NVME_SMART_SPARE  = 1 << 0,
 NVME_SMART_TEMPERATURE= 1 << 1,
@@ -749,6 +767,7 @@ enum NvmeLogIdentifier {
 NVME_LOG_ERROR_INFO = 0x01,
 NVME_LOG_SMART_INFO = 0x02,
 NVME_LOG_FW_SLOT_INFO   = 0x03,
+NVME_LOG_EFFECTS= 0x05,
 };
 
 typedef struct QEMU_PACKED NvmePSD {
@@ -860,8 +879,9 @@ enum NvmeIdCtrlFrmw {
 };
 
 enum NvmeIdCtrlLpa {
-NVME_LPA_NS_SMART = 1 << 0,
-NVME_LPA_EXTENDED = 1 << 2,
+NVME_LPA_NS_SMART = 1 << 0,
+NVME_LPA_EFFECTS_LOG  = 1 << 1,
+NVME_LPA_EXTENDED = 1 << 2,
 };
 
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
@@ -1068,5 +1088,6 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNs) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsDescr) != 4);
+QEMU_BUILD_BUG_ON(sizeof(NvmeEffectsLog) != 4096);
 }
 #endif
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index b4f47448575b..8d0d96f42e61 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -112,6 +112,46 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE,
 };
 
+#define NVME_EFFECTS_ADMIN_INITIALIZER \
+[NVME_ADM_CMD_DELETE_SQ]= NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_CREATE_SQ]= NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_GET_LOG_PAGE] = NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_DELETE_CQ]= NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_CREATE_CQ]= NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_IDENTIFY] = NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_ABORT]= NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_SET_FEATURES] = NVME_EFFECTS_CSUPP | \
+  NVME_EFFECTS_CCC |   \
+  NVME_EFFECTS_NIC |   \
+  NVME_EFFECTS_NCC,\
+[NVME_ADM_CMD_GET_FEATURES] = NVME_EFFECTS_CSUPP,  \
+[NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_EFFECTS_CSUPP
+
+#define NVME_EFFECTS_NVM_INITIALIZER   \
+[NVME_CMD_FLUSH]= NVME_EFFECTS_CSUPP | \
+  NVME_EFFECTS_LBCC,   \
+[NVME_CMD_WRITE]= NVME_EFFECTS_CSUPP | \
+  NVME_EFFECTS_LBCC,   \
+[NVME_CMD_READ] = NVME_EFFECTS_CSUPP,  \
+[NVME_CMD_WRITE_ZEROES] = NVME_EFFECTS_CSUPP | \
+  NVME_EFFECTS_LBCC
+
+static const NvmeEffectsLog nvme_effects_admin_only = {
+.acs = {
+NVME_EFFECTS_ADMIN_INITIALIZER,
+},
+};
+
+static const NvmeEffectsLog nvme_effects = {
+.acs = {
+NVME_EFFECTS_ADMIN_INITIALIZER,
+},
+
+.iocs = {
+NVME_EFFECTS_NVM_INITIALIZER,
+},
+};
+
 static void nvme_process_sq(void *opaque);
 
 static uint16_t nvme_cid(NvmeRequest *req)
@@ -1331,6 +1371,36 @@ static uint16_t nvme_error_info(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 DMA_DIRECTION_FROM_DEVICE, req);
 }
 
+static uint16_t nvme_effects_log(NvmeCtrl *n, uint32_t buf_len, uint64_t off,
+ NvmeRequest *req)
+{
+const NvmeEffectsLog *effects;
+
+uint32_t trans_len;
+
+if (off >= sizeof(NvmeEffectsLog)) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+switch (NVME_CC_CSS(n->bar.cc)) {
+case NVME_CC_CSS_ADMIN_ONLY:
+effects = &nvme_effects_admin_only;
+break;
+
+case NVME_CC_CSS_NVM:
+effects = &nvme_effects;
+break;
+
+default:
+return NVME_INTERNAL_DEV_ERROR | NVME_DNR;
+}
+
+trans_len = MIN(sizeof(NvmeEffectsLog) - off, buf_len);
+
+return nvme_dma(n, (uint8_t *)effects + off, trans_len,
+DMA_DIRECTION_FROM_DEVICE,

[PATCH v3 4/9] hw/block/nvme: add basic read/write for zoned namespaces

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

This adds basic read and write for zoned namespaces.

A zoned namespace is created by setting the iocs namespace parameter to
0x2 and specifying the zns.zcap parameter (zone capacity) in number of
logical blocks per zone. If a zone size (zns.zsze) is not specified, the
namespace device will set the zone size to be the next power of two and
fit in as many zones as possible on the underlying namespace blockdev.
This behavior is not required by the specification, but ensures that the
device can be initialized by the Linux kernel nvme driver, which
requires a power of two zone size.

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt   |   8 ++
 hw/block/nvme-ns.h|  80 +++
 hw/block/nvme.h   |  11 ++
 include/block/nvme.h  |  60 +++-
 hw/block/nvme-ns.c| 116 
 hw/block/nvme.c   | 308 +-
 hw/block/trace-events |   7 +
 7 files changed, 585 insertions(+), 5 deletions(-)

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
index 619bd9ce4378..80cb34406255 100644
--- a/docs/specs/nvme.txt
+++ b/docs/specs/nvme.txt
@@ -6,6 +6,14 @@ The nvme device (-device nvme) emulates an NVM Express 
Controller.
   `iocs`; The "I/O Command Set" associated with the namespace. E.g. 0x0 for the
  NVM Command Set (the default), or 0x2 for the Zoned Namespace Command Set.
 
+  `zns.zcap`; If `iocs` is 0x2, this specifies the zone capacity. It is
+ specified in units of logical blocks.
+
+  `zns.zsze`; If `iocs` is 0x2, this specifies the zone size. It is specified
+ in units of the logical blocks. If not specified, the value depends on
+ zns.zcap; if the zone capacity is a power of two, the zone size will be
+ set to that, otherwise it will default to the next power of two.
+
 
 Reference Specifications
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 5eb135a0b73f..dd311012213e 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -23,8 +23,19 @@ typedef struct NvmeNamespaceParams {
 uint32_t nsid;
 uint8_t  iocs;
 QemuUUID uuid;
+
+struct {
+uint64_t zcap;
+uint64_t zsze;
+} zns;
 } NvmeNamespaceParams;
 
+typedef struct NvmeZone {
+NvmeZoneDescriptor *zd;
+
+uint64_t wp_staging;
+} NvmeZone;
+
 typedef struct NvmeNamespace {
 DeviceState  parent_obj;
 BlockConfblkconf;
@@ -38,8 +49,20 @@ typedef struct NvmeNamespace {
 struct {
 uint32_t err_rec;
 } features;
+
+struct {
+int num_zones;
+
+NvmeZone   *zones;
+NvmeZoneDescriptor *zd;
+} zns;
 } NvmeNamespace;
 
+static inline bool nvme_ns_zoned(NvmeNamespace *ns)
+{
+return ns->iocs == NVME_IOCS_ZONED;
+}
+
 static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 {
 if (ns) {
@@ -54,17 +77,34 @@ static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns)
 return ns->id_ns[NVME_IOCS_NVM];
 }
 
+static inline NvmeIdNsZns *nvme_ns_id_zoned(NvmeNamespace *ns)
+{
+return ns->id_ns[NVME_IOCS_ZONED];
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
 NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
 return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 }
 
+static inline NvmeLBAFE *nvme_ns_lbafe(NvmeNamespace *ns)
+{
+NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
+NvmeIdNsZns *id_ns_zns = nvme_ns_id_zoned(ns);
+return &id_ns_zns->lbafe[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
+}
+
 static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
 {
 return nvme_ns_lbaf(ns)->ds;
 }
 
+static inline uint64_t nvme_ns_zsze(NvmeNamespace *ns)
+{
+return nvme_ns_lbafe(ns)->zsze;
+}
+
 /* calculate the number of LBAs that the namespace can accomodate */
 static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
 {
@@ -77,10 +117,50 @@ static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t 
lba)
 return lba << nvme_ns_lbads(ns);
 }
 
+static inline NvmeZone *nvme_ns_zone(NvmeNamespace *ns, uint64_t lba)
+{
+int idx = lba / nvme_ns_zsze(ns);
+if (unlikely(idx >= ns->zns.num_zones)) {
+return NULL;
+}
+
+return &ns->zns.zones[idx];
+}
+
+static inline NvmeZoneState nvme_zs(NvmeZone *zone)
+{
+return (zone->zd->zs >> 4) & 0xf;
+}
+
+static inline void nvme_zs_set(NvmeZone *zone, NvmeZoneState zs)
+{
+zone->zd->zs = zs << 4;
+}
+
+static inline uint64_t nvme_zslba(NvmeZone *zone)
+{
+return le64_to_cpu(zone->zd->zslba);
+}
+
+static inline uint64_t nvme_zcap(NvmeZone *zone)
+{
+return le64_to_cpu(zone->zd->zcap);
+}
+
+static inline uint64_t nvme_wp(NvmeZone *zone)
+{
+return le64_to_cpu(zone->zd->wp);
+}
+
 typedef struct NvmeCtrl NvmeCtrl;
 
+const char *nvme_zs_str(NvmeZone *zone);
+const char *nvme_zs_to_str(NvmeZoneState zs);
+
 int nvme_ns_setup(NvmeCtrl *n, NvmeNamespace *ns, Error **errp);
 void nvme_ns_drain(NvmeNamespace *ns);
 void nvme_ns_flush(NvmeNamespace *ns);
 
+void nvme_ns_zns_init_zone_state(NvmeNamespace *ns);
+
 #e

[PATCH v3 9/9] hw/block/nvme: allow open to close zone transitions by controller

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Allow the controller to release open resources by transitioning
implicitly and explicitly opened zones to closed. This is done using a
naive "least recently opened" strategy.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h|  5 
 hw/block/nvme-ns.c|  5 
 hw/block/nvme.c   | 57 ---
 hw/block/trace-events |  1 +
 4 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 3d0269eef6f0..5d8523c047d8 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -38,6 +38,8 @@ typedef struct NvmeZone {
 uint8_t*zde;
 
 uint64_t wp_staging;
+
+QTAILQ_ENTRY(NvmeZone) lru_entry;
 } NvmeZone;
 
 typedef struct NvmeNamespace {
@@ -64,6 +66,9 @@ typedef struct NvmeNamespace {
 struct {
 uint32_t open;
 uint32_t active;
+
+QTAILQ_HEAD(, NvmeZone) lru_open;
+QTAILQ_HEAD(, NvmeZone) lru_active;
 } resources;
 } zns;
 } NvmeNamespace;
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index a01cc5eeb445..cb8b44a78450 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -135,6 +135,9 @@ void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 ns->zns.resources.open = ns->params.zns.mor != 0x ?
 ns->params.zns.mor + 1 : ns->zns.num_zones;
 
+QTAILQ_INIT(&ns->zns.resources.lru_open);
+QTAILQ_INIT(&ns->zns.resources.lru_active);
+
 for (int i = 0; i < ns->zns.num_zones; i++) {
 NvmeZone *zone = &ns->zns.zones[i];
 zone->zd = &ns->zns.zd[i];
@@ -158,6 +161,8 @@ void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 
 if (ns->zns.resources.active) {
 ns->zns.resources.active--;
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone,
+   lru_entry);
 break;
 }
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index cc637b3a68e9..1fab9d69261c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1105,11 +1105,47 @@ static inline void nvme_zone_reset_wp(NvmeZone *zone)
 zone->wp_staging = nvme_zslba(zone);
 }
 
+static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
+NvmeZoneState to);
+
+static uint16_t nvme_zrm_release_open(NvmeNamespace *ns)
+{
+NvmeZone *candidate;
+NvmeZoneState zs;
+uint16_t status;
+
+trace_pci_nvme_zrm_release_open(ns->params.nsid);
+
+QTAILQ_FOREACH(candidate, &ns->zns.resources.lru_open, lru_entry) {
+zs = nvme_zs(candidate);
+
+/* skip explicitly opened zones */
+if (zs == NVME_ZS_ZSEO) {
+continue;
+}
+
+/* skip zones that have in-flight writes */
+if (candidate->wp_staging != nvme_wp(candidate)) {
+continue;
+}
+
+status = nvme_zrm_transition(ns, candidate, NVME_ZS_ZSC);
+if (status) {
+return status;
+}
+
+return NVME_SUCCESS;
+}
+
+return NVME_TOO_MANY_OPEN_ZONES;
+}
+
 static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 NvmeZoneState to)
 {
 NvmeZoneState from = nvme_zs(zone);
 NvmeZoneDescriptor *zd = zone->zd;
+uint16_t status;
 
 trace_pci_nvme_zrm_transition(ns->params.nsid, nvme_zslba(zone), from, to);
 
@@ -1131,6 +1167,7 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 }
 
 ns->zns.resources.active--;
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
 break;
 
@@ -1141,11 +1178,15 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 }
 
 if (!ns->zns.resources.open) {
-return NVME_TOO_MANY_OPEN_ZONES;
+status = nvme_zrm_release_open(ns);
+if (status) {
+return status;
+}
 }
 
 ns->zns.resources.active--;
 ns->zns.resources.open--;
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_open, zone, lru_entry);
 
 break;
 
@@ -1172,11 +1213,15 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSF:
 case NVME_ZS_ZSRO:
 ns->zns.resources.active++;
+ns->zns.resources.open++;
+QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry);
 
-/* fallthrough */
+break;
 
 case NVME_ZS_ZSC:
 ns->zns.resources.open++;
+QTAILQ_REMOVE(&ns->zns.resources.lru_open, zone, lru_entry);
+QTAILQ_INSERT_TAIL(&ns->zns.resources.lru_active, zone, lru_entry);
 
 break;
 
@@ -1201,16 +1246,22 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSF:
 case NVME_ZS_ZSRO:

[PATCH v3 3/9] hw/block/nvme: support namespace types

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Implement support for TP 4056 ("Namespace Types"). This adds the 'iocs'
(I/O Command Set) device parameter to the nvme-ns device.

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt   |   3 +
 hw/block/nvme-ns.h|  11 ++-
 hw/block/nvme.h   |   3 +
 include/block/nvme.h  |  52 +++---
 block/nvme.c  |   4 +-
 hw/block/nvme-ns.c|  21 +++-
 hw/block/nvme.c   | 225 +++---
 hw/block/trace-events |   6 +-
 8 files changed, 267 insertions(+), 58 deletions(-)

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
index 56d393884e7a..619bd9ce4378 100644
--- a/docs/specs/nvme.txt
+++ b/docs/specs/nvme.txt
@@ -3,6 +3,9 @@ NVM Express Controller
 
 The nvme device (-device nvme) emulates an NVM Express Controller.
 
+  `iocs`; The "I/O Command Set" associated with the namespace. E.g. 0x0 for the
+ NVM Command Set (the default), or 0x2 for the Zoned Namespace Command Set.
+
 
 Reference Specifications
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index 8951fc5e86b8..5eb135a0b73f 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -21,6 +21,7 @@
 
 typedef struct NvmeNamespaceParams {
 uint32_t nsid;
+uint8_t  iocs;
 QemuUUID uuid;
 } NvmeNamespaceParams;
 
@@ -29,7 +30,8 @@ typedef struct NvmeNamespace {
 BlockConfblkconf;
 int32_t  bootindex;
 int64_t  size;
-NvmeIdNs id_ns;
+uint8_t  iocs;
+void *id_ns[NVME_IOCS_MAX];
 
 NvmeNamespaceParams params;
 
@@ -47,9 +49,14 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return -1;
 }
 
+static inline NvmeIdNsNvm *nvme_ns_id_nvm(NvmeNamespace *ns)
+{
+return ns->id_ns[NVME_IOCS_NVM];
+}
+
 static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
 {
-NvmeIdNs *id_ns = &ns->id_ns;
+NvmeIdNsNvm *id_ns = nvme_ns_id_nvm(ns);
 return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 }
 
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e080a2318a50..5c1de0ef16e7 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -112,6 +112,7 @@ typedef struct NvmeFeatureVal {
 };
 uint32_tasync_config;
 uint32_tvwc;
+uint32_tiocsci;
 } NvmeFeatureVal;
 
 typedef struct NvmeCtrl {
@@ -139,6 +140,7 @@ typedef struct NvmeCtrl {
 uint64_ttimestamp_set_qemu_clock_ms;/* QEMU clock time */
 uint64_tstarttime_ms;
 uint16_ttemperature;
+uint64_tiocscs[512];
 
 HostMemoryBackend *pmrdev;
 
@@ -154,6 +156,7 @@ typedef struct NvmeCtrl {
 NvmeSQueue  admin_sq;
 NvmeCQueue  admin_cq;
 NvmeIdCtrl  id_ctrl;
+void*id_ctrl_iocss[NVME_IOCS_MAX];
 NvmeFeatureVal  features;
 } NvmeCtrl;
 
diff --git a/include/block/nvme.h b/include/block/nvme.h
index f71376072762..443f5c7e8376 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -84,6 +84,7 @@ enum NvmeCapMask {
 
 enum NvmeCapCss {
 NVME_CAP_CSS_NVM= 1 << 0,
+NVME_CAP_CSS_CSI= 1 << 6,
 NVME_CAP_CSS_ADMIN_ONLY = 1 << 7,
 };
 
@@ -117,6 +118,7 @@ enum NvmeCcMask {
 
 enum NvmeCcCss {
 NVME_CC_CSS_NVM= 0x0,
+NVME_CC_CSS_ALL= 0x6,
 NVME_CC_CSS_ADMIN_ONLY = 0x7,
 };
 
@@ -388,6 +390,11 @@ enum NvmePmrmscMask {
 #define NVME_PMRMSC_SET_CBA(pmrmsc, val)   \
 (pmrmsc |= (uint64_t)(val & PMRMSC_CBA_MASK) << PMRMSC_CBA_SHIFT)
 
+enum NvmeCommandSet {
+NVME_IOCS_NVM = 0x0,
+NVME_IOCS_MAX = 0x1,
+};
+
 enum NvmeSglDescriptorType {
 NVME_SGL_DESCR_TYPE_DATA_BLOCK  = 0x0,
 NVME_SGL_DESCR_TYPE_BIT_BUCKET  = 0x1,
@@ -534,8 +541,13 @@ typedef struct QEMU_PACKED NvmeIdentify {
 uint64_trsvd2[2];
 uint64_tprp1;
 uint64_tprp2;
-uint32_tcns;
-uint32_trsvd11[5];
+uint8_t cns;
+uint8_t rsvd3;
+uint16_tcntid;
+uint16_tnvmsetid;
+uint8_t rsvd4;
+uint8_t csi;
+uint32_trsvd11[4];
 } NvmeIdentify;
 
 typedef struct QEMU_PACKED NvmeRwCmd {
@@ -627,8 +639,15 @@ typedef struct QEMU_PACKED NvmeAerResult {
 } NvmeAerResult;
 
 typedef struct QEMU_PACKED NvmeCqe {
-uint32_tresult;
-uint32_trsvd;
+union {
+struct {
+uint32_tdw0;
+uint32_tdw1;
+};
+
+uint64_t qw0;
+};
+
 uint16_tsq_head;
 uint16_tsq_id;
 uint16_tcid;
@@ -676,6 +695,10 @@ enum NvmeStatusCodes {
 NVME_FEAT_NOT_CHANGEABLE= 0x010e,
 NVME_FEAT_NOT_NS_SPEC   = 0x010f,
 NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
+NVME_IOCS_NOT_SUPPORTED = 0x0129,
+NVME_IOCS_NOT_ENABLED   = 0x012a,
+NVME_IOCS_COMB_REJECTED = 0x012b,
+NVME_INVALID_IOCS   = 0x012c,
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
@@ -785,10 +808,14 @@ typedef struct QEMU_PACKED NvmePSD {
 #define NVME_IDENTIFY

[PATCH v3 7/9] hw/block/nvme: add the zone append command

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Add the Zone Append command.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h   |  6 ++
 include/block/nvme.h  |  7 +++
 hw/block/nvme.c   | 46 +++
 hw/block/trace-events |  1 +
 4 files changed, 60 insertions(+)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 658f15bb5fd1..de128e9bb174 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -16,6 +16,10 @@ typedef struct NvmeParams {
 uint32_t aer_max_queued;
 uint8_t  mdts;
 bool use_intel_id;
+
+struct {
+uint8_t zasl;
+} zns;
 } NvmeParams;
 
 typedef struct NvmeAsyncEvent {
@@ -42,6 +46,7 @@ static inline bool nvme_req_is_write(NvmeRequest *req)
 switch (req->cmd.opcode) {
 case NVME_CMD_WRITE:
 case NVME_CMD_WRITE_ZEROES:
+case NVME_CMD_ZONE_APPEND:
 return true;
 default:
 return false;
@@ -74,6 +79,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
 case NVME_CMD_ZONE_MGMT_SEND:   return "NVME_ZONED_CMD_ZONE_MGMT_SEND";
 case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_ZONE_MGMT_RECV";
+case NVME_CMD_ZONE_APPEND:  return "NVME_ZONED_CMD_ZONE_APPEND";
 default:return "NVME_NVM_CMD_UNKNOWN";
 }
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 4181d0068edb..aecefefd43b6 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -482,6 +482,7 @@ enum NvmeIoCommands {
 NVME_CMD_DSM= 0x09,
 NVME_CMD_ZONE_MGMT_SEND = 0x79,
 NVME_CMD_ZONE_MGMT_RECV = 0x7a,
+NVME_CMD_ZONE_APPEND= 0x7d,
 };
 
 typedef struct QEMU_PACKED NvmeDeleteQ {
@@ -1016,6 +1017,11 @@ enum NvmeIdCtrlLpa {
 NVME_LPA_EXTENDED = 1 << 2,
 };
 
+typedef struct QEMU_PACKED NvmeIdCtrlZns {
+uint8_t zasl;
+uint8_t rsvd1[4095];
+} NvmeIdCtrlZns;
+
 #define NVME_CTRL_SQES_MIN(sqes) ((sqes) & 0xf)
 #define NVME_CTRL_SQES_MAX(sqes) (((sqes) >> 4) & 0xf)
 #define NVME_CTRL_CQES_MIN(cqes) ((cqes) & 0xf)
@@ -1240,6 +1246,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSmartLog) != 512);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrl) != 4096);
+QEMU_BUILD_BUG_ON(sizeof(NvmeIdCtrlZns) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsNvm) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdNsZns) != 4096);
 QEMU_BUILD_BUG_ON(sizeof(NvmeSglDescriptor) != 16);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e02c89b1496b..947554c48b35 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -165,6 +165,8 @@ static const NvmeEffectsLog nvme_effects[NVME_IOCS_MAX] = {
 [NVME_CMD_ZONE_MGMT_RECV] = NVME_EFFECTS_CSUPP,
 [NVME_CMD_ZONE_MGMT_SEND] = NVME_EFFECTS_CSUPP |
 NVME_EFFECTS_LBCC,
+[NVME_CMD_ZONE_APPEND]= NVME_EFFECTS_CSUPP |
+NVME_EFFECTS_LBCC,
 },
 },
 };
@@ -1040,6 +1042,21 @@ static inline uint16_t nvme_check_mdts(NvmeCtrl *n, 
size_t len)
 return NVME_SUCCESS;
 }
 
+static inline uint16_t nvme_check_zasl(NvmeCtrl *n, size_t len)
+{
+uint8_t zasl = n->params.zns.zasl;
+
+if (!zasl) {
+return nvme_check_mdts(n, len);
+}
+
+if (len > n->page_size << zasl) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+return NVME_SUCCESS;
+}
+
 static inline uint16_t nvme_check_bounds(NvmeCtrl *n, NvmeNamespace *ns,
  uint64_t slba, uint32_t nlb)
 {
@@ -1848,6 +1865,24 @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeRequest *req)
 zone = nvme_ns_zone(ns, slba);
 assert(zone);
 
+if (req->cmd.opcode == NVME_CMD_ZONE_APPEND) {
+uint64_t wp = zone->wp_staging;
+
+if (slba != nvme_zslba(zone)) {
+trace_pci_nvme_err_invalid_zslba(nvme_cid(req), slba);
+return NVME_INVALID_FIELD | NVME_DNR;
+}
+
+status = nvme_check_zasl(n, data_size);
+if (status) {
+trace_pci_nvme_err_zasl(nvme_cid(req), data_size);
+goto invalid;
+}
+
+slba = wp;
+rw->slba = req->cqe.qw0 = cpu_to_le64(wp);
+}
+
 status = nvme_check_zone(n, slba, nlb, req, zone);
 if (status) {
 goto invalid;
@@ -1942,6 +1977,7 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 return nvme_write_zeroes(n, req);
 case NVME_CMD_WRITE:
 case NVME_CMD_READ:
+case NVME_CMD_ZONE_APPEND:
 return nvme_rw(n, req);
 case NVME_CMD_ZONE_MGMT_SEND:
 return nvme_zone_mgmt_send(n, req);
@@ -3635,6 +3671,11 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 return;
 }
 
+if (params->zns.zasl && params->zns.zasl > params->mdts) {
+

[PATCH v3 0/9] hw/block/nvme: zoned namespace command set

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Updated version of my proposal.

Based-on: <20201014084324.333774-1-...@irrelevant.dk>

Changes for v3
~~

  * Rebased on nvme-next with "[PATCH v2] hw/block/nvme: add dulbe
support" applied.

  * "hw/block/nvme: add support for dulbe and block utilization tracking"
- Dropped from this series. This series instead builds on the
  support for DULBE that I added in "[PATCH v2] hw/block/nvme: add
  dulbe support", previously posted.

  * "hw/block/nvme: add the zone management send command"
- Use asynchronous discards.

  * "hw/block/nvme: add basic read/write for zoned namespaces"
  * "hw/block/nvme: add the zone management receive command"
  * "hw/block/nvme: add the zone management send command"
  * "hw/block/nvme: add the zone append command"
  * "hw/block/nvme: track and enforce zone resources"
  * "hw/block/nvme: allow open to close zone transitions by controller"
- In compliance with the concensus I dropped zone persistence
  support from all patches.

Changes for v2
~~

  * "hw/block/nvme: add support for dulbe and block utilization tracking"
- Factor out pstate init/load into separate functions.

- Fixed a stupid off-by-1 bug that would trigger when resetting the
  last zone.

- I added a more formalized pstate file format that includes a
  header. This is pretty similar to what is done in Dmitry's series,
  but with fewer fields. The device parameters for nvme-ns are still
  the "authoritative" ones, so if any parameters that influence LBA
  size, number of zones, etc. do not match, an error indicating the
  discrepancy will be produced. IIRC, Dmitry's version does the
  same.

  It is set up such that newer versions can load pstate files from
  older versions. The file format header is not unlike a standard
  nvme datastructure with reserved areas. This means that when
  adding new command sets that require persistent state, it is not
  needed to bump the version number, unless the header has to change
  dramatically.  This is demonstrated when the zoned namespace
  command set support is added in "hw/block/nvme: add basic
  read/write for zoned namespaces".

  * "hw/block/nvme: add basic read/write for zoned namespaces"
- The device will now transition all opened zones to Closed on
  "normal shutdown". You can force the "transition to Full" behavior
  by killing QEMU from the monitor.

  * "hw/block/nvme: add the zone append command"
- Slightly reordered the logic so a LBA Out of Range error is
  returned before Invalid Field in Command for normal read/write
  commands.

  * "hw/block/nvme: support zone active excursions"
- Dropped. Optional and non-critical.

  * "hw/block/nvme: support reset/finish recommended limits"
- Dropped. Optional and non-critical.

Gollu Appalanaidu (1):
  hw/block/nvme: add commands supported and effects log page

Klaus Jensen (8):
  hw/block/nvme: add uuid namespace parameter
  hw/block/nvme: support namespace types
  hw/block/nvme: add basic read/write for zoned namespaces
  hw/block/nvme: add the zone management receive command
  hw/block/nvme: add the zone management send command
  hw/block/nvme: add the zone append command
  hw/block/nvme: track and enforce zone resources
  hw/block/nvme: allow open to close zone transitions by controller

 docs/specs/nvme.txt   |   17 +
 hw/block/nvme-ns.h|  112 +++-
 hw/block/nvme.h   |   23 +
 include/block/nvme.h  |  216 ++-
 block/nvme.c  |4 +-
 hw/block/nvme-ns.c|  172 +-
 hw/block/nvme.c   | 1284 +++--
 hw/block/trace-events |   27 +-
 8 files changed, 1791 insertions(+), 64 deletions(-)

-- 
2.28.0

[PATCH v3 6/9] hw/block/nvme: add the zone management send command

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Add the Zone Management Send command.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.h   |   2 +
 include/block/nvme.h  |  28 +++
 hw/block/nvme.c   | 384 +-
 hw/block/trace-events |  11 ++
 4 files changed, 420 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 523eef0bcad8..658f15bb5fd1 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -28,6 +28,7 @@ typedef struct NvmeRequest {
 struct NvmeNamespace*ns;
 BlockAIOCB  *aiocb;
 uint16_tstatus;
+void*opaque;
 NvmeCqe cqe;
 NvmeCmd cmd;
 BlockAcctCookie acct;
@@ -71,6 +72,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_WRITE:return "NVME_NVM_CMD_WRITE";
 case NVME_CMD_READ: return "NVME_NVM_CMD_READ";
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
+case NVME_CMD_ZONE_MGMT_SEND:   return "NVME_ZONED_CMD_ZONE_MGMT_SEND";
 case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_ZONE_MGMT_RECV";
 default:return "NVME_NVM_CMD_UNKNOWN";
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index fdbb3a0d5490..4181d0068edb 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -480,6 +480,7 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_ZONE_MGMT_SEND = 0x79,
 NVME_CMD_ZONE_MGMT_RECV = 0x7a,
 };
 
@@ -593,6 +594,32 @@ enum {
 NVME_RW_PRINFO_PRCHK_REF= 1 << 10,
 };
 
+typedef struct QEMU_PACKED NvmeZoneManagementSendCmd {
+uint8_t opcode;
+uint8_t flags;
+uint16_tcid;
+uint32_tnsid;
+uint32_trsvd8[4];
+NvmeCmdDptr dptr;
+uint64_tslba;
+uint32_trsvd48;
+uint8_t zsa;
+uint8_t zsflags;
+uint16_trsvd54;
+uint32_trsvd56[2];
+} NvmeZoneManagementSendCmd;
+
+#define NVME_CMD_ZONE_MGMT_SEND_SELECT_ALL(zsflags) ((zsflags) & 0x1)
+
+typedef enum NvmeZoneManagementSendAction {
+NVME_CMD_ZONE_MGMT_SEND_CLOSE   = 0x1,
+NVME_CMD_ZONE_MGMT_SEND_FINISH  = 0x2,
+NVME_CMD_ZONE_MGMT_SEND_OPEN= 0x3,
+NVME_CMD_ZONE_MGMT_SEND_RESET   = 0x4,
+NVME_CMD_ZONE_MGMT_SEND_OFFLINE = 0x5,
+NVME_CMD_ZONE_MGMT_SEND_SET_ZDE = 0x10,
+} NvmeZoneManagementSendAction;
+
 typedef struct QEMU_PACKED NvmeZoneManagementRecvCmd {
 uint8_t opcode;
 uint8_t flags;
@@ -1206,6 +1233,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
+QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementSendCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementRecvCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index a92cd02d8955..e02c89b1496b 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -163,6 +163,8 @@ static const NvmeEffectsLog nvme_effects[NVME_IOCS_MAX] = {
 .iocs = {
 NVME_EFFECTS_NVM_INITIALIZER,
 [NVME_CMD_ZONE_MGMT_RECV] = NVME_EFFECTS_CSUPP,
+[NVME_CMD_ZONE_MGMT_SEND] = NVME_EFFECTS_CSUPP |
+NVME_EFFECTS_LBCC,
 },
 },
 };
@@ -1080,6 +1082,12 @@ static uint16_t nvme_check_dulbe(NvmeNamespace *ns, 
uint64_t slba,
 return NVME_SUCCESS;
 }
 
+static inline void nvme_zone_reset_wp(NvmeZone *zone)
+{
+zone->zd->wp = zone->zd->zslba;
+zone->wp_staging = nvme_zslba(zone);
+}
+
 static uint16_t nvme_zrm_transition(NvmeNamespace *ns, NvmeZone *zone,
 NvmeZoneState to)
 {
@@ -1100,6 +1108,10 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSEO:
 switch (to) {
 case NVME_ZS_ZSE:
+nvme_zone_reset_wp(zone);
+
+/* fallthrough */
+
 case NVME_ZS_ZSO:
 NVME_ZA_CLEAR_ALL(zd->za);
 
@@ -1120,6 +1132,10 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSC:
 switch (to) {
 case NVME_ZS_ZSE:
+nvme_zone_reset_wp(zone);
+
+/* fallthrough */
+
 case NVME_ZS_ZSO:
 NVME_ZA_CLEAR_ALL(zd->za);
 
@@ -1152,8 +1168,12 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 case NVME_ZS_ZSF:
 switch (to) {
 case NVME_ZS_ZSE:
+nvme_zone_reset_wp(zone);
+
+/* fallthrough */
+
 case NVME_ZS_ZSO:
-NVME_ZA_CLEAR_ALL(zd->za);
+NVME_ZA_CLEAR_ALL(zone->zd->za);
 
 /* fallthrough */
 
@@ -1254,6 +1274,354 @@ stati

Re: [PATCH v6 02/11] hw/block/nvme: Generate namespace UUIDs

2020-10-14 Thread Klaus Jensen

On Oct 14 06:42, Dmitry Fomichev wrote:
> In NVMe 1.4, a namespace must report an ID descriptor of UUID type
> if it doesn't support EUI64 or NGUID. Add a new namespace property,
> "uuid", that provides the user the option to either specify the UUID
> explicitly or have a UUID generated automatically every time a
> namespace is initialized.
> 
> Suggested-by: Klaus Jansen 
> Signed-off-by: Dmitry Fomichev 

Reviewed-by: Klaus Jensen 

> ---
>  hw/block/nvme-ns.c | 1 +
>  hw/block/nvme-ns.h | 1 +
>  hw/block/nvme.c| 9 +
>  3 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
> index b69cdaf27e..de735eb9f3 100644
> --- a/hw/block/nvme-ns.c
> +++ b/hw/block/nvme-ns.c
> @@ -129,6 +129,7 @@ static void nvme_ns_realize(DeviceState *dev, Error 
> **errp)
>  static Property nvme_ns_props[] = {
>  DEFINE_BLOCK_PROPERTIES(NvmeNamespace, blkconf),
>  DEFINE_PROP_UINT32("nsid", NvmeNamespace, params.nsid, 0),
> +DEFINE_PROP_UUID("uuid", NvmeNamespace, params.uuid),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> index ea8c2f785d..a38071884a 100644
> --- a/hw/block/nvme-ns.h
> +++ b/hw/block/nvme-ns.h
> @@ -21,6 +21,7 @@
>  
>  typedef struct NvmeNamespaceParams {
>  uint32_t nsid;
> +QemuUUID uuid;
>  } NvmeNamespaceParams;
>  
>  typedef struct NvmeNamespace {
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index ee0eff98da..89d91926d9 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1571,6 +1571,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
> NvmeRequest *req)
>  
>  static uint16_t nvme_identify_ns_descr_list(NvmeCtrl *n, NvmeRequest *req)
>  {
> +NvmeNamespace *ns;
>  NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
>  uint32_t nsid = le32_to_cpu(c->nsid);
>  uint8_t list[NVME_IDENTIFY_DATA_SIZE];
> @@ -1590,7 +1591,8 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
> *n, NvmeRequest *req)
>  return NVME_INVALID_NSID | NVME_DNR;
>  }
>  
> -if (unlikely(!nvme_ns(n, nsid))) {
> +ns = nvme_ns(n, nsid);
> +if (unlikely(!ns)) {
>  return NVME_INVALID_FIELD | NVME_DNR;
>  }
>  
> @@ -1599,12 +1601,11 @@ static uint16_t nvme_identify_ns_descr_list(NvmeCtrl 
> *n, NvmeRequest *req)
>  /*
>   * Because the NGUID and EUI64 fields are 0 in the Identify Namespace 
> data
>   * structure, a Namespace UUID (nidt = 0x3) must be reported in the
> - * Namespace Identification Descriptor. Add a very basic Namespace UUID
> - * here.
> + * Namespace Identification Descriptor. Add the namespace UUID here.
>   */
>  ns_descrs->uuid.hdr.nidt = NVME_NIDT_UUID;
>  ns_descrs->uuid.hdr.nidl = NVME_NIDT_UUID_LEN;
> -stl_be_p(&ns_descrs->uuid.v, nsid);
> +memcpy(&ns_descrs->uuid.v, ns->params.uuid.data, NVME_NIDT_UUID_LEN);
>  
>  return nvme_dma(n, list, NVME_IDENTIFY_DATA_SIZE,
>  DMA_DIRECTION_FROM_DEVICE, req);
> -- 
> 2.21.0
> 
> 

-- 
One of us - No more doubt, silence or taboo about mental illness.


signature.asc
Description: PGP signature

[PATCH v3 5/9] hw/block/nvme: add the zone management receive command

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Add the Zone Management Receive command.

Signed-off-by: Klaus Jensen 
---
 hw/block/nvme-ns.h|   8 +++
 hw/block/nvme.h   |   1 +
 include/block/nvme.h  |  46 +
 hw/block/nvme-ns.c|  12 +++-
 hw/block/nvme.c   | 146 ++
 hw/block/trace-events |   1 +
 6 files changed, 213 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index dd311012213e..a273e7f7ed4b 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -27,11 +27,13 @@ typedef struct NvmeNamespaceParams {
 struct {
 uint64_t zcap;
 uint64_t zsze;
+uint8_t  zdes;
 } zns;
 } NvmeNamespaceParams;
 
 typedef struct NvmeZone {
 NvmeZoneDescriptor *zd;
+uint8_t*zde;
 
 uint64_t wp_staging;
 } NvmeZone;
@@ -55,6 +57,7 @@ typedef struct NvmeNamespace {
 
 NvmeZone   *zones;
 NvmeZoneDescriptor *zd;
+uint8_t*zde;
 } zns;
 } NvmeNamespace;
 
@@ -105,6 +108,11 @@ static inline uint64_t nvme_ns_zsze(NvmeNamespace *ns)
 return nvme_ns_lbafe(ns)->zsze;
 }
 
+static inline size_t nvme_ns_zdes_bytes(NvmeNamespace *ns)
+{
+return ns->params.zns.zdes << 6;
+}
+
 /* calculate the number of LBAs that the namespace can accomodate */
 static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
 {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index f66ed9ab7eff..523eef0bcad8 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -71,6 +71,7 @@ static inline const char *nvme_io_opc_str(uint8_t opc)
 case NVME_CMD_WRITE:return "NVME_NVM_CMD_WRITE";
 case NVME_CMD_READ: return "NVME_NVM_CMD_READ";
 case NVME_CMD_WRITE_ZEROES: return "NVME_NVM_CMD_WRITE_ZEROES";
+case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_ZONE_MGMT_RECV";
 default:return "NVME_NVM_CMD_UNKNOWN";
 }
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index cf566032a9fa..fdbb3a0d5490 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -480,6 +480,7 @@ enum NvmeIoCommands {
 NVME_CMD_COMPARE= 0x05,
 NVME_CMD_WRITE_ZEROES   = 0x08,
 NVME_CMD_DSM= 0x09,
+NVME_CMD_ZONE_MGMT_RECV = 0x7a,
 };
 
 typedef struct QEMU_PACKED NvmeDeleteQ {
@@ -592,6 +593,44 @@ enum {
 NVME_RW_PRINFO_PRCHK_REF= 1 << 10,
 };
 
+typedef struct QEMU_PACKED NvmeZoneManagementRecvCmd {
+uint8_t opcode;
+uint8_t flags;
+uint16_tcid;
+uint32_tnsid;
+uint8_t rsvd8[16];
+NvmeCmdDptr dptr;
+uint64_tslba;
+uint32_tnumdw;
+uint8_t zra;
+uint8_t zrasp;
+uint8_t zrasf;
+uint8_t rsvd55[9];
+} NvmeZoneManagementRecvCmd;
+
+typedef enum NvmeZoneManagementRecvAction {
+NVME_CMD_ZONE_MGMT_RECV_REPORT_ZONES  = 0x0,
+NVME_CMD_ZONE_MGMT_RECV_EXTENDED_REPORT_ZONES = 0x1,
+} NvmeZoneManagementRecvAction;
+
+typedef enum NvmeZoneManagementRecvActionSpecificField {
+NVME_CMD_ZONE_MGMT_RECV_LIST_ALL  = 0x0,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSE  = 0x1,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSIO = 0x2,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSEO = 0x3,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSC  = 0x4,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSF  = 0x5,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSRO = 0x6,
+NVME_CMD_ZONE_MGMT_RECV_LIST_ZSO  = 0x7,
+} NvmeZoneManagementRecvActionSpecificField;
+
+#define NVME_CMD_ZONE_MGMT_RECEIVE_PARTIAL 0x1
+
+typedef struct QEMU_PACKED NvmeZoneReportHeader {
+uint64_t num_zones;
+uint8_t  rsvd[56];
+} NvmeZoneReportHeader;
+
 typedef struct QEMU_PACKED NvmeDsmCmd {
 uint8_t opcode;
 uint8_t flags;
@@ -810,6 +849,12 @@ typedef struct QEMU_PACKED NvmeZoneDescriptor {
 uint8_t  rsvd32[32];
 } NvmeZoneDescriptor;
 
+#define NVME_ZA_ZDEV (1 << 7)
+
+#define NVME_ZA_SET(za, attrs)   ((za) |= (attrs))
+#define NVME_ZA_CLEAR(za, attrs) ((za) &= ~(attrs))
+#define NVME_ZA_CLEAR_ALL(za)((za) = 0x0)
+
 enum NvmeSmartWarn {
 NVME_SMART_SPARE  = 1 << 0,
 NVME_SMART_TEMPERATURE= 1 << 1,
@@ -1161,6 +1206,7 @@ static inline void _nvme_check_size(void)
 QEMU_BUILD_BUG_ON(sizeof(NvmeIdentify) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRwCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeDsmCmd) != 64);
+QEMU_BUILD_BUG_ON(sizeof(NvmeZoneManagementRecvCmd) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeRangeType) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeErrorLog) != 64);
 QEMU_BUILD_BUG_ON(sizeof(NvmeFwSlotInfoLog) != 512);
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 1a5e99e50578..0922b67dd6d8 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -59,6 +59,9 @@ static void nvme_ns_zns_init_zones(NvmeNamespace *ns)
 
 zone = &ns->zns.zones[i];
 zone->zd = &ns->zns.zd[i];
+if (ns->params.zns.zdes) {
+zone->zde = &ns->zns.zde[i];
+}

[PATCH v3] migration/block-dirty-bitmap: fix uninitialized variable warning

2020-10-14 Thread Chen Qun

A default value is provided for the variable 'bitmap_name' to avoid compiler 
warning.

The compiler show warning:
migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’
may be used uninitialized in this function [-Wmaybe-uninitialized]
   g_strlcpy(s->bitmap_name, bitmap_name, sizeof(s->bitmap_name));
   ^~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
v2->v3: Placing the declaration at the beginning of the block(Base on Max's 
suggestion).

Cc: Max Reitz 
Cc: Vladimir Sementsov-Ogievskiy 
Cc: Laurent Vivier 
Cc: Li Qiang 
---
 migration/block-dirty-bitmap.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 5bef793ac0..114987961a 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1071,18 +1071,15 @@ static int dirty_bitmap_load_header(QEMUFile *f, 
DBMLoadState *s,
 return -EINVAL;
 }
 
-if (!s->cancelled) {
-if (bitmap_alias_map) {
-bitmap_name = g_hash_table_lookup(bitmap_alias_map,
-  s->bitmap_alias);
-if (!bitmap_name) {
-error_report("Error: Unknown bitmap alias '%s' on node "
- "'%s' (alias '%s')", s->bitmap_alias,
- s->bs->node_name, s->node_alias);
-cancel_incoming_locked(s);
-}
-} else {
-bitmap_name = s->bitmap_alias;
+bitmap_name = s->bitmap_alias;
+if (!s->cancelled && bitmap_alias_map) {
+bitmap_name = g_hash_table_lookup(bitmap_alias_map,
+  s->bitmap_alias);
+if (!bitmap_name) {
+error_report("Error: Unknown bitmap alias '%s' on node "
+ "'%s' (alias '%s')", s->bitmap_alias,
+ s->bs->node_name, s->node_alias);
+cancel_incoming_locked(s);
 }
 }
 
-- 
2.23.0

[PATCH v3 8/9] hw/block/nvme: track and enforce zone resources

2020-10-14 Thread Klaus Jensen

From: Klaus Jensen 

Track number of open/active resources.

Signed-off-by: Klaus Jensen 
---
 docs/specs/nvme.txt  |  6 
 hw/block/nvme-ns.h   |  7 +
 include/block/nvme.h |  2 ++
 hw/block/nvme-ns.c   | 25 +++--
 hw/block/nvme.c  | 66 +++-
 5 files changed, 102 insertions(+), 4 deletions(-)

diff --git a/docs/specs/nvme.txt b/docs/specs/nvme.txt
index 80cb34406255..03bb4d9516b4 100644
--- a/docs/specs/nvme.txt
+++ b/docs/specs/nvme.txt
@@ -14,6 +14,12 @@ The nvme device (-device nvme) emulates an NVM Express 
Controller.
  zns.zcap; if the zone capacity is a power of two, the zone size will be
  set to that, otherwise it will default to the next power of two.
 
+  `zns.mar`; Specifies the number of active resources available. This is a 0s
+ based value.
+
+  `zns.mor`; Specifies the number of open resources available. This is a 0s
+ based value.
+
 
 Reference Specifications
 
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
index a273e7f7ed4b..3d0269eef6f0 100644
--- a/hw/block/nvme-ns.h
+++ b/hw/block/nvme-ns.h
@@ -28,6 +28,8 @@ typedef struct NvmeNamespaceParams {
 uint64_t zcap;
 uint64_t zsze;
 uint8_t  zdes;
+uint32_t mar;
+uint32_t mor;
 } zns;
 } NvmeNamespaceParams;
 
@@ -58,6 +60,11 @@ typedef struct NvmeNamespace {
 NvmeZone   *zones;
 NvmeZoneDescriptor *zd;
 uint8_t*zde;
+
+struct {
+uint32_t open;
+uint32_t active;
+} resources;
 } zns;
 } NvmeNamespace;
 
diff --git a/include/block/nvme.h b/include/block/nvme.h
index aecefefd43b6..4ca5f3fb15eb 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -775,6 +775,8 @@ enum NvmeStatusCodes {
 NVME_ZONE_IS_READ_ONLY  = 0x01ba,
 NVME_ZONE_IS_OFFLINE= 0x01bb,
 NVME_ZONE_INVALID_WRITE = 0x01bc,
+NVME_TOO_MANY_ACTIVE_ZONES  = 0x01bd,
+NVME_TOO_MANY_OPEN_ZONES= 0x01be,
 NVME_INVALID_ZONE_STATE_TRANSITION = 0x01bf,
 NVME_WRITE_FAULT= 0x0280,
 NVME_UNRECOVERED_READ   = 0x0281,
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 0922b67dd6d8..a01cc5eeb445 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -92,8 +92,8 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 ns->zns.zde = g_malloc0_n(ns->zns.num_zones, nvme_ns_zdes_bytes(ns));
 }
 
-id_ns_zns->mar = 0x;
-id_ns_zns->mor = 0x;
+id_ns_zns->mar = cpu_to_le32(ns->params.zns.mar);
+id_ns_zns->mor = cpu_to_le32(ns->params.zns.mor);
 }
 
 static void nvme_ns_init(NvmeNamespace *ns)
@@ -130,6 +130,11 @@ static void nvme_ns_init(NvmeNamespace *ns)
 
 void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 {
+ns->zns.resources.active = ns->params.zns.mar != 0x ?
+ns->params.zns.mar + 1 : ns->zns.num_zones;
+ns->zns.resources.open = ns->params.zns.mor != 0x ?
+ns->params.zns.mor + 1 : ns->zns.num_zones;
+
 for (int i = 0; i < ns->zns.num_zones; i++) {
 NvmeZone *zone = &ns->zns.zones[i];
 zone->zd = &ns->zns.zd[i];
@@ -148,9 +153,15 @@ void nvme_ns_zns_init_zone_state(NvmeNamespace *ns)
 if (nvme_wp(zone) == nvme_zslba(zone) &&
 !(zone->zd->za & NVME_ZA_ZDEV)) {
 nvme_zs_set(zone, NVME_ZS_ZSE);
+break;
 }
 
-break;
+if (ns->zns.resources.active) {
+ns->zns.resources.active--;
+break;
+}
+
+/* fallthrough */
 
 case NVME_ZS_ZSIO:
 case NVME_ZS_ZSEO:
@@ -207,6 +218,12 @@ static int nvme_ns_check_constraints(NvmeNamespace *ns, 
Error **errp)
 return -1;
 }
 
+if (ns->params.zns.mor > ns->params.zns.mar) {
+error_setg(errp, "maximum open resources (zns.mor) must be less "
+   "than or equal to maximum active resources (zns.mar)");
+return -1;
+}
+
 break;
 
 default:
@@ -272,6 +289,8 @@ static Property nvme_ns_props[] = {
 DEFINE_PROP_UINT64("zns.zcap", NvmeNamespace, params.zns.zcap, 0),
 DEFINE_PROP_UINT64("zns.zsze", NvmeNamespace, params.zns.zsze, 0),
 DEFINE_PROP_UINT8("zns.zdes", NvmeNamespace, params.zns.zdes, 0),
+DEFINE_PROP_UINT32("zns.mar", NvmeNamespace, params.zns.mar, 0x),
+DEFINE_PROP_UINT32("zns.mor", NvmeNamespace, params.zns.mor, 0x),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 947554c48b35..cc637b3a68e9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1119,6 +1119,40 @@ static uint16_t nvme_zrm_transition(NvmeNamespace *ns, 
NvmeZone *zone,
 
 switch (from) {
 case NVME_ZS_ZSE:
+switch (to) {
+case NVME_ZS_ZSF:
+case NVME_ZS_ZSRO:
+case NVME_ZS_ZSO:
+break;
+
+c

[PATCH 2/9] util/vfio-helpers: Trace PCI I/O config accesses

2020-10-14 Thread Philippe Mathieu-Daudé

We sometime get kernel panic with some devices on Aarch64
hosts. Alex Williamson suggests it might be broken PCIe
root complex. Add trace event to record the latest I/O
access before crashing. In case, assert our accesses are
aligned.

Signed-off-by: Philippe Mathieu-Daudé 
---
Cc: Alex Williamson 
---
 util/vfio-helpers.c | 8 
 util/trace-events   | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 14a549510fe..1d4efafcaa4 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -227,6 +227,10 @@ static int qemu_vfio_pci_read_config(QEMUVFIOState *s, 
void *buf,
 {
 int ret;
 
+trace_qemu_vfio_pci_read_config(buf, ofs, size,
+s->config_region_info.offset,
+s->config_region_info.size);
+assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
 do {
 ret = pread(s->device, buf, size, s->config_region_info.offset + ofs);
 } while (ret == -1 && errno == EINTR);
@@ -237,6 +241,10 @@ static int qemu_vfio_pci_write_config(QEMUVFIOState *s, 
void *buf, int size, int
 {
 int ret;
 
+trace_qemu_vfio_pci_write_config(buf, ofs, size,
+ s->config_region_info.offset,
+ s->config_region_info.size);
+assert(QEMU_IS_ALIGNED(s->config_region_info.offset + ofs, size));
 do {
 ret = pwrite(s->device, buf, size, s->config_region_info.offset + ofs);
 } while (ret == -1 && errno == EINTR);
diff --git a/util/trace-events b/util/trace-events
index 24c31803b01..c048f85f828 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -85,3 +85,5 @@ qemu_vfio_new_mapping(void *s, void *host, size_t size, int 
index, uint64_t iova
 qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p 
host %p size 0x%zx iova 0x%"PRIx64
 qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t 
*iova) "s %p host %p size 0x%zx temporary %d iova %p"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
+qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
+qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
-- 
2.26.2

Re: [PATCH v4 2/7] nbd: Add new qemu:allocation-depth metadata context

2020-10-14 Thread Vladimir Sementsov-Ogievskiy


10.10.2020 00:55, Eric Blake wrote:

'qemu-img map' provides a way to determine which extents of an image
come from the top layer vs. inherited from a backing chain.  This is
useful information worth exposing over NBD.  There is a proposal to
add a QMP command block-dirty-bitmap-populate which can create a dirty
bitmap that reflects allocation information, at which point the
qemu:dirty-bitmap:NAME metadata context can expose that information
via the creation of a temporary bitmap, but we can shorten the effort
by adding a new qemu:allocation-depth metadata context that does the
same thing without an intermediate bitmap (this patch does not
eliminate the need for that proposal, as it will have other uses as
well).

For this patch, I just encoded a tri-state value (unallocated, from
this layer, from any of the backing layers); an obvious extension
would be to provide the actual depth in bits 31-4 while keeping bits
1-0 as a tri-state (leaving bits 3-2 unused, for ease of reading depth
from a hex number).  But adding this extension would require
bdrv_is_allocated_above to return a depth number.

While documenting things, remember that although the NBD protocol has
NBD_OPT_SET_META_CONTEXT, the rest of its documentation refers to
'metadata context', which is a more apt description of what is
actually being used by NBD_CMD_BLOCK_STATUS: the user is requesting
metadata by passing one or more context names.  So I also touched up
some existing wording to prefer the term 'metadata context' where it
makes sense.

Note that this patch does not actually enable any way to request a
server to enable this context; that will come in the next patch.

Signed-off-by: Eric Blake 


Reviewed-by: Vladimir Sementsov-Ogievskiy 


---
  docs/interop/nbd.txt | 27 ++---


[..]


+In the allocation depth context, bits 0 and 1 form a tri-state value:
+
+bits 0-1: 00: NBD_STATE_DEPTH_UNALLOC, the extent is unallocated
+  01: NBD_STATE_DEPTH_LOCAL, the extent is allocated in the
+  top level of the image


Hmm. I always thought that "image" == file, so backing chain is a chain of 
images,
not a several levels of one image. If it is so, than it should be "the top level 
image".
And "levels of the image" may designate internal qcow2 snapshots unrelated 
here..


--
Best regards,
Vladimir

[PATCH 1/9] util/vfio-helpers: Improve reporting unsupported IOMMU type

2020-10-14 Thread Philippe Mathieu-Daudé

Change the confuse "VFIO IOMMU check failed" error message by
the explicit "VFIO IOMMU Type1 is not supported" once.

Example on POWER:

 $ qemu-system-ppc64 -drive 
if=none,id=nvme0,file=nvme://0001:01:00.0/1,format=raw
 qemu-system-ppc64: -drive 
if=none,id=nvme0,file=nvme://0001:01:00.0/1,format=raw: VFIO IOMMU Type1 is not 
supported

Suggested-by: Alex Williamson 
Reviewed-by: Fam Zheng 
Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c469beb0616..14a549510fe 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -300,7 +300,7 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char 
*device,
 }
 
 if (!ioctl(s->container, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU)) {
-error_setg_errno(errp, errno, "VFIO IOMMU check failed");
+error_setg_errno(errp, errno, "VFIO IOMMU Type1 is not supported");
 ret = -EINVAL;
 goto fail_container;
 }
-- 
2.26.2

[PATCH 3/9] util/vfio-helpers: Trace PCI BAR region info

2020-10-14 Thread Philippe Mathieu-Daudé

For debug purpose, trace BAR regions info.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 8 
 util/trace-events   | 1 +
 2 files changed, 9 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 1d4efafcaa4..cd6287c3a98 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -136,6 +136,7 @@ static inline void assert_bar_index_valid(QEMUVFIOState *s, 
int index)
 
 static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int index, Error **errp)
 {
+g_autofree char *barname = NULL;
 assert_bar_index_valid(s, index);
 s->bar_region_info[index] = (struct vfio_region_info) {
 .index = VFIO_PCI_BAR0_REGION_INDEX + index,
@@ -145,6 +146,10 @@ static int qemu_vfio_pci_init_bar(QEMUVFIOState *s, int 
index, Error **errp)
 error_setg_errno(errp, errno, "Failed to get BAR region info");
 return -errno;
 }
+barname = g_strdup_printf("bar[%d]", index);
+trace_qemu_vfio_region_info(barname, s->bar_region_info[index].offset,
+s->bar_region_info[index].size,
+s->bar_region_info[index].cap_offset);
 
 return 0;
 }
@@ -416,6 +421,9 @@ static int qemu_vfio_init_pci(QEMUVFIOState *s, const char 
*device,
 ret = -errno;
 goto fail;
 }
+trace_qemu_vfio_region_info("config", s->config_region_info.offset,
+s->config_region_info.size,
+s->config_region_info.cap_offset);
 
 for (i = 0; i < ARRAY_SIZE(s->bar_region_info); i++) {
 ret = qemu_vfio_pci_init_bar(s, i, errp);
diff --git a/util/trace-events b/util/trace-events
index c048f85f828..4d40c74a21f 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -87,3 +87,4 @@ qemu_vfio_dma_map(void *s, void *host, size_t size, bool 
temporary, uint64_t *io
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
+qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, 
uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs 
%"PRId32
-- 
2.26.2

[PATCH 0/9] util/vfio-helpers: Improve debugging experience

2020-10-14 Thread Philippe Mathieu-Daudé

A bunch of boring patches that have been proven helpful
while debugging.

Philippe Mathieu-Daudé (9):
  util/vfio-helpers: Improve reporting unsupported IOMMU type
  util/vfio-helpers: Trace PCI I/O config accesses
  util/vfio-helpers: Trace PCI BAR region info
  util/vfio-helpers: Trace where BARs are mapped
  util/vfio-helpers: Improve DMA trace events
  util/vfio-helpers: Convert vfio_dump_mapping to trace events
  util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error
  util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error
  util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report()

 include/qemu/vfio-helpers.h |  2 +-
 block/nvme.c| 14 
 util/vfio-helpers.c | 66 +
 util/trace-events   | 10 --
 4 files changed, 54 insertions(+), 38 deletions(-)

-- 
2.26.2

[PATCH 6/9] util/vfio-helpers: Convert vfio_dump_mapping to trace events

2020-10-14 Thread Philippe Mathieu-Daudé

The QEMU_VFIO_DEBUG definition is only modifiable at build-time.
Trace events can be enabled at run-time. As we prefer the latter,
convert qemu_vfio_dump_mappings() to use trace events instead
of fprintf().

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 19 ---
 util/trace-events   |  1 +
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c24a510df82..73f7bfa7540 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -521,23 +521,12 @@ QEMUVFIOState *qemu_vfio_open_pci(const char *device, 
Error **errp)
 return s;
 }
 
-static void qemu_vfio_dump_mapping(IOVAMapping *m)
-{
-if (QEMU_VFIO_DEBUG) {
-printf("  vfio mapping %p %" PRIx64 " to %" PRIx64 "\n", m->host,
-   (uint64_t)m->size, (uint64_t)m->iova);
-}
-}
-
 static void qemu_vfio_dump_mappings(QEMUVFIOState *s)
 {
-int i;
-
-if (QEMU_VFIO_DEBUG) {
-printf("vfio mappings\n");
-for (i = 0; i < s->nr_mappings; ++i) {
-qemu_vfio_dump_mapping(&s->mappings[i]);
-}
+for (int i = 0; i < s->nr_mappings; ++i) {
+trace_qemu_vfio_dump_mapping(s->mappings[i].host,
+ s->mappings[i].iova,
+ s->mappings[i].size);
 }
 }
 
diff --git a/util/trace-events b/util/trace-events
index 8598066acdb..7faad2a718c 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -80,6 +80,7 @@ qemu_mutex_unlock(void *mutex, const char *file, const int 
line) "released mutex
 qemu_vfio_dma_reset_temporary(void *s) "s %p"
 qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s %p host %p size 
0x%zx"
 qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 
0x%zx"
+qemu_vfio_dump_mapping(void *host, uint64_t iova, size_t size) "vfio mapping 
%p to iova 0x%08" PRIx64 " size 0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t 
iova) "s %p host %p size 0x%zx index %d iova 0x%"PRIx64
 qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p 
host %p <-> iova 0x%"PRIx64 " size 0x%zx"
-- 
2.26.2

Re: [PATCH v2 1/4] MAINTAINERS: Put myself forward for MIPS target

2020-10-14 Thread Thomas Huth

On 13/10/2020 12.16, Philippe Mathieu-Daudé wrote:
> To avoid the MIPS target being orphan, volunteer to keep an eye
> on it and put together pull requests.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  MAINTAINERS | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 426f52c1f63..d59e5c05c10 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -221,10 +221,11 @@ F: hw/microblaze/
>  F: disas/microblaze.c
>  
>  MIPS TCG CPUs
> +M: Philippe Mathieu-Daudé 
>  R: Aurelien Jarno 
>  R: Jiaxun Yang 
>  R: Aleksandar Rikalo 
> -S: Orphaned
> +S: Odd Fixes
>  F: target/mips/
>  F: default-configs/*mips*
>  F: disas/*mips*
> @@ -2815,11 +2816,12 @@ F: tcg/i386/
>  F: disas/i386.c
>  
>  MIPS TCG target
> +M: Philippe Mathieu-Daudé 
>  R: Aurelien Jarno 
>  R: Huacai Chen 
>  R: Jiaxun Yang 
>  R: Aleksandar Rikalo 
> -S: Orphaned
> +S: Odd Fixes
>  F: tcg/mips/

It would be good to get some feedback from the people with the "R:" entries
here, too ... but FWIW:

Reviewed-by: Thomas Huth

[PATCH 5/9] util/vfio-helpers: Improve DMA trace events

2020-10-14 Thread Philippe Mathieu-Daudé

For debugging purpose, trace where DMA regions are mapped.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 3 ++-
 util/trace-events   | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 278c54902e7..c24a510df82 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -627,7 +627,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void 
*host, size_t size,
 .vaddr = (uintptr_t)host,
 .size = size,
 };
-trace_qemu_vfio_do_mapping(s, host, size, iova);
+trace_qemu_vfio_do_mapping(s, host, iova, size);
 
 if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
 error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
@@ -783,6 +783,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t 
size,
 }
 }
 }
+trace_qemu_vfio_dma_mapped(s, host, iova0, size);
 if (iova) {
 *iova = iova0;
 }
diff --git a/util/trace-events b/util/trace-events
index 50652761a58..8598066acdb 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -82,8 +82,9 @@ qemu_vfio_ram_block_added(void *s, void *p, size_t size) "s 
%p host %p size 0x%z
 qemu_vfio_ram_block_removed(void *s, void *p, size_t size) "s %p host %p size 
0x%zx"
 qemu_vfio_find_mapping(void *s, void *p) "s %p host %p"
 qemu_vfio_new_mapping(void *s, void *host, size_t size, int index, uint64_t 
iova) "s %p host %p size 0x%zx index %d iova 0x%"PRIx64
-qemu_vfio_do_mapping(void *s, void *host, size_t size, uint64_t iova) "s %p 
host %p size 0x%zx iova 0x%"PRIx64
-qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t 
*iova) "s %p host %p size 0x%zx temporary %d iova %p"
+qemu_vfio_do_mapping(void *s, void *host, uint64_t iova, size_t size) "s %p 
host %p <-> iova 0x%"PRIx64 " size 0x%zx"
+qemu_vfio_dma_map(void *s, void *host, size_t size, bool temporary, uint64_t 
*iova) "s %p host %p size 0x%zx temporary %d &iova %p"
+qemu_vfio_dma_mapped(void *s, void *host, uint64_t iova, size_t size) "s %p 
host %p <-> iova 0x%"PRIx64" size 0x%zx"
 qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
-- 
2.26.2

[PATCH 8/9] util/vfio-helpers: Let qemu_vfio_do_mapping() propagate Error

2020-10-14 Thread Philippe Mathieu-Daudé

Pass qemu_vfio_do_mapping() an Error* argument so it can propagate
any error to callers. Replace error_report() which only report
to the monitor by the more generic error_setg_errno().

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index c03fe0b7156..2c4598d7faa 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -609,7 +609,7 @@ static IOVAMapping *qemu_vfio_add_mapping(QEMUVFIOState *s,
 
 /* Do the DMA mapping with VFIO. */
 static int qemu_vfio_do_mapping(QEMUVFIOState *s, void *host, size_t size,
-uint64_t iova)
+uint64_t iova, Error **errp)
 {
 struct vfio_iommu_type1_dma_map dma_map = {
 .argsz = sizeof(dma_map),
@@ -621,7 +621,7 @@ static int qemu_vfio_do_mapping(QEMUVFIOState *s, void 
*host, size_t size,
 trace_qemu_vfio_do_mapping(s, host, iova, size);
 
 if (ioctl(s->container, VFIO_IOMMU_MAP_DMA, &dma_map)) {
-error_report("VFIO_MAP_DMA failed: %s", strerror(errno));
+error_setg_errno(errp, errno, "VFIO_MAP_DMA failed");
 return -errno;
 }
 return 0;
@@ -757,7 +757,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t 
size,
 goto out;
 }
 assert(qemu_vfio_verify_mappings(s));
-ret = qemu_vfio_do_mapping(s, host, size, iova0);
+ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
 if (ret) {
 qemu_vfio_undo_mapping(s, mapping, NULL);
 goto out;
@@ -768,7 +768,7 @@ int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t 
size,
 ret = -ENOMEM;
 goto out;
 }
-ret = qemu_vfio_do_mapping(s, host, size, iova0);
+ret = qemu_vfio_do_mapping(s, host, size, iova0, errp);
 if (ret) {
 goto out;
 }
-- 
2.26.2

[PATCH 7/9] util/vfio-helpers: Let qemu_vfio_dma_map() propagate Error

2020-10-14 Thread Philippe Mathieu-Daudé

Currently qemu_vfio_dma_map() displays errors on stderr.
When using management interface, this information is simply
lost. Pass qemu_vfio_dma_map() an Error* argument so it can
propagate the error to callers.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/qemu/vfio-helpers.h |  2 +-
 block/nvme.c| 14 +++---
 util/vfio-helpers.c | 12 +++-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 4491c8e1a6e..bde9495b254 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -18,7 +18,7 @@ typedef struct QEMUVFIOState QEMUVFIOState;
 QEMUVFIOState *qemu_vfio_open_pci(const char *device, Error **errp);
 void qemu_vfio_close(QEMUVFIOState *s);
 int qemu_vfio_dma_map(QEMUVFIOState *s, void *host, size_t size,
-  bool temporary, uint64_t *iova_list);
+  bool temporary, uint64_t *iova_list, Error **errp);
 int qemu_vfio_dma_reset_temporary(QEMUVFIOState *s);
 void qemu_vfio_dma_unmap(QEMUVFIOState *s, void *host);
 void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
diff --git a/block/nvme.c b/block/nvme.c
index b48f6f25881..5f662c55bbe 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -167,9 +167,9 @@ static void nvme_init_queue(BDRVNVMeState *s, NVMeQueue *q,
 return;
 }
 memset(q->queue, 0, bytes);
-r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova);
+r = qemu_vfio_dma_map(s->vfio, q->queue, bytes, false, &q->iova, errp);
 if (r) {
-error_setg(errp, "Cannot map queue");
+error_prepend(errp, "Cannot map queue: ");
 }
 }
 
@@ -223,7 +223,7 @@ static NVMeQueuePair *nvme_create_queue_pair(BDRVNVMeState 
*s,
 q->completion_bh = aio_bh_new(aio_context, nvme_process_completion_bh, q);
 r = qemu_vfio_dma_map(s->vfio, q->prp_list_pages,
   s->page_size * NVME_NUM_REQS,
-  false, &prp_list_iova);
+  false, &prp_list_iova, errp);
 if (r) {
 goto fail;
 }
@@ -514,9 +514,9 @@ static void nvme_identify(BlockDriverState *bs, int 
namespace, Error **errp)
 error_setg(errp, "Cannot allocate buffer for identify response");
 goto out;
 }
-r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova);
+r = qemu_vfio_dma_map(s->vfio, id, sizeof(*id), true, &iova, errp);
 if (r) {
-error_setg(errp, "Cannot map buffer for DMA");
+error_prepend(errp, "Cannot map buffer for DMA: ");
 goto out;
 }
 
@@ -989,7 +989,7 @@ try_map:
 r = qemu_vfio_dma_map(s->vfio,
   qiov->iov[i].iov_base,
   qiov->iov[i].iov_len,
-  true, &iova);
+  true, &iova, NULL);
 if (r == -ENOMEM && retry) {
 retry = false;
 trace_nvme_dma_flush_queue_wait(s);
@@ -1436,7 +1436,7 @@ static void nvme_register_buf(BlockDriverState *bs, void 
*host, size_t size)
 int ret;
 BDRVNVMeState *s = bs->opaque;
 
-ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL);
+ret = qemu_vfio_dma_map(s->vfio, host, size, false, NULL, NULL);
 if (ret) {
 /* FIXME: we may run out of IOVA addresses after repeated
  * bdrv_register_buf/bdrv_unregister_buf, because nvme_vfio_dma_unmap
diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 73f7bfa7540..c03fe0b7156 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -462,7 +462,7 @@ static void qemu_vfio_ram_block_added(RAMBlockNotifier *n,
 {
 QEMUVFIOState *s = container_of(n, QEMUVFIOState, ram_notifier);
 trace_qemu_vfio_ram_block_added(s, host, size);
-qemu_vfio_dma_map(s, host, size, false, NULL);
+qemu_vfio_dma_map(s, host, size, false, NULL, NULL);
 }
 
 static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
@@ -477,6 +477,7 @@ static void qemu_vfio_ram_block_removed(RAMBlockNotifier *n,
 
 static int qemu_vfio_init_ramblock(RAMBlock *rb, void *opaque)
 {
+Error *local_err = NULL;
 void *host_addr = qemu_ram_get_host_addr(rb);
 ram_addr_t length = qemu_ram_get_used_length(rb);
 int ret;
@@ -485,10 +486,11 @@ static int qemu_vfio_init_ramblock(RAMBlock *rb, void 
*opaque)
 if (!host_addr) {
 return 0;
 }
-ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL);
+ret = qemu_vfio_dma_map(s, host_addr, length, false, NULL, &local_err);
 if (ret) {
-fprintf(stderr, "qemu_vfio_init_ramblock: failed %p %" PRId64 "\n",
-host_addr, (uint64_t)length);
+error_reportf_err(local_err,
+  "qemu_vfio_init_ramblock: failed %p %" PRId64 ":",
+  host_addr, (uint64_t)length);
 }
 return 0;
 }
@@ -724,7 +726,7 @@ qemu_vfio_find_temp_iova(QEMUVFIOState *s, size_t size, 
uint64_t *iova)
  * mapping status

[PATCH 4/9] util/vfio-helpers: Trace where BARs are mapped

2020-10-14 Thread Philippe Mathieu-Daudé

For debugging purpose, trace where a BAR is mapped.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 2 ++
 util/trace-events   | 1 +
 2 files changed, 3 insertions(+)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index cd6287c3a98..278c54902e7 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -166,6 +166,8 @@ void *qemu_vfio_pci_map_bar(QEMUVFIOState *s, int index,
 p = mmap(NULL, MIN(size, s->bar_region_info[index].size - offset),
  prot, MAP_SHARED,
  s->device, s->bar_region_info[index].offset + offset);
+trace_qemu_vfio_pci_map_bar(index, s->bar_region_info[index].offset ,
+size, offset, p);
 if (p == MAP_FAILED) {
 error_setg_errno(errp, errno, "Failed to map BAR region");
 p = NULL;
diff --git a/util/trace-events b/util/trace-events
index 4d40c74a21f..50652761a58 100644
--- a/util/trace-events
+++ b/util/trace-events
@@ -88,3 +88,4 @@ qemu_vfio_dma_unmap(void *s, void *host) "s %p host %p"
 qemu_vfio_pci_read_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "read cfg ptr %p ofs 0x%x size %d (region ofs 0x%"PRIx64" 
size %"PRId64")"
 qemu_vfio_pci_write_config(void *buf, int ofs, int size, uint64_t region_ofs, 
uint64_t region_size) "write cfg ptr %p ofs 0x%x size %d (region ofs 
0x%"PRIx64" size %"PRId64")"
 qemu_vfio_region_info(const char *desc, uint64_t offset, uint64_t size, 
uint32_t cap_offset) "region '%s' ofs 0x%"PRIx64" size %"PRId64" cap_ofs 
%"PRId32
+qemu_vfio_pci_map_bar(int index, uint64_t region_ofs, uint64_t region_size, 
int ofs, void *host) "map region bar#%d ofs 0x%"PRIx64" size %"PRId64" ofs %d 
host %p"
-- 
2.26.2

[PATCH 9/9] util/vfio-helpers: Let qemu_vfio_verify_mappings() use error_report()

2020-10-14 Thread Philippe Mathieu-Daudé

Instead of displaying the error on stderr, use error_report()
which also report to the monitor.

Signed-off-by: Philippe Mathieu-Daudé 
---
 util/vfio-helpers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c
index 2c4598d7faa..488ddfca2a9 100644
--- a/util/vfio-helpers.c
+++ b/util/vfio-helpers.c
@@ -661,13 +661,13 @@ static bool qemu_vfio_verify_mappings(QEMUVFIOState *s)
 if (QEMU_VFIO_DEBUG) {
 for (i = 0; i < s->nr_mappings - 1; ++i) {
 if (!(s->mappings[i].host < s->mappings[i + 1].host)) {
-fprintf(stderr, "item %d not sorted!\n", i);
+error_report("item %d not sorted!", i);
 qemu_vfio_dump_mappings(s);
 return false;
 }
 if (!(s->mappings[i].host + s->mappings[i].size <=
   s->mappings[i + 1].host)) {
-fprintf(stderr, "item %d overlap with next!\n", i);
+error_report("item %d overlap with next!", i);
 qemu_vfio_dump_mappings(s);
 return false;
 }
-- 
2.26.2

Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

2020-10-14 Thread Max Reitz

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/copy-on-read.c | 39 +--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn 
> cor_co_preadv_part(BlockDriverState *bs,
> size_t qiov_offset,
> int flags)
>  {
> -return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -   flags | BDRV_REQ_COPY_ON_READ);
> +int64_t n = 0;
> +int64_t size = offset + bytes;

Just when I hit send I noticed that “end” would be a more fitting name
for this variable.

> +int local_flags;
> +int ret;
> +BDRVStateCOR *state = bs->opaque;
> +
> +if (!state->base_overlay) {
> +return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, 
> qiov_offset,
> +   flags | BDRV_REQ_COPY_ON_READ);
> +}
> +
> +while (offset < size) {

(because I got a bit confused looking at this)

(Though dropping @size (or @end) and just checking when @bytes becomes 0
should work, too.)

> +local_flags = flags;
> +
> +/* In case of failure, try to copy-on-read anyway */
> +ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +if (!ret) {
> +ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),
> +  state->base_overlay, true, offset,
> +  n, &n);
> +if (ret) {
> +local_flags |= BDRV_REQ_COPY_ON_READ;
> +}
> +}

Furthermore, I just noticed – can the is_allocated functions not return
0 in @n, when @offset is a the EOF?  Is that something to look out for?
 (I’m not sure.)

Max

> +
> +ret = bdrv_co_preadv_part(bs->file, offset, n, qiov, qiov_offset,
> +  local_flags);
> +if (ret < 0) {
> +return ret;
> +}
> +
> +offset += n;
> +qiov_offset += n;
> +bytes -= n;
> +}
> +
> +return 0;
>  }
>  
>  
> 




signature.asc
Description: OpenPGP digital signature

Re: [PATCH v11 05/13] copy-on-read: limit COR operations to base in COR driver

2020-10-14 Thread Max Reitz

On 12.10.20 19:43, Andrey Shinkevich wrote:
> Limit COR operations by the base node in the backing chain when the
> overlay base node name is given. It will be useful for a block stream
> job when the COR-filter is applied. The overlay base node is passed as
> the base itself may change due to concurrent commit jobs on the same
> backing chain.
> 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/copy-on-read.c | 39 +--
>  1 file changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
> index c578b1b..dfbd6ad 100644
> --- a/block/copy-on-read.c
> +++ b/block/copy-on-read.c
> @@ -122,8 +122,43 @@ static int coroutine_fn 
> cor_co_preadv_part(BlockDriverState *bs,
> size_t qiov_offset,
> int flags)
>  {
> -return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, qiov_offset,
> -   flags | BDRV_REQ_COPY_ON_READ);
> +int64_t n = 0;
> +int64_t size = offset + bytes;
> +int local_flags;
> +int ret;
> +BDRVStateCOR *state = bs->opaque;
> +
> +if (!state->base_overlay) {
> +return bdrv_co_preadv_part(bs->file, offset, bytes, qiov, 
> qiov_offset,
> +   flags | BDRV_REQ_COPY_ON_READ);
> +}
> +
> +while (offset < size) {
> +local_flags = flags;
> +
> +/* In case of failure, try to copy-on-read anyway */
> +ret = bdrv_is_allocated(bs->file->bs, offset, bytes, &n);
> +if (!ret) {

In case of failure, a negative value is going to be returned, we won’t
go into this conditional block, and local_flags isn’t going to contain
BDRV_REQ_COPY_ON_READ.

So the idea of CORing in case of failure sounds sound to me, but it
doesn’t look like that’s done.

> +ret = bdrv_is_allocated_above(bdrv_cow_bs(bs->file->bs),

I think this should either be bdrv_backing_chain_next() or we must rule
out the possibility of bs->file->bs being a filter somewhere.  I think
I’d prefer the former.

> +  state->base_overlay, true, offset,
> +  n, &n);
> +if (ret) {

“ret == 1 || ret < 0” would be more explicit (and in line with the “!ret
|| ret < 0” probably needed above), but correct either way.

Max



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v11 04/13] copy-on-read: pass overlay base node name to COR driver

2020-10-14 Thread Max Reitz

On 14.10.20 13:09, Max Reitz wrote:
> On 12.10.20 19:43, Andrey Shinkevich wrote:
>> We are going to use the COR-filter for a block-stream job.
>> To limit COR operations by the base node in the backing chain during
>> stream job, pass the name of overlay base node to the copy-on-read
>> driver as base node itself may change due to possible concurrent jobs.
>> The rest of the functionality will be implemented in the patch that
>> follows.
>>
>> Signed-off-by: Andrey Shinkevich 
>> ---
>>  block/copy-on-read.c | 14 ++
>>  1 file changed, 14 insertions(+)
> 
> Is there a reason why you didn’t add this option to QAPI (as part of a
> yet-to-be-created BlockdevOptionsCor)?  Because I’d really like it there.
> 
>> diff --git a/block/copy-on-read.c b/block/copy-on-read.c
>> index bcccf0f..c578b1b 100644
>> --- a/block/copy-on-read.c
>> +++ b/block/copy-on-read.c
>> @@ -24,19 +24,24 @@
>>  #include "block/block_int.h"
>>  #include "qemu/module.h"
>>  #include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>>  #include "qapi/qmp/qdict.h"
>>  #include "block/copy-on-read.h"
>>  
>>  
>>  typedef struct BDRVStateCOR {
>>  bool active;
>> +BlockDriverState *base_overlay;
>>  } BDRVStateCOR;
>>  
>>  
>>  static int cor_open(BlockDriverState *bs, QDict *options, int flags,
>>  Error **errp)
>>  {
>> +BlockDriverState *base_overlay = NULL;
>>  BDRVStateCOR *state = bs->opaque;
>> +/* We need the base overlay node rather than the base itself */
>> +const char *base_overlay_node = qdict_get_try_str(options, "base");
> 
> Shouldn’t it be called base-overlay or above-base then?

While reviewing patch 5, I realized this sould indeed be base-overlay
(i.e. the next allocation-bearing layer above the base, not just a
filter on top of base), but that should be noted somewhere, of course.
Best would be the QAPI documentation for this option. O:)

Max



signature.asc
Description: OpenPGP digital signature

Re: [PATCH] hw/usb/hcd-dwc2: fix divide-by-zero in dwc2_handle_packet()

2020-10-14 Thread Gerd Hoffmann

  Hi,

> I sent you a patch to fix up several assert()s, including that one, about a
> month ago. Did you miss it?
> https://lore.kernel.org/qemu-devel/20200920021449.830-1-pauld...@gmail.com

Seems I missed that, or deleted by accident.
Added to qemu queue now.

thanks,
  Gerd

Re: [PATCH v6 05/11] hw/block/nvme: Support Zoned Namespace Command Set

2020-10-14 Thread Niklas Cassel

On Wed, Oct 14, 2020 at 06:42:06AM +0900, Dmitry Fomichev wrote:
> The emulation code has been changed to advertise NVM Command Set when
> "zoned" device property is not set (default) and Zoned Namespace
> Command Set otherwise.
> 
> Define values and structures that are needed to support Zoned
> Namespace Command Set (NVMe TP 4053) in PCI NVMe controller emulator.
> Define trace events where needed in newly introduced code.
> 
> In order to improve scalability, all open, closed and full zones
> are organized in separate linked lists. Consequently, almost all
> zone operations don't require scanning of the entire zone array
> (which potentially can be quite large) - it is only necessary to
> enumerate one or more zone lists.
> 
> Handlers for three new NVMe commands introduced in Zoned Namespace
> Command Set specification are added, namely for Zone Management
> Receive, Zone Management Send and Zone Append.
> 
> Device initialization code has been extended to create a proper
> configuration for zoned operation using device properties.
> 
> Read/Write command handler is modified to only allow writes at the
> write pointer if the namespace is zoned. For Zone Append command,
> writes implicitly happen at the write pointer and the starting write
> pointer value is returned as the result of the command. Write Zeroes
> handler is modified to add zoned checks that are identical to those
> done as a part of Write flow.
> 
> Subsequent commits in this series add ZDE support and checks for
> active and open zone limits.
> 
> Signed-off-by: Niklas Cassel 
> Signed-off-by: Hans Holmberg 
> Signed-off-by: Ajay Joshi 
> Signed-off-by: Chaitanya Kulkarni 
> Signed-off-by: Matias Bjorling 
> Signed-off-by: Aravind Ramesh 
> Signed-off-by: Shin'ichiro Kawasaki 
> Signed-off-by: Adam Manzanares 
> Signed-off-by: Dmitry Fomichev 

(snip)

> @@ -2260,6 +3155,11 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
>  ns->iocs = nvme_cse_iocs_nvm;
>  }
>  break;
> +case NVME_CSI_ZONED:
> +if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
> +ns->iocs = nvme_cse_iocs_zoned;
> +}
> +break;
>  }
>  }
>  }

Who knows how this whole command set mess is supposed to work,
since e.g. the Key Value Command Set assigns opcodes for new commands
(Delete, Exist) with a opcode values (0x10,0x14) smaller than the
current highest opcode value (0x15) in the NVM Command Set,
while those opcodes (0x10,0x14) are reserved in the NVM Command Set.

At least for Zoned Command Set, they defined the new commands
(Zone Mgmt Send, Zone Mgmt Recv) to opcode values (0x79,0x7a)
that are higher than the current highest opcode value in the
NVM Command Set.

So since we know that the Zoned Command Set is a strict superset of
the NVM Command Set, I guess it might be nice to do something like:

case NVME_CSI_ZONED:
if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
ns->iocs = nvme_cse_iocs_zoned;
} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
ns->iocs = nvme_cse_iocs_nvm;
}
break;


Since I assume that the spec people intended reads/writes
to a ZNS namespace to still be possible when CC_CSS == NVM,
but who knows?


Kind regards,
Niklas

Re: [PATCH] usb/hcd-ehci: Fix error handling on missing device for iTD

2020-10-14 Thread Gerd Hoffmann

On Wed, Oct 14, 2020 at 11:41:06AM +0100, Anthony PERARD wrote:
> The EHCI Host Controller emulation attempt to locate the device
> associated with a periodic isochronous transfer description (iTD) and
> when this fail the host controller is reset.
> 
> But according the EHCI spec 1.0 section 5.15.2.4 Host System
> Error, the host controller is supposed to reset itself only when it
> failed to communicate with the Host (Operating System), like when
> there's an error on the PCI bus. If a transaction fails, there's
> nothing in the spec that say to reset the host controller.
> 
> This patch rework the error path so that the host controller can keep
> working when the OS setup a bogus transaction, it also revert to the
> behavior of the EHCI emulation to before commits:
> e94682f1fe ("ehci: check device is not NULL before calling usb_ep_get()")
> 7011baece2 ("usb: remove unnecessary NULL device check from usb_ep_get()")
> 
> The issue has been found while trying to passthrough a USB device to a
> Windows Server 2012 Xen guest via "usb-ehci", which prevent the USB
> device from working in Windows. ("usb-ehci" alone works, windows only
> setup this weird periodic iTD to device 127 endpoint 15 when the USB
> device is passthrough.)
> 
> Signed-off-by: Anthony PERARD 

Added to usb queue.

thanks,
  Gerd

[PATCH 1/2] i386/cpu: Add the Intel PT capabilities checking before extend the CPUID level

2020-10-14 Thread Luwei Kang

The current implementation will extend the CPUID level to 0x14 if
Intel PT is enabled in the guest(in x86_cpu_expand_features()) and
the Intel PT will be disabled if it can't pass the capabilities
checking later(in x86_cpu_filter_features()). In this case, the
level of CPUID will be still 0x14 and the CPUID values from leaf
0xe to 0x14 are all zero.

This patch moves the capabilities checking before setting the
level of the CPUID.

Signed-off-by: Luwei Kang 
---
 target/i386/cpu.c | 63 ---
 1 file changed, 32 insertions(+), 31 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 9eafbe3690..24644abfd4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6401,12 +6401,40 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error 
**errp)
 
 /* Intel Processor Trace requires CPUID[0x14] */
 if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT)) {
-if (cpu->intel_pt_auto_level) {
-x86_cpu_adjust_level(cpu, &cpu->env.cpuid_min_level, 0x14);
-} else if (cpu->env.cpuid_min_level < 0x14) {
+uint32_t eax_0, ebx_0, ecx_0, eax_1, ebx_1;
+
+eax_0 = kvm_arch_get_supported_cpuid(kvm_state, 0x14, 0, R_EAX);
+ebx_0 = kvm_arch_get_supported_cpuid(kvm_state, 0x14, 0, R_EBX);
+ecx_0 = kvm_arch_get_supported_cpuid(kvm_state, 0x14, 0, R_ECX);
+eax_1 = kvm_arch_get_supported_cpuid(kvm_state, 0x14, 1, R_EAX);
+ebx_1 = kvm_arch_get_supported_cpuid(kvm_state, 0x14, 1, R_EBX);
+
+if (eax_0 &&
+   ((ebx_0 & INTEL_PT_MINIMAL_EBX) == INTEL_PT_MINIMAL_EBX) &&
+   ((ecx_0 & INTEL_PT_MINIMAL_ECX) == INTEL_PT_MINIMAL_ECX) &&
+   ((eax_1 & INTEL_PT_MTC_BITMAP) == INTEL_PT_MTC_BITMAP) &&
+   ((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) >=
+   INTEL_PT_ADDR_RANGES_NUM) &&
+   ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ==
+(INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) &&
+   !(ecx_0 & INTEL_PT_IP_LIP)) {
+if (cpu->intel_pt_auto_level) {
+x86_cpu_adjust_level(cpu, &cpu->env.cpuid_min_level, 0x14);
+} else if (cpu->env.cpuid_min_level < 0x14) {
+mark_unavailable_features(cpu, FEAT_7_0_EBX,
+CPUID_7_0_EBX_INTEL_PT,
+"Intel PT need CPUID leaf 0x14, please set by \"-cpu 
...,+intel-pt,min-level=0x14\"");
+}
+} else {
+   /*
+* Processor Trace capabilities aren't configurable, so if the
+* host can't emulate the capabilities we report on
+* cpu_x86_cpuid(), intel-pt can't be enabled on the current
+* host.
+*/
 mark_unavailable_features(cpu, FEAT_7_0_EBX,
 CPUID_7_0_EBX_INTEL_PT,
-"Intel PT need CPUID leaf 0x14, please set by \"-cpu 
...,+intel-pt,min-level=0x14\"");
+"host Intel PT features doesn't satisfy the guest 
request.");
 }
 }
 
@@ -6466,33 +6494,6 @@ static void x86_cpu_filter_features(X86CPU *cpu, bool 
verbose)
 uint64_t unavailable_features = requested_features & ~host_feat;
 mark_unavailable_features(cpu, w, unavailable_features, prefix);
 }
-
-if ((env->features[FEAT_7_0_EBX] & CPUID_7_0_EBX_INTEL_PT) &&
-kvm_enabled()) {
-KVMState *s = CPU(cpu)->kvm_state;
-uint32_t eax_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EAX);
-uint32_t ebx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_EBX);
-uint32_t ecx_0 = kvm_arch_get_supported_cpuid(s, 0x14, 0, R_ECX);
-uint32_t eax_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EAX);
-uint32_t ebx_1 = kvm_arch_get_supported_cpuid(s, 0x14, 1, R_EBX);
-
-if (!eax_0 ||
-   ((ebx_0 & INTEL_PT_MINIMAL_EBX) != INTEL_PT_MINIMAL_EBX) ||
-   ((ecx_0 & INTEL_PT_MINIMAL_ECX) != INTEL_PT_MINIMAL_ECX) ||
-   ((eax_1 & INTEL_PT_MTC_BITMAP) != INTEL_PT_MTC_BITMAP) ||
-   ((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) <
-   INTEL_PT_ADDR_RANGES_NUM) ||
-   ((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) !=
-(INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ||
-   (ecx_0 & INTEL_PT_IP_LIP)) {
-/*
- * Processor Trace capabilities aren't configurable, so if the
- * host can't emulate the capabilities we report on
- * cpu_x86_cpuid(), intel-pt can't be enabled on the current host.
- */
-mark_unavailable_features(cpu, FEAT_7_0_EBX, 
CPUID_7_0_EBX_INTEL_PT, prefix);
-}
-}
 }
 
 static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
-- 
2.18.4

[PATCH 2/2] i386/cpu: Make the Intel PT LIP feature configurable

2020-10-14 Thread Luwei Kang

The current implementation will disable the guest Intel PT
feature if the Intel PT LIP feature is supported on the host,
but the LIP feature is comming soon(e.g. SnowRidge and later).

This patch will make the guest LIP feature configurable and
Intel PT feature can be enabled in guest when the guest LIP
status same with the host.

Signed-off-by: Luwei Kang 
---
 target/i386/cpu.c | 29 +++--
 target/i386/cpu.h |  4 
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 24644abfd4..aeabdd5bd4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -672,6 +672,7 @@ static void x86_cpu_vendor_words2str(char *dst, uint32_t 
vendor1,
 #define TCG_XSAVE_FEATURES (CPUID_XSAVE_XSAVEOPT | CPUID_XSAVE_XGETBV1)
   /* missing:
   CPUID_XSAVE_XSAVEC, CPUID_XSAVE_XSAVES */
+#define TCG_14_0_ECX_FEATURES 0
 
 typedef enum FeatureWordType {
CPUID_FEATURE_WORD,
@@ -1301,6 +1302,26 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] 
= {
 }
 },
 
+[FEAT_14_0_ECX] = {
+.type = CPUID_FEATURE_WORD,
+.feat_names = {
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, NULL,
+NULL, NULL, NULL, "intel-pt-lip",
+},
+.cpuid = {
+.eax = 0x14,
+.needs_ecx = true, .ecx = 0,
+.reg = R_ECX,
+},
+.tcg_features = TCG_14_0_ECX_FEATURES,
+},
+
 };
 
 typedef struct FeatureMask {
@@ -5743,6 +5764,9 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
uint32_t count,
 *eax = INTEL_PT_MAX_SUBLEAF;
 *ebx = INTEL_PT_MINIMAL_EBX;
 *ecx = INTEL_PT_MINIMAL_ECX;
+if (env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP) {
+*ecx |= CPUID_14_0_ECX_LIP;
+}
 } else if (count == 1) {
 *eax = INTEL_PT_MTC_BITMAP | INTEL_PT_ADDR_RANGES_NUM;
 *ebx = INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP;
@@ -6416,8 +6440,9 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error 
**errp)
((eax_1 & INTEL_PT_ADDR_RANGES_NUM_MASK) >=
INTEL_PT_ADDR_RANGES_NUM) &&
((ebx_1 & (INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) ==
-(INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) &&
-   !(ecx_0 & INTEL_PT_IP_LIP)) {
+(INTEL_PT_PSB_BITMAP | INTEL_PT_CYCLE_BITMAP)) &&
+   ((ecx_0 & CPUID_14_0_ECX_LIP) ==
+(env->features[FEAT_14_0_ECX] & CPUID_14_0_ECX_LIP))) {
 if (cpu->intel_pt_auto_level) {
 x86_cpu_adjust_level(cpu, &cpu->env.cpuid_min_level, 0x14);
 } else if (cpu->env.cpuid_min_level < 0x14) {
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 51c1d5f60a..1fcd93e39a 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -541,6 +541,7 @@ typedef enum FeatureWord {
 FEAT_VMX_EPT_VPID_CAPS,
 FEAT_VMX_BASIC,
 FEAT_VMX_VMFUNC,
+FEAT_14_0_ECX,
 FEATURE_WORDS,
 } FeatureWord;
 
@@ -797,6 +798,9 @@ typedef uint64_t FeatureWordArray[FEATURE_WORDS];
 /* AVX512 BFloat16 Instruction */
 #define CPUID_7_1_EAX_AVX512_BF16   (1U << 5)
 
+/* Packets which contain IP payload have LIP values */
+#define CPUID_14_0_ECX_LIP  (1U << 31)
+
 /* CLZERO instruction */
 #define CPUID_8000_0008_EBX_CLZERO  (1U << 0)
 /* Always save/restore FP error pointers */
-- 
2.18.4

Re: [PATCH v4 3/3] replay: do not build if TCG is not available

2020-10-14 Thread Pavel Dovgalyuk


On 14.10.2020 09:51, Claudio Fontana wrote:

привет Pavel!

On 10/14/20 7:42 AM, Pavel Dovgalyuk wrote:

On 13.10.2020 22:21, Claudio Fontana wrote:

this fixes non-TCG builds broken recently by replay reverse debugging.

stub the needed functions in stub/, including errors for hmp and qmp.
This includes duplicating some code in replay/, and puts the logic
for non-replay related events in the replay/ module (+ the stubs),
so this should be revisited in the future.

Surprisingly, only _one_ qtest was affected by this, ide-test.c, which
resulted in a buzz as the bh events were never delivered, and the bh
never executed.

Many other subsystems _should_ have been affected.

This fixes the immediate issue, however a better way to group replay
functionality to TCG-only code could be developed in the long term.

Signed-off-by: Claudio Fontana 
---
   block/meson.build  |  3 +-
   migration/savevm.c | 11 +++--
   net/meson.build|  3 +-
   replay/meson.build |  2 +-
   replay/replay-input.c  |  4 +-
   stubs/meson.build  |  1 -
   stubs/replay-user.c|  9 
   stubs/replay.c | 98 ++
   tests/ptimer-test-stubs.c  |  5 --
   tests/qtest/qmp-cmd-test.c |  3 ++
   ui/input.c | 12 -
   11 files changed, 125 insertions(+), 26 deletions(-)
   delete mode 100644 stubs/replay-user.c

diff --git a/block/meson.build b/block/meson.build
index 78e8b25232..01fe6f84d2 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -7,7 +7,6 @@ block_ss.add(files(
 'backup-top.c',
 'blkdebug.c',
 'blklogwrites.c',
-  'blkreplay.c',
 'blkverify.c',
 'block-backend.c',
 'block-copy.c',
@@ -42,6 +41,8 @@ block_ss.add(files(
 'write-threshold.c',
   ), zstd, zlib)
   
+block_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))

+
   block_ss.add(when: 'CONFIG_QCOW1', if_true: files('qcow.c'))
   block_ss.add(when: 'CONFIG_VDI', if_true: files('vdi.c'))
   block_ss.add(when: 'CONFIG_CLOOP', if_true: files('cloop.c'))
diff --git a/migration/savevm.c b/migration/savevm.c
index d2e141f7b1..d9181ca520 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -63,6 +63,7 @@
   #include "migration/colo.h"
   #include "qemu/bitmap.h"
   #include "net/announce.h"
+#include "sysemu/tcg.h"
   
   const unsigned int postcopy_ram_discard_version = 0;
   
@@ -2674,10 +2675,12 @@ int save_snapshot(const char *name, Error **errp)

   return ret;
   }
   
-if (!replay_can_snapshot()) {

-error_setg(errp, "Record/replay does not allow making snapshot "
-   "right now. Try once more later.");
-return ret;
+if (tcg_enabled()) {
+if (!replay_can_snapshot()) {
+error_setg(errp, "Record/replay does not allow making snapshot "
+   "right now. Try once more later.");
+return ret;
+}
   }
   
   if (!bdrv_all_can_snapshot(&bs)) {

diff --git a/net/meson.build b/net/meson.build
index 1c7e3a3cb9..1076b0a7ab 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -7,7 +7,6 @@ softmmu_ss.add(files(
 'eth.c',
 'filter-buffer.c',
 'filter-mirror.c',
-  'filter-replay.c',
 'filter-rewriter.c',
 'filter.c',
 'hub.c',
@@ -17,6 +16,8 @@ softmmu_ss.add(files(
 'util.c',
   ))
   
+softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('filter-replay.c'))

+
   softmmu_ss.add(when: 'CONFIG_L2TPV3', if_true: files('l2tpv3.c'))
   softmmu_ss.add(when: slirp, if_true: files('slirp.c'))
   softmmu_ss.add(when: ['CONFIG_VDE', vde], if_true: files('vde.c'))
diff --git a/replay/meson.build b/replay/meson.build
index f91163fb1e..cb3207740a 100644
--- a/replay/meson.build
+++ b/replay/meson.build
@@ -1,4 +1,4 @@
-softmmu_ss.add(files(
+softmmu_ss.add(when: 'CONFIG_TCG', if_true: files(
 'replay.c',
 'replay-internal.c',
 'replay-events.c',
diff --git a/replay/replay-input.c b/replay/replay-input.c
index 1147e3d34e..5d1fd92e79 100644
--- a/replay/replay-input.c
+++ b/replay/replay-input.c
@@ -124,7 +124,7 @@ void replay_input_event(QemuConsole *src, InputEvent *evt)
   } else if (replay_mode == REPLAY_MODE_RECORD) {
   replay_add_input_event(QAPI_CLONE(InputEvent, evt));
   } else {
-qemu_input_event_send_impl(src, evt);
+g_assert_not_reached();
   }
   }
   
@@ -135,6 +135,6 @@ void replay_input_sync_event(void)

   } else if (replay_mode == REPLAY_MODE_RECORD) {
   replay_add_input_sync_event();
   } else {
-qemu_input_event_sync_impl();
+g_assert_not_reached();
   }
   }



incidentally Pavel, is this ok when replay is enabled, can you give it a spin?
I used the if (replay_events_enabled()) in the ui/input.c code to only call 
these when events are enabled.


I think this is ok to use this function for checking that replay is on.






diff --git a/stubs/meson.build b/stubs/meson.build
index 67f2a8c069..b

[PATCH v5 2/7] ui/spice-app: don't use qemu_chr_open_spice_port directly

2020-10-14 Thread Gerd Hoffmann

Save the parent object's open function pointer in the (new)
VCChardevClass struct instead before overwriting it, so we
can look it up when needed.

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Philippe Mathieu-Daudé 
---
 ui/spice-app.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/ui/spice-app.c b/ui/spice-app.c
index 93e105c6ee82..7e0550c79fdd 100644
--- a/ui/spice-app.c
+++ b/ui/spice-app.c
@@ -44,11 +44,15 @@ static char *sock_path;
 struct VCChardev {
 SpiceChardev parent;
 };
-typedef struct VCChardev VCChardev;
+
+struct VCChardevClass {
+ChardevClass parent;
+void (*parent_open)(Chardev *chr, ChardevBackend *backend,
+bool *be_opened, Error **errp);
+};
 
 #define TYPE_CHARDEV_VC "chardev-vc"
-DECLARE_INSTANCE_CHECKER(VCChardev, VC_CHARDEV,
- TYPE_CHARDEV_VC)
+OBJECT_DECLARE_TYPE(VCChardev, VCChardevClass, CHARDEV_VC)
 
 static ChardevBackend *
 chr_spice_backend_new(void)
@@ -66,6 +70,7 @@ static void vc_chr_open(Chardev *chr,
 bool *be_opened,
 Error **errp)
 {
+VCChardevClass *vc = CHARDEV_VC_GET_CLASS(chr);
 ChardevBackend *be;
 const char *fqdn = NULL;
 
@@ -80,7 +85,7 @@ static void vc_chr_open(Chardev *chr,
 be = chr_spice_backend_new();
 be->u.spiceport.data->fqdn = fqdn ?
 g_strdup(fqdn) : g_strdup_printf("org.qemu.console.%s", chr->label);
-qemu_chr_open_spice_port(chr, be, be_opened, errp);
+vc->parent_open(chr, be, be_opened, errp);
 qapi_free_ChardevBackend(be);
 }
 
@@ -91,8 +96,11 @@ static void vc_chr_set_echo(Chardev *chr, bool echo)
 
 static void char_vc_class_init(ObjectClass *oc, void *data)
 {
+VCChardevClass *vc = CHARDEV_VC_CLASS(oc);
 ChardevClass *cc = CHARDEV_CLASS(oc);
 
+vc->parent_open = cc->open;
+
 cc->parse = qemu_chr_parse_vc;
 cc->open = vc_chr_open;
 cc->chr_set_echo = vc_chr_set_echo;
@@ -103,6 +111,7 @@ static const TypeInfo char_vc_type_info = {
 .parent = TYPE_CHARDEV_SPICEPORT,
 .instance_size = sizeof(VCChardev),
 .class_init = char_vc_class_init,
+.class_size = sizeof(VCChardevClass),
 };
 
 static void spice_app_atexit(void)
-- 
2.27.0

[PATCH v5 3/7] chardev/spice: make qemu_chr_open_spice_port static

2020-10-14 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/chardev/spice.h | 3 ---
 chardev/spice.c | 8 
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/chardev/spice.h b/include/chardev/spice.h
index 99f26aedde54..1115502cdfbd 100644
--- a/include/chardev/spice.h
+++ b/include/chardev/spice.h
@@ -24,7 +24,4 @@ typedef struct SpiceChardev SpiceChardev;
 DECLARE_INSTANCE_CHECKER(SpiceChardev, SPICE_CHARDEV,
  TYPE_CHARDEV_SPICE)
 
-void qemu_chr_open_spice_port(Chardev *chr, ChardevBackend *backend,
-  bool *be_opened, Error **errp);
-
 #endif
diff --git a/chardev/spice.c b/chardev/spice.c
index bf7ea1e2940d..051c23a84f4c 100644
--- a/chardev/spice.c
+++ b/chardev/spice.c
@@ -296,10 +296,10 @@ static void qemu_chr_open_spice_vmc(Chardev *chr,
 chr_open(chr, type);
 }
 
-void qemu_chr_open_spice_port(Chardev *chr,
-  ChardevBackend *backend,
-  bool *be_opened,
-  Error **errp)
+static void qemu_chr_open_spice_port(Chardev *chr,
+ ChardevBackend *backend,
+ bool *be_opened,
+ Error **errp)
 {
 ChardevSpicePort *spiceport = backend->u.spiceport.data;
 const char *name = spiceport->fqdn;
-- 
2.27.0

[PATCH v5 1/7] qemu-edid: drop cast

2020-10-14 Thread Gerd Hoffmann

Not needed and makes some compilers error out with:

qemu-edid.c:15:1: error: initializer element is not constant

Signed-off-by: Gerd Hoffmann 
Reviewed-by: Daniel P. Berrangé 
Message-id: 20201013091615.14166-1-kra...@redhat.com
---
 qemu-edid.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-edid.c b/qemu-edid.c
index 1db3372b982c..1cd6a9517238 100644
--- a/qemu-edid.c
+++ b/qemu-edid.c
@@ -9,7 +9,7 @@
 #include "qemu/cutils.h"
 #include "hw/display/edid.h"
 
-static qemu_edid_info info = (qemu_edid_info) {
+static qemu_edid_info info = {
 .prefx = 1024,
 .prefy = 768,
 };
-- 
2.27.0

[PATCH v5 7/7] chardev/spice: build spice chardevs as module

2020-10-14 Thread Gerd Hoffmann

Signed-off-by: Gerd Hoffmann 
---
 util/module.c   | 2 ++
 chardev/meson.build | 7 ++-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/util/module.c b/util/module.c
index a44ec38d9362..4a5735dfdc76 100644
--- a/util/module.c
+++ b/util/module.c
@@ -264,6 +264,8 @@ static struct {
 { "virtio-gpu-device", "hw-", "display-virtio-gpu"},
 { "vhost-user-gpu","hw-", "display-virtio-gpu"},
 { "chardev-braille",   "chardev-", "baum" },
+{ "chardev-spicevmc",  "chardev-", "spice"},
+{ "chardev-spiceport", "chardev-", "spice"},
 };
 
 static bool module_loaded_qom_all;
diff --git a/chardev/meson.build b/chardev/meson.build
index dd2699a11b08..859d8b04d480 100644
--- a/chardev/meson.build
+++ b/chardev/meson.build
@@ -26,7 +26,6 @@ chardev_ss.add(when: 'CONFIG_WIN32', if_true: files(
 chardev_ss = chardev_ss.apply(config_host, strict: false)
 
 softmmu_ss.add(files('chardev-sysemu.c', 'msmouse.c', 'wctablet.c', 
'testdev.c'))
-softmmu_ss.add(when: ['CONFIG_SPICE', spice], if_true: files('spice.c'))
 
 chardev_modules = {}
 
@@ -36,4 +35,10 @@ if config_host.has_key('CONFIG_BRLAPI')
   chardev_modules += { 'baum': module_ss }
 endif
 
+if config_host.has_key('CONFIG_SPICE')
+  module_ss = ss.source_set()
+  module_ss.add(when: [spice], if_true: files('spice.c'))
+  chardev_modules += { 'spice': module_ss }
+endif
+
 modules += { 'chardev': chardev_modules }
-- 
2.27.0

[PATCH v5 0/7] build spice chardevs as module

2020-10-14 Thread Gerd Hoffmann

v3:
 - fix spice-app
 - init objects before spice
 - meson cleanup
v4:
 - rebase to latest master, adapt to qom changes.
 - add patches to fix modular builds.
 - pick up review tags.
v5:
 - rebase to latest master, fixup conflicts.

Gerd Hoffmann (7):
  qemu-edid: drop cast
  ui/spice-app: don't use qemu_chr_open_spice_port directly
  chardev/spice: make qemu_chr_open_spice_port static
  chardev/spice: simplify chardev setup
  meson: add spice_headers dependency.
  meson: add spice dependency to core spice source files.
  chardev/spice: build spice chardevs as module

 include/chardev/spice.h |  4 
 include/ui/qemu-spice.h |  1 -
 chardev/spice.c | 37 ++---
 qemu-edid.c |  2 +-
 softmmu/vl.c|  9 +
 ui/spice-app.c  | 34 ++
 ui/spice-core.c |  2 --
 util/module.c   |  2 ++
 audio/meson.build   |  2 +-
 chardev/meson.build |  7 ++-
 meson.build |  2 ++
 monitor/meson.build |  2 +-
 ui/meson.build  |  2 +-
 13 files changed, 51 insertions(+), 55 deletions(-)

-- 
2.27.0

[PATCH v5 4/7] chardev/spice: simplify chardev setup

2020-10-14 Thread Gerd Hoffmann

Initialize spice before chardevs.  That allows to register the spice
chardevs directly in the init function and removes the need to maintain
a linked list of chardevs just for registration.

Signed-off-by: Gerd Hoffmann 
---
 include/chardev/spice.h |  1 -
 include/ui/qemu-spice.h |  1 -
 chardev/spice.c | 29 ++---
 softmmu/vl.c|  9 +
 ui/spice-app.c  | 17 +
 ui/spice-core.c |  2 --
 6 files changed, 20 insertions(+), 39 deletions(-)

diff --git a/include/chardev/spice.h b/include/chardev/spice.h
index 1115502cdfbd..58e5b727e9fe 100644
--- a/include/chardev/spice.h
+++ b/include/chardev/spice.h
@@ -13,7 +13,6 @@ struct SpiceChardev {
 bool  blocked;
 const uint8_t *datapos;
 int   datalen;
-QLIST_ENTRY(SpiceChardev) next;
 };
 typedef struct SpiceChardev SpiceChardev;
 
diff --git a/include/ui/qemu-spice.h b/include/ui/qemu-spice.h
index 12474d88f40e..0e8ec3f0d776 100644
--- a/include/ui/qemu-spice.h
+++ b/include/ui/qemu-spice.h
@@ -45,7 +45,6 @@ int qemu_spice_migrate_info(const char *hostname, int port, 
int tls_port,
 #else
 #define SPICE_NEEDS_SET_MM_TIME 0
 #endif
-void qemu_spice_register_ports(void);
 
 #else  /* CONFIG_SPICE */
 
diff --git a/chardev/spice.c b/chardev/spice.c
index 051c23a84f4c..7d1fb1771894 100644
--- a/chardev/spice.c
+++ b/chardev/spice.c
@@ -14,9 +14,6 @@ typedef struct SpiceCharSource {
 SpiceChardev   *scd;
 } SpiceCharSource;
 
-static QLIST_HEAD(, SpiceChardev) spice_chars =
-QLIST_HEAD_INITIALIZER(spice_chars);
-
 static int vmc_write(SpiceCharDeviceInstance *sin, const uint8_t *buf, int len)
 {
 SpiceChardev *scd = container_of(sin, SpiceChardev, sin);
@@ -216,8 +213,6 @@ static void char_spice_finalize(Object *obj)
 
 vmc_unregister_interface(s);
 
-QLIST_SAFE_REMOVE(s, next);
-
 g_free((char *)s->sin.subtype);
 g_free((char *)s->sin.portname);
 }
@@ -256,8 +251,6 @@ static void chr_open(Chardev *chr, const char *subtype)
 
 s->active = false;
 s->sin.subtype = g_strdup(subtype);
-
-QLIST_INSERT_HEAD(&spice_chars, s, next);
 }
 
 static void qemu_chr_open_spice_vmc(Chardev *chr,
@@ -310,28 +303,18 @@ static void qemu_chr_open_spice_port(Chardev *chr,
 return;
 }
 
+if (!using_spice) {
+error_setg(errp, "spice not enabled");
+return;
+}
+
 chr_open(chr, "port");
 
 *be_opened = false;
 s = SPICE_CHARDEV(chr);
 s->sin.portname = g_strdup(name);
 
-if (using_spice) {
-/* spice server already created */
-vmc_register_interface(s);
-}
-}
-
-void qemu_spice_register_ports(void)
-{
-SpiceChardev *s;
-
-QLIST_FOREACH(s, &spice_chars, next) {
-if (s->sin.portname == NULL) {
-continue;
-}
-vmc_register_interface(s);
-}
+vmc_register_interface(s);
 }
 
 static void qemu_chr_parse_spice_vmc(QemuOpts *opts, ChardevBackend *backend,
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 254ee5e525d1..cb476aa70bcc 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -4148,6 +4148,11 @@ void qemu_init(int argc, char **argv, char **envp)
   user_creatable_add_opts_foreach,
   object_create_initial, &error_fatal);
 
+/* spice needs the timers to be initialized by this point */
+/* spice must initialize before audio as it changes the default auiodev */
+/* spice must initialize before chardevs (for spicevmc and spiceport) */
+qemu_spice_init();
+
 qemu_opts_foreach(qemu_find_opts("chardev"),
   chardev_init_func, NULL, &error_fatal);
 
@@ -4156,10 +4161,6 @@ void qemu_init(int argc, char **argv, char **envp)
   fsdev_init_func, NULL, &error_fatal);
 #endif
 
-/* spice needs the timers to be initialized by this point */
-/* spice must initialize before audio as it changes the default auiodev */
-qemu_spice_init();
-
 /*
  * Note: we need to create audio and block backends before
  * machine_set_property(), so machine properties can refer to
diff --git a/ui/spice-app.c b/ui/spice-app.c
index 7e0550c79fdd..026124ef56a0 100644
--- a/ui/spice-app.c
+++ b/ui/spice-app.c
@@ -129,7 +129,6 @@ static void spice_app_atexit(void)
 static void spice_app_display_early_init(DisplayOptions *opts)
 {
 QemuOpts *qopts;
-ChardevBackend *be = chr_spice_backend_new();
 GError *err = NULL;
 
 if (opts->has_full_screen) {
@@ -174,6 +173,15 @@ static void spice_app_display_early_init(DisplayOptions 
*opts)
 qemu_opt_set(qopts, "gl", opts->has_gl ? "on" : "off", &error_abort);
 display_opengl = opts->has_gl;
 #endif
+}
+
+static void spice_app_display_init(DisplayState *ds, DisplayOptions *opts)
+{
+ChardevBackend *be = chr_spice_backend_new();
+QemuOpts *qopts;
+GError *err = NULL;
+gchar *uri;
+
 be->u.spiceport.data->fqdn = g_strdup("org.qemu.monitor.q

1 2 3 4 >

1 - 100 of 358 matches

Mail list logo