Re: [PATCH RFC 0/3] mirror: rework soft-cancelling READY mirror

2021-07-28 Thread Max Reitz

On 27.07.21 18:47, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

That's an alternative to (part of) Max's
"[PATCH for-6.1? v2 0/7] mirror: Handle errors after READY cancel"
and shows my idea of handling the soft-cancelling READY mirror case
directly in qmp_block_job_cancel, and of cleaning up all the other job
cancelling functions.

That's an untested draft, don't take it to heart :)


Well, I would have preferred it if you’d rebased this on top of that 
series, precisely because it’s an alternative to only part of it. And if 
it’s just an untested draft, that would have been even better, because 
it would’ve given a better idea of what the cleanup looks like.


There are also things like this series making cancel internally always a 
force-cancel, where I’m not sure whether we want that in the replication 
driver or not[1].  With my series, we add an explicit parameter, so 
we’re forced to think about it, and then in this series on top we can 
just drop the parameter for all force-cancel invocations again, and for 
all non-force-cancel invocations we would have to think a bit more.


Specifically as for this series, I don’t like job_complete_ex() very 
much, I think the parameter should be part of job_complete() itself.  If 
we think that’s too specific of a mirror parameter to include in normal 
job_complete(), well, then there shouldn’t be a job_complete_ex() 
either, and do_graph_change should be a property of the mirror job 
(perhaps as pivot_on_completion) that’s cleared by 
qmp_block_job_cancel() before invoking job_complete().
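
(To make that suggestion concrete: a very rough sketch of the property-based
variant follows.  It is only an illustration of the idea, not code from either
series; apart from job_complete(), job_cancel() and qmp_block_job_cancel(),
the names — pivot_on_completion, lookup_block_job(), job_is_mirror(),
mirror_from_block_job() — are made up.)

typedef struct MirrorBlockJob {
    BlockJob common;
    /* false => complete the READY job without pivoting to the target */
    bool pivot_on_completion;
} MirrorBlockJob;

void qmp_block_job_cancel(const char *device,
                          bool has_force, bool force, Error **errp)
{
    BlockJob *bjob = lookup_block_job(device, errp);        /* assumed */

    if (!bjob) {
        return;
    }

    if (!force && job_is_ready(&bjob->job) && job_is_mirror(bjob)) {
        /* Soft cancel of a READY mirror becomes a non-pivoting completion. */
        mirror_from_block_job(bjob)->pivot_on_completion = false;  /* assumed */
        job_complete(&bjob->job, errp);
        return;
    }

    job_cancel(&bjob->job, force);
}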


Max

[1] Although looking at it again now, it probably wants force-cancel.




[PATCH] format code

2021-07-28 Thread ruoguang
From: qwe624758472 <624758...@qq.com>

---
 block/dmg-bz2.c   | 3 +--
 block/dmg-lzfse.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/block/dmg-bz2.c b/block/dmg-bz2.c
index 37f7ee04be..2022eafc81 100644
--- a/block/dmg-bz2.c
+++ b/block/dmg-bz2.c
@@ -52,8 +52,7 @@ static int dmg_uncompress_bz2_do(char *next_in, unsigned int avail_in,
 return 0;
 }
 
-__attribute__((constructor))
-static void dmg_bz2_init(void)
+static void __attribute__((constructor)) dmg_bz2_init(void)
 {
 assert(!dmg_uncompress_bz2);
 dmg_uncompress_bz2 = dmg_uncompress_bz2_do;
diff --git a/block/dmg-lzfse.c b/block/dmg-lzfse.c
index 6798cf4fbf..9438539352 100644
--- a/block/dmg-lzfse.c
+++ b/block/dmg-lzfse.c
@@ -40,8 +40,7 @@ static int dmg_uncompress_lzfse_do(char *next_in, unsigned int avail_in,
 return -1;
 }
 
-__attribute__((constructor))
-static void dmg_lzfse_init(void)
+static void __attribute__((constructor)) dmg_lzfse_init(void)
 {
 assert(!dmg_uncompress_lzfse);
 dmg_uncompress_lzfse = dmg_uncompress_lzfse_do;
-- 
2.20.1




Re: [PATCH] target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()

2021-07-28 Thread Paolo Bonzini

On 28/07/21 01:52, Hamza Mahfooz wrote:

As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
variants"), RCU_READ_LOCK_GUARD() should be used instead of
rcu_read_{un}lock().

Signed-off-by: Hamza Mahfooz 
---


Reviewed-by: Paolo Bonzini 


  target/arm/kvm.c | 17 -
  1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index d8381ba224..5d55de1a49 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -998,7 +998,6 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  hwaddr xlat, len, doorbell_gpa;
  MemoryRegionSection mrs;
  MemoryRegion *mr;
-int ret = 1;
  
  if (as == &address_space_memory) {

  return 0;
@@ -1006,15 +1005,19 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  
  /* MSI doorbell address is translated by an IOMMU */
  
-rcu_read_lock();

+RCU_READ_LOCK_GUARD();
+
  mr = address_space_translate(as, address, &xlat, &len, true,
   MEMTXATTRS_UNSPECIFIED);
+
  if (!mr) {
-goto unlock;
+return 1;
  }
+
  mrs = memory_region_find(mr, xlat, 1);
+
  if (!mrs.mr) {
-goto unlock;
+return 1;
  }
  
  doorbell_gpa = mrs.offset_within_address_space;

@@ -1025,11 +1028,7 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
  
  trace_kvm_arm_fixup_msi_route(address, doorbell_gpa);
  
-ret = 0;

-
-unlock:
-rcu_read_unlock();
-return ret;
+return 0;
  }
  
  int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,







Re: [PATCH 3/3] docs: Move user-facing barrier docs into system manual

2021-07-28 Thread Paolo Bonzini

On 27/07/21 22:41, Peter Maydell wrote:

+where VM-1 is the name the display configured int the Barrier server


in the Barrier server


+on the host providing the mouse and the keyboard events.
+
+by default  is ``localhost``,
+ is ``24800``,  and  are set to ``0``,
+ and  to ``1920`` and ``1080``.
+
+If Barrier server is stopped QEMU needs to be reconnected manually,


If the Barrier server


+by removing and re-adding the input-barrier object, for instance
+with the help of the HMP monitor::
+
+(qemu) object_del barrier0



Paolo




Re: [PATCH 0/3] docs: Convert barrier.txt to rST

2021-07-28 Thread Paolo Bonzini

On 27/07/21 22:41, Peter Maydell wrote:

This patchset converts docs/barrier.txt to rST, putting
it in the appropriate places:
  * the protocol info lives in interop/
  * the "how to use this" info lives in system/
  * TODO remarks live in the .c file :-)

thanks
-- PMM

Peter Maydell (3):
   docs: Move the protocol part of barrier.txt into interop
   ui/input-barrier: Move TODOs from barrier.txt to a comment
   docs: Move user-facing barrier docs into system manual

  docs/barrier.txt | 370 --
  docs/interop/barrier.rst | 426 +++
  docs/interop/index.rst   |   1 +
  docs/system/barrier.rst  |  44 
  docs/system/index.rst|   1 +
  ui/input-barrier.c   |   5 +
  6 files changed, 477 insertions(+), 370 deletions(-)
  delete mode 100644 docs/barrier.txt
  create mode 100644 docs/interop/barrier.rst
  create mode 100644 docs/system/barrier.rst



Apart from the two typos in patch 3,

Reviewed-by: Paolo Bonzini 

Paolo




[PATCH] gitlab-ci.d/buildtest: Mark the aarch64 and ppc64-s390x CFI jobs as manual

2021-07-28 Thread Thomas Huth
These two jobs are currently failing very often - the linker seems to
get killed due to out-of-memory problems. Since apparently nobody has
currently an idea how to fix that nicely, let's mark the jobs as manual
for the time being until someone comes up with a proper fix.

Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/buildtest.yml | 12 
 1 file changed, 12 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 63f1903f07..3537c6f1a1 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -416,6 +416,12 @@ build-cfi-aarch64:
 expire_in: 2 days
 paths:
   - build
+  rules:
+# FIXME: This job is often failing, likely due to out-of-memory problems in
+# the constraint containers of the shared runners. Thus this is marked as
+# manual until the situation has been solved.
+- when: manual
+  allow_failure: true
 
 check-cfi-aarch64:
   extends: .native_test_job_template
@@ -452,6 +458,12 @@ build-cfi-ppc64-s390x:
 expire_in: 2 days
 paths:
   - build
+  rules:
+# FIXME: This job is often failing, likely due to out-of-memory problems in
+# the constraint containers of the shared runners. Thus this is marked as
+# manual until the situation has been solved.
+- when: manual
+  allow_failure: true
 
 check-cfi-ppc64-s390x:
   extends: .native_test_job_template
-- 
2.27.0




Re: [PATCH 01/20] Hexagon HVX (target/hexagon) README

2021-07-28 Thread Rob Landley
On 7/26/21 8:59 AM, Taylor Simpson wrote:
>> Anyway, I still hope somebody else has already done most of this in a git
>> tree somewhere. :)
>
> We're working on system mode support for Hexagon, and we plan to upstream it 
> when it is ready.

Yay! Thanks.

While you're at it, why is llvm's cmake config unable to do:

  $ cccnext/cross_bin/hexagon-unknown-linux-musl-cc \
-Xpreprocessor -P -E - <<< __SIZEOF_POINTER__
  4

I'm trying to genericize that llvm build script to do all the targets musl and
llvm agree on supporting, which means not passing in -DCMAKE_SIZEOF_VOID_P=4
because the compiler ALREADY KNOWS THIS... but cmake/config-ix.cmake line 196 is
REALLY going to barf if we didn't explicitly specify it on the command line? Are
the llvm developers not _aware_ of the "cc -E -dM - < /dev/null" trick? Even if
they aren't, why couldn't they just use sizeof(void *) in a header file?

*shrug* I can do the above trick in the wrapper script and then provide
-DCMAKE_SIZEOF_VOID_P=$BLAH on the command line, it just seems DEEPLY pointless
to go to all the trouble of having a ./configure that has to be manually told
stuff the compiler already knows.

Confused,

Rob



Re: [PATCH] qga-win: Add support of Windows Server 2022 in get-osinfo command

2021-07-28 Thread Konstantin Kostiuk
ping

On Wed, Jul 14, 2021 at 10:25 AM Konstantin Kostiuk 
wrote:

>
> ping
>
>
> On Sun, Jul 11, 2021 at 8:18 PM Konstantin Kostiuk 
> wrote:
>
>> ping
>>
>> On Sun, Jul 4, 2021 at 8:51 AM Konstantin Kostiuk 
>> wrote:
>>
>>> ping
>>>
>>> On Mon, Jun 21, 2021 at 3:50 PM Kostiantyn Kostiuk <
>>> konstan...@daynix.com> wrote:
>>>
 Signed-off-by: Kostiantyn Kostiuk 
 ---
  qga/commands-win32.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

 diff --git a/qga/commands-win32.c b/qga/commands-win32.c
 index 300b87c859..93b08fd4b5 100644
 --- a/qga/commands-win32.c
 +++ b/qga/commands-win32.c
 @@ -2209,9 +2209,10 @@ typedef struct _ga_win_10_0_server_t {
  char const *version_id;
  } ga_win_10_0_server_t;

 -static ga_win_10_0_server_t const WIN_10_0_SERVER_VERSION_MATRIX[3] = {
 +static ga_win_10_0_server_t const WIN_10_0_SERVER_VERSION_MATRIX[4] = {
  {14393, "Microsoft Windows Server 2016","2016"},
  {17763, "Microsoft Windows Server 2019","2019"},
 +{20344, "Microsoft Windows Server 2022","2022"},
  {0, 0}
  };

 --
 2.25.1

>>>


Re: [PATCH 1/3] qga-win: Increase VSS freeze timeout to 60 secs instead of 10

2021-07-28 Thread Konstantin Kostiuk
ping

On Wed, Jul 14, 2021 at 9:17 AM Konstantin Kostiuk 
wrote:

> Hi Michael,
>
> qga-vss.dll has many dependencies. Please check that all DLLs are present
> in the same directory.
>
> Also, I have a similar problem in my setup. In my case, libgcc_s_seh-1.dll
> was renamed in MinGW which I used.
>
> Best wishes,
> Kostiantyn Kostiuk
>
>
> On Tue, Jul 13, 2021 at 9:40 PM Michael Roth  wrote:
>
>> Quoting Konstantin Kostiuk (2021-04-22 02:43:25)
>> > ping
>>
>> I've been trying to get these queued but I'm hitting an issue where qga
>> reports:
>>
>>   failed to load qga-vss.dll: The specified module could not be found.
>>
>> via LoadLibraryA(QGA_VSS_DLL) returning error code 126. What's weird is
>> it seems to find qga-vss.dll in the install directory, and you can see it
>> access it in WinDBG and various trace tools, but somehow it reports not
>> found. Are you seeing this issue?
>>
>> I'll debug more this week and try to get these in for rc1, but if you
>> happen
>> to have more of a clue than me then any insights would be much
>> appreciated.
>>
>> >
>> > On Mon, Apr 5, 2021 at 4:14 PM Basil Salman  wrote:
>> >
>> > Currently Requester freeze times out after 10 seconds, while
>> > the default timeout for Writer Freeze is 60 seconds. according to
>> > VSS Documentation [1].
>> > [1]: https://docs.microsoft.com/en-us/windows/win32/vss/
>> > overview-of-processing-a-backup-under-vss
>> >
>> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1909073
>> >
>> > Signed-off-by: Basil Salman 
>> > Signed-off-by: Basil Salman 
>> > ---
>> >  qga/vss-win32/requester.cpp | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/qga/vss-win32/requester.cpp
>> b/qga/vss-win32/requester.cpp
>> > index 5378c55d23..940a2c8f55 100644
>> > --- a/qga/vss-win32/requester.cpp
>> > +++ b/qga/vss-win32/requester.cpp
>> > @@ -18,7 +18,7 @@
>> >  #include 
>> >
>> >  /* Max wait time for frozen event (VSS can only hold writes for 10 seconds) */
>> > -#define VSS_TIMEOUT_FREEZE_MSEC 10000
>> > +#define VSS_TIMEOUT_FREEZE_MSEC 60000
>> >
>> >  /* Call QueryStatus every 10 ms while waiting for frozen event */
>> >  #define VSS_TIMEOUT_EVENT_MSEC 10
>> > --
>> > 2.17.2
>> >
>> >
>>
>


Re: [PATCH v2] qga-win: Free GMatchInfo properly

2021-07-28 Thread Konstantin Kostiuk
ping

On Wed, Jul 14, 2021 at 10:26 AM Konstantin Kostiuk 
wrote:

> CC Michael Roth
>
> On Thu, Jun 10, 2021 at 7:14 PM Daniel P. Berrangé 
> wrote:
>
>> On Thu, Jun 10, 2021 at 07:08:36PM +0300, Konstantin Kostiuk wrote:
>> > On Thu, Jun 10, 2021 at 7:02 PM Daniel P. Berrangé > >
>> > wrote:
>> >
>> > > On Thu, Jun 10, 2021 at 06:58:11PM +0300, Kostiantyn Kostiuk wrote:
>> > > > The g_regex_match function creates match_info even if it
>> > > > returns FALSE. So we should always call g_match_info_free.
>> > > > A better solution is using g_autoptr for match_info variable.
>> > > >
>> > > > Signed-off-by: Kostiantyn Kostiuk 
>> > > > ---
>> > > >  qga/commands-win32.c | 3 +--
>> > > >  1 file changed, 1 insertion(+), 2 deletions(-)
>> > > >
>> > > > diff --git a/qga/commands-win32.c b/qga/commands-win32.c
>> > > > index 300b87c859..785a5cc6b2 100644
>> > > > --- a/qga/commands-win32.c
>> > > > +++ b/qga/commands-win32.c
>> > > > @@ -2494,7 +2494,7 @@ GuestDeviceInfoList
>> *qmp_guest_get_devices(Error
>> > > **errp)
>> > > >  continue;
>> > > >  }
>> > > >  for (j = 0; hw_ids[j] != NULL; j++) {
>> > > > -GMatchInfo *match_info;
>> > > > +g_autoptr(GMatchInfo) match_info;
>> > >
>> > > This should be initialized to NULL otherwise...
>> > >
>> > > >  GuestDeviceIdPCI *id;
>> > > >  if (!g_regex_match(device_pci_re, hw_ids[j], 0,
>> > > &match_info)) {
>> > > >  continue;
>> > >
>> > > this continue will trigger freeing of unintialized memory
>> > >
>> >
>> > But we always call g_regex_match, so match_info is always initialized.
>> > The g_regex_match function creates match_info even if it returns FALSE.
>>
>> Opps, yes, you are right.
>>
>> Reviewed-by: Daniel P. Berrangé 
>>
>>
>> Regards,
>> Daniel
>> --
>> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org -o- https://fstop138.berrange.com :|
>> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
>>
>>
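
(An aside to make the point above concrete: a minimal GLib sketch, not the
actual qga code, of why g_autoptr(GMatchInfo) avoids the leak on the early
"continue" path.  count_matches() is an invented example function.)

#include <glib.h>

static guint count_matches(GRegex *re, gchar **strings)
{
    guint hits = 0;

    for (gchar **s = strings; s && *s; s++) {
        g_autoptr(GMatchInfo) match_info = NULL;

        /*
         * g_regex_match() allocates match_info even when it returns FALSE,
         * so a plain pointer plus "continue" here would leak it.  With
         * g_autoptr() it is freed when match_info goes out of scope.
         */
        if (!g_regex_match(re, *s, 0, &match_info)) {
            continue;
        }
        hits++;
    }
    return hits;
}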


Re: [PATCH] gitlab-ci.d/buildtest: Mark the aarch64 and ppc64-s390x CFI jobs as manual

2021-07-28 Thread Daniel P . Berrangé
On Wed, Jul 28, 2021 at 09:51:41AM +0200, Thomas Huth wrote:
> These two jobs are currently failing very often - the linker seems to
> get killed due to out-of-memory problems. Since apparently nobody has
> currently an idea how to fix that nicely, let's mark the jobs as manual
> for the time being until someone comes up with a proper fix.

Copying the original author of the CFI work...

> 
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/buildtest.yml | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> index 63f1903f07..3537c6f1a1 100644
> --- a/.gitlab-ci.d/buildtest.yml
> +++ b/.gitlab-ci.d/buildtest.yml
> @@ -416,6 +416,12 @@ build-cfi-aarch64:
>  expire_in: 2 days
>  paths:
>- build
> +  rules:
> +# FIXME: This job is often failing, likely due to out-of-memory problems in
> +# the constraint containers of the shared runners. Thus this is marked as

s/constraint/constrained/

> +# manual until the situation has been solved.
> +- when: manual
> +  allow_failure: true
>  
>  check-cfi-aarch64:
>extends: .native_test_job_template
> @@ -452,6 +458,12 @@ build-cfi-ppc64-s390x:
>  expire_in: 2 days
>  paths:
>- build
> +  rules:
> +# FIXME: This job is often failing, likely due to out-of-memory problems in
> +# the constraint containers of the shared runners. Thus this is marked as

s/constraint/constrained/

> +# manual until the situation has been solved.
> +- when: manual
> +  allow_failure: true
>  
>  check-cfi-ppc64-s390x:
>extends: .native_test_job_template

Reviewed-by: Daniel P. Berrangé 

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|




Re: [PATCH] target/i386: Added consistency checks for event injection

2021-07-28 Thread Paolo Bonzini

On 25/07/21 11:08, Lara Lazier wrote:

VMRUN exits with SVM_EXIT_ERR if either:
  * The event injected has a reserved type.
  * When the event injected is of type 3 (exception), and the vector that
  has been specified does not correspond to an exception.

This does not fix the entire exc_inj test in kvm-unit-tests.

Signed-off-by: Lara Lazier 
---
  target/i386/tcg/sysemu/svm_helper.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/target/i386/tcg/sysemu/svm_helper.c b/target/i386/tcg/sysemu/svm_helper.c
index a61aa23017..70d5c2e35d 100644
--- a/target/i386/tcg/sysemu/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -395,6 +395,9 @@ void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
  cpu_loop_exit(cs);
  break;
  case SVM_EVTINJ_TYPE_EXEPT:
+if (vector == EXCP02_NMI || vector >= 31)  {
+cpu_vmexit(env, SVM_EXIT_ERR, 0, GETPC());
+}
  cs->exception_index = vector;
  env->error_code = event_inj_err;
  env->exception_is_int = 0;
@@ -410,6 +413,9 @@ void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
  qemu_log_mask(CPU_LOG_TB_IN_ASM, "SOFT");
  cpu_loop_exit(cs);
  break;
+default:
+cpu_vmexit(env, SVM_EXIT_ERR, 0, GETPC());
+break;
  }
  qemu_log_mask(CPU_LOG_TB_IN_ASM, " %#x %#x\n", cs->exception_index,
env->error_code);



Queued, thanks.

Paolo




Re: [PATCH] target/i386: Added V_INTR_PRIO check to virtual interrupts

2021-07-28 Thread Paolo Bonzini

On 21/07/21 17:26, Lara Lazier wrote:

+static inline bool ctl_has_irq(uint32_t int_ctl)
+{
+uint32_t int_prio;
+uint32_t tpr;
+
+int_prio = (int_ctl & V_INTR_PRIO_MASK) >> V_INTR_MASKING_SHIFT;


Oops, I missed that this should be V_INTR_PRIO_SHIFT.  Can you send the 
correction please?


Thanks,

Paolo


+tpr = int_ctl & V_TPR_MASK;
+return (int_ctl & V_IRQ_MASK) && (int_prio >= tpr);
+}
+





Re: [PATCH RFC 04/19] vfio-user: Define type vfio_user_pci_dev_info

2021-07-28 Thread Stefan Hajnoczi
On Sun, Jul 18, 2021 at 11:27:43PM -0700, Elena Ufimtseva wrote:
> From: John G Johnson 
> 
> New class for vfio-user with its class and instance
> constructors and destructors.
> 
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> ---
>  hw/vfio/pci.c | 49 +
>  1 file changed, 49 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index bea95efc33..554b562769 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -42,6 +42,7 @@
>  #include "qapi/error.h"
>  #include "migration/blocker.h"
>  #include "migration/qemu-file.h"
> +#include "hw/vfio/user.h"
>  
>  #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
>  
> @@ -3326,3 +3327,51 @@ static void register_vfio_pci_dev_type(void)
>  }
>  
>  type_init(register_vfio_pci_dev_type)
> +
> +static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
> +{
> +ERRP_GUARD();
> +VFIOUserPCIDevice *udev = VFIO_USER_PCI(pdev);
> +
> +if (!udev->sock_name) {
> +error_setg(errp, "No socket specified");
> +error_append_hint(errp, "Use -device vfio-user-pci,socket=\n");
> +return;
> +}
> +}
> +
> +static void vfio_user_instance_finalize(Object *obj)
> +{
> +}
> +
> +static Property vfio_user_pci_dev_properties[] = {
> +DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),

Please use SocketAddress so that alternative socket connection details
can be supported without inventing custom syntax for vfio-user-pci. For
example, file descriptor passing should be possible.

I think this requires a bit of command-line parsing work, so don't worry
about it for now, but please add a TODO comment. When the -device
vfio-user-pci syntax is finalized (i.e. when the code is merged and the
device name doesn't start with the experimental x- prefix), then it
needs to be solved.

> +DEFINE_PROP_BOOL("secure-dma", VFIOUserPCIDevice, secure, false),

I'm not sure what "secure-dma" means and the "secure" variable name is
even more inscrutable. Does this mean don't share memory so that each
DMA access is checked individually?

> +DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void vfio_user_pci_dev_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
> +
> +device_class_set_props(dc, vfio_user_pci_dev_properties);
> +dc->desc = "VFIO over socket PCI device assignment";
> +pdc->realize = vfio_user_pci_realize;
> +}
> +
> +static const TypeInfo vfio_user_pci_dev_info = {
> +.name = TYPE_VFIO_USER_PCI,
> +.parent = TYPE_VFIO_PCI_BASE,
> +.instance_size = sizeof(VFIOUserPCIDevice),
> +.class_init = vfio_user_pci_dev_class_init,
> +.instance_init = vfio_instance_init,
> +.instance_finalize = vfio_user_instance_finalize,
> +};
> +
> +static void register_vfio_user_dev_type(void)
> +{
> +type_register_static(&vfio_user_pci_dev_info);
> +}
> +
> +type_init(register_vfio_user_dev_type)
> -- 
> 2.25.1
> 


signature.asc
Description: PGP signature


[PATCH v2] target/i386: Added V_INTR_PRIO check to virtual interrupts

2021-07-28 Thread Lara Lazier
v1->v2: Fixed Mask

The APM2 states that the processor takes a virtual INTR interrupt
if V_IRQ and V_INTR_PRIO indicate that there is a virtual interrupt pending
whose priority is greater than the value in V_TPR.

Signed-off-by: Lara Lazier 
---
 target/i386/tcg/sysemu/svm_helper.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/i386/tcg/sysemu/svm_helper.c b/target/i386/tcg/sysemu/svm_helper.c
index 72ea7c9a08..a3138e9f86 100644
--- a/target/i386/tcg/sysemu/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -65,6 +65,16 @@ static inline void svm_load_seg_cache(CPUX86State *env, hwaddr addr,
sc->base, sc->limit, sc->flags);
 }
 
+static inline bool ctl_has_irq(uint32_t int_ctl)
+{
+uint32_t int_prio;
+uint32_t tpr;
+
+int_prio = (int_ctl & V_INTR_PRIO_MASK) >> V_INTR_PRIO_SHIFT;
+tpr = int_ctl & V_TPR_MASK;
+return (int_ctl & V_IRQ_MASK) && (int_prio >= tpr);
+}
+
 static inline bool is_efer_invalid_state (CPUX86State *env)
 {
 if (!(env->efer & MSR_EFER_SVME)) {
@@ -365,7 +375,6 @@ void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
 
 if (ctl_has_irq(int_ctl)) {
 CPUState *cs = env_cpu(env);
-
 cs->interrupt_request |= CPU_INTERRUPT_VIRQ;
 }
 
-- 
2.25.1




[PATCH] block/io_uring: resubmit when result is -EAGAIN

2021-07-28 Thread Fabian Ebner
Quoting from [0]:

 Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
 completion path. Normally we expect this to happen inline as part
 of submission, but apparently SCSI has a weird corner case where it
 can happen as part of normal completions.

Host kernels without patch [0] can panic when this happens [1], and
resubmitting makes the panic more likely. On the other hand, for
kernels with patch [0], resubmitting ensures that a block job is not
aborted just because of such spurious errors.

[0]: https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u

[1]:
  #9 [b732000c8b70] asm_exc_page_fault at a4800ade
 #10 [b732000c8bf8] io_prep_async_work at a3d89c16
 #11 [b732000c8c50] io_rw_reissue at a3d8b2e1
 #12 [b732000c8c78] io_complete_rw at a3d8baa8
 #13 [b732000c8c98] blkdev_bio_end_io at a3d62a80
 #14 [b732000c8cc8] bio_endio at a3f4e800
 #15 [b732000c8ce8] dec_pending at a432f854
 #16 [b732000c8d30] clone_endio at a433170c
 #17 [b732000c8d70] bio_endio at a3f4e800
 #18 [b732000c8d90] blk_update_request at a3f53a37
 #19 [b732000c8dd0] scsi_end_request at a4233a5c
 #20 [b732000c8e08] scsi_io_completion at a423432c
 #21 [b732000c8e58] scsi_finish_command at a422c527
 #22 [b732000c8e88] scsi_softirq_done at a42341e4

Signed-off-by: Fabian Ebner 
---

I'm new to this code and io_uring, so I don't know what all the
implications are, but retrying upon EAGAIN does not sound like
a bad thing to my inexperienced ears.

Some more context, leading up to this patch:

We had some users reporting issues after we switched to using io_uring
by default. Namely, kernel panics [2] for some, and failing block jobs
[3] (with a custom backup mechanism we implemented on top of QEMU's
block layer) for others.

I had luck and managed to reproduce the issue, and it was a failed
block job about half of the time and a kernel panic the other half.
When using a host kernel with [0], it's a failed block job all the
time, and this patch attempts to fix that, by resubmitting instead
of bubbling up the error.

[2]: https://forum.proxmox.com/threads/kernel-panic-whole-server-crashes-about-every-day.91803/post-404382
[3]: https://forum.proxmox.com/threads/backup-job-failed-with-err-11-on-2-of-6-vms.92568/

 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 00a3ee9fb8..77d162cb24 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -165,7 +165,7 @@ static void luring_process_completions(LuringState *s)
 total_bytes = ret + luringcb->total_read;
 
 if (ret < 0) {
-if (ret == -EINTR) {
+if (ret == -EINTR || ret == -EAGAIN) {
 luring_resubmit(s, luringcb);
 continue;
 }
-- 
2.30.2





[PULL 0/1] Libslirp update

2021-07-28 Thread marcandre . lureau
From: Marc-André Lureau 

The following changes since commit a2376507f615495b1d16685449ce0ea78c2caf9d:

  Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into 
staging (2021-07-24 11:04:57 +0100)

are available in the Git repository at:

  g...@github.com:elmarco/qemu.git tags/libslirp-pull-request

for you to fetch changes up to 565301284e83fd5f12024a4e7959895380491668:

  Update libslirp to v4.6.1 (2021-07-28 13:11:11 +0400)


Update libslirp

Hi,

Let's update libslirp to 4.6.1, with its various fixes.



Marc-André Lureau (1):
  Update libslirp to v4.6.1

 slirp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.32.0.264.g75ae10bc75





[PULL 1/1] Update libslirp to v4.6.1

2021-07-28 Thread marcandre . lureau
From: Marc-André Lureau 

Switch from stable-4.2 branch to upstream v4.6.1 release + fixes.

## [Unreleased]

### Fixed

 - Haiku fixes. !98 !99
 - Fix a minor DHCP regression introduced in 4.6.0. !97

## [4.6.1] - 2021-06-18

### Fixed

 - Fix DHCP regression introduced in 4.6.0. !95

## [4.6.0] - 2021-06-14

### Added

 - mbuf: Add debugging helpers for allocation. !90

### Changed

 -  Revert "Set macOS deployment target to macOS 10.4". !93

### Fixed

 - mtod()-related buffer overflows (CVE-2021-3592 #44, CVE-2021-3593 #45,
   CVE-2021-3594 #47, CVE-2021-3595 #46).
 - poll_fd: add missing fd registration for UDP and ICMP
 - ncsi: make ncsi_calculate_checksum work with unaligned data. !89
 - Various typos and doc fixes. !88

## [4.5.0] - 2021-05-18

### Added

 - IPv6 forwarding. !62 !75 !77
 - slirp_neighbor_info() to dump the ARP/NDP tables. !71

### Changed

 - Lazy guest address resolution for IPv6. !81
 - Improve signal handling when spawning a child. !61
 - Set macOS deployment target to macOS 10.4. !72
 - slirp_add_hostfwd: Ensure all error paths set errno. !80
 - More API documentation.

### Fixed

 - Assertion failure on unspecified IPv6 address. !86
 - Disable polling for PRI on MacOS, fixing some closing streams issues. !73
 - Various memory leak fixes on fastq/batchq. !68
 - Memory leak on IPv6 fast-send. !67
 - Slow socket response on Windows. !64
 - Misc build and code cleanups. !60 !63 !76 !79 !84

## [4.4.0] - 2020-12-02

### Added

 - udp, udp6, icmp: handle TTL value. !48
 - Enable forwarding ICMP errors. !49
 - Add DNS resolving for iOS. !54

### Changed

 - Improve meson subproject() support. !53
 - Removed Makefile-based build system. !56

### Fixed

 - socket: consume empty packets. !55
 - check pkt_len before reading protocol header (CVE-2020-29129). !57
 - ip_stripoptions use memmove (fixes undefined behaviour). !47
 - various Coverity-related changes/fixes.

## [4.3.1] - 2020-07-08

### Changed

 - A silent truncation could occur in `slirp_fmt()`, which will now print a
   critical message. See also #22.

### Fixed

 - CVE-2020-10756 - Drop bogus IPv6 messages that could lead to data leakage.
   See !44 and !42.
 - Fix win32 builds by using the SLIRP_PACKED definition.
 - Various coverity scan errors fixed. !41
 - Fix new GCC warnings. !43

## [4.3.0] - 2020-04-22

### Added

 - `SLIRP_VERSION_STRING` macro, with the git sha suffix when building from git
 - `SlirpConfig.disable_dns`, to disable DNS redirection #16

### Changed

 - `slirp_version_string()` now has the git sha suffix when building from git
 - Limit DNS redirection to port 53 #16

### Fixed

 - Fix build regression with mingw & NetBSD
 - Fix use-after-free in `ip_reass()` (CVE-2020-1983)

Signed-off-by: Marc-André Lureau 
Reviewed-by: Doug Evans 
---
 slirp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/slirp b/slirp
index 8f43a99191..a88d9ace23 160000
--- a/slirp
+++ b/slirp
@@ -1 +1 @@
-Subproject commit 8f43a99191afb47ca3f3c6972f6306209f367ece
+Subproject commit a88d9ace234a24ce1c17189642ef9104799425e0
-- 
2.32.0.264.g75ae10bc75




[Bug 1908626] Re: Atomic test-and-set instruction does not work on qemu-user

2021-07-28 Thread taos
** Changed in: qemu
   Status: Expired => New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1908626

Title:
  Atomic test-and-set instruction does not work on qemu-user

Status in QEMU:
  New

Bug description:
  I am trying to compile and run the PostgreSQL/Greenplum database inside a
  docker container under qemu-aarch64-static:
  ```
   host: CentOS7 x86_64
   container: centos:centos7.9.2009 --platform linux/arm64/v8
   qemu-user-static: https://github.com/multiarch/qemu-user-static/releases/
  ```

  However, GP/PG's spinlock always gets stuck and reports PANIC errors.
  It seems something is wrong with its spinlock.
  ```
  https://github.com/greenplum-db/gpdb/blob/master/src/include/storage/s_lock.h
  
https://github.com/greenplum-db/gpdb/blob/master/src/backend/storage/lmgr/s_lock.c
  ```

  So I extracted its spinlock implementation into one test C source file
  (see attachment file), and reproduced the issue:

  ```
  $ gcc spinlock_qemu.c
  $ ./a.out 
  C -- slock inited, lock value is: 0
  parent 139642, child 139645
  P -- slock lock before, lock value is: 0
  P -- slock locked, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock lock before, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock locked, lock value is: 1
  C -- slock unlock after, lock value is: 0
  C -- slock lock before, lock value is: 1
  P -- slock locked, lock value is: 1
  P -- slock unlock after, lock value is: 0
  P -- slock lock before, lock value is: 1
  C -- slock locked, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock locked, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock locked, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  P -- slock lock before, lock value is: 1
  spin timeout, lock value is 1 (pid 139642)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139642)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139642)
  ...
  ...
  ...
  ```

  NOTE: this code always works on PHYSICAL ARM64 server.
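
  The attachment itself is not included here.  A minimal stand-alone
  approximation of the described test (a shared-memory test-and-set spinlock
  exercised from a parent and a child process) might look roughly like the
  following; this is an illustration, not the reporter's actual file:

  ```
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  typedef volatile unsigned char slock_t;

  #define TAS(lock)   __sync_lock_test_and_set((lock), 1)
  #define S_UNLOCK(l) __sync_lock_release(l)

  static void s_lock(slock_t *lock, const char *who)
  {
      long spins = 0;

      printf("%s -- slock lock before, lock value is: %d\n", who, *lock);
      while (TAS(lock)) {
          if (++spins > 100000000L) {        /* arbitrary "timeout" */
              printf("spin timeout, lock value is %d (pid %d)\n",
                     *lock, getpid());
              spins = 0;
          }
      }
      printf("%s -- slock locked, lock value is: %d\n", who, *lock);
  }

  int main(void)
  {
      slock_t *lock = mmap(NULL, sizeof(*lock), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);

      *lock = 0;
      printf("C -- slock inited, lock value is: %d\n", *lock);

      pid_t child = fork();
      const char *who = (child != 0) ? "P" : "C";

      for (int i = 0; i < 8; i++) {
          s_lock(lock, who);
          usleep(1000);
          S_UNLOCK(lock);
          printf("%s -- slock unlock after, lock value is: %d\n", who, *lock);
      }

      if (child > 0) {
          waitpid(child, NULL, 0);
      }
      return 0;
  }
  ```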

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1908626/+subscriptions




Re: [PATCH v2 3/6] util/oslib-posix: Don't create too many threads with small memory or little pages

2021-07-28 Thread Pankaj Gupta
> Let's limit the number of threads to something sane, especially that
> - We don't have more threads than the number of pages we have
> - We don't have threads that initialize small (< 64 MiB) memory
>
> Signed-off-by: David Hildenbrand 
> ---
>  util/oslib-posix.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 2ae6c3aaab..a1d309d495 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -40,6 +40,7 @@
>  #include 
>  #include "qemu/cutils.h"
>  #include "qemu/compiler.h"
> +#include "qemu/units.h"
>
>  #ifdef CONFIG_LINUX
>  #include 
> @@ -529,7 +530,8 @@ static void *do_madv_populate_write_pages(void *arg)
>  return NULL;
>  }
>
> -static inline int get_memset_num_threads(int smp_cpus)
> +static inline int get_memset_num_threads(size_t hpagesize, size_t numpages,
> + int smp_cpus)
>  {
>  long host_procs = sysconf(_SC_NPROCESSORS_ONLN);
>  int ret = 1;
> @@ -537,6 +539,12 @@ static inline int get_memset_num_threads(int smp_cpus)
>  if (host_procs > 0) {
>  ret = MIN(MIN(host_procs, MAX_MEM_PREALLOC_THREAD_COUNT), smp_cpus);
>  }
> +
> +/* Especially with gigantic pages, don't create more threads than pages. 
> */
> +ret = MIN(ret, numpages);
> +/* Don't start threads to prealloc comparatively little memory. */
> +ret = MIN(ret, MAX(1, hpagesize * numpages / (64 * MiB)));
> +
>  /* In case sysconf() fails, we fall back to single threaded */
>  return ret;
>  }
> @@ -546,7 +554,7 @@ static bool touch_all_pages(char *area, size_t hpagesize, 
> size_t numpages,
>  {
>  static gsize initialized = 0;
>  MemsetContext context = {
> -.num_threads = get_memset_num_threads(smp_cpus),
> +.num_threads = get_memset_num_threads(hpagesize, numpages, smp_cpus),
>  };
>  size_t numpages_per_thread, leftover;
>  void *(*touch_fn)(void *);

Reviewed-by: Pankaj Gupta 



Re: [PATCH v2 4/6] util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITE

2021-07-28 Thread Pankaj Gupta
> Let's simplify the case when we only want a single thread and don't have
> to mess with signal handlers.
>
> Signed-off-by: David Hildenbrand 
> ---
>  util/oslib-posix.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index a1d309d495..1483e985c6 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -568,6 +568,14 @@ static bool touch_all_pages(char *area, size_t 
> hpagesize, size_t numpages,
>  }
>
>  if (use_madv_populate_write) {
> +/* Avoid creating a single thread for MADV_POPULATE_WRITE */
> +if (context.num_threads == 1) {
> +if (qemu_madvise(area, hpagesize * numpages,
> + QEMU_MADV_POPULATE_WRITE)) {
> +return true;
> +}
> +return false;
> +}
>  touch_fn = do_madv_populate_write_pages;
>  } else {
>  touch_fn = do_touch_pages;

Reviewed-by: Pankaj Gupta 



Re: [PATCH PING] tests/tcg/linux-test: Fix random hangs in test_socket

2021-07-28 Thread Ilya Leoshkevich
On Tue, 2021-06-01 at 16:20 +0200, Ilya Leoshkevich wrote:
> test_socket hangs randomly in connect(), especially when run without
> qemu. Apparently the reason is that linux started treating backlog
> value of 0 literally instead of rounding it up since v4.4 (commit
> ef547f2ac16b).
> 
> So set it to 1 instead.
> 
> Signed-off-by: Ilya Leoshkevich 
> ---
>  tests/tcg/multiarch/linux-test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/tcg/multiarch/linux-test.c
> b/tests/tcg/multiarch/linux-test.c
> index eba07273f7..3b256f6c02 100644
> --- a/tests/tcg/multiarch/linux-test.c
> +++ b/tests/tcg/multiarch/linux-test.c
> @@ -263,7 +263,7 @@ static int server_socket(void)
>  sockaddr.sin_port = htons(0); /* choose random ephemeral port)
> */
>  sockaddr.sin_addr.s_addr = 0;
>  chk_error(bind(fd, (struct sockaddr *)&sockaddr,
> sizeof(sockaddr)));
> -    chk_error(listen(fd, 0));
> +    chk_error(listen(fd, 1));
>  return fd;
>  
>  }

Hello,

I would like to ping this patch.

Best regards,
Ilya




[PATCH v2] target/i386: Added VGIF feature

2021-07-28 Thread Lara Lazier
VGIF allows STGI and CLGI to execute in guest mode and control virtual
interrupts in guest mode.
When the VGIF feature is enabled then:
 * executing STGI in the guest sets bit 9 of the VMCB offset 60h.
 * executing CLGI in the guest clears bit 9 of the VMCB offset 60h.

Signed-off-by: Lara Lazier 
---
 target/i386/cpu.c   |  3 ++-
 target/i386/svm.h   |  6 ++
 target/i386/tcg/sysemu/svm_helper.c | 27 +--
 3 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index edb97ebbbe..71d26cf1bd 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -631,7 +631,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
 #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
   CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A)
 #define TCG_EXT4_FEATURES 0
-#define TCG_SVM_FEATURES CPUID_SVM_NPT
+#define TCG_SVM_FEATURES (CPUID_SVM_NPT | CPUID_SVM_VGIF | \
+  CPUID_SVM_SVME_ADDR_CHK)
 #define TCG_KVM_FEATURES 0
 #define TCG_7_0_EBX_FEATURES (CPUID_7_0_EBX_SMEP | CPUID_7_0_EBX_SMAP | \
   CPUID_7_0_EBX_BMI1 | CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ADX | \
diff --git a/target/i386/svm.h b/target/i386/svm.h
index e54670ef12..dab2f90925 100644
--- a/target/i386/svm.h
+++ b/target/i386/svm.h
@@ -9,6 +9,12 @@
 #define V_IRQ_SHIFT 8
 #define V_IRQ_MASK (1 << V_IRQ_SHIFT)
 
+#define V_GIF_ENABLED_SHIFT 25
+#define V_GIF_ENABLED_MASK (1 << V_GIF_ENABLED_SHIFT)
+
+#define V_GIF_SHIFT 9
+#define V_GIF_MASK (1 << V_GIF_SHIFT)
+
 #define V_INTR_PRIO_SHIFT 16
 #define V_INTR_PRIO_MASK (0x0f << V_INTR_PRIO_SHIFT)
 
diff --git a/target/i386/tcg/sysemu/svm_helper.c b/target/i386/tcg/sysemu/svm_helper.c
index 7b80f7bc85..72ea7c9a08 100644
--- a/target/i386/tcg/sysemu/svm_helper.c
+++ b/target/i386/tcg/sysemu/svm_helper.c
@@ -120,6 +120,11 @@ static inline void svm_vmload_canonicalization(CPUX86State *env)
 env->segs[R_GS].base = (long) (env->segs[R_GS].base & mask);
 }
 
+static inline bool virtual_gif_enabled(CPUX86State *env, uint32_t int_ctl)
+{
+return (int_ctl & V_GIF_ENABLED_MASK) && (env->features[FEAT_SVM] & CPUID_SVM_VGIF);
+}
+
 void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
 {
 CPUState *cs = env_cpu(env);
@@ -517,13 +522,31 @@ void helper_vmsave(CPUX86State *env, int aflag)
 void helper_stgi(CPUX86State *env)
 {
 cpu_svm_check_intercept_param(env, SVM_EXIT_STGI, 0, GETPC());
-env->hflags2 |= HF2_GIF_MASK;
+
+CPUState *cs = env_cpu(env);
+uint32_t int_ctl = x86_ldl_phys(cs,
+   env->vm_vmcb + offsetof(struct vmcb, control.int_ctl));
+if (virtual_gif_enabled(env, int_ctl) && likely(env->hflags & HF_GUEST_MASK)) {
+x86_stl_phys(cs, env->vm_vmcb + offsetof(struct vmcb, control.int_ctl),
+int_ctl | V_GIF_MASK);
+} else {
+env->hflags2 |= HF2_GIF_MASK;
+}
 }
 
 void helper_clgi(CPUX86State *env)
 {
 cpu_svm_check_intercept_param(env, SVM_EXIT_CLGI, 0, GETPC());
-env->hflags2 &= ~HF2_GIF_MASK;
+
+CPUState *cs = env_cpu(env);
+uint32_t int_ctl = x86_ldl_phys(cs,
+   env->vm_vmcb + offsetof(struct vmcb, control.int_ctl));
+if (virtual_gif_enabled(env, int_ctl) && likely(env->hflags & HF_GUEST_MASK)) {
+x86_stl_phys(cs, env->vm_vmcb + offsetof(struct vmcb, control.int_ctl),
+int_ctl & ~V_GIF_MASK);
+} else {
+env->hflags2 &= ~HF2_GIF_MASK;
+}
 }
 
 bool cpu_svm_has_intercept(CPUX86State *env, uint32_t type)
-- 
2.25.1




Re: [PATCH 2/3] ui/input-barrier: Move TODOs from barrier.txt to a comment

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/27/21 10:41 PM, Peter Maydell wrote:
> docs/barrier.txt has a couple of TODO notes about things to be
> implemented in this device; move them into a comment in the
> source code.
> 
> Signed-off-by: Peter Maydell 
> ---
>  docs/barrier.txt   | 4 
>  ui/input-barrier.c | 5 +
>  2 files changed, 5 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] qga-win: Add support of Windows Server 2022 in get-osinfo command

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 9:53 AM, Konstantin Kostiuk wrote:
> ping

Seems merged:

commit c141814d4f495bd9efdabb1229ce0a5b5a239bf3
Author: Kostiantyn Kostiuk 
Date:   Mon Jun 21 15:50:17 2021 +0300

qga-win: Add support of Windows Server 2022 in get-osinfo command

Signed-off-by: Kostiantyn Kostiuk 
Signed-off-by: Michael Roth 

> On Wed, Jul 14, 2021 at 10:25 AM Konstantin Kostiuk
> <konstan...@daynix.com> wrote:
> 
> 
> ping
> 
> 
> On Sun, Jul 11, 2021 at 8:18 PM Konstantin Kostiuk
> <konstan...@daynix.com> wrote:
> 
> ping
> 
> On Sun, Jul 4, 2021 at 8:51 AM Konstantin Kostiuk
> <konstan...@daynix.com> wrote:
> 
> ping
> 
> On Mon, Jun 21, 2021 at 3:50 PM Kostiantyn Kostiuk
> <konstan...@daynix.com> wrote:
> 
> Signed-off-by: Kostiantyn Kostiuk
> ---
>  qga/commands-win32.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> index 300b87c859..93b08fd4b5 100644
> --- a/qga/commands-win32.c
> +++ b/qga/commands-win32.c
> @@ -2209,9 +2209,10 @@ typedef struct _ga_win_10_0_server_t {
>      char const *version_id;
>  } ga_win_10_0_server_t;
> 
> -static ga_win_10_0_server_t const WIN_10_0_SERVER_VERSION_MATRIX[3] = {
> +static ga_win_10_0_server_t const WIN_10_0_SERVER_VERSION_MATRIX[4] = {
>      {14393, "Microsoft Windows Server 2016",    "2016"},
>      {17763, "Microsoft Windows Server 2019",    "2019"},
> +    {20344, "Microsoft Windows Server 2022",    "2022"},
>      {0, 0}
>  };
> 
> --
> 2.25.1
> 




Re: [PATCH-for-6.1 v2] qga-win: Free GMatchInfo properly

2021-07-28 Thread Philippe Mathieu-Daudé
Still candidate for 6.1.

On 7/28/21 9:54 AM, Konstantin Kostiuk wrote:
> ping
> 
> On Wed, Jul 14, 2021 at 10:26 AM Konstantin Kostiuk
> <konstan...@daynix.com> wrote:
> 
> CC Michael Roth
> 
> On Thu, Jun 10, 2021 at 7:14 PM Daniel P. Berrangé
> <berra...@redhat.com> wrote:
> 
> On Thu, Jun 10, 2021 at 07:08:36PM +0300, Konstantin Kostiuk wrote:
> > On Thu, Jun 10, 2021 at 7:02 PM Daniel P. Berrangé
> > <berra...@redhat.com> wrote:
> >
> > > On Thu, Jun 10, 2021 at 06:58:11PM +0300, Kostiantyn Kostiuk wrote:
> > > > The g_regex_match function creates match_info even if it
> > > > returns FALSE. So we should always call g_match_info_free.
> > > > A better solution is using g_autoptr for match_info variable.
> > > >
> > > > Signed-off-by: Kostiantyn Kostiuk
> > > > ---
> > > >  qga/commands-win32.c | 3 +--
> > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > >
> > > > diff --git a/qga/commands-win32.c b/qga/commands-win32.c
> > > > index 300b87c859..785a5cc6b2 100644
> > > > --- a/qga/commands-win32.c
> > > > +++ b/qga/commands-win32.c
> > > > @@ -2494,7 +2494,7 @@ GuestDeviceInfoList
> *qmp_guest_get_devices(Error
> > > **errp)
> > > >              continue;
> > > >          }
> > > >          for (j = 0; hw_ids[j] != NULL; j++) {
> > > > -            GMatchInfo *match_info;
> > > > +            g_autoptr(GMatchInfo) match_info;
> > >
> > > This should be initialized to NULL otherwise...
> > >
> > > >              GuestDeviceIdPCI *id;
> > > >              if (!g_regex_match(device_pci_re, hw_ids[j], 0,
> > > &match_info)) {
> > > >                  continue;
> > >
> > > this continue will trigger freeing of unintialized memory
> > >
> >
> > But we always call g_regex_match, so match_info is always
> > initialized.
> > The g_regex_match function creates match_info even if it
> > returns FALSE.
> 
> Opps, yes, you are right.
> 
> Reviewed-by: Daniel P. Berrangé
> 
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
> 




Re: [PATCH] gitlab-ci.d/buildtest: Mark the aarch64 and ppc64-s390x CFI jobs as manual

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 9:51 AM, Thomas Huth wrote:
> These two jobs are currently failing very often - the linker seems to
> get killed due to out-of-memory problems. Since apparently nobody has
> currently an idea how to fix that nicely, let's mark the jobs as manual
> for the time being until someone comes up with a proper fix.
> 
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/buildtest.yml | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> index 63f1903f07..3537c6f1a1 100644
> --- a/.gitlab-ci.d/buildtest.yml
> +++ b/.gitlab-ci.d/buildtest.yml
> @@ -416,6 +416,12 @@ build-cfi-aarch64:
>  expire_in: 2 days
>  paths:
>- build
> +  rules:
> +# FIXME: This job is often failing, likely due to out-of-memory problems in
> +# the constraint containers of the shared runners. Thus this is marked as
> +# manual until the situation has been solved.
> +- when: manual
> +  allow_failure: true

Eventually:

# Except if user explicitly sets the QEMU_CI_CFI_TESTING variable
- if: '$QEMU_CI_CFI_TESTING'
  when: always

But I agree, better to fix for everybody.

Reviewed-by: Philippe Mathieu-Daudé 

>  check-cfi-aarch64:
>extends: .native_test_job_template
> @@ -452,6 +458,12 @@ build-cfi-ppc64-s390x:
>  expire_in: 2 days
>  paths:
>- build
> +  rules:
> +# FIXME: This job is often failing, likely due to out-of-memory problems in
> +# the constraint containers of the shared runners. Thus this is marked as
> +# manual until the situation has been solved.
> +- when: manual
> +  allow_failure: true
>  
>  check-cfi-ppc64-s390x:
>extends: .native_test_job_template
> 




Re: [PATCH] block/io_uring: resubmit when result is -EAGAIN

2021-07-28 Thread Philippe Mathieu-Daudé
Cc'ing the maintainers for you. See
https://wiki.qemu.org/Contribute/SubmitAPatch#CC_the_relevant_maintainer

$ ./scripts/get_maintainer.pl -f block/io_uring.c
Aarushi Mehta  (maintainer:Linux io_uring)
Julia Suvorova  (maintainer:Linux io_uring)
Stefan Hajnoczi  (maintainer:Linux io_uring)
Kevin Wolf  (supporter:Block layer core)
Max Reitz  (supporter:Block layer core)
qemu-bl...@nongnu.org (open list:Linux io_uring)
qemu-devel@nongnu.org (open list:All patches CC here)

Also Cc'ing Stefano for commit b4e44c9944e ("io_uring: retry
io_uring_submit() if it fails with errno=EINTR").
(Stefano, you might want to add yourself a R: tag in MAINTAINERS).

On 7/28/21 12:35 PM, Fabian Ebner wrote:
> Quoting from [0]:
> 
>  Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
>  completion path. Normally we expect this to happen inline as part
>  of submission, but apparently SCSI has a weird corner case where it
>  can happen as part of normal completions.
> 
> Host kernels without patch [0] can panic when this happens [1], and
> resubmitting makes the panic more likely. On the other hand, for
> kernels with patch [0], resubmitting ensures that a block job is not
> aborted just because of such spurious errors.
> 
> [0]: https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u
> 
> [1]:
>   #9 [b732000c8b70] asm_exc_page_fault at a4800ade
>  #10 [b732000c8bf8] io_prep_async_work at a3d89c16
>  #11 [b732000c8c50] io_rw_reissue at a3d8b2e1
>  #12 [b732000c8c78] io_complete_rw at a3d8baa8
>  #13 [b732000c8c98] blkdev_bio_end_io at a3d62a80
>  #14 [b732000c8cc8] bio_endio at a3f4e800
>  #15 [b732000c8ce8] dec_pending at a432f854
>  #16 [b732000c8d30] clone_endio at a433170c
>  #17 [b732000c8d70] bio_endio at a3f4e800
>  #18 [b732000c8d90] blk_update_request at a3f53a37
>  #19 [b732000c8dd0] scsi_end_request at a4233a5c
>  #20 [b732000c8e08] scsi_io_completion at a423432c
>  #21 [b732000c8e58] scsi_finish_command at a422c527
>  #22 [b732000c8e88] scsi_softirq_done at a42341e4
> 
> Signed-off-by: Fabian Ebner 
> ---
> 
> I'm new to this code and io_uring, so I don't know what all the
> implications are, but retrying upon EAGAIN does not sound like
> a bad thing to my inexperienced ears.
> 
> Some more context, leading up to this patch:
> 
> We had some users reporting issues after we switched to using io_uring
> by default. Namely, kernel panics [2] for some, and failing block jobs
> [3] (with a custom backup mechanism we implemented on top of QEMU's
> block layer) for others.
> 
> I had luck and managed to reproduce the issue, and it was a failed
> block job about half of the time and a kernel panic the other half.
> When using a host kernel with [0], it's a failed block job all the
> time, and this patch attempts to fix that, by resubmitting instead
> of bubbling up the error.
> 
> [2]: https://forum.proxmox.com/threads/kernel-panic-whole-server-crashes-about-every-day.91803/post-404382
> [3]: https://forum.proxmox.com/threads/backup-job-failed-with-err-11-on-2-of-6-vms.92568/
> 
>  block/io_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 00a3ee9fb8..77d162cb24 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -165,7 +165,7 @@ static void luring_process_completions(LuringState *s)
>  total_bytes = ret + luringcb->total_read;
>  
>  if (ret < 0) {
> -if (ret == -EINTR) {
> +if (ret == -EINTR || ret == -EAGAIN) {
>  luring_resubmit(s, luringcb);
>  continue;
>  }
> 




Re: [PATCH v2 7/9] cpus: Introduce qemu_mutex_unlock_iothread_prepare()

2021-07-28 Thread David Hildenbrand

On 27.07.21 18:08, Peter Xu wrote:

On Tue, Jul 27, 2021 at 02:59:26PM +0200, David Hildenbrand wrote:

On 23.07.21 21:34, Peter Xu wrote:

The prepare function before unlocking BQL.  There're only three places that can
release the BQL: unlock(), cond_wait() or cond_timedwait().

Signed-off-by: Peter Xu 
---
   softmmu/cpus.c | 7 +++
   1 file changed, 7 insertions(+)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 9131f77f87..6085f8edbe 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -66,6 +66,10 @@
   static QemuMutex qemu_global_mutex;
+static void qemu_mutex_unlock_iothread_prepare(void)
+{
+}
+
   bool cpu_is_stopped(CPUState *cpu)
   {
   return cpu->stopped || !runstate_is_running();
@@ -523,16 +527,19 @@ void qemu_mutex_unlock_iothread(void)
   {
   g_assert(qemu_mutex_iothread_locked());
   iothread_locked = false;
+qemu_mutex_unlock_iothread_prepare();
   qemu_mutex_unlock(&qemu_global_mutex);
   }
   void qemu_cond_wait_iothread(QemuCond *cond)
   {
+qemu_mutex_unlock_iothread_prepare();
   qemu_cond_wait(cond, &qemu_global_mutex);
   }
   void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
   {
+qemu_mutex_unlock_iothread_prepare();
   qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
   }



I'd squash this patch into the next one.

I don't quite like the function name, but don't really have a better
suggestion  maybe qemu_mutex_might_unlock_iothread(), similar to
might_sleep() or might_fault() in the kernel. (although here it's pretty
clear and not a "might"; could be useful in other context where we might
conditionally unlock the BQL at some point in the future, though)


Yes, IMHO "might" describes a capability of doing something, here it's not
(this one should only be called right before releasing bql, not within any
context of having some capability).  The other option I thought was "pre" but
it will be just a short version of "prepare".

Let me know if you have a better suggestion on naming. :) Otherwise I'll keep
the naming, squash this patch into the next and keep your r-b for that.


Fine with me :)


--
Thanks,

David / dhildenb




Re: [PATCH v2 4/9] memory: Don't do topology update in memory finalize()

2021-07-28 Thread David Hildenbrand

On 27.07.21 18:02, Peter Xu wrote:

On Tue, Jul 27, 2021 at 03:21:31PM +0200, David Hildenbrand wrote:

On 23.07.21 21:34, Peter Xu wrote:

Topology update could be wrongly triggered in memory region finalize() if
there's a bug somewhere else.  It'll be a very confusing stack when it
happens (e.g., sending a KVM ioctl within the RCU thread, and we'll observe it
only when it fails!).

Instead of that, we use the push()/pop() helper to avoid memory transaction
commit, at the same time we use assertions to make sure there's no pending
updates or it's a nested transaction, so it could fail even earlier and in a
more explicit way.

Suggested-by: Paolo Bonzini 
Signed-off-by: Peter Xu 
---
   softmmu/memory.c | 23 +--
   1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 1a3e9ff8ad..dfce4a2bda 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -170,6 +170,12 @@ struct MemoryRegionIoeventfd {
   EventNotifier *e;
   };
+/* Returns whether there's any pending memory updates */
+static bool memory_region_has_pending_update(void)
+{
+return memory_region_update_pending || ioeventfd_update_pending;
+}
+
   static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd *a,
  MemoryRegionIoeventfd *b)
   {
@@ -1756,12 +1762,25 @@ static void memory_region_finalize(Object *obj)
* and cause an infinite loop.
*/
   mr->enabled = false;
-memory_region_transaction_begin();
+
+/*
+ * Use push()/pop() instead of begin()/commit() to make sure below block
+ * won't trigger any topology update (which should never happen, but it's
+ * still a safety belt).
+ */


Hmm, I wonder if we can just keep the begin/end semantics and just do an
assertion before doing the commit? Does anything speak against that?


That would work too for the case of run_on_cpu and similar, but I think
this patch should be able to cover more.  For example, it's possible that depth==0
when entering memory_region_finalize(), but some removal of subregions could
further cause memory layout changes.  IMHO we should also bail out early for
those cases too.  Thanks,



Do we really have to switch to push/pop to catch these cases early? I'd 
assume we'd just have to formulate the right assertions :)
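
A rough sketch of what that could look like, reusing the
memory_region_has_pending_update() helper added earlier in the patch
(illustrative only, untested):

static void memory_region_finalize(Object *obj)
{
    MemoryRegion *mr = MEMORY_REGION(obj);

    mr->enabled = false;
    memory_region_transaction_begin();
    while (!QTAILQ_EMPTY(&mr->subregions)) {
        MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
        memory_region_del_subregion(mr, subregion);
    }
    /*
     * A disabled region being finalized must not have queued any topology
     * or ioeventfd update for the commit below to flush.
     */
    assert(!memory_region_has_pending_update());
    memory_region_transaction_commit();
}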


--
Thanks,

David / dhildenb




Re: [PATCH] docs: Move bootindex.txt into system section and rstify

2021-07-28 Thread Markus Armbruster
Peter Maydell  writes:

> Move bootindex.txt into the system section of the manual and turn it
> into rST format.  To make the document make more sense in the context
> of the system manual, expand the title and introductory paragraphs to
> give more context.
>
> Signed-off-by: Peter Maydell 

Reviewed-by: Markus Armbruster 




[PATCH v12] qapi: introduce 'query-x86-cpuid' QMP command.

2021-07-28 Thread Valeriy Vdovin
Introducing a new QMP command 'query-x86-cpuid'. This command can be used to
get the virtualized CPU model info that QEMU generates during VM initialization,
in the form of its CPUID representation.

Diving into more details about virtual CPU generation: QEMU first parses the '-cpu'
command line option. From there it takes the name of the model as the basis for
feature set of the new virtual CPU. After that it uses trailing '-cpu' options,
that state if additional cpu features should be present on the virtual CPU or
excluded from it (tokens '+'/'-' or '=on'/'=off').
After that QEMU checks if the host's cpu can actually support the derived
feature set and applies host limitations to it.
After this initialization procedure, the virtual CPU has its model and
vendor names and a working feature set, and is ready for identification
instructions such as CPUID.

To learn exactly how the virtual CPU is presented to the guest machine via the
CPUID instruction, the new QMP command can be used. By calling 'query-x86-cpuid',
one can get a full listing of all CPUID leaves and subleaves which are
supported by the initialized virtual CPU.

Besides debugging, the command is useful in cases when we would like to
utilize QEMU's virtual CPU initialization routines and put the retrieved
values into the kernel's CPUID overriding mechanics, for more precise control
over how various processes perceive the underlying hardware, with
container processes as a good example.

The command is specific to x86. It is currently only implemented for the KVM
accelerator.

Output format:
The output is a plain list of leaf/subleaf argument combinations that
return 4 words in the registers EAX, EBX, ECX, EDX.

Use example:
qmp_request: {
  "execute": "query-x86-cpuid"
}

qmp_response: {
  "return": [
{
  "eax": 1073741825,
  "edx": 77,
  "in-eax": 1073741824,
  "ecx": 1447775574,
  "ebx": 1263359563
},
{
  "eax": 16777339,
  "edx": 0,
  "in-eax": 1073741825,
  "ecx": 0,
  "ebx": 0
},
{
  "eax": 13,
  "edx": 1231384169,
  "in-eax": 0,
  "ecx": 1818588270,
  "ebx": 1970169159
},
{
  "eax": 198354,
  "edx": 126614527,
  "in-eax": 1,
  "ecx": 2176328193,
  "ebx": 2048
},

{
  "eax": 12328,
  "edx": 0,
  "in-eax": 2147483656,
  "ecx": 0,
  "ebx": 0
}
  ]
}
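
For reference, each entry above corresponds to one CPUID invocation by the
guest with the given in-eax/in-ecx values. A tiny, self-contained C snippet
(host-side, x86 only, not part of this patch) showing the same
leaf/subleaf-in, four-words-out shape:

/* Prints one CPUID row in the same shape as the QMP output above. */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 0, subleaf 0: max basic leaf in EAX, vendor string in EBX/EDX/ECX. */
    __cpuid_count(0, 0, eax, ebx, ecx, edx);
    printf("in-eax=0x0 in-ecx=0x0 -> eax=%#x ebx=%#x ecx=%#x edx=%#x\n",
           eax, ebx, ecx, edx);
    return 0;
}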

Signed-off-by: Valeriy Vdovin 
---
This patch is next version (v12) of a patch with a changed subject
Previous subject: [PATCH v11] qapi: introduce 'query-kvm-cpuid' QMP command
https://patchew.org/QEMU/20210617122723.8670-1-valeriy.vdo...@virtuozzo.com/
Supersedes: 20210617122723.8670-1-valeriy.vdo...@virtuozzo.com

v2: - Removed leaf/subleaf iterators.
- Modified cpu_x86_cpuid to return false in cases when count is
  greater than supported subleaves.
v3: - Fixed structure name coding style.
- Added more comments
- Ensured buildability for non-x86 targets.
v4: - Fixed cpu_x86_cpuid return value logic and handling of 0xA leaf.
- Fixed comments.
- Removed target check in qmp_query_cpu_model_cpuid.
v5: - Added error handling code in qmp_query_cpu_model_cpuid
v6: - Fixed error handling code. Added method to query_error_class
v7: - Changed implementation in favor of cached cpuid_data for
  KVM_SET_CPUID2
v8: - Renamed qmp method to query-kvm-cpuid and some fields in response.
- Modified documentation to qmp method
- Removed helper struct declaration
v9: - Renamed 'in_eax' / 'in_ecx' fields to 'in-eax' / 'in-ecx'
- Pasted more complete response to commit message.
v10:
- Subject changed
- Fixes in commit message
- Small fixes in QMP command docs
v11:
- Added explanation about CONFIG_KVM to the commit message.
v12:
- Changed title from query-kvm-cpuid to query-x86-cpuid
- Removed CONFIG_KVM ifdefs
- Added detailed error messages for some stub/unimplemented cases.

 qapi/machine-target.json   | 44 
 softmmu/cpus.c |  2 +-
 target/i386/kvm/kvm-stub.c | 10 
 target/i386/kvm/kvm.c  | 51 ++
 tests/qtest/qmp-cmd-test.c |  1 +
 5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index e7811654b7..db906c9240 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -329,3 +329,47 @@
 ##
 { 'command': 'query-cpu-definitions', 'returns': ['CpuDefinitionInfo'],
   'if': 'defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_I386) || 
defined(TARGET_S390X) || defined(TARGET_MIPS)' }
+
+##
+# @CpuidEntry:
+#
+# A single entry of a CPUID response.
+#
+# One entry holds full set of information (leaf) returned to the guest
+# in response to it calling a CPUID instruction with eax, ecx used as
+# the arguments to that instruction. ecx is an optional argument as
+# not all of the leaves support it.
+#
+# @in-eax: CPUID argument in eax
+# @in-ecx: CPUID argument in ecx
+# @eax

Re: [PATCH] block/io_uring: resubmit when result is -EAGAIN

2021-07-28 Thread Stefano Garzarella

On Wed, Jul 28, 2021 at 02:09:40PM +0200, Philippe Mathieu-Daudé wrote:

Cc'ing the maintainers for you. See
https://wiki.qemu.org/Contribute/SubmitAPatch#CC_the_relevant_maintainer

$ ./scripts/get_maintainer.pl -f block/io_uring.c
Aarushi Mehta  (maintainer:Linux io_uring)
Julia Suvorova  (maintainer:Linux io_uring)
Stefan Hajnoczi  (maintainer:Linux io_uring)
Kevin Wolf  (supporter:Block layer core)
Max Reitz  (supporter:Block layer core)
qemu-bl...@nongnu.org (open list:Linux io_uring)
qemu-devel@nongnu.org (open list:All patches CC here)

Also Cc'ing Stefano for commit b4e44c9944e ("io_uring: retry
io_uring_submit() if it fails with errno=EINTR").


Thanks Phil!


(Stefano, you might want to add yourself a R: tag in MAINTAINERS).


Yep, I'll send a patch for that.



On 7/28/21 12:35 PM, Fabian Ebner wrote:

Quoting from [0]:

 Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
 completion path. Normally we expect this to happen inline as part
 of submission, but apparently SCSI has a weird corner case where it
 can happen as part of normal completions.

Host kernels without patch [0] can panic when this happens [1], and
resubmitting makes the panic more likely. On the other hand, for
kernels with patch [0], resubmitting ensures that a block job is not
aborted just because of such spurious errors.

[0]: 
https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u

[1]:
  #9 [b732000c8b70] asm_exc_page_fault at a4800ade
 #10 [b732000c8bf8] io_prep_async_work at a3d89c16
 #11 [b732000c8c50] io_rw_reissue at a3d8b2e1
 #12 [b732000c8c78] io_complete_rw at a3d8baa8
 #13 [b732000c8c98] blkdev_bio_end_io at a3d62a80
 #14 [b732000c8cc8] bio_endio at a3f4e800
 #15 [b732000c8ce8] dec_pending at a432f854
 #16 [b732000c8d30] clone_endio at a433170c
 #17 [b732000c8d70] bio_endio at a3f4e800
 #18 [b732000c8d90] blk_update_request at a3f53a37
 #19 [b732000c8dd0] scsi_end_request at a4233a5c
 #20 [b732000c8e08] scsi_io_completion at a423432c
 #21 [b732000c8e58] scsi_finish_command at a422c527
 #22 [b732000c8e88] scsi_softirq_done at a42341e4

Signed-off-by: Fabian Ebner 
---

I'm new to this code and io_uring, so I don't know what all the
implications are, but retrying upon EAGAIN does not sound like
a bad thing to my inexperienced ears.


Yeah, that doesn't sound bad.

For kernels that don't have the patch applied, I don't think there's 
much we can do about it, so this change seems okay to me:


Reviewed-by: Stefano Garzarella 
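
For readers unfamiliar with io_uring, the pattern under discussion --
re-queueing a request whose completion came back with -EAGAIN instead of
failing it -- looks roughly like the following. This is a minimal,
self-contained liburing sketch of the idea only; it is not the actual
change to block/io_uring.c:

/* Retry-on-EAGAIN sketch: read a file once, resubmitting the request
 * whenever the completion carries a spurious -EAGAIN. */
#include <liburing.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    struct io_uring ring;
    char buf[4096];
    int fd, ret;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0 ||
        io_uring_queue_init(8, &ring, 0) < 0) {
        fprintf(stderr, "setup failed\n");
        return 1;
    }

    for (;;) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        struct io_uring_cqe *cqe;

        io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
        io_uring_submit(&ring);
        io_uring_wait_cqe(&ring, &cqe);
        ret = cqe->res;
        io_uring_cqe_seen(&ring, cqe);

        if (ret == -EAGAIN) {
            /* Spurious -EAGAIN off the completion path: just resubmit. */
            continue;
        }
        break;
    }

    printf("read returned %d\n", ret);
    io_uring_queue_exit(&ring);
    close(fd);
    return ret < 0 ? 1 : 0;
}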




Re: [PATCH v3] hw/acpi: add an assertion check for non-null return from acpi_get_i386_pci_host

2021-07-28 Thread Michael S. Tsirkin
On Mon, Jul 26, 2021 at 10:27:43PM +0530, Ani Sinha wrote:
> All existing code using acpi_get_i386_pci_host() checks for a non-null
> return value from this function call. Instead of returning early when the 
> value
> returned is NULL, assert instead. Since there are only two possible host buses
> for i386 - q35 and i440fx, a null value return from the function does not make
> sense in most cases and is likely an error situation.

add "on i386"?

> Fixes: c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")

That seems inappropriate; this is not a bugfix.

> 
> Signed-off-by: Ani Sinha 


Frankly I don't see this as a useful cleanup.
assert is generally a last resort thing.

> ---
>  hw/acpi/pcihp.c  |  8 
>  hw/i386/acpi-build.c | 15 ++-
>  2 files changed, 14 insertions(+), 9 deletions(-)
> 
> changelog:
> v1: initial patch
> v2: removed comment addition - that can be sent as a separate patch.
> v3: added assertion for null host values for all cases except one.
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index f4d706e47d..054ee8cbc5 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -116,6 +116,12 @@ static void acpi_set_pci_info(void)
>  bsel_is_set = true;
>  
>  if (!host) {
> +/*
> + * This function can be eventually called from
> + * qemu_devices_reset() -> acpi_pcihp_reset() even
> + * for architectures other than i386. Hence, we need

why call out i386 here? well because currently host
is only non-null for q35 and i440fx which are both i386.
all the above is not a given and we won't remember to update
the comment if we change it. Generally graceful failure
is the default or should be.



> + * to ignore null values for host here.
> + */
>  return;
>  }
>  
> @@ -136,6 +142,8 @@ static void acpi_pcihp_disable_root_bus(void)
>  return;
>  }
>  
> +assert(host);
> +
>  bus = PCI_HOST_BRIDGE(host)->bus;
>  if (bus) {
>  /* setting the hotplug handler to NULL makes the bus 
> non-hotpluggable */
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 17836149fe..83fb1d55c0 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -321,9 +321,7 @@ static void acpi_get_pci_holes(Range *hole, Range *hole64)
>  
>  pci_host = acpi_get_i386_pci_host();
>  
> -if (!pci_host) {
> -return;
> -}
> +assert(pci_host);
>  
>  range_set_bounds1(hole,
>object_property_get_uint(pci_host,
> @@ -1769,9 +1767,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
>  
>  pci_host = acpi_get_i386_pci_host();
>  
> -if (pci_host) {
> -bus = PCI_HOST_BRIDGE(pci_host)->bus;
> -}
> +assert(pci_host);
> +
> +bus = PCI_HOST_BRIDGE(pci_host)->bus;
>  
>  if (bus) {
>  Aml *scope = aml_scope("PCI0");
> @@ -2389,9 +2387,8 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>  QObject *o;
>  
>  pci_host = acpi_get_i386_pci_host();
> -if (!pci_host) {
> -return false;
> -}
> +
> +assert(pci_host);
>  
>  o = object_property_get_qobject(pci_host, PCIE_HOST_MCFG_BASE, NULL);
>  if (!o) {
> -- 
> 2.25.1




Re: aarch64 efi boot failures with qemu 6.0+

2021-07-28 Thread Michael S. Tsirkin
On Tue, Jul 27, 2021 at 12:36:03PM +0200, Igor Mammedov wrote:
> On Tue, 27 Jul 2021 05:01:23 -0400
> "Michael S. Tsirkin"  wrote:
> 
> > On Mon, Jul 26, 2021 at 10:12:38PM -0700, Guenter Roeck wrote:
> > > On 7/26/21 9:45 PM, Michael S. Tsirkin wrote:  
> > > > On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:  
> > > > > (cc Bjorn)
> > > > > 
> > > > > On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé 
> > > > >  wrote:  
> > > > > > 
> > > > > > On 7/26/21 12:56 AM, Guenter Roeck wrote:  
> > > > > > > On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:  
> > > > > > > > On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:  
> > > > > > > > > Hi all,
> > > > > > > > > 
> > > > > > > > > starting with qemu v6.0, some of my aarch64 efi boot tests no 
> > > > > > > > > longer
> > > > > > > > > work. Analysis shows that PCI devices with IO ports do not 
> > > > > > > > > instantiate
> > > > > > > > > in qemu v6.0 (or v6.1-rc0) when booting through efi. The 
> > > > > > > > > problem affects
> > > > > > > > > (at least) ne2k_pci, tulip, dc390, and am53c974. The problem 
> > > > > > > > > only
> > > > > > > > > affects
> > > > > > > > > aarch64, not x86/x86_64.
> > > > > > > > > 
> > > > > > > > > I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: 
> > > > > > > > > Inform os to
> > > > > > > > > keep firmware resource map"). Since this commit, PCI device 
> > > > > > > > > BAR
> > > > > > > > > allocation has changed. Taking tulip as example, the kernel 
> > > > > > > > > reports
> > > > > > > > > the following PCI bar assignments when running qemu v5.2.
> > > > > > > > > 
> > > > > > > > > [3.921801] pci :00:01.0: [1011:0019] type 00 class 
> > > > > > > > > 0x02
> > > > > > > > > [3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > > > > > > > > [3.922505] pci :00:01.0: reg 0x14: [mem 
> > > > > > > > > 0x1000-0x107f]  
> > > > > 
> > > > > IIUC, these lines are read back from the BARs
> > > > >   
> > > > > > > > > [3.927111] pci :00:01.0: BAR 0: assigned [io  
> > > > > > > > > 0x1000-0x107f]
> > > > > > > > > [3.927455] pci :00:01.0: BAR 1: assigned [mem
> > > > > > > > > 0x1000-0x107f]
> > > > > > > > >   
> > > > > 
> > > > > ... and this is the assignment created by the kernel.
> > > > >   
> > > > > > > > > With qemu v6.0, the assignment is reported as follows.
> > > > > > > > > 
> > > > > > > > > [3.922887] pci :00:01.0: [1011:0019] type 00 class 
> > > > > > > > > 0x02
> > > > > > > > > [3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
> > > > > > > > > [3.923451] pci :00:01.0: reg 0x14: [mem 
> > > > > > > > > 0x1000-0x107f]
> > > > > > > > >   
> > > > > 
> > > > > The problem here is that Linux, for legacy reasons, does not support
> > > > > I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
> > > > > rejected.
> > > > > 
> > > > > This might make sense on x86, where legacy I/O ports may exist, but on
> > > > > other architectures, this makes no sense.  
> > > > 
> > > > 
> > > > Fixing Linux makes sense but OTOH EFI probably shouldn't create mappings
> > > > that trip up existing guests, right?
> > > >   
> > > 
> > > I think it is difficult to draw a line. Sure, maybe EFI should not create
> > > such mappings, but then maybe qemu should not suddenly start to enforce
> > > those mappings for existing guests either.  
> > 
> > I would say both. But about QEMU actually I think you have a point here.
> > Re-reading the spec:
> > 
> > 0: No (The operating system shall not ignore the PCI configuration that 
> > firmware has done
> > at boot time. However, the operating system is free to configure the 
> > devices in this hierarchy
> > that have not been configured by the firmware. There may be a reduced level 
> > of hot plug
> > capability support in this hierarchy due to resource constraints. This 
> > situation is the same as
> > the legacy situation where this _DSM is not provided.)
> > 1: Yes (The operating system may ignore the PCI configuration that the 
> > firmware has done
> > at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> > 
> > 
> > I think I misread the spec previously, and understood it to mean that
> > 1 means must ignore. In fact 1 gives the most flexibility.
> > So why are we suddenly telling the guest it must not override
> > firmware?
> > 
> > The commit log says
The differences could result in resource assignment failure.
> > 
> > which is kind of vague ...
> > 
> > Jiahui Cen, Igor, what do you think about it?
> > I'm inclined to revert 0cf8882fd06ba0aeb1e90fa6f23fce85504d7e14
> > at least for now.
> Looking at patch history, it seems consensus was that it's better to
> enforce firmware allocations.
> 
> Also letting OS do as it pleases might break PCI devices that
> don't tolerate reallocation. ex: firmware initializes PCI device
> IO/BARs and then fetches ACPI tables, which get patched with
> a

[PATCH] MAINTAINERS: add Stefano Garzarella as io_uring reviewer

2021-07-28 Thread Stefano Garzarella
I've been working with io_uring for a while so I'd like to help
with reviews.

Signed-off-by: Stefano Garzarella 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 42ac45c3e5..1776d0950b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3257,6 +3257,7 @@ Linux io_uring
 M: Aarushi Mehta 
 M: Julia Suvorova 
 M: Stefan Hajnoczi 
+R: Stefano Garzarella 
 L: qemu-bl...@nongnu.org
 S: Maintained
 F: block/io_uring.c
-- 
2.31.1




Re: [PATCH 3/7] virtio: Add shared memory capability

2021-07-28 Thread Dr. David Alan Gilbert
* Antonio Caggiano (antonio.caggi...@collabora.com) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'.
> They allow defining shared memory regions with sizes and offsets
> of 2^32 and more.
> Multiple instances of the capability are allowed and distinguished
> by a device-specific 'id'.
> Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu.
> 
> Signed-off-by: Dr. David Alan Gilbert 
> Signed-off-by: Antonio Caggiano 
> (cherry picked from commit a5d628a3a3c5e60b98b15197c36a77056115)

Could you please separate this off, so that you have my original
virtio-pci patch, and then the gpu parts in a separate patch please.

Dave

> ---
>  hw/display/virtio-gpu-pci.c |  2 +-
>  hw/display/virtio-vga.c |  2 +-
>  hw/virtio/virtio-pci.c  | 19 +++
>  hw/virtio/virtio-pci.h  |  5 +
>  4 files changed, 26 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
> index 9808663d05..a79bd751b2 100644
> --- a/hw/display/virtio-gpu-pci.c
> +++ b/hw/display/virtio-gpu-pci.c
> @@ -43,7 +43,7 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
> *vpci_dev, Error **errp)
>   PCI_BASE_ADDRESS_MEM_PREFETCH |
>   PCI_BASE_ADDRESS_MEM_TYPE_64,
>   &g->hostmem);
> -virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 0);
> +virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 
> VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
>  }
>  
>  qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus), errp);
> diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
> index 61993dd3f2..ca841a0799 100644
> --- a/hw/display/virtio-vga.c
> +++ b/hw/display/virtio-vga.c
> @@ -147,7 +147,7 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
> *vpci_dev, Error **errp)
>   PCI_BASE_ADDRESS_MEM_PREFETCH |
>   PCI_BASE_ADDRESS_MEM_TYPE_64,
>   &g->hostmem);
> -virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 0);
> +virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 
> VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
>  }
>  
>  if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 433060ac02..37a50b4658 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1149,6 +1149,25 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy 
> *proxy,
>  return offset;
>  }
>  
> +int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
> +   uint8_t bar, uint64_t offset, uint64_t length,
> +   uint8_t id)
> +{
> +struct virtio_pci_cap64 cap = {
> +.cap.cap_len = sizeof cap,
> +.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
> +};
> +uint32_t mask32 = ~0;
> +
> +cap.cap.bar = bar;
> +cap.cap.length = cpu_to_le32(length & mask32);
> +cap.length_hi = cpu_to_le32((length >> 32) & mask32);
> +cap.cap.offset = cpu_to_le32(offset & mask32);
> +cap.offset_hi = cpu_to_le32((offset >> 32) & mask32);
> +cap.cap.id = id;
> +return virtio_pci_add_mem_cap(proxy, &cap.cap);
> +}
> +
>  static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
> unsigned size)
>  {
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index 2446dcd9ae..25c4b7a32d 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -252,4 +252,9 @@ void virtio_pci_types_register(const 
> VirtioPCIDeviceTypeInfo *t);
>   */
>  unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
>  
> +/* Add shared memory capability */
> +int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
> +   uint8_t bar, uint64_t offset, uint64_t length,
> +   uint8_t id);
> +
>  #endif
> -- 
> 2.30.2
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH-for-6.2 v12] qapi: introduce 'query-x86-cpuid' QMP command.

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 2:54 PM, Valeriy Vdovin wrote:
> Introducing new QMP command 'query-x86-cpuid'. This command can be used to
> get virtualized cpu model info generated by QEMU during VM initialization in
> the form of cpuid representation.
> 
> Diving into more details about virtual CPU generation: QEMU first parses 
> '-cpu'
> command line option. From there it takes the name of the model as the basis 
> for
> feature set of the new virtual CPU. After that it uses trailing '-cpu' 
> options,
> that state if additional cpu features should be present on the virtual CPU or
> excluded from it (tokens '+'/'-' or '=on'/'=off').
> After that QEMU checks if the host's cpu can actually support the derived
> feature set and applies host limitations to it.
> After this initialization procedure, virtual CPU has it's model and
> vendor names, and a working feature set and is ready for identification
> instructions such as CPUID.
> 
> To learn exactly how virtual CPU is presented to the guest machine via CPUID
> instruction, new QMP command can be used. By calling 'query-x86-cpuid'
> command, one can get a full listing of all CPUID leaves with subleaves which 
> are
> supported by the initialized virtual CPU.
> 
> Other than debug, the command is useful in cases when we would like to
> utilize QEMU's virtual CPU initialization routines and put the retrieved
> values into kernel CPUID overriding mechanics for more precise control
> over how various processes perceive its underlying hardware with
> container processes as a good example.
> 
> The command is specific to x86. It is currenly only implemented for KVM 
> acceleator.
> 
> Output format:
> The output is a plain list of leaf/subleaf argument combinations, that
> return 4 words in registers EAX, EBX, ECX, EDX.
> 
...

> +##
> +# @query-x86-cpuid:
> +#
> +# Returns raw data from the emulated CPUID table for the first VCPU.
> +# The emulated CPUID table defines the response to the CPUID
> +# instruction when executed by the guest operating system.
> +#
> +# Returns: a list of CpuidEntry
> +#
> +# Since: 6.1
> +##

Unfortunately too late for 6.1, so you'll have to update that to
"6.2" (also in @CpuidEntry).




Re: aarch64 efi boot failures with qemu 6.0+

2021-07-28 Thread Ard Biesheuvel
On Wed, 28 Jul 2021 at 15:11, Michael S. Tsirkin  wrote:
>
> On Tue, Jul 27, 2021 at 12:36:03PM +0200, Igor Mammedov wrote:
> > On Tue, 27 Jul 2021 05:01:23 -0400
> > "Michael S. Tsirkin"  wrote:
> >
> > > On Mon, Jul 26, 2021 at 10:12:38PM -0700, Guenter Roeck wrote:
> > > > On 7/26/21 9:45 PM, Michael S. Tsirkin wrote:
> > > > > On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:
> > > > > > (cc Bjorn)
> > > > > >
> > > > > > On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On 7/26/21 12:56 AM, Guenter Roeck wrote:
> > > > > > > > On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:
> > > > > > > > > On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > starting with qemu v6.0, some of my aarch64 efi boot tests 
> > > > > > > > > > no longer
> > > > > > > > > > work. Analysis shows that PCI devices with IO ports do not 
> > > > > > > > > > instantiate
> > > > > > > > > > in qemu v6.0 (or v6.1-rc0) when booting through efi. The 
> > > > > > > > > > problem affects
> > > > > > > > > > (at least) ne2k_pci, tulip, dc390, and am53c974. The 
> > > > > > > > > > problem only
> > > > > > > > > > affects
> > > > > > > > > > aarch64, not x86/x86_64.
> > > > > > > > > >
> > > > > > > > > > I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: 
> > > > > > > > > > Inform os to
> > > > > > > > > > keep firmware resource map"). Since this commit, PCI device 
> > > > > > > > > > BAR
> > > > > > > > > > allocation has changed. Taking tulip as example, the kernel 
> > > > > > > > > > reports
> > > > > > > > > > the following PCI bar assignments when running qemu v5.2.
> > > > > > > > > >
> > > > > > > > > > [3.921801] pci :00:01.0: [1011:0019] type 00 class 
> > > > > > > > > > 0x02
> > > > > > > > > > [3.922207] pci :00:01.0: reg 0x10: [io  
> > > > > > > > > > 0x-0x007f]
> > > > > > > > > > [3.922505] pci :00:01.0: reg 0x14: [mem 
> > > > > > > > > > 0x1000-0x107f]
> > > > > >
> > > > > > IIUC, these lines are read back from the BARs
> > > > > >
> > > > > > > > > > [3.927111] pci :00:01.0: BAR 0: assigned [io  
> > > > > > > > > > 0x1000-0x107f]
> > > > > > > > > > [3.927455] pci :00:01.0: BAR 1: assigned [mem
> > > > > > > > > > 0x1000-0x107f]
> > > > > > > > > >
> > > > > >
> > > > > > ... and this is the assignment created by the kernel.
> > > > > >
> > > > > > > > > > With qemu v6.0, the assignment is reported as follows.
> > > > > > > > > >
> > > > > > > > > > [3.922887] pci :00:01.0: [1011:0019] type 00 class 
> > > > > > > > > > 0x02
> > > > > > > > > > [3.923278] pci :00:01.0: reg 0x10: [io  
> > > > > > > > > > 0x-0x007f]
> > > > > > > > > > [3.923451] pci :00:01.0: reg 0x14: [mem 
> > > > > > > > > > 0x1000-0x107f]
> > > > > > > > > >
> > > > > >
> > > > > > The problem here is that Linux, for legacy reasons, does not support
> > > > > > I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
> > > > > > rejected.
> > > > > >
> > > > > > This might make sense on x86, where legacy I/O ports may exist, but 
> > > > > > on
> > > > > > other architectures, this makes no sense.
> > > > >
> > > > >
> > > > > Fixing Linux makes sense but OTOH EFI probably shouldn't create 
> > > > > mappings
> > > > > that trip up existing guests, right?
> > > > >
> > > >
> > > > I think it is difficult to draw a line. Sure, maybe EFI should not 
> > > > create
> > > > such mappings, but then maybe qemu should not suddenly start to enforce
> > > > those mappings for existing guests either.
> > >
> > > I would say both. But about QEMU actually I think you have a point here.
> > > Re-reading the spec:
> > >
> > > 0: No (The operating system shall not ignore the PCI configuration that 
> > > firmware has done
> > > at boot time. However, the operating system is free to configure the 
> > > devices in this hierarchy
> > > that have not been configured by the firmware. There may be a reduced 
> > > level of hot plug
> > > capability support in this hierarchy due to resource constraints. This 
> > > situation is the same as
> > > the legacy situation where this _DSM is not provided.)
> > > 1: Yes (The operating system may ignore the PCI configuration that the 
> > > firmware has done
> > > at boot time, and reconfigure/rebalance the resources in the hierarchy.)
> > >
> > >
> > > I think I misread the spec previously, and understood it to mean that
> > > 1 means must ignore. In fact 1 gives the most flexibility.
> > > So why are we suddenly telling the guest it must not override
> > > firmware?
> > >
> > > The commit log says
> > > The diffences could result in resource assignment failure.
> > >
> > > which is kind of vague ...
> > >
> > > Jiahui Cen, Igor, what do you think about it?
> > > I'm inclined to revert 0cf8882fd06ba0aeb1e90fa6f23fce85504d7e14
> > > at least for now.
> > Looki

Re: [PATCH] MAINTAINERS: add Stefano Garzarella as io_uring reviewer

2021-07-28 Thread Stefan Hajnoczi
On Wed, Jul 28, 2021 at 03:15:15PM +0200, Stefano Garzarella wrote:
> I've been working with io_uring for a while so I'd like to help
> with reviews.
> 
> Signed-off-by: Stefano Garzarella 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan




[PATCH v2 1/8] virtio-gpu: CONTEXT_INIT feature

2021-07-28 Thread Antonio Caggiano
Create virgl renderer context with flags using context_id when valid.

Signed-off-by: Antonio Caggiano 
---
 hw/display/virtio-gpu-base.c|  2 ++
 hw/display/virtio-gpu-virgl.c   | 10 --
 include/hw/virtio/virtio-gpu-bswap.h|  2 +-
 include/standard-headers/linux/virtio_gpu.h |  9 +++--
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index c8da4806e0..619185a9d2 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -212,6 +212,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
 
+features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
+
 return features;
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 18d054922f..5a184cf445 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -97,8 +97,14 @@ static void virgl_cmd_context_create(VirtIOGPU *g,
 trace_virtio_gpu_cmd_ctx_create(cc.hdr.ctx_id,
 cc.debug_name);
 
-virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen,
-  cc.debug_name);
+if (cc.context_init) {
+virgl_renderer_context_create_with_flags(cc.hdr.ctx_id,
+ cc.context_init,
+ cc.nlen,
+ cc.debug_name);
+} else {
+virgl_renderer_context_create(cc.hdr.ctx_id, cc.nlen, cc.debug_name);
+}
 }
 
 static void virgl_cmd_context_destroy(VirtIOGPU *g,
diff --git a/include/hw/virtio/virtio-gpu-bswap.h 
b/include/hw/virtio/virtio-gpu-bswap.h
index e2bee8f595..6267cb57e5 100644
--- a/include/hw/virtio/virtio-gpu-bswap.h
+++ b/include/hw/virtio/virtio-gpu-bswap.h
@@ -24,7 +24,7 @@ virtio_gpu_ctrl_hdr_bswap(struct virtio_gpu_ctrl_hdr *hdr)
 le32_to_cpus(&hdr->flags);
 le64_to_cpus(&hdr->fence_id);
 le32_to_cpus(&hdr->ctx_id);
-le32_to_cpus(&hdr->padding);
+le32_to_cpus(&hdr->info);
 }
 
 static inline void
diff --git a/include/standard-headers/linux/virtio_gpu.h 
b/include/standard-headers/linux/virtio_gpu.h
index 1357e4774e..c9f9c24d6a 100644
--- a/include/standard-headers/linux/virtio_gpu.h
+++ b/include/standard-headers/linux/virtio_gpu.h
@@ -59,6 +59,11 @@
  * VIRTIO_GPU_CMD_RESOURCE_CREATE_BLOB
  */
 #define VIRTIO_GPU_F_RESOURCE_BLOB   3
+/*
+ * VIRTIO_GPU_CMD_CREATE_CONTEXT with
+ * context_init
+ */
+#define VIRTIO_GPU_F_CONTEXT_INIT4
 
 enum virtio_gpu_ctrl_type {
VIRTIO_GPU_UNDEFINED = 0,
@@ -129,7 +134,7 @@ struct virtio_gpu_ctrl_hdr {
uint32_t flags;
uint64_t fence_id;
uint32_t ctx_id;
-   uint32_t padding;
+   uint32_t info;
 };
 
 /* data passed in the cursor vq */
@@ -272,7 +277,7 @@ struct virtio_gpu_resource_create_3d {
 struct virtio_gpu_ctx_create {
struct virtio_gpu_ctrl_hdr hdr;
uint32_t nlen;
-   uint32_t padding;
+   uint32_t context_init;
char debug_name[64];
 };
 
-- 
2.30.2




[PATCH v2 2/8] virtio-gpu: hostmem [wip]

2021-07-28 Thread Antonio Caggiano
From: Gerd Hoffmann 

---
 hw/display/virtio-gpu-base.c|  4 +++
 hw/display/virtio-gpu-pci.c | 14 +
 hw/display/virtio-gpu.c |  1 +
 hw/display/virtio-vga.c | 32 +++--
 include/hw/virtio/virtio-gpu.h  |  5 
 include/standard-headers/linux/virtio_gpu.h |  5 
 6 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 619185a9d2..31b430664f 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -214,6 +214,10 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 
 features |= (1 << VIRTIO_GPU_F_CONTEXT_INIT);
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_HOSTMEM);
+}
+
 return features;
 }
 
diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index e36eee0c40..9808663d05 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -33,6 +33,20 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 DeviceState *vdev = DEVICE(g);
 int i;
 
+if (virtio_gpu_hostmem_enabled(g->conf)) {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 0);
+}
+
+qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus), errp);
 virtio_pci_force_virtio_1(vpci_dev);
 if (!qdev_realize(vdev, BUS(&vpci_dev->bus), errp)) {
 return;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 990e71fd40..9686f17d79 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1392,6 +1392,7 @@ static Property virtio_gpu_properties[] = {
  256 * MiB),
 DEFINE_PROP_BIT("blob", VirtIOGPU, parent_obj.conf.flags,
 VIRTIO_GPU_FLAG_BLOB_ENABLED, false),
+DEFINE_PROP_SIZE("hostmem", VirtIOGPU, parent_obj.conf.hostmem, 0),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index 9e57f61e9e..61993dd3f2 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -125,16 +125,30 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
 pci_register_bar(&vpci_dev->pci_dev, 0,
  PCI_BASE_ADDRESS_MEM_PREFETCH, &vga->vram);
 
-/*
- * Configure virtio bar and regions
- *
- * We use bar #2 for the mmio regions, to be compatible with stdvga.
- * virtio regions are moved to the end of bar #2, to make room for
- * the stdvga mmio registers at the start of bar #2.
- */
-vpci_dev->modern_mem_bar_idx = 2;
-vpci_dev->msix_bar_idx = 4;
 vpci_dev->modern_io_bar_idx = 5;
+  
+if (!virtio_gpu_hostmem_enabled(g->conf)) {
+/*
+* Configure virtio bar and regions
+*
+* We use bar #2 for the mmio regions, to be compatible with stdvga.
+* virtio regions are moved to the end of bar #2, to make room for
+* the stdvga mmio registers at the start of bar #2.
+*/
+vpci_dev->modern_mem_bar_idx = 2;
+vpci_dev->msix_bar_idx = 4;
+} else {
+vpci_dev->msix_bar_idx = 1;
+vpci_dev->modern_mem_bar_idx = 2;
+memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
+   g->conf.hostmem);
+pci_register_bar(&vpci_dev->pci_dev, 4,
+ PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_PREFETCH |
+ PCI_BASE_ADDRESS_MEM_TYPE_64,
+ &g->hostmem);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 0);
+}
 
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
 /*
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 24c6628944..835ebcb1a0 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -102,12 +102,15 @@ enum virtio_gpu_base_conf_flags {
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_DMABUF_ENABLED))
 #define virtio_gpu_blob_enabled(_cfg) \
 (_cfg.flags & (1 << VIRTIO_GPU_FLAG_BLOB_ENABLED))
+#define virtio_gpu_hostmem_enabled(_cfg) \
+(_cfg.hostmem > 0)
 
 struct virtio_gpu_base_conf {
 uint32_t max_outputs;
 uint32_t flags;
 uint32_t xres;
 uint32_t yres;
+uint64_t hostmem;
 };
 
 struct virtio_gpu_ctrl_command {
@@ -131,6 +134,8 @@ struct VirtIOGPUBase {
 int renderer_blocked;
 int ena

[PATCH v2 6/8] virtio-gpu: Support Venus capset

2021-07-28 Thread Antonio Caggiano
Signed-off-by: Antonio Caggiano 
---
 hw/display/virtio-gpu-virgl.c   | 21 +
 include/standard-headers/linux/virtio_gpu.h |  2 ++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 38e5ca6c72..beb4b7d106 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -372,6 +372,11 @@ static void virgl_cmd_get_capset_info(VirtIOGPU *g,
 virgl_renderer_get_cap_set(resp.capset_id,
&resp.capset_max_version,
&resp.capset_max_size);
+} else if (info.capset_index == 2) {
+resp.capset_id = VIRTIO_GPU_CAPSET_VENUS;
+virgl_renderer_get_cap_set(resp.capset_id,
+   &resp.capset_max_version,
+   &resp.capset_max_size);
 } else {
 resp.capset_max_version = 0;
 resp.capset_max_size = 0;
@@ -634,10 +639,18 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 
 int virtio_gpu_virgl_get_num_capsets(VirtIOGPU *g)
 {
-uint32_t capset2_max_ver, capset2_max_size;
+uint32_t capset2_max_ver, capset2_max_size, num_capsets;
+num_capsets = 1;
+
 virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VIRGL2,
-  &capset2_max_ver,
-  &capset2_max_size);
+   &capset2_max_ver,
+   &capset2_max_size);
+num_capsets += capset2_max_ver ? 1 : 0;
+
+virgl_renderer_get_cap_set(VIRTIO_GPU_CAPSET_VENUS,
+   &capset2_max_ver,
+   &capset2_max_size);
+num_capsets += capset2_max_size ? 1 : 0;
 
-return capset2_max_ver ? 2 : 1;
+return num_capsets;
 }
diff --git a/include/standard-headers/linux/virtio_gpu.h 
b/include/standard-headers/linux/virtio_gpu.h
index 85898d41a7..28e863cd08 100644
--- a/include/standard-headers/linux/virtio_gpu.h
+++ b/include/standard-headers/linux/virtio_gpu.h
@@ -307,6 +307,8 @@ struct virtio_gpu_cmd_submit {
 
 #define VIRTIO_GPU_CAPSET_VIRGL 1
 #define VIRTIO_GPU_CAPSET_VIRGL2 2
+/* 3 is reserved for gfxstream */
+#define VIRTIO_GPU_CAPSET_VENUS 4
 
 /* VIRTIO_GPU_CMD_GET_CAPSET_INFO */
 struct virtio_gpu_get_capset_info {
-- 
2.30.2




[PATCH v2 0/8] virtio-gpu: Support Venus Vulkan driver

2021-07-28 Thread Antonio Caggiano
This series of patches enables support for the Venus VirtIO-GPU Vulkan
driver by adding some features required by the driver:

- CONTEXT_INIT
- HOSTMEM
- RESOURCE_UUID
- BLOB_RESOURCES

In addition to these features, Venus capset support was required
together with the implementation for Virgl blob resource commands.

With this in place, QEMU can take advantage of GPU acceleration to
render a simple program such as vkcube [0] in headless mode [1]. Further
work is needed to enable rendering with other kinds of surfaces.

Tested with Chia-I Wu kernel venus-5 branch [2], Mesa v21.1 [3], and my
WIP virglrenderer res-mapping branch [4].

Relevant QEMU command line parameters:

-m 4G \
-object memory-backend-memfd,id=mem1,size=4G \
-machine memory-backend=mem1 \
-display gtk,gl=on,show-cursor=on \
-vga none \
-device virtio-vga-gl,blob=true,hostmem=1G \

v2: Split shared memory capability commit.

[0] https://github.com/krh/vkcube
[1] https://share.collabora.com/index.php/s/RM5igzZMH2o749W/preview
[2] https://gitlab.freedesktop.org/olv/drm-misc-next/-/tree/venus-5
[3] https://gitlab.freedesktop.org/mesa/mesa/-/tree/21.1
[4] https://gitlab.freedesktop.org/Fahien/virglrenderer/-/tree/res-mapping

Antonio Caggiano (6):
  virtio-gpu: CONTEXT_INIT feature
  virtio-gpu: Shared memory capability
  virtio-gpu: Resource UUID
  virtio-gpu: Support Venus capset
  virtio-gpu: Initialize Venus
  virtio-gpu: Handle resource blob commands

Dr. David Alan Gilbert (1):
  virtio: Add shared memory capability

Gerd Hoffmann (1):
  virtio-gpu: hostmem [wip]

 hw/display/trace-events |   1 +
 hw/display/virtio-gpu-base.c|   8 +
 hw/display/virtio-gpu-pci.c |  14 ++
 hw/display/virtio-gpu-virgl.c   | 184 +++-
 hw/display/virtio-gpu.c |  36 +++-
 hw/display/virtio-vga.c |  32 +++-
 hw/virtio/virtio-pci.c  |  19 ++
 hw/virtio/virtio-pci.h  |   4 +
 include/hw/virtio/virtio-gpu-bswap.h|  20 ++-
 include/hw/virtio/virtio-gpu.h  |  10 ++
 include/standard-headers/linux/virtio_gpu.h |  16 +-
 meson.build |   1 +
 12 files changed, 318 insertions(+), 27 deletions(-)

-- 
2.30.2




[PATCH v2 7/8] virtio-gpu: Initialize Venus

2021-07-28 Thread Antonio Caggiano
Enable VirGL unstable APIs and request Venus when initializing VirGL.

Signed-off-by: Antonio Caggiano 
---
 hw/display/virtio-gpu-virgl.c | 2 +-
 meson.build   | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index beb4b7d106..ea903172dd 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -621,7 +621,7 @@ int virtio_gpu_virgl_init(VirtIOGPU *g)
 {
 int ret;
 
-ret = virgl_renderer_init(g, 0, &virtio_gpu_3d_cbs);
+ret = virgl_renderer_init(g, VIRGL_RENDERER_VENUS, &virtio_gpu_3d_cbs);
 if (ret != 0) {
 return ret;
 }
diff --git a/meson.build b/meson.build
index f2e148eaf9..31b65050b7 100644
--- a/meson.build
+++ b/meson.build
@@ -483,6 +483,7 @@ if not get_option('virglrenderer').auto() or have_system
  method: 'pkg-config',
  required: get_option('virglrenderer'),
  kwargs: static_kwargs)
+  add_project_arguments('-DVIRGL_RENDERER_UNSTABLE_APIS', language : 'c')
 endif
 curl = not_found
 if not get_option('curl').auto() or have_block
-- 
2.30.2




[PATCH v2 3/8] virtio: Add shared memory capability

2021-07-28 Thread Antonio Caggiano
From: "Dr. David Alan Gilbert" 

Define a new capability type 'VIRTIO_PCI_CAP_SHARED_MEMORY_CFG'
and the data structure 'virtio_pci_shm_cap' to go with it.
They allow defining shared memory regions with sizes and offsets
of 2^32 and more.
Multiple instances of the capability are allowed and distinguished
by a device-specific 'id'.

Signed-off-by: Dr. David Alan Gilbert 
(cherry picked from commit a5d628a3a3c5e60b98b15197c36a77056115)
---
 hw/virtio/virtio-pci.c  | 19 +++
 hw/virtio/virtio-pci.h  |  4 
 include/standard-headers/linux/virtio_pci.h |  7 +++
 3 files changed, 30 insertions(+)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 433060ac02..3589386412 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1149,6 +1149,25 @@ static int virtio_pci_add_mem_cap(VirtIOPCIProxy *proxy,
 return offset;
 }
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id)
+{
+struct virtio_pci_shm_cap cap = {
+.cap.cap_len = sizeof cap,
+.cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
+};
+uint32_t mask32 = ~0;
+
+cap.cap.bar = bar;
+cap.cap.length = cpu_to_le32(length & mask32);
+cap.length_hi = cpu_to_le32((length >> 32) & mask32);
+cap.cap.offset = cpu_to_le32(offset & mask32);
+cap.offset_hi = cpu_to_le32((offset >> 32) & mask32);
+cap.id = id;
+return virtio_pci_add_mem_cap(proxy, &cap.cap);
+}
+
 static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
unsigned size)
 {
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 2446dcd9ae..5e5c4a4c6d 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -252,4 +252,8 @@ void virtio_pci_types_register(const 
VirtioPCIDeviceTypeInfo *t);
  */
 unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues);
 
+int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
+   uint8_t bar, uint64_t offset, uint64_t length,
+   uint8_t id);
+
 #endif
diff --git a/include/standard-headers/linux/virtio_pci.h 
b/include/standard-headers/linux/virtio_pci.h
index db7a8e2fcb..85d1420d29 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -172,6 +172,13 @@ struct virtio_pci_cfg_cap {
uint8_t pci_cfg_data[4]; /* Data for BAR access. */
 };
 
+struct virtio_pci_shm_cap {
+   struct virtio_pci_cap cap;
+   uint32_t offset_hi; /* Most sig 32 bits of offset */
+   uint32_t length_hi; /* Most sig 32 bits of length */
+   uint8_t  id;/* To distinguish shm chunks */
+};
+
 /* Macro versions of offsets for the Old Timers! */
 #define VIRTIO_PCI_CAP_VNDR0
 #define VIRTIO_PCI_CAP_NEXT1
-- 
2.30.2
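
As a side note on why offset and length are split into lo/hi halves: the
guest driver reassembles them into full 64-bit values, which is what allows
shared memory regions of 2^32 bytes and more. A self-contained sketch of
that reassembly (field names mirror the capability above; the rest is
illustrative only):

/* Rebuild 64-bit offset/length from the split 32-bit capability fields. */
#include <stdint.h>
#include <stdio.h>

struct shm_cap_fields {
    uint32_t offset_lo, length_lo;  /* low halves, from virtio_pci_cap */
    uint32_t offset_hi, length_hi;  /* high halves, from the cap extension */
};

int main(void)
{
    struct shm_cap_fields c = {
        .offset_lo = 0x0,        .offset_hi = 0x0,
        .length_lo = 0x40000000, .length_hi = 0x1,   /* 5 GiB region */
    };
    uint64_t offset = ((uint64_t)c.offset_hi << 32) | c.offset_lo;
    uint64_t length = ((uint64_t)c.length_hi << 32) | c.length_lo;

    printf("shm region: offset=0x%llx length=0x%llx\n",
           (unsigned long long)offset, (unsigned long long)length);
    return 0;
}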




[PATCH v2 4/8] virtio-gpu: Shared memory capability

2021-07-28 Thread Antonio Caggiano
Use VIRTIO_GPU_SHM_ID_HOST_VISIBLE as id for virtio-gpu. Also, remove
struct virtio_pci_shm_cap as virtio_pci_cap64 can be used instead.

Signed-off-by: Antonio Caggiano 
---
 hw/display/virtio-gpu-pci.c | 2 +-
 hw/display/virtio-vga.c | 2 +-
 hw/virtio/virtio-pci.c  | 4 ++--
 include/standard-headers/linux/virtio_pci.h | 7 ---
 4 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/hw/display/virtio-gpu-pci.c b/hw/display/virtio-gpu-pci.c
index 9808663d05..a79bd751b2 100644
--- a/hw/display/virtio-gpu-pci.c
+++ b/hw/display/virtio-gpu-pci.c
@@ -43,7 +43,7 @@ static void virtio_gpu_pci_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
  PCI_BASE_ADDRESS_MEM_PREFETCH |
  PCI_BASE_ADDRESS_MEM_TYPE_64,
  &g->hostmem);
-virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 0);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 
VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
 }
 
 qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus), errp);
diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
index 61993dd3f2..ca841a0799 100644
--- a/hw/display/virtio-vga.c
+++ b/hw/display/virtio-vga.c
@@ -147,7 +147,7 @@ static void virtio_vga_base_realize(VirtIOPCIProxy 
*vpci_dev, Error **errp)
  PCI_BASE_ADDRESS_MEM_PREFETCH |
  PCI_BASE_ADDRESS_MEM_TYPE_64,
  &g->hostmem);
-virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 0);
+virtio_pci_add_shm_cap(vpci_dev, 4, 0, g->conf.hostmem, 
VIRTIO_GPU_SHM_ID_HOST_VISIBLE);
 }
 
 if (!(vpci_dev->flags & VIRTIO_PCI_FLAG_PAGE_PER_VQ)) {
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 3589386412..37a50b4658 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1153,7 +1153,7 @@ int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
uint8_t bar, uint64_t offset, uint64_t length,
uint8_t id)
 {
-struct virtio_pci_shm_cap cap = {
+struct virtio_pci_cap64 cap = {
 .cap.cap_len = sizeof cap,
 .cap.cfg_type = VIRTIO_PCI_CAP_SHARED_MEMORY_CFG,
 };
@@ -1164,7 +1164,7 @@ int virtio_pci_add_shm_cap(VirtIOPCIProxy *proxy,
 cap.length_hi = cpu_to_le32((length >> 32) & mask32);
 cap.cap.offset = cpu_to_le32(offset & mask32);
 cap.offset_hi = cpu_to_le32((offset >> 32) & mask32);
-cap.id = id;
+cap.cap.id = id;
 return virtio_pci_add_mem_cap(proxy, &cap.cap);
 }
 
diff --git a/include/standard-headers/linux/virtio_pci.h 
b/include/standard-headers/linux/virtio_pci.h
index 85d1420d29..db7a8e2fcb 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -172,13 +172,6 @@ struct virtio_pci_cfg_cap {
uint8_t pci_cfg_data[4]; /* Data for BAR access. */
 };
 
-struct virtio_pci_shm_cap {
-   struct virtio_pci_cap cap;
-   uint32_t offset_hi; /* Most sig 32 bits of offset */
-   uint32_t length_hi; /* Most sig 32 bits of length */
-   uint8_t  id;/* To distinguish shm chunks */
-};
-
 /* Macro versions of offsets for the Old Timers! */
 #define VIRTIO_PCI_CAP_VNDR0
 #define VIRTIO_PCI_CAP_NEXT1
-- 
2.30.2




[PATCH v2 5/8] virtio-gpu: Resource UUID

2021-07-28 Thread Antonio Caggiano
Enable resource UUID feature and implement command resource assign UUID.
For the moment, use the resource ID as UUID.

Signed-off-by: Antonio Caggiano 
---
 hw/display/trace-events|  1 +
 hw/display/virtio-gpu-base.c   |  2 ++
 hw/display/virtio-gpu-virgl.c  |  3 +++
 hw/display/virtio-gpu.c| 26 ++
 include/hw/virtio/virtio-gpu.h |  2 ++
 5 files changed, 34 insertions(+)

diff --git a/hw/display/trace-events b/hw/display/trace-events
index f03f6655bc..6b178fa75d 100644
--- a/hw/display/trace-events
+++ b/hw/display/trace-events
@@ -37,6 +37,7 @@ virtio_gpu_cmd_res_create_blob(uint32_t res, uint64_t size) 
"res 0x%x, size %" P
 virtio_gpu_cmd_res_unref(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_back_attach(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_back_detach(uint32_t res) "res 0x%x"
+virtio_gpu_cmd_res_assign_uuid(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_xfer_toh_2d(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_xfer_toh_3d(uint32_t res) "res 0x%x"
 virtio_gpu_cmd_res_xfer_fromh_3d(uint32_t res) "res 0x%x"
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 31b430664f..263a888922 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -218,6 +218,8 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 features |= (1 << VIRTIO_GPU_F_HOSTMEM);
 }
 
+features |= (1 << VIRTIO_GPU_F_RESOURCE_UUID);
+
 return features;
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 5a184cf445..38e5ca6c72 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -458,6 +458,9 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
 /* TODO add security */
 virgl_cmd_ctx_detach_resource(g, cmd);
 break;
+case VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID:
+virtio_gpu_resource_assign_uuid(g, cmd);
+break;
 case VIRTIO_GPU_CMD_GET_CAPSET_INFO:
 virgl_cmd_get_capset_info(g, cmd);
 break;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 9686f17d79..26b819dd3d 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -932,6 +932,29 @@ virtio_gpu_resource_detach_backing(VirtIOGPU *g,
 virtio_gpu_cleanup_mapping(g, res);
 }
 
+void virtio_gpu_resource_assign_uuid(VirtIOGPU *g,
+ struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_assign_uuid assign;
+struct virtio_gpu_resp_resource_uuid resp;
+
+VIRTIO_GPU_FILL_CMD(assign);
+virtio_gpu_bswap_32(&assign, sizeof(assign));
+trace_virtio_gpu_cmd_res_assign_uuid(assign.resource_id);
+
+res = virtio_gpu_find_check_resource(g, assign.resource_id, false, 
__func__, &cmd->error);
+if (!res) {
+return;
+}
+
+memset(&resp, 0, sizeof(resp));
+resp.hdr.type = VIRTIO_GPU_RESP_OK_RESOURCE_UUID;
+/* FIXME: for the moment use the resource id as UUID */
+*((uint32_t*)(resp.uuid + 12)) = assign.resource_id;
+virtio_gpu_ctrl_response(g, cmd, &resp.hdr, sizeof(resp));
+}
+
 void virtio_gpu_simple_process_cmd(VirtIOGPU *g,
struct virtio_gpu_ctrl_command *cmd)
 {
@@ -980,6 +1003,9 @@ void virtio_gpu_simple_process_cmd(VirtIOGPU *g,
 case VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING:
 virtio_gpu_resource_detach_backing(g, cmd);
 break;
+case VIRTIO_GPU_CMD_RESOURCE_ASSIGN_UUID:
+virtio_gpu_resource_assign_uuid(g, cmd);
+break;
 default:
 cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
 break;
diff --git a/include/hw/virtio/virtio-gpu.h b/include/hw/virtio/virtio-gpu.h
index 835ebcb1a0..5cab5f42ac 100644
--- a/include/hw/virtio/virtio-gpu.h
+++ b/include/hw/virtio/virtio-gpu.h
@@ -263,6 +263,8 @@ int virtio_gpu_create_mapping_iov(VirtIOGPU *g,
   uint32_t *niov);
 void virtio_gpu_cleanup_mapping_iov(VirtIOGPU *g,
 struct iovec *iov, uint32_t count);
+void virtio_gpu_resource_assign_uuid(VirtIOGPU *g,
+ struct virtio_gpu_ctrl_command *cmd);
 void virtio_gpu_process_cmdq(VirtIOGPU *g);
 void virtio_gpu_device_realize(DeviceState *qdev, Error **errp);
 void virtio_gpu_reset(VirtIODevice *vdev);
-- 
2.30.2




[PATCH v2 8/8] virtio-gpu: Handle resource blob commands

2021-07-28 Thread Antonio Caggiano
Support BLOB resources creation by calling virgl_renderer_resource_create_blob.

Signed-off-by: Antonio Caggiano 
---
 hw/display/virtio-gpu-virgl.c| 148 +++
 hw/display/virtio-gpu.c  |   9 +-
 include/hw/virtio/virtio-gpu-bswap.h |  18 
 include/hw/virtio/virtio-gpu.h   |   3 +
 4 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index ea903172dd..62ca9bacbf 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -16,6 +16,8 @@
 #include "trace.h"
 #include "hw/virtio/virtio.h"
 #include "hw/virtio/virtio-gpu.h"
+#include "hw/virtio/virtio-gpu-bswap.h"
+#include "hw/virtio/virtio-iommu.h"
 
 #include 
 
@@ -409,6 +411,143 @@ static void virgl_cmd_get_capset(VirtIOGPU *g,
 g_free(resp);
 }
 
+static void virgl_cmd_resource_create_blob(VirtIOGPU *g,
+   struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_create_blob cblob;
+int ret;
+
+VIRTIO_GPU_FILL_CMD(cblob);
+virtio_gpu_create_blob_bswap(&cblob);
+trace_virtio_gpu_cmd_res_create_blob(cblob.resource_id, cblob.size);
+
+if (cblob.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_find_resource(g, cblob.resource_id);
+if (res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource already exists %d\n",
+  __func__, cblob.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = g_new0(struct virtio_gpu_simple_resource, 1);
+res->resource_id = cblob.resource_id;
+res->blob_size = cblob.size;
+
+if (res->iov) {
+cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
+return;
+}
+
+if (cblob.blob_mem != VIRTIO_GPU_BLOB_MEM_HOST3D) {
+ret = virtio_gpu_create_mapping_iov(g, cblob.nr_entries, sizeof(cblob),
+cmd, &res->addrs, &res->iov,
+&res->iov_cnt);
+if (ret != 0) {
+cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
+return;
+}
+}
+
+if (cblob.blob_mem == VIRTIO_GPU_BLOB_MEM_GUEST) {
+virtio_gpu_init_udmabuf(res);
+}
+QTAILQ_INSERT_HEAD(&g->reslist, res, next);
+
+const struct virgl_renderer_resource_create_blob_args virgl_args = {
+.res_handle = cblob.resource_id,
+.ctx_id = cblob.hdr.ctx_id,
+.blob_mem = cblob.blob_mem,
+.blob_id = cblob.blob_id,
+.blob_flags = cblob.blob_flags,
+.size = cblob.size,
+.iovecs = res->iov,
+.num_iovs = res->iov_cnt,
+};
+ret = virgl_renderer_resource_create_blob(&virgl_args);
+if (ret) {
+g_print("Virgl blob create error: %s\n", strerror(-ret));
+}
+}
+
+static void virgl_cmd_resource_map_blob(VirtIOGPU *g,
+struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_map_blob mblob;
+int ret;
+void *data;
+uint64_t size;
+MemoryRegion *region;
+struct virtio_gpu_resp_map_info resp;
+
+VIRTIO_GPU_FILL_CMD(mblob);
+virtio_gpu_map_blob_bswap(&mblob);
+
+if (mblob.resource_id == 0) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource id 0 is not allowed\n",
+  __func__);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+res = virtio_gpu_find_resource(g, mblob.resource_id);
+if (!res) {
+qemu_log_mask(LOG_GUEST_ERROR, "%s: resource does not exist %d\n",
+  __func__, mblob.resource_id);
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID;
+return;
+}
+
+ret = virgl_renderer_resource_map(mblob.hdr.ctx_id, res->resource_id, 
&data, &size);
+if (ret) {
+g_print("Virgl blob resource map error: %s\n", strerror(-ret));
+}
+
+region = g_new0(MemoryRegion, 1);
+memory_region_init_ram_device_ptr(region, OBJECT(g), NULL, size, data);
+memory_region_add_subregion(&g->parent_obj.hostmem, mblob.offset, region);
+memory_region_set_enabled(region, true);
+
+memset(&resp, 0, sizeof(resp));
+resp.hdr.type = VIRTIO_GPU_RESP_OK_MAP_INFO;
+virgl_renderer_resource_get_map_info(mblob.resource_id, &resp.map_info);
+virtio_gpu_ctrl_response(g, cmd, &resp.hdr, sizeof(resp));
+}
+
+static void virgl_cmd_resource_unmap_blob(VirtIOGPU *g,
+struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_simple_resource *res;
+struct virtio_gpu_resource_unmap_blob ublob;
+VIRTIO_GPU_FILL_CMD(ublob);
+virtio_gp

Re: [PATCH v2 4/9] memory: Don't do topology update in memory finalize()

2021-07-28 Thread Peter Xu
On Wed, Jul 28, 2021 at 02:13:17PM +0200, David Hildenbrand wrote:
> On 27.07.21 18:02, Peter Xu wrote:
> > On Tue, Jul 27, 2021 at 03:21:31PM +0200, David Hildenbrand wrote:
> > > On 23.07.21 21:34, Peter Xu wrote:
> > > > Topology update could be wrongly triggered in memory region finalize() 
> > > > if
> > > > there's bug somewhere else.  It'll be a very confusing stack when it
> > > > happens (e.g., sending KVM ioctl within the RCU thread, and we'll 
> > > > observe it
> > > > only until it fails!).
> > > > 
> > > > Instead of that, we use the push()/pop() helper to avoid memory 
> > > > transaction
> > > > commit, at the same time we use assertions to make sure there's no 
> > > > pending
> > > > updates or it's a nested transaction, so it could fail even earlier and 
> > > > in a
> > > > more explicit way.
> > > > 
> > > > Suggested-by: Paolo Bonzini 
> > > > Signed-off-by: Peter Xu 
> > > > ---
> > > >softmmu/memory.c | 23 +--
> > > >1 file changed, 21 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > > index 1a3e9ff8ad..dfce4a2bda 100644
> > > > --- a/softmmu/memory.c
> > > > +++ b/softmmu/memory.c
> > > > @@ -170,6 +170,12 @@ struct MemoryRegionIoeventfd {
> > > >EventNotifier *e;
> > > >};
> > > > +/* Returns whether there's any pending memory updates */
> > > > +static bool memory_region_has_pending_update(void)
> > > > +{
> > > > +return memory_region_update_pending || ioeventfd_update_pending;
> > > > +}
> > > > +
> > > >static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd *a,
> > > >   MemoryRegionIoeventfd *b)
> > > >{
> > > > @@ -1756,12 +1762,25 @@ static void memory_region_finalize(Object *obj)
> > > > * and cause an infinite loop.
> > > > */
> > > >mr->enabled = false;
> > > > -memory_region_transaction_begin();
> > > > +
> > > > +/*
> > > > + * Use push()/pop() instead of begin()/commit() to make sure below 
> > > > block
> > > > + * won't trigger any topology update (which should never happen, 
> > > > but it's
> > > > + * still a safety belt).
> > > > + */
> > > 
> > > Hmm, I wonder if we can just keep the begin/end semantics and just do an
> > > assertion before doing the commit? Does anything speak against that?
> > 
> > That sounds working too for the case of run_on_cpu and similar, but I think
> > this patch should be able to cover more.  For example, it's possible 
> > depth==0
> > when enter memory_region_finalize(), but some removal of subregions could
> > further cause memory layout changes.  IMHO we should also bail out early for
> > those cases too.  Thanks,
> > 
> 
> Do we really have to switch to push/pop to catch these cases early? I'd
> assume we'd just have to formulate the right assertions :)

The subject and commit message were trying to tell why. :)

"memory: Don't do topology update in memory finalize()"

The root reason is that errors within a memory commit can be very hard to digest,
whereas the assertion failure is easier to understand, because no memory layout
change is expected to happen at that point.

The push/pop (I renamed it after your other comment to depth_inc/dec) avoids
any memory commit happening at all within finalize(), and makes sure we always
fail at the assertion as long as there's any potential memory update (even if
the memory update wouldn't crash QEMU immediately).
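
To make the distinction concrete, here is a self-contained toy model of the
two flavours (the names follow the patch; everything else is deliberately
simplified and only illustrates why depth_inc/dec plus the assertions can
never trigger a commit):

/* Toy model: begin/commit may rebuild the topology, inc/dec never does. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned depth;
static bool update_pending;

static void transaction_begin(void)  { depth++; }

static void transaction_commit(void)
{
    assert(depth);
    if (--depth == 0 && update_pending) {
        printf("rebuilding topology\n");   /* the part finalize() must avoid */
        update_pending = false;
    }
}

/* Only adjust the nesting level; can never rebuild anything. */
static void transaction_depth_inc(void) { depth++; }
static void transaction_depth_dec(void) { assert(depth); depth--; }

static void finalize(void)
{
    assert(!update_pending);           /* nothing pending on entry...      */
    transaction_depth_inc();
    /* ...remove subregions here; none of it may queue an update...        */
    transaction_depth_dec();
    assert(!update_pending);           /* ...and nothing queued by us.     */
}

int main(void)
{
    transaction_begin();
    update_pending = true;
    transaction_commit();              /* prints "rebuilding topology"     */
    finalize();                        /* passes all assertions            */
    return 0;
}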

Thanks,

-- 
Peter Xu




Re: [PATCH v2 4/9] memory: Don't do topology update in memory finalize()

2021-07-28 Thread David Hildenbrand

On 28.07.21 15:56, Peter Xu wrote:

On Wed, Jul 28, 2021 at 02:13:17PM +0200, David Hildenbrand wrote:

On 27.07.21 18:02, Peter Xu wrote:

On Tue, Jul 27, 2021 at 03:21:31PM +0200, David Hildenbrand wrote:

On 23.07.21 21:34, Peter Xu wrote:

Topology update could be wrongly triggered in memory region finalize() if
there's bug somewhere else.  It'll be a very confusing stack when it
happens (e.g., sending KVM ioctl within the RCU thread, and we'll observe it
only until it fails!).

Instead of that, we use the push()/pop() helper to avoid memory transaction
commit, at the same time we use assertions to make sure there's no pending
updates or it's a nested transaction, so it could fail even earlier and in a
more explicit way.

Suggested-by: Paolo Bonzini 
Signed-off-by: Peter Xu 
---
softmmu/memory.c | 23 +--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 1a3e9ff8ad..dfce4a2bda 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -170,6 +170,12 @@ struct MemoryRegionIoeventfd {
EventNotifier *e;
};
+/* Returns whether there's any pending memory updates */
+static bool memory_region_has_pending_update(void)
+{
+return memory_region_update_pending || ioeventfd_update_pending;
+}
+
static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd *a,
   MemoryRegionIoeventfd *b)
{
@@ -1756,12 +1762,25 @@ static void memory_region_finalize(Object *obj)
 * and cause an infinite loop.
 */
mr->enabled = false;
-memory_region_transaction_begin();
+
+/*
+ * Use push()/pop() instead of begin()/commit() to make sure below block
+ * won't trigger any topology update (which should never happen, but it's
+ * still a safety belt).
+ */


Hmm, I wonder if we can just keep the begin/end semantics and just do an
assertion before doing the commit? Does anything speak against that?


That would work too for the case of run_on_cpu and similar, but I think
this patch should be able to cover more.  For example, it's possible that
depth==0 when entering memory_region_finalize(), but the removal of
subregions could still cause further memory layout changes.  IMHO we
should also bail out early in those cases.  Thanks,



Do we really have to switch to push/pop to catch these cases early? I'd
assume we'd just have to formulate the right assertions :)


The subject and commit message were trying to explain why. :)

"memory: Don't do topology update in memory finalize()"

The root reason is that errors within memory commit can be very hard to
digest, whereas the assertion failure is easier to understand because no
memory layout change is expected to happen at that point.

The push/pop (which I renamed to depth_inc/dec after your other comment)
avoids any memory commit happening within finalize(), and makes sure we
always fail at the assertion as long as there's any potential memory update
(even if the memory update wouldn't crash qemu immediately).


Okay, makes sense to me, thanks

Acked-by: David Hildenbrand 


--
Thanks,

David / dhildenb




Re: aarch64 efi boot failures with qemu 6.0+

2021-07-28 Thread Guenter Roeck

On 7/28/21 6:25 AM, Ard Biesheuvel wrote:

On Wed, 28 Jul 2021 at 15:11, Michael S. Tsirkin  wrote:


On Tue, Jul 27, 2021 at 12:36:03PM +0200, Igor Mammedov wrote:

On Tue, 27 Jul 2021 05:01:23 -0400
"Michael S. Tsirkin"  wrote:


On Mon, Jul 26, 2021 at 10:12:38PM -0700, Guenter Roeck wrote:

On 7/26/21 9:45 PM, Michael S. Tsirkin wrote:

On Mon, Jul 26, 2021 at 06:00:57PM +0200, Ard Biesheuvel wrote:

(cc Bjorn)

On Mon, 26 Jul 2021 at 11:08, Philippe Mathieu-Daudé  wrote:


On 7/26/21 12:56 AM, Guenter Roeck wrote:

On 7/25/21 3:14 PM, Michael S. Tsirkin wrote:

On Sat, Jul 24, 2021 at 11:52:34AM -0700, Guenter Roeck wrote:

Hi all,

starting with qemu v6.0, some of my aarch64 efi boot tests no longer
work. Analysis shows that PCI devices with IO ports do not instantiate
in qemu v6.0 (or v6.1-rc0) when booting through efi. The problem affects
(at least) ne2k_pci, tulip, dc390, and am53c974. The problem only
affects
aarch64, not x86/x86_64.

I bisected the problem to commit 0cf8882fd0 ("acpi/gpex: Inform os to
keep firmware resource map"). Since this commit, PCI device BAR
allocation has changed. Taking tulip as example, the kernel reports
the following PCI bar assignments when running qemu v5.2.

[3.921801] pci :00:01.0: [1011:0019] type 00 class 0x02
[3.922207] pci :00:01.0: reg 0x10: [io  0x-0x007f]
[3.922505] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]


IIUC, these lines are read back from the BARs


[3.927111] pci :00:01.0: BAR 0: assigned [io  0x1000-0x107f]
[3.927455] pci :00:01.0: BAR 1: assigned [mem
0x1000-0x107f]



... and this is the assignment created by the kernel.


With qemu v6.0, the assignment is reported as follows.

[3.922887] pci :00:01.0: [1011:0019] type 00 class 0x02
[3.923278] pci :00:01.0: reg 0x10: [io  0x-0x007f]
[3.923451] pci :00:01.0: reg 0x14: [mem 0x1000-0x107f]



The problem here is that Linux, for legacy reasons, does not support
I/O ports <= 0x1000 on PCI, so the I/O assignment created by EFI is
rejected.

This might make sense on x86, where legacy I/O ports may exist, but on
other architectures, this makes no sense.



Fixing Linux makes sense but OTOH EFI probably shouldn't create mappings
that trip up existing guests, right?



I think it is difficult to draw a line. Sure, maybe EFI should not create
such mappings, but then maybe qemu should not suddenly start to enforce
those mappings for existing guests either.


I would say both. But about QEMU actually I think you have a point here.
Re-reading the spec:

0: No (The operating system shall not ignore the PCI configuration that
firmware has done at boot time. However, the operating system is free to
configure the devices in this hierarchy that have not been configured by
the firmware. There may be a reduced level of hot plug capability support
in this hierarchy due to resource constraints. This situation is the same
as the legacy situation where this _DSM is not provided.)
1: Yes (The operating system may ignore the PCI configuration that the
firmware has done at boot time, and reconfigure/rebalance the resources in
the hierarchy.)


I think I misread the spec previously, and understood it to mean that
1 means must ignore. In fact 1 gives the most flexibility.
So why are we suddenly telling the guest it must not override
firmware?

The commit log says
 The diffences could result in resource assignment failure.

which is kind of vague ...

Jiahui Cen, Igor, what do you think about it?
I'm inclined to revert 0cf8882fd06ba0aeb1e90fa6f23fce85504d7e14
at least for now.

Looking at the patch history, it seems the consensus was that it's better
to enforce firmware allocations.

Also, letting the OS do as it pleases might break PCI devices that
don't tolerate reallocation, e.g. when firmware initializes PCI device
IO/BARs and then fetches ACPI tables, which get patched with the
assigned resources.

To me, returning 0 seems to be the correct choice.
In addition, resource hinting also works via firmware allocations;
if we revert the commit, it might change those configs.



Well if firmware people now tell us their allocations were never
intended for guest OS use then maybe we should not intervene.



DSM #5 was introduced to permit firmware running on x86_64 systems to
boot 32-bit OSes (read Windows) unmodified, while still leaving
enlightened, 64-bit OSes the opportunity to reorganize the BARs if
there is sufficient space in the resource windows, and if the OS runs
in long mode so it can address all of it.

This is why the default-if-absent according to the spec is '0', and I
already explained up-thread why arm64 deviates from this.

But Igor has a point: there are cases where especially bus numbers
should not be touched, as firmware tables consumed by the OS may carry
b/d/f identifiers for things like SMMU pass through, where changing
the bus numbers obviously invalidates this information.

These are exceptional cases, though, and I wo

Re: [PATCH v3] hw/acpi: add an assertion check for non-null return from acpi_get_i386_pci_host

2021-07-28 Thread Ani Sinha



On Wed, 28 Jul 2021, Michael S. Tsirkin wrote:

> On Mon, Jul 26, 2021 at 10:27:43PM +0530, Ani Sinha wrote:
> > All existing code using acpi_get_i386_pci_host() checks for a non-null
> > return value from this function call. Instead of returning early when the 
> > value
> > returned is NULL, assert instead. Since there are only two possible host 
> > buses
> > for i386 - q35 and i440fx, a null value return from the function does not 
> > make
> > sense in most cases and is likely an error situation.
>
> add "on i386"?
>
> > Fixes: c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")
>
> That seems inappropriate; this is not a bugfix.
>
> >
> > Signed-off-by: Ani Sinha 
>
>
> Frankly I don't see this as a useful cleanup.
> assert is generally a last resort thing.
>

Igor pushed in the direction of assertion. Otherwise, see my v2.

> > ---
> >  hw/acpi/pcihp.c  |  8 
> >  hw/i386/acpi-build.c | 15 ++-
> >  2 files changed, 14 insertions(+), 9 deletions(-)
> >
> > changelog:
> > v1: initial patch
> > v2: removed comment addition - that can be sent as a separate patch.
> > v3: added assertion for null host values for all cases except one.
> >
> > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> > index f4d706e47d..054ee8cbc5 100644
> > --- a/hw/acpi/pcihp.c
> > +++ b/hw/acpi/pcihp.c
> > @@ -116,6 +116,12 @@ static void acpi_set_pci_info(void)
> >  bsel_is_set = true;
> >
> >  if (!host) {
> > +/*
> > + * This function can be eventually called from
> > + * qemu_devices_reset() -> acpi_pcihp_reset() even
> > + * for architectures other than i386. Hence, we need
>
> why call out i386 here? well because currently host
> is only non-null for q35 and i440fx which are both i386.
> all the above is not a given and we won't remember to update
> the comment if we change it. Generally graceful failure
> is the default or should be.

Hmm. There is much debate to be had about graceful and ungraceful
failures :-) Some might say ungraceful failures help to catch issues
earlier, before the state is messed up.
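
As a tiny standalone illustration of the two styles being discussed (all
names below are made up, this is not the QEMU code; find_host_bridge() just
stands in for acpi_get_i386_pci_host()):

    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* stand-in for acpi_get_i386_pci_host(): NULL on "other" machines */
    static void *find_host_bridge(int is_i386)
    {
        static int dummy;
        return is_i386 ? (void *)&dummy : NULL;
    }

    /* graceful: silently skip when there is no host bridge */
    static void set_pci_info_graceful(int is_i386)
    {
        void *host = find_host_bridge(is_i386);

        if (!host) {
            return; /* e.g. a reset path shared with non-i386 machines */
        }
        printf("walking host bridge (graceful)\n");
    }

    /* assert: treat a missing host bridge as a programming error */
    static void set_pci_info_assert(int is_i386)
    {
        void *host = find_host_bridge(is_i386);

        assert(host); /* fail fast, before any state gets messed up */
        printf("walking host bridge (assert)\n");
    }

    int main(void)
    {
        set_pci_info_graceful(0); /* quietly does nothing */
        set_pci_info_graceful(1);
        set_pci_info_assert(1);
        return 0;
    }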




Re: [PATCH] block/io_uring: resubmit when result is -EAGAIN

2021-07-28 Thread Stefan Hajnoczi
On Wed, Jul 28, 2021 at 12:35:18PM +0200, Fabian Ebner wrote:
> Quoting from [0]:
> 
>  Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
>  completion path. Normally we expect this to happen inline as part
>  of submission, but apparently SCSI has a weird corner case where it
>  can happen as part of normal completions.
> 
> Host kernels without patch [0] can panic when this happens [1], and
> resubmitting makes the panic more likely. On the other hand, for
> kernels with patch [0], resubmitting ensures that a block job is not
> aborted just because of such spurious errors.
> 
> [0]: 
> https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u
> 
> [1]:
>   #9 [b732000c8b70] asm_exc_page_fault at a4800ade
>  #10 [b732000c8bf8] io_prep_async_work at a3d89c16
>  #11 [b732000c8c50] io_rw_reissue at a3d8b2e1
>  #12 [b732000c8c78] io_complete_rw at a3d8baa8
>  #13 [b732000c8c98] blkdev_bio_end_io at a3d62a80
>  #14 [b732000c8cc8] bio_endio at a3f4e800
>  #15 [b732000c8ce8] dec_pending at a432f854
>  #16 [b732000c8d30] clone_endio at a433170c
>  #17 [b732000c8d70] bio_endio at a3f4e800
>  #18 [b732000c8d90] blk_update_request at a3f53a37
>  #19 [b732000c8dd0] scsi_end_request at a4233a5c
>  #20 [b732000c8e08] scsi_io_completion at a423432c
>  #21 [b732000c8e58] scsi_finish_command at a422c527
>  #22 [b732000c8e88] scsi_softirq_done at a42341e4
> 
> Signed-off-by: Fabian Ebner 
> ---
> 
> I'm new to this code and io_uring, so I don't know what all the
> implications are, but retrying upon EAGAIN does not sound like
> a bad thing to my inexperienced ears.
> 
> Some more context, leading up to this patch:
> 
> We had some users reporting issues after we switched to using io_uring
> by default. Namely, kernel panics [2] for some, and failing block jobs
> [3] (with a custom backup mechanism we implemented on top of QEMU's
> block layer) for others.
> 
> I got lucky and managed to reproduce the issue, and it was a failed
> block job about half of the time and a kernel panic the other half.
> When using a host kernel with [0], it's a failed block job all the
> time, and this patch attempts to fix that, by resubmitting instead
> of bubbling up the error.

Great, thanks for the patch!

Some of the relevant information is below the '---' line and won't be
included in the git log. Please tweak the commit description to focus
on:
- io_uring may return spurious -EAGAIN with Linux SCSI
  (The details of how io_uring internally resubmits are not important
  but the fact that spurious -EAGAIN is visible to userspace matters.)
- Resubmitting such requests solves the problem

The kernel panic is not as important in the commit description since
this patch works the same regardless of whether the kernel panics
sometimes or not.

> 
> [2]: 
> https://forum.proxmox.com/threads/kernel-panic-whole-server-crashes-about-every-day.91803/post-404382
> [3]: 
> https://forum.proxmox.com/threads/backup-job-failed-with-err-11-on-2-of-6-vms.92568/
> 
>  block/io_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 00a3ee9fb8..77d162cb24 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -165,7 +165,7 @@ static void luring_process_completions(LuringState *s)
>  total_bytes = ret + luringcb->total_read;
>  
>  if (ret < 0) {
> -if (ret == -EINTR) {
> +if (ret == -EINTR || ret == -EAGAIN) {
>  luring_resubmit(s, luringcb);
>  continue;
>  }

Please add a comment:

 /*
  * Only writev/readv/fsync requests on regular files or host block
  * devices are submitted. Therefore -EAGAIN is not expected but it's
  * known to happen sometimes with Linux SCSI. Submit again and hope the
  * request completes successfully.
  *
  * For more information, see:
  * 
https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u
  *
  * If the code is changed to submit other types of requests in the
  * future, then this workaround may need to be extended to deal with
  * genuine -EAGAIN results that should not be resubmitted immediately.
  */
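
For reference, with that comment folded in, the hunk would look roughly
like this (sketch only, not the final patch; the surrounding code is
elided):

    if (ret < 0) {
        /*
         * Only writev/readv/fsync requests on regular files or host block
         * devices are submitted. Therefore -EAGAIN is not expected but it's
         * known to happen sometimes with Linux SCSI. Submit again and hope
         * the request completes successfully.
         *
         * For more information, see:
         * https://lore.kernel.org/io-uring/20210727165811.284510-3-ax...@kernel.dk/T/#u
         *
         * If the code is changed to submit other types of requests in the
         * future, then this workaround may need to be extended to deal with
         * genuine -EAGAIN results that should not be resubmitted immediately.
         */
        if (ret == -EINTR || ret == -EAGAIN) {
            luring_resubmit(s, luringcb);
            continue;
        }
        /* ... rest of the error handling unchanged ... */
    }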

Stefan




Re: [PATCH] gitlab-ci.d/buildtest: Mark the aarch64 and ppc64-s390x CFI jobs as manual

2021-07-28 Thread Willian Rampazzo
On Wed, Jul 28, 2021 at 4:51 AM Thomas Huth  wrote:
>
> These two jobs are currently failing very often - the linker seems to
> get killed due to out-of-memory problems. Since apparently nobody has
> currently an idea how to fix that nicely, let's mark the jobs as manual
> for the time being until someone comes up with a proper fix.
>
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/buildtest.yml | 12 
>  1 file changed, 12 insertions(+)
>

Reviewed-by: Willian Rampazzo 




[PATCH 2/4] docs: Fold usb2.txt USB controller information into usb.rst

2021-07-28 Thread Peter Maydell
Fold the information in docs/usb2.txt about the different
kinds of supported USB controller into the main rST manual.

Signed-off-by: Peter Maydell 
---
 docs/system/devices/usb.rst | 86 +
 docs/usb2.txt   | 82 ---
 2 files changed, 86 insertions(+), 82 deletions(-)

diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
index 7da142ecbb9..9f0e613dcc7 100644
--- a/docs/system/devices/usb.rst
+++ b/docs/system/devices/usb.rst
@@ -8,6 +8,92 @@ plug virtual USB devices or real host USB devices (only works 
with
 certain host operating systems). QEMU will automatically create and
 connect virtual USB hubs as necessary to connect multiple USB devices.
 
+USB controllers
+~~~
+
+XHCI controller support
+^^^
+
+QEMU has XHCI host adapter support.  The XHCI hardware design is much
+more virtualization-friendly when compared to EHCI and UHCI, thus XHCI
+emulation uses less resources (especially CPU).  So if your guest
+supports XHCI (which should be the case for any operating system
+released around 2010 or later) we recommend using it:
+
+qemu -device qemu-xhci
+
+XHCI supports USB 1.1, USB 2.0 and USB 3.0 devices, so this is the
+only controller you need.  With only a single USB controller (and
+therefore only a single USB bus) present in the system there is no
+need to use the bus= parameter when adding USB devices.
+
+
+EHCI controller support
+^^^
+
+The QEMU EHCI Adapter supports USB 2.0 devices.  It can be used either
+standalone or with companion controllers (UHCI, OHCI) for USB 1.1
+devices.  The companion controller setup is more convenient to use
+because it provides a single USB bus supporting both USB 2.0 and USB
+1.1 devices.  See next section for details.
+
+When running EHCI in standalone mode you can add UHCI or OHCI
+controllers for USB 1.1 devices too.  Each controller creates its own
+bus though, so there are two completely separate USB buses: One USB
+1.1 bus driven by the UHCI controller and one USB 2.0 bus driven by
+the EHCI controller.  Devices must be attached to the correct
+controller manually.
+
+The easiest way to add a UHCI controller to a ``pc`` machine is the
+``-usb`` switch.  QEMU will create the UHCI controller as function of
+the PIIX3 chipset.  The USB 1.1 bus will carry the name ``usb-bus.0``.
+
+You can use the standard ``-device`` switch to add a EHCI controller to
+your virtual machine.  It is strongly recommended to specify an ID for
+the controller so the USB 2.0 bus gets an individual name, for example
+``-device usb-ehci,id=ehci``.  This will give you a USB 2.0 bus named
+``ehci.0``.
+
+When adding USB devices using the ``-device`` switch you can specify the
+bus they should be attached to.  Here is a complete example:
+
+.. parsed-literal::
+
+|qemu_system| -M pc ${otheroptions}\\
+-drive if=none,id=usbstick,format=raw,file=/path/to/image   \\
+-usb\\
+-device usb-ehci,id=ehci\\
+-device usb-tablet,bus=usb-bus.0\\
+-device usb-storage,bus=ehci.0,drive=usbstick
+
+This attaches a USB tablet to the UHCI adapter and a USB mass storage
+device to the EHCI adapter.
+
+
+Companion controller support
+
+
+The UHCI and OHCI controllers can attach to a USB bus created by EHCI
+as companion controllers.  This is done by specifying the ``masterbus``
+and ``firstport`` properties.  ``masterbus`` specifies the bus name the
+controller should attach to.  ``firstport`` specifies the first port the
+controller should attach to, which is needed as usually one EHCI
+controller with six ports has three UHCI companion controllers with
+two ports each.
+
+There is a config file in docs which will do all this for
+you, which you can use like this:
+
+.. parsed-literal::
+
+   |qemu_system| -readconfig docs/config/ich9-ehci-uhci.cfg
+
+Then use ``bus=ehci.0`` to assign your USB devices to that bus.
+
+Using the ``-usb`` switch for ``q35`` machines will create a similar
+USB controller configuration.
+
+
 .. _Connecting USB devices:
 
 Connecting USB devices
diff --git a/docs/usb2.txt b/docs/usb2.txt
index 172614d3a7e..adf4ba3f2a0 100644
--- a/docs/usb2.txt
+++ b/docs/usb2.txt
@@ -1,86 +1,4 @@
 
-USB Quick Start
-===
-
-XHCI controller support

-
-QEMU has XHCI host adapter support.  The XHCI hardware design is much
-more virtualization-friendly when compared to EHCI and UHCI, thus XHCI
-emulation uses less resources (especially cpu).  So if your guest
-supports XHCI (which should be the case for any operating system
-released around 2010 or later) we recommend using it:
-
-qemu -device qemu-xhci
-
-XHCI supports USB 1.1, USB 2.0 and USB 3.0 devices, so this is the
-only controller you need. 

[PATCH 3/4] docs: Fold usb2.txt physical port addressing info into usb.rst

2021-07-28 Thread Peter Maydell
Fold the usb2.txt documentation about specifying which physical
port a USB device should use into usb.rst.

Signed-off-by: Peter Maydell 
---
 docs/system/devices/usb.rst | 33 +
 docs/usb2.txt   | 32 
 2 files changed, 33 insertions(+), 32 deletions(-)

diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
index 9f0e613dcc7..bab0cd3fdfd 100644
--- a/docs/system/devices/usb.rst
+++ b/docs/system/devices/usb.rst
@@ -199,6 +199,39 @@ option or the ``device_add`` monitor command. Available 
devices are:
 ``u2f-{emulated,passthru}``
Universal Second Factor device
 
+Physical port addressing
+
+
+For all the above USB devices, by default QEMU will plug the device
+into the next available port on the specified USB bus, or onto
+some available USB bus if you didn't specify one explicitly.
+If you need to, you can also specify the physical port where
+the device will show up in the guest.  This can be done using the
+``port`` property.  UHCI has two root ports (1,2).  EHCI has six root
+ports (1-6), and the emulated (1.1) USB hub has eight ports.
+
+Plugging a tablet into UHCI port 1 works like this::
+
+-device usb-tablet,bus=usb-bus.0,port=1
+
+Plugging a hub into UHCI port 2 works like this::
+
+-device usb-hub,bus=usb-bus.0,port=2
+
+Plugging a virtual USB stick into port 4 of the hub just plugged works
+this way::
+
+-device usb-storage,bus=usb-bus.0,port=2.4,drive=...
+
+In the monitor, the ``device_add`` command also accepts a ``port``
+property specification. If you want to unplug devices too you should
+specify some unique id which you can use to refer to the device.
+You can then use ``device_del`` to unplug the device later.
+For example::
+
+(qemu) device_add usb-tablet,bus=usb-bus.0,port=1,id=my-tablet
+(qemu) device_del my-tablet
+
 Hotplugging USB storage
 ~~~
 
diff --git a/docs/usb2.txt b/docs/usb2.txt
index adf4ba3f2a0..6a88d5314f9 100644
--- a/docs/usb2.txt
+++ b/docs/usb2.txt
@@ -6,38 +6,6 @@ Recently the USB pass through driver (also known as usb-host) 
and the
 QEMU USB subsystem gained a few capabilities which are available only
 via qdev properties, i,e. when using '-device'.
 
-
-physical port addressing
-
-
-First you can (for all USB devices) specify the physical port where
-the device will show up in the guest.  This can be done using the
-"port" property.  UHCI has two root ports (1,2).  EHCI has six root
-ports (1-6), the emulated (1.1) USB hub has eight ports.
-
-Plugging a tablet into UHCI port 1 works like this:
-
--device usb-tablet,bus=usb-bus.0,port=1
-
-Plugging a hub into UHCI port 2 works like this:
-
--device usb-hub,bus=usb-bus.0,port=2
-
-Plugging a virtual USB stick into port 4 of the hub just plugged works
-this way:
-
--device usb-storage,bus=usb-bus.0,port=2.4,drive=...
-
-You can do basically the same in the monitor using the device_add
-command.  If you want to unplug devices too you should specify some
-unique id which you can use to refer to the device ...
-
-(qemu) device_add usb-tablet,bus=usb-bus.0,port=1,id=my-tablet
-(qemu) device_del my-tablet
-
-... when unplugging it with device_del.
-
-
 USB pass through hints
 --
 
-- 
2.20.1




[PATCH 0/4] docs: Move usb2.txt, usb-storage.txt into rST manual

2021-07-28 Thread Peter Maydell
We already have a section of the rST manual describing USB
emulation support; this patchset moves the information
from the text files docs/usb2.txt and docs/usb-storage.txt
into the rST manual.

The changes here include reformatting into rST and a little bit of
wordsmithing in places.  I also added 'format=raw' to some example
command lines since QEMU now complains if you don't use it.  I have
not attempted to determine whether the information in the text files
(last updated in 2018) is stale...

thanks
-- PMM

Peter Maydell (4):
  docs: Incorporate information in usb-storage.txt into rST manual
  docs: Fold usb2.txt USB controller information into usb.rst
  docs: Fold usb2.txt physical port addressing info into usb.rst
  docs: Fold usb2.txt passthrough information into usb.rst

 docs/system/devices/usb.rst | 225 ++--
 docs/usb-storage.txt|  59 --
 docs/usb2.txt   | 172 ---
 MAINTAINERS |   3 +-
 4 files changed, 219 insertions(+), 240 deletions(-)
 delete mode 100644 docs/usb-storage.txt
 delete mode 100644 docs/usb2.txt

-- 
2.20.1




[PATCH 1/4] docs: Incorporate information in usb-storage.txt into rST manual

2021-07-28 Thread Peter Maydell
We already have a section on USB in the rST manual; fold
the information in docs/usb-storage.txt into it.

We add 'format=raw' to the various -drive options in the code
examples, because QEMU will print warnings these days if you
omit it.

Signed-off-by: Peter Maydell 
---
 docs/system/devices/usb.rst | 57 ++-
 docs/usb-storage.txt| 59 -
 MAINTAINERS |  2 +-
 3 files changed, 51 insertions(+), 67 deletions(-)
 delete mode 100644 docs/usb-storage.txt

diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
index eeab78dcfbe..7da142ecbb9 100644
--- a/docs/system/devices/usb.rst
+++ b/docs/system/devices/usb.rst
@@ -28,17 +28,46 @@ option or the ``device_add`` monitor command. Available 
devices are:
 
 ``usb-storage,drive=drive_id``
Mass storage device backed by drive_id (see the :ref:`disk images`
-   chapter in the System Emulation Users Guide)
+   chapter in the System Emulation Users Guide). This is the classic
+   bulk-only transport protocol used by 99% of USB sticks. This
+   example shows it connected to an XHCI USB controller and with
+   a drive backed by a raw format disk image:
+
+   .. parsed-literal::
+
+   |qemu_system| [...]   \\
+-drive if=none,id=stick,format=raw,file=/path/to/file.img \\
+-device nec-usb-xhci,id=xhci  \\
+-device usb-storage,bus=xhci.0,drive=stick
 
 ``usb-uas``
-   USB attached SCSI device, see
-   `usb-storage.txt 
`__
-   for details
+   USB attached SCSI device. This does not create a SCSI disk, so
+   you need to explicitly create a ``scsi-hd`` or ``scsi-cd`` device
+   on the command line, as well as using the ``-drive`` option to
+   specify what those disks are backed by. One ``usb-uas`` device can
+   handle multiple logical units (disks). This example creates three
+   logical units: two disks and one cdrom drive:
+
+   .. parsed-literal::
+
+  |qemu_system| [...] \\
+   -drive if=none,id=uas-disk1,format=raw,file=/path/to/file1.img  \\
+   -drive if=none,id=uas-disk2,format=raw,file=/path/to/file2.img  \\
+   -drive 
if=none,id=uas-cdrom,media=cdrom,format=raw,file=/path/to/image.iso \\
+   -device nec-usb-xhci,id=xhci\\
+   -device usb-uas,id=uas,bus=xhci.0   \\
+   -device scsi-hd,bus=uas.0,scsi-id=0,lun=0,drive=uas-disk1   \\
+   -device scsi-hd,bus=uas.0,scsi-id=0,lun=1,drive=uas-disk2   \\
+   -device scsi-cd,bus=uas.0,scsi-id=0,lun=5,drive=uas-cdrom
 
 ``usb-bot``
-   Bulk-only transport storage device, see
-   `usb-storage.txt 
`__
-   for details here, too
+   Bulk-only transport storage device. This presents the guest with the
+   same USB bulk-only transport protocol interface as ``usb-storage``, but
+   the QEMU command line option works like ``usb-uas`` and does not
+   automatically create SCSI disks for you. ``usb-bot`` supports up to
+   16 LUNs. Unlike ``usb-uas``, the LUN numbers must be continuous,
+   i.e. for three devices you must use 0+1+2. The 0+1+5 numbering from the
+   ``usb-uas`` example above won't work with ``usb-bot``.
 
 ``usb-mtp,rootdir=dir``
Media transfer protocol device, using dir as root of the file tree
@@ -84,6 +113,20 @@ option or the ``device_add`` monitor command. Available 
devices are:
 ``u2f-{emulated,passthru}``
Universal Second Factor device
 
+Hotplugging USB storage
+~~~
+
+The ``usb-bot`` and ``usb-uas`` devices can be hotplugged.  In the hotplug
+case they are added with ``attached = false`` so the guest will not see
+the device until the ``attached`` property is explicitly set to true.
+That allows you to attach one or more scsi devices before making the
+device visible to the guest. The workflow looks like this:
+
+#. ``device-add usb-bot,id=foo``
+#. ``device-add scsi-{hd,cd},bus=foo.0,lun=0``
+#. optionally add more devices (luns 1 ... 15)
+#. ``scripts/qmp/qom-set foo.attached = true``
+
 .. _host_005fusb_005fdevices:
 
 Using host USB devices on a Linux host
diff --git a/docs/usb-storage.txt b/docs/usb-storage.txt
deleted file mode 100644
index 551af6f88bb..000
--- a/docs/usb-storage.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-
-qemu usb storage emulation
---
-
-QEMU has three devices for usb storage emulation.
-
-Number one emulates the classic bulk-only transport protocol which is
-used by 99% of the usb sticks on the market today and is called
-"usb-storage".  Usage (hooking up to xhci, other host controllers work
-too):
-
-  qemu ${other_vm_args}\
-   -drive if=none,id=stick,file=/path/to/file.img  \
-   -device nec-usb-xhci,id=xhci  

[PATCH 4/4] docs: Fold usb2.txt passthrough information into usb.rst

2021-07-28 Thread Peter Maydell
Fold the usb2.txt information on device passthrough into usb.rst;
since this is the last part of the .txt file we can delete it now.

Signed-off-by: Peter Maydell 
---
 docs/system/devices/usb.rst | 49 +++
 docs/usb2.txt   | 58 -
 MAINTAINERS |  1 -
 3 files changed, 49 insertions(+), 59 deletions(-)
 delete mode 100644 docs/usb2.txt

diff --git a/docs/system/devices/usb.rst b/docs/system/devices/usb.rst
index bab0cd3fdfd..afb7d6c2268 100644
--- a/docs/system/devices/usb.rst
+++ b/docs/system/devices/usb.rst
@@ -300,3 +300,52 @@ are not supported yet.
 
 When relaunching QEMU, you may have to unplug and plug again the USB
 device to make it work again (this is a bug).
+
+``usb-host`` properties for specifying the host device
+^^
+
+The example above uses the ``vendorid`` and ``productid`` to
+specify which host device to pass through, but this is not
+the only way to specify the host device. ``usb-host`` supports
+the following properties:
+
+``hostbus=``
+  Specifies the bus number the device must be attached to
+``hostaddr=``
+  Specifies the device address the device got assigned by the guest os
+``hostport=``
+  Specifies the physical port the device is attached to
+``vendorid=``
+  Specifies the vendor ID of the device
+``productid=``
+  Specifies the product ID of the device.
+
+In theory you can combine all these properties as you like.  In
+practice only a few combinations are useful:
+
+- ``vendorid`` and ``productid`` -- match for a specific device, pass it to
+  the guest when it shows up somewhere in the host.
+
+- ``hostbus`` and ``hostport`` -- match for a specific physical port in the
+  host, any device which is plugged in there gets passed to the
+  guest.
+
+- ``hostbus`` and ``hostaddr`` -- most useful for ad-hoc pass through as the
+  hostaddr isn't stable. The next time you plug the device into the host it
+  will get a new hostaddr.
+
+Note that on the host USB 1.1 devices are handled by UHCI/OHCI and USB
+2.0 by EHCI.  That means different USB devices plugged into the very
+same physical port on the host may show up on different host buses
+depending on the speed. Supposing that devices plugged into a given
+physical port appear as bus 1 + port 1 for 2.0 devices and bus 3 + port 1
+for 1.1 devices, you can pass through any device plugged into that port
+and also assign it to the correct USB bus in QEMU like this:
+
+.. parsed-literal::
+
+   |qemu_system| -M pc [...]\\
+-usb \\
+-device usb-ehci,id=ehci \\
+-device usb-host,bus=usb-bus.0,hostbus=3,hostport=1  \\
+-device usb-host,bus=ehci.0,hostbus=1,hostport=1
diff --git a/docs/usb2.txt b/docs/usb2.txt
deleted file mode 100644
index 6a88d5314f9..000
--- a/docs/usb2.txt
+++ /dev/null
@@ -1,58 +0,0 @@
-
-More USB tips & tricks
-==
-
-Recently the USB pass through driver (also known as usb-host) and the
-QEMU USB subsystem gained a few capabilities which are available only
-via qdev properties, i,e. when using '-device'.
-
-USB pass through hints
---
-
-The usb-host driver has a bunch of properties to specify the device
-which should be passed to the guest:
-
-  hostbus= -- Specifies the bus number the device must be attached
-  to.
-
-  hostaddr= -- Specifies the device address the device got
-  assigned by the guest os.
-
-  hostport= -- Specifies the physical port the device is attached
-  to.
-
-  vendorid= -- Specifies the vendor ID of the device.
-  productid= -- Specifies the product ID of the device.
-
-In theory you can combine all these properties as you like.  In
-practice only a few combinations are useful:
-
-  (1) vendorid+productid -- match for a specific device, pass it to
-  the guest when it shows up somewhere in the host.
-
-  (2) hostbus+hostport -- match for a specific physical port in the
-  host, any device which is plugged in there gets passed to the
-  guest.
-
-  (3) hostbus+hostaddr -- most useful for ad-hoc pass through as the
-  hostaddr isn't stable, the next time you plug in the device it
-  gets a new one ...
-
-Note that USB 1.1 devices are handled by UHCI/OHCI and USB 2.0 by
-EHCI.  That means a device plugged into the very same physical port
-may show up on different buses depending on the speed.  The port I'm
-using for testing is bus 1 + port 1 for 2.0 devices and bus 3 + port 1
-for 1.1 devices.  Passing through any device plugged into that port
-and also assign them to the correct bus can be done this way:
-
-qemu -M pc ${otheroptions}   \
--usb \
--device usb-ehci,id=ehci \
--device usb-host,bus=usb-bus.0,hostbus=3,hostport=1

Re: [PULL 0/1] Miscellaneous patches for 2021-07-27

2021-07-28 Thread Peter Maydell
On Tue, 27 Jul 2021 at 16:22, Markus Armbruster  wrote:
>
> The following changes since commit ca4b5ef371d6602b73bc5eec08e3199b05caf146:
>
>   Merge remote-tracking branch 
> 'remotes/dgilbert-gitlab/tags/pull-migration-20210726a' into staging 
> (2021-07-27 10:55:50 +0100)
>
> are available in the Git repository at:
>
>   git://repo.or.cz/qemu/armbru.git tags/pull-misc-2021-07-27
>
> for you to fetch changes up to 3e61a13af3d3a1942a1ec2f6dfd7b407a43e4273:
>
>   vl: Don't continue after -smp help. (2021-07-27 16:52:37 +0200)
>
> 
> Miscellaneous patches for 2021-07-27
>
> 
> Markus Armbruster (1):
>   vl: Don't continue after -smp help.
>
>  softmmu/vl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



[PATCH v13] qapi: introduce 'query-x86-cpuid' QMP command.

2021-07-28 Thread Valeriy Vdovin
Introduce a new QMP command, 'query-x86-cpuid'. This command can be used to
get the virtualized CPU model info generated by QEMU during VM initialization
in the form of a CPUID representation.

Diving into more detail about virtual CPU generation: QEMU first parses the
'-cpu' command line option. From there it takes the name of the model as the
basis for the feature set of the new virtual CPU. After that it uses the
trailing '-cpu' options, which state whether additional CPU features should
be present on the virtual CPU or excluded from it (tokens '+'/'-' or
'=on'/'=off').
After that QEMU checks whether the host's CPU can actually support the derived
feature set and applies host limitations to it.
After this initialization procedure, the virtual CPU has its model and
vendor names and a working feature set, and is ready for identification
instructions such as CPUID.

To learn exactly how the virtual CPU is presented to the guest machine via the
CPUID instruction, the new QMP command can be used. By calling the
'query-x86-cpuid' command, one can get a full listing of all CPUID leaves and
subleaves supported by the initialized virtual CPU.

Other than debugging, the command is useful in cases where we would like to
utilize QEMU's virtual CPU initialization routines and feed the retrieved
values into the kernel's CPUID-overriding mechanics for more precise control
over how various processes perceive the underlying hardware, with
container processes as a good example.

The command is specific to x86. It is currently only implemented for the KVM
accelerator.

Output format:
The output is a plain list of leaf/subleaf argument combinations that
return 4 words in the registers EAX, EBX, ECX and EDX.

Use example:
qmp_request: {
  "execute": "query-x86-cpuid"
}

qmp_response: {
  "return": [
{
  "eax": 1073741825,
  "edx": 77,
  "in-eax": 1073741824,
  "ecx": 1447775574,
  "ebx": 1263359563
},
{
  "eax": 16777339,
  "edx": 0,
  "in-eax": 1073741825,
  "ecx": 0,
  "ebx": 0
},
{
  "eax": 13,
  "edx": 1231384169,
  "in-eax": 0,
  "ecx": 1818588270,
  "ebx": 1970169159
},
{
  "eax": 198354,
  "edx": 126614527,
  "in-eax": 1,
  "ecx": 2176328193,
  "ebx": 2048
},

{
  "eax": 12328,
  "edx": 0,
  "in-eax": 2147483656,
  "ecx": 0,
  "ebx": 0
}
  ]
}

Signed-off-by: Valeriy Vdovin 
---
v2: - Removed leaf/subleaf iterators.
- Modified cpu_x86_cpuid to return false in cases when count is
  greater than supported subleaves.
v3: - Fixed structure name coding style.
- Added more comments
- Ensured buildability for non-x86 targets.
v4: - Fixed cpu_x86_cpuid return value logic and handling of 0xA leaf.
- Fixed comments.
- Removed target check in qmp_query_cpu_model_cpuid.
v5: - Added error handling code in qmp_query_cpu_model_cpuid
v6: - Fixed error handling code. Added method to query_error_class
v7: - Changed implementation in favor of cached cpuid_data for
  KVM_SET_CPUID2
v8: - Renamed qmp method to query-kvm-cpuid and some fields in response.
- Modified documentation to qmp method
- Removed helper struct declaration
v9: - Renamed 'in_eax' / 'in_ecx' fields to 'in-eax' / 'in-ecx'
- Pasted more complete response to commit message.
v10:
- Subject changed
- Fixes in commit message
- Small fixes in QMP command docs
v11:
- Added explanation about CONFIG_KVM to the commit message.
v12:
- Changed title from query-kvm-cpuid to query-x86-cpuid
- Removed CONFIG_KVM ifdefs
- Added detailed error messages for some stub/unimplemented cases.
v13:
- Tagged with since 6.2

 qapi/machine-target.json   | 44 
 softmmu/cpus.c |  2 +-
 target/i386/kvm/kvm-stub.c | 10 
 target/i386/kvm/kvm.c  | 51 ++
 tests/qtest/qmp-cmd-test.c |  1 +
 5 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index e7811654b7..599394d067 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -329,3 +329,47 @@
 ##
 { 'command': 'query-cpu-definitions', 'returns': ['CpuDefinitionInfo'],
   'if': 'defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_I386) || 
defined(TARGET_S390X) || defined(TARGET_MIPS)' }
+
+##
+# @CpuidEntry:
+#
+# A single entry of a CPUID response.
+#
+# One entry holds full set of information (leaf) returned to the guest
+# in response to it calling a CPUID instruction with eax, ecx used as
+# the agruments to that instruction. ecx is an optional argument as
+# not all of the leaves support it.
+#
+# @in-eax: CPUID argument in eax
+# @in-ecx: CPUID argument in ecx
+# @eax: CPUID result in eax
+# @ebx: CPUID result in ebx
+# @ecx: CPUID result in ecx
+# @edx: CPUID result in edx
+#
+# Since: 6.2
+##
+{ 'struct': 'CpuidEntry',
+  'data': { 'in-eax' : 'uint32',
+'*in-ecx' : 'uint32',
+'eax' : 'uin

Re: [PATCH v5 02/10] ACPI ERST: specification for ERST support

2021-07-28 Thread Eric DeVolder




On 7/27/21 6:45 AM, Igor Mammedov wrote:

On Mon, 26 Jul 2021 14:52:15 -0500
Eric DeVolder  wrote:


On 7/26/21 5:06 AM, Igor Mammedov wrote:

On Wed, 21 Jul 2021 10:42:33 -0500
Eric DeVolder  wrote:
   

On 7/19/21 10:02 AM, Igor Mammedov wrote:

On Wed, 30 Jun 2021 19:26:39 +
Eric DeVolder  wrote:
  

Oops, at the end of the 4th paragraph, I meant to state that "Linux does not support 
the NVRAM mode."
rather than "non-NVRAM mode", which contradicts everything I stated prior.
Eric.

From: Eric DeVolder 
Sent: Wednesday, June 30, 2021 2:07 PM
To: qemu-devel@nongnu.org 
Cc: m...@redhat.com ; imamm...@redhat.com ; marcel.apfelb...@gmail.com 
; pbonz...@redhat.com ; r...@twiddle.net ; 
ehabk...@redhat.com ; Konrad Wilk ; Boris Ostrovsky 

Subject: [PATCH v5 02/10] ACPI ERST: specification for ERST support

Information on the implementation of the ACPI ERST support.

Signed-off-by: Eric DeVolder 
---
docs/specs/acpi_erst.txt | 152 
+++
1 file changed, 152 insertions(+)
create mode 100644 docs/specs/acpi_erst.txt

diff --git a/docs/specs/acpi_erst.txt b/docs/specs/acpi_erst.txt
new file mode 100644
index 000..79f8eb9
--- /dev/null
+++ b/docs/specs/acpi_erst.txt
@@ -0,0 +1,152 @@
+ACPI ERST DEVICE
+
+
+The ACPI ERST device is utilized to support the ACPI Error Record
+Serialization Table, ERST, functionality. The functionality is
+designed for storing error records in persistent storage for
+future reference/debugging.
+
+The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
+(APEI)", and specifically subsection "Error Serialization", outlines
+a method for storing error records into persistent storage.
+
+The format of error records is described in the UEFI specification[2],
+in Appendix N "Common Platform Error Record".
+
+While the ACPI specification allows for an NVRAM "mode" (see
+GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
+directly exposed for direct access by the OS/guest, this implements
+the non-NVRAM "mode". This non-NVRAM "mode" is what is implemented
+by most BIOS (since flash memory requires programming operations
+in order to update its contents). Furthermore, as of the time of this
+writing, Linux does not support the non-NVRAM "mode".


shouldn't it be s/non-NVRAM/NVRAM/ ?


Yes, it has been corrected.
  
  

+
+
+Background/Motivation
+-
+Linux uses the persistent storage filesystem, pstore, to record
+information (eg. dmesg tail) upon panics and shutdowns.  Pstore is
+independent of, and runs before, kdump.  In certain scenarios (ie.
+hosts/guests with root filesystems on NFS/iSCSI where networking
+software and/or hardware fails), pstore may contain the only
+information available for post-mortem debugging.


well,
it's not the only way, one can use existing pvpanic device to notify
mgmt layer about crash and mgmt layer can take appropriate measures
to for post-mortem debugging, including dumping guest state,
which is superior to anything pstore can offer as VM is still exists
and mgmt layer can inspect VMs crashed state directly or dump
necessary parts of it.

So ERST shouldn't be portrayed as the only way here but rather
as limited alternative to pvpanic in regards to post-mortem debugging
(it's the only way only on bare-metal).

It would be better to describe here other use-cases you've mentioned
in earlier reviews, that justify adding alternative to pvpanic.


I'm not sure how I would change this. I do say "may contain", which means it
is not the only way. Pvpanic is a way to notify the mgmt layer/host, but
this is a method solely within the guest. Each serves a different purpose;
each plugs a different hole.
  


I'd suggest edit  "pstore may contain the only information" as "pstore may contain 
information"
   

Done


As noted in a separate message, my company has intentions of storing other
data in ERST beyond panics.

perhaps add your use cases here as well.
   
   

+Two common storage backends for the pstore filesystem are ACPI ERST
+and UEFI. Most BIOS implement ACPI ERST.  UEFI is not utilized in
+all guests. With QEMU supporting ACPI ERST, it becomes a viable
+pstore storage backend for virtual machines (as it is now for
+bare metal machines).
+
  

+Enabling support for ACPI ERST facilitates a consistent method to
+capture kernel panic information in a wide range of guests: from
+resource-constrained microvms to very large guests, and in
+particular, in direct-boot environments (which would lack UEFI
+run-time services).

this hunk probably not necessary
  

+
+Note that Microsoft Windows also utilizes the ACPI ERST for certain
+crash information, if available.

a pointer to a relevant source would be helpful here.


I've included the reference, here for your benefit.
Windows Hardware Error Architecture, specifically the Persistence Mechanism
https://docs.microsoft.com/en-us/windows-hardware/drivers/w

Re: [PATCH v5 06/10] ACPI ERST: build the ACPI ERST table

2021-07-28 Thread Eric DeVolder




On 7/27/21 7:01 AM, Igor Mammedov wrote:

On Mon, 26 Jul 2021 15:02:55 -0500
Eric DeVolder  wrote:

[...]

+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_END_OPERATION:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER  , 0, 32,
+s->bar0 + ERST_CSR_VALUE , 0, MASK32);
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_EXECUTE_OPERATION:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_VALUE , ERST_EXECUTE_OPERATION_MAGIC, 
MASK8);
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_READ_REGISTER_VALUE , 0, 32,
+s->bar0 + ERST_CSR_VALUE, 0x01, MASK8);
+break;
+case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_READ_REGISTER   , 0, 32,
+s->bar0 + ERST_CSR_VALUE, 0, MASK8);
+break;
+case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_READ_REGISTER   , 0, 64,
+s->bar0 + ERST_CSR_VALUE, 0, MASK64);
+break;
+case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER  , 0, 64,
+s->bar0 + ERST_CSR_VALUE , 0, MASK64);
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_GET_RECORD_COUNT:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_READ_REGISTER   , 0, 32,
+s->bar0 + ERST_CSR_VALUE, 0, MASK32);
+break;
+case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_RESERVED:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VALUE, 0, 32,
+s->bar0 + ERST_CSR_ACTION, action, MASK8);
+break;
+case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
+build_serialization_instruction_entry(table_data, action,
+ACPI_ERST_INST_WRITE_REGISTER_VA

Re: [PATCH v5 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU

2021-07-28 Thread Eric DeVolder




On 7/27/21 7:55 AM, Igor Mammedov wrote:

PS:
If I haven't said it already, use checkpatch script before posting patches.



I do run checkpatch. On occasion I allow a warning about a line too long. And
there is the MAINTAINERS message due to the new files. Is there something else
that I'm missing?



Re: [PULL 0/1] Libslirp update

2021-07-28 Thread Peter Maydell
On Wed, 28 Jul 2021 at 11:51,  wrote:
>
> From: Marc-André Lureau 
>
> The following changes since commit a2376507f615495b1d16685449ce0ea78c2caf9d:
>
>   Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' 
> into staging (2021-07-24 11:04:57 +0100)
>
> are available in the Git repository at:
>
>   g...@github.com:elmarco/qemu.git tags/libslirp-pull-request
>
> for you to fetch changes up to 565301284e83fd5f12024a4e7959895380491668:
>
>   Update libslirp to v4.6.1 (2021-07-28 13:11:11 +0400)
>
> 
> Update libslirp
>
> Hi,
>
> Let's update libslirp to 4.6.1, with its various fixes.
>

Fails to build, OSX, when linking the qemu-system-* executables:

Undefined symbols for architecture x86_64:
  "_res_9_getservers", referenced from:
  _get_dns_addr_libresolv in libslirp.a(slirp_src_slirp.c.o)
  "_res_9_nclose", referenced from:
  _get_dns_addr_libresolv in libslirp.a(slirp_src_slirp.c.o)
  "_res_9_ninit", referenced from:
  _get_dns_addr_libresolv in libslirp.a(slirp_src_slirp.c.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

-- PMM



[Bug 1908626] Re: Atomic test-and-set instruction does not work on qemu-user

2021-07-28 Thread Thomas Huth
This is an automated cleanup. This bug report has been moved to QEMU's
new bug tracker on gitlab.com and thus gets marked as 'expired' now.
Please continue with the discussion here:

 https://gitlab.com/qemu-project/qemu/-/issues/509


** Changed in: qemu
   Status: New => Expired

** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #509
   https://gitlab.com/qemu-project/qemu/-/issues/509

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1908626

Title:
  Atomic test-and-set instruction does not work on qemu-user

Status in QEMU:
  Expired

Bug description:
  I try to compile and run PostgreSQL/Greenplum database inside docker 
container/qemu-aarch64-static:
  ```
   host: CentOS7 x86_64
   container: centos:centos7.9.2009 --platform linux/arm64/v8
   qemu-user-static: https://github.com/multiarch/qemu-user-static/releases/
  ```

  However, GP/PG's spinlock always gets stuck and reports PANIC errors. It
  seems something is wrong with its spinlock.
  ```
  https://github.com/greenplum-db/gpdb/blob/master/src/include/storage/s_lock.h
  
https://github.com/greenplum-db/gpdb/blob/master/src/backend/storage/lmgr/s_lock.c
  ```

  So I extracted its spinlock implementation into one test C source file (see
  the attached file), and reproduced the issue:

  ```
  $ gcc spinlock_qemu.c
  $ ./a.out 
  C -- slock inited, lock value is: 0
  parent 139642, child 139645
  P -- slock lock before, lock value is: 0
  P -- slock locked, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock lock before, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock locked, lock value is: 1
  C -- slock unlock after, lock value is: 0
  C -- slock lock before, lock value is: 1
  P -- slock locked, lock value is: 1
  P -- slock unlock after, lock value is: 0
  P -- slock lock before, lock value is: 1
  C -- slock locked, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock locked, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock locked, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  P -- slock lock before, lock value is: 1
  spin timeout, lock value is 1 (pid 139642)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139642)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139642)
  ...
  ...
  ...
  ```

  NOTE: this code always works on a PHYSICAL ARM64 server.
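
  A hypothetical reconstruction of what such a test might look like (the
  real attachment is not reproduced here; TAS/S_UNLOCK are modelled on the
  GCC __sync builtins that PostgreSQL's s_lock.h uses on ARM, everything
  else is made up for illustration):
  ```
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/mman.h>
  #include <sys/types.h>
  #include <sys/wait.h>

  typedef unsigned char slock_t;

  #define TAS(lock)      __sync_lock_test_and_set((lock), 1)
  #define S_UNLOCK(lock) __sync_lock_release(lock)

  int main(void)
  {
      /* the lock lives in shared memory so parent and child contend on it */
      slock_t *lock = mmap(NULL, sizeof(*lock), PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      pid_t pid;
      int i;

      *lock = 0;
      printf("C -- slock inited, lock value is: %d\n", *lock);

      pid = fork();
      for (i = 0; i < 8; i++) {
          long spins = 0;

          printf("%c -- slock lock before, lock value is: %d\n",
                 pid ? 'P' : 'C', *lock);
          while (TAS(lock)) {
              if (++spins > 100000000L) {
                  printf("spin timeout, lock value is %d (pid %d)\n",
                         *lock, (int)getpid());
                  spins = 0;
              }
          }
          printf("%c -- slock locked, lock value is: %d\n",
                 pid ? 'P' : 'C', *lock);
          usleep(1000);
          S_UNLOCK(lock);
          printf("%c -- slock unlock after, lock value is: %d\n",
                 pid ? 'P' : 'C', *lock);
      }
      if (pid) {
          wait(NULL);
      }
      return 0;
  }
  ```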

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1908626/+subscriptions




Re: [PULL 0/1] Libslirp update

2021-07-28 Thread Marc-André Lureau
Hi

On Wed, Jul 28, 2021 at 7:23 PM Peter Maydell 
wrote:

> On Wed, 28 Jul 2021 at 11:51,  wrote:
> >
> > From: Marc-André Lureau 
> >
> > The following changes since commit
> a2376507f615495b1d16685449ce0ea78c2caf9d:
> >
> >   Merge remote-tracking branch
> 'remotes/bonzini-gitlab/tags/for-upstream' into staging (2021-07-24
> 11:04:57 +0100)
> >
> > are available in the Git repository at:
> >
> >   g...@github.com:elmarco/qemu.git tags/libslirp-pull-request
> >
> > for you to fetch changes up to 565301284e83fd5f12024a4e7959895380491668:
> >
> >   Update libslirp to v4.6.1 (2021-07-28 13:11:11 +0400)
> >
> > 
> > Update libslirp
> >
> > Hi,
> >
> > Let's update libslirp to 4.6.1, with its various fixes.
> >
>
> Fails to build, OSX, when linking the qemu-system-* executables:
>
> Undefined symbols for architecture x86_64:
>   "_res_9_getservers", referenced from:
>   _get_dns_addr_libresolv in libslirp.a(slirp_src_slirp.c.o)
>   "_res_9_nclose", referenced from:
>   _get_dns_addr_libresolv in libslirp.a(slirp_src_slirp.c.o)
>   "_res_9_ninit", referenced from:
>   _get_dns_addr_libresolv in libslirp.a(slirp_src_slirp.c.o)
> ld: symbol(s) not found for architecture x86_64
> clang: error: linker command failed with exit code 1 (use -v to see
> invocation)
>
> -- PMM
>
>
Argh... that's because we don't use it properly as a submodule, and libslirp
must now link with 'resolv' on macOS.

I wish my previous pull request with the submodule change had received
more help or attention, as I either couldn't reproduce the failure (nor
could CI) or it was just some one-time warnings due to the transition...



-- 
Marc-André Lureau


Re: [PATCH v4 00/33] Qemu SGX virtualization

2021-07-28 Thread Paolo Bonzini

On 19/07/21 13:21, Yang Zhong wrote:

Sean Christopherson has left Intel and I am now responsible for the QEMU SGX
upstream work. His @intel.com address will be bouncing; his new email
(sea...@google.com) is also in the CC list.

This series is the QEMU SGX virtualization implementation rebased on the
latest QEMU release. NUMA support for SGX will be sent in another patchset
once this basic SGX patchset is merged.

You can find Qemu repo here:

 https://github.com/intel/qemu-sgx.git upstream

If you want to try SGX, you can directly install a Linux release (at least
5.13.0-rc1+) since KVM SGX has been merged into the mainline kernel.

 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

To keep things simple, install Linux with SGX support on both the host and
the guest. For reference, the command below can be used to boot an SGX
guest:

 #qemu-system-x86_64 \
 .. \
 -cpu host,+sgx-provisionkey \
 -object memory-backend-epc,id=mem1,size=64M,prealloc=on \
 -object memory-backend-epc,id=mem2,size=28M \
 -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2

Overview


Intel Software Guard eXtensions (SGX) is a set of instructions and mechanisms
for memory accesses that provide secure access for sensitive applications and
data. SGX allows an application to use its particular address space as an
*enclave*, which is a protected area that provides confidentiality and
integrity even in the presence of privileged malware. Accesses to the enclave
memory area from any software not resident in the enclave are prevented,
including those from privileged software.

SGX virtualization
==================

KVM SGX creates one new misc device, sgx_vepc, and QEMU opens the
'/dev/sgx_vepc' device node to mmap() host EPC memory into the guest. QEMU
also adds an 'sgx-epc' device to expose EPC sections to the guest through
CPUID and the ACPI table.  QEMU SGX also supports multiple virtual EPC
sections for the guest; we just put them together physically contiguous for
the sake of simplicity. The kernel SGX NUMA support has been merged into the
Linux tip tree, and we will support this function in the next phase.

Although the current host SGX subsystem cannot support the SGX2 feature, the
KVM/QEMU implementation still exposes this feature to the guest. Guest SGX2
support doesn't interact with the host kernel SGX driver, so the SGX guest
can use those new instructions normally.

For detailed information on SGX virtualization, please refer to the
docs/intel-sgx.txt document (patch 33).

Changelog:
=

(Changelog here is for global changes, please see each patch's changelog for 
changes
made to specific patch.)

v3-->v4:
- Rebased the sgx patches into latest Qemu release.
- Moved sgx compound property setter/getter from MachineState to 
X86MachineState(Paolo).
- Re-defined struct SgxEPC, removed 'id' property and added struct 
SgxEPCList for
  sgx-epc.0.{memdev}(Paolo).
- Removed g_malloc0(), and changed the 'SGXEPCState *sgx_epc' to 
'SGXEPCState sgx_epc'
  in struct PCMachineState(Paolo).
- Changed the SGX compound property cmdline from sgx-epc.{memdev}.0 to
  sgx-epc.0.{memdev}(Paolo).
- Removed the signature from the 'git format-patch' command(Jarkko).

v2-->v3:
- Rebased the sgx patches into latest Qemu release.
- Implemented the compound property for SGX, ref patch5, the command from 
'-sgx-epc'
  to '-M'(Paolo).
- Moved the sgx common code from sgx-epc.c to sgx.c. The sgx-epc.c is
  only responsible for virtual epc device.
- Removed the previous patch13(linux-headers: Add placeholder for 
KVM_CAP_SGX_ATTRIBUTE)
  because ehabk...@redhat.com updated Linux headers to 5.13-rc4 with commit 
278f064e452.
- Updated the patch1 because ram_flags were changed by David Hildenbra.
- Added patch 24, which avoids the reset operation caused by BIOS reset.
- Added patch 25, which makes the prealloc property consistent with the Qemu
  cmdline during VM reset.

v1-->v2:
- Rebased the sgx patches into latest Qemu release.
- Unified the "share" and "protected" arguments with ram_flags in the
  memory_region_init_ram_from_fd()(Paolo).
- Added the new MemoryBackendEpcProperties and related documents(Eric 
Blake).
- Changed the KVM_CAP_SGX_ATTRIBUTE from 195 to 196(Kai).
- Changed the version and some grammar issues(Eric Blake).


Looks good, I will queue it for 6.2.

Thanks for your patience with the compound machine properties support.

Paolo


Sean Christopherson (21):
   memory: Add RAM_PROTECTED flag to skip IOMMU mappings
   hostmem: Add hostmem-epc as a backend for SGX EPC
   i386: Add 'sgx-epc' device to expose EPC sections to guest
   vl: Add sgx compound properties to expose SGX EPC sections to guest
   i386: Add primary SGX CPUID and MSR defines
   i386: Add SGX CPUID leaf FEAT_SGX_12_0_EAX
   i386: Add SGX CPUID leaf FEAT_SGX_12_0_EBX
   i386: Add SGX CPUID leaf FEAT_SGX_12_1_EA

Re: [PATCH 6/7] virtio-gpu: Initialize Venus

2021-07-28 Thread Paolo Bonzini

On 27/07/21 19:05, Antonio Caggiano wrote:

diff --git a/meson.build b/meson.build
index f2e148eaf9..31b65050b7 100644
--- a/meson.build
+++ b/meson.build
@@ -483,6 +483,7 @@ if not get_option('virglrenderer').auto() or have_system
   method: 'pkg-config',
   required: get_option('virglrenderer'),
   kwargs: static_kwargs)
+  add_project_arguments('-DVIRGL_RENDERER_UNSTABLE_APIS', language : 'c')


Can you instead use declare_dependency to make the option part of the 
virgl dependency?


Thanks,

Paolo




Re: [PATCH RFC server 04/11] vfio-user: find and init PCI device

2021-07-28 Thread Jag Raman


> On Jul 26, 2021, at 11:05 AM, John Levon  wrote:
> 
> On Mon, Jul 19, 2021 at 04:00:06PM -0400, Jagannathan Raman wrote:
> 
>> +vfu_pci_set_id(o->vfu_ctx,
>> +   pci_get_word(o->pci_dev->config + PCI_VENDOR_ID),
>> +   pci_get_word(o->pci_dev->config + PCI_DEVICE_ID),
>> +   pci_get_word(o->pci_dev->config + 
>> PCI_SUBSYSTEM_VENDOR_ID),
>> +   pci_get_word(o->pci_dev->config + PCI_SUBSYSTEM_ID));
> 
> Since you handle all config space accesses yourselves, is there even any need
> for this?

I think that makes sense. Since the QEMU server handles all the config space
accesses, it’s not necessary to register the device’s vendor/device ID with the 
library.

Thank you!
--
Jag

> 
> regards
> john



[PATCH] gitlab-ci.d/custom-runners: Improve rules for the staging branch

2021-07-28 Thread Thomas Huth
If maintainers are currently pushing to a branch called "staging"
in their repository, they are ending up with some stuck jobs - unless
they have a s390x CI runner machine available. That's ugly, we should
make sure that the related jobs are really only started if such a
runner is available. So let's only run these jobs if it's the
"staging" branch of the main repository of the QEMU project (where
we can be sure that the s390x runner is available), or if the user
explicitly set a S390X_RUNNER_AVAILABLE variable in their CI configs
to declare that they have such a runner available, too.

Fixes: 4799c21023 ("Jobs based on custom runners: add job definitions ...")
Signed-off-by: Thomas Huth 
---
 .gitlab-ci.d/custom-runners.yml | 40 +++--
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
index 061d3cdfed..564b94565d 100644
--- a/.gitlab-ci.d/custom-runners.yml
+++ b/.gitlab-ci.d/custom-runners.yml
@@ -24,7 +24,8 @@ ubuntu-18.04-s390x-all-linux-static:
  - ubuntu_18.04
  - s390x
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  # --disable-libssh is needed because of 
https://bugs.launchpad.net/qemu/+bug/1838763
  # --disable-glusterfs is needed because there's no static version of those 
libs in distro supplied packages
@@ -43,7 +44,8 @@ ubuntu-18.04-s390x-all:
  - ubuntu_18.04
  - s390x
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  - mkdir build
  - cd build
@@ -59,7 +61,8 @@ ubuntu-18.04-s390x-alldbg:
  - ubuntu_18.04
  - s390x
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  - mkdir build
  - cd build
@@ -76,7 +79,9 @@ ubuntu-18.04-s390x-clang:
  - ubuntu_18.04
  - s390x
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+   when: manual
+ - if: "$S390X_RUNNER_AVAILABLE"
when: manual
  script:
  - mkdir build
@@ -93,7 +98,8 @@ ubuntu-18.04-s390x-tci:
  - ubuntu_18.04
  - s390x
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  - mkdir build
  - cd build
@@ -108,7 +114,9 @@ ubuntu-18.04-s390x-notcg:
  - ubuntu_18.04
  - s390x
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+   when: manual
+ - if: "$S390X_RUNNER_AVAILABLE"
when: manual
  script:
  - mkdir build
@@ -128,7 +136,8 @@ ubuntu-20.04-aarch64-all-linux-static:
  - ubuntu_20.04
  - aarch64
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  # --disable-libssh is needed because of 
https://bugs.launchpad.net/qemu/+bug/1838763
  # --disable-glusterfs is needed because there's no static version of those 
libs in distro supplied packages
@@ -147,7 +156,8 @@ ubuntu-20.04-aarch64-all:
  - ubuntu_20.04
  - aarch64
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  - mkdir build
  - cd build
@@ -163,7 +173,8 @@ ubuntu-20.04-aarch64-alldbg:
  - ubuntu_20.04
  - aarch64
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  - mkdir build
  - cd build
@@ -180,7 +191,9 @@ ubuntu-20.04-aarch64-clang:
  - ubuntu_20.04
  - aarch64
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+   when: manual
+ - if: "$S390X_RUNNER_AVAILABLE"
when: manual
  script:
  - mkdir build
@@ -197,7 +210,8 @@ ubuntu-20.04-aarch64-tci:
  - ubuntu_20.04
  - aarch64
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+ - if: "$S390X_RUNNER_AVAILABLE"
  script:
  - mkdir build
  - cd build
@@ -212,7 +226,9 @@ ubuntu-20.04-aarch64-notcg:
  - ubuntu_20.04
  - aarch64
  rules:
- - if: '$CI_COMMIT_BRANCH =~ /^staging/'
+ - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
/^staging/'
+   when: manual
+ - if: "$S390X_RUNNER_AVAILABLE"
when: manual
  script:
  - mkdir build
-- 
2.27.0




Re: [PATCH v2 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager

2021-07-28 Thread David Hildenbrand

On 27.07.21 19:10, Peter Xu wrote:

On Tue, Jul 27, 2021 at 11:25:09AM +0200, David Hildenbrand wrote:

For 2) I see 3 options:

a) Sync everything, fixup the dirty bitmap, never clear the dirty log of
discarded parts. It's fairly simple and straight forward, as I can simply
reuse the existing helper. Something that's discarded will never be dirty,
not even if a misbehaving guest touches memory it shouldn't. [this patch]

b) Sync only populated parts, no need to fixup the dirty bitmap, never clear
the dirty log of discarded parts. It's a bit more complicated but achieves
the same goal as a). [optimization I propose for the future]

c) Sync everything, don't fixup the dirty bitmap, clear the dirty log of
discarded parts initially. There are ways we still might migrate discarded
ranges, for example, if a misbehaving guest touches memory it shouldn't.
[what you propose]

Is my understanding correct? Any reasons why we should chose c) over b) long
term or c) over a) short term?


My major concern is that we could do something during sync() for not a very good
reason by looping over the virtio-mem bitmap for discarded ranges - IIUC it should
be destined to be merely a no-op if the guest is well-behaved, am I right?


I think yes, as long as we have no initially-set bit stuck somewhere.



Meanwhile, I still have no idea how much overhead the "loop" part could bring.
For a large virtio-mem region with frequent plugged/unplugged mem interacted,
it seems possible to take a while to me..  I have no solid idea yet.


Let's do some math. Assume the worst case on a 1TiB device with a 2MiB 
block size: We have 524288 blocks == bits. That's precisely a 64k bitmap 
in virtio-mem. In the worst case, every second bit would be clear 
("discarded"). For each clear bit ("discarded"), we would have to clear 
512 bits (64 bytes) in the dirty bitmap. That's storing 32 MiB.


So scanning 64 KiB, writing 32 MiB. Certainly not perfect, but I am not 
sure if it will really matter doing that once on every bitmap sync. I 
guess the bitmap syncing itself is much more expensive -- and not 
syncing the discarded ranges (b ) above) would make a bigger impact I guess.
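
(For illustration only, not code from the series: roughly what the option a)
fixup amounts to, assuming a replay helper that walks the discarded ranges and
hands each one to a callback; all names below are placeholders.)

    /* Placeholder callback: clear the migration dirty bits that cover one
     * discarded range so it is never considered for sending. */
    static int clear_discarded_range_cb(MemoryRegionSection *section, void *opaque)
    {
        RAMBlock *rb = opaque;
        unsigned long start = section->offset_within_region >> TARGET_PAGE_BITS;
        unsigned long npages = int128_get64(section->size) >> TARGET_PAGE_BITS;

        bitmap_clear(rb->bmap, start, npages);  /* one 2MiB block -> 512 bits */
        return 0;
    }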




The thing is I still think this extra operation during sync() can be ignored by
simply clear dirty log during bitmap init, then.. why not? :)


I guess clearing the dirty log (especially in KVM) might be more 
expensive. But, anyhow, we actually want b) long-term :)




Clear dirty bitmap is as simple as "reprotect the pages" functional-wise - if
they are unplugged memory ranges, and they shouldn't be written by the guest
(we still allow reads even for virtio-mem compatibility), then I don't see it
an issue to wr-protect it using clear dirty log when bitmap init.

It still makes sense to me to keep the dirty/clear bitmap in-sync, at least
before your plan b proposal; leaving the dirty bits set forever on unplugged
memory is okay but just sounds a bit weird.

Though my concern is only valid when virtio-mem is used, so I don't have a
strong opinion on it as you maintains virtio-mem. I believe you will always
have a better judgement than me on that. Especially, when/if Dave & Juan have
no problem on that. :)


I'd certainly sleep better at night if I can be 100% sure that a page 
not to be migrated will not get migrated. :)


I'll play with initial clearing and see how much of a difference it 
makes code wise. Thanks a lot for your feedback!


--
Thanks,

David / dhildenb




[Bug 1908626] Re: Atomic test-and-set instruction does not work on qemu-user

2021-07-28 Thread Thomas Huth
Hi taos! Could you please check whether this has been fixed already in
QEMU v6.1.0-rc1?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1908626

Title:
  Atomic test-and-set instruction does not work on qemu-user

Status in QEMU:
  Expired

Bug description:
  I try to compile and run PostgreSQL/Greenplum database inside docker 
container/qemu-aarch64-static:
  ```
   host: CentOS7 x86_64
   container: centos:centos7.9.2009 --platform linux/arm64/v8
   qemu-user-static: https://github.com/multiarch/qemu-user-static/releases/
  ```

  However, GP/PG's spinlock always gets stuck and reports PANIC errors. It
  seems something is wrong with its spinlock.
  ```
  https://github.com/greenplum-db/gpdb/blob/master/src/include/storage/s_lock.h
  
https://github.com/greenplum-db/gpdb/blob/master/src/backend/storage/lmgr/s_lock.c
  ```

  So I extracted its spinlock implementation into one test C source file (see
  the attached file), and reproduced the issue:

  ```
  $ gcc spinlock_qemu.c
  $ ./a.out 
  C -- slock inited, lock value is: 0
  parent 139642, child 139645
  P -- slock lock before, lock value is: 0
  P -- slock locked, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock lock before, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock locked, lock value is: 1
  C -- slock unlock after, lock value is: 0
  C -- slock lock before, lock value is: 1
  P -- slock locked, lock value is: 1
  P -- slock unlock after, lock value is: 0
  P -- slock lock before, lock value is: 1
  C -- slock locked, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock locked, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  C -- slock locked, lock value is: 1
  P -- slock lock before, lock value is: 1
  C -- slock unlock after, lock value is: 0
  P -- slock locked, lock value is: 1
  C -- slock lock before, lock value is: 1
  P -- slock unlock after, lock value is: 0
  P -- slock lock before, lock value is: 1
  spin timeout, lock value is 1 (pid 139642)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139642)
  spin timeout, lock value is 1 (pid 139645)
  spin timeout, lock value is 1 (pid 139642)
  ...
  ...
  ...
  ```

  NOTE: this code always works on PHYSICAL ARM64 server.
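
  (For illustration only; the attached reproducer itself is not shown here. A
  minimal sketch of the test-and-set spinlock pattern it is assumed to exercise,
  using the GCC __sync builtins as PostgreSQL's generic s_lock fallback does:)

  ```
  #include <stdio.h>
  #include <sys/mman.h>
  #include <sys/wait.h>
  #include <unistd.h>

  typedef unsigned char slock_t;

  static int tas(volatile slock_t *lock)
  {
      return __sync_lock_test_and_set(lock, 1);   /* returns previous value */
  }

  static void s_unlock(volatile slock_t *lock)
  {
      __sync_lock_release(lock);                  /* release-store of 0 */
  }

  static void loop(volatile slock_t *lock, const char *who)
  {
      for (int i = 0; i < 5; i++) {
          long spins = 0;
          while (tas(lock)) {
              if (++spins > 100000000L) {
                  printf("spin timeout, lock value is %d (pid %d)\n",
                         *lock, getpid());
                  spins = 0;
              }
          }
          printf("%s -- slock locked, lock value is: %d\n", who, *lock);
          s_unlock(lock);
          printf("%s -- slock unlock after, lock value is: %d\n", who, *lock);
      }
  }

  int main(void)
  {
      volatile slock_t *lock = mmap(NULL, sizeof(*lock),
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
      *lock = 0;
      printf("C -- slock inited, lock value is: %d\n", *lock);
      if (fork() == 0) {
          loop(lock, "C");      /* child */
          _exit(0);
      }
      loop(lock, "P");          /* parent */
      wait(NULL);
      return 0;
  }
  ```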

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1908626/+subscriptions




Re: [PATCH RFC 03/19] vfio-user: define VFIO Proxy and communication functions

2021-07-28 Thread John Johnson


> On Jul 27, 2021, at 9:34 AM, Stefan Hajnoczi  wrote:
> 
> On Sun, Jul 18, 2021 at 11:27:42PM -0700, Elena Ufimtseva wrote:
>> From: John G Johnson 
>> 
>> Add user.c and user.h files for vfio-user with the basic
>> send and receive functions.
>> 
>> Signed-off-by: John G Johnson 
>> Signed-off-by: Elena Ufimtseva 
>> Signed-off-by: Jagannathan Raman 
>> ---
>> hw/vfio/user.h| 120 ++
>> include/hw/vfio/vfio-common.h |   2 +
>> hw/vfio/user.c| 286 ++
>> MAINTAINERS   |   4 +
>> hw/vfio/meson.build   |   1 +
>> 5 files changed, 413 insertions(+)
>> create mode 100644 hw/vfio/user.h
>> create mode 100644 hw/vfio/user.c
> 
> The multi-threading, coroutine, and blocking I/O requirements of
> vfio_user_recv() and vfio_user_send_reply() are unclear to me. Please
> document them so it's clear what environment they can be called from. I
> guess they are not called from coroutines and proxy->ioc is a blocking
> IOChannel?
> 

	Yes to both; moreover, a block comment above vfio_user_recv() would
be useful.  The call to set up vfio_user_recv() as the socket handler isn’t
in this patch; do you want the series re-org’d?



>> 
>> diff --git a/hw/vfio/user.h b/hw/vfio/user.h
>> new file mode 100644
>> index 00..cdbc074579
>> --- /dev/null
>> +++ b/hw/vfio/user.h
>> @@ -0,0 +1,120 @@
>> +#ifndef VFIO_USER_H
>> +#define VFIO_USER_H
>> +
>> +/*
>> + * vfio protocol over a UNIX socket.
>> + *
>> + * Copyright © 2018, 2021 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2.  See
>> + * the COPYING file in the top-level directory.
>> + *
>> + * Each message has a standard header that describes the command
>> + * being sent, which is almost always a VFIO ioctl().
>> + *
>> + * The header may be followed by command-specfic data, such as the
>> + * region and offset info for read and write commands.
>> + */
>> +
>> +/* commands */
>> +enum vfio_user_command {
>> +VFIO_USER_VERSION   = 1,
>> +VFIO_USER_DMA_MAP   = 2,
>> +VFIO_USER_DMA_UNMAP = 3,
>> +VFIO_USER_DEVICE_GET_INFO   = 4,
>> +VFIO_USER_DEVICE_GET_REGION_INFO= 5,
>> +VFIO_USER_DEVICE_GET_REGION_IO_FDS  = 6,
>> +VFIO_USER_DEVICE_GET_IRQ_INFO   = 7,
>> +VFIO_USER_DEVICE_SET_IRQS   = 8,
>> +VFIO_USER_REGION_READ   = 9,
>> +VFIO_USER_REGION_WRITE  = 10,
>> +VFIO_USER_DMA_READ  = 11,
>> +VFIO_USER_DMA_WRITE = 12,
>> +VFIO_USER_DEVICE_RESET  = 13,
>> +VFIO_USER_DIRTY_PAGES   = 14,
>> +VFIO_USER_MAX,
>> +};
>> +
>> +/* flags */
>> +#define VFIO_USER_REQUEST   0x0
>> +#define VFIO_USER_REPLY 0x1
>> +#define VFIO_USER_TYPE  0xF
>> +
>> +#define VFIO_USER_NO_REPLY  0x10
>> +#define VFIO_USER_ERROR 0x20
>> +
>> +typedef struct vfio_user_hdr {
>> +uint16_t id;
>> +uint16_t command;
>> +uint32_t size;
>> +uint32_t flags;
>> +uint32_t error_reply;
>> +} vfio_user_hdr_t;
> 
> Please use QEMU coding style in QEMU code (i.e. not shared with Linux or
> external libraries):
> 
>  typedef struct {
>  ...
>  } VfioUserHdr;
> 
> You can also specify the struct VfioUserHdr tag if you want but it's
> only necessary to reference the struct before the end of the typedef
> definition.
> 
> https://qemu-project.gitlab.io/qemu/devel/style.html
> 

OK

>> +
>> +/*
>> + * VFIO_USER_VERSION
>> + */
>> +#define VFIO_USER_MAJOR_VER 0
>> +#define VFIO_USER_MINOR_VER 0
>> +
>> +struct vfio_user_version {
>> +vfio_user_hdr_t hdr;
>> +uint16_t major;
>> +uint16_t minor;
>> +char capabilities[];
>> +};
>> +
>> +#define VFIO_USER_DEF_MAX_FDS   8
>> +#define VFIO_USER_MAX_MAX_FDS   16
>> +
>> +#define VFIO_USER_DEF_MAX_XFER  (1024 * 1024)
>> +#define VFIO_USER_MAX_MAX_XFER  (64 * 1024 * 1024)
>> +
>> +typedef struct VFIOUserFDs {
>> +int send_fds;
>> +int recv_fds;
>> +int *fds;
>> +} VFIOUserFDs;
> 
> I think around here we switch from vfio-user spec definitions to QEMU
> implementation details. It might be nice to keep the vfio-user spec
> definitions in a separate header file so the boundary is clear.
> 

OK


>> +
>> +typedef struct VFIOUserReply {
>> +QTAILQ_ENTRY(VFIOUserReply) next;
>> +vfio_user_hdr_t *msg;
>> +VFIOUserFDs *fds;
>> +int rsize;
>> +uint32_t id;
>> +QemuCond cv;
>> +uint8_t complete;
> 
> Please use bool.
> 

OK

>> +} VFIOUserReply;
>> +
>> +enum proxy_state {
>> +CONNECTED = 1,
>> +RECV_ERROR = 2,
>> +CLOSING = 3,
>> +CLOSED = 4,
>> +};
> 
> These enum values probably need a prefix (VFIO_PROXY_*). Generic short
> names like CONNECTED, CLOSED, etc could lead to namespace collisions.
> Enum constants are in the global namespace.
> 

OK


>> +
>> +

[PATCH-for-6.1 0/3] hw/sd/sdcard: Fix assertion accessing out-of-range addresses with CMD30

2021-07-28 Thread Philippe Mathieu-Daudé
Fix an assertion reported by OSS-Fuzz and add a corresponding qtest.

The change is simple enough for the next rc.

Philippe Mathieu-Daudé (3):
  hw/sd/sdcard: Document out-of-range addresses for SEND_WRITE_PROT
  hw/sd/sdcard: Fix assertion accessing out-of-range addresses with
CMD30
  hw/sd/sdcard: Rename Write Protect Group variables

 hw/sd/sd.c | 37 --
 tests/qtest/fuzz-sdcard-test.c | 36 +
 2 files changed, 58 insertions(+), 15 deletions(-)

-- 
2.31.1




[PATCH-for-6.1 1/3] hw/sd/sdcard: Document out-of-range addresses for SEND_WRITE_PROT

2021-07-28 Thread Philippe Mathieu-Daudé
Per the 'Physical Layer Simplified Specification Version 3.01',
Table 4-22: 'Block Oriented Write Protection Commands'

  SEND_WRITE_PROT (CMD30)

  If the card provides write protection features, this command asks
  the card to send the status of the write protection bits [1].

  [1] 32 write protection bits (representing 32 write protect groups
  starting at the specified address) [...]
  The last (least significant) bit of the protection bits corresponds
  to the first addressed group. If the addresses of the last groups
  are outside the valid range, then the corresponding write protection
  bits shall be set to 0.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 1f964e022b1..707dcc12a14 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -822,7 +822,14 @@ static uint32_t sd_wpbits(SDState *sd, uint64_t addr)
 
 for (i = 0; i < 32; i++, wpnum++, addr += WPGROUP_SIZE) {
 assert(wpnum < sd->wpgrps_size);
-if (addr < sd->size && test_bit(wpnum, sd->wp_groups)) {
+if (addr >= sd->size) {
+/*
+ * If the addresses of the last groups are outside the valid range,
+ * then the corresponding write protection bits shall be set to 0.
+ */
+continue;
+}
+if (test_bit(wpnum, sd->wp_groups)) {
 ret |= (1 << i);
 }
 }
-- 
2.31.1




[PATCH-for-6.1 2/3] hw/sd/sdcard: Fix assertion accessing out-of-range addresses with CMD30

2021-07-28 Thread Philippe Mathieu-Daudé
OSS-Fuzz found sending illegal addresses when querying the write
protection bits triggers the assertion added in commit 84816fb63e5
("hw/sd/sdcard: Assert if accessing an illegal group"):

  qemu-fuzz-i386-target-generic-fuzz-sdhci-v3: ../hw/sd/sd.c:824: uint32_t 
sd_wpbits(SDState *, uint64_t):
  Assertion `wpnum < sd->wpgrps_size' failed.
  #3 0x7f62a8b22c91 in __assert_fail
  #4 0x5569adcec405 in sd_wpbits hw/sd/sd.c:824:9
  #5 0x5569adce5f6d in sd_normal_command hw/sd/sd.c:1389:38
  #6 0x5569adce3870 in sd_do_command hw/sd/sd.c:1737:17
  #7 0x5569adcf1566 in sdbus_do_command hw/sd/core.c:100:16
  #8 0x5569adcfc192 in sdhci_send_command hw/sd/sdhci.c:337:12
  #9 0x5569adcfa3a3 in sdhci_write hw/sd/sdhci.c:1186:9
  #10 0x5569adfb3447 in memory_region_write_accessor softmmu/memory.c:492:5

It is legal for the CMD30 to query for out-of-range addresses.
Such invalid addresses are simply ignored in the response (write
protection bits set to 0).

Note, we have had an off-by-one in the wpgrps_size check since commit
a1bb27b1e98. Since we have a total of 'wpgrps_size' bits, the last
valid group bit is 'wpgrps_size - 1'.

Since we now check the group bit is in range, remove the assertion.

Include the qtest reproducer provided by Alexander Bulekov:

  $ make check-qtest-i386
  ...
  Running test qtest-i386/fuzz-sdcard-test
  qemu-system-i386: ../hw/sd/sd.c:824: sd_wpbits: Assertion `wpnum < 
sd->wpgrps_size' failed.

Cc: qemu-sta...@nongnu.org
Reported-by: OSS-Fuzz (Issue 29225)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/495
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c |  4 ++--
 tests/qtest/fuzz-sdcard-test.c | 36 ++
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 707dcc12a14..273af75c1be 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -820,8 +820,8 @@ static uint32_t sd_wpbits(SDState *sd, uint64_t addr)
 
 wpnum = sd_addr_to_wpnum(addr);
 
-for (i = 0; i < 32; i++, wpnum++, addr += WPGROUP_SIZE) {
-assert(wpnum < sd->wpgrps_size);
+for (i = 0; i < 32 && wpnum < sd->wpgrps_size - 1;
+i++, wpnum++, addr += WPGROUP_SIZE) {
 if (addr >= sd->size) {
 /*
  * If the addresses of the last groups are outside the valid range,
diff --git a/tests/qtest/fuzz-sdcard-test.c b/tests/qtest/fuzz-sdcard-test.c
index 96602eac7e5..ae14305344a 100644
--- a/tests/qtest/fuzz-sdcard-test.c
+++ b/tests/qtest/fuzz-sdcard-test.c
@@ -52,6 +52,41 @@ static void oss_fuzz_29225(void)
 qtest_quit(s);
 }
 
+/*
+ * https://gitlab.com/qemu-project/qemu/-/issues/495
+ * Used to trigger:
+ *  Assertion `wpnum < sd->wpgrps_size' failed.
+ */
+static void oss_fuzz_36217(void)
+{
+QTestState *s;
+
+s = qtest_init(" -display none -m 32 -nodefaults -nographic"
+   " -device sdhci-pci,sd-spec-version=3 "
+   "-device sd-card,drive=d0 "
+   "-drive if=none,index=0,file=null-co://,format=raw,id=d0");
+
+qtest_outl(s, 0xcf8, 0x80001010);
+qtest_outl(s, 0xcfc, 0xe000);
+qtest_outl(s, 0xcf8, 0x80001004);
+qtest_outw(s, 0xcfc, 0x02);
+qtest_bufwrite(s, 0xe02c, "\x05", 0x1);
+qtest_bufwrite(s, 0xe00f, "\x37", 0x1);
+qtest_bufwrite(s, 0xe00a, "\x01", 0x1);
+qtest_bufwrite(s, 0xe00f, "\x29", 0x1);
+qtest_bufwrite(s, 0xe00f, "\x02", 0x1);
+qtest_bufwrite(s, 0xe00f, "\x03", 0x1);
+qtest_bufwrite(s, 0xe005, "\x01", 0x1);
+qtest_bufwrite(s, 0xe00f, "\x06", 0x1);
+qtest_bufwrite(s, 0xe00c, "\x05", 0x1);
+qtest_bufwrite(s, 0xe00e, "\x20", 0x1);
+qtest_bufwrite(s, 0xe00f, "\x08", 0x1);
+qtest_bufwrite(s, 0xe00b, "\x3d", 0x1);
+qtest_bufwrite(s, 0xe00f, "\x1e", 0x1);
+
+qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
 const char *arch = qtest_get_arch();
@@ -60,6 +95,7 @@ int main(int argc, char **argv)
 
if (strcmp(arch, "i386") == 0) {
 qtest_add_func("fuzz/sdcard/oss_fuzz_29225", oss_fuzz_29225);
+qtest_add_func("fuzz/sdcard/oss_fuzz_36217", oss_fuzz_36217);
}
 
return g_test_run();
-- 
2.31.1




[PATCH-for-6.2 3/3] hw/sd/sdcard: Rename Write Protect Group variables

2021-07-28 Thread Philippe Mathieu-Daudé
'wp_groups' holds a bitmap, rename it as 'wp_group_bmap'.
'wpgrps_size' is the bitmap size (in bits), rename it as
'wp_group_bits'.

Patch created mechanically using:

  $ sed -i -e s/wp_groups/wp_group_bmap/ \
   -e s/wpgrps_size/wp_group_bits/ hw/sd/sd.c

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sd.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 273af75c1be..75dcd3f7f65 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -116,8 +116,8 @@ struct SDState {
 int32_t state;/* current card state, one of SDCardStates */
 uint32_t vhs;
 bool wp_switch;
-unsigned long *wp_groups;
-int32_t wpgrps_size;
+unsigned long *wp_group_bmap;
+int32_t wp_group_bits;
 uint64_t size;
 uint32_t blk_len;
 uint32_t multi_blk_cnt;
@@ -567,10 +567,10 @@ static void sd_reset(DeviceState *dev)
 sd_set_cardstatus(sd);
 sd_set_sdstatus(sd);
 
-g_free(sd->wp_groups);
+g_free(sd->wp_group_bmap);
 sd->wp_switch = sd->blk ? !blk_is_writable(sd->blk) : false;
-sd->wpgrps_size = sect;
-sd->wp_groups = bitmap_new(sd->wpgrps_size);
+sd->wp_group_bits = sect;
+sd->wp_group_bmap = bitmap_new(sd->wp_group_bits);
 memset(sd->function_group, 0, sizeof(sd->function_group));
 sd->erase_start = INVALID_ADDRESS;
 sd->erase_end = INVALID_ADDRESS;
@@ -673,7 +673,7 @@ static const VMStateDescription sd_vmstate = {
 VMSTATE_UINT32(card_status, SDState),
 VMSTATE_PARTIAL_BUFFER(sd_status, SDState, 1),
 VMSTATE_UINT32(vhs, SDState),
-VMSTATE_BITMAP(wp_groups, SDState, 0, wpgrps_size),
+VMSTATE_BITMAP(wp_group_bmap, SDState, 0, wp_group_bits),
 VMSTATE_UINT32(blk_len, SDState),
 VMSTATE_UINT32(multi_blk_cnt, SDState),
 VMSTATE_UINT32(erase_start, SDState),
@@ -803,8 +803,8 @@ static void sd_erase(SDState *sd)
 if (sdsc) {
 /* Only SDSC cards support write protect groups */
 wpnum = sd_addr_to_wpnum(erase_addr);
-assert(wpnum < sd->wpgrps_size);
-if (test_bit(wpnum, sd->wp_groups)) {
+assert(wpnum < sd->wp_group_bits);
+if (test_bit(wpnum, sd->wp_group_bmap)) {
 sd->card_status |= WP_ERASE_SKIP;
 continue;
 }
@@ -820,7 +820,7 @@ static uint32_t sd_wpbits(SDState *sd, uint64_t addr)
 
 wpnum = sd_addr_to_wpnum(addr);
 
-for (i = 0; i < 32 && wpnum < sd->wpgrps_size - 1;
+for (i = 0; i < 32 && wpnum < sd->wp_group_bits - 1;
 i++, wpnum++, addr += WPGROUP_SIZE) {
 if (addr >= sd->size) {
 /*
@@ -829,7 +829,7 @@ static uint32_t sd_wpbits(SDState *sd, uint64_t addr)
  */
 continue;
 }
-if (test_bit(wpnum, sd->wp_groups)) {
+if (test_bit(wpnum, sd->wp_group_bmap)) {
 ret |= (1 << i);
 }
 }
@@ -869,7 +869,7 @@ static void sd_function_switch(SDState *sd, uint32_t arg)
 
 static inline bool sd_wp_addr(SDState *sd, uint64_t addr)
 {
-return test_bit(sd_addr_to_wpnum(addr), sd->wp_groups);
+return test_bit(sd_addr_to_wpnum(addr), sd->wp_group_bmap);
 }
 
 static void sd_lock_command(SDState *sd)
@@ -897,7 +897,7 @@ static void sd_lock_command(SDState *sd)
 sd->card_status |= LOCK_UNLOCK_FAILED;
 return;
 }
-bitmap_zero(sd->wp_groups, sd->wpgrps_size);
+bitmap_zero(sd->wp_group_bmap, sd->wp_group_bits);
 sd->csd[14] &= ~0x10;
 sd->card_status &= ~CARD_IS_LOCKED;
 sd->pwd_len = 0;
@@ -1348,7 +1348,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 }
 
 sd->state = sd_programming_state;
-set_bit(sd_addr_to_wpnum(addr), sd->wp_groups);
+set_bit(sd_addr_to_wpnum(addr), sd->wp_group_bmap);
 /* Bzzztt  Operation complete.  */
 sd->state = sd_transfer_state;
 return sd_r1b;
@@ -1370,7 +1370,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd, 
SDRequest req)
 }
 
 sd->state = sd_programming_state;
-clear_bit(sd_addr_to_wpnum(addr), sd->wp_groups);
+clear_bit(sd_addr_to_wpnum(addr), sd->wp_group_bmap);
 /* Bzzztt  Operation complete.  */
 sd->state = sd_transfer_state;
 return sd_r1b;
-- 
2.31.1




Re: [PATCH] gitlab-ci.d/custom-runners: Improve rules for the staging branch

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 7:38 PM, Thomas Huth wrote:
> If maintainers are currently pushing to a branch called "staging"
> in their repository, they are ending up with some stuck jobs - unless
> they have a s390x CI runner machine available. That's ugly, we should
> make sure that the related jobs are really only started if such a
> runner is available. So let's only run these jobs if it's the
> "staging" branch of the main repository of the QEMU project (where
> we can be sure that the s390x runner is available), or if the user
> explicitly set a S390X_RUNNER_AVAILABLE variable in their CI configs
> to declare that they have such a runner available, too.
> 
> Fixes: 4799c21023 ("Jobs based on custom runners: add job definitions ...")
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/custom-runners.yml | 40 +++--
>  1 file changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/.gitlab-ci.d/custom-runners.yml b/.gitlab-ci.d/custom-runners.yml
> index 061d3cdfed..564b94565d 100644
> --- a/.gitlab-ci.d/custom-runners.yml
> +++ b/.gitlab-ci.d/custom-runners.yml
> @@ -24,7 +24,8 @@ ubuntu-18.04-s390x-all-linux-static:
>   - ubuntu_18.04
>   - s390x
>   rules:
> - - if: '$CI_COMMIT_BRANCH =~ /^staging/'
> + - if: '$CI_PROJECT_NAMESPACE == "qemu-project" && $CI_COMMIT_BRANCH =~ 
> /^staging/'
> + - if: "$S390X_RUNNER_AVAILABLE"

If you base this patch on top of "docs: Document GitLab
custom CI/CD variables" that you already queued, you can
directly add a description for S390X_RUNNER_AVAILABLE in
docs/devel/ci.rst, but this can be done later too.

Regardless of whether docs/devel/ci.rst is updated:
Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v3 1/8] cpus: Export queue work related fields to cpu.h

2021-07-28 Thread Peter Xu
This patch has no functional change, but prepares for moving the function
do_run_on_cpu() into softmmu/cpus.c.  It does:

  1. Move qemu_work_item into hw/core/cpu.h.
  2. Export queue_work_on_cpu()/qemu_work_cond.

All of them will be used by softmmu/cpus.c later.

Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 cpus-common.c | 11 ++-
 include/hw/core/cpu.h | 10 +-
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e58d..d814b2439a 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -27,7 +27,7 @@
 static QemuMutex qemu_cpu_list_lock;
 static QemuCond exclusive_cond;
 static QemuCond exclusive_resume;
-static QemuCond qemu_work_cond;
+QemuCond qemu_work_cond;
 
 /* >= 1 if a thread is inside start_exclusive/end_exclusive.  Written
  * under qemu_cpu_list_lock, read with atomic operations.
@@ -114,14 +114,7 @@ CPUState *qemu_get_cpu(int index)
 /* current CPU in the current thread. It is only valid inside cpu_exec() */
 __thread CPUState *current_cpu;
 
-struct qemu_work_item {
-QSIMPLEQ_ENTRY(qemu_work_item) node;
-run_on_cpu_func func;
-run_on_cpu_data data;
-bool free, exclusive, done;
-};
-
-static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 {
 qemu_mutex_lock(&cpu->work_mutex);
 QSIMPLEQ_INSERT_TAIL(&cpu->work_list, wi, node);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index bc864564ce..f62ae88524 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -243,7 +243,15 @@ typedef union {
 
 typedef void (*run_on_cpu_func)(CPUState *cpu, run_on_cpu_data data);
 
-struct qemu_work_item;
+struct qemu_work_item {
+QSIMPLEQ_ENTRY(qemu_work_item) node;
+run_on_cpu_func func;
+run_on_cpu_data data;
+bool free, exclusive, done;
+};
+
+void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi);
+extern QemuCond qemu_work_cond;
 
 #define CPU_UNSET_NUMA_NODE_ID -1
 #define CPU_TRACE_DSTATE_MAX_EVENTS 32
-- 
2.31.1




[PATCH v3 3/8] memory: Introduce memory_region_transaction_depth_{inc|dec}()

2021-07-28 Thread Peter Xu
memory_region_transaction_{begin|commit}() can do too much when finalizing a
memory region.  E.g., we should never attempt to update the address space
topology during the finalize() of a memory region.  Provide plain depth
inc/dec helpers for further use.

Signed-off-by: Peter Xu 
---
 softmmu/memory.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index bfedaf9c4d..725d57ec17 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1079,10 +1079,20 @@ static void address_space_update_topology(AddressSpace 
*as)
 address_space_set_flatview(as);
 }
 
+static void memory_region_transaction_depth_inc(void)
+{
+memory_region_transaction_depth++;
+}
+
+static void memory_region_transaction_depth_dec(void)
+{
+memory_region_transaction_depth--;
+}
+
 void memory_region_transaction_begin(void)
 {
 qemu_flush_coalesced_mmio_buffer();
-++memory_region_transaction_depth;
+memory_region_transaction_depth_inc();
 }
 
 void memory_region_transaction_commit(void)
@@ -1092,7 +1102,7 @@ void memory_region_transaction_commit(void)
 assert(memory_region_transaction_depth);
 assert(qemu_mutex_iothread_locked());
 
---memory_region_transaction_depth;
+memory_region_transaction_depth_dec();
 if (!memory_region_transaction_depth) {
 if (memory_region_update_pending) {
 flatviews_reset();
-- 
2.31.1




[PATCH v3 2/8] cpus: Move do_run_on_cpu into softmmu/cpus.c

2021-07-28 Thread Peter Xu
It's only used by softmmu binaries, not linux-user ones.  Make it static and
drop the declaration in the header too.

While at it, initialize the variable "wi" with fewer lines of code.

Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 cpus-common.c | 25 -
 include/hw/core/cpu.h | 12 
 softmmu/cpus.c| 23 +++
 3 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/cpus-common.c b/cpus-common.c
index d814b2439a..670826363f 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -124,31 +124,6 @@ void queue_work_on_cpu(CPUState *cpu, struct 
qemu_work_item *wi)
 qemu_cpu_kick(cpu);
 }
 
-void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
-   QemuMutex *mutex)
-{
-struct qemu_work_item wi;
-
-if (qemu_cpu_is_self(cpu)) {
-func(cpu, data);
-return;
-}
-
-wi.func = func;
-wi.data = data;
-wi.done = false;
-wi.free = false;
-wi.exclusive = false;
-
-queue_work_on_cpu(cpu, &wi);
-while (!qatomic_mb_read(&wi.done)) {
-CPUState *self_cpu = current_cpu;
-
-qemu_cond_wait(&qemu_work_cond, mutex);
-current_cpu = self_cpu;
-}
-}
-
 void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data 
data)
 {
 struct qemu_work_item *wi;
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index f62ae88524..711ecad62f 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -689,18 +689,6 @@ void qemu_cpu_kick(CPUState *cpu);
  */
 bool cpu_is_stopped(CPUState *cpu);
 
-/**
- * do_run_on_cpu:
- * @cpu: The vCPU to run on.
- * @func: The function to be executed.
- * @data: Data to pass to the function.
- * @mutex: Mutex to release while waiting for @func to run.
- *
- * Used internally in the implementation of run_on_cpu.
- */
-void do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
-   QemuMutex *mutex);
-
 /**
  * run_on_cpu:
  * @cpu: The vCPU to run on.
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 071085f840..49e0368438 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -382,6 +382,29 @@ void qemu_init_cpu_loop(void)
 qemu_thread_get_self(&io_thread);
 }
 
+static void
+do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
+  QemuMutex *mutex)
+{
+struct qemu_work_item wi = {
+.func = func,
+.data = data,
+};
+
+if (qemu_cpu_is_self(cpu)) {
+func(cpu, data);
+return;
+}
+
+queue_work_on_cpu(cpu, &wi);
+while (!qatomic_mb_read(&wi.done)) {
+CPUState *self_cpu = current_cpu;
+
+qemu_cond_wait(&qemu_work_cond, mutex);
+current_cpu = self_cpu;
+}
+}
+
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
 {
 do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
-- 
2.31.1




[PATCH v3 6/8] cpus: Remove the mutex parameter from do_run_on_cpu()

2021-07-28 Thread Peter Xu
We must use the BQL for do_run_on_cpu() without much choice, which means the
parameter is useless.  Remove it.  Meanwhile, use the newly introduced
qemu_cond_wait_iothread() in do_run_on_cpu().

Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 softmmu/cpus.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index e714dfbf2b..9154cd7e78 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -383,8 +383,7 @@ void qemu_init_cpu_loop(void)
 }
 
 static void
-do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data,
-  QemuMutex *mutex)
+do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
 {
 struct qemu_work_item wi = {
 .func = func,
@@ -400,14 +399,14 @@ do_run_on_cpu(CPUState *cpu, run_on_cpu_func func, 
run_on_cpu_data data,
 while (!qatomic_mb_read(&wi.done)) {
 CPUState *self_cpu = current_cpu;
 
-qemu_cond_wait(&qemu_work_cond, mutex);
+qemu_cond_wait_iothread(&qemu_work_cond);
 current_cpu = self_cpu;
 }
 }
 
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
 {
-do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
+do_run_on_cpu(cpu, func, data);
 }
 
 static void qemu_cpu_stop(CPUState *cpu, bool exit)
-- 
2.31.1




[PATCH v3 4/8] memory: Don't do topology update in memory finalize()

2021-07-28 Thread Peter Xu
A topology update could be wrongly triggered in memory region finalize() if
there's a bug somewhere else.  It'll produce a very confusing stack when it
happens (e.g., sending a KVM ioctl within the RCU thread, and we'll only
observe it when it fails!).

Instead of that, use the depth inc()/dec() helpers to avoid the memory
transaction commit, and at the same time use assertions to make sure there are
no pending updates or we're in a nested transaction, so it can fail even
earlier and in a more explicit way.

Suggested-by: Paolo Bonzini 
Acked-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 softmmu/memory.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 725d57ec17..35b2568fc2 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -170,6 +170,12 @@ struct MemoryRegionIoeventfd {
 EventNotifier *e;
 };
 
+/* Returns whether there's any pending memory updates */
+static bool memory_region_has_pending_update(void)
+{
+return memory_region_update_pending || ioeventfd_update_pending;
+}
+
 static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd *a,
MemoryRegionIoeventfd *b)
 {
@@ -1756,12 +1762,25 @@ static void memory_region_finalize(Object *obj)
  * and cause an infinite loop.
  */
 mr->enabled = false;
-memory_region_transaction_begin();
+
+/*
+ * Use depth_inc()/depth_dec() instead of begin()/commit() to make sure
+ * below block won't trigger any topology update (which should never
+ * happen, but it's still a safety belt).
+ */
+memory_region_transaction_depth_inc();
 while (!QTAILQ_EMPTY(&mr->subregions)) {
 MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
 memory_region_del_subregion(mr, subregion);
 }
-memory_region_transaction_commit();
+memory_region_transaction_depth_dec();
+
+/*
+ * Make sure we're either in a nested transaction or there must have no
+ * pending updates due to memory_region_del_subregion() above.
+ */
+assert(memory_region_transaction_depth ||
+   !memory_region_has_pending_update());
 
 mr->destructor(mr);
 memory_region_clear_coalescing(mr);
-- 
2.31.1




[PATCH v3 5/8] cpus: Use qemu_cond_wait_iothread() where proper

2021-07-28 Thread Peter Xu
The helper is introduced but we've still got plenty of places that are directly
referencing the qemu_global_mutex itself.  Spread the usage.

Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 softmmu/cpus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 49e0368438..e714dfbf2b 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -439,7 +439,7 @@ void qemu_wait_io_event(CPUState *cpu)
 slept = true;
 qemu_plugin_vcpu_idle_cb(cpu);
 }
-qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+qemu_cond_wait_iothread(cpu->halt_cond);
 }
 if (slept) {
 qemu_plugin_vcpu_resume_cb(cpu);
@@ -582,7 +582,7 @@ void pause_all_vcpus(void)
 replay_mutex_unlock();
 
 while (!all_vcpus_paused()) {
-qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
+qemu_cond_wait_iothread(&qemu_pause_cond);
 CPU_FOREACH(cpu) {
 qemu_cpu_kick(cpu);
 }
@@ -653,7 +653,7 @@ void qemu_init_vcpu(CPUState *cpu)
 cpus_accel->create_vcpu_thread(cpu);
 
 while (!cpu->created) {
-qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
+qemu_cond_wait_iothread(&qemu_cpu_cond);
 }
 }
 
-- 
2.31.1




[PATCH v3 0/8] memory: Sanity checks memory transaction when releasing BQL

2021-07-28 Thread Peter Xu
This is v3 of the series. And, I think this is 6.2 material.

v2 changelog:
- Init qemu_work_item directly in do_run_on_cpu() [David]
- Rename memory_region_transaction_{push|pop} to *_depth_{inc|dec} [David]
- Changing wording of s/helpless/useless/ [David]
- Squashed v2 patch 7 into patch 8 [David]
- Collected r-bs

It actually got forgotten for months until it was used to identify another
potential issue of BQL usage here (besides, it could still be helpful when
debugging a previous kvm dirty ring issue in that series):

https://lore.kernel.org/qemu-devel/ch0pr02mb7898bbd73d0f3f7d5003bb178b...@ch0pr02mb7898.namprd02.prod.outlook.com/

So I figured it may still be worth having, hence a repost.

There're some changes against v1:

  - patch "cpus: Introduce qemu_cond_timedwait_iothread()" is dropped because
it's introduced in another commit already (b0c3cf9407e64).

  - two more patches to move do_run_on_cpu() into softmmu/ to fix a linux-user
compilation issue.

Please review, thanks.

=== Original Cover letter ===

This is a continuation of a previous discussion on memory transactions [1].
It should help QEMU fail far earlier if there's a misuse of the BQL against
the QEMU memory model.

One example is run_on_cpu() during a memory commit.  That used to appear to
work, but it could fail with very strange errors (like a KVM ioctl failure
because the memslot already existed), and it's not guaranteed to trigger
consistently.  Now it fails directly when run_on_cpu() is called.
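
(For illustration only, not code from this series: a hypothetical misuse of the
kind the new assertion catches, called from the main thread with the usual QEMU
headers; "parent", "mr", "cpu" and "some_work_func" are placeholders.)

    static void misuse_example(MemoryRegion *parent, MemoryRegion *mr,
                               CPUState *cpu)
    {
        memory_region_transaction_begin();
        memory_region_add_subregion(parent, 0, mr);
        /*
         * run_on_cpu() waits on a condition variable and thus releases the
         * BQL; with this series applied the prepare hook asserts right here,
         * instead of failing much later with a confusing KVM error.
         */
        run_on_cpu(cpu, some_work_func, RUN_ON_CPU_NULL);
        memory_region_transaction_commit();
    }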

Please have a look, thanks.

[1] https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg03205.html

Peter Xu (8):
  cpus: Export queue work related fields to cpu.h
  cpus: Move do_run_on_cpu into softmmu/cpus.c
  memory: Introduce memory_region_transaction_depth_{inc|dec}()
  memory: Don't do topology update in memory finalize()
  cpus: Use qemu_cond_wait_iothread() where proper
  cpus: Remove the mutex parameter from do_run_on_cpu()
  memory: Assert on no ongoing memory transaction before release BQL
  memory: Delay the transaction pop() until commit completed

 cpus-common.c  | 36 ++-
 include/exec/memory-internal.h |  1 +
 include/hw/core/cpu.h  | 22 ++
 softmmu/cpus.c | 39 ++---
 softmmu/memory.c   | 53 ++
 5 files changed, 94 insertions(+), 57 deletions(-)

-- 
2.31.1





[PATCH v3 7/8] memory: Assert on no ongoing memory transaction before release BQL

2021-07-28 Thread Peter Xu
Firstly, add a "prepare" function called before unlocking the BQL.  There are
only three places that can release the BQL: unlock(), cond_wait() or
cond_timedwait().

Make sure we don't have any ongoing memory transaction when releasing the
BQL.  This will trigger an abort if we misuse the QEMU memory model, e.g., when
calling run_on_cpu() during a memory commit.

Reviewed-by: David Hildenbrand 
Signed-off-by: Peter Xu 
---
 include/exec/memory-internal.h | 1 +
 softmmu/cpus.c | 9 +
 softmmu/memory.c   | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/include/exec/memory-internal.h b/include/exec/memory-internal.h
index 9fcc2af25c..3124b91c4b 100644
--- a/include/exec/memory-internal.h
+++ b/include/exec/memory-internal.h
@@ -35,6 +35,7 @@ static inline AddressSpaceDispatch 
*address_space_to_dispatch(AddressSpace *as)
 
 FlatView *address_space_get_flatview(AddressSpace *as);
 void flatview_unref(FlatView *view);
+bool memory_region_has_pending_transaction(void);
 
 extern const MemoryRegionOps unassigned_mem_ops;
 
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 9154cd7e78..4d190f9076 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -31,6 +31,7 @@
 #include "qapi/qapi-events-run-state.h"
 #include "qapi/qmp/qerror.h"
 #include "exec/gdbstub.h"
+#include "exec/memory-internal.h"
 #include "sysemu/hw_accel.h"
 #include "exec/exec-all.h"
 #include "qemu/thread.h"
@@ -66,6 +67,11 @@
 
 static QemuMutex qemu_global_mutex;
 
+static void qemu_mutex_unlock_iothread_prepare(void)
+{
+assert(!memory_region_has_pending_transaction());
+}
+
 bool cpu_is_stopped(CPUState *cpu)
 {
 return cpu->stopped || !runstate_is_running();
@@ -520,16 +526,19 @@ void qemu_mutex_unlock_iothread(void)
 {
 g_assert(qemu_mutex_iothread_locked());
 iothread_locked = false;
+qemu_mutex_unlock_iothread_prepare();
 qemu_mutex_unlock(&qemu_global_mutex);
 }
 
 void qemu_cond_wait_iothread(QemuCond *cond)
 {
+qemu_mutex_unlock_iothread_prepare();
 qemu_cond_wait(cond, &qemu_global_mutex);
 }
 
 void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
 {
+qemu_mutex_unlock_iothread_prepare();
 qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 35b2568fc2..62ec00b52d 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -176,6 +176,12 @@ static bool memory_region_has_pending_update(void)
 return memory_region_update_pending || ioeventfd_update_pending;
 }
 
+bool memory_region_has_pending_transaction(void)
+{
+return memory_region_transaction_depth ||
+memory_region_has_pending_update();
+}
+
 static bool memory_region_ioeventfd_before(MemoryRegionIoeventfd *a,
MemoryRegionIoeventfd *b)
 {
-- 
2.31.1




[PATCH v3 8/8] memory: Delay the transaction pop() until commit completed

2021-07-28 Thread Peter Xu
This should be functionally the same as before, but it allows
memory_region_transaction_depth to be non-zero during commit, which can help us
do sanity checks on misuses.

While at it, fix an indentation issue on the bracket.

Signed-off-by: Peter Xu 
---
 softmmu/memory.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 62ec00b52d..e830649011 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1114,8 +1114,7 @@ void memory_region_transaction_commit(void)
 assert(memory_region_transaction_depth);
 assert(qemu_mutex_iothread_locked());
 
-memory_region_transaction_depth_dec();
-if (!memory_region_transaction_depth) {
+if (memory_region_transaction_depth == 1) {
 if (memory_region_update_pending) {
 flatviews_reset();
 
@@ -1134,7 +1133,14 @@ void memory_region_transaction_commit(void)
 }
 ioeventfd_update_pending = false;
 }
-   }
+}
+
+/*
+ * Decrease the depth at last, so that memory_region_transaction_depth will
+ * still be non-zero during committing.  This can help us to do some sanity
+ * check within the process of committing.
+ */
+memory_region_transaction_depth_dec();
 }
 
 static void memory_region_destructor_none(MemoryRegion *mr)
-- 
2.31.1




Re: [PATCH v3 2/8] cpus: Move do_run_on_cpu into softmmu/cpus.c

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 8:31 PM, Peter Xu wrote:
> It's only used by softmmu binaries not linux-user ones.  Make it static and
> drop the definition in the header too.
> 
> Since at it, initialize variable "wi" with less loc.
> 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Peter Xu 
> ---
>  cpus-common.c | 25 -
>  include/hw/core/cpu.h | 12 
>  softmmu/cpus.c| 23 +++
>  3 files changed, 23 insertions(+), 37 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




[PATCH] MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver

2021-07-28 Thread Philippe Mathieu-Daudé
I'm interested in following the activity around the NVMe bdrv.

Signed-off-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 42ac45c3e50..4a42e9eceaf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3178,6 +3178,7 @@ F: block/null.c
 NVMe Block Driver
 M: Stefan Hajnoczi 
 R: Fam Zheng 
+R: Philippe Mathieu-Daudé 
 L: qemu-bl...@nongnu.org
 S: Supported
 F: block/nvme*
-- 
2.31.1




Re: [PATCH v3 6/8] cpus: Remove the mutex parameter from do_run_on_cpu()

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 8:31 PM, Peter Xu wrote:
> We must use the BQL for do_run_on_cpu() without much choice, it means the
> parameter is useless.  Remove it.  Meanwhile use the newly introduced
> qemu_cond_wait_iothread() in do_run_on_cpu().
> 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Peter Xu 
> ---
>  softmmu/cpus.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 5/8] cpus: Use qemu_cond_wait_iothread() where proper

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 8:31 PM, Peter Xu wrote:
> The helper is introduced but we've still got plenty of places that are 
> directly
> referencing the qemu_global_mutex itself.  Spread the usage.
> 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Peter Xu 
> ---
>  softmmu/cpus.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 7/8] virtio-gpu: Initialize Venus

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 3:46 PM, Antonio Caggiano wrote:
> Enable VirGL unstable APIs and request Venus when initializing VirGL.
> 
> Signed-off-by: Antonio Caggiano 
> ---
>  hw/display/virtio-gpu-virgl.c | 2 +-
>  meson.build   | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)

> diff --git a/meson.build b/meson.build
> index f2e148eaf9..31b65050b7 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -483,6 +483,7 @@ if not get_option('virglrenderer').auto() or have_system
>   method: 'pkg-config',
>   required: get_option('virglrenderer'),
>   kwargs: static_kwargs)
> +  add_project_arguments('-DVIRGL_RENDERER_UNSTABLE_APIS', language : 'c')

Unstable in mainstream repository doesn't sound right.
What is the plan for the project to stabilize it?




Re: [PATCH v3 1/8] cpus: Export queue work related fields to cpu.h

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 8:31 PM, Peter Xu wrote:
> This patch has no functional change, but prepares for moving the function
> do_run_on_cpu() into softmmu/cpus.c.  It does:
> 
>   1. Move qemu_work_item into hw/core/cpu.h.
>   2. Export queue_work_on_cpu()/qemu_work_cond.
> 
> All of them will be used by softmmu/cpus.c later.
> 
> Reviewed-by: David Hildenbrand 
> Signed-off-by: Peter Xu 
> ---
>  cpus-common.c | 11 ++-
>  include/hw/core/cpu.h | 10 +-
>  2 files changed, 11 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 2/8] virtio-gpu: hostmem [wip]

2021-07-28 Thread Philippe Mathieu-Daudé
On 7/28/21 3:46 PM, Antonio Caggiano wrote:
> From: Gerd Hoffmann 
> 
> ---
>  hw/display/virtio-gpu-base.c|  4 +++
>  hw/display/virtio-gpu-pci.c | 14 +
>  hw/display/virtio-gpu.c |  1 +
>  hw/display/virtio-vga.c | 32 +++--
>  include/hw/virtio/virtio-gpu.h  |  5 
>  include/standard-headers/linux/virtio_gpu.h |  5 
>  6 files changed, 52 insertions(+), 9 deletions(-)

WIP aims for an RFC series.




Re: About two-dimensional page translation (e.g., Intel EPT) and shadow page table in Linux QEMU/KVM

2021-07-28 Thread harry harry
Sean, sorry for the late reply. Thanks for your careful explanations.

> For emulation of any instruction/flow that starts with a guest virtual 
> address.
> On Intel CPUs, that includes quite literally any "full" instruction emulation,
> since KVM needs to translate CS:RIP to a guest physical address in order to 
> fetch
> the guest's code stream.  KVM can't avoid "full" emulation unless the guest is
> heavily enlightened, e.g. to avoid string I/O, among many other things.

Do you mean the emulated MMU is needed when it *only* wants to
translate GVAs to GPAs at the guest level?
In such cases, the hardware MMU cannot be used because the hardware MMU
can only translate GVAs to HPAs, right?



Re: [PATCH] gitlab-ci.d/custom-runners: Improve rules for the staging branch

2021-07-28 Thread Willian Rampazzo
On Wed, Jul 28, 2021 at 2:39 PM Thomas Huth  wrote:
>
> If maintainers are currently pushing to a branch called "staging"
> in their repository, they are ending up with some stuck jobs - unless
> they have a s390x CI runner machine available. That's ugly, we should
> make sure that the related jobs are really only started if such a
> runner is available. So let's only run these jobs if it's the
> "staging" branch of the main repository of the QEMU project (where
> we can be sure that the s390x runner is available), or if the user
> explicitly set a S390X_RUNNER_AVAILABLE variable in their CI configs
> to declare that they have such a runner available, too.
>
> Fixes: 4799c21023 ("Jobs based on custom runners: add job definitions ...")
> Signed-off-by: Thomas Huth 
> ---
>  .gitlab-ci.d/custom-runners.yml | 40 +++--
>  1 file changed, 28 insertions(+), 12 deletions(-)
>

Reviewed-by: Willian Rampazzo 




Re: [PATCH v2 2/2] tests: migration-test: Add dirty ring test

2021-07-28 Thread Richard Henderson

On 6/15/21 7:55 AM, Peter Xu wrote:

Add dirty ring test if kernel supports it.  Add the dirty ring parameter on
source should be mostly enough, but let's change the dest too to make them
match always.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Peter Xu 
---
  tests/qtest/migration-test.c | 58 ++--
  1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index d9225f58d4d..9ef6b471353 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -27,6 +27,10 @@
  #include "migration-helpers.h"
  #include "tests/migration/migration-test.h"
  
+#if defined(__linux__)

+#include "linux/kvm.h"
+#endif


This breaks the build for hosts that do not support kvm, e.g. sparc:


[2/3] Compiling C object tests/qtest/migration-test.p/migration-test.c.o
FAILED: tests/qtest/migration-test.p/migration-test.c.o
cc -Itests/qtest/migration-test.p -Itests/qtest -I../qemu/tests/qtest -I. -Iqapi -Itrace 
-Iui -Iui/shader -I/usr/include/glib-2.0 -I/usr/lib/sparc64-linux-gnu/glib-2.0/include 
-fdiagnostics-color=auto -pipe -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g -isystem 
/home/rth/qemu/qemu/linux-headers -isystem linux-headers -iquote . -iquote 
/home/rth/qemu/qemu -iquote /home/rth/qemu/qemu/include -iquote 
/home/rth/qemu/qemu/disas/libvixl -iquote /home/rth/qemu/qemu/tcg/sparc -pthread 
-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -m64 -mcpu=ultrasparc -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wundef 
-Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
-Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels 
-Wexpansion-to-defined -Wimplicit-fallthrough=2 -Wno-missing-include-dirs 
-Wno-shift-negative-value -Wno-psabi -fPIE -MD -MQ 
tests/qtest/migration-test.p/migration-test.c.o -MF 
tests/qtest/migration-test.p/migration-test.c.o.d -o 
tests/qtest/migration-test.p/migration-test.c.o -c ../qemu/tests/qtest/migration-test.c

In file included from ../qemu/tests/qtest/migration-test.c:31:
/home/rth/qemu/qemu/linux-headers/linux/kvm.h:15:10: fatal error: asm/kvm.h: No such file 
or directory

   15 | #include 
  |  ^~~
compilation terminated.


r~



Re: [PATCH v2 0/6] migration/ram: Optimize for virtio-mem via RamDiscardManager

2021-07-28 Thread Peter Xu
On Wed, Jul 28, 2021 at 07:39:39PM +0200, David Hildenbrand wrote:
> > Meanwhile, I still have no idea how much overhead the "loop" part could 
> > bring.
> > For a large virtio-mem region with frequent plugged/unplugged mem 
> > interacted,
> > it seems possible to take a while to me..  I have no solid idea yet.
> 
> Let's do some math. Assume the worst case on a 1TiB device with a 2MiB block
> size: We have 524288 blocks == bits. That's precisely a 64k bitmap in
> virtio-mem. In the worst case, every second bit would be clear
> ("discarded"). For each clear bit ("discarded"), we would have to clear 512
> bits (64 bytes) in the dirty bitmap. That's storing 32 MiB.
> 
> So scanning 64 KiB, writing 32 MiB. Certainly not perfect, but I am not sure
> if it will really matter doing that once on every bitmap sync. I guess the
> bitmap syncing itself is much more expensive -- and not syncing the
> discarded ranges (b ) above) would make a bigger impact I guess.

I'm not worried about the memory size to be accessed as bitmaps; it's more
about the loop itself.  500K blocks/bits means the cb() can in the worst case be
called 500K/2=250k times, no matter what the hook is doing.

But yeah, that's the worst case thing, and for a 1TB chunk I agree that can also
be too harsh.  It's just that if it's very easy to do in bitmap init, then it's
still worth thinking about.

> 
> > 
> > The thing is I still think this extra operation during sync() can be 
> > ignored by
> > simply clear dirty log during bitmap init, then.. why not? :)
> 
> I guess clearing the dirty log (especially in KVM) might be more expensive.

If we send one ioctl per cb that'll be expensive for sure.  I think it'll be
fine if we send one clear ioctl to kvm, summarizing the whole bitmap to clear.

The other thing is imho having overhead during bitmap init is always better
than having that during sync(). :)

> But, anyhow, we actually want b) long-term :)

Regarding the longterm plan - sorry to say that, but I still keep a skeptical
view.. :) 

You did mention that for 1tb memory we only have 32mb dirty bitmap, that's
actually not so huge.  That's why I'm not sure whether that complexity of plan
b would bring a lot of help (before I started to think about the interface of
it).  But I could be missing something.

> 
> > 
> > Clear dirty bitmap is as simple as "reprotect the pages" functional-wise - 
> > if
> > they are unplugged memory ranges, and they shouldn't be written by the guest
> > (we still allow reads even for virtio-mem compatibility), then I don't see 
> > it
> > an issue to wr-protect it using clear dirty log when bitmap init.
> > 
> > It still makes sense to me to keep the dirty/clear bitmap in-sync, at least
> > before your plan b proposal; leaving the dirty bits set forever on unplugged
> > memory is okay but just sounds a bit weird.
> > 
> > Though my concern is only valid when virtio-mem is used, so I don't have a
> > strong opinion on it as you maintains virtio-mem. I believe you will always
> > have a better judgement than me on that. Especially, when/if Dave & Juan 
> > have
> > no problem on that. :)
> 
> I'd certainly sleep better at night if I can be 100% sure that a page not to
> be migrated will not get migrated. :)
> 
> I'll play with initial clearing and see how much of a difference it makes
> code wise. Thanks a lot for your feedback!

Thanks!

-- 
Peter Xu



