Live migration regarding Intel PT

2021-08-25 Thread Xiaoyao Li

Hi Eduardo,

I have some questions regarding Intel PT live migration.

Commit "e37a5c7fa459 (i386: Add Intel Processor Trace feature support)" 
exposes Intel PT with a fixed set of CPUID 0x14 capabilities for live 
migration, and those fixed capabilities are the values reported on 
ICX (Ice Lake). However, the upcoming SPR (Sapphire Rapids) reports fewer 
INTEL_PT_CYCLE_BITMAP capabilities than ICX, so enabling PT in a 
guest fails on SPR machines.


If we change the check on INTEL_PT_CYCLE_BITMAP to accept a different 
value so that PT works on SPR, I think it breaks live migration, right?


For me, not making each sub-function of PT configurable in QEMU is 
what makes live migration hard. Why wasn't PT made unmigratable in 
the first place when the support was introduced in QEMU?


Thanks,
-Xiaoyao



Re: [RFC 03/10] hw/mos6522: Remove redundant mos6522_timer1_update() calls

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


Reads and writes to the TL and TC registers have no immediate effect on
a running timer, with the exception of a write to TCH. Hence these
mos6522_timer_update() calls are not needed.

Signed-off-by: Finn Thain 


Perhaps better to flip this description around, i.e. mention that the low bytes are 
written to a latch and then the full 16-bit value is transferred to the latch/counter 
when the high byte is written?


Otherwise I think this looks okay.


---
  hw/misc/mos6522.c | 7 ---
  1 file changed, 7 deletions(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index ff246b5437..1d4a56077e 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -234,7 +234,6 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned size)
  val = s->timers[0].latch & 0xff;
  break;
  case VIA_REG_T1LH:
-/* XXX: check this */
  val = (s->timers[0].latch >> 8) & 0xff;
  break;
  case VIA_REG_T2CL:
@@ -303,8 +302,6 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
  break;
  case VIA_REG_T1CL:
  s->timers[0].latch = (s->timers[0].latch & 0xff00) | val;
-mos6522_timer1_update(s, &s->timers[0],
-  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
  break;
  case VIA_REG_T1CH:
  s->timers[0].latch = (s->timers[0].latch & 0xff) | (val << 8);
@@ -313,14 +310,10 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
  break;
  case VIA_REG_T1LL:
  s->timers[0].latch = (s->timers[0].latch & 0xff00) | val;
-mos6522_timer1_update(s, &s->timers[0],
-  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
  break;
  case VIA_REG_T1LH:
  s->timers[0].latch = (s->timers[0].latch & 0xff) | (val << 8);
  s->ifr &= ~T1_INT;
-mos6522_timer1_update(s, &s->timers[0],
-  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
  break;
  case VIA_REG_T2CL:
  s->timers[1].latch = (s->timers[1].latch & 0xff00) | val;




ATB,

Mark.



Re: [RFC 04/10] hw/mos6522: Rename timer callback functions

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


This improves readability.

Signed-off-by: Finn Thain 
---
  hw/misc/mos6522.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 1d4a56077e..c0d6bee4cc 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -154,7 +154,7 @@ static void mos6522_timer2_update(MOS6522State *s, MOS6522Timer *ti,
  }
  }
  
-static void mos6522_timer1(void *opaque)
+static void mos6522_timer1_expired(void *opaque)
  {
  MOS6522State *s = opaque;
  MOS6522Timer *ti = &s->timers[0];
@@ -164,7 +164,7 @@ static void mos6522_timer1(void *opaque)
  mos6522_update_irq(s);
  }
  
-static void mos6522_timer2(void *opaque)
+static void mos6522_timer2_expired(void *opaque)
  {
  MOS6522State *s = opaque;
  MOS6522Timer *ti = &s->timers[1];
@@ -445,8 +445,10 @@ static void mos6522_init(Object *obj)
  s->timers[i].index = i;
  }
  
-s->timers[0].timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, mos6522_timer1, s);
-s->timers[1].timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, mos6522_timer2, s);
+s->timers[0].timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  mos6522_timer1_expired, s);
+s->timers[1].timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  mos6522_timer2_expired, s);
  }
  
  static void mos6522_finalize(Object *obj)


I'm not overly keen on this one: the general QEMU convention for a timer callback is 
for it to be named *_timer() rather than *_expired(), so I'd prefer to keep this 
consistent with the rest of the codebase.



ATB,

Mark.



[Bug 1926995] Re: hw/remote/mpqemu-link.c:221: bad error checking ?

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1926995

Title:
  hw/remote/mpqemu-link.c:221: bad error checking ?

Status in QEMU:
  Fix Released

Bug description:
  hw/remote/mpqemu-link.c:221:36: warning: logical ‘and’ of mutually
  exclusive tests is always false [-Wlogical-op]

  Source code is

 if (msg->cmd >= MPQEMU_CMD_MAX && msg->cmd < 0) {
  return false;
  }

  Maybe better code:

 if (msg->cmd >= MPQEMU_CMD_MAX || msg->cmd < 0) {
  return false;
  }

  It might be useful to enable the gcc compiler flag -Wlogical-op
  to see these warnings.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1926995/+subscriptions




[Bug 1926759] Re: WFI instruction results in unhandled CPU exception

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1926759

Title:
  WFI instruction results in unhandled CPU exception

Status in QEMU:
  Fix Released

Bug description:
  Hi

  I refer to the WFI instruction. The machine code is 0xe320f003. After
  executing it, QEMU exits with the following crash log.

  qemu: unhandled CPU exception 0x10001 - aborting
  R00=0001 R01=40800b34 R02=40800b3c R03=000102ec
  R04=00010a28 R05=00010158 R06=00087460 R07=00010158
  R08= R09= R10=00085b7c R11=408009f4
  R12=40800a08 R13=408009f0 R14=0001057c R15=000102f8
  PSR=6010 -ZC- A usr32
  qemu:handle_cpu_signal received signal outside vCPU context @ pc=0x7f5c21d0fa12

  WFI aims to enter a low-power state and wait for an interrupt. The
  raised exception does not seem like correct behavior. I can provide a
  testcase if you need one. Many thanks.

  Regards
  Muhui

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1926759/+subscriptions




[Bug 1910696] Re: Qemu fails to start with error " There is no option group 'spice'"

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1910696

Title:
  Qemu fails to start with error " There is no option group 'spice'"

Status in QEMU:
  Fix Released

Bug description:
  After upgrade from 5.1.0 to 5.2.0, qemu fails on start with error:
  `
  /usr/bin/qemu-system-x86_64 -S -name trinti -uuid 
f8ad2ff6-8808-4f42-8f0b-9e23acd20f84 -daemonize -cpu host -nographic -serial 
chardev:console -nodefaults -no-reboot -no-user-config -sandbox 
on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny 
-readconfig /var/log/lxd/trinti/qemu.conf -pidfile /var/log/lxd/trinti/qemu.pid 
-D /var/log/lxd/trinti/qemu.log -chroot /var/lib/lxd/virtual-machines/trinti 
-smbios type=2,manufacturer=Canonical Ltd.,product=LXD -runas nobody: 
  qemu-system-x86_64:/var/log/lxd/trinti/qemu.conf:27: There is no option group 
'spice'
  qemu-system-x86_64: -readconfig /var/log/lxd/trinti/qemu.conf: read config 
/var/log/lxd/trinti/qemu.conf: Invalid argument
  `
  Bisected to first bad commit: 
https://github.com/qemu/qemu/commit/cbe5fa11789035c43fd2108ac6f45848954954b5

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1910696/+subscriptions




[Bug 1890160] Re: Abort in vmxnet3_validate_queues

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1890160

Title:
  Abort in vmxnet3_validate_queues

Status in QEMU:
  Fix Released

Bug description:
  Hello,
  Reproducer:

  cat << EOF | ./i386-softmmu/qemu-system-i386 \
  -device vmxnet3 -m 64 -nodefaults -qtest stdio -nographic
  outl 0xcf8 0x80001014
  outl 0xcfc 0xe0001000
  outl 0xcf8 0x80001018
  outl 0xcf8 0x80001004
  outw 0xcfc 0x7
  write 0x0 0x1 0xe1
  write 0x1 0x1 0xfe
  write 0x2 0x1 0xbe
  write 0x3 0x1 0xba
  write 0x3e 0x1 0xe1
  writeq 0xe0001020 0xef0bff5ecafe
  EOF

  ==
  qemu: hardware error: Bad TX queues number: 225

  #6 0x7f04b89d455a in abort 
/build/glibc-GwnBeO/glibc-2.30/stdlib/abort.c:79:7
  #7 0x558f5be89b67 in hw_error 
/home/alxndr/Development/qemu/general-fuzz/softmmu/cpus.c:927:5
  #8 0x558f5d3c3968 in vmxnet3_validate_queues 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1388:9
  #9 0x558f5d3bb716 in vmxnet3_activate_device 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1449:5
  #10 0x558f5d3b6fba in vmxnet3_handle_command 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1576:9
  #11 0x558f5d3b410f in vmxnet3_io_bar1_write 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1772:9
  #12 0x558f5bec4193 in memory_region_write_accessor 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:483:5
  #13 0x558f5bec3637 in access_with_adjusted_size 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:544:18
  #14 0x558f5bec1256 in memory_region_dispatch_write 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:1466:16

  -Alex

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1890160/+subscriptions




[Bug 1914870] Re: libvixl compilation failure on Debian unstable

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1914870

Title:
  libvixl compilation failure on Debian unstable

Status in QEMU:
  Fix Released

Bug description:
  As of commit 0e324626306:

  $ lsb_release -d
  Description:Debian GNU/Linux bullseye/sid

  Project version: 5.2.50
  C compiler for the host machine: cc (gcc 10.2.1 "cc (Debian 10.2.1-6) 10.2.1 
20210110")
  C linker for the host machine: cc ld.bfd 2.35.1
  C++ compiler for the host machine: c++ (gcc 10.2.1 "c++ (Debian 10.2.1-6) 
10.2.1 20210110")
  C++ linker for the host machine: c++ ld.bfd 2.35.1

  [6/79] Compiling C++ object libcommon.fa.p/disas_libvixl_vixl_utils.cc.o
  FAILED: libcommon.fa.p/disas_libvixl_vixl_utils.cc.o 
  c++ -Ilibcommon.fa.p -I. -I.. -Iqapi -Itrace -Iui/shader 
-I/usr/include/capstone -I/usr/include/glib-2.0 
-I/usr/lib/hppa-linux-gnu/glib-2.0/include -fdiagnostics-color=auto -pipe -Wall 
-Winvalid-pch -Wnon-virtual-dtor -Werror -std=gnu++11 -O2 -g -isystem 
/home/philmd/qemu/linux-headers -isystem linux-headers -iquote . -iquote 
/home/philmd/qemu -iquote /home/philmd/qemu/include -iquote 
/home/philmd/qemu/disas/libvixl -iquote /home/philmd/qemu/tcg/hppa -iquote 
/home/philmd/qemu/accel/tcg -pthread -D__STDC_LIMIT_MACROS 
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -U_FORTIFY_SOURCE 
-D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wundef -Wwrite-strings -fno-strict-aliasing -fno-common -fwrapv -Wtype-limits 
-Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body 
-Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2 
-Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi -fPIE -MD -MQ 
libcommon.fa.p/disas_libvixl_vixl_utils.cc.o -MF 
libcommon.fa.p/disas_libvixl_vixl_utils.cc.o.d -o 
libcommon.fa.p/disas_libvixl_vixl_utils.cc.o -c ../disas/libvixl/vixl/utils.cc
  In file included from /home/philmd/qemu/disas/libvixl/vixl/utils.h:30,
   from ../disas/libvixl/vixl/utils.cc:27:
  /usr/include/string.h:36:43: error: missing binary operator before token "("
 36 | #if defined __cplusplus && (__GNUC_PREREQ (4, 4) \
|   ^
  /usr/include/string.h:53:62: error: missing binary operator before token "("
 53 | #if defined __USE_MISC || defined __USE_XOPEN || __GLIBC_USE (ISOC2X)
|  ^
  /usr/include/string.h:165:21: error: missing binary operator before token "("
165 |  || __GLIBC_USE (LIB_EXT2) || __GLIBC_USE (ISOC2X))
| ^
  /usr/include/string.h:174:43: error: missing binary operator before token "("
174 | #if defined __USE_XOPEN2K8 || __GLIBC_USE (LIB_EXT2) || __GLIBC_USE 
(ISOC2X)
|   ^
  /usr/include/string.h:492:19: error: missing binary operator before token "("
492 | #if __GNUC_PREREQ (3,4)
|   ^
  In file included from /home/philmd/qemu/disas/libvixl/vixl/utils.h:30,
   from ../disas/libvixl/vixl/utils.cc:27:
  /usr/include/string.h:28:1: error: ‘__BEGIN_DECLS’ does not name a type
 28 | __BEGIN_DECLS
| ^
  In file included from /home/philmd/qemu/disas/libvixl/vixl/utils.h:30,
   from ../disas/libvixl/vixl/utils.cc:27:
  /usr/include/string.h:44:8: error: ‘size_t’ has not been declared
 44 |size_t __n) __THROW __nonnull ((1, 2));
|^~
  /usr/include/string.h:44:20: error: expected initializer before ‘__THROW’
 44 |size_t __n) __THROW __nonnull ((1, 2));
|^~~
  /usr/include/string.h:47:56: error: ‘size_t’ has not been declared
 47 | extern void *memmove (void *__dest, const void *__src, size_t __n)
|^~
  /usr/include/string.h:48:6: error: expected initializer before ‘__THROW’
 48 |  __THROW __nonnull ((1, 2));
|  ^~~
  /usr/include/string.h:61:42: error: ‘size_t’ has not been declared
 61 | extern void *memset (void *__s, int __c, size_t __n) __THROW 
__nonnull ((1));
|  ^~

  Is there a package dependency missing?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1914870/+subscriptions




[Bug 1926111] Re: Assertion `tx_queue_idx <= s->txq_num' failed in vmxnet3_io_bar0_write

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1926111

Title:
  Assertion `tx_queue_idx <= s->txq_num' failed in vmxnet3_io_bar0_write

Status in QEMU:
  Fix Released

Bug description:
  === Stacktrace ===

  qemu-fuzz-i386: ../hw/net/vmxnet3.c:1096: void vmxnet3_io_bar0_write(void *, 
hwaddr, uint64_t, unsigned int): Assertion `tx_queue_idx <= s->txq_num' failed.
  ==602353== ERROR: libFuzzer: deadly signal
  #5 0x7fe4b93a7ce0 in raise signal/../sysdeps/unix/sysv/linux/raise.c:48:3
  #6 0x7fe4b9391536 in abort stdlib/abort.c:79:7
  #7 0x7fe4b939140e in __assert_fail_base assert/assert.c:92:3
  #8 0x7fe4b93a0661 in __assert_fail assert/assert.c:101:3
  #9 0x563e6cf5ebb5 in vmxnet3_io_bar0_write  hw/net/vmxnet3.c:1096:9
  #10 0x563e6eefdb00 in memory_region_write_accessor  softmmu/memory.c:491:5
  #11 0x563e6eefcfdd in access_with_adjusted_size  softmmu/memory.c:552:18
  #12 0x563e6eefac90 in memory_region_dispatch_write  softmmu/memory.c:1502:16
  #13 0x563e6e834e16 in flatview_write_continue  softmmu/physmem.c:2746:23
  #14 0x563e6e81cd38 in flatview_write  softmmu/physmem.c:2786:14
  #15 0x563e6e81c868 in address_space_write  softmmu/physmem.c:2878:18

  === Reproducer ===
  cat << EOF | ./qemu-system-i386  -display none -machine accel=qtest, -m \
  512M -machine q35 -nodefaults -device vmxnet3,netdev=net0 -netdev \
  user,id=net0 -qtest stdio
  outl 0xcf8 0x8810
  outl 0xcfc 0xe000
  outl 0xcf8 0x8814
  outl 0xcf8 0x8804
  outw 0xcfc 0x7
  outl 0xcf8 0x8815
  outl 0xcfc 0x00b5
  write 0x0 0x1 0xe1
  write 0x1 0x1 0xfe
  write 0x2 0x1 0xbe
  write 0x3 0x1 0xba
  write 0xff00b020 0x4 0xfeca
  write 0xe630 0x1 0x00
  EOF

  
  === Testcase ===

  /*
   * Autogenerated Fuzzer Test Case
   *
   * This work is licensed under the terms of the GNU GPL, version 2 or later.
   * See the COPYING file in the top-level directory.
   */

  #include "qemu/osdep.h"

  #include "libqos/libqtest.h"

  static void test_fuzz(void) {
  QTestState *s = qtest_init(" -display none , -m 512M -machine q35 
-nodefaults "
 "-device vmxnet3,netdev=net0 -netdev 
user,id=net0");
  qtest_outl(s, 0xcf8, 0x8810);
  qtest_outl(s, 0xcfc, 0xe000);
  qtest_outl(s, 0xcf8, 0x8814);
  qtest_outl(s, 0xcf8, 0x8804);
  qtest_outw(s, 0xcfc, 0x7);
  qtest_outl(s, 0xcf8, 0x8815);
  qtest_outl(s, 0xcfc, 0x00b5);
  qtest_bufwrite(s, 0x0, "\xe1", 0x1);
  qtest_bufwrite(s, 0x1, "\xfe", 0x1);
  qtest_bufwrite(s, 0x2, "\xbe", 0x1);
  qtest_bufwrite(s, 0x3, "\xba", 0x1);
  qtest_bufwrite(s, 0xff00b020, "\x00\x00\xfe\xca", 0x4);
  qtest_bufwrite(s, 0xe630, "\x00", 0x1);
  qtest_quit(s);
  }
  int main(int argc, char **argv) {
  const char *arch = qtest_get_arch();

  g_test_init(&argc, &argv, NULL);

  if (strcmp(arch, "i386") == 0) {
  qtest_add_func("fuzz/test_fuzz", test_fuzz);
  }

  return g_test_run();
  }

  
  === OSS-Fuzz Report ===
  https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33603
  https://oss-fuzz.com/testcase?key=6071483232288768

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1926111/+subscriptions




[Bug 1890157] Re: Assertion failure in net_tx_pkt_reset through vmxnet3

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1890157

Title:
  Assertion failure in net_tx_pkt_reset through vmxnet3

Status in QEMU:
  Fix Released

Bug description:
  Hello,
  Reproducer:

  cat << EOF | ./i386-softmmu/qemu-system-i386 \
  -device vmxnet3 -m 64 -nodefaults -qtest stdio -nographic
  outl 0xcf8 0x80001014
  outl 0xcfc 0xe0001000
  outl 0xcf8 0x80001018
  outl 0xcf8 0x80001004
  outw 0xcfc 0x7
  outl 0xcf8 0x80001083
  write 0x0 0x1 0xe1
  write 0x1 0x1 0xfe
  write 0x2 0x1 0xbe
  write 0x3 0x1 0xba
  writeq 0xe0001020 0xefefff5ecafe
  writeq 0xe0001020 0x5e5ccafe0002
  EOF

  ==
  qemu-system-i386: 
/home/alxndr/Development/qemu/general-fuzz/hw/net/net_tx_pkt.c:450: void 
net_tx_pkt_reset(struct NetTxPkt *): Assertion `pkt->raw' failed.

  #9 0x564838761930 in net_tx_pkt_reset 
/home/alxndr/Development/qemu/general-fuzz/hw/net/net_tx_pkt.c:450:5
  #10 0x564838881749 in vmxnet3_deactivate_device 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1159:9
  #11 0x56483888cf71 in vmxnet3_reset 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1170:5
  #12 0x564838882124 in vmxnet3_handle_command 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1610:9
  #13 0x56483887f10f in vmxnet3_io_bar1_write 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1772:9
  #14 0x56483738f193 in memory_region_write_accessor 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:483:5
  #15 0x56483738e637 in access_with_adjusted_size 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:544:18
  #16 0x56483738c256 in memory_region_dispatch_write 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:1466:16
  #17 0x56483673d4a6 in flatview_write_continue 
/home/alxndr/Development/qemu/general-fuzz/exec.c:3176:23

  -Alex

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1890157/+subscriptions




[Bug 1925512] Re: UNDEFINED case for instruction BLX

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1925512

Title:
  UNDEFINED case for instruction BLX

Status in QEMU:
  Fix Released

Bug description:
  Hi

  I refer to the instruction BLX imm (T2 encoding) in ARMv7 (Thumb
  mode).

  11110 S imm10H | 11 J1 0 J2 imm10L H

  
  if H == '1' then UNDEFINED;
  I1 = NOT(J1 EOR S);  I2 = NOT(J2 EOR S);  imm32 = SignExtend(S:I1:I2:imm10H:imm10L:'00', 32);
  targetInstrSet = InstrSet_A32;
  if InITBlock() && !LastInITBlock() then UNPREDICTABLE;

  According to the manual, if H equals 1, this instruction should be
  UNDEFINED. However, it seems QEMU does not check this constraint in
  the function trans_BLX_i. Thanks

  Regards
  Muhui

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1925512/+subscriptions




[Bug 1905356] Re: No check for unaligned data access in ARM32 instructions

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1905356

Title:
  No check for unaligned data access in ARM32 instructions

Status in QEMU:
  Fix Released

Bug description:
  hi

  According to the ARM documentation, there are alignment requirements
  for load/store instructions. An alignment fault should be raised if
  the alignment check fails. However, it seems that QEMU doesn't
  implement this, which goes against the ARM documentation. For
  example, the instructions LDRD/STRD/LDREX/STREX must check that the
  address is word-aligned no matter what the value of SCTLR.A is.

  I attached a testcase, which contains an instruction at VA 0x10240:
  ldrd r0, [pc, #1] in the main function. QEMU successfully loads the
  data from the unaligned address. The test was done with QEMU 5.1.0. I
  can provide more testcases for the other instructions if you need
  them. Many thanks.

  To patch this, we need a check when translating the instruction to
  TCG. If the address is unaligned, a signal (i.e., SIGBUS) should be
  raised.

  Regards
  Muhui

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1905356/+subscriptions




[Bug 1926044] Re: QEMU-user doesn't report HWCAP2_MTE

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1926044

Title:
  QEMU-user doesn't report HWCAP2_MTE

Status in QEMU:
  Fix Released

Bug description:
  Reproducible on ffa090bc56e73e287a63261e70ac02c0970be61a

  Host Debian 5.10.24 x86_64 GNU

  Configured with "configure --disable-system --enable-linux-user
  --static"

  This one works and prints "OK" as expected:
  clang tests/tcg/aarch64/mte-3.c -target aarch64-linux-gnu  -fsanitize=memtag 
-march=armv8+memtag
  qemu-aarch64 --cpu max -L /usr/aarch64-linux-gnu ./a.out && echo OK

  
  This one fails and print "0":
  cat mytest.c
  #include <stdio.h>
  #include <sys/auxv.h>

  #ifndef HWCAP2_MTE
  #define HWCAP2_MTE (1 << 18)
  #endif

  int main(int ac, char **av)
  {
  printf("%d\n", (int)(getauxval(AT_HWCAP2) & HWCAP2_MTE));
  }

  
  clang mytest.c -target aarch64-linux-gnu  -fsanitize=memtag 
-march=armv8+memtag
  qemu-aarch64 --cpu max -L /usr/aarch64-linux-gnu ./a.out

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1926044/+subscriptions




[Bug 1910603] Re: [OSS-Fuzz] Issue 29174 sb16: Abrt in audio_bug

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1910603

Title:
  [OSS-Fuzz] Issue 29174 sb16: Abrt in audio_bug

Status in QEMU:
  Fix Released

Bug description:
  === Reproducer ===
  cat << EOF | ../build-system/qemu-system-i386 \
  -machine q35 -device sb16,audiodev=snd0 \
  -audiodev none,id=snd0 -nographic -nodefaults \
  -qtest stdio
  outw 0x22c 0x41
  outb 0x22c 0x0
  outw 0x22c 0x1004
  outw 0x22c 0x1c
  EOF

  === Stack Trace ===
  A bug was just triggered in audio_calloc
  Save all your work and restart without audio
  I am sorry
  Context:
  Aborted

  #0 raise
  #1 abort
  #2 audio_bug /src/qemu/audio/audio.c:119:9
  #3 audio_calloc /src/qemu/audio/audio.c:154:9
  #4 audio_pcm_sw_alloc_resources_out /src/qemu/audio/audio_template.h:116:15
  #5 audio_pcm_sw_init_out /src/qemu/audio/audio_template.h:175:11
  #6 audio_pcm_create_voice_pair_out /src/qemu/audio/audio_template.h:410:9
  #7 AUD_open_out /src/qemu/audio/audio_template.h:503:14
  #8 continue_dma8 /src/qemu/hw/audio/sb16.c:216:20
  #9 dma_cmd8 /src/qemu/hw/audio/sb16.c:276:5
  #10 command /src/qemu/hw/audio/sb16.c:0
  #11 dsp_write /src/qemu/hw/audio/sb16.c:949:13
  #12 portio_write /src/qemu/softmmu/ioport.c:205:13
  #13 memory_region_write_accessor /src/qemu/softmmu/memory.c:491:5
  #14 access_with_adjusted_size /src/qemu/softmmu/memory.c:552:18
  #15 memory_region_dispatch_write /src/qemu/softmmu/memory.c:0:13
  #16 flatview_write_continue /src/qemu/softmmu/physmem.c:2759:23
  #17 flatview_write /src/qemu/softmmu/physmem.c:2799:14
  #18 address_space_write /src/qemu/softmmu/physmem.c:2891:18
  #19 cpu_outw /src/qemu/softmmu/ioport.c:70:5

  
  OSS-Fuzz Report:
  https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29174

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1910603/+subscriptions




[Bug 1922887] Re: STR in Thumb 32 decode problem

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1922887

Title:
  STR in Thumb 32 decode problem

Status in QEMU:
  Fix Released

Bug description:
  Hi

  It seems that QEMU does not have a proper check on the STR instruction
  in Thumb32 mode.

  Specifically, the machine code is 0xf84f0ddd, which is 0b 1111 1000
  0100 1111 0000 1101 1101 1101.
  This is an STR (immediate, Thumb) instruction with a T4 encoding scheme.

  The fields are:

  Rn = 1111
  Rt = 0000
  P = 1
  U = 0
  W = 1

  The decode ASL is below:

  if P == '1' && U == '1' && W == '0' then SEE STRT;
  if Rn == '1101' && P == '1' && U == '0' && W == '1' && imm8 == '00000100' then SEE PUSH;
  if Rn == '1111' || (P == '0' && W == '0') then UNDEFINED;
  t = UInt(Rt); n = UInt(Rn); imm32 = ZeroExtend(imm8, 32);
  index = (P == '1'); add = (U == '1'); wback = (W == '1');
  if t == 15 || (wback && n == t) then UNPREDICTABLE;

  When Rn == '1111', this should be an UNDEFINED instruction, which
  should raise a SIGILL signal. However, it seems that QEMU does not
  check this constraint, which should be a bug. Many thanks

  Regards
  Muhui

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1922887/+subscriptions




Re: [RFC 05/10] hw/mos6522: Don't clear T1 interrupt flag on latch write

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


The Synertek datasheet says, "A write to T1L-H loads an 8-bit count value
into the latch. A read of T1L-H transfers the contents of the latch to
the data bus. Neither operation has an affect [sic] on the interrupt
flag."

Signed-off-by: Finn Thain 
---
  hw/misc/mos6522.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index c0d6bee4cc..8991f4 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -313,7 +313,6 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
  break;
  case VIA_REG_T1LH:
  s->timers[0].latch = (s->timers[0].latch & 0xff) | (val << 8);
-s->ifr &= ~T1_INT;
  break;
  case VIA_REG_T2CL:
  s->timers[1].latch = (s->timers[1].latch & 0xff00) | val;


Hmmm. The reference document I used for QEMU's 6522 device is at 
http://archive.6502.org/datasheets/mos_6522_preliminary_nov_1977.pdf and according to 
page 6 and the section "Writing the Timer 1 Registers" writing to the high byte of 
the latch does indeed clear the T1 interrupt flag.


Side note: Gary Davidian's excellent CHM video mentions that 6522s obtained from 
different manufacturers had different behaviours, and there are also web pages 
mentioning that 6522s integrated as part of other silicon, e.g. IOSB/CUDA, 
also had their own bugs... :/



ATB,

Mark.



[Bug 1878641] Re: Abort() in mch_update_pciexbar

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1878641

Title:
  Abort() in mch_update_pciexbar

Status in QEMU:
  Fix Released

Bug description:
  Hello,
  I found an input which triggers an abort() in mch_update_pciexbar:

  #0  0x7686d761 in __GI_raise (sig=sig@entry=0x6) at 
../sysdeps/unix/sysv/linux/raise.c:50
  #1  0x7685755b in __GI_abort () at abort.c:79
  #2  0x5705c7ae in mch_update_pciexbar (mch=0x62905920) at 
/home/alxndr/Development/qemu/hw/pci-host/q35.c:324
  #3  0x5705bb6a in mch_write_config (d=0x62905920, address=0x60, 
val=0x8400056e, len=0x4) at /home/alxndr/Development/qemu/hw/pci-host/q35.c:480
  #4  0x570954fb in pci_host_config_write_common 
(pci_dev=0x62905920, addr=0x60, limit=0x100, val=0x8400056e, len=0x4) at 
/home/alxndr/Development/qemu/hw/pci/pci_host.c:81
  #5  0x5709606e in pci_data_write (s=0x61d96080, addr=0xf260, 
val=0x8400056e, len=0x4) at /home/alxndr/Development/qemu/hw/pci/pci_host.c:118
  #6  0x570967d0 in pci_host_data_write (opaque=0x62905200, 
addr=0x0, val=0x8400056e, len=0x4) at 
/home/alxndr/Development/qemu/hw/pci/pci_host.c:165
  #7  0x564938b5 in memory_region_write_accessor (mr=0x62905610, 
addr=0x0, value=0x7fff9c70, size=0x4, shift=0x0, mask=0x, 
attrs=...) at /home/alxndr/Development/qemu/memory.c:483
  #8  0x5649328a in access_with_adjusted_size (addr=0x0, 
value=0x7fff9c70, size=0x4, access_size_min=0x1, access_size_max=0x4, 
access_fn=0x56493360 , mr=0x62905610, 
attrs=...) at /home/alxndr/Development/qemu/memory.c:544
  #9  0x56491df6 in memory_region_dispatch_write (mr=0x62905610, 
addr=0x0, data=0x8400056e, op=MO_32, attrs=...) at 
/home/alxndr/Development/qemu/memory.c:1476
  #10 0x562cbbf4 in flatview_write_continue (fv=0x60633b00, 
addr=0xcfc, attrs=..., ptr=0x7fffa4e0, len=0x4, addr1=0x0, l=0x4, 
mr=0x62905610) at /home/alxndr/Development/qemu/exec.c:3137
  #11 0x562bbad9 in flatview_write (fv=0x60633b00, addr=0xcfc, 
attrs=..., buf=0x7fffa4e0, len=0x4) at 
/home/alxndr/Development/qemu/exec.c:3177
  #12 0x562bb609 in address_space_write (as=0x5968f940 
, addr=0xcfc, attrs=..., buf=0x7fffa4e0, len=0x4) at 
/home/alxndr/Development/qemu/exec.c:3268
  #13 0x56478c0a in cpu_outl (addr=0xcfc, val=0x8400056e) at 
/home/alxndr/Development/qemu/ioport.c:80
  #14 0x5648166f in qtest_process_command (chr=0x59691d00 
, words=0x6039ebf0) at /home/alxndr/Development/qemu/qtest.c:396
  #15 0x5647f187 in qtest_process_inbuf (chr=0x59691d00 
, inbuf=0x6190f680) at /home/alxndr/Development/qemu/qtest.c:710
  #16 0x5647e8b4 in qtest_read (opaque=0x59691d00 , 
buf=0x7fffca40 "outl 0xcf8 0xf260\noutl 0xcfc 0x8400056e\n-M pc-q35-5.0 
-device intel-hda,id=hda0 -device hda-output,bus=hda0.0 -device 
hda-micro,bus=hda0.0 -device hda-duplex,bus=hda0.0 -display none -nodefaults 
-nographic\n\377\377\377\177", size=0xd2) at 
/home/alxndr/Development/qemu/qtest.c:722
  #17 0x579c260c in qemu_chr_be_write_impl (s=0x60f01f30, 
buf=0x7fffca40 "outl 0xcf8 0xf260\noutl 0xcfc 0x8400056e\n-M pc-q35-5.0 
-device intel-hda,id=hda0 -device hda-output,bus=hda0.0 -device 
hda-micro,bus=hda0.0 -device hda-duplex,bus=hda0.0 -display none -nodefaults 
-nographic\n\377\377\377\177", len=0xd2) at 
/home/alxndr/Development/qemu/chardev/char.c:183
  #18 0x579c275b in qemu_chr_be_write (s=0x60f01f30, 
buf=0x7fffca40 "outl 0xcf8 0xf260\noutl 0xcfc 0x8400056e\n-M pc-q35-5.0 
-device intel-hda,id=hda0 -device hda-output,bus=hda0.0 -device 
hda-micro,bus=hda0.0 -device hda-duplex,bus=hda0.0 -display none -nodefaults 
-nographic\n\377\377\377\177", len=0xd2) at 
/home/alxndr/Development/qemu/chardev/char.c:195
  #19 0x579cb97a in fd_chr_read (chan=0x608026a0, cond=G_IO_IN, 
opaque=0x60f01f30) at /home/alxndr/Development/qemu/chardev/char-fd.c:68
  #20 0x57a530ea in qio_channel_fd_source_dispatch 
(source=0x60c2ef00, callback=0x579cb540 , 
user_data=0x60f01f30) at /home/alxndr/Development/qemu/io/channel-watch.c:84
  #21 0x77ca8898 in g_main_context_dispatch () at 
/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
  #22 0x57c10b85 in glib_pollfds_poll () at 
/home/alxndr/Development/qemu/util/main-loop.c:219
  #23 0x57c0f57e in os_host_main_loop_wait (timeout=0x0) at 
/home/alxndr/Development/qemu/util/main-loop.c:242
  #24 0x57c0f177 in main_loop_wait (nonblocking=0x0) at 
/home/alxndr/Development/qemu/util/main-loop.c:518
  #25 0x5689fd1e in qemu_main_loop () at 
/home/alxndr/Development/qemu/softmmu/vl.c:1664
  #26 0x57a6a29d in main (argc=0x17, argv=0x7fffe148, 
env

[Bug 1897568] Re: Strange keyboard behaviour in Vim editor

2021-08-25 Thread Thomas Huth
Felix, if you want to discuss the default behaviour, please get in touch with 
the author of the patch, since he might not read this bug tracker here.
Anyway, the patch has been released with QEMU 6.1, so I'm closing this ticket 
here now.

** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1897568

Title:
  Strange keyboard behaviour in Vim editor

Status in QEMU:
  Fix Released

Bug description:
  
  I'm running MS-DOS 7.10 in a QEMU virtual machine, and there is a problem 
with the keyboard in the Vim editor.  The arrow keys jump over a line, as if 
you had typed the key twice.  PgUp and PgDn are likewise affected.  Other 
applications are not affected, unless you shell out from Vim.

  The QEMU version is 5.0.0, and I'm using the "-k sv" option, but I've
  tried without it and it doesn't make a difference.

  I don't get this keyboard behaviour in the exact same VM under VMware
  Player or Bochs.

  -Albert.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1897568/+subscriptions




[Bug 1620660] Re: man page is missing suboptions for "-display"

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Expired => Fix Released

https://bugs.launchpad.net/bugs/1620660

Title:
  man page is missing suboptions for "-display"

Status in QEMU:
  Fix Released

Bug description:
  Some of the display options have suboptions, for example:

  > -display gtk[,grab_on_hover=on|off][,gl=on|off]

  None of these suboptions are currently documented in qemu-options.hx
  (checked git@f04ec5a)





[Bug 1923497] Re: bios_linker_loader_add_checksum: Assertion `start_offset < file->blob->len' failed

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1923497

Title:
  bios_linker_loader_add_checksum: Assertion `start_offset <
  file->blob->len' failed

Status in QEMU:
  Fix Released

Bug description:
  Trying boot/start a Windows 10 VM.  Worked until recently when this
  error started showing up.

  I have the following installed on Fedora 33:
  qemu-kvm-5.1.0-9.fc33.x86_64

  This is the error:

  Error starting domain: internal error: process exited while connecting
  to monitor: qemu-system-x86_64:
  /builddir/build/BUILD/qemu-5.1.0/hw/acpi/bios-linker-loader.c:239:
  bios_linker_loader_add_checksum: Assertion `start_offset <
  file->blob->len' failed.

  Traceback (most recent call last):
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 65, in 
cb_wrapper
  callback(asyncjob, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/asyncjob.py", line 101, in tmpcb
  callback(*args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 
57, in newfn
  ret = fn(self, *args, **kwargs)
File "/usr/share/virt-manager/virtManager/object/domain.py", line 1329, in 
startup
  self._backend.create()
File "/usr/lib64/python3.9/site-packages/libvirt.py", line 1234, in create
  if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
  libvirt.libvirtError: internal error: process exited while connecting to 
monitor: qemu-system-x86_64: 
/builddir/build/BUILD/qemu-5.1.0/hw/acpi/bios-linker-loader.c:239: 
bios_linker_loader_add_checksum: Assertion `start_offset < file->blob->len' 
failed.

  I see this were referenced in a patch from some time ago and
  supposedly fixed.  Here is the patch info I was able to find:

  http://next.patchew.org/QEMU/1515677902-23436-1-git-send-email-
  peter.mayd...@linaro.org/1515677902-23436-10-git-send-email-
  peter.mayd...@linaro.org/





[Bug 1895363] Re: borland IDEs double up cursor key presses (need timing on PS2 port input)

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1895363

Title:
  borland IDEs double up cursor key presses (need timing on PS2 port
  input)

Status in QEMU:
  Fix Released

Bug description:
  Most DOS-era IDEs from Borland (I have tried Borland C++ 2.0, Borland
  C++ 3.1 and Turbo Pascal 7.1) exhibit strange responses to the
  keyboard.  Cursor keys are registered twice, so each press of a cursor
  key causes the cursor to move twice. Also the other keys occasionally
  are missed or duplicated.

  From an internet search, the problem appears to be this.  These
  programs read the PS2 input register multiple times per incoming byte,
  on the assumption that the byte will remain there for at least a few
  hundred microseconds, before the next byte (if any) appears there.
  qemu treats a read of the register by the guest as an acknowledgement
  of the incoming byte and puts the next byte into the register
  immediately, thus breaking the programs that expect each successive
  byte to stay in place for a while.

  The obvious solution is to use a timer to advance through the queued
  bytes.
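The timer-based approach can be sketched as follows. This is a hypothetical illustration with an assumed dwell time, not QEMU's actual PS/2 code: a read returns the head byte of the queue, and the queue only advances once that byte has been visible for a minimum interval.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch of the proposed fix, not QEMU code.  The head
 * byte stays readable for at least DWELL_US microseconds (value is an
 * assumption) before the queue advances, so a guest that polls the
 * port repeatedly sees a stable value for a while. */
#define DWELL_US 700

typedef struct {
    uint8_t buf[16];
    int head, count;
    uint64_t head_since_us;   /* when the current head byte became visible */
} PS2Queue;

static void ps2_push(PS2Queue *q, uint8_t b, uint64_t now_us)
{
    if (q->count == 0) {
        q->head_since_us = now_us;
    }
    q->buf[(q->head + q->count++) % 16] = b;
}

static uint8_t ps2_read(PS2Queue *q, uint64_t now_us)
{
    /* Advance only after the head byte has been readable long enough. */
    if (q->count > 1 && now_us - q->head_since_us >= DWELL_US) {
        q->head = (q->head + 1) % 16;
        q->count--;
        q->head_since_us = now_us;
    }
    return q->buf[q->head];
}
```

Repeated reads within the dwell window return the same byte, which is the behaviour the Borland IDEs depend on.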





[Bug 1914117] Re: Short files returned via FTP on Qemu with various architectures and OSes

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1914117

Title:
  Short files returned via FTP on Qemu with various architectures and
  OSes

Status in QEMU:
  Fix Released

Bug description:
  
  Qemu 5.2 on Mac OS X Big Sur.

  I originally thought that it might be caused by the home-brew version of 
Qemu, but this evening I have removed the brew edition and compiled from 
scratch (using Ninja & Xcode compiler).
  Still getting the same problem.

  On the following architectures: 
  arm64, amd64 and sometimes i386 running NetBSD host OS; 
  i386 running OpenBSD host OS:

  I have seen a consistent problem with FTP returning short files. The
  file will be a couple of bytes too short. I do not believe this is a
  problem with the OS. Downloading the perl source code from CPAN does
  not work properly, nor does downloading bind from isc. I've tried this
  on different architectures as above.

  (Qemu 4.2 on Ubuntu/x86_64 with NetBSD/i386 seems to function fine. My
  gut feel is there is something not right on the Mac OS version of Qemu
  or a bug in 5.2 - obviously in the network layer somewhere. If you
  have anything you want me to try, please let me know - happy to help
  get a resolution.)





[Bug 1923629] Re: RISC-V Vector Instruction vssub.vv not saturating

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1923629

Title:
  RISC-V Vector Instruction vssub.vv not saturating

Status in QEMU:
  Fix Released

Bug description:
  I noticed doing a negate ( 0 - 0x8000 ) using vssub.vv produces an
  incorrect result of 0x8000 (should saturate to 0x7FFF).

  Here is the bit of the code:

vmv.v.i v16, 0
…
  8f040457  vssub.vv v8,v16,v8

  I believe the instruction encoding is correct (vssub.vv with vd = v8,
  vs2 = v16, rs1 = v8), but the result does not saturate in QEMU.

  I’ve just tested with what I think is the latest branch (
  https://github.com/sifive/qemu/tree/rvv-1.0-upstream-v7 commit 26 Feb
  2021: 1151361fa7d45cc90d69086ccf1a4d8397931811 ) and the problem still
  exists.





[Bug 1923583] Re: colo: pvm flush failed after svm killed

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1923583

Title:
  colo: pvm flush failed after svm killed

Status in QEMU:
  Fix Released

Bug description:
  Hi,
 Primary VM flush fails after the SVM is killed, which leaves the primary
VM's guest filesystem unavailable.

  qemu version: 5.2.0
  host/guest os: CentOS Linux release 7.6.1810 (Core)

  Reproduce steps:
  1. create colo vm following 
https://github.com/qemu/qemu/blob/master/docs/COLO-FT.txt
  2. Kill the secondary VM (don't remove the nbd child from quorum on the 
primary VM) and wait for a minute; the interval depends on the guest OS.
  Result: the primary VM's file system shuts down because of a flush cache error.

  After several tests, I found that qemu-5.0.0 worked well, and it is commit
  https://git.qemu.org/?p=qemu.git;a=commit;h=883833e29cb800b4d92b5d4736252f4004885191
  (block: Flush all children in generic code) that introduced this change;
  both virtio-blk and ide turned out to be affected.

  I think the failed nbd (replication) flush causes bdrv_co_flush(quorum_bs) to 
fail; here is the call stack.
  #0  bdrv_co_flush (bs=0x56242b3cc0b0=nbd_bs) at ../block/io.c:2856
  #1  0x562428b0f399 in bdrv_co_flush (bs=0x56242b3c7e00=replication_bs) at 
../block/io.c:2920
  #2  0x562428b0f399 in bdrv_co_flush (bs=0x56242a4ad800=quorum_bs) at 
../block/io.c:2920
  #3  0x562428b70d56 in blk_do_flush (blk=0x56242a4ad4a0) at 
../block/block-backend.c:1672
  #4  0x562428b70d87 in blk_aio_flush_entry (opaque=0x7fd0980073f0) at 
../block/block-backend.c:1680
  #5  0x562428c5f9a7 in coroutine_trampoline (i0=-1409269904, i1=32721) at 
../util/coroutine-ucontext.c:173

  However, I am not sure whether I am using COLO improperly. Can we assume
  that the nbd child of quorum is removed immediately after the SVM crashes?
  Or is it really a bug? Does the following patch fix it? Help is needed!
  Thanks a lot!

  diff --git a/block/quorum.c b/block/quorum.c
  index cfc1436..f2c0805 100644
  --- a/block/quorum.c
  +++ b/block/quorum.c
  @@ -1279,7 +1279,7 @@ static BlockDriver bdrv_quorum = {
   .bdrv_dirname   = quorum_dirname,
   .bdrv_co_block_status   = quorum_co_block_status,
   
  -.bdrv_co_flush_to_disk  = quorum_co_flush,
  +.bdrv_co_flush  = quorum_co_flush,
   
   .bdrv_getlength = quorum_getlength,





[Bug 1890159] Re: Assertion failure in net_tx_pkt_add_raw_fragment through vmxnet3

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1890159

Title:
  Assertion failure in net_tx_pkt_add_raw_fragment through vmxnet3

Status in QEMU:
  Fix Released

Bug description:
  Hello,
  Reproducer:

  cat << EOF | ./i386-softmmu/qemu-system-i386 \
  -device vmxnet3 -m 64 -nodefaults -qtest stdio -nographic
  outl 0xcf8 0x80001010
  outl 0xcfc 0xe000
  outl 0xcf8 0x80001014
  outl 0xcfc 0xe0001000
  outl 0xcf8 0x80001018
  outl 0xcf8 0x80001001
  outl 0xcfc 0x3fff3fff
  outl 0xcf8 0x80001016
  outl 0xcfc 0x5c84ff00
  outl 0xcf8 0x800010ff
  write 0x0 0x1 0xe1
  write 0x1 0x1 0xfe
  write 0x2 0x1 0xbe
  write 0x3 0x1 0xba
  writeq 0xff001020 0xef0bff5ecafe
  writel 0xe605 0xa7ff845e
  EOF

  ==
  qemu-system-i386: hw/net/net_tx_pkt.c:382: _Bool 
net_tx_pkt_add_raw_fragment(struct NetTxPkt *, hwaddr, size_t): Assertion 
`pkt->max_raw_frags > pkt->raw_frags' failed.
  Aborted

  
  #9 0x5607db7efdc0 in net_tx_pkt_add_raw_fragment 
/home/alxndr/Development/qemu/general-fuzz/hw/net/net_tx_pkt.c:382:5
  #10 0x5607db902ef0 in vmxnet3_process_tx_queue 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:653:18
  #11 0x5607db9021db in vmxnet3_io_bar0_write 
/home/alxndr/Development/qemu/general-fuzz/hw/net/vmxnet3.c:1097:9
  #12 0x5607da41f193 in memory_region_write_accessor 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:483:5
  #13 0x5607da41e637 in access_with_adjusted_size 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:544:18
  #14 0x5607da41c256 in memory_region_dispatch_write 
/home/alxndr/Development/qemu/general-fuzz/softmmu/memory.c:1466:16
  #15 0x5607d97cd4a6 in flatview_write_continue 
/home/alxndr/Development/qemu/general-fuzz/exec.c:3176:23

  -Alex





[Bug 1892081] Re: Performance improvement when using "QEMU_FLATTEN" with softfloat type conversions

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1892081

Title:
  Performance improvement when using "QEMU_FLATTEN" with softfloat type
  conversions

Status in QEMU:
  Fix Released

Bug description:
  Attached below is a matrix multiplication program for double data
  types. The program performs the casting operation "(double)rand()"
  when generating random numbers.

  This operation calls the integer to float softfloat conversion
  function "int32_to_float_64".

  Adding the "QEMU_FLATTEN" attribute to the function definition
  decreases the instructions per call of the function by about 63%.

  Attached are before and after performance screenshots from
  KCachegrind.
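The shape of the change is roughly the following sketch. QEMU_FLATTEN expands to GCC's flatten attribute (the real definition lives in QEMU's compiler header); the helper below is a hypothetical stand-in, not QEMU's softfloat code:

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of the reported optimisation: GCC's flatten attribute forces
 * all callees to be inlined into the annotated function, which is what
 * reduced instructions per call of int32_to_float64 by ~63%. */
#define QEMU_FLATTEN __attribute__((flatten))

static double scale(int32_t a)
{
    return (double)a * 2.0;     /* hypothetical helper */
}

QEMU_FLATTEN double convert_and_scale(int32_t a)
{
    return scale(a);            /* scale() is inlined into this body */
}
```

Flattening trades code size for fewer call frames on a hot path, which is why it pays off for a conversion routine invoked once per cast.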





[Bug 1905444] Re: [OSS-Fuzz] Issue 27796 in oss-fuzz: qemu:qemu-fuzz-i386-target-generic-fuzz-xhci: Stack-overflow in address_space_stl_internal

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1905444

Title:
  [OSS-Fuzz] Issue 27796 in oss-fuzz: qemu:qemu-fuzz-i386-target-
  generic-fuzz-xhci: Stack-overflow in address_space_stl_internal

Status in QEMU:
  Fix Released

Bug description:
   affects qemu

  OSS-Fuzz Report: https://bugs.chromium.org/p/oss-
  fuzz/issues/detail?id=27796

  === Reproducer (build with --enable-sanitizers) ===
  cat << EOF | ./qemu-system-i386 -display none  -machine accel=qtest, \
  -m 512M -machine q35 -nodefaults \
  -drive file=null-co://,if=none,format=raw,id=disk0 \
  -device qemu-xhci,id=xhci -device usb-tablet,bus=xhci.0 \
  -qtest-log none -qtest stdio
  outl 0xcf8 0x8803
  outw 0xcfc 0x5e46
  outl 0xcf8 0x8810
  outl 0xcfc 0xff5a5e46
  write 0xff5a5020 0x6 0x0b70
  outl 0xcf8 0x8893
  outb 0xcfc 0x93
  writel 0xff5a7000 0xff5a5020
  write 0xff5a700c 0x4 0x0c0c2e58
  write 0xff5a4040 0x4 0x00d26001
  write 0xff5a4044 0x4 0x030
  EOF

  === Stack Trace ===
  ==50473==ERROR: AddressSanitizer: stack-overflow on address 0x7ffe3ec97e28 
(pc 0x55e292eac159 bp 0x7ffe3ec98670 sp 0x7ffe3ec97e30 T0)
  #0 0x55e292eac159 in __asan_memcpy (u-system-i386+0x2a0e159)
  #1 0x55e2944bc04e in flatview_do_translate softmmu/physmem.c:513:12
  #2 0x55e2944dbe90 in flatview_translate softmmu/physmem.c:563:15
  #3 0x55e2944dbe90 in address_space_translate include/exec/memory.h:2362:12
  #4 0x55e2944dbe90 in address_space_stl_internal memory_ldst.c.inc:316:10
  #5 0x55e29393d2a0 in xhci_intr_update hw/usb/hcd-xhci.c:554:13
  #6 0x55e29393efb9 in xhci_runtime_write hw/usb/hcd-xhci.c:3032:9
  #7 0x55e294230428 in memory_region_write_accessor softmmu/memory.c:484:5
  #8 0x55e29422fe63 in access_with_adjusted_size softmmu/memory.c:545:18
  #9 0x55e29422f6fc in memory_region_dispatch_write softmmu/memory.c
  #10 0x55e2944dc03c in address_space_stl_internal memory_ldst.c.inc:319:13
  #11 0x55e29393d2a0 in xhci_intr_update hw/usb/hcd-xhci.c:554:13
  #12 0x55e29393efb9 in xhci_runtime_write hw/usb/hcd-xhci.c:3032:9
  #13 0x55e294230428 in memory_region_write_accessor softmmu/memory.c:484:5
  #14 0x55e29422fe63 in access_with_adjusted_size softmmu/memory.c:545:18
  #15 0x55e29422f6fc in memory_region_dispatch_write softmmu/memory.c
  #16 0x55e2944dc03c in address_space_stl_internal memory_ldst.c.inc:319:13
  #17 0x55e29393d2a0 in xhci_intr_update hw/usb/hcd-xhci.c:554:13
  #18 0x55e29393efb9 in xhci_runtime_write hw/usb/hcd-xhci.c:3032:9





[Bug 1880763] Re: Missing page crossing check in use_goto_tb() for rx target

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1880763

Title:
  Missing page crossing check in use_goto_tb() for rx target

Status in QEMU:
  Fix Released

Bug description:
  Currently the rx target doesn't have the page crossing check in its 
  use_goto_tb() function. 
  This is a required feature for stable system mode emulations that all 
  other targets implement.
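The check the other targets implement has this general shape (a sketch; the real code in each target's translate.c compares against TARGET_PAGE_MASK):

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Sketch of the page-crossing check in use_goto_tb(): only allow a
 * direct-jump TB chain when the destination lies on the same guest
 * page as the start of the current TB, so per-page invalidation can
 * never leave a stale direct jump behind. */
static bool use_goto_tb(uint64_t tb_pc, uint64_t dest, uint64_t page_mask)
{
    return (tb_pc & page_mask) == (dest & page_mask);
}
```

Without this check, a chained jump into a page that is later modified or unmapped keeps executing stale translated code, which is why system-mode emulation needs it.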





Re: [PATCH v4] block/file-win32: add reopen handlers

2021-08-25 Thread Hanna Reitz

On 25.08.21 01:48, Viktor Prutyanov wrote:

Make 'qemu-img commit' work on Windows.

Command 'commit' requires reopening backing file in RW mode. So,
add reopen prepare/commit/abort handlers and change dwShareMode
for CreateFile call in order to allow further read/write reopening.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418

Suggested-by: Hanna Reitz 
Signed-off-by: Viktor Prutyanov 
Tested-by: Helge Konetzka 
---
  v2:
 - fix indentation in raw_reopen_prepare
 - free rs if raw_reopen_prepare fails
  v3:
 - restore suggested-by field missed in v2
  v4:
 - add file type check
 - add comment about options
 - replace rs check with assert in raw_reopen_commit

  block/file-win32.c | 100 -
  1 file changed, 99 insertions(+), 1 deletion(-)

diff --git a/block/file-win32.c b/block/file-win32.c
index 2642088bd6..8320495f2b 100644
--- a/block/file-win32.c
+++ b/block/file-win32.c


[...]


@@ -634,6 +638,96 @@ static int coroutine_fn raw_co_create_opts(BlockDriver 
*drv,
  return raw_co_create(&options, errp);
  }
  
+static int raw_reopen_prepare(BDRVReopenState *state,

+  BlockReopenQueue *queue, Error **errp)
+{
+BDRVRawState *s = state->bs->opaque;
+BDRVRawReopenState *rs;
+int access_flags;
+DWORD overlapped;
+int ret = 0;
+
+if (s->type != FTYPE_FILE) {
+error_setg(errp, "Can only reopen files");
+return -EINVAL;
+}
+
+rs = g_new0(BDRVRawReopenState, 1);
+
+/*
+ * We do not support changing any options (only flags). By leaving
+ * all options in state->options, we tell the generic reopen code
+ * that we do not support changing any of them, so it will verify
+ * that their values did not change.
+ */
+
+raw_parse_flags(state->flags, s->aio != NULL, &access_flags, &overlapped);
+rs->hfile = CreateFile(state->bs->filename, access_flags,
+   FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
+   OPEN_EXISTING, overlapped, NULL);
+
+if (rs->hfile == INVALID_HANDLE_VALUE) {
+int err = GetLastError();
+
+error_setg_win32(errp, err, "Could not reopen '%s'",
+ state->bs->filename);
+if (err == ERROR_ACCESS_DENIED) {
+ret = -EACCES;
+} else {
+ret = -EINVAL;
+}
+goto fail;
+}
+
+if (s->aio) {
+ret = win32_aio_attach(s->aio, rs->hfile);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "Could not enable AIO");
+goto fail;


I believe if we fail here, we’ve already opened rs->hfile, so we must 
close it or we’d leak it.


(Sorry I missed this in my v3 review :/)

Hanna


+}
+}
+
+state->opaque = rs;
+
+return 0;
+
+fail:
+g_free(rs);
+state->opaque = NULL;
+
+return ret;
+}





Re: [PATCH 1/2] hw/arm/virt: Rename default_bus_bypass_iommu

2021-08-25 Thread Markus Armbruster
Markus Armbruster  writes:

> Did this series fall through the cracks for 6.1?

Missed 6.1.  What now?

> Jean-Philippe Brucker  writes:
>
>> Since commit d8fb7d0969d5 ("vl: switch -M parsing to keyval"), machine
>> parameter definitions cannot use underscores, because keyval_dashify()
>> transforms them to dashes and the parser doesn't find the parameter.
>>
>> This affects option default_bus_bypass_iommu which was introduced in the
>> same release:
>>
>> $ qemu-system-aarch64 -M virt,default_bus_bypass_iommu=on
>> qemu-system-aarch64: Property 'virt-6.1-machine.default-bus-bypass-iommu' 
>> not found
>>
>> Rename the parameter to "default-bus-bypass-iommu". Passing
>> "default_bus_bypass_iommu" is still valid since the underscores are
>> transformed automatically.
>>
>> Fixes: 6d7a85483a06 ("hw/arm/virt: Add default_bus_bypass_iommu machine 
>> option")
>> Signed-off-by: Jean-Philippe Brucker 
>> ---
>>  hw/arm/virt.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>> index b4598d3fe6..7075cdc15e 100644
>> --- a/hw/arm/virt.c
>> +++ b/hw/arm/virt.c
>> @@ -2671,10 +2671,10 @@ static void virt_machine_class_init(ObjectClass *oc, 
>> void *data)
>>"Set the IOMMU type. "
>>"Valid values are none and 
>> smmuv3");
>>  
>> -object_class_property_add_bool(oc, "default_bus_bypass_iommu",
>> +object_class_property_add_bool(oc, "default-bus-bypass-iommu",
>> virt_get_default_bus_bypass_iommu,
>> virt_set_default_bus_bypass_iommu);
>> -object_class_property_set_description(oc, "default_bus_bypass_iommu",
>> +object_class_property_set_description(oc, "default-bus-bypass-iommu",
>>"Set on/off to enable/disable "
>>"bypass_iommu for default root 
>> bus");




[Bug 1761798] Re: live migration intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

2021-08-25 Thread Thomas Huth
Is this still happening with the latest release?

** Changed in: nova
   Status: Confirmed => Incomplete

https://bugs.launchpad.net/bugs/1761798

Title:
  live migration intermittently fails in CI with "VQ 0 size 0x80 Guest
  index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

Status in OpenStack Compute (nova):
  Incomplete
Status in QEMU:
  Incomplete

Bug description:
  Seen here:

  http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-
  multinode-live-
  migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-0002.txt.gz

  2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev 
pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to 
/dev/pts/0 (label charserial0)
  warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
  2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 
0x12c inconsistent with Host index 0x134: delta 0xfff8
  2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load 
virtio-blk:virtio
  2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for 
instance 0x0 of device ':00:04.0/virtio-blk'
  2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: 
Operation not permitted
  2018-04-05 21:48:43.198+: shutting down, reason=crashed

  And in the n-cpu logs on the other host:

  http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-
  multinode-live-migration/8de6e74/logs/screen-n-
  cpu.txt.gz#_Apr_05_21_48_43_257541

  There is a related Red Hat bug:

  https://bugzilla.redhat.com/show_bug.cgi?id=1450524

  The CI job failures are at present using the Pike UCA:

  ii  libvirt-bin 3.6.0-1ubuntu6.2~cloud0

  ii  qemu-system-x86 1:2.10+dfsg-0ubuntu3.5~cloud0





[Bug 1705118] Re: qemu user mode: rt signals not implemented for sparc guests

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1705118

Title:
  qemu user mode: rt signals not implemented for sparc guests

Status in QEMU:
  Fix Released

Bug description:
  The documentation
   says that
  qemu in user mode supports POSIX signal handling.

  Catching SIGSEGV according to POSIX, however, does not work on
ppc, ppc64, ppc64le, s390x, sparc64.
  It does work, however, on
aarch64, alpha, arm, hppa, m68k, mips, mips64, sh4.

  How to reproduce:
  The attached program runs fine (exits with code 0) on
- real hardware Linux/PowerPC64 (in 32-bit and 64-bit mode),
- real hardware Linux/PowerPC64LE,
- qemu-system-s390x emulated Linux/s390x,
- real hardware Linux/SPARC64.
  $ gcc -O -Wall testsigsegv.c; ./a.out; echo $?
  0

  For ppc:
  $ powerpc-linux-gnu-gcc-5 -O -Wall -static testsigsegv.c -o testsigsegv-ppc
  $ ~/inst-qemu/2.9.0/bin/qemu-ppc testsigsegv-ppc
  $ echo $?
  3

  For ppc64:
  $ powerpc64-linux-gnu-gcc-5 -O -Wall -static testsigsegv.c -o 
testsigsegv-ppc64
  $ ~/inst-qemu/2.9.0/bin/qemu-ppc64 testsigsegv-ppc64
  $ echo $?
  3

  For ppc64le:
  $ powerpc64le-linux-gnu-gcc-5 -O -Wall -static testsigsegv.c -o 
testsigsegv-ppc64le
  $ ~/inst-qemu/2.9.0/bin/qemu-ppc64le testsigsegv-ppc64le
  $ echo $?
  3

  For s390x:
  $ s390x-linux-gnu-gcc-5 -O -Wall -static testsigsegv.c -o testsigsegv-s390x
  $ ~/inst-qemu/2.9.0/bin/qemu-s390x testsigsegv-s390x
  $ echo $?
  3
  $ s390x-linux-gnu-gcc-5 -O -Wall -static testsigsegv.c 
-DAVOID_LINUX_S390X_COMPAT -o testsigsegv-s390x-a
  $ ~/inst-qemu/2.9.0/bin/qemu-s390x testsigsegv-s390x-a
  $ echo $?
  0
  So, the test fails here because the Linux/s390x kernel omits the least
  significant 12 bits of the fault address in the 'si_addr' field. But
  qemu-s390x is not compatible with the Linux/s390x behaviour: it puts
  the complete fault address in the 'si_addr' field.

  For sparc64:
  $ sparc64-linux-gnu-gcc-5 -O -Wall -static testsigsegv.c -o 
testsigsegv-sparc64
  $ ~/inst-qemu/2.9.0/bin/qemu-sparc64 testsigsegv-sparc64
  Segmentation fault (core dumped)





[PATCH v3 1/2] sev/i386: Introduce sev_add_kernel_loader_hashes for measured linux boot

2021-08-25 Thread Dov Murik
Add the sev_add_kernel_loader_hashes function to calculate the hashes of
the kernel/initrd/cmdline and fill a designated OVMF encrypted hash
table area.  For this to work, OVMF must support an encrypted area to
place the data which is advertised via a special GUID in the OVMF reset
table.

The hash of each of the files is calculated (or of the cmdline string
with its trailing '\0' included).  Each entry in the hashes
table is GUID identified and since they're passed through the
sev_encrypt_flash interface, the hashes will be accumulated by the PSP
measurement (SEV_LAUNCH_MEASURE).
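As a sanity check on the layout added below: each GUID-identified hash entry is packed to 16 + 2 + 32 = 50 bytes (a QemuUUID is a 16-byte UUID). The sketch replicates the entry with plain types so the fixed offsets are visible:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Stand-alone replica of the packed SevHashTableEntry from the patch;
 * the guid array stands in for QemuUUID (16 bytes on the wire). */
#define HASH_SIZE 32   /* hard-coded sha256 digest size, as in the patch */

typedef struct __attribute__((packed)) {
    uint8_t  guid[16];
    uint16_t len;
    uint8_t  hash[HASH_SIZE];
} SevHashTableEntry;
```

Packing matters because the guest firmware reads these fields at fixed offsets from the advertised base address, so no compiler padding may be inserted.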

Co-developed-by: James Bottomley 
Signed-off-by: James Bottomley 
Signed-off-by: Dov Murik 
Reviewed-by: Connor Kuehl 
---
 target/i386/sev_i386.h |  12 
 target/i386/sev-stub.c |   5 ++
 target/i386/sev.c  | 137 +
 3 files changed, 154 insertions(+)

diff --git a/target/i386/sev_i386.h b/target/i386/sev_i386.h
index ae6d840478..deb3eec409 100644
--- a/target/i386/sev_i386.h
+++ b/target/i386/sev_i386.h
@@ -28,6 +28,17 @@
 #define SEV_POLICY_DOMAIN   0x10
 #define SEV_POLICY_SEV  0x20
 
+typedef struct KernelLoaderContext {
+char *setup_data;
+size_t setup_size;
+char *kernel_data;
+size_t kernel_size;
+char *initrd_data;
+size_t initrd_size;
+char *cmdline_data;
+size_t cmdline_size;
+} KernelLoaderContext;
+
 extern bool sev_es_enabled(void);
 extern uint64_t sev_get_me_mask(void);
 extern SevInfo *sev_get_info(void);
@@ -37,5 +48,6 @@ extern char *sev_get_launch_measurement(void);
 extern SevCapability *sev_get_capabilities(Error **errp);
 extern SevAttestationReport *
 sev_get_attestation_report(const char *mnonce, Error **errp);
+extern bool sev_add_kernel_loader_hashes(KernelLoaderContext *ctx, Error 
**errp);
 
 #endif
diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
index 0227cb5177..addb089f36 100644
--- a/target/i386/sev-stub.c
+++ b/target/i386/sev-stub.c
@@ -81,3 +81,8 @@ sev_get_attestation_report(const char *mnonce, Error **errp)
 error_setg(errp, "SEV is not available in this QEMU");
 return NULL;
 }
+
+bool sev_add_kernel_loader_hashes(KernelLoaderContext *ctx, Error **errp)
+{
+g_assert_not_reached();
+}
diff --git a/target/i386/sev.c b/target/i386/sev.c
index 83df8c09f6..857d75bd3e 100644
--- a/target/i386/sev.c
+++ b/target/i386/sev.c
@@ -23,6 +23,7 @@
 #include "qemu/base64.h"
 #include "qemu/module.h"
 #include "qemu/uuid.h"
+#include "crypto/hash.h"
 #include "sysemu/kvm.h"
 #include "sev_i386.h"
 #include "sysemu/sysemu.h"
@@ -83,6 +84,32 @@ typedef struct __attribute__((__packed__)) SevInfoBlock {
 uint32_t reset_addr;
 } SevInfoBlock;
 
+#define SEV_HASH_TABLE_RV_GUID  "7255371f-3a3b-4b04-927b-1da6efa8d454"
+typedef struct QEMU_PACKED SevHashTableDescriptor {
+/* SEV hash table area guest address */
+uint32_t base;
+/* SEV hash table area size (in bytes) */
+uint32_t size;
+} SevHashTableDescriptor;
+
+/* hard code sha256 digest size */
+#define HASH_SIZE 32
+
+typedef struct QEMU_PACKED SevHashTableEntry {
+QemuUUID guid;
+uint16_t len;
+uint8_t hash[HASH_SIZE];
+} SevHashTableEntry;
+
+typedef struct QEMU_PACKED SevHashTable {
+QemuUUID guid;
+uint16_t len;
+SevHashTableEntry cmdline;
+SevHashTableEntry initrd;
+SevHashTableEntry kernel;
+uint8_t padding[];
+} SevHashTable;
+
 static SevGuestState *sev_guest;
 static Error *sev_mig_blocker;
 
@@ -1077,6 +1104,116 @@ int sev_es_save_reset_vector(void *flash_ptr, uint64_t 
flash_size)
 return 0;
 }
 
+static const QemuUUID sev_hash_table_header_guid = {
+.data = UUID_LE(0x9438d606, 0x4f22, 0x4cc9, 0xb4, 0x79, 0xa7, 0x93,
+0xd4, 0x11, 0xfd, 0x21)
+};
+
+static const QemuUUID sev_kernel_entry_guid = {
+.data = UUID_LE(0x4de79437, 0xabd2, 0x427f, 0xb8, 0x35, 0xd5, 0xb1,
+0x72, 0xd2, 0x04, 0x5b)
+};
+static const QemuUUID sev_initrd_entry_guid = {
+.data = UUID_LE(0x44baf731, 0x3a2f, 0x4bd7, 0x9a, 0xf1, 0x41, 0xe2,
+0x91, 0x69, 0x78, 0x1d)
+};
+static const QemuUUID sev_cmdline_entry_guid = {
+.data = UUID_LE(0x97d02dd8, 0xbd20, 0x4c94, 0xaa, 0x78, 0xe7, 0x71,
+0x4d, 0x36, 0xab, 0x2a)
+};
+
+/*
+ * Add the hashes of the linux kernel/initrd/cmdline to an encrypted guest page
+ * which is included in SEV's initial memory measurement.
+ */
+bool sev_add_kernel_loader_hashes(KernelLoaderContext *ctx, Error **errp)
+{
+uint8_t *data;
+SevHashTableDescriptor *area;
+SevHashTable *ht;
+uint8_t cmdline_hash[HASH_SIZE];
+uint8_t initrd_hash[HASH_SIZE];
+uint8_t kernel_hash[HASH_SIZE];
+uint8_t *hashp;
+size_t hash_len = HASH_SIZE;
+int aligned_len;
+
+if (!pc_system_ovmf_table_find(SEV_HASH_TABLE_RV_GUID, &data, NULL)) {
error_setg(errp, "SEV: kernel specified but OVMF has no hash table guid");
+return false;
+}
+area = (

[Bug 1585840] Re: multiprocess program gets incorrect results with qemu arm-linux-user

2021-08-25 Thread Thomas Huth
** Changed in: qemu
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1585840

Title:
  multiprocess program gets incorrect results with qemu arm-linux-user

Status in QEMU:
  Fix Released

Bug description:
  The attached program can run either in a threaded mode or a
  multiprocess mode.  It defaults to threaded mode, and switches to
  multiprocess mode if the first positional argument is "process".
  "success" of the test is defined as the final count being seen as
  200 by both tasks.

  In standard linux x86_64 userspace (i7, 4 cores) and in standard armhf
  userspace (4 cores), the test program consistently completes
  successfully in both modes.  But with qemu arm-linux-user, the test
  consistently succeeds in threaded mode and generally fails in
  multiprocess mode.

  The test reflects an essential aspect of how the Free and Open Source
  project linuxcnc's IPC system works: shared memory regions (created by
  shmat, but mmap would probably behave the same) contain data and
  mutexes.  I observed that our testsuite encounters numerous deadlocks
  and failures when running in an schroot with qemu-user (x86_64 host),
  and I believe the underlying cause is improper support for atomic
  operations in a multiprocess model. (the testsuite consistently passes
  on real hardware)

  I observed the same failure at v1.6.0 and master
  (v2.6.0-424-g287db79), as well as in the outdated Debian version
  1:2.1+dfsg-12+deb8u5a.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1585840/+subscriptions




[PATCH v3 2/2] x86/sev: generate SEV kernel loader hashes in x86_load_linux

2021-08-25 Thread Dov Murik
If SEV is enabled and a kernel is passed via -kernel, pass the hashes of
kernel/initrd/cmdline in an encrypted guest page to OVMF for SEV
measured boot.

Co-developed-by: James Bottomley 
Signed-off-by: James Bottomley 
Signed-off-by: Dov Murik 
Reviewed-by: Connor Kuehl 
---
 hw/i386/x86.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 00448ed55a..4044104cfe 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -45,6 +45,7 @@
 #include "hw/i386/fw_cfg.h"
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
+#include "target/i386/sev_i386.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -778,6 +779,7 @@ void x86_load_linux(X86MachineState *x86ms,
 const char *initrd_filename = machine->initrd_filename;
 const char *dtb_filename = machine->dtb;
 const char *kernel_cmdline = machine->kernel_cmdline;
+KernelLoaderContext kernel_loader_context = {};
 
 /* Align to 16 bytes as a paranoia measure */
 cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
@@ -924,6 +926,8 @@ void x86_load_linux(X86MachineState *x86ms,
 fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1);
 fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
+kernel_loader_context.cmdline_data = (char *)kernel_cmdline;
+kernel_loader_context.cmdline_size = strlen(kernel_cmdline) + 1;
 
 if (protocol >= 0x202) {
 stl_p(header + 0x228, cmdline_addr);
@@ -1005,6 +1009,8 @@ void x86_load_linux(X86MachineState *x86ms,
 fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, initrd_size);
+kernel_loader_context.initrd_data = initrd_data;
+kernel_loader_context.initrd_size = initrd_size;
 
 stl_p(header + 0x218, initrd_addr);
 stl_p(header + 0x21c, initrd_size);
@@ -1063,15 +1069,32 @@ void x86_load_linux(X86MachineState *x86ms,
 load_image_size(dtb_filename, setup_data->data, dtb_size);
 }
 
-memcpy(setup, header, MIN(sizeof(header), setup_size));
+/*
+ * If we're starting an encrypted VM, it will be OVMF based, which uses the
+ * efi stub for booting and doesn't require any values to be placed in the
+ * kernel header.  We therefore don't update the header so the hash of the
+ * kernel on the other side of the fw_cfg interface matches the hash of the
+ * file the user passed in.
+ */
+if (!sev_enabled()) {
+memcpy(setup, header, MIN(sizeof(header), setup_size));
+}
 
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
+kernel_loader_context.kernel_data = (char *)kernel;
+kernel_loader_context.kernel_size = kernel_size;
 
 fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_ADDR, real_addr);
 fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, setup_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
+kernel_loader_context.setup_data = (char *)setup;
+kernel_loader_context.setup_size = setup_size;
+
+if (sev_enabled()) {
+sev_add_kernel_loader_hashes(&kernel_loader_context, &error_fatal);
+}
 
 option_rom[nb_option_roms].bootindex = 0;
 option_rom[nb_option_roms].name = "linuxboot.bin";
-- 
2.25.1




Re: [PATCH 2/2] dump-guest-memory: Block live migration

2021-08-25 Thread Marc-André Lureau
Hi

On Tue, Aug 24, 2021 at 7:27 PM Peter Xu  wrote:

> Both dump-guest-memory and live migration cache vm state at the beginning.
> Either one entering while the other is in progress will cause a race on the
> vm state, or worse (please refer to the crash report in the bug link).
>
> Let's block live migration in dump-guest-memory, and that'll also block
> dump-guest-memory if it detected that we're during a live migration.
>
> Side note: migrate_del_blocker() can be called even if the blocker is not
> inserted yet, so it's safe to unconditionally delete that blocker in
> dump_cleanup (g_slist_remove allows no-entry-found case).
>
> Suggested-by: Dr. David Alan Gilbert 
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1996609
> Signed-off-by: Peter Xu 
> ---
>  dump/dump.c   | 20 +++-
>  include/sysemu/dump.h |  1 +
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/dump/dump.c b/dump/dump.c
> index ab625909f3..7996d7a6c5 100644
> --- a/dump/dump.c
> +++ b/dump/dump.c
> @@ -29,6 +29,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
>  #include "hw/misc/vmcoreinfo.h"
> +#include "migration/blocker.h"
>
>  #ifdef TARGET_X86_64
>  #include "win_dump.h"
> @@ -101,6 +102,7 @@ static int dump_cleanup(DumpState *s)
>  qemu_mutex_unlock_iothread();
>  }
>  }
> +migrate_del_blocker(s->dump_migration_blocker);
>
>  return 0;
>  }
> @@ -1857,6 +1859,19 @@ static void dump_init(DumpState *s, int fd, bool has_format,
>  }
>  }
>
> +if (!s->dump_migration_blocker) {
> +error_setg(&s->dump_migration_blocker,
> +   "Live migration disabled: dump-guest-memory in progress");
> +}
> +
> +/*
> + * Allows even for -only-migratable, but forbid migration during the
> + * process of dump guest memory.
> + */
> +if (migrate_add_blocker_internal(s->dump_migration_blocker, errp)) {
> +goto cleanup;
> +}
> +
>

Shouldn't this be placed earlier in the function, before
runstate_is_running() and vm_stop() ?

 return;
>
>  cleanup:
> @@ -1927,11 +1942,6 @@ void qmp_dump_guest_memory(bool paging, const char *file,
>  Error *local_err = NULL;
>  bool detach_p = false;
>
> -if (runstate_check(RUN_STATE_INMIGRATE)) {
> -error_setg(errp, "Dump not allowed during incoming migration.");
> -return;
> -}
> -
>  /* if there is a dump in background, we should wait until the dump
>   * finished */
>  if (dump_in_progress()) {
> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
> index 250143cb5a..7b619c2a43 100644
> --- a/include/sysemu/dump.h
> +++ b/include/sysemu/dump.h
> @@ -195,6 +195,7 @@ typedef struct DumpState {
>* finished. */
>  uint8_t *guest_note; /* ELF note content */
>  size_t guest_note_size;
> +Error *dump_migration_blocker; /* Blocker for live migration */
>  } DumpState;
>
>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
> --
> 2.31.1
>
>


[PATCH v3 0/2] [RESEND] x86/sev: Measured Linux SEV guest with kernel/initrd/cmdline

2021-08-25 Thread Dov Murik
(Resending for QEMU 6.2; no code changes since the last round.)

Currently booting with -kernel/-initrd/-append is not supported in SEV
confidential guests, because the content of these blobs is not measured
and therefore not trusted by the SEV guest.

However, in some cases the kernel, initrd, and cmdline are not secret
but should not be modified by the host.  In such a case, we want to
verify inside the trusted VM that the kernel, initrd, and cmdline are
indeed the ones expected by the Guest Owner, and only if that is the
case go on and boot them up (removing the need for grub inside OVMF in
that mode).

To support that, OVMF adds a special area for hashes of
kernel/initrd/cmdline; that area is expected to be filled by QEMU and
encrypted as part of the initial SEV guest launch.  This in turn makes
the hashes part of the PSP measured content, and OVMF can trust these
inputs if they match the hashes.

This series adds an SEV function to generate the table of hashes for
OVMF and encrypt it (patch 1/2), and calls this function if SEV is
enabled when the kernel/initrd/cmdline are prepared (patch 2/2).

Corresponding OVMF support [1] is already available in edk2 (patch series
"Measured SEV boot with kernel/initrd/cmdline").

[1] https://edk2.groups.io/g/devel/message/78250

---

v3: 
https://lore.kernel.org/qemu-devel/20210624102040.2015280-1-dovmu...@linux.ibm.com/
v3 changes:
 - initrd hash is now mandatory; if no -initrd is passed, calculate the
   hash of the empty buffer.  This is now aligned with the OVMF
   behaviour which verifies the empty initrd (correctly).
 - make SevHashTable entries fixed: 3 entries for cmdline, initrd, and kernel.
 - in sev_add_kernel_loader_hashes: first calculate all the hashes, only then
   fill-in the hashes table in the guest's memory.
 - Use g_assert_not_reached in sev-stub.c.
 - Use QEMU_PACKED attribute for structs.
 - Use QemuUUID type for guids.
 - in sev_add_kernel_loader_hashes: use ARRAY_SIZE(iov) instead of literal 2.

v2: 
https://lore.kernel.org/qemu-devel/20210621190553.1763020-1-dovmu...@linux.ibm.com/
v2 changes:
 - Extract main functionality to sev.c (with empty stub in sev-stub.c)
 - Use sev_enabled() instead of machine->cgs->ready to detect SEV guest
 - Coding style changes

v1: 
https://lore.kernel.org/qemu-devel/20210525065931.1628554-1-dovmu...@linux.ibm.com/

Dov Murik (2):
  sev/i386: Introduce sev_add_kernel_loader_hashes for measured linux
boot
  x86/sev: generate SEV kernel loader hashes in x86_load_linux

 target/i386/sev_i386.h |  12 
 hw/i386/x86.c  |  25 +++-
 target/i386/sev-stub.c |   5 ++
 target/i386/sev.c  | 137 +
 4 files changed, 178 insertions(+), 1 deletion(-)


base-commit: f9baca549e44791be0dd98de15add3d8452a8af0
-- 
2.25.1




[Bug 1819182] Re: info does not recognize file format of vpc with subformat=fixed

2021-08-25 Thread Thomas Huth
This is an automated cleanup. This bug report has been moved to QEMU's
new bug tracker on gitlab.com and thus gets marked as 'expired' now.
Please continue with the discussion here:

 https://gitlab.com/qemu-project/qemu/-/issues/559


** Changed in: qemu
   Status: In Progress => Expired

** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #559
   https://gitlab.com/qemu-project/qemu/-/issues/559

https://bugs.launchpad.net/bugs/1819182

Title:
  info does not recognize file format of vpc with subformat=fixed

Status in QEMU:
  Expired

Bug description:
  After creating or converting an image to vpc with 'subformat=fixed'
  'qemu-img info' incorrectly identifies the image as 'raw' format.

  $ qemu-img --version
  qemu-img version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.10)
  Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers

  $ qemu-img create -f vpc -o subformat=fixed my.vpc 2G
  Formatting 'my.vpc', fmt=vpc size=2147483648 subformat=fixed

  $ qemu-img info my.vpc
  image: my.vpc
  file format: raw
  virtual size: 2.0G (2147992064 bytes)
  disk size: 4.0K

  $ qemu-img info -f vpc my.vpc
  image: my.vpc
  file format: vpc
  virtual size: 2.0G (2147991552 bytes)
  disk size: 4.0K





[Bug 1884982] Re: User-emu documentation mentions inexistent "runtime" downloads

2021-08-25 Thread Thomas Huth
This is an automated cleanup. This bug report has been moved to QEMU's
new bug tracker on gitlab.com and thus gets marked as 'expired' now.
Please continue with the discussion here:

 https://gitlab.com/qemu-project/qemu/-/issues/560


** Tags added: net

** Changed in: qemu
   Status: In Progress => Expired

** Bug watch added: gitlab.com/qemu-project/qemu/-/issues #560
   https://gitlab.com/qemu-project/qemu/-/issues/560

https://bugs.launchpad.net/bugs/1884982

Title:
  User-emu documentation mentions inexistent "runtime" downloads

Status in QEMU:
  Expired

Bug description:
  The official documentation for the user-space emulator[1] contains
  many references to binary blobs no longer provided by  QEMU.org for
  download. The parts mentioning them should be rephrased to avoid
  confusion and instructions for building these components should be
  provided (maybe as a reference to the LFS book with some scripts,
  or... cut a deal with some super slim Linux distros). The specific
  parts are:

  * qemu-XXX-i386-wine.tar.gz, a wine build under the prefix /wine.
  * qemu-runtime-i386-XXX-.tar.gz, a glibc build.

    [1]: https://www.qemu.org/docs/master/user/main.html

  In addition, the documentation contains many other instances of
  inexistent "tar.gz" files, such as in "Network emulation". Most of
  these are inherited from the days of texi documentation more than 10
  years ago, and they are so old that GitHub's blame has become
  unreliable. Someone really should run `fgrep -r 'tar.gz' doc` on the
  QEMU source tree.

  The issue was previously reported as [2], but nobody bothered enough
  to google the filename to find out where the confused user got the
  idea from.

    [2]: https://www.mail-archive.com/qemu-de...@nongnu.org/msg569174.html





Re: [PATCH v2 2/3] hw/usb/hcd-xhci-pci: Abort if setting link property failed

2021-08-25 Thread Markus Armbruster
Peter Maydell  writes:

> On Tue, 24 Aug 2021 at 16:15, Markus Armbruster  wrote:
>> True, except when I called it "kind of wrong", I was still talking about
>> functions with an Error **errp parameter.
>
> Oh yes, so you were. I even quoted your sentence starting
> "In functions with an Error **errp parameter ...".
> I must have been half-asleep still this morning.
>
> Apologies for starting an unnecessary thread after which we all
> turn out to be in complete agreement :-)

No problem at all :)




QEMU's Launchpad tracker is now closed down

2021-08-25 Thread Thomas Huth



 Hi all,

almost everybody is using the new Gitlab issue tracker 
(https://gitlab.com/qemu-project/qemu/-/issues) already, which is really 
great, but just to make it official: The QEMU Launchpad tracker 
(https://bugs.launchpad.net/qemu) is now discontinued and should not be used 
anymore. At least I will stop looking at the tickets there now. Now that QEMU 
6.1 has been released, I have either closed the remaining tickets as 
"Fix released" or moved the non-resolved stragglers over to the gitlab issue 
tracker (there are three tickets left on Launchpad, which will likely 
simply expire if nobody speaks up).


Note that notifications for the new Gitlab issue tracker are not sent to the 
mailing list anymore. If you are interested, I strongly recommend enabling 
notifications for the tracker in your gitlab account instead.


 Thanks,
  Thomas




Re: [PATCH 0/2] dump-guest-memory: Add blocker for migration

2021-08-25 Thread Markus Armbruster
Peter Xu  writes:

> Both dump-guest-memory and live migration have vm state cached internally.
> Allowing them to happen together means the vm state can be messed up.  Simply
> block live migration for dump-guest-memory.
>
> One trivial thing to mention is we should still allow dump-guest-memory even
> if -only-migratable is specified, because that flag should mainly be used to
> guarantee not adding devices that will block migration by accident.  Dump
> guest memory is not like that - it'll only block for the seconds when it's
> dumping.

I recently ran into a similarly unusual use of migration blockers:

Subject: -only-migrate and the two different uses of migration blockers
 (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
Date: Mon, 19 Jul 2021 13:00:20 +0200 (5 weeks, 1 day, 20 hours ago)
Message-ID: <87sg0amuuz.fsf...@dusky.pond.sub.org>

We appear to use migration blockers in two ways:

(1) Prevent migration for an indefinite time, typically due to use of
some feature that isn't compatible with migration.

(2) Delay migration for a short time.

Option -only-migrate is designed for (1).  It interferes with (2).

Example for (1): device "x-pci-proxy-dev" doesn't support migration.  It
adds a migration blocker on realize, and deletes it on unrealize.  With
-only-migrate, device realize fails.  Works as designed.

Example for (2): spapr_mce_req_event() makes an effort to prevent
migration degrate the reporting of FWNMIs.  It adds a migration blocker
when it receives one, and deletes it when it's done handling it.  This
is a best effort; if migration is already in progress by the time FWNMI
is received, we simply carry on, and that's okay.  However, option
-only-migrate sabotages the best effort entirely.

While this isn't exactly terrible, it may be a weakness in our thinking
and our infrastructure.  I'm bringing it up so the people in charge are
aware :)

https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html

Downthread there, Dave Gilbert opined

It almost feels like they need a way to temporarily hold off
'completion' of migration - i.e. the phase where we stop the CPU and
write the device data; mind you, you'd also probably want it to stop
cold-migrates/snapshots?




[PATCH 3/5] vfio: defer to enable msix in migration resume phase

2021-08-25 Thread Longpeng(Mike)
The VF's unmasked MSI-X vectors are enabled one by one in the
migration resume phase; VFIO_DEVICE_SET_IRQS is called for each
vector, which gets expensive if the VF has many vectors.

We can reduce the cost by calling VFIO_DEVICE_SET_IRQS once, outside
the loop that sets the vector notifiers.

The test VM has 128 vcpus and 8 VFs (with 65 vectors enabled each).
We measure the cost of vfio_msix_enable for each VF, and about 10%
of the cost can be reduced.

        Origin   Apply this patch
1st     8        4
2nd     15       11
3rd     22       18
4th     24       25
5th     36       33
6th     44       40
7th     51       47
8th     58       54
Total   258ms    232ms

Signed-off-by: Longpeng(Mike) 
---
 hw/vfio/pci.c | 22 ++
 hw/vfio/pci.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7cc43fe..ca37fb7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 int ret = 0, i, argsz;
 int32_t *fds;
 
+if (!vdev->nr_vectors) {
+return 0;
+}
+
 argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
 
 irq_set = g_malloc0(argsz);
@@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
 }
 }
 
+if (vdev->defer_add_virq) {
+vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
+goto clear_pending;
+}
+
 /*
  * We don't want to have the host allocate all possible MSI vectors
  * for a device if they're not in use, so we shutdown and incrementally
@@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
 }
 }
 
+clear_pending:
 /* Disable PBA emulation when nothing more is pending. */
 clear_bit(nr, vdev->msix->pending);
 if (find_first_bit(vdev->msix->pending,
@@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
 if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
   vfio_msix_vector_release, NULL)) {
 error_report("vfio: msix_set_vector_notifiers failed");
+return;
+}
+
+if (!pdev->msix_function_masked && vdev->defer_add_virq) {
+int ret;
+vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
+ret = vfio_enable_vectors(vdev, true);
+if (ret) {
+error_report("vfio: failed to enable vectors, %d", ret);
+}
 }
 
 trace_vfio_msix_enable(vdev->vbasedev.name);
@@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
 if (msi_enabled(pdev)) {
 vfio_msi_enable(vdev);
 } else if (msix_enabled(pdev)) {
+vdev->defer_add_virq = true;
 vfio_msix_enable(vdev);
+vdev->defer_add_virq = false;
 }
 
 return ret;
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 6477751..4235c83 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -171,6 +171,7 @@ struct VFIOPCIDevice {
 bool no_kvm_ioeventfd;
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
+bool defer_add_virq;
 VFIODisplay *dpy;
 Notifier irqchip_change_notifier;
 };
-- 
1.8.3.1




[PATCH 4/5] kvm: irqchip: support defer to commit the route

2021-08-25 Thread Longpeng(Mike)
kvm_irqchip_commit_routes() is relatively expensive, so give the
users a choice whether to commit the route immediately or not when
they add an MSI/MSI-X route.

Signed-off-by: Longpeng(Mike) 
---
 accel/kvm/kvm-all.c| 10 +++---
 accel/stubs/kvm-stub.c |  3 ++-
 hw/misc/ivshmem.c  |  2 +-
 hw/vfio/pci.c  |  2 +-
 hw/virtio/virtio-pci.c |  2 +-
 include/sysemu/kvm.h   |  4 +++-
 target/i386/kvm/kvm.c  |  2 +-
 7 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0125c17..1f788a2 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1950,7 +1950,8 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 return kvm_set_irq(s, route->kroute.gsi, 1);
 }
 
-int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
+  bool defer_commit)
 {
 struct kvm_irq_routing_entry kroute = {};
 int virq;
@@ -1993,7 +1994,9 @@ int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
 
 kvm_add_routing_entry(s, &kroute);
 kvm_arch_add_msi_route_post(&kroute, vector, dev);
-kvm_irqchip_commit_routes(s);
+if (!defer_commit) {
+kvm_irqchip_commit_routes(s);
+}
 
 return virq;
 }
@@ -2151,7 +2154,8 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 abort();
 }
 
-int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
+  bool defer_commit)
 {
 return -ENOSYS;
 }
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 5b1d00a..d5caaca 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -81,7 +81,8 @@ int kvm_on_sigbus(int code, void *addr)
 }
 
 #ifndef CONFIG_USER_ONLY
-int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
+  bool defer_commit)
 {
 return -ENOSYS;
 }
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 1ba4a98..98b14cc 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -429,7 +429,7 @@ static void ivshmem_add_kvm_msi_virq(IVShmemState *s, int vector,
 IVSHMEM_DPRINTF("ivshmem_add_kvm_msi_virq vector:%d\n", vector);
 assert(!s->msi_vectors[vector].pdev);
 
-ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev);
+ret = kvm_irqchip_add_msi_route(kvm_state, vector, pdev, false);
 if (ret < 0) {
 error_setg(errp, "kvm_irqchip_add_msi_route failed");
 return;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ca37fb7..3ab67d6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -427,7 +427,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
 return;
 }
 
-virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev);
+virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev, false);
 if (virq < 0) {
 event_notifier_cleanup(&vector->kvm_interrupt);
 return;
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 433060a..7e2d021 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -684,7 +684,7 @@ static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
 int ret;
 
 if (irqfd->users == 0) {
-ret = kvm_irqchip_add_msi_route(kvm_state, vector, &proxy->pci_dev);
+ret = kvm_irqchip_add_msi_route(kvm_state, vector, &proxy->pci_dev, false);
 if (ret < 0) {
 return ret;
 }
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee..1932dc0 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -473,9 +473,11 @@ void kvm_init_cpu_signals(CPUState *cpu);
  *  message.
  * @dev:Owner PCI device to add the route. If @dev is specified
  *  as @NULL, an empty MSI message will be inited.
+ * @defer_commit:   Defer committing the new route to the KVM core.
  * @return: virq (>=0) when success, errno (<0) when failed.
  */
-int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev);
+int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev,
+  bool defer_commit);
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg,
  PCIDevice *dev);
 void kvm_irqchip_commit_routes(KVMState *s);
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e69abe4..896406b 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4724,7 +4724,7 @@ void kvm_arch_init_irq_routing(KVMState *s)
 /* If the ioapic is in QEMU and the lapics are in KVM, reserve
MSI routes for signaling interrupts to the local apics. */
 for (i = 0; i < IOAPIC_NUM_PINS; i++) {
-if (kvm_irqchip_add_msi_route(s, 0, NULL) < 0) {
+if (kvm_irqch

[PATCH 2/5] msix: simplfy the conditional in msix_set/unset_vector_notifiers

2021-08-25 Thread Longpeng(Mike)
'msix_function_masked' is kept in sync with the device's config, so
we can use it to replace the complex conditional in
msix_set/unset_vector_notifiers.

poll_notifier should also be reset to NULL in the error path of
msix_set_vector_notifiers; fix that incidentally.

Signed-off-by: Longpeng(Mike) 
---
 hw/pci/msix.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331c..8057709 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -592,8 +592,7 @@ int msix_set_vector_notifiers(PCIDevice *dev,
 dev->msix_vector_release_notifier = release_notifier;
 dev->msix_vector_poll_notifier = poll_notifier;
 
-if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
-(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
+if (!dev->msix_function_masked) {
 for (vector = 0; vector < dev->msix_entries_nr; vector++) {
 ret = msix_set_notifier_for_vector(dev, vector);
 if (ret < 0) {
@@ -612,6 +611,7 @@ undo:
 }
 dev->msix_vector_use_notifier = NULL;
 dev->msix_vector_release_notifier = NULL;
+dev->msix_vector_poll_notifier = NULL;
 return ret;
 }
 
@@ -622,8 +622,7 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
 assert(dev->msix_vector_use_notifier &&
dev->msix_vector_release_notifier);
 
-if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
-(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
+if (!dev->msix_function_masked) {
 for (vector = 0; vector < dev->msix_entries_nr; vector++) {
 msix_unset_notifier_for_vector(dev, vector);
 }
-- 
1.8.3.1




[PATCH 1/5] vfio: use helper to simplfy the failure path in vfio_msi_enable

2021-08-25 Thread Longpeng(Mike)
The main difference between the failure path in vfio_msi_enable and
vfio_msi_disable_common is whether INTx is re-enabled.

Extend vfio_msi_disable_common with an argument that decides whether
to fall back to INTx, and then use this helper to replace the
redundant code in vfio_msi_enable.

Signed-off-by: Longpeng(Mike) 
---
 hw/vfio/pci.c | 34 --
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e1ea1d8..7cc43fe 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -47,6 +47,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_msi_disable_common(VFIOPCIDevice *vdev, bool enable_intx);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -650,29 +651,17 @@ retry:
 if (ret) {
 if (ret < 0) {
 error_report("vfio: Error: Failed to setup MSI fds: %m");
-} else if (ret != vdev->nr_vectors) {
+} else {
 error_report("vfio: Error: Failed to enable %d "
  "MSI vectors, retry with %d", vdev->nr_vectors, ret);
 }
 
-for (i = 0; i < vdev->nr_vectors; i++) {
-VFIOMSIVector *vector = &vdev->msi_vectors[i];
-if (vector->virq >= 0) {
-vfio_remove_kvm_msi_virq(vector);
-}
-qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
-NULL, NULL, NULL);
-event_notifier_cleanup(&vector->interrupt);
-}
-
-g_free(vdev->msi_vectors);
-vdev->msi_vectors = NULL;
+vfio_msi_disable_common(vdev, false);
 
-if (ret > 0 && ret != vdev->nr_vectors) {
+if (ret > 0) {
 vdev->nr_vectors = ret;
 goto retry;
 }
-vdev->nr_vectors = 0;
 
 /*
  * Failing to setup MSI doesn't really fall within any specification.
@@ -680,7 +669,6 @@ retry:
  * out to fall back to INTx for this device.
  */
 error_report("vfio: Error: Failed to enable MSI");
-vdev->interrupt = VFIO_INT_NONE;
 
 return;
 }
@@ -688,7 +676,7 @@ retry:
 trace_vfio_msi_enable(vdev->vbasedev.name, vdev->nr_vectors);
 }
 
-static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
+static void vfio_msi_disable_common(VFIOPCIDevice *vdev, bool enable_intx)
 {
 Error *err = NULL;
 int i;
@@ -710,9 +698,11 @@ static void vfio_msi_disable_common(VFIOPCIDevice *vdev)
 vdev->nr_vectors = 0;
 vdev->interrupt = VFIO_INT_NONE;
 
-vfio_intx_enable(vdev, &err);
-if (err) {
-error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+if (enable_intx) {
+vfio_intx_enable(vdev, &err);
+if (err) {
+error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
+}
 }
 }
 
@@ -737,7 +727,7 @@ static void vfio_msix_disable(VFIOPCIDevice *vdev)
 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
 }
 
-vfio_msi_disable_common(vdev);
+vfio_msi_disable_common(vdev, true);
 
 memset(vdev->msix->pending, 0,
BITS_TO_LONGS(vdev->msix->entries) * sizeof(unsigned long));
@@ -748,7 +738,7 @@ static void vfio_msix_disable(VFIOPCIDevice *vdev)
 static void vfio_msi_disable(VFIOPCIDevice *vdev)
 {
 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSI_IRQ_INDEX);
-vfio_msi_disable_common(vdev);
+vfio_msi_disable_common(vdev, true);
 
 trace_vfio_msi_disable(vdev->vbasedev.name);
 }
-- 
1.8.3.1




[PATCH 0/5] optimize the downtime for vfio migration

2021-08-25 Thread Longpeng(Mike)
In the vfio migration resume phase, the cost increases if the
vfio device has more unmasked vectors. We try to optimize it in
this series.

Patches 1 & 2 are simple code cleanups.
Patch 3 defers setting up the irqs to the vfio core.
Patches 4 & 5 defer committing the routes to the KVM core.

The test VM has 128 vcpus and 8 VFs (each with 65 vectors enabled).
We measure the cost of vfio_msix_enable for each VF and can see
that the total cost is significantly reduced.

        Origin    Apply Patch 3    Apply Patch 3/4/5
1st     8         4                2
2nd     15        11               2
3rd     22        18               2
4th     24        25               3
5th     36        33               2
6th     44        40               3
7th     51        47               3
8th     58        54               4
Total   258ms     232ms            21ms


Longpeng (Mike) (5):
  vfio: use helper to simplify the failure path in vfio_msi_enable
  msix: simplify the conditional in msix_set/unset_vector_notifiers
  vfio: defer to enable msix in migration resume phase
  kvm: irqchip: support defer to commit the route
  vfio: defer to commit kvm route in migration resume phase

 accel/kvm/kvm-all.c| 10 +++--
 accel/stubs/kvm-stub.c |  3 +-
 hw/misc/ivshmem.c  |  2 +-
 hw/pci/msix.c  |  7 ++--
 hw/vfio/pci.c  | 99 ++
 hw/vfio/pci.h  |  1 +
 hw/virtio/virtio-pci.c |  2 +-
 include/sysemu/kvm.h   |  4 +-
 target/i386/kvm/kvm.c  |  2 +-
 9 files changed, 95 insertions(+), 35 deletions(-)

-- 
1.8.3.1




[PATCH 5/5] vfio: defer to commit kvm route in migration resume phase

2021-08-25 Thread Longpeng(Mike)
In the migration resume phase, all unmasked msix vectors need to be
set up when loading the VF state. However, the setup operation takes
longer if the VF has more unmasked vectors.

In our case, the VF has 65 vectors and each one spends at most 0.8ms
on the setup operation, so the cost per VF is about 8-58ms. For a
VM that has 8 VFs of this type, the total cost is more than 250ms.

vfio_pci_load_config
  vfio_msix_enable
msix_set_vector_notifiers
  for (vector = 0; vector < dev->msix_entries_nr; vector++) {
vfio_msix_vector_do_use
  vfio_add_kvm_msi_virq
kvm_irqchip_commit_routes <-- expensive
  }

We can reduce the cost by committing only once, outside the loop. The
routes are cached in kvm_state; we commit them first and then bind an
irqfd for each vector.

The test VM has 128 vcpus and 8 VFs (each with 65 vectors enabled).
We measure the cost of vfio_msix_enable for each VF and can see
that 90+% of the cost is eliminated.

        Origin    Apply this patch and
                  vfio enable optimization
1st     8         2
2nd     15        2
3rd     22        2
4th     24        3
5th     36        2
6th     44        3
7th     51        3
8th     58        4
Total   258ms     21ms

The optimization can also be applied to the msi type.

Signed-off-by: Longpeng(Mike) 
---
 hw/vfio/pci.c | 47 ---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 3ab67d6..50e7ec7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -427,12 +427,17 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, 
VFIOMSIVector *vector,
 return;
 }
 
-virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev, false);
+virq = kvm_irqchip_add_msi_route(kvm_state, vector_n, &vdev->pdev,
+ vdev->defer_add_virq);
 if (virq < 0) {
 event_notifier_cleanup(&vector->kvm_interrupt);
 return;
 }
 
+if (vdev->defer_add_virq) {
+goto out;
+}
+
 if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state, &vector->kvm_interrupt,
NULL, virq) < 0) {
 kvm_irqchip_release_virq(kvm_state, virq);
@@ -440,6 +445,7 @@ static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, 
VFIOMSIVector *vector,
 return;
 }
 
+out:
 vector->virq = virq;
 }
 
@@ -577,6 +583,36 @@ static void vfio_msix_vector_release(PCIDevice *pdev, 
unsigned int nr)
 }
 }
 
+static void vfio_commit_kvm_msi_virq(VFIOPCIDevice *vdev)
+{
+int i;
+VFIOMSIVector *vector;
+bool commited = false;
+
+for (i = 0; i < vdev->nr_vectors; i++) {
+vector = &vdev->msi_vectors[i];
+
+if (vector->virq < 0) {
+continue;
+}
+
+/* Commit cached route entries to KVM core first if not yet */
+if (!commited) {
+kvm_irqchip_commit_routes(kvm_state);
+commited = true;
+}
+
+if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+   &vector->kvm_interrupt,
+   NULL, vector->virq) < 0) {
+kvm_irqchip_release_virq(kvm_state, vector->virq);
+event_notifier_cleanup(&vector->kvm_interrupt);
+vector->virq = -1;
+return;
+}
+}
+}
+
 static void vfio_msix_enable(VFIOPCIDevice *vdev)
 {
 PCIDevice *pdev = &vdev->pdev;
@@ -624,6 +660,7 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
 if (!pdev->msix_function_masked && vdev->defer_add_virq) {
 int ret;
 vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
+vfio_commit_kvm_msi_virq(vdev);
 ret = vfio_enable_vectors(vdev, true);
 if (ret) {
 error_report("vfio: failed to enable vectors, %d", ret);
@@ -664,6 +701,10 @@ retry:
 vfio_add_kvm_msi_virq(vdev, vector, i, false);
 }
 
+if (vdev->defer_add_virq) {
+vfio_commit_kvm_msi_virq(vdev);
+}
+
 /* Set interrupt type prior to possible interrupts */
 vdev->interrupt = VFIO_INT_MSI;
 
@@ -2473,13 +2514,13 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, 
QEMUFile *f)
 vfio_pci_write_config(pdev, PCI_COMMAND,
   pci_get_word(pdev->config + PCI_COMMAND), 2);
 
+vdev->defer_add_virq = true;
 if (msi_enabled(pdev)) {
 vfio_msi_enable(vdev);
 } else if (msix_enabled(pdev)) {
-vdev->defer_add_virq = true;
 vfio_msix_enable(vdev);
-vdev->defer_add_virq = false;
 }
+vdev->defer_add_virq = false;
 
 return ret;
 }
-- 
1.8.3.1




Re: [PATCH 1/2] migration: Add migrate_add_blocker_internal()

2021-08-25 Thread Juan Quintela
Peter Xu  wrote:
> An internal version that removes -only-migratable implications.  It can be 
> used
> for temporary migration blockers like dump-guest-memory.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Juan Quintela 




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-25 Thread David Hildenbrand

On 24.08.21 21:52, Peter Xu wrote:

On Tue, Aug 24, 2021 at 06:24:27PM +0200, David Hildenbrand wrote:

Not so much; here's the list of priorities and the devices using it:

 |+-|
 | priority   | devices |
 |+-|
 | MIG_PRI_IOMMU  |   3 |
 | MIG_PRI_PCI_BUS|   7 |
 | MIG_PRI_VIRTIO_MEM |   1 |
 | MIG_PRI_GICV3_ITS  |   1 |
 | MIG_PRI_GICV3  |   1 |
 |+-|


iommu is probably ok. I think virtio mem is ok too,
in that it is normally created by virtio-mem-pci ...


IIRC:

intel-iommu has to be created on the QEMU cmdline before creating
virtio-mem-pci.

-device intel-iommu,caching-mode=on,intremap=on,device-iotlb=on \
...
-device 
virtio-mem-pci,disable-legacy=on,disable-modern=off,iommu_platform=on,ats=on,id=vm0,...

Creating virtio-mem-pci will implicitly create virtio-mem. virtio-mem device
state has to be migrated before migrating intel-iommu state.


Since we're at it: frankly, when reading commit 0fd7616e0f1171b I didn't fully
digest why virtio-mem needs to be migrated first.  It gives me the feeling that
the ordering dependency is really between virtio-mem and vfio-pci, not
virtio-mem and the IOMMU, but I could be wrong.


"We have to take care of incoming migration: at the point the
 IOMMUs get restored and start creating mappings in vfio,
 RamDiscardManager implementations might not be back up and running yet:
 let's add runstate priorities to enforce the order when restoring.

s/runstate/vmstate/ :|

I recall that we can see IOMMU_NOTIFIER_MAP events when restoring an 
IOMMU device. vfio_get_xlat_addr() would be called and could reject 
mappings targeting virtio-mem memory in case the virtio-mem device has 
not restored its bitmap from the migration stream such that 
ram_discard_manager_is_populated() would be reliable. Consequently, we 
have to restore the virtio-mem device state (not the virtio-mem-pci 
device state!) before restoring an IOMMU.
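
As a standalone illustration (a simplified sketch, not QEMU's actual savevm
code, though the enum order mirrors QEMU's MigrationPriority): vmstates with
a higher priority are processed earlier on save and load, which is what puts
the virtio-mem device state ahead of the IOMMU on restore.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the ordering of QEMU's MigrationPriority enum: a larger value
 * means the device state is saved/restored earlier. */
enum { PRI_DEFAULT = 0, PRI_IOMMU, PRI_PCI_BUS, PRI_VIRTIO_MEM };

typedef struct {
    const char *name;
    int priority;
} Dev;

/* Sort descending by priority, so higher-priority devices come first. */
static int by_priority_desc(const void *a, const void *b)
{
    return ((const Dev *)b)->priority - ((const Dev *)a)->priority;
}
```

Ordering {e1000, intel-iommu, virtio-mem} this way restores virtio-mem's
bitmap before intel-iommu starts replaying mappings.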






E.g., the IOMMU unit shouldn't be walking the page table without a device using
it; it won't fail until the device walks it in the region_add() hooks when
memory replay happens.


I recall it happened when switching to the iommu region e.g., in 
vtd_post_load()->vtd_switch_address_space(). But I forgot the exact call 
path.


--
Thanks,

David / dhildenb




Re: [PATCH v2 1/5] qemu/qarray.h: introduce QArray

2021-08-25 Thread Markus Armbruster
Christian Schoenebeck  writes:

> On Dienstag, 24. August 2021 17:24:50 CEST Christian Schoenebeck wrote:
>> On Dienstag, 24. August 2021 16:45:12 CEST Markus Armbruster wrote:
>> > Christian Schoenebeck  writes:
>> > > On Dienstag, 24. August 2021 10:22:52 CEST Markus Armbruster wrote:
>> > [...]
>> > 
>> > >> Please use GPLv2+ unless you have a compelling reason not to.
>> > >> 
>> > >> [...]
>> > > 
>> > > Is that a requirement?
>> > > 
>> > > It is just my personal license preference. AFAICS there are numerous
>> > > sources in QEMU released under MIT license as well.
>> > 
>> > The licensing situation is a mess.
>> > 
>> > The only hard requirement is "compatible with GPLv2+".  We prefer GPLv2+
>> > for new code, except as detailed in ./LICENSE.  We're stuck with a
>> > sizable body of existing code that is GPLv2 (no +), but decided to put
>> > limits to that madness.  Again, see ./LICENSE for details.
>> > 
>> > I'm asking you to help with limiting the madness by sticking to GPLv2+
>> > whenever possible.
>> 
>> Okay, I see that there is quite a homogenous license structure in Qemu.

Self-inflicted wound.  We should have insisted on GPLv2+.

>> However the MIT license is a very permissive license, so I don't see any
>> conflicts.
>
> s/homogenous/heterogeneous/
>
>> What if I release this file under public domain? That's not even copyleft at
>> all. What that be OK for you?
>
> "Would" that be OK for you?

My preference: GPLv2+ > MIT > public domain.

If you go with anything but GPLv2+, please explain why in your commit
message.  One sentence should suffice, say "MIT license to minimize
license issues when "stealing" this code for other projects."

>> My idea was that people might simply take this header file and use it in
>> other C projects as well. Putting it under GPL might cause conflicts for
>> other projects.




Re: [RFC 06/10] hw/mos6522: Implement oneshot mode

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


Signed-off-by: Finn Thain 
---
  hw/misc/mos6522.c | 19 ---
  include/hw/misc/mos6522.h |  3 +++
  2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 8991f4..5b1657ac0d 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -79,6 +79,7 @@ static void set_counter(MOS6522State *s, MOS6522Timer *ti, 
unsigned int val)
  trace_mos6522_set_counter(1 + ti->index, val);
  ti->load_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
  ti->counter_value = val;
+ti->oneshot_fired = false;
  if (ti->index == 0) {
  mos6522_timer1_update(s, ti, ti->load_time);
  } else {
@@ -133,7 +134,8 @@ static void mos6522_timer1_update(MOS6522State *s, 
MOS6522Timer *ti,
  return;
  }
  ti->next_irq_time = get_next_irq_time(s, ti, current_time);
-if ((s->ier & T1_INT) == 0 || (s->acr & T1MODE) != T1MODE_CONT) {
+if ((s->ier & T1_INT) == 0 ||
+((s->acr & T1MODE) == T1MODE_ONESHOT && ti->oneshot_fired)) {
  timer_del(ti->timer);
  } else {
  timer_mod(ti->timer, ti->next_irq_time);
@@ -147,7 +149,7 @@ static void mos6522_timer2_update(MOS6522State *s, 
MOS6522Timer *ti,
  return;
  }
  ti->next_irq_time = get_next_irq_time(s, ti, current_time);
-if ((s->ier & T2_INT) == 0) {
+if ((s->ier & T2_INT) == 0 || (s->acr & T2MODE) || ti->oneshot_fired) {
  timer_del(ti->timer);
  } else {
  timer_mod(ti->timer, ti->next_irq_time);
@@ -159,6 +161,7 @@ static void mos6522_timer1_expired(void *opaque)
  MOS6522State *s = opaque;
  MOS6522Timer *ti = &s->timers[0];
  
+ti->oneshot_fired = true;

  mos6522_timer1_update(s, ti, ti->next_irq_time);
  s->ifr |= T1_INT;
  mos6522_update_irq(s);
@@ -169,6 +172,7 @@ static void mos6522_timer2_expired(void *opaque)
  MOS6522State *s = opaque;
  MOS6522Timer *ti = &s->timers[1];
  
+ti->oneshot_fired = true;

  mos6522_timer2_update(s, ti, ti->next_irq_time);
  s->ifr |= T2_INT;
  mos6522_update_irq(s);


I was trying to understand why you need ti->oneshot_fired here since the 
mos6522_timer*_update() functions should simply not re-arm the timer if not in 
continuous mode...



@@ -198,10 +202,12 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned 
size)
  int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
  
  if (now >= s->timers[0].next_irq_time) {

+s->timers[0].oneshot_fired = true;
  mos6522_timer1_update(s, &s->timers[0], now);
  s->ifr |= T1_INT;
  }
  if (now >= s->timers[1].next_irq_time) {
+s->timers[1].oneshot_fired = true;
  mos6522_timer2_update(s, &s->timers[1], now);
  s->ifr |= T2_INT;
  }


...however this block above raises the timer interrupt outside of the timer callback. 
This block isn't part of your original patch but was introduced as part of 
cd8843ff25d ("mos6522: fix T1 and T2 timers"), and I'm wondering if it is wrong.


If you remove both of the above if (now ... ) {} blocks then does one-shot mode work 
by just adding the (s->acr & T2MODE) check in mos6522_timer2_update()? I'm guessing 
that Linux/m68k does use one or both of the timers in one-shot mode?



@@ -279,6 +285,7 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, 
unsigned size)
  {
  MOS6522State *s = opaque;
  MOS6522DeviceClass *mdc = MOS6522_GET_CLASS(s);
+int64_t now;
  
  trace_mos6522_write(addr, val);
  
@@ -318,9 +325,6 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)

  s->timers[1].latch = (s->timers[1].latch & 0xff00) | val;
  break;
  case VIA_REG_T2CH:
-/* To ensure T2 generates an interrupt on zero crossing with the
-   common timer code, write the value directly from the latch to
-   the counter */
  s->timers[1].latch = (s->timers[1].latch & 0xff) | (val << 8);
  s->ifr &= ~T2_INT;
  set_counter(s, &s->timers[1], s->timers[1].latch);
@@ -330,8 +334,9 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t val, 
unsigned size)
  break;
  case VIA_REG_ACR:
  s->acr = val;
-mos6522_timer1_update(s, &s->timers[0],
-  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL));
+now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+mos6522_timer1_update(s, &s->timers[0], now);
+mos6522_timer2_update(s, &s->timers[1], now);
  break;
  case VIA_REG_PCR:
  s->pcr = val;
diff --git a/include/hw/misc/mos6522.h b/include/hw/misc/mos6522.h
index fc95d22b0f..94b1dc324c 100644
--- a/include/hw/misc/mos6522.h
+++ b/include/hw/misc/mos6522.h
@@ -50,8 +50,10 @@
  #define T1_INT 0x40/* Timer 1 interrupt */
  
  /* Bits in ACR */

+#define T2MODE 0x20/* Timer 2 mode */
  #define T1MODE 0xc0/* Timer 1 

Re: [RFC 07/10] hw/mos6522: Fix initial timer counter reload

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


The first reload of timer 1 is early by half of a clock cycle as it gets
measured from a falling edge. By contrast, the succeeding reloads are
measured from rising edge to rising edge.

Neglecting that complication, the behaviour of the counter should be the
same from one reload to the next. The sequence is always:

N, N-1, N-2, ... 2, 1, 0, -1, N, N-1, N-2, ...

But at the first reload, the present driver does this instead:

N, N-1, N-2, ... 2, 1, 0, -1, N-1, N-2, ...

Fix this deviation for both timer 1 and timer 2, and allow for the fact
that on a real 6522 the timer 2 counter is not reloaded when it wraps.
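
The intended sequence can be modelled in a few lines (a hypothetical
standalone check using the same arithmetic as the patch, assuming the latch
is unchanged; d is the number of elapsed clock ticks since the counter was
loaded):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Timer 1 model: the counter steps N, N-1, ..., 1, 0, -1 (0xffff), then
 * reloads N from the latch, so the first period is N + 2 ticks and each
 * later one is latch + 2 ticks.
 */
static unsigned int t1_counter(int64_t d, unsigned int initial,
                               unsigned int latch)
{
    if (d <= (int64_t)initial + 1) {
        return (initial - d) & 0xffff;
    } else {
        int64_t d_post_reload = d - (initial + 2);
        return (latch - (d_post_reload % (latch + 2))) & 0xffff;
    }
}
```

With initial == latch == 3 this yields 3, 2, 1, 0, 0xffff, 3, 2, ... — i.e.
the first value after a reload is N, not N-1.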

Signed-off-by: Finn Thain 
---
  hw/misc/mos6522.c | 19 +++
  1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 5b1657ac0d..0a241fe9f8 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -63,15 +63,16 @@ static unsigned int get_counter(MOS6522State *s, 
MOS6522Timer *ti)
  if (ti->index == 0) {
  /* the timer goes down from latch to -1 (period of latch + 2) */
  if (d <= (ti->counter_value + 1)) {
-counter = (ti->counter_value - d) & 0x;
+counter = ti->counter_value - d;
  } else {
-counter = (d - (ti->counter_value + 1)) % (ti->latch + 2);
-counter = (ti->latch - counter) & 0x;
+int64_t d_post_reload = d - (ti->counter_value + 2);
+/* XXX this calculation assumes that ti->latch has not changed */
+counter = ti->latch - (d_post_reload % (ti->latch + 2));
  }
  } else {
-counter = (ti->counter_value - d) & 0x;
+counter = ti->counter_value - d;
  }
-return counter;
+return counter & 0x;
  }
  
  static void set_counter(MOS6522State *s, MOS6522Timer *ti, unsigned int val)

@@ -103,11 +104,13 @@ static int64_t get_next_irq_time(MOS6522State *s, 
MOS6522Timer *ti,
  
  /* the timer goes down from latch to -1 (period of latch + 2) */

  if (d <= (ti->counter_value + 1)) {
-counter = (ti->counter_value - d) & 0x;
+counter = ti->counter_value - d;
  } else {
-counter = (d - (ti->counter_value + 1)) % (ti->latch + 2);
-counter = (ti->latch - counter) & 0x;
+int64_t d_post_reload = d - (ti->counter_value + 2);
+/* XXX this calculation assumes that ti->latch has not changed */
+counter = ti->latch - (d_post_reload % (ti->latch + 2));
  }
+counter &= 0x;
  
  /* Note: we consider the irq is raised on 0 */

  if (counter == 0x) {


I think the code looks right, but I couldn't see an explicit reference to this 
behaviour in http://archive.6502.org/datasheets/mos_6522_preliminary_nov_1977.pdf. 
Presumably this matches what you've observed on real hardware?



ATB,

Mark.



Re: [RFC 08/10] hw/mos6522: Call mos6522_update_irq() when appropriate

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


It is necessary to call mos6522_update_irq() when the interrupt flags
change, and unnecessary when they haven't.

Signed-off-by: Finn Thain 
---
  hw/misc/mos6522.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 0a241fe9f8..0dd3ccf945 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -208,11 +208,13 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned 
size)
  s->timers[0].oneshot_fired = true;
  mos6522_timer1_update(s, &s->timers[0], now);
  s->ifr |= T1_INT;
+mos6522_update_irq(s);
  }
  if (now >= s->timers[1].next_irq_time) {
  s->timers[1].oneshot_fired = true;
  mos6522_timer2_update(s, &s->timers[1], now);
  s->ifr |= T2_INT;
+mos6522_update_irq(s);
  }


Again this seems to be in the block of code I'm not sure is correct, so my first 
instinct is to see if removing it helps first - although the patch logically seems 
correct.



  switch (addr) {
  case VIA_REG_B:
@@ -237,7 +239,6 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned 
size)
  break;
  case VIA_REG_T1CH:
  val = get_counter(s, &s->timers[0]) >> 8;
-mos6522_update_irq(s);


As get_counter() simply generates the current counter value I'd say this part 
is correct.


  break;
  case VIA_REG_T1LL:
  val = s->timers[0].latch & 0xff;




ATB,

Mark.



Re: [RFC 09/10] hw/mos6522: Avoid using discrepant QEMU clock values

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


mos6522_read() and mos6522_write() may call various functions to determine
timer irq state, timer counter value and QEMUTimer deadline. All called
functions must use the same value for the present time.

Signed-off-by: Finn Thain 
---
  hw/misc/mos6522.c | 51 +--
  1 file changed, 27 insertions(+), 24 deletions(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 0dd3ccf945..23a440b64f 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -39,9 +39,9 @@
  /* XXX: implement all timer modes */
  
  static void mos6522_timer1_update(MOS6522State *s, MOS6522Timer *ti,

-  int64_t current_time);
+  int64_t now);
  static void mos6522_timer2_update(MOS6522State *s, MOS6522Timer *ti,
-  int64_t current_time);
+  int64_t now);
  
  static void mos6522_update_irq(MOS6522State *s)

  {
@@ -52,12 +52,12 @@ static void mos6522_update_irq(MOS6522State *s)
  }
  }
  
-static unsigned int get_counter(MOS6522State *s, MOS6522Timer *ti)

+static unsigned int get_counter(MOS6522State *s, MOS6522Timer *ti, int64_t now)
  {
  int64_t d;
  unsigned int counter;
  
-d = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - ti->load_time,

+d = muldiv64(now - ti->load_time,
   ti->frequency, NANOSECONDS_PER_SECOND);
  
  if (ti->index == 0) {

@@ -89,7 +89,7 @@ static void set_counter(MOS6522State *s, MOS6522Timer *ti, 
unsigned int val)
  }
  
  static int64_t get_next_irq_time(MOS6522State *s, MOS6522Timer *ti,

- int64_t current_time)
+ int64_t now)
  {
  int64_t d, next_time;
  unsigned int counter;
@@ -99,7 +99,7 @@ static int64_t get_next_irq_time(MOS6522State *s, 
MOS6522Timer *ti,
  }
  
  /* current counter value */

-d = muldiv64(qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) - ti->load_time,
+d = muldiv64(now - ti->load_time,
   ti->frequency, NANOSECONDS_PER_SECOND);
  
  /* the timer goes down from latch to -1 (period of latch + 2) */

@@ -123,20 +123,19 @@ static int64_t get_next_irq_time(MOS6522State *s, 
MOS6522Timer *ti,
  trace_mos6522_get_next_irq_time(ti->latch, d, next_time - d);
  next_time = muldiv64(next_time, NANOSECONDS_PER_SECOND, ti->frequency) +
   ti->load_time;
-
-if (next_time <= current_time) {
-next_time = current_time + 1;
-}
  return next_time;
  }
  
  static void mos6522_timer1_update(MOS6522State *s, MOS6522Timer *ti,

- int64_t current_time)
+  int64_t now)
  {
  if (!ti->timer) {
  return;
  }
-ti->next_irq_time = get_next_irq_time(s, ti, current_time);
+ti->next_irq_time = get_next_irq_time(s, ti, now);
+if (ti->next_irq_time <= now) {
+ti->next_irq_time = now + 1;
+}
  if ((s->ier & T1_INT) == 0 ||
  ((s->acr & T1MODE) == T1MODE_ONESHOT && ti->oneshot_fired)) {
  timer_del(ti->timer);
@@ -146,12 +145,15 @@ static void mos6522_timer1_update(MOS6522State *s, 
MOS6522Timer *ti,
  }
  
  static void mos6522_timer2_update(MOS6522State *s, MOS6522Timer *ti,

- int64_t current_time)
+  int64_t now)
  {
  if (!ti->timer) {
  return;
  }
-ti->next_irq_time = get_next_irq_time(s, ti, current_time);
+ti->next_irq_time = get_next_irq_time(s, ti, now);
+if (ti->next_irq_time <= now) {
+ti->next_irq_time = now + 1;
+}
  if ((s->ier & T2_INT) == 0 || (s->acr & T2MODE) || ti->oneshot_fired) {
  timer_del(ti->timer);
  } else {
@@ -163,9 +165,10 @@ static void mos6522_timer1_expired(void *opaque)
  {
  MOS6522State *s = opaque;
  MOS6522Timer *ti = &s->timers[0];
+int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
  
  ti->oneshot_fired = true;

-mos6522_timer1_update(s, ti, ti->next_irq_time);
+mos6522_timer1_update(s, ti, now);


Presumably using ti->next_irq_time has already fixed the current time to be that at 
which the timer routine actually expired, rather than the current executing time. Are 
you seeing large differences in these numbers that can cause timer drift? If so, I'm 
wondering if this change should be in a separate patch.



  s->ifr |= T1_INT;
  mos6522_update_irq(s);
  }
@@ -174,9 +177,10 @@ static void mos6522_timer2_expired(void *opaque)
  {
  MOS6522State *s = opaque;
  MOS6522Timer *ti = &s->timers[1];
+int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
  
  ti->oneshot_fired = true;

-mos6522_timer2_update(s, ti, ti->next_irq_time);
+mos6522_timer2_update(s, ti, now);


And same again here.


  s->ifr |= T2_INT;
  mos6522_update_irq(s);
  }
@@ -233,12 +237,12 @@ uint64

Re: [RFC 10/10] hw/mos6522: Synchronize timer interrupt and timer counter

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


We rely on a QEMUTimer callback to set the interrupt flag, and this races
with counter register accesses, such that the guest might see the counter
reloaded but might not see the interrupt flagged.

According to the datasheet, a real 6522 device counts down to FFFF, then
raises the relevant IRQ. After the FFFF count, the counter reloads from
the latch (for timer 1) or continues to decrement thru FFFE (for timer 2).

Therefore, the guest operating system may read zero from T1CH and infer
that the counter has not yet wrapped (given another full count hasn't
yet elapsed.)

Similarly, the guest may find the timer interrupt flag to be set and
infer that the counter is non-zero (given another full count hasn't yet
elapsed).

Synchronize the timer counter and interrupt flag such that the guest will
observe the correct sequence of states. (It's still not right, because in
reality it's not possible to access the registers more than once per
"phase 2" clock cycle.)

Eliminate the duplication of logic in get_counter() and
get_next_irq_time() by calling the former before the latter.

Note that get_counter() is called prior to changing the latch. This is
because get_counter() may need to use the old latch value in order to
reload the counter.

Signed-off-by: Finn Thain 
---
  hw/misc/mos6522.c | 154 --
  hw/misc/trace-events  |   2 +-
  include/hw/misc/mos6522.h |   8 +-
  3 files changed, 88 insertions(+), 76 deletions(-)

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 23a440b64f..bd5df4963b 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -52,26 +52,58 @@ static void mos6522_update_irq(MOS6522State *s)
  }
  }
  
+static void mos6522_timer_raise_irq(MOS6522State *s, MOS6522Timer *ti)

+{
+if (ti->state == irq) {
+return;
+}
+ti->state = irq;
+if (ti->index == 0) {
+s->ifr |= T1_INT;
+} else {
+s->ifr |= T2_INT;
+}
+mos6522_update_irq(s);
+}
+
  static unsigned int get_counter(MOS6522State *s, MOS6522Timer *ti, int64_t 
now)
  {
  int64_t d;
  unsigned int counter;
-
+bool reload;
+
+/*
+ * Timer 1 counts down from the latch value to -1 (period of latch + 2),
+ * then raises its interrupt and reloads.
+ * Timer 2 counts down from the latch value to -1, then raises its
+ * interrupt and continues to -2 and so on without any further interrupts.
+ * (In reality, the first count should be measured from the falling edge
+ * of the "phase two" clock, making its period N + 1.5. The subsequent
+ * counts have period N + 2. This detail has been ignored here.)
+ */
  d = muldiv64(now - ti->load_time,
   ti->frequency, NANOSECONDS_PER_SECOND);
  
-if (ti->index == 0) {

-/* the timer goes down from latch to -1 (period of latch + 2) */
-if (d <= (ti->counter_value + 1)) {
-counter = ti->counter_value - d;
-} else {
-int64_t d_post_reload = d - (ti->counter_value + 2);
-/* XXX this calculation assumes that ti->latch has not changed */
-counter = ti->latch - (d_post_reload % (ti->latch + 2));
-}
-} else {
-counter = ti->counter_value - d;
+reload = (d >= ti->counter_value + 2);
+
+if (ti->index == 0 && reload) {
+int64_t more_reloads;
+
+d -= ti->counter_value + 2;
+more_reloads = d / (ti->latch + 2);
+d -= more_reloads * (ti->latch + 2);
+ti->load_time += muldiv64(ti->counter_value + 2 +
+  more_reloads * (ti->latch + 2),
+  NANOSECONDS_PER_SECOND, ti->frequency);
+ti->counter_value = ti->latch;
  }
+
+counter = ti->counter_value - d;
+
+if (reload) {
+mos6522_timer_raise_irq(s, ti);
+}
+
  return counter & 0x;
  }
  
@@ -80,7 +112,7 @@ static void set_counter(MOS6522State *s, MOS6522Timer *ti, unsigned int val)

  trace_mos6522_set_counter(1 + ti->index, val);
  ti->load_time = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
  ti->counter_value = val;
-ti->oneshot_fired = false;
+ti->state = decrement;
  if (ti->index == 0) {
  mos6522_timer1_update(s, ti, ti->load_time);
  } else {
@@ -91,38 +123,15 @@ static void set_counter(MOS6522State *s, MOS6522Timer *ti, 
unsigned int val)
  static int64_t get_next_irq_time(MOS6522State *s, MOS6522Timer *ti,
   int64_t now)
  {
-int64_t d, next_time;
-unsigned int counter;
+int64_t next_time;
  
  if (ti->frequency == 0) {

  return INT64_MAX;
  }
  
-/* current counter value */

-d = muldiv64(now - ti->load_time,
- ti->frequency, NANOSECONDS_PER_SECOND);
-
-/* the timer goes down from latch to -1 (period of latch + 2) */
-if (d <= (ti->counter_value + 1)) {
-counter = ti->counter_value - d;
-

Re: [RFC 00/10] hw/mos6522: VIA timer emulation fixes and improvements

2021-08-25 Thread Mark Cave-Ayland

On 24/08/2021 11:09, Finn Thain wrote:


This is a patch series that I started last year. The aim was to try to
get a monotonic clocksource for Linux/m68k guests. That aim hasn't been
achieved yet (for q800 machines) but I'm submitting the patch series as
an RFC because,

  - It does improve 6522 emulation fidelity.

  - It allows Linux/m68k to make use of the additional timer that the
hardware indeed offers but which QEMU omits. This has several
benefits for Linux guests [1].

  - I see that Mark has been working on timer emulation issues in his
github repo [2] and it seems likely that MacOS, NetBSD or A/UX guests
will also require better 6522 emulation.

To make collaboration easier these patches can also be fetched from
github [3].

On a real Quadra, accesses to the SY6522 chips are slow because they are
synchronous with the 783360 Hz "phase 2" clock. In QEMU, they are slow
only because of the division operation in the timer count calculation.

This patch series improves the fidelity of the emulated chip, but the
price is more division ops. I haven't tried to measure this yet.

The emulated 6522 still deviates from the behaviour of the real thing,
however. For example, two consecutive accesses to a real 6522 timer
counter can never yield the same value. This is not true of the 6522 in
QEMU 6 wherein two consecutive accesses to a timer count register have
been observed to yield the same value.

Linux is not particularly robust in the face of a 6522 that deviates
from the usual behaviour. The problem presently affecting a Linux guest
is that its 'via' clocksource is prone to monotonicity failure. That is,
the clocksource counter can jump backwards. This can be observed by
patching Linux like so:

diff --git a/arch/m68k/mac/via.c b/arch/m68k/mac/via.c
--- a/arch/m68k/mac/via.c
+++ b/arch/m68k/mac/via.c
@@ -606,6 +606,8 @@ void __init via_init_clock(void)
clocksource_register_hz(&mac_clk, VIA_CLOCK_FREQ);
  }
  
+static u32 prev_ticks;

+
  static u64 mac_read_clk(struct clocksource *cs)
  {
unsigned long flags;
@@ -631,6 +633,8 @@ static u64 mac_read_clk(struct clocksource *cs)
count = count_high << 8;
ticks = VIA_TIMER_CYCLES - count;
ticks += clk_offset + clk_total;
+if (ticks < prev_ticks) pr_warn("%s: %u < %u\n", __func__, ticks, prev_ticks);
+prev_ticks = ticks;
local_irq_restore(flags);
  
  	return ticks;


This problem can be partly blamed on a 6522 design limitation, which is
that the timer counter has no overflow register. Hence, if a timer
counter wraps around and the kernel is late to handle the subsequent
interrupt, the kernel can't account for any missed ticks.

On a real Quadra, the kernel mitigates this limitation by minimizing
interrupt latency. But on QEMU, interrupt latency is unbounded. This
can't be mitigated by the guest kernel at all and leads to clock drift.
This can be observed by patching QEMU like so:

diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -379,6 +379,12 @@ void mos6522_write(void *opaque, hwaddr addr, uint64_t 
val, unsigned size)
  s->pcr = val;
  break;
  case VIA_REG_IFR:
+if (val & T1_INT) {
+static int64_t last_t1_int_cleared;
+int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+if (now - last_t1_int_cleared > 2000) printf("\t%s: t1 int clear is 
late\n", __func__);
+last_t1_int_cleared = now;
+}
  /* reset bits */
  s->ifr &= ~val;
  mos6522_update_irq(s);

This logic asserts that, given that Linux/m68k sets CONFIG_HZ to 100,
the emulator will theoretically see each timer 1 interrupt cleared
within 20 ms of the previous one. But that deadline is often missed on
my QEMU host [4].

On real Mac hardware you could observe the same scenario if a high
priority interrupt were to sufficiently delay the timer interrupt
handler. (This is the reason why the VIA1 interrupt priority gets
increased from level 1 to level 5 when running on Quadras.)

Anyway, for now, the clocksource monotonicity problem in Linux/mac68k
guests is still unresolved. Nonetheless, I think this patch series does
improve the situation.

[1] I've also been working on some improvements to Linux/m68k based on
Arnd Bergman's clockevent RFC patch,
https://lore.kernel.org/linux-m68k/20201008154651.1901126-14-a...@arndb.de/
The idea is to add a oneshot clockevent device by making use of the
second VIA1 timer. This approach should help mitigate the clock drift
problem as well as assist with GENERIC_CLOCKEVENTS adoption.

[2] https://github.com/mcayland/qemu/commits/q800.upstream

[3] https://github.com/fthain/qemu/commits/via-timer/

[4] This theoretical 20 ms deadline is not missed prior to every
backwards jump in the clocksource counter. AFAICT, that's because the
true deadline is somewhat shorter than 20 ms.


Finn Thain (10):
   hw/mos6522: Remove get_load_time() methods and fu

[PATCH v2 1/3] softmmu/vl: Add a "grab-mod" parameter to the -display sdl option

2021-08-25 Thread Thomas Huth
The -display sdl option is not using QAPI internally yet, and uses hand-
crafted parsing instead (see parse_display() in vl.c), which is quite
ugly, since most of the other code is using the QAPIfied DisplayOption
already. Unfortunately, the "alt_grab" and "ctrl_grab" parameters use
underscores in their names, which has recently been forbidden in new
QAPI code, so
a straight conversion is not possible. While we could add some exceptions
to the QAPI schema parser for this, the way these parameters have been
designed was maybe a bad idea anyway: First, it's not possible to enable
both parameters at the same time, so instead of two boolean parameters
it would be better to have a single multi-choice parameter.
Second, the naming is also somewhat unfortunate since the "alt_grab"
parameter is not about the ALT key, but rather about the left SHIFT key
that has to be used additionally when the parameter is enabled.

So instead of trying to QAPIfy "alt_grab" and "ctrl_grab", let's rather
introduce an alternative to these parameters instead, a new parameter
called "grab-mod" which can either be set to "lshift-lctrl-lalt" or to
"rctrl". In case we ever want to support additional modes later, we can
then also simply extend the list of supported strings here.
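
The proposed matching behaves like a prefix match where the first alternative wins. A sketch mirroring the strstart() calls in the patch (Python, not QEMU code):

```python
def parse_grab_mod(value):
    # Like strstart() in the patch: prefix match, first alternative wins.
    if value.startswith("lshift-lctrl-lalt"):
        return "alt_grab"
    if value.startswith("rctrl"):
        return "ctrl_grab"
    raise ValueError("invalid grab-mod argument: " + value)
```

Any other string falls through to the error path, matching the patch's invalid_sdl_args handling.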

Signed-off-by: Thomas Huth 
---
 qemu-options.hx |  6 +-
 softmmu/vl.c| 15 ---
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 83aa59a920..0bff756ded 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1834,7 +1834,7 @@ DEF("display", HAS_ARG, QEMU_OPTION_display,
 #endif
 #if defined(CONFIG_SDL)
 "-display sdl[,alt_grab=on|off][,ctrl_grab=on|off][,gl=on|core|es|off]\n"
-"[,show-cursor=on|off][,window-close=on|off]\n"
+"[,grab-mod=<mods>][,show-cursor=on|off][,window-close=on|off]\n"
 #endif
 #if defined(CONFIG_GTK)
 "-display gtk[,full-screen=on|off][,gl=on|off][,grab-on-hover=on|off]\n"
@@ -1880,6 +1880,10 @@ SRST
 window; see the SDL documentation for other possibilities).
 Valid parameters are:
 
+``grab-mod=<mods>`` : Used to select the modifier keys for toggling
+the mouse grabbing in conjunction with the "g" key. ``<mods>`` can be
+either `lshift-lctrl-lalt` or `rctrl`.
+
 ``alt_grab=on|off`` : Use Control+Alt+Shift-g to toggle mouse grabbing
 
 ``ctrl_grab=on|off`` : Use Right-Control-g to toggle mouse grabbing
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5ca11e7469..294990debf 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1017,15 +1017,24 @@ static void parse_display(const char *p)
  * parse_display_qapi() due to some options not in
  * DisplayOptions, specifically:
  *   - ctrl_grab + alt_grab
- * Not clear yet what happens to them long-term.  Should
- * replaced by something better or deprecated and dropped.
+ * They can't be moved into the QAPI since they use underscores,
+ * thus they will get replaced by "grab-mod" in the long term
  */
 #if defined(CONFIG_SDL)
 dpy.type = DISPLAY_TYPE_SDL;
 while (*opts) {
 const char *nextopt;
 
-if (strstart(opts, ",alt_grab=", &nextopt)) {
+if (strstart(opts, ",grab-mod=", &nextopt)) {
+opts = nextopt;
+if (strstart(opts, "lshift-lctrl-lalt", &nextopt)) {
+alt_grab = 1;
+} else if (strstart(opts, "rctrl", &nextopt)) {
+ctrl_grab = 1;
+} else {
+goto invalid_sdl_args;
+}
+} else if (strstart(opts, ",alt_grab=", &nextopt)) {
 opts = nextopt;
 if (strstart(opts, "on", &nextopt)) {
 alt_grab = 1;
-- 
2.27.0




[PATCH v2 0/3] softmmu/vl: Deprecate old and crufty display ui options

2021-08-25 Thread Thomas Huth
-display sdl uses a hand-crafted parser in vl.c, which is quite ugly
since the other parts of -display have been QAPIfied already. A straight
conversion to QAPI is not advisable since the "alt_grab" and "ctrl_grab"
parameters are not the best solution anyway. So this patch series
introduces a new "grab-mod" parameter as replacement instead and then
deprecates the old and crufty options.

While we're at it, the third patch also suggests deprecating the
old -sdl and -curses top-level options.

v2:
 - Update version numbers to 6.2
 - Added Acked-bys from Peter Krempa

Thomas Huth (3):
  softmmu/vl: Add a "grab-mod" parameter to the -display sdl option
  softmmu/vl: Deprecate the old grab options
  softmmu/vl: Deprecate the -sdl and -curses option

 docs/about/deprecated.rst | 20 
 qemu-options.hx   | 18 +-
 softmmu/vl.c  | 24 +---
 3 files changed, 54 insertions(+), 8 deletions(-)

-- 
2.27.0




[PATCH v2 2/3] softmmu/vl: Deprecate the old grab options

2021-08-25 Thread Thomas Huth
The alt_grab and ctrl_grab parameter of the -display sdl option prevent
the QAPIfication of the "sdl" part of the -display option, so we should
eventually remove them. And since this feature is also rather niche anyway,
we should not clutter the top-level option list with these, so let's
also deprecate the "-alt-grab" and the "-ctrl-grab" options while we're
at it.

Once the deprecation period of "alt_grab" and "ctrl_grab" is over, we
then can finally switch the -display sdl option to use QAPI internally,
too.

Acked-by: Peter Krempa 
Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst | 10 ++
 qemu-options.hx   | 12 
 softmmu/vl.c  |  6 ++
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 6d438f1c8d..868eca0dd4 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -138,6 +138,16 @@ an underscore between "window" and "close").
 The ``-no-quit`` is a synonym for ``-display ...,window-close=off`` which
 should be used instead.
 
+``-alt-grab`` and ``-display sdl,alt_grab=on`` (since 6.2)
+''
+
+Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
+
+``-ctrl-grab`` and ``-display sdl,ctrl_grab=on`` (since 6.2)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Use ``-display sdl,grab-mod=rctrl`` instead.
+
 
 QEMU Machine Protocol (QMP) commands
 ------------------------------------
diff --git a/qemu-options.hx b/qemu-options.hx
index 0bff756ded..4f46233527 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1884,9 +1884,11 @@ SRST
 the mouse grabbing in conjunction with the "g" key. ``<mods>`` can be
 either `lshift-lctrl-lalt` or `rctrl`.
 
-``alt_grab=on|off`` : Use Control+Alt+Shift-g to toggle mouse grabbing
+``alt_grab=on|off`` : Use Control+Alt+Shift-g to toggle mouse grabbing.
+This parameter is deprecated - use ``grab-mod`` instead.
 
-``ctrl_grab=on|off`` : Use Right-Control-g to toggle mouse grabbing
+``ctrl_grab=on|off`` : Use Right-Control-g to toggle mouse grabbing.
+This parameter is deprecated - use ``grab-mod`` instead.
 
 ``gl=on|off|core|es`` : Use OpenGL for displaying
 
@@ -1971,7 +1973,8 @@ SRST
 ``-alt-grab``
 Use Ctrl-Alt-Shift to grab mouse (instead of Ctrl-Alt). Note that
 this also affects the special keys (for fullscreen, monitor-mode
-switching, etc).
+switching, etc). This option is deprecated - please use
+``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
 ERST
 
 DEF("ctrl-grab", 0, QEMU_OPTION_ctrl_grab,
@@ -1981,7 +1984,8 @@ SRST
 ``-ctrl-grab``
 Use Right-Ctrl to grab mouse (instead of Ctrl-Alt). Note that this
 also affects the special keys (for fullscreen, monitor-mode
-switching, etc).
+switching, etc). This option is deprecated - please use
+``-display sdl,grab-mod=rctrl`` instead.
 ERST
 
 DEF("no-quit", 0, QEMU_OPTION_no_quit,
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 294990debf..613948ab46 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1043,6 +1043,7 @@ static void parse_display(const char *p)
 } else {
 goto invalid_sdl_args;
 }
+warn_report("alt_grab is deprecated, use grab-mod instead.");
 } else if (strstart(opts, ",ctrl_grab=", &nextopt)) {
 opts = nextopt;
 if (strstart(opts, "on", &nextopt)) {
@@ -1052,6 +1053,7 @@ static void parse_display(const char *p)
 } else {
 goto invalid_sdl_args;
 }
+warn_report("ctrl_grab is deprecated, use grab-mod instead.");
 } else if (strstart(opts, ",window_close=", &nextopt) ||
strstart(opts, ",window-close=", &nextopt)) {
 if (strstart(opts, ",window_close=", NULL)) {
@@ -3253,9 +3255,13 @@ void qemu_init(int argc, char **argv, char **envp)
 break;
 case QEMU_OPTION_alt_grab:
 alt_grab = 1;
+warn_report("-alt-grab is deprecated, please use "
+"-display sdl,grab-mod=lshift-lctrl-lalt instead.");
 break;
 case QEMU_OPTION_ctrl_grab:
 ctrl_grab = 1;
+warn_report("-ctrl-grab is deprecated, please use "
+"-display sdl,grab-mod=rctrl instead.");
 break;
 case QEMU_OPTION_no_quit:
 dpy.has_window_close = true;
-- 
2.27.0




[PATCH v2 3/3] softmmu/vl: Deprecate the -sdl and -curses option

2021-08-25 Thread Thomas Huth
It's not that complicated to type "-display sdl" or "-display curses",
so we should not clutter our main option name space with such simple
wrapper options and rather present the users with a concise interface
instead. Thus let's deprecate the "-sdl" and "-curses" wrapper options now.

Acked-by: Peter Krempa 
Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst | 10 ++
 softmmu/vl.c  |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 868eca0dd4..d5bec67a78 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -148,6 +148,16 @@ Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
 
 Use ``-display sdl,grab-mod=rctrl`` instead.
 
+``-sdl`` (since 6.2)
+''''''''''''''''''''
+
+Use ``-display sdl`` instead.
+
+``-curses`` (since 6.2)
+'''
+
+Use ``-display curses`` instead.
+
 
 QEMU Machine Protocol (QMP) commands
 ------------------------------------
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 613948ab46..bb59dbf0de 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2897,6 +2897,8 @@ void qemu_init(int argc, char **argv, char **envp)
 dpy.type = DISPLAY_TYPE_NONE;
 break;
 case QEMU_OPTION_curses:
+warn_report("-curses is deprecated, "
+"use -display curses instead.");
 #ifdef CONFIG_CURSES
 dpy.type = DISPLAY_TYPE_CURSES;
 #else
@@ -3270,6 +3272,7 @@ void qemu_init(int argc, char **argv, char **envp)
 "-display ...,window-close=off instead.");
 break;
 case QEMU_OPTION_sdl:
+warn_report("-sdl is deprecated, use -display sdl instead.");
 #ifdef CONFIG_SDL
 dpy.type = DISPLAY_TYPE_SDL;
 break;
-- 
2.27.0




[PATCH 1/2] monitor/hmp: correctly invert password argument detection again

2021-08-25 Thread Stefan Reiter
Commit cfb5387a1d 'hmp: remove "change vnc TARGET" command' claims to
remove the HMP "change vnc" command, but doesn't actually do that.
Instead it rewires it to use 'qmp_change_vnc_password', and in the
process inverts the argument detection - ignoring the first issue, this
inversion is wrong, as this will now ask the user for a password if one
is already provided, and simply fail if none is given.
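
The fix can be illustrated with a minimal model of the control flow (Python, hypothetical names, not the actual HMP code):

```python
def hmp_change_password(arg, fixed=True):
    # Sketch of the decision fixed by this patch: the monitor should
    # prompt interactively only when NO password argument was supplied;
    # with the inverted check, it prompts when one IS supplied and
    # falls through (and fails) when none is given.
    prompt_when = (not arg) if fixed else bool(arg)
    if prompt_when:
        return "prompt-user"
    return "use-argument"  # with arg=None this is the reported failure
```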

Fixes: cfb5387a1d ("hmp: remove "change vnc TARGET" command")
Signed-off-by: Stefan Reiter 
---
 monitor/hmp-cmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index f7a211e5a4..31366e6331 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1591,7 +1591,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 }
 if (strcmp(target, "passwd") == 0 ||
 strcmp(target, "password") == 0) {
-if (arg) {
+if (!arg) {
 MonitorHMP *hmp_mon = container_of(mon, MonitorHMP, common);
 monitor_read_password(hmp_mon, hmp_change_read_arg, NULL);
 return;
-- 
2.30.2





[PATCH 0/2] VNC-related HMP/QMP fixes

2021-08-25 Thread Stefan Reiter
Since the removal of the generic 'qmp_change' command, one can no longer replace
the 'default' VNC display listen address at runtime (AFAIK). For our users who
need to set up a secondary VNC access port, this means configuring a second VNC
display (in addition to our standard one for web-access), but it turns out one
cannot set a password on this second display at the moment, as the
'set_password' call only operates on the 'default' display.

Additionally, using secret objects, the password is only read once at startup.
This could be considered a bug too, but is not touched in this series and left
for a later date.

Stefan Reiter (2):
  monitor/hmp: correctly invert password argument detection again
  monitor: allow VNC related QMP and HMP commands to take a display ID

 hmp-commands.hx| 28 +++-
 monitor/hmp-cmds.c | 22 +++---
 monitor/qmp-cmds.c |  9 +
 qapi/ui.json   | 12 ++--
 4 files changed, 49 insertions(+), 22 deletions(-)

-- 
2.30.2





[PATCH 2/2] monitor: allow VNC related QMP and HMP commands to take a display ID

2021-08-25 Thread Stefan Reiter
It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à la "default",
"vnc2", "vnc3", ...

It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
'set_password' and 'expire_password' QMP and HMP commands.

For HMP, this is a bit trickier, since at least 'set_password' already
has the 'connected' parameter following the mandatory 'password' one, so
we need to prefix the display ID with "id=" to allow correct parsing.

With this prefix, no existing command or workflow should be affected.

While rewriting the descriptions, also remove the line "Use zero to make
the password stay valid forever." from 'set_password', I believe this was
intended for 'expire_password', but would even be wrong there.
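
The resulting argument handling can be sketched as follows (hypothetical helper, not QEMU code):

```python
def split_optional_display(display_arg):
    # Sketch of the 'id=' disambiguation described above: an optional
    # positional argument only names a display when prefixed with
    # "id="; otherwise it is really the following 'connected' value.
    if display_arg is None:
        return None, None
    if display_arg.startswith("id="):
        return display_arg[3:], None   # display id, no 'connected'
    return None, display_arg           # actually the 'connected' arg
```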

Signed-off-by: Stefan Reiter 
---
 hmp-commands.hx| 28 +++-
 monitor/hmp-cmds.c | 20 ++--
 monitor/qmp-cmds.c |  9 +
 qapi/ui.json   | 12 ++--
 4 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index e01ca13ca8..0b5abcfb8a 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1541,34 +1541,36 @@ ERST
 
 {
 .name   = "set_password",
-.args_type  = "protocol:s,password:s,connected:s?",
-.params = "protocol password action-if-connected",
+.args_type  = "protocol:s,password:s,display:s?,connected:s?",
+.params = "protocol password [id=display] [action-if-connected]",
 .help   = "set spice/vnc password",
 .cmd= hmp_set_password,
 },
 
 SRST
-``set_password [ vnc | spice ] password [ action-if-connected ]``
-  Change spice/vnc password.  Use zero to make the password stay valid
-  forever.  *action-if-connected* specifies what should happen in
``set_password [ vnc | spice ] password [ id=display ] [ action-if-connected ]``
+  Change spice/vnc password.  *display* (must be prefixed with
+  'id=') can be used with 'vnc' to specify which display to set the
+  password on.  *action-if-connected* specifies what should happen in
   case a connection is established: *fail* makes the password change
-  fail.  *disconnect* changes the password and disconnects the
-  client.  *keep* changes the password and keeps the connection up.
-  *keep* is the default.
+  fail.  *disconnect* changes the password and disconnects the client.
+  *keep* changes the password and keeps the connection up.  *keep* is
+  the default.
 ERST
 
 {
 .name   = "expire_password",
-.args_type  = "protocol:s,time:s",
-.params = "protocol time",
+.args_type  = "protocol:s,time:s,display:s?",
+.params = "protocol time [id=display]",
 .help   = "set spice/vnc password expire-time",
 .cmd= hmp_expire_password,
 },
 
 SRST
-``expire_password [ vnc | spice ]`` *expire-time*
-  Specify when a password for spice/vnc becomes
-  invalid. *expire-time* accepts:
+``expire_password [ vnc | spice ] expire-time [ id=display ]``
+  Specify when a password for spice/vnc becomes invalid.
+  *display* behaves the same as in ``set_password``.
+  *expire-time* accepts:
 
   ``now``
 Invalidate password instantly.
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 31366e6331..30f5b2c3e3 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1546,10 +1546,20 @@ void hmp_set_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, "protocol");
 const char *password  = qdict_get_str(qdict, "password");
+const char *display = qdict_get_try_str(qdict, "display");
 const char *connected = qdict_get_try_str(qdict, "connected");
 Error *err = NULL;
 
-qmp_set_password(protocol, password, !!connected, connected, &err);
+if (display && strncmp(display, "id=", 3)) {
+connected = display;
+display = NULL;
+} else if (display) {
+/* skip "id=" */
+display = display + 3;
+}
+
+qmp_set_password(protocol, password, !!connected, connected, !!display,
+ display, &err);
 hmp_handle_error(mon, err);
 }
 
@@ -1557,9 +1567,15 @@ void hmp_expire_password(Monitor *mon, const QDict *qdict)
 {
 const char *protocol  = qdict_get_str(qdict, "protocol");
 const char *whenstr = qdict_get_str(qdict, "time");
+const char *display = qdict_get_try_str(qdict, "display");
 Error *err = NULL;
 
-qmp_expire_password(protocol, whenstr, &err);
+if (display && !strncmp(display, "id=", 3)) {
+/* skip "id=" */
+display = display + 3;
+}
+
+qmp_expire_password(protocol, whenstr, !!display, display, &err);
 hmp_handle_error(mon, err);
 }
 
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index f7d64a6457..a9ded90a41 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -165,7 +165,8 @@

Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-25 Thread Markus Armbruster
Peter Xu  writes:

> On Mon, Aug 23, 2021 at 05:56:23PM -0400, Eduardo Habkost wrote:
>> I don't have any other example, but I assume address assignment
>> based on ordering is a common pattern in device code.
>> 
>> I would take a very close and careful look at the devices with
>> non-default vmsd priority.  If you can prove that the 13 device
>> types with non-default priority are all order-insensitive, a
>> custom sort function as you describe might be safe.
>
> Besides virtio-mem-pci, there'll also similar devfn issue with all
> MIG_PRI_PCI_BUS, as they'll be allocated just like other pci devices.  Say,
> below two cmdlines will generate different pci topology too:
>
>   $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
>-device pcie-root-port,chassis=1 \
>-device virtio-net-pci
>
> And:
>
>   $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
>-device virtio-net-pci
>-device pcie-root-port,chassis=1 \
>
> This cannot be solved by keeping priority==0 ordering.
>
> After a second thought, I think I was initially wrong on seeing migration
> priority and device realization the same problem.
>
> For example, for live migration we have a requirement on PCI_BUS being 
> migrated
> earlier than MIG_PRI_IOMMU because there's bus number information required
> because IOMMU relies on the bus number to find address spaces.  However that's
> definitely not a requirement for device realizations, say, realizing vIOMMU
> after pci buses are fine (bus assigned during bios).
>
> I've probably messed up with the ideas (though they really look alike!).  
> Sorry
> about that.
>
> Since the only ordering constraint so far is IOMMU vs all the rest of devices,
> I'll introduce a new priority mechanism and only make sure vIOMMUs are 
> realized
> earlier.  That'll also avoid other implications on pci devfn allocations.
>
> Will rework a new version tomorrow.  Thanks a lot for all the comments,

Is it really a good idea to magically reorder device realization just to
make a non-working command line work?  Why can't we just fail the
non-working command line in a way that tells users how to get a working
one?  We have way too much ordering magic already...

If we decide we want more magic, then I'd argue for *dependencies*
instead of priorities.  Dependencies are specific and local: $this needs
to go after $that because $reasons.  Priorities are unspecific and
global.
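
The dependency-based alternative suggested above amounts to a topological sort over local "realize $this after $that" edges. A sketch (Python's graphlib, available since 3.9; the device names and edges are invented for illustration):

```python
from graphlib import TopologicalSorter

# Map each device to the set of devices it must be realized after.
realize_after = {
    "virtio-net-pci": {"pcie-root-port"},  # needs its bus first
    "pcie-root-port": {"intel-iommu"},     # $reasons go here
    "intel-iommu": set(),
}

# static_order() yields a realization order respecting every edge.
order = list(TopologicalSorter(realize_after).static_order())
```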




[PATCH] sun4m: fix setting CPU id when more than one CPU is present

2021-08-25 Thread Mark Cave-Ayland
Commit 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState property") 
changed
the sun4m CPU reset code to use the start-powered-off property and so split the
creation of the CPU into separate instantiation and realization phases to enable
the new start-powered-off property to be set.

This accidentally broke sun4m machines with more than one CPU present since
sparc_cpu_realizefn() sets a default CPU id, and now that realization occurs 
after
calling cpu_sparc_set_id() in cpu_devinit() the CPU id gets reset back to the
default instead of being uniquely encoded based upon the CPU number. As soon as
another CPU is brought online, the OS gets confused between them and promptly
panics.

Resolve the issue by moving the cpu_sparc_set_id() call in cpu_devinit() to 
after
the point where the CPU device has been realized as before.
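
The ordering bug can be modelled in a few lines (a toy, not QEMU code; realizefn is reduced to "install a default id"):

```python
class ToySparcCPU:
    DEFAULT_ID = 0

    def __init__(self):
        self.id = None

    def realize(self):
        # sparc_cpu_realizefn() analogue: unconditionally installs a
        # default id, clobbering anything set before realization.
        self.id = self.DEFAULT_ID

def devinit_before_fix(cpu_number):
    cpu = ToySparcCPU()
    cpu.id = cpu_number        # set the unique id first ...
    cpu.realize()              # ... then realization resets it
    return cpu.id

def devinit_after_fix(cpu_number):
    cpu = ToySparcCPU()
    cpu.realize()
    cpu.id = cpu_number        # id set after realize survives
    return cpu.id
```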

Fixes: 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState property")
Signed-off-by: Mark Cave-Ayland 
---
 hw/sparc/sun4m.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 42e139849e..7f3a7c0027 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -803,11 +803,11 @@ static void cpu_devinit(const char *cpu_type, unsigned 
int id,
 cpu = SPARC_CPU(object_new(cpu_type));
 env = &cpu->env;
 
-cpu_sparc_set_id(env, id);
 qemu_register_reset(sun4m_cpu_reset, cpu);
 object_property_set_bool(OBJECT(cpu), "start-powered-off", id != 0,
  &error_fatal);
 qdev_realize_and_unref(DEVICE(cpu), NULL, &error_fatal);
+cpu_sparc_set_id(env, id);
 *cpu_irqs = qemu_allocate_irqs(cpu_set_irq, cpu, MAX_PILS);
 env->prom_addr = prom_addr;
 }
-- 
2.20.1




Re: [PATCH 2/5] msix: simplfy the conditional in msix_set/unset_vector_notifiers

2021-08-25 Thread Philippe Mathieu-Daudé
On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
> 'msix_function_masked' is kept pace with the device's config,
> we can use it to replace the complex conditional in
> msix_set/unset_vector_notifiers.

Typo 'simplfy' -> 'simplify' in this/previous patch subject.

> poll_notifier should be reset to NULL in the error path in
> msix_set_vector_notifiers, fix it incidentally.

I'd rather see this fix in a different patch, being
unrelated to the msix_function_masked optimization.
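
As for the simplification itself: the equivalence it relies on can be checked exhaustively with a short sketch (Python, not QEMU code; the mask values are assumed to match msix.c, where they are the MSI-X Enable and Function Mask bits of the message-control byte):

```python
MSIX_ENABLE_MASK = 0x80    # assumed: PCI_MSIX_FLAGS_ENABLE >> 8
MSIX_MASKALL_MASK = 0x40   # assumed: PCI_MSIX_FLAGS_MASKALL >> 8

def old_condition(ctrl):
    # The open-coded check being removed by the patch.
    return (ctrl & (MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK

def function_masked(ctrl):
    # What msix_function_masked is assumed to track: MSI-X disabled,
    # or the function-mask bit set.
    return not (ctrl & MSIX_ENABLE_MASK) or bool(ctrl & MSIX_MASKALL_MASK)

# Exhaustive over the one relevant config byte: the two forms agree.
assert all(old_condition(c) == (not function_masked(c)) for c in range(256))
```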

> Signed-off-by: Longpeng(Mike) 
> ---
>  hw/pci/msix.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index ae9331c..8057709 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -592,8 +592,7 @@ int msix_set_vector_notifiers(PCIDevice *dev,
>  dev->msix_vector_release_notifier = release_notifier;
>  dev->msix_vector_poll_notifier = poll_notifier;
>  
> -if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
> -(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
> +if (!dev->msix_function_masked) {
>  for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>  ret = msix_set_notifier_for_vector(dev, vector);
>  if (ret < 0) {
> @@ -612,6 +611,7 @@ undo:
>  }
>  dev->msix_vector_use_notifier = NULL;
>  dev->msix_vector_release_notifier = NULL;
> +dev->msix_vector_poll_notifier = NULL;
>  return ret;
>  }
>  
> @@ -622,8 +622,7 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
>  assert(dev->msix_vector_use_notifier &&
> dev->msix_vector_release_notifier);
>  
> -if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
> -(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
> +if (!dev->msix_function_masked) {
>  for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>  msix_unset_notifier_for_vector(dev, vector);
>  }
> 




Re: [PATCH v3 2/6] block: block-status cache for data regions

2021-08-25 Thread Vladimir Sementsov-Ogievskiy

12.08.2021 11:41, Hanna Reitz wrote:

As we have attempted before
(https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html,
"file-posix: Cache lseek result for data regions";
https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html,
"file-posix: Cache next hole"), this patch seeks to reduce the number of
SEEK_DATA/HOLE operations the file-posix driver has to perform.  The
main difference is that this time it is implemented as part of the
general block layer code.

The problem we face is that on some filesystems or in some
circumstances, SEEK_DATA/HOLE is unreasonably slow.  Given the
implementation is outside of qemu, there is little we can do about its
performance.

We have already introduced the want_zero parameter to
bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls
unless we really want zero information; but sometimes we do want that
information, because for files that consist largely of zero areas,
special-casing those areas can give large performance boosts.  So the
real problem is with files that consist largely of data, so that
inquiring the block status does not gain us much performance, but where
such an inquiry itself takes a lot of time.

To address this, we want to cache data regions.  Most of the time, when
bad performance is reported, it is in places where the image is iterated
over from start to end (qemu-img convert or the mirror job), so a simple
yet effective solution is to cache only the current data region.

(Note that only caching data regions but not zero regions means that
returning false information from the cache is not catastrophic: Treating
zeroes as data is fine.  While we try to invalidate the cache on zero
writes and discards, such incongruences may still occur when there are
other processes writing to the image.)

We only use the cache for nodes without children (i.e. protocol nodes),
because that is where the problem is: Drivers that rely on block-status
implementations outside of qemu (e.g. SEEK_DATA/HOLE).
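
The single-region idea can be sketched as a toy cache (Python, not the actual block-layer code; the invalidation policy follows the note above, i.e. stale "data" answers are harmless):

```python
class ToyBlockStatusCache:
    def __init__(self):
        self.region = None  # (start, end) of the last known data region

    def lookup(self, offset):
        # Cache hit only inside the remembered data region; a miss
        # means falling back to the driver (e.g. SEEK_DATA/HOLE).
        if self.region and self.region[0] <= offset < self.region[1]:
            return "data"
        return None

    def remember(self, start, length):
        self.region = (start, start + length)

    def invalidate(self):
        # Needed only for writes that may create zeroes/holes.
        self.region = None
```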

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307
Signed-off-by: Hanna Reitz 
---
  include/block/block_int.h | 50 


[..]


+/*
+ * Note that checking QLIST_EMPTY(&bs->children) is also done when
+ * the cache is queried above.  Technically, we do not need to 
check
+ * it here; the worst that can happen is that we fill the cache for
+ * non-protocol nodes, and then it is never used.  However, filling
+ * the cache requires an RCU update, so double check here to avoid
+ * such an update if possible.
+ */
+if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
+QLIST_EMPTY(&bs->children))
+{
+/*
+ * When a protocol driver reports BLOCK_OFFSET_VALID, the
+ * returned local_map value must be the same as the offset we
+ * have passed (aligned_offset).
+ * Assert this, because we follow this rule when reading from
+ * the cache (see the `local_map = aligned_offset` assignment
+ * above), and the result the cache delivers must be the same
+ * as the driver would deliver.
+ */
+assert(local_map == aligned_offset);


maybe, also assert(local_file == bs);

as well, we may check only BDRV_BLOCK_DATA flag above and
assert BDRV_BLOCK_OFFSET_VALID here

anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir



Re: [PATCH 2/5] msix: simplfy the conditional in msix_set/unset_vector_notifiers

2021-08-25 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/8/25 17:52, Philippe Mathieu-Daudé wrote:
> On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
>> 'msix_function_masked' is kept pace with the device's config,
>> we can use it to replace the complex conditional in
>> msix_set/unset_vector_notifiers.
> 
> Typo 'simplfy' -> 'simplify' in this/previous patch subject.

Ok.

>> poll_notifier should be reset to NULL in the error path in
>> msix_set_vector_notifiers, fix it incidentally.
> 
> I'd rather see this fix in a different patch, being
> unrelated to the msix_function_masked optimization.
>
Ok, will split in next version. Thanks.

>> Signed-off-by: Longpeng(Mike) 
>> ---
>>  hw/pci/msix.c | 7 +++
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index ae9331c..8057709 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -592,8 +592,7 @@ int msix_set_vector_notifiers(PCIDevice *dev,
>>  dev->msix_vector_release_notifier = release_notifier;
>>  dev->msix_vector_poll_notifier = poll_notifier;
>>  
>> -if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
>> -(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
>> +if (!dev->msix_function_masked) {
>>  for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>>  ret = msix_set_notifier_for_vector(dev, vector);
>>  if (ret < 0) {
>> @@ -612,6 +611,7 @@ undo:
>>  }
>>  dev->msix_vector_use_notifier = NULL;
>>  dev->msix_vector_release_notifier = NULL;
>> +dev->msix_vector_poll_notifier = NULL;
>>  return ret;
>>  }
>>  
>> @@ -622,8 +622,7 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
>>  assert(dev->msix_vector_use_notifier &&
>> dev->msix_vector_release_notifier);
>>  
>> -if ((dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &
>> -(MSIX_ENABLE_MASK | MSIX_MASKALL_MASK)) == MSIX_ENABLE_MASK) {
>> +if (!dev->msix_function_masked) {
>>  for (vector = 0; vector < dev->msix_entries_nr; vector++) {
>>  msix_unset_notifier_for_vector(dev, vector);
>>  }
>>
> 
> .
> 



Re: [PATCH 3/5] vfio: defer to enable msix in migration resume phase

2021-08-25 Thread Philippe Mathieu-Daudé
On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
> The vf's unmasked msix vectors will be enable one by one in
> migraiton resume phase, VFIO_DEVICE_SET_IRQS will be called

Typo "migration"

> for each vector, it's a bit expensive if the vf has more
> vectors.
> 
> We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
> vector notifiers to reduce the cost.
> 
> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
> we mesure the cost of the vfio_msix_enable for each one, and

Typo "measure"

> we can see 10% costs can be reduced.
> 
> Origin  Apply this patch
> 1st 8   4
> 2nd 15  11
> 3rd 22  18
> 4th 24  25
> 5th 36  33
> 6th 44  40
> 7th 51  47
> 8th 58  54
> Total   258ms   232ms
> 
> Signed-off-by: Longpeng(Mike) 
> ---
>  hw/vfio/pci.c | 22 ++
>  hw/vfio/pci.h |  1 +
>  2 files changed, 23 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 7cc43fe..ca37fb7 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>  int ret = 0, i, argsz;
>  int32_t *fds;
>  
> +if (!vdev->nr_vectors) {
> +return 0;
> +}
> +
>  argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>  
>  irq_set = g_malloc0(argsz);
> @@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>  }
>  }
>  
> +if (vdev->defer_add_virq) {
> +vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
> +goto clear_pending;
> +}
> +
>  /*
>   * We don't want to have the host allocate all possible MSI vectors
>   * for a device if they're not in use, so we shutdown and incrementally
> @@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>  }
>  }
>  
> +clear_pending:
>  /* Disable PBA emulation when nothing more is pending. */
>  clear_bit(nr, vdev->msix->pending);
>  if (find_first_bit(vdev->msix->pending,
> @@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>  if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
>vfio_msix_vector_release, NULL)) {
>  error_report("vfio: msix_set_vector_notifiers failed");
> +return;
> +}
> +
> +if (!pdev->msix_function_masked && vdev->defer_add_virq) {
> +int ret;
> +vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
> +ret = vfio_enable_vectors(vdev, true);
> +if (ret) {
> +error_report("vfio: failed to enable vectors, %d", ret);
> +}
>  }
>  
>  trace_vfio_msix_enable(vdev->vbasedev.name);
> @@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>  if (msi_enabled(pdev)) {
>  vfio_msi_enable(vdev);
>  } else if (msix_enabled(pdev)) {
> +vdev->defer_add_virq = true;
>  vfio_msix_enable(vdev);

What about passing defer_add_virq as a boolean argument
to vfio_msix_enable()?

> +vdev->defer_add_virq = false;
>  }
>  
>  return ret;
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 6477751..4235c83 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -171,6 +171,7 @@ struct VFIOPCIDevice {
>  bool no_kvm_ioeventfd;
>  bool no_vfio_ioeventfd;
>  bool enable_ramfb;
> +bool defer_add_virq;
>  VFIODisplay *dpy;
>  Notifier irqchip_change_notifier;
>  };
> 




Re: [PATCH 0/5] optimize the downtime for vfio migration

2021-08-25 Thread Philippe Mathieu-Daudé
Cc'ing David/Juan for migration big picture (just in case).

On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
> In vfio migration resume phase, the cost would increase if the
> vfio device has more unmasked vectors. We try to optimize it in
> this series.
> 
> Patch 1 & 2 are simple code cleanups.
> Patch 3 defers to set irqs to vfio core.
> Patch 4 & 5 defer to commit the route to KVM core. 
> 
> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
> we mesure the cost of the vfio_msix_enable for each one, and
> we can see the total cost can be significantly reduced.
> 
> Origin  Apply Patch 3  Apply Patch 3/4/5
> 1st       8        4         2
> 2nd      15       11         2
> 3rd      22       18         2
> 4th      24       25         3
> 5th      36       33         2
> 6th      44       40         3
> 7th      51       47         3
> 8th      58       54         4
> Total   258ms    232ms      21ms
> 
> 
> Longpeng (Mike) (5):
>   vfio: use helper to simplfy the failure path in vfio_msi_enable
>   msix: simplfy the conditional in msix_set/unset_vector_notifiers
>   vfio: defer to enable msix in migration resume phase
>   kvm: irqchip: support defer to commit the route
>   vfio: defer to commit kvm route in migraiton resume phase

Overall makes sense and LGTM but migration/KVM are not my area :/




Re: [PATCH 3/5] vfio: defer to enable msix in migration resume phase

2021-08-25 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/8/25 17:57, Philippe Mathieu-Daudé wrote:
> On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
>> The vf's unmasked msix vectors will be enabled one by one in
>> migraiton resume phase, VFIO_DEVICE_SET_IRQS will be called
> 
> Typo "migration"
> 
Ok.

>> for each vector, it's a bit expensive if the vf has more
>> vectors.
>>
>> We can call VFIO_DEVICE_SET_IRQS once outside the loop of set
>> vector notifiers to reduce the cost.
>>
>> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
>> we mesure the cost of the vfio_msix_enable for each one, and
> 
> Typo "measure"
> 
Ok.

>> we can see 10% costs can be reduced.
>>
>> Origin  Apply this patch
>> 1st 8   4
>> 2nd 15  11
>> 3rd 22  18
>> 4th 24  25
>> 5th 36  33
>> 6th 44  40
>> 7th 51  47
>> 8th 58  54
>> Total   258ms   232ms
>>
>> Signed-off-by: Longpeng(Mike) 
>> ---
>>  hw/vfio/pci.c | 22 ++
>>  hw/vfio/pci.h |  1 +
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 7cc43fe..ca37fb7 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -372,6 +372,10 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>>  int ret = 0, i, argsz;
>>  int32_t *fds;
>>  
>> +if (!vdev->nr_vectors) {
>> +return 0;
>> +}
>> +
>>  argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
>>  
>>  irq_set = g_malloc0(argsz);
>> @@ -495,6 +499,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>  }
>>  }
>>  
>> +if (vdev->defer_add_virq) {
>> +vdev->nr_vectors = MAX(vdev->nr_vectors, nr + 1);
>> +goto clear_pending;
>> +}
>> +
>>  /*
>>   * We don't want to have the host allocate all possible MSI vectors
>>   * for a device if they're not in use, so we shutdown and incrementally
>> @@ -524,6 +533,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>>  }
>>  }
>>  
>> +clear_pending:
>>  /* Disable PBA emulation when nothing more is pending. */
>>  clear_bit(nr, vdev->msix->pending);
>>  if (find_first_bit(vdev->msix->pending,
>> @@ -608,6 +618,16 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>>  if (msix_set_vector_notifiers(pdev, vfio_msix_vector_use,
>>vfio_msix_vector_release, NULL)) {
>>  error_report("vfio: msix_set_vector_notifiers failed");
>> +return;
>> +}
>> +
>> +if (!pdev->msix_function_masked && vdev->defer_add_virq) {
>> +int ret;
>> +vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>> +ret = vfio_enable_vectors(vdev, true);
>> +if (ret) {
>> +error_report("vfio: failed to enable vectors, %d", ret);
>> +}
>>  }
>>  
>>  trace_vfio_msix_enable(vdev->vbasedev.name);
>> @@ -2456,7 +2476,9 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
>>  if (msi_enabled(pdev)) {
>>  vfio_msi_enable(vdev);
>>  } else if (msix_enabled(pdev)) {
>> +vdev->defer_add_virq = true;
>>  vfio_msix_enable(vdev);
> 
> What about passing defer_add_virq as a boolean argument
> to vfio_msix_enable()?
> 
We'll use defer_add_virq deep in the call chain; passing it as a
parameter would require changing more functions along the way.

>> +vdev->defer_add_virq = false;
>>  }
>>  
>>  return ret;
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 6477751..4235c83 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -171,6 +171,7 @@ struct VFIOPCIDevice {
>>  bool no_kvm_ioeventfd;
>>  bool no_vfio_ioeventfd;
>>  bool enable_ramfb;
>> +bool defer_add_virq;
>>  VFIODisplay *dpy;
>>  Notifier irqchip_change_notifier;
>>  };
>>
> 
> .
> 



Re: [PATCH 0/5] optimize the downtime for vfio migration

2021-08-25 Thread Longpeng (Mike, Cloud Infrastructure Service Product Dept.)



On 2021/8/25 18:05, Philippe Mathieu-Daudé wrote:
> Cc'ing David/Juan for migration big picture (just in case).
> 
> On 8/25/21 9:56 AM, Longpeng(Mike) wrote:
>> In vfio migration resume phase, the cost would increase if the
>> vfio device has more unmasked vectors. We try to optimize it in
>> this series.
>>
>> Patch 1 & 2 are simple code cleanups.
>> Patch 3 defers to set irqs to vfio core.
>> Patch 4 & 5 defer to commit the route to KVM core. 
>>
>> The test VM has 128 vcpus and 8 VF (with 65 vectors enabled),
>> we mesure the cost of the vfio_msix_enable for each one, and
>> we can see the total cost can be significantly reduced.
>>
>> Origin  Apply Patch 3  Apply Patch 3/4/5
>> 1st       8        4         2
>> 2nd      15       11         2
>> 3rd      22       18         2
>> 4th      24       25         3
>> 5th      36       33         2
>> 6th      44       40         3
>> 7th      51       47         3
>> 8th      58       54         4
>> Total   258ms    232ms      21ms
>>
>>
>> Longpeng (Mike) (5):
>>   vfio: use helper to simplfy the failure path in vfio_msi_enable
>>   msix: simplfy the conditional in msix_set/unset_vector_notifiers
>>   vfio: defer to enable msix in migration resume phase
>>   kvm: irqchip: support defer to commit the route
>>   vfio: defer to commit kvm route in migraiton resume phase
> 
> Overall makes sense and LGTM but migration/KVM are not my area :/
>
Thanks all the same :)

> .
> 



Re: [PATCH] sun4m: fix setting CPU id when more than one CPU is present

2021-08-25 Thread Philippe Mathieu-Daudé
On 8/25/21 11:51 AM, Mark Cave-Ayland wrote:
> Commit 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState property")
> changed the sun4m CPU reset code to use the start-powered-off property and
> so split the creation of the CPU into separate instantiation and
> realization phases to enable the new start-powered-off property to be set.
> 
> This accidentally broke sun4m machines with more than one CPU present
> since sparc_cpu_realizefn() sets a default CPU id, and now that
> realization occurs after calling cpu_sparc_set_id() in cpu_devinit() the
> CPU id gets reset back to the default instead of being uniquely encoded
> based upon the CPU number. As soon as another CPU is brought online, the
> OS gets confused between them and promptly panics.
> 
> Resolve the issue by moving the cpu_sparc_set_id() call in cpu_devinit()
> to after the point where the CPU device has been realized as before.
> 
> Fixes: 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState property")
> Signed-off-by: Mark Cave-Ayland 
> ---
>  hw/sparc/sun4m.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
> index 42e139849e..7f3a7c0027 100644
> --- a/hw/sparc/sun4m.c
> +++ b/hw/sparc/sun4m.c
> @@ -803,11 +803,11 @@ static void cpu_devinit(const char *cpu_type, unsigned int id,
>  cpu = SPARC_CPU(object_new(cpu_type));
>  env = &cpu->env;
>  
> -cpu_sparc_set_id(env, id);
>  qemu_register_reset(sun4m_cpu_reset, cpu);
>  object_property_set_bool(OBJECT(cpu), "start-powered-off", id != 0,
>   &error_fatal);
>  qdev_realize_and_unref(DEVICE(cpu), NULL, &error_fatal);
> +cpu_sparc_set_id(env, id);
>  *cpu_irqs = qemu_allocate_irqs(cpu_set_irq, cpu, MAX_PILS);
>  env->prom_addr = prom_addr;
>  }

What about directly passing the CPU ID as a property (untested):

-- >8 --
Author: Philippe Mathieu-Daudé 
Date:   Wed Aug 25 12:26:02 2021 +0200

sun4m: fix setting CPU id when more than one CPU is present

Commit 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState
property") changed the sun4m CPU reset code to use the
start-powered-off property and so split the creation of the CPU into
separate instantiation and realization phases to enable the new
start-powered-off property to be set.

This accidentally broke sun4m machines with more than one CPU present
since sparc_cpu_realizefn() sets a default CPU id, and now that
realization occurs after calling cpu_sparc_set_id() in cpu_devinit()
the CPU id gets reset back to the default instead of being uniquely
encoded based upon the CPU number. As soon as another CPU is brought
online, the OS gets confused between them and promptly panics.

Resolve the issue by adding a 'cpu-id' property to CPUSPARCState,
removing cpu_sparc_set_id().

Fixes: 24f675cd3b ("sparc/sun4m: Use start-powered-off CPUState property")
Signed-off-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 

diff --git a/target/sparc/cpu.h b/target/sparc/cpu.h
index ff8ae73002a..78ca0925d25 100644
--- a/target/sparc/cpu.h
+++ b/target/sparc/cpu.h
@@ -262,6 +262,7 @@ struct sparc_def_t {
 uint32_t mmu_cxr_mask;
 uint32_t mmu_sfsr_mask;
 uint32_t mmu_trcr_mask;
+uint8_t mxcc_cpuid;
 uint32_t mxcc_version;
 uint32_t features;
 uint32_t nwindows;
@@ -583,7 +584,6 @@ void cpu_raise_exception_ra(CPUSPARCState *, int,
uintptr_t) QEMU_NORETURN;

 #ifndef NO_CPU_IO_DEFS
 /* cpu_init.c */
-void cpu_sparc_set_id(CPUSPARCState *env, unsigned int cpu);
 void sparc_cpu_list(void);
 /* mmu_helper.c */
 bool sparc_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
diff --git a/hw/sparc/leon3.c b/hw/sparc/leon3.c
index 7b4dec17211..8189045fdbf 100644
--- a/hw/sparc/leon3.c
+++ b/hw/sparc/leon3.c
@@ -238,8 +238,6 @@ static void leon3_generic_hw_init(MachineState *machine)
 cpu = SPARC_CPU(cpu_create(machine->cpu_type));
 env = &cpu->env;

-cpu_sparc_set_id(env, 0);
-
 /* Reset data */
 reset_info= g_malloc0(sizeof(ResetData));
 reset_info->cpu   = cpu;
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 42e139849ed..5be2e8e73f2 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -803,10 +803,10 @@ static void cpu_devinit(const char *cpu_type, unsigned int id,
 cpu = SPARC_CPU(object_new(cpu_type));
 env = &cpu->env;

-cpu_sparc_set_id(env, id);
 qemu_register_reset(sun4m_cpu_reset, cpu);
 object_property_set_bool(OBJECT(cpu), "start-powered-off", id != 0,
  &error_fatal);
+object_property_set_uint(OBJECT(cpu), "cpu-id", id, &error_fatal);
 qdev_realize_and_unref(DEVICE(cpu), NULL, &error_fatal);
 *cpu_irqs = qemu_allocate_irqs(cpu_set_irq, cpu, MAX_PILS);
 env->prom_addr = prom_addr;
diff --git a/target/sparc/cpu.c b/target/sparc/cpu.c
index da6b30ec747..d76929c68c7 100644
--- a/target/sparc/cpu.c
+++ b/ta

[PULL 05/44] target/arm: Fix mask handling for MVE narrowing operations

2021-08-25 Thread Peter Maydell
In the MVE helpers for the narrowing operations (DO_VSHRN and
DO_VSHRN_SAT) we were using the wrong bits of the predicate mask for
the 'top' versions of the insn.  This is because the loop works over
the double-sized input elements and shifts the predicate mask by that
many bits each time, but when we write out the half-sized output we
must look at the mask bits for whichever half of the element we are
writing to.

Correct this by shifting the whole mask right by ESIZE bits for the
'top' insns.  This allows us also to simplify the saturation bit
checking (where we had noticed that we needed to look at a different
mask bit for the 'top' insn.)

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 82151b06200..847ef5156ad 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1358,6 +1358,7 @@ DO_VSHLL_ALL(vshllt, true)
 TYPE *d = vd;   \
 uint16_t mask = mve_element_mask(env);  \
 unsigned le;\
+mask >>= ESIZE * TOP;   \
 for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 TYPE r = FN(m[H##LESIZE(le)], shift);   \
 mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
@@ -1419,11 +1420,12 @@ static inline int32_t do_sat_bhs(int64_t val, int64_t min, int64_t max,
 uint16_t mask = mve_element_mask(env);  \
 bool qc = false;\
 unsigned le;\
+mask >>= ESIZE * TOP;   \
 for (le = 0; le < 16 / LESIZE; le++, mask >>= LESIZE) { \
 bool sat = false;   \
 TYPE r = FN(m[H##LESIZE(le)], shift, &sat); \
 mergemask(&d[H##ESIZE(le * 2 + TOP)], r, mask); \
-qc |= sat && (mask & 1 << (TOP * ESIZE));   \
+qc |= sat & mask & 1;   \
 }   \
 if (qc) {   \
 env->vfp.qc[0] = qc;\
-- 
2.20.1




[PULL 01/44] target/arm: Note that we handle VMOVL as a special case of VSHLL

2021-08-25 Thread Peter Maydell
Although the architecture doesn't define it as an alias, VMOVL
(vector move long) is encoded as a VSHLL with a zero shift.
Add a comment in the decode file noting that we handle VMOVL
as part of VSHLL.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve.decode | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 595d97568eb..fa9d921f933 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -364,6 +364,8 @@ VRSHRI_U  111 1  1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_h
 VRSHRI_U  111 1  1 . ... ... ... 0 0010 0 1 . 1 ... 0 @2_shr_w
 
 # VSHLL T1 encoding; the T2 VSHLL encoding is elsewhere in this file
+# Note that VMOVL is encoded as "VSHLL with a zero shift count"; we
+# implement it that way rather than special-casing it in the decode.
 VSHLL_BS  111 0 1110 1 . 1 .. ... ... 0  0 1 . 0 ... 0 @2_shll_b
 VSHLL_BS  111 0 1110 1 . 1 .. ... ... 0  0 1 . 0 ... 0 @2_shll_h
 
-- 
2.20.1




[PULL 00/44] target-arm queue

2021-08-25 Thread Peter Maydell
First set of arm patches for 6.2. I have a lot more in my
to-review queue still...

-- PMM

The following changes since commit d42685765653ec155fdf60910662f8830bdb2cef:

  Open 6.2 development tree (2021-08-25 10:25:12 +0100)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20210825

for you to fetch changes up to 24b1a6aa43615be22c7ee66bd68ec5675f6a6a9a:

  docs: Document how to use gdb with unix sockets (2021-08-25 10:48:51 +0100)


target-arm queue:
 * More MVE emulation work
 * Implement M-profile trapping on division by zero
 * kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()
 * hw/char/pl011: add support for sending break
 * fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
 * hw/dma/pl330: Add memory region to replace default
 * sbsa-ref: Rename SBSA_GWDT enum value
 * fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices
 * docs: Document how to use gdb with unix sockets


Eduardo Habkost (1):
  sbsa-ref: Rename SBSA_GWDT enum value

Guenter Roeck (2):
  fsl-imx6ul: Instantiate SAI1/2/3 and ASRC as unimplemented devices
  fsl-imx7: Instantiate SAI1/2/3 as unimplemented devices

Hamza Mahfooz (1):
  target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()

Jan Luebbe (1):
  hw/char/pl011: add support for sending break

Peter Maydell (37):
  target/arm: Note that we handle VMOVL as a special case of VSHLL
  target/arm: Print MVE VPR in CPU dumps
  target/arm: Fix MVE VSLI by 0 and VSRI by <dt>
  target/arm: Fix signed VADDV
  target/arm: Fix mask handling for MVE narrowing operations
  target/arm: Fix 48-bit saturating shifts
  target/arm: Fix MVE 48-bit SQRSHRL for small right shifts
  target/arm: Fix calculation of LTP mask when LR is 0
  target/arm: Factor out mve_eci_mask()
  target/arm: Fix VPT advance when ECI is non-zero
  target/arm: Fix VLDRB/H/W for predicated elements
  target/arm: Implement MVE VMULL (polynomial)
  target/arm: Implement MVE incrementing/decrementing dup insns
  target/arm: Factor out gen_vpst()
  target/arm: Implement MVE integer vector comparisons
  target/arm: Implement MVE integer vector-vs-scalar comparisons
  target/arm: Implement MVE VPSEL
  target/arm: Implement MVE VMLAS
  target/arm: Implement MVE shift-by-scalar
  target/arm: Move 'x' and 'a' bit definitions into vmlaldav formats
  target/arm: Implement MVE integer min/max across vector
  target/arm: Implement MVE VABAV
  target/arm: Implement MVE narrowing moves
  target/arm: Rename MVEGenDualAccOpFn to MVEGenLongDualAccOpFn
  target/arm: Implement MVE VMLADAV and VMLSLDAV
  target/arm: Implement MVE VMLA
  target/arm: Implement MVE saturating doubling multiply accumulates
  target/arm: Implement MVE VQABS, VQNEG
  target/arm: Implement MVE VMAXA, VMINA
  target/arm: Implement MVE VMOV to/from 2 general-purpose registers
  target/arm: Implement MVE VPNOT
  target/arm: Implement MVE VCTP
  target/arm: Implement MVE scatter-gather insns
  target/arm: Implement MVE scatter-gather immediate forms
  target/arm: Implement MVE interleaving loads/stores
  target/arm: Re-indent sdiv and udiv helpers
  target/arm: Implement M-profile trapping on division by zero

Sebastian Meyer (1):
  docs: Document how to use gdb with unix sockets

Wen, Jianxian (1):
  hw/dma/pl330: Add memory region to replace default

 docs/system/gdb.rst|   26 +-
 include/hw/arm/fsl-imx7.h  |5 +
 target/arm/cpu.h   |1 +
 target/arm/helper-mve.h|  283 ++
 target/arm/helper.h|4 +-
 target/arm/translate-a32.h |2 +
 target/arm/vec_internal.h  |   11 +
 target/arm/mve.decode  |  226 +++-
 target/arm/t32.decode  |1 +
 hw/arm/exynos4210.c|3 +
 hw/arm/fsl-imx6ul.c|   12 +
 hw/arm/fsl-imx7.c  |7 +
 hw/arm/sbsa-ref.c  |6 +-
 hw/arm/xilinx_zynq.c   |3 +
 hw/char/pl011.c|6 +
 hw/dma/pl330.c |   26 +-
 target/arm/cpu.c   |3 +
 target/arm/helper.c|   34 +-
 target/arm/kvm.c   |   17 +-
 target/arm/m_helper.c  |4 +
 target/arm/mve_helper.c| 1254 ++--
 target/arm/translate-mve.c |  877 ++-
 target/arm/translate-vfp.c |2 +-
 target/arm/translate.c |   37 +-
 target/arm/vec_helper.c|   14 +-
 25 files changed, 2746 insertions(+), 118 deletions(-)



[PULL 02/44] target/arm: Print MVE VPR in CPU dumps

2021-08-25 Thread Peter Maydell
Include the MVE VPR register value in the CPU dumps produced by
arm_cpu_dump_state() if we are printing FPU information. This
makes it easier to interpret debug logs when predication is
active.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 2866dd76588..a82e39dd97f 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1017,6 +1017,9 @@ static void arm_cpu_dump_state(CPUState *cs, FILE *f, int flags)
  i, v);
 }
 qemu_fprintf(f, "FPSCR: %08x\n", vfp_get_fpscr(env));
+if (cpu_isar_feature(aa32_mve, cpu)) {
+qemu_fprintf(f, "VPR: %08x\n", env->v7m.vpr);
+}
 }
 }
 
-- 
2.20.1




[PULL 03/44] target/arm: Fix MVE VSLI by 0 and VSRI by <dt>

2021-08-25 Thread Peter Maydell
In the MVE shift-and-insert insns, we special case VSLI by 0
and VSRI by <dt>. VSRI by <dt> means "don't update the destination",
which is what we've implemented. However VSLI by 0 is "set
destination to the input", so we don't want to use the same
special-casing that we do for VSRI by <dt>.

Since the generic logic gives the right answer for a shift
by 0, just use that.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index db5d6220854..f14fa914b68 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1279,11 +1279,12 @@ DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
 uint16_t mask;  \
 uint64_t shiftmask; \
 unsigned e; \
-if (shift == 0 || shift == ESIZE * 8) { \
+if (shift == ESIZE * 8) {   \
 /*  \
- * Only VSLI can shift by 0; only VSRI can shift by <dt>.   \
- * The generic logic would give the right answer for 0 but  \
- * fails for .  \
+ * Only VSRI can shift by <dt>; it should mean "don't   \
+ * update the destination". The generic logic can't handle  \
+ * this because it would try to shift by an out-of-range\
+ * amount, so special case it here. \
  */ \
 goto done;  \
 }   \
-- 
2.20.1




[PULL 13/44] target/arm: Implement MVE incrementing/decrementing dup insns

2021-08-25 Thread Peter Maydell
Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP,
VIWDUP and VDWDUP.  These fill the elements of a vector with
successively incrementing values, starting at the offset specified in
a general purpose register.  The final value of the offset is written
back to this register.  The wrapping variants take a second general
purpose register which specifies the point where the count should
wrap back to 0.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h|  12 
 target/arm/mve.decode  |  25 
 target/arm/mve_helper.c|  63 +++
 target/arm/translate-mve.c | 120 +
 4 files changed, 220 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 84adfb21517..b9af03cc03b 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -35,6 +35,18 @@ DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 
 DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+DEF_HELPER_FLAGS_4(mve_viduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+DEF_HELPER_FLAGS_4(mve_vidupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
+
+DEF_HELPER_FLAGS_5(mve_viwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_5(mve_viwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_5(mve_viwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+
+DEF_HELPER_FLAGS_5(mve_vdwdupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_5(mve_vdwduph, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+DEF_HELPER_FLAGS_5(mve_vdwdupw, TCG_CALL_NO_WG, i32, env, ptr, i32, i32, i32)
+
 DEF_HELPER_FLAGS_3(mve_vclsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vclsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index de079ec517d..88c9c18ebf1 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -35,6 +35,8 @@
 &2scalar qd qn rm size
 &1imm qd imm cmode op
 &2shift qd qm shift size
+&vidup qd rn size imm
+&viwdup qd rn rm size imm
 
 @vldr_vstr ... . . . . l:1 rn:4 ... .. imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -259,6 +261,29 @@ VDUP 1110 1110 1 1 10 ... 0  1011 . 0 0 1  @vdup size=0
 VDUP 1110 1110 1 0 10 ... 0  1011 . 0 1 1  @vdup size=1
 VDUP 1110 1110 1 0 10 ... 0  1011 . 0 0 1  @vdup size=2
 
+# Incrementing and decrementing dup
+
+# VIDUP, VDDUP format immediate: 1 << (immh:imml)
+%imm_vidup 7:1 0:1 !function=vidup_imm
+
+# VIDUP, VDDUP registers: Rm bits [3:1] from insn, bit 0 is 1;
+# Rn bits [3:1] from insn, bit 0 is 0
+%vidup_rm 1:3 !function=times_2_plus_1
+%vidup_rn 17:3 !function=times_2
+
+@vidup     . . size:2      \
+ qd=%qd imm=%imm_vidup rn=%vidup_rn &vidup
+@viwdup    . . size:2      \
+ qd=%qd imm=%imm_vidup rm=%vidup_rm rn=%vidup_rn &viwdup
+{
+  VIDUP  1110 1110 0 . .. ... 1 ... 0  . 110 111 . @vidup
+  VIWDUP 1110 1110 0 . .. ... 1 ... 0  . 110 ... . @viwdup
+}
+{
+  VDDUP  1110 1110 0 . .. ... 1 ... 1  . 110 111 . @vidup
+  VDWDUP 1110 1110 0 . .. ... 1 ... 1  . 110 ... . @viwdup
+}
+
 # multiply-add long dual accumulate
 # rdahi: bits [3:1] from insn, bit 0 is 1
 # rdalo: bits [3:1] from insn, bit 0 is 0
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 91fb346d7e5..38b4181db2a 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1695,3 +1695,66 @@ uint32_t HELPER(mve_sqrshr)(CPUARMState *env, uint32_t n, uint32_t shift)
 {
 return do_sqrshl_bhs(n, -(int8_t)shift, 32, true, &env->QF);
 }
+
+#define DO_VIDUP(OP, ESIZE, TYPE, FN)   \
+uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd,   \
+   uint32_t offset, uint32_t imm)   \
+{   \
+TYPE *d = vd;   \
+uint16_t mask = mve_element_mask(env);  \
+unsigned e; \
+for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {  \
+mergemask(&d[H##ESIZE(e)], offset, mask);   \
+offset = FN(offset, imm);   \
+}   \
+mve_advance_vpt(env);   \
+return offset;  \
+}
+
+#define DO_VIWDUP(OP, ESIZE, TYPE, FN)  \
+uint32_t HELPER(mve_##OP)(CPUARMState *env, void *vd,   \
+   

[PULL 09/44] target/arm: Factor out mve_eci_mask()

2021-08-25 Thread Peter Maydell
In some situations we need a mask telling us which parts of the
vector correspond to beats that are not being executed because of
ECI, separately from the combined "which bytes are predicated away"
mask.  Factor this mask calculation out of mve_element_mask() into
its own function.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 58 -
 1 file changed, 34 insertions(+), 24 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index bc67b86e700..280726d 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -26,6 +26,35 @@
 #include "exec/exec-all.h"
 #include "tcg/tcg.h"
 
+static uint16_t mve_eci_mask(CPUARMState *env)
+{
+/*
+ * Return the mask of which elements in the MVE vector correspond
+ * to beats being executed. The mask has 1 bits for executed lanes
+ * and 0 bits where ECI says this beat was already executed.
+ */
+int eci;
+
+if ((env->condexec_bits & 0xf) != 0) {
+return 0x;
+}
+
+eci = env->condexec_bits >> 4;
+switch (eci) {
+case ECI_NONE:
+return 0x;
+case ECI_A0:
+return 0xfff0;
+case ECI_A0A1:
+return 0xff00;
+case ECI_A0A1A2:
+case ECI_A0A1A2B0:
+return 0xf000;
+default:
+g_assert_not_reached();
+}
+}
+
 static uint16_t mve_element_mask(CPUARMState *env)
 {
 /*
@@ -68,30 +97,11 @@ static uint16_t mve_element_mask(CPUARMState *env)
 mask &= ltpmask;
 }
 
-if ((env->condexec_bits & 0xf) == 0) {
-/*
- * ECI bits indicate which beats are already executed;
- * we handle this by effectively predicating them out.
- */
-int eci = env->condexec_bits >> 4;
-switch (eci) {
-case ECI_NONE:
-break;
-case ECI_A0:
-mask &= 0xfff0;
-break;
-case ECI_A0A1:
-mask &= 0xff00;
-break;
-case ECI_A0A1A2:
-case ECI_A0A1A2B0:
-mask &= 0xf000;
-break;
-default:
-g_assert_not_reached();
-}
-}
-
+/*
+ * ECI bits indicate which beats are already executed;
+ * we handle this by effectively predicating them out.
+ */
+mask &= mve_eci_mask(env);
 return mask;
 }
 
-- 
2.20.1




[PULL 06/44] target/arm: Fix 48-bit saturating shifts

2021-08-25 Thread Peter Maydell
In do_sqrshl48_d() and do_uqrshl48_d() we got some of the edge
cases wrong and failed to saturate correctly:

(1) In do_sqrshl48_d() we used the same code that do_sqrshl_bhs()
does to obtain the saturated most-negative and most-positive 48-bit
signed values for the large-shift-left case.  This gives (1 << 47)
for saturate-to-most-negative, but we weren't sign-extending this
value to the 64-bit output as the pseudocode requires.

(2) For left shifts by less than 48, we copied the "8/16 bit" code
from do_sqrshl_bhs() and do_uqrshl_bhs().  This doesn't do the right
thing because it assumes the C type we're working with is at least
twice the number of bits we're saturating to (so that a shift left by
bits-1 can't shift anything off the top of the value).  This isn't
true for bits == 48, so we would incorrectly return 0 rather than the
most-positive value for situations like "shift (1 << 44) left by
20".  Instead check for saturation by doing the shift and sign-extend
and then testing whether shifting back right again gives the original
value.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 847ef5156ad..5730b48f35e 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1576,9 +1576,8 @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
 }
 return src >> -shift;
 } else if (shift < 48) {
-int64_t val = src << shift;
-int64_t extval = sextract64(val, 0, 48);
-if (!sat || val == extval) {
+int64_t extval = sextract64(src << shift, 0, 48);
+if (!sat || src == (extval >> shift)) {
 return extval;
 }
 } else if (!sat || src == 0) {
@@ -1586,7 +1585,7 @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
 }
 
 *sat = 1;
-return (1ULL << 47) - (src >= 0);
+return src >= 0 ? MAKE_64BIT_MASK(0, 47) : MAKE_64BIT_MASK(47, 17);
 }
 
 /* Operate on 64-bit values, but saturate at 48 bits */
@@ -1609,9 +1608,8 @@ static inline uint64_t do_uqrshl48_d(uint64_t src, int64_t shift,
 return extval;
 }
 } else if (shift < 48) {
-uint64_t val = src << shift;
-uint64_t extval = extract64(val, 0, 48);
-if (!sat || val == extval) {
+uint64_t extval = extract64(src << shift, 0, 48);
+if (!sat || src == (extval >> shift)) {
 return extval;
 }
 } else if (!sat || src == 0) {
-- 
2.20.1




[PULL 27/44] target/arm: Implement MVE saturating doubling multiply accumulates

2021-08-25 Thread Peter Maydell
Implement the MVE saturating doubling multiply accumulate insns
VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH.  These perform a multiply,
double, add the accumulator shifted by the element size, possibly
round, saturate to twice the element size, then take the high half of
the result.  The *MLAH insns do vector * scalar + vector, and the
*MLASH insns do vector * vector + scalar.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h| 16 +++
 target/arm/mve.decode  |  5 ++
 target/arm/mve_helper.c| 95 ++
 target/arm/translate-mve.c |  4 ++
 4 files changed, 120 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 328e31e2665..2f54396b2df 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -375,6 +375,22 @@ DEF_HELPER_FLAGS_4(mve_vmlasb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vmlash, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vmlasw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vqdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmlahb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrdmlahh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrdmlahw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrdmlashb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrdmlashh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrdmlashw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vmlaldavsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavsw, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
 DEF_HELPER_FLAGS_4(mve_vmlaldavxsh, TCG_CALL_NO_WG, i64, env, ptr, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index cd9c806a11c..7a6de3991b6 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -416,6 +416,11 @@ VQRDMULH_scalar   1110 0 . .. ... 1 ... 0 1110 . 110 
 @2scalar
 VMLA 111- 1110 0 . .. ... 1 ... 0 1110 . 100  @2scalar
 VMLAS111- 1110 0 . .. ... 1 ... 1 1110 . 100  @2scalar
 
+VQRDMLAH 1110 1110 0 . .. ... 0 ... 0 1110 . 100  @2scalar
+VQRDMLASH1110 1110 0 . .. ... 0 ... 1 1110 . 100  @2scalar
+VQDMLAH  1110 1110 0 . .. ... 0 ... 0 1110 . 110  @2scalar
+VQDMLASH 1110 1110 0 . .. ... 0 ... 1 1110 . 110  @2scalar
+
 # Vector add across vector
 {
   VADDV  111 u:1 1110  size:2 01 ... 0  0 0 a:1 0 qm:3 0 
rda=%rdalo
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 8004b9bb728..a69fcd2243c 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -964,6 +964,28 @@ DO_VQDMLADH_OP(vqrdmlsdhxw, 4, int32_t, 1, 1, 
do_vqdmlsdh_w)
 mve_advance_vpt(env);   \
 }
 
+#define DO_2OP_SAT_ACC_SCALAR(OP, ESIZE, TYPE, FN)  \
+void HELPER(glue(mve_, OP))(CPUARMState *env, void *vd, void *vn,   \
+uint32_t rm)\
+{   \
+TYPE *d = vd, *n = vn;  \
+TYPE m = rm;\
+uint16_t mask = mve_element_mask(env);  \
+unsigned e; \
+bool qc = false;\
+for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {  \
+bool sat = false;   \
+mergemask(&d[H##ESIZE(e)],  \
+  FN(d[H##ESIZE(e)], n[H##ESIZE(e)], m, &sat),  \
+  mask);\
+qc |= sat & mask & 1;   \
+}   \
+if (qc) {   \
+env->vfp.qc[0] = qc;\
+}   \
+mve_advance_vpt(env);   \
+}
+
 /* provide unsigned 2-op scalar helpers for all sizes */
 #define DO_2OP_SCALAR_U(OP, FN) \
 DO_2OP_SCALA

[PULL 07/44] target/arm: Fix MVE 48-bit SQRSHRL for small right shifts

2021-08-25 Thread Peter Maydell
We got an edge case wrong in the 48-bit SQRSHRL implementation: when
the shift is to the right, it always makes the result smaller than
the input value, but the result might still fall outside the 48-bit
range it is supposed to be in, if the input had some bits in [63..48]
set and the shift didn't bring all of those within the [47..0] range.

Handle this similarly to the way we already do for this case in
do_uqrshl48_d(): extend the calculated result from 48 bits,
and return that if not saturating or if it doesn't change the
result; otherwise fall through to return a saturated value.
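A Python sketch of the corrected right-shift path (a hypothetical model of do_sqrshl48_d for shifts in [1..47]; `sextract48` mimics QEMU's `sextract64(val, 0, 48)`):

```python
def sextract48(v):
    """Sign-extend the low 48 bits of v (like sextract64(v, 0, 48))."""
    v &= (1 << 48) - 1
    return v - (1 << 48) if v & (1 << 47) else v

def sqrshl48_right(src, shift, rounding=False):
    """Right-shift path of a 48-bit SQRSHRL model: shift, then check
    the result still fits in 48 bits, saturating if it does not."""
    if rounding:
        s = src >> (shift - 1)
        val = (s >> 1) + (s & 1)
    else:
        val = src >> shift
    extval = sextract48(val)
    if val == extval:
        return extval, False
    # input bits in [63..48] survived the shift: saturate
    return ((1 << 47) - 1) if val > 0 else -(1 << 47), True
```

With `src = 1 << 50`, a shift of 4 lands inside the 48-bit range, but a shift of 1 does not and must saturate, which is exactly the case the old code missed.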

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 5730b48f35e..1a4b2ef8075 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1563,6 +1563,8 @@ uint64_t HELPER(mve_uqrshll)(CPUARMState *env, uint64_t 
n, uint32_t shift)
 static inline int64_t do_sqrshl48_d(int64_t src, int64_t shift,
 bool round, uint32_t *sat)
 {
+int64_t val, extval;
+
 if (shift <= -48) {
 /* Rounding the sign bit always produces 0. */
 if (round) {
@@ -1572,9 +1574,14 @@ static inline int64_t do_sqrshl48_d(int64_t src, int64_t 
shift,
 } else if (shift < 0) {
 if (round) {
 src >>= -shift - 1;
-return (src >> 1) + (src & 1);
+val = (src >> 1) + (src & 1);
+} else {
+val = src >> -shift;
+}
+extval = sextract64(val, 0, 48);
+if (!sat || val == extval) {
+return extval;
 }
-return src >> -shift;
 } else if (shift < 48) {
 int64_t extval = sextract64(src << shift, 0, 48);
 if (!sat || src == (extval >> shift)) {
-- 
2.20.1




[PULL 16/44] target/arm: Implement MVE integer vector-vs-scalar comparisons

2021-08-25 Thread Peter Maydell
Implement the MVE integer vector comparison instructions that compare
each element against a scalar from a general purpose register.  These
are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)"
encodings T4, T5 and T6.

We have to move the decodetree pattern for VPST, because it
overlaps with VCMP T4 with size = 0b11.
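The element-level behaviour can be sketched as follows (an illustrative model only; the real helpers also handle ECI-skipped beats, whose old P0 bits are preserved):

```python
def vcmp_scalar(n, rm, op, mask):
    """Compare each element of vector n against scalar rm, producing a
    per-element flag word; predicated-out elements produce 0 bits."""
    p0 = 0
    for e, elem in enumerate(n):
        if (mask >> e) & 1 and op(elem, rm):
            p0 |= 1 << e
    return p0
```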

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h| 32 +++
 target/arm/mve.decode  | 18 +---
 target/arm/mve_helper.c| 44 +++---
 target/arm/translate-mve.c | 43 +
 4 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index ca5a6ab51cc..4f9903e66ef 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -512,3 +512,35 @@ DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, 
ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
 DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpeq_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpne_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpcs_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmphi_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpge_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmplt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmpgt_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarb, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarh, TCG_CALL_NO_WG, void, env, ptr, i32)
+DEF_HELPER_FLAGS_3(mve_vcmple_scalarw, TCG_CALL_NO_WG, void, env, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 76bbf9a6136..ef708ba80ff 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -38,6 +38,7 @@
 &vidup qd rn size imm
 &viwdup qd rn rm size imm
 &vcmp qm qn size mask
+&vcmp_scalar qn rm size mask
 
 @vldr_vstr ... . . . . l:1 rn:4 ... .. imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -90,6 +91,8 @@
 # Vector comparison; 4-bit Qm but 3-bit Qn
 %mask_22_13  22:1 13:3
 @vcmp  .. size:2 qn:3 .     &vcmp qm=%qm 
mask=%mask_22_13
+@vcmp_scalar   .. size:2 qn:3 .    rm:4 &vcmp_scalar \
+ mask=%mask_22_13
 
 # Vector loads and stores
 
@@ -349,9 +352,6 @@ VQRDMULH_scalar   1110 0 . .. ... 1 ... 0 1110 . 110 
 @2scalar
  rdahi=%rdahi rdalo=%rdalo
 }
 
-# Predicate operations
-VPST  1110 0 . 11 000 1 ... 0  0100 1101 mask=%mask_22_13
-
 # Logical immediate operations (1 reg and modified-immediate)
 
 # The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
@@ -474,3 +474,15 @@ VCMPGE 1110 0 . .. ... 1 ... 1  0 0 . 
0 ... 0 @vcmp
 VCMPLT 1110 0 . .. ... 1 ... 1  1 0 . 0 ... 0 @vcmp
 VCMPGT 1110 0 . .. ... 1 ... 1  0 0 . 0 ... 1 @vcmp
 VCMPLE 1110 0 . .. ... 1 ... 1  1 0 . 0 ... 1 @vcmp
+
+{
+  VPST 1110 0 . 11 000 1 ... 0  0100 1101 mask=%mask_22_13
+  VCMPEQ_scalar    1110 0 . .. ... 1 ... 0  0 1 0 0  @vcmp_scalar
+}
+VCMPNE_scalar  1110 0 . .. ... 1 ... 0  1 1 0 0  @vcmp_scalar
+VCMPCS_scalar  1110 0 . .. ... 1 ... 0  0 1 1 0  @vcmp_scalar
+VCMPHI_scalar  1110 0 . .. ... 1 ... 0  1 1 1 0  @vcmp_scalar
+VCMPGE_scalar  1110 0 . .. ... 1 ... 1  0 1 0 0  @vcmp_scalar
+VCMPLT_scalar  1110 0 . .. ... 1 ... 1  1 1 0 0  @vcmp_scalar
+V

[PULL 11/44] target/arm: Fix VLDRB/H/W for predicated elements

2021-08-25 Thread Peter Maydell
For vector loads, predicated elements are zeroed, instead of
retaining their previous values (as happens for most data
processing operations). This means we need to distinguish
"beat not executed due to ECI" (don't touch destination
element) from "beat executed but predicated out" (zero
destination element).
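The three cases can be sketched per destination lane (a hypothetical model; `eci_bit` is the "beat executed" bit and `pred_bit` the predication bit):

```python
def vldr_lane(old, mem, eci_bit, pred_bit):
    """One destination lane of a predicated MVE vector load."""
    if not eci_bit:
        return old                    # beat not executed due to ECI: untouched
    return mem if pred_bit else 0     # executed: load, or zero if predicated out
```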

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index bc89ce94d5a..be8b9545317 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -146,12 +146,13 @@ static void mve_advance_vpt(CPUARMState *env)
 env->v7m.vpr = vpr;
 }
 
-
+/* For loads, predicated lanes are zeroed instead of keeping their old values 
*/
 #define DO_VLDR(OP, MSIZE, LDTYPE, ESIZE, TYPE) \
 void HELPER(mve_##OP)(CPUARMState *env, void *vd, uint32_t addr)\
 {   \
 TYPE *d = vd;   \
 uint16_t mask = mve_element_mask(env);  \
+uint16_t eci_mask = mve_eci_mask(env);  \
 unsigned b, e;  \
 /*  \
  * R_SXTM allows the dest reg to become UNKNOWN for abandoned   \
@@ -159,8 +160,9 @@ static void mve_advance_vpt(CPUARMState *env)
  * then take an exception.  \
  */ \
 for (b = 0, e = 0; b < 16; b += ESIZE, e++) {   \
-if (mask & (1 << b)) {  \
-d[H##ESIZE(e)] = cpu_##LDTYPE##_data_ra(env, addr, GETPC()); \
+if (eci_mask & (1 << b)) {  \
+d[H##ESIZE(e)] = (mask & (1 << b)) ?\
+cpu_##LDTYPE##_data_ra(env, addr, GETPC()) : 0; \
 }   \
 addr += MSIZE;  \
 }   \
-- 
2.20.1




[PULL 04/44] target/arm: Fix signed VADDV

2021-08-25 Thread Peter Maydell
A cut-and-paste error meant we handled signed VADDV like
unsigned VADDV; fix the type used.
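The effect of the wrong element type can be seen in a small model (illustrative only; a 32-bit accumulator summing elements reinterpreted as signed or unsigned):

```python
def vaddv(elems, esize_bits, signed):
    """Sum vector elements into a 32-bit accumulator, treating each
    element as signed or unsigned as requested."""
    total = 0
    for x in elems:
        x &= (1 << esize_bits) - 1
        if signed and x >> (esize_bits - 1):
            x -= 1 << esize_bits      # sign-extend
        total += x
    return total & 0xFFFFFFFF
```

A 0xFF byte contributes -1 to a signed sum but 255 to an unsigned one, which is the difference the cut-and-paste error hid.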

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index f14fa914b68..82151b06200 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1182,9 +1182,9 @@ DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
 return ra;  \
 }   \
 
-DO_VADDV(vaddvsb, 1, uint8_t)
-DO_VADDV(vaddvsh, 2, uint16_t)
-DO_VADDV(vaddvsw, 4, uint32_t)
+DO_VADDV(vaddvsb, 1, int8_t)
+DO_VADDV(vaddvsh, 2, int16_t)
+DO_VADDV(vaddvsw, 4, int32_t)
 DO_VADDV(vaddvub, 1, uint8_t)
 DO_VADDV(vaddvuh, 2, uint16_t)
 DO_VADDV(vaddvuw, 4, uint32_t)
-- 
2.20.1




[PULL 22/44] target/arm: Implement MVE VABAV

2021-08-25 Thread Peter Maydell
Implement the MVE VABAV insn, which computes absolute differences
between elements of two vectors and accumulates the result into
a general purpose register.
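The accumulation can be sketched as (an illustrative model with per-element predication; elements are assumed already sign- or zero-extended):

```python
def vabav(n, m, ra, mask):
    """Accumulate absolute differences of predicated element pairs
    into a general-purpose register value."""
    for e, (a, b) in enumerate(zip(n, m)):
        if (mask >> e) & 1:
            ra += abs(a - b)
    return ra & 0xFFFFFFFF            # result register is 32 bits
```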

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h|  7 +++
 target/arm/mve.decode  |  6 ++
 target/arm/mve_helper.c| 26 +++
 target/arm/translate-mve.c | 43 ++
 4 files changed, 82 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 2c66fcba792..c7e7aab2cbb 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -402,6 +402,13 @@ DEF_HELPER_FLAGS_3(mve_vminavw, TCG_CALL_NO_WG, i32, env, 
ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vaddlv_s, TCG_CALL_NO_WG, i64, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vaddlv_u, TCG_CALL_NO_WG, i64, env, ptr, i64)
 
+DEF_HELPER_FLAGS_4(mve_vabavsb, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vabavsh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vabavsw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vabavub, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vabavuh, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vabavuw, TCG_CALL_NO_WG, i32, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(mve_vmovi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vandi, TCG_CALL_NO_WG, void, env, ptr, i64)
 DEF_HELPER_FLAGS_3(mve_vorri, TCG_CALL_NO_WG, void, env, ptr, i64)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 83dc0300d69..c8a06edca78 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -41,6 +41,7 @@
 &vcmp_scalar qn rm size mask
 &shl_scalar qda rm size
 &vmaxv qm rda size
+&vabav qn qm rda size
 
 @vldr_vstr ... . . . . l:1 rn:4 ... .. imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -386,6 +387,11 @@ VMLAS111- 1110 0 . .. ... 1 ... 1 1110 . 100 
 @2scalar
  rdahi=%rdahi rdalo=%rdalo
 }
 
+@vabav     .. size:2  rda:4    &vabav qn=%qn 
qm=%qm
+
+VABAV_S  111 0 1110 10 .. ... 0   . 0 . 0 ... 1 @vabav
+VABAV_U  111 1 1110 10 .. ... 0   . 0 . 0 ... 1 @vabav
+
 # Logical immediate operations (1 reg and modified-immediate)
 
 # The cmode/op bits here decode VORR/VBIC/VMOV/VMVN, but
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 924ad7f2bdc..fed0f3cd610 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1320,6 +1320,32 @@ DO_VMAXMINV(vminavb, 1, int8_t, uint8_t, do_mina)
 DO_VMAXMINV(vminavh, 2, int16_t, uint16_t, do_mina)
 DO_VMAXMINV(vminavw, 4, int32_t, uint32_t, do_mina)
 
+#define DO_VABAV(OP, ESIZE, TYPE)   \
+uint32_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn, \
+void *vm, uint32_t ra)  \
+{   \
+uint16_t mask = mve_element_mask(env);  \
+unsigned e; \
+TYPE *m = vm, *n = vn;  \
+for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {  \
+if (mask & 1) { \
+int64_t n0 = n[H##ESIZE(e)];\
+int64_t m0 = m[H##ESIZE(e)];\
+uint32_t r = n0 >= m0 ? (n0 - m0) : (m0 - n0);  \
+ra += r;\
+}   \
+}   \
+mve_advance_vpt(env);   \
+return ra;  \
+}
+
+DO_VABAV(vabavsb, 1, int8_t)
+DO_VABAV(vabavsh, 2, int16_t)
+DO_VABAV(vabavsw, 4, int32_t)
+DO_VABAV(vabavub, 1, uint8_t)
+DO_VABAV(vabavuh, 2, uint16_t)
+DO_VABAV(vabavuw, 4, uint32_t)
+
 #define DO_VADDLV(OP, TYPE, LTYPE)  \
 uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vm, \
 uint64_t ra)\
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 2fce74f86ab..247f6719e6f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -45,6 +45,7 @@ typedef void MVEGenVIDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, 
TCGv_i32, TCGv_i32);
 typedef void MVEGenVIWDUPFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_i32, TCGv_i32, 
TCGv_i32);
 typedef void MVEGenCmpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
 typedef void MVEGenScalarCmpFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void MVEGenVABAVFn(TCGv_i32, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_i32);
 
 /* Return the offset of a Qn register (same semantics as aa32_vfp_qreg()) */
 static inline long mve_qreg_offset(unsigned reg)
@@ -1369,3 +13

[PULL 30/44] target/arm: Implement MVE VMOV to/from 2 general-purpose registers

2021-08-25 Thread Peter Maydell
Implement the MVE VMOV forms that move data between 2 general-purpose
registers and 2 32-bit lanes in a vector register.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-a32.h |  1 +
 target/arm/mve.decode  |  4 ++
 target/arm/translate-mve.c | 85 ++
 target/arm/translate-vfp.c |  2 +-
 4 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index 6dfcafe1796..6f4d65ddb00 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -49,6 +49,7 @@ void gen_rev16(TCGv_i32 dest, TCGv_i32 var);
 void clear_eci_state(DisasContext *s);
 bool mve_eci_check(DisasContext *s);
 void mve_update_and_store_eci(DisasContext *s);
+bool mve_skip_vmov(DisasContext *s, int vn, int index, int size);
 
 static inline TCGv_i32 load_cpu_offset(int offset)
 {
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 0955ed0cc22..774ee2a1a5b 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -136,6 +136,10 @@ VLDR_VSTR1110110 1 a:1 . w:1 .  ... 01 
...   @vldr_vstr \
 VLDR_VSTR1110110 1 a:1 . w:1 .  ... 10 ...   @vldr_vstr \
  size=2 p=1
 
+# Moves between 2 32-bit vector lanes and 2 general purpose registers
+VMOV_to_2gp  1110 1100 0 . 00 rt2:4 ... 0  000 idx:1 rt:4 qd=%qd
+VMOV_from_2gp1110 1100 0 . 01 rt2:4 ... 0  000 idx:1 rt:4 qd=%qd
+
 # Vector 2-op
 VAND 1110  0 . 00 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
 VBIC 1110  0 . 01 ... 0 ... 0 0001 . 1 . 1 ... 0 @2op_nosz
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 02c26987a2d..93707fdd681 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1507,3 +1507,88 @@ static bool do_vabav(DisasContext *s, arg_vabav *a, 
MVEGenVABAVFn *fn)
 
 DO_VABAV(VABAV_S, vabavs)
 DO_VABAV(VABAV_U, vabavu)
+
+static bool trans_VMOV_to_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
+{
+/*
+ * VMOV two 32-bit vector lanes to two general-purpose registers.
+ * This insn is not predicated but it is subject to beat-wise
+ * execution if it is not in an IT block. For us this means
+ * only that if PSR.ECI says we should not be executing the beat
+ * corresponding to the lane of the vector register being accessed
+ * then we should skip performing the move, and that we need to do
+ * the usual check for bad ECI state and advance of ECI state.
+ * (If PSR.ECI is non-zero then we cannot be in an IT block.)
+ */
+TCGv_i32 tmp;
+int vd;
+
+if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
+a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15 ||
+a->rt == a->rt2) {
+/* Rt/Rt2 cases are UNPREDICTABLE */
+return false;
+}
+if (!mve_eci_check(s) || !vfp_access_check(s)) {
+return true;
+}
+
+/* Convert Qreg index to Dreg for read_neon_element32() etc */
+vd = a->qd * 2;
+
+if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
+tmp = tcg_temp_new_i32();
+read_neon_element32(tmp, vd, a->idx, MO_32);
+store_reg(s, a->rt, tmp);
+}
+if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
+tmp = tcg_temp_new_i32();
+read_neon_element32(tmp, vd + 1, a->idx, MO_32);
+store_reg(s, a->rt2, tmp);
+}
+
+mve_update_and_store_eci(s);
+return true;
+}
+
+static bool trans_VMOV_from_2gp(DisasContext *s, arg_VMOV_to_2gp *a)
+{
+/*
+ * VMOV two general-purpose registers to two 32-bit vector lanes.
+ * This insn is not predicated but it is subject to beat-wise
+ * execution if it is not in an IT block. For us this means
+ * only that if PSR.ECI says we should not be executing the beat
+ * corresponding to the lane of the vector register being accessed
+ * then we should skip performing the move, and that we need to do
+ * the usual check for bad ECI state and advance of ECI state.
+ * (If PSR.ECI is non-zero then we cannot be in an IT block.)
+ */
+TCGv_i32 tmp;
+int vd;
+
+if (!dc_isar_feature(aa32_mve, s) || !mve_check_qreg_bank(s, a->qd) ||
+a->rt == 13 || a->rt == 15 || a->rt2 == 13 || a->rt2 == 15) {
+/* Rt/Rt2 cases are UNPREDICTABLE */
+return false;
+}
+if (!mve_eci_check(s) || !vfp_access_check(s)) {
+return true;
+}
+
+/* Convert Qreg idx to Dreg for read_neon_element32() etc */
+vd = a->qd * 2;
+
+if (!mve_skip_vmov(s, vd, a->idx, MO_32)) {
+tmp = load_reg(s, a->rt);
+write_neon_element32(tmp, vd, a->idx, MO_32);
+tcg_temp_free_i32(tmp);
+}
+if (!mve_skip_vmov(s, vd + 1, a->idx, MO_32)) {
+tmp = load_reg(s, a->rt2);
+write_neon_element32(tmp, vd + 1, a->idx, MO_32);
+tcg_temp_free_i32(tmp);
+}
+
+mve_update_and_store_eci(s);
+return tr

[PULL 12/44] target/arm: Implement MVE VMULL (polynomial)

2021-08-25 Thread Peter Maydell
Implement the MVE VMULL (polynomial) insn.  Unlike Neon, this comes
in two flavours: 8x8->16 and 16x16->32.  Also unlike Neon, the
inputs are in either the low or the high half of each double-width
element.

The assembler for this insn indicates the size with "P8" or "P16",
encoded into bit 28 as size = 0 or 1. We choose to follow the
same encoding as VQDMULL and decode this into a->size as MO_16
or MO_32 indicating the size of the result elements. This then
carries through to the helper function names where it then
matches up with the existing pmull_h() which does an 8x8->16
operation and a new pmull_w() which does the 16x16->32.
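Polynomial (carry-less) multiplication replaces the adds of an ordinary shift-and-add multiply with XOR; a minimal sketch of the primitive behind pmull_h()/pmull_w():

```python
def clmul(a, b):
    """Carry-less (GF(2)[x]) multiply of two non-negative integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a        # XOR instead of add: no carries between bits
        a <<= 1
        b >>= 1
    return r
```

Note that squaring spreads the input bits apart: `clmul(0xFF, 0xFF)` is 0x5555, not 0xFE01 as it would be for an ordinary multiply.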

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h|  5 +
 target/arm/vec_internal.h  | 11 +++
 target/arm/mve.decode  | 14 ++
 target/arm/mve_helper.c| 16 
 target/arm/translate-mve.c | 28 
 target/arm/vec_helper.c| 14 +-
 6 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 56e40844ad9..84adfb21517 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -145,6 +145,11 @@ DEF_HELPER_FLAGS_4(mve_vmulltub, TCG_CALL_NO_WG, void, 
env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vmulltuw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vmullpbh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullpth, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullpbw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_4(mve_vmullptw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vqdmulhb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqdmulhh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vqdmulhw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/vec_internal.h b/target/arm/vec_internal.h
index 865d2139447..2a335582906 100644
--- a/target/arm/vec_internal.h
+++ b/target/arm/vec_internal.h
@@ -206,4 +206,15 @@ int16_t do_sqrdmlah_h(int16_t, int16_t, int16_t, bool, 
bool, uint32_t *);
 int32_t do_sqrdmlah_s(int32_t, int32_t, int32_t, bool, bool, uint32_t *);
 int64_t do_sqrdmlah_d(int64_t, int64_t, int64_t, bool, bool);
 
+/*
+ * 8 x 8 -> 16 vector polynomial multiply where the inputs are
+ * in the low 8 bits of each 16-bit element
+*/
+uint64_t pmull_h(uint64_t op1, uint64_t op2);
+/*
+ * 16 x 16 -> 32 vector polynomial multiply where the inputs are
+ * in the low 16 bits of each 32-bit element
+ */
+uint64_t pmull_w(uint64_t op1, uint64_t op2);
+
 #endif /* TARGET_ARM_VEC_INTERNALS_H */
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index fa9d921f933..de079ec517d 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -173,10 +173,16 @@ VHADD_U  111 1  0 . .. ... 0 ... 0  . 1 . 
0 ... 0 @2op
 VHSUB_S  111 0  0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 VHSUB_U  111 1  0 . .. ... 0 ... 0 0010 . 1 . 0 ... 0 @2op
 
-VMULL_BS 111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
-VMULL_BU 111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
-VMULL_TS 111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
-VMULL_TU 111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+{
+  VMULLP_B   111 . 1110 0 . 11 ... 1 ... 0 1110 . 0 . 0 ... 0 @2op_sz28
+  VMULL_BS   111 0 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
+  VMULL_BU   111 1 1110 0 . .. ... 1 ... 0 1110 . 0 . 0 ... 0 @2op
+}
+{
+  VMULLP_T   111 . 1110 0 . 11 ... 1 ... 1 1110 . 0 . 0 ... 0 @2op_sz28
+  VMULL_TS   111 0 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+  VMULL_TU   111 1 1110 0 . .. ... 1 ... 1 1110 . 0 . 0 ... 0 @2op
+}
 
 VQDMULH  1110  0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
 VQRDMULH   0 . .. ... 0 ... 0 1011 . 1 . 0 ... 0 @2op
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index be8b9545317..91fb346d7e5 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -481,6 +481,22 @@ DO_2OP_L(vmulltub, 1, 1, uint8_t, 2, uint16_t, DO_MUL)
 DO_2OP_L(vmulltuh, 1, 2, uint16_t, 4, uint32_t, DO_MUL)
 DO_2OP_L(vmulltuw, 1, 4, uint32_t, 8, uint64_t, DO_MUL)
 
+/*
+ * Polynomial multiply. We can always do this generating 64 bits
+ * of the result at a time, so we don't need to use DO_2OP_L.
+ */
+#define VMULLPH_MASK 0x00ff00ff00ff00ffULL
+#define VMULLPW_MASK 0xULL
+#define DO_VMULLPBH(N, M) pmull_h((N) & VMULLPH_MASK, (M) & VMULLPH_MASK)
+#define DO_VMULLPTH(N, M) DO_VMULLPBH((N) >> 8, (M) >> 8)
+#define DO_VMULLPBW(N, M) pmull_w((N) & VMULLPW_MASK, (M) & VMULLPW_MASK)
+#define DO_VMULLPTW(N, M) DO_VMULLPBW((N) >> 16, (M) >> 16)
+
+DO_2OP(vmullpbh, 8, uint64_t, DO_VMULLPBH)
+DO_2OP(vmullpth, 8, uint64_t, DO_VMULLPTH)
+DO

[PULL 08/44] target/arm: Fix calculation of LTP mask when LR is 0

2021-08-25 Thread Peter Maydell
In mve_element_mask(), we calculate a mask for tail predication which
should have a number of 1 bits based on the value of LR.  However,
our MAKE_64BIT_MASK() macro has undefined behaviour when passed a
zero length.  Special case this to give the all-zeroes mask we
require.
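The fixed mask computation can be modelled as below (a sketch; in C the macro expands to something like `(~0ULL >> (64 - len))`, and a shift by 64 when len is 0 is undefined behaviour):

```python
def ltp_mask(lr, ltpsize):
    """Tail-predication mask: low (LR << LTPSIZE) bits set, with the
    masklen == 0 case special-cased to the all-zeroes mask."""
    masklen = lr << ltpsize
    assert masklen <= 16
    return ((1 << masklen) - 1) if masklen else 0
```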

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 1a4b2ef8075..bc67b86e700 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -64,7 +64,8 @@ static uint16_t mve_element_mask(CPUARMState *env)
  */
 int masklen = env->regs[14] << env->v7m.ltpsize;
 assert(masklen <= 16);
-mask &= MAKE_64BIT_MASK(0, masklen);
+uint16_t ltpmask = masklen ? MAKE_64BIT_MASK(0, masklen) : 0;
+mask &= ltpmask;
 }
 
 if ((env->condexec_bits & 0xf) == 0) {
-- 
2.20.1




[PULL 14/44] target/arm: Factor out gen_vpst()

2021-08-25 Thread Peter Maydell
Factor out the "generate code to update VPR.MASK01/MASK23" part of
trans_VPST(); we are going to want to reuse it for the VPT insns.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/translate-mve.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index a220521c00b..6d8da361469 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -737,33 +737,24 @@ static bool trans_VRMLSLDAVH(DisasContext *s, 
arg_vmlaldav *a)
 return do_long_dual_acc(s, a, fns[a->x]);
 }
 
-static bool trans_VPST(DisasContext *s, arg_VPST *a)
+static void gen_vpst(DisasContext *s, uint32_t mask)
 {
-TCGv_i32 vpr;
-
-/* mask == 0 is a "related encoding" */
-if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
-return false;
-}
-if (!mve_eci_check(s) || !vfp_access_check(s)) {
-return true;
-}
 /*
  * Set the VPR mask fields. We take advantage of MASK01 and MASK23
  * being adjacent fields in the register.
  *
- * This insn is not predicated, but it is subject to beat-wise
+ * Updating the masks is not predicated, but it is subject to beat-wise
  * execution, and the mask is updated on the odd-numbered beats.
  * So if PSR.ECI says we should skip beat 1, we mustn't update the
  * 01 mask field.
  */
-vpr = load_cpu_field(v7m.vpr);
+TCGv_i32 vpr = load_cpu_field(v7m.vpr);
 switch (s->eci) {
 case ECI_NONE:
 case ECI_A0:
 /* Update both 01 and 23 fields */
 tcg_gen_deposit_i32(vpr, vpr,
-tcg_constant_i32(a->mask | (a->mask << 4)),
+tcg_constant_i32(mask | (mask << 4)),
 R_V7M_VPR_MASK01_SHIFT,
 R_V7M_VPR_MASK01_LENGTH + R_V7M_VPR_MASK23_LENGTH);
 break;
@@ -772,13 +763,25 @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
 case ECI_A0A1A2B0:
 /* Update only the 23 mask field */
 tcg_gen_deposit_i32(vpr, vpr,
-tcg_constant_i32(a->mask),
+tcg_constant_i32(mask),
 R_V7M_VPR_MASK23_SHIFT, R_V7M_VPR_MASK23_LENGTH);
 break;
 default:
 g_assert_not_reached();
 }
 store_cpu_field(vpr, v7m.vpr);
+}
+
+static bool trans_VPST(DisasContext *s, arg_VPST *a)
+{
+/* mask == 0 is a "related encoding" */
+if (!dc_isar_feature(aa32_mve, s) || !a->mask) {
+return false;
+}
+if (!mve_eci_check(s) || !vfp_access_check(s)) {
+return true;
+}
+gen_vpst(s, a->mask);
 mve_update_and_store_eci(s);
 return true;
 }
-- 
2.20.1




[PULL 17/44] target/arm: Implement MVE VPSEL

2021-08-25 Thread Peter Maydell
Implement the MVE VPSEL insn, which sets each byte of the destination
vector Qd to the byte from either Qn or Qm depending on the value of
the corresponding bit in VPR.P0.
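The byte-level behaviour, sketched in Python (illustrative only; writes to Qd are still gated by the usual predication mask):

```python
def vpsel(qd, qn, qm, p0, mask):
    """Qd[i] = P0[i] ? Qn[i] : Qm[i], per byte, write gated by mask."""
    out = list(qd)
    for i in range(len(qd)):
        r = qn[i] if (p0 >> i) & 1 else qm[i]
        if (mask >> i) & 1:
            out[i] = r                # only predicated-in bytes written
    return out
```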

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h|  2 ++
 target/arm/mve.decode  |  7 +--
 target/arm/mve_helper.c| 19 +++
 target/arm/translate-mve.c |  2 ++
 4 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 4f9903e66ef..16c4c3b8f61 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -82,6 +82,8 @@ DEF_HELPER_FLAGS_4(mve_vorr, TCG_CALL_NO_WG, void, env, ptr, 
ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
+DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+
 DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vaddw, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index ef708ba80ff..4bd20a9a319 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -468,8 +468,11 @@ VSHLC 111 0 1110 1 . 1 imm:5 ... 0  1100 
rdm:4 qd=%qd
 # effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
 VCMPEQ 1110 0 . .. ... 1 ... 0  0 0 . 0 ... 0 @vcmp
 VCMPNE 1110 0 . .. ... 1 ... 0  1 0 . 0 ... 0 @vcmp
-VCMPCS 1110 0 . .. ... 1 ... 0  0 0 . 0 ... 1 @vcmp
-VCMPHI 1110 0 . .. ... 1 ... 0  1 0 . 0 ... 1 @vcmp
+{
+  VPSEL    1110 0 . 11 ... 1 ... 0  . 0 . 0 ... 1 @2op_nosz
+  VCMPCS   1110 0 . .. ... 1 ... 0  0 0 . 0 ... 1 @vcmp
+  VCMPHI   1110 0 . .. ... 1 ... 0  1 0 . 0 ... 1 @vcmp
+}
 VCMPGE 1110 0 . .. ... 1 ... 1  0 0 . 0 ... 0 @vcmp
 VCMPLT 1110 0 . .. ... 1 ... 1  1 0 . 0 ... 0 @vcmp
 VCMPGT 1110 0 . .. ... 1 ... 1  0 0 . 0 ... 1 @vcmp
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 1a021a9a817..03171766b57 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1842,3 +1842,22 @@ DO_VCMP_S(vcmpge, DO_GE)
 DO_VCMP_S(vcmplt, DO_LT)
 DO_VCMP_S(vcmpgt, DO_GT)
 DO_VCMP_S(vcmple, DO_LE)
+
+void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
+{
+/*
+ * Qd[n] = VPR.P0[n] ? Qn[n] : Qm[n]
+ * but note that whether bytes are written to Qd is still subject
+ * to (all forms of) predication in the usual way.
+ */
+uint64_t *d = vd, *n = vn, *m = vm;
+uint16_t mask = mve_element_mask(env);
+uint16_t p0 = FIELD_EX32(env->v7m.vpr, V7M_VPR, P0);
+unsigned e;
+for (e = 0; e < 16 / 8; e++, mask >>= 8, p0 >>= 8) {
+uint64_t r = m[H8(e)];
+mergemask(&r, n[H8(e)], p0);
+mergemask(&d[H8(e)], r, mask);
+}
+mve_advance_vpt(env);
+}
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 6c6f159aa3e..aa38218e08f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -376,6 +376,8 @@ DO_LOGIC(VORR, gen_helper_mve_vorr)
 DO_LOGIC(VORN, gen_helper_mve_vorn)
 DO_LOGIC(VEOR, gen_helper_mve_veor)
 
+DO_LOGIC(VPSEL, gen_helper_mve_vpsel)
+
 #define DO_2OP(INSN, FN) \
 static bool trans_##INSN(DisasContext *s, arg_2op *a)   \
 {   \
-- 
2.20.1




[PULL 31/44] target/arm: Implement MVE VPNOT

2021-08-25 Thread Peter Maydell
Implement the MVE VPNOT insn, which inverts the bits in VPR.P0
(subject to both predication and to beat-wise execution).
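The update rule for P0 can be sketched as (a model of the helper's logic; `mask` covers executed-and-predicated lanes, `eci_mask` covers executed beats):

```python
def vpnot(p0, mask, eci_mask):
    """Invert P0: unexecuted beats keep their old bits, predicated-out
    lanes of executed beats become 0, everything else is inverted."""
    beatpred = ~p0 & mask
    return (p0 & ~eci_mask & 0xFFFF) | (beatpred & eci_mask)
```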

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h|  1 +
 target/arm/mve.decode  |  1 +
 target/arm/mve_helper.c| 17 +
 target/arm/translate-mve.c | 19 +++
 4 files changed, 38 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 651020aaad8..8cb941912fc 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -119,6 +119,7 @@ DEF_HELPER_FLAGS_4(mve_vorn, TCG_CALL_NO_WG, void, env, 
ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_veor, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 
 DEF_HELPER_FLAGS_4(mve_vpsel, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_1(mve_vpnot, TCG_CALL_NO_WG, void, env)
 
 DEF_HELPER_FLAGS_4(mve_vaddb, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
 DEF_HELPER_FLAGS_4(mve_vaddh, TCG_CALL_NO_WG, void, env, ptr, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 774ee2a1a5b..40bd0c04b59 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -571,6 +571,7 @@ VCMPGT 1110 0 . .. ... 1 ... 1  0 0 . 0 ... 1 @vcmp
 VCMPLE 1110 0 . .. ... 1 ... 1  1 0 . 0 ... 1 @vcmp
 
 {
+  VPNOT    1110 0 0 11 000 1 000 0  0100 1101
   VPST 1110 0 . 11 000 1 ... 0  0100 1101 mask=%mask_22_13
   VCMPEQ_scalar    1110 0 . .. ... 1 ... 0  0 1 0 0  @vcmp_scalar
 }
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index d326205cbf0..c22a00c5ed6 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -2201,6 +2201,23 @@ void HELPER(mve_vpsel)(CPUARMState *env, void *vd, void *vn, void *vm)
 mve_advance_vpt(env);
 }
 
+void HELPER(mve_vpnot)(CPUARMState *env)
+{
+/*
+ * P0 bits for unexecuted beats (where eci_mask is 0) are unchanged.
+ * P0 bits for predicated lanes in executed bits (where mask is 0) are 0.
+ * P0 bits otherwise are inverted.
+ * (This is the same logic as VCMP.)
+ * This insn is itself subject to predication and to beat-wise execution,
+ * and after it executes VPT state advances in the usual way.
+ */
+uint16_t mask = mve_element_mask(env);
+uint16_t eci_mask = mve_eci_mask(env);
+uint16_t beatpred = ~env->v7m.vpr & mask;
+env->v7m.vpr = (env->v7m.vpr & ~(uint32_t)eci_mask) | (beatpred & eci_mask);
+mve_advance_vpt(env);
+}
+
 #define DO_1OP_SAT(OP, ESIZE, TYPE, FN) \
 void HELPER(mve_##OP)(CPUARMState *env, void *vd, void *vm) \
 {   \
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index 93707fdd681..cc2e58cfe2f 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -887,6 +887,25 @@ static bool trans_VPST(DisasContext *s, arg_VPST *a)
 return true;
 }
 
+static bool trans_VPNOT(DisasContext *s, arg_VPNOT *a)
+{
+/*
+ * Invert the predicate in VPR.P0. We have call out to
+ * a helper because this insn itself is beatwise and can
+ * be predicated.
+ */
+if (!dc_isar_feature(aa32_mve, s)) {
+return false;
+}
+if (!mve_eci_check(s) || !vfp_access_check(s)) {
+return true;
+}
+
+gen_helper_mve_vpnot(cpu_env);
+mve_update_eci(s);
+return true;
+}
+
 static bool trans_VADDV(DisasContext *s, arg_VADDV *a)
 {
 /* VADDV: vector add across vector */
-- 
2.20.1




[PULL 38/44] target/arm: kvm: use RCU_READ_LOCK_GUARD() in kvm_arch_fixup_msi_route()

2021-08-25 Thread Peter Maydell
From: Hamza Mahfooz 

As per commit 5626f8c6d468 ("rcu: Add automatically released rcu_read_lock
variants"), RCU_READ_LOCK_GUARD() should be used instead of
rcu_read_{un}lock().

Signed-off-by: Hamza Mahfooz 
Reviewed-by: Paolo Bonzini 
Message-id: 20210727235201.11491-1-some...@effective-light.com
Signed-off-by: Peter Maydell 
---
 target/arm/kvm.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index d8381ba2245..5d55de1a493 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -998,7 +998,6 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
 hwaddr xlat, len, doorbell_gpa;
 MemoryRegionSection mrs;
 MemoryRegion *mr;
-int ret = 1;
 
 if (as == &address_space_memory) {
 return 0;
@@ -1006,15 +1005,19 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
 
 /* MSI doorbell address is translated by an IOMMU */
 
-rcu_read_lock();
+RCU_READ_LOCK_GUARD();
+
 mr = address_space_translate(as, address, &xlat, &len, true,
  MEMTXATTRS_UNSPECIFIED);
+
 if (!mr) {
-goto unlock;
+return 1;
 }
+
 mrs = memory_region_find(mr, xlat, 1);
+
 if (!mrs.mr) {
-goto unlock;
+return 1;
 }
 
 doorbell_gpa = mrs.offset_within_address_space;
@@ -1025,11 +1028,7 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
 
 trace_kvm_arm_fixup_msi_route(address, doorbell_gpa);
 
-ret = 0;
-
-unlock:
-rcu_read_unlock();
-return ret;
+return 0;
 }
 
 int kvm_arch_add_msi_route_post(struct kvm_irq_routing_entry *route,
-- 
2.20.1




[PULL 10/44] target/arm: Fix VPT advance when ECI is non-zero

2021-08-25 Thread Peter Maydell
We were not paying attention to the ECI state when advancing the VPT
state.  Architecturally, VPT state advance happens for every beat
(see the pseudocode VPTAdvance()), so on every beat the 4 bits of
VPR.P0 corresponding to the current beat are inverted if required,
and at the end of beats 1 and 3 the VPR MASK fields are updated.
This means that if the ECI state says we should not be executing all
4 beats then we need to skip some of the updating of the VPR that we
currently do in mve_advance_vpt().

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/mve_helper.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 280726d..bc89ce94d5a 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -110,6 +110,8 @@ static void mve_advance_vpt(CPUARMState *env)
 /* Advance the VPT and ECI state if necessary */
 uint32_t vpr = env->v7m.vpr;
 unsigned mask01, mask23;
+uint16_t inv_mask;
+uint16_t eci_mask = mve_eci_mask(env);
 
 if ((env->condexec_bits & 0xf) == 0) {
 env->condexec_bits = (env->condexec_bits == (ECI_A0A1A2B0 << 4)) ?
@@ -121,17 +123,25 @@ static void mve_advance_vpt(CPUARMState *env)
 return;
 }
 
+/* Invert P0 bits if needed, but only for beats we actually executed */
 mask01 = FIELD_EX32(vpr, V7M_VPR, MASK01);
 mask23 = FIELD_EX32(vpr, V7M_VPR, MASK23);
-if (mask01 > 8) {
-/* high bit set, but not 0b1000: invert the relevant half of P0 */
-vpr ^= 0xff;
+/* Start by assuming we invert all bits corresponding to executed beats */
+inv_mask = eci_mask;
+if (mask01 <= 8) {
+/* MASK01 says don't invert low half of P0 */
+inv_mask &= ~0xff;
 }
-if (mask23 > 8) {
-/* high bit set, but not 0b1000: invert the relevant half of P0 */
-vpr ^= 0xff00;
+if (mask23 <= 8) {
+/* MASK23 says don't invert high half of P0 */
+inv_mask &= ~0xff00;
 }
-vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
+vpr ^= inv_mask;
+/* Only update MASK01 if beat 1 executed */
+if (eci_mask & 0xf0) {
+vpr = FIELD_DP32(vpr, V7M_VPR, MASK01, mask01 << 1);
+}
+/* Beat 3 always executes, so update MASK23 */
 vpr = FIELD_DP32(vpr, V7M_VPR, MASK23, mask23 << 1);
 env->v7m.vpr = vpr;
 }
-- 
2.20.1




[PULL 15/44] target/arm: Implement MVE integer vector comparisons

2021-08-25 Thread Peter Maydell
Implement the MVE integer vector comparison instructions.  These are
"VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings
T1, T2 and T3.

These insns compare corresponding elements in each vector, and update
the VPR.P0 predicate bits with the results of the comparison.  VPT
also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively
"VCMP then VPST".

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h| 32 ++
 target/arm/mve.decode  | 18 +++-
 target/arm/mve_helper.c| 56 ++
 target/arm/translate-mve.c | 47 
 4 files changed, 152 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index b9af03cc03b..ca5a6ab51cc 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -480,3 +480,35 @@ DEF_HELPER_FLAGS_3(mve_uqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
 DEF_HELPER_FLAGS_3(mve_sqshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
 DEF_HELPER_FLAGS_3(mve_uqrshl, TCG_CALL_NO_RWG, i32, env, i32, i32)
 DEF_HELPER_FLAGS_3(mve_sqrshr, TCG_CALL_NO_RWG, i32, env, i32, i32)
+
+DEF_HELPER_FLAGS_3(mve_vcmpeqb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpeqh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpeqw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmpneb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpneh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpnew, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmpcsb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpcsh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpcsw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmphib, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmphih, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmphiw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmpgeb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpgeh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpgew, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmpltb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmplth, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpltw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmpgtb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpgth, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpgtw, TCG_CALL_NO_WG, void, env, ptr, ptr)
+
+DEF_HELPER_FLAGS_3(mve_vcmpleb, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmpleh, TCG_CALL_NO_WG, void, env, ptr, ptr)
+DEF_HELPER_FLAGS_3(mve_vcmplew, TCG_CALL_NO_WG, void, env, ptr, ptr)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 88c9c18ebf1..76bbf9a6136 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -37,6 +37,7 @@
 &2shift qd qm shift size
 &vidup qd rn size imm
 &viwdup qd rn rm size imm
+&vcmp qm qn size mask
 
 @vldr_vstr ... . . . . l:1 rn:4 ... .. imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -86,6 +87,10 @@
 @2_shr_w   .. 1 .     &2shift qd=%qd qm=%qm \
  size=2 shift=%rshift_i5
 
+# Vector comparison; 4-bit Qm but 3-bit Qn
+%mask_22_13  22:1 13:3
@vcmp  .. size:2 qn:3 .     &vcmp qm=%qm mask=%mask_22_13
+
 # Vector loads and stores
 
 # Widening loads and narrowing stores:
@@ -345,7 +350,6 @@ VQRDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110  @2scalar
 }
 
 # Predicate operations
-%mask_22_13  22:1 13:3
 VPST  1110 0 . 11 000 1 ... 0  0100 1101 mask=%mask_22_13
 
 # Logical immediate operations (1 reg and modified-immediate)
@@ -458,3 +462,15 @@ VQRSHRUNT 111 1 1110 1 . ... ... ... 1  1 1 . 0 ... 0 @2_shr_b
 VQRSHRUNT 111 1 1110 1 . ... ... ... 1  1 1 . 0 ... 0 @2_shr_h
 
 VSHLC 111 0 1110 1 . 1 imm:5 ... 0  1100 rdm:4 qd=%qd
+
+# Comparisons. We expand out the conditions which are split across
+# encodings T1, T2, T3 and the fc bits. These include VPT, which is
+# effectively "VCMP then VPST". A plain "VCMP" has a mask field of zero.
+VCMPEQ 1110 0 . .. ... 1 ... 0  0 0 . 0 ... 0 @vcmp
+VCMPNE 1110 0 . .. ... 1 ... 0  1 0 . 0 ... 0 @vcmp
+VCMPCS 1110 0 . .. ... 1 ... 0  0 0 . 0 ... 1 @vcmp
+VCMPHI 1110 0 . .. ... 1 ... 0  1 0 . 0 ... 1 @vcmp
+VCMPGE 1110 0 . .. ... 1 ... 1  0 0 . 0 ... 0 @vcmp
+VCMPLT 1110 0 . .. ... 1 ... 1  1 0 . 0 ... 0 @vcmp
+VCMPGT 1110 0 . .. ... 1 ... 1  0 0 . 0 ... 1 @vcmp
+VCMPLE 1110 0 . .. ... 1 ... 1  1 0 . 0 ... 1 @vcmp
diff --git

[PULL 33/44] target/arm: Implement MVE scatter-gather insns

2021-08-25 Thread Peter Maydell
Implement the MVE gather-loads and scatter-stores which
form the address by adding a base value from a scalar
register to an offset in each element of a vector.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h|  32 +
 target/arm/mve.decode  |  12 
 target/arm/mve_helper.c| 129 +
 target/arm/translate-mve.c |  97 
 4 files changed, 270 insertions(+)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index b6cf3f0c94d..ba842b97c17 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -33,6 +33,38 @@ DEF_HELPER_FLAGS_3(mve_vstrb_h, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrb_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 DEF_HELPER_FLAGS_3(mve_vstrh_w, TCG_CALL_NO_WG, void, env, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrb_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vldrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrh_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrw_sg_os_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vstrd_sg_os_ud, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_3(mve_vdup, TCG_CALL_NO_WG, void, env, ptr, i32)
 
 DEF_HELPER_FLAGS_4(mve_vidupb, TCG_CALL_NO_WG, i32, env, ptr, i32, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 40bd0c04b59..6c3f45c7195 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -42,11 +42,18 @@
 &shl_scalar qda rm size
 &vmaxv qm rda size
 &vabav qn qm rda size
+&vldst_sg qd qm rn size msize os
+
+# scatter-gather memory size is in bits 6:4
+%sg_msize 6:1 4:1
 
 @vldr_vstr ... . . . . l:1 rn:4 ... .. imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
 @vldst_wn ... u:1 ... . . . . l:1 . rn:3 qd:3 . ... .. imm:7 &vldr_vstr
 
+@vldst_sg    rn:4  ... size:2 ... ... os:1 &vldst_sg \
+  qd=%qd qm=%qm msize=%sg_msize
+
 @1op    size:2 ..     &1op qd=%qd qm=%qm
 @1op_nosz         &1op qd=%qd qm=%qm size=0
 @2op   .. size:2      &2op qd=%qd qm=%qm qn=%qn
@@ -136,6 +143,11 @@ VLDR_VSTR1110110 1 a:1 . w:1 .  ... 01 ...   @vldr_vstr \
 VLDR_VSTR1110110 1 a:1 . w:1 .  ... 10 ...   @vldr_vstr \
  size=2 p=1
 
+# gather loads/scatter stores
+VLDR_S_sg111 0 1100 1 . 01  ... 0 111 .   @vldst_sg
+VLDR_U_sg111 1 1100 1 . 01  ... 0 111 .   @vldst_sg
+VSTR_sg  111 0 1100 1 . 00  ... 0 111 .   @vldst_sg
+
 # Moves between 2 32-bit vector lanes and 2 general purpose registers
 VMOV_to_2gp  1110 1100 0 . 00 rt2:4 ... 0  000 idx:1 rt:4 qd=%qd
 VMOV_from_2gp1110 1100 0 . 01 rt2:4 ... 0  000 idx:1 rt:4 qd=%qd
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 1752555a218..2b882db1c3d 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -206,6 +206,135 @@ DO_VSTR(vstrh_w, 2, stw, 4, int32_t)
 #undef DO_VLDR
 #undef DO_VSTR
 
+/*
+ * Gather loads/scatter stores. Here each element of

[PULL 19/44] target/arm: Implement MVE shift-by-scalar

2021-08-25 Thread Peter Maydell
Implement the MVE instructions which perform shifts by a scalar.
These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2.  They take the
shift amount in a general purpose register and shift every element in
the vector by that amount.

Mostly we can reuse the helper functions for shift-by-immediate; we
do need two new helpers for VQRSHL.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/helper-mve.h|  8 +++
 target/arm/mve.decode  | 23 ---
 target/arm/mve_helper.c|  2 ++
 target/arm/translate-mve.c | 46 ++
 4 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/target/arm/helper-mve.h b/target/arm/helper-mve.h
index 715b1bbd012..0ee5ea3cabd 100644
--- a/target/arm/helper-mve.h
+++ b/target/arm/helper-mve.h
@@ -414,6 +414,14 @@ DEF_HELPER_FLAGS_4(mve_vrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(mve_vqrshli_sb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshli_sh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshli_sw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(mve_vqrshli_ub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshli_uh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(mve_vqrshli_uw, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(mve_vshllbsb, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vshllbsh, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(mve_vshllbub, TCG_CALL_NO_WG, void, env, ptr, ptr, i32)
diff --git a/target/arm/mve.decode b/target/arm/mve.decode
index 226b74790b3..eb26b103d12 100644
--- a/target/arm/mve.decode
+++ b/target/arm/mve.decode
@@ -39,6 +39,7 @@
 &viwdup qd rn rm size imm
 &vcmp qm qn size mask
 &vcmp_scalar qn rm size mask
+&shl_scalar qda rm size
 
 @vldr_vstr ... . . . . l:1 rn:4 ... .. imm:7 &vldr_vstr qd=%qd u=0
 # Note that both Rn and Qd are 3 bits only (no D bit)
@@ -88,6 +89,8 @@
 @2_shr_w   .. 1 .     &2shift qd=%qd qm=%qm \
  size=2 shift=%rshift_i5
 
+@shl_scalar    size:2 ..    rm:4 &shl_scalar qda=%qd
+
 # Vector comparison; 4-bit Qm but 3-bit Qn
 %mask_22_13  22:1 13:3
@vcmp  .. size:2 qn:3 .     &vcmp qm=%qm mask=%mask_22_13
@@ -320,7 +323,23 @@ VRMLSLDAVH    1110 1 ... ... 0 ... x:1 1110 . 0 a:1 0 ... 1 @vmlaldav_no
 
 VADD_scalar  1110 1110 0 . .. ... 1 ... 0  . 100  @2scalar
 VSUB_scalar  1110 1110 0 . .. ... 1 ... 1  . 100  @2scalar
-VMUL_scalar  1110 1110 0 . .. ... 1 ... 1 1110 . 110  @2scalar
+
+{
+  VSHL_S_scalar   1110 1110 0 . 11 .. 01 ... 1 1110 0110  @shl_scalar
+  VRSHL_S_scalar  1110 1110 0 . 11 .. 11 ... 1 1110 0110  @shl_scalar
+  VQSHL_S_scalar  1110 1110 0 . 11 .. 01 ... 1 1110 1110  @shl_scalar
+  VQRSHL_S_scalar 1110 1110 0 . 11 .. 11 ... 1 1110 1110  @shl_scalar
+  VMUL_scalar 1110 1110 0 . .. ... 1 ... 1 1110 . 110  @2scalar
+}
+
+{
+  VSHL_U_scalar    1110 0 . 11 .. 01 ... 1 1110 0110  @shl_scalar
+  VRSHL_U_scalar   1110 0 . 11 .. 11 ... 1 1110 0110  @shl_scalar
+  VQSHL_U_scalar   1110 0 . 11 .. 01 ... 1 1110 1110  @shl_scalar
+  VQRSHL_U_scalar  1110 0 . 11 .. 11 ... 1 1110 1110  @shl_scalar
+  VBRSR    1110 0 . .. ... 1 ... 1 1110 . 110  @2scalar
+}
+
 VHADD_S_scalar   1110 1110 0 . .. ... 0 ... 0  . 100  @2scalar
 VHADD_U_scalar    1110 0 . .. ... 0 ... 0  . 100  @2scalar
 VHSUB_S_scalar   1110 1110 0 . .. ... 0 ... 1  . 100  @2scalar
@@ -340,8 +359,6 @@ VHSUB_U_scalar    1110 0 . .. ... 0 ... 1  . 100  @2scalar
   size=%size_28
 }
 
-VBRSR 1110 0 . .. ... 1 ... 1 1110 . 110  @2scalar
-
 VQDMULH_scalar   1110 1110 0 . .. ... 1 ... 0 1110 . 110  @2scalar
 VQRDMULH_scalar   1110 0 . .. ... 1 ... 0 1110 . 110  @2scalar
 
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index ab02a1e60f4..ac608fc524b 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -1334,6 +1334,8 @@ DO_2SHIFT_SAT_S(vqshli_s, DO_SQSHL_OP)
 DO_2SHIFT_SAT_S(vqshlui_s, DO_SUQSHL_OP)
 DO_2SHIFT_U(vrshli_u, DO_VRSHLU)
 DO_2SHIFT_S(vrshli_s, DO_VRSHLS)
+DO_2SHIFT_SAT_U(vqrshli_u, DO_UQRSHL_OP)
+DO_2SHIFT_SAT_S(vqrshli_s, DO_SQRSHL_OP)
 
 /* Shift-and-insert; we always work with 64 bits at a time */
 #define DO_2SHIFT_INSERT(OP, ESIZE, SHIFTFN, MASKFN)\
diff --git a/target/arm/translate-mve.c b/target/arm/translate-mve.c
index b56c91db2ab..44731fc4eb7 100644
--- a/target/arm/translate-mve.c
+++ b/target/arm/translate-mve.c
@@ -1003,6 +1003,52 @@ DO_2SHIFT(VRSHRI_U, vrshli_u, true)
 DO_2SHIFT(

[PULL 36/44] target/arm: Re-indent sdiv and udiv helpers

2021-08-25 Thread Peter Maydell
We're about to make a code change to the sdiv and udiv helper
functions, so first fix their indentation and coding style.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20210730151636.17254-2-peter.mayd...@linaro.org
---
 target/arm/helper.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 155d8bf2399..8e9c2a2cf8c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9355,17 +9355,20 @@ uint32_t HELPER(uxtb16)(uint32_t x)
 
 int32_t HELPER(sdiv)(int32_t num, int32_t den)
 {
-if (den == 0)
-  return 0;
-if (num == INT_MIN && den == -1)
-  return INT_MIN;
+if (den == 0) {
+return 0;
+}
+if (num == INT_MIN && den == -1) {
+return INT_MIN;
+}
 return num / den;
 }
 
 uint32_t HELPER(udiv)(uint32_t num, uint32_t den)
 {
-if (den == 0)
-  return 0;
+if (den == 0) {
+return 0;
+}
 return num / den;
 }
 
-- 
2.20.1



