Re: [PATCH] scsi/lsi53c895a: fix use-after-free in lsi_do_msgout (CVE-2022-0216)

2022-07-06 Thread Thomas Huth

On 05/07/2022 22.05, Mauro Matteo Cascella wrote:

Set current_req->req to NULL to prevent reusing a free'd buffer in case of
repeated SCSI cancel requests. Thanks to Thomas Huth for suggesting the patch.

Fixes: CVE-2022-0216
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/972
Signed-off-by: Mauro Matteo Cascella 
---
  hw/scsi/lsi53c895a.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index c8773f73f7..99ea42d49b 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -1028,8 +1028,9 @@ static void lsi_do_msgout(LSIState *s)
         case 0x0d:
             /* The ABORT TAG message clears the current I/O process only. */
             trace_lsi_do_msgout_abort(current_tag);
-            if (current_req) {
+            if (current_req && current_req->req) {
                 scsi_req_cancel(current_req->req);
+                current_req->req = NULL;
             }
             lsi_disconnect(s);
             break;


Let's hope that this will fix the issue for good...

Reviewed-by: Thomas Huth 




[PATCH v6 11/13] qemu-sockets: move and rename SocketAddress_to_str()

2022-07-06 Thread Laurent Vivier
Rename SocketAddress_to_str() to socket_uri() and move it to
util/qemu-sockets.c close to socket_parse().

socket_uri() generates a string from a SocketAddress while
socket_parse() generates a SocketAddress from a string.
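
For illustration, a round-trip between the two functions would look like
this (a sketch, not part of the patch; the address is an arbitrary
example and error handling is elided):

    /* Assumes QEMU-internal headers: qemu/sockets.h, qapi/error.h. */
    static void socket_uri_roundtrip_example(void)
    {
        SocketAddress *addr = socket_parse("127.0.0.1:4444", &error_fatal);
        char *uri = socket_uri(addr);  /* "tcp:127.0.0.1:4444", inet case */

        g_free(uri);
        qapi_free_SocketAddress(addr);
    }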

Signed-off-by: Laurent Vivier 
---
 include/qemu/sockets.h |  2 +-
 monitor/hmp-cmds.c     | 23 +----------------------
 util/qemu-sockets.c    | 20 ++++++++++++++++++++
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 47194b9732f8..e5a06d2e3729 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -40,6 +40,7 @@ NetworkAddressFamily inet_netfamily(int family);
 int unix_listen(const char *path, Error **errp);
 int unix_connect(const char *path, Error **errp);
 
+char *socket_uri(SocketAddress *addr);
 SocketAddress *socket_parse(const char *str, Error **errp);
 int socket_connect(SocketAddress *addr, Error **errp);
 int socket_listen(SocketAddress *addr, int num, Error **errp);
@@ -123,5 +124,4 @@ SocketAddress *socket_address_flatten(SocketAddressLegacy *addr);
  * Return 0 on success.
  */
 int socket_address_parse_named_fd(SocketAddress *addr, Error **errp);
-
 #endif /* QEMU_SOCKETS_H */
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index ca98df04952b..8ebb437d1b6a 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -197,27 +197,6 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict)
 qapi_free_MouseInfoList(mice_list);
 }
 
-static char *SocketAddress_to_str(SocketAddress *addr)
-{
-switch (addr->type) {
-case SOCKET_ADDRESS_TYPE_INET:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.inet.host,
-   addr->u.inet.port);
-case SOCKET_ADDRESS_TYPE_UNIX:
-return g_strdup_printf("unix:%s",
-   addr->u.q_unix.path);
-case SOCKET_ADDRESS_TYPE_FD:
-return g_strdup_printf("fd:%s", addr->u.fd.str);
-case SOCKET_ADDRESS_TYPE_VSOCK:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.vsock.cid,
-   addr->u.vsock.port);
-default:
-return g_strdup("unknown address type");
-}
-}
-
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 {
 MigrationInfo *info;
@@ -375,7 +354,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "socket address: [\n");
 
 for (addr = info->socket_address; addr; addr = addr->next) {
-char *s = SocketAddress_to_str(addr->value);
+char *s = socket_uri(addr->value);
 monitor_printf(mon, "\t%s\n", s);
 g_free(s);
 }
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 13b5b197f9ea..870a36eb0e93 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -1098,6 +1098,26 @@ int unix_connect(const char *path, Error **errp)
 return sock;
 }
 
+char *socket_uri(SocketAddress *addr)
+{
+switch (addr->type) {
+case SOCKET_ADDRESS_TYPE_INET:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.inet.host,
+   addr->u.inet.port);
+case SOCKET_ADDRESS_TYPE_UNIX:
+return g_strdup_printf("unix:%s",
+   addr->u.q_unix.path);
+case SOCKET_ADDRESS_TYPE_FD:
+return g_strdup_printf("fd:%s", addr->u.fd.str);
+case SOCKET_ADDRESS_TYPE_VSOCK:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.vsock.cid,
+   addr->u.vsock.port);
+default:
+return g_strdup("unknown address type");
+}
+}
 
 SocketAddress *socket_parse(const char *str, Error **errp)
 {
-- 
2.36.1




Re: [PATCH v2] io_uring: fix short read slow path

2022-07-06 Thread Stefan Hajnoczi
On Tue, 5 Jul 2022 at 20:26, Jens Axboe  wrote:
>
> On 7/5/22 7:28 AM, Stefan Hajnoczi wrote:
> > On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
> >> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
>  so when we ask for more we issue an extra short read, making sure we go
>  through the two short reads path.
>  (Unfortunately I wasn't quite sure what to fiddle with to issue short
>  reads in the first place, I tried cutting one of the iovs short in
>  luring_do_submit() but I must not have been doing it properly as I ended
>  up with 0 return values which are handled by filling in with 0 (reads
>  after eof) and that didn't work well)
> >>>
> >>> Do you remember the kernel version where you first saw these problems?
> >>
> >> Since you're quoting my paragraph about testing two short reads, I've
> >> never seen any that I know of; but there's also no reason these couldn't
> >> happen.
> >>
> >> Single short reads have been happening for me with O_DIRECT (cache=none)
> >> on btrfs for a while, but unfortunately I cannot remember which was the
> >> first kernel I've seen this on -- I think rather than a kernel update it
> >> was due to file manipulations that made the file eligible for short
> >> reads in the first place (I started running deduplication on the backing
> >> file)
> >>
> >> The oldest kernel I have installed right now is 5.16 and that can
> >> reproduce it --  I'll give my laptop some work over the weekend to test
> >> still maintained stable branches if that's useful.
> >
> > Hi Dominique,
> > Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> > async context"). The comment above QEMU's luring_resubmit_short_read()
> > claims that short reads are a bug that was fixed by Linux commit
> > 9d93a3f5a0c.
> >
> > If the comment is inaccurate it needs to be fixed. Maybe short writes
> > need to be handled too.
> >
> > I have CCed Jens and the io_uring mailing list to clarify:
> > 1. Are short IORING_OP_READV reads possible on files/block devices?
> > 2. Are short IORING_OP_WRITEV writes possible on files/block devices?
>
> In general we try very hard to avoid them, but if eg we get a short read
> or write from blocking context (eg io-wq), then io_uring does return
> that. There's really not much we can do here, it seems futile to retry
> IO which was issued just like it would've been from a normal blocking
> syscall yet it is still short.

Thanks! QEMU's short I/O handling is spotty - some code paths handle
it while others don't. For the io_uring QEMU block driver we'll try to
handle all short I/Os.
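
The pattern for handling a short read is to account for the bytes
already transferred and resubmit the remainder. A minimal synchronous
sketch with pread(2) - the io_uring path has to do the same bookkeeping
across asynchronous completions:

    /* Sketch: read until the buffer is full, EOF, or an error.
     * Assumes <unistd.h>. */
    static ssize_t read_full(int fd, void *buf, size_t len, off_t off)
    {
        size_t done = 0;

        while (done < len) {
            ssize_t n = pread(fd, (char *)buf + done, len - done, off + done);
            if (n < 0) {
                return -1;  /* caller inspects errno */
            }
            if (n == 0) {
                break;      /* EOF: caller may pad with zeroes */
            }
            done += n;      /* short read: resubmit the remainder */
        }
        return done;
    }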

Stefan



Re: [PATCH v2] io_uring: fix short read slow path

2022-07-06 Thread Stefan Hajnoczi
On Tue, 5 Jul 2022 at 23:53, Dominique Martinet wrote:
>
> Stefan Hajnoczi wrote on Tue, Jul 05, 2022 at 02:28:08PM +0100:
> > > The oldest kernel I have installed right now is 5.16 and that can
> > > reproduce it --  I'll give my laptop some work over the weekend to test
> > > still maintained stable branches if that's useful.
> >
> > Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> > async context"). The comment above QEMU's luring_resubmit_short_read()
> > claims that short reads are a bug that was fixed by Linux commit
> > 9d93a3f5a0c.
> >
> > If the comment is inaccurate it needs to be fixed. Maybe short writes
> > need to be handled too.
> >
> > I have CCed Jens and the io_uring mailing list to clarify:
> > 1. Are short IORING_OP_READV reads possible on files/block devices?
> > 2. Are short IORING_OP_WRITEV writes possible on files/block devices?
>
> Jens replied before me, so I won't be adding much (I agree with his
> reply -- linux tries hard to avoid short reads but we should assume they
> can happen)
>
> In this particular case it was another btrfs bug with O_DIRECT and mixed
> compression in a file, that's been fixed by this patch:
> https://lore.kernel.org/all/20220630151038.GA459423@falcondesktop/
>
> queued here:
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=dio_fixes&id=b3864441547e49a69d45c7771aa8cc5e595d18fc
>
> It should be backported to 5.10, but the problem will likely persist in
> 5.4 kernels if anyone runs on that as the code changed enough to make
> backporting non-trivial.
>
>
> So, WRT that comment, we probably should remove the reference to that
> commit and leave in that they should be very rare but we need to handle
> them anyway.
>
>
> For writes in particular, I haven't seen any and looking at the code
> qemu would blow up that storage (IO treated as ENOSPC would likely mark
> disk read-only?)
> It might make sense to add some warning message that it's what happened
> so it'll be obvious what needs doing in case anyone falls on that but I
> think the status-quo is good enough here.

Great! I've already queued your fix.

Do you want to send a follow-up that updates the comment?

Thanks,
Stefan



Re: [PATCH v2 1/1] qga: add command 'guest-get-cpustats'

2022-07-06 Thread Marc-André Lureau
Hi

On Wed, Jul 6, 2022 at 7:09 AM zhenwei pi  wrote:

> On 7/4/22 16:00, zhenwei pi wrote:
> >
> >
> >> +##
> >> +# @GuestOsType:
> >> +#
> >> +# An enumeration of OS type
> >> +#
> >> +# Since: 7.1
> >> +##
> >> +{ 'enum': 'GuestOsType',
> >> +  'data': [ 'linuxos', 'windowsos' ] }
> >>
> >>
> >> I would rather keep this enum specific to GuestCpuStats,
> >> "GuestLinuxCpuStatsType"?
> >>
> >
> > Hi,
> >
> > 'GuestOsType' may be re-used in the future, not only for the CPU
> > statistics case.
> >
> >> I would also drop the "os" suffix
> >>
> > I'm afraid we can not drop "os" suffix, build this without "os" suffix:
> > qga/qga-qapi-types.h:948:28: error: expected member name or ';' after
> > declaration specifiers
> >  GuestLinuxCpuStats linux;
> >  ~~ ^
> > :336:15: note: expanded from here
> > #define linux 1
> >
>
> Hi, Marc
>
> Could you please give any hint about this issue?
>

Yes, it looks like we need to add "linux" to the "polluted_words":

diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index 489273574aee..737b059e6291 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -114,7 +114,7 @@ def c_name(name: str, protect: bool = True) -> str:
                      'and', 'and_eq', 'bitand', 'bitor', 'compl', 'not',
                      'not_eq', 'or', 'or_eq', 'xor', 'xor_eq'])
     # namespace pollution:
-    polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386'])
+    polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386', 'linux'])
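
For the record, the clash exists because gcc and clang predefine the
macro "linux" to 1 in their default gnu dialects on Linux targets, so a
member literally named "linux" preprocesses to "int 1;". With the
polluted-words escape, c_name() applies the same "q_" prefix already
used for "unix" (visible elsewhere as addr->u.q_unix.path), so the
generated struct would come out roughly as (a sketch, member names
assumed):

    struct GuestCpuStats {
        GuestCpuStatsType type;
        union {
            GuestLinuxCpuStats q_linux;  /* QAPI member "linux", escaped */
        } u;
    };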


> >> +
> >> +
> >>
> >>
> >>
> >> Looks good to me otherwise.
> >> thanks
> >>
> >> --
> >> Marc-André Lureau
> >
>
> --
> zhenwei pi
>


-- 
Marc-André Lureau


Re: [PATCH] scsi/lsi53c895a: fix use-after-free in lsi_do_msgout (CVE-2022-0216)

2022-07-06 Thread Paolo Bonzini
Queued, thanks.

Paolo





Re: [PATCH v2] io_uring: fix short read slow path

2022-07-06 Thread Dominique Martinet
Stefan Hajnoczi wrote on Wed, Jul 06, 2022 at 08:17:42AM +0100:
> Great! I've already queued your fix.

Thanks!

> Do you want to send a follow-up that updates the comment?

I don't think I'd add much value at this point, leaving it to you unless
you really would prefer me to send it.


Cheers,
-- 
Dominique



Re: [PATCH] iotests: fix copy-before-write for macOS and FreeBSD

2022-07-06 Thread Thomas Huth

On 05/07/2022 17.37, Vladimir Sementsov-Ogievskiy wrote:

strerror() represents ETIMEDOUT a bit differently on Linux than on
macOS / FreeBSD. Let's support the latter too.

Fixes: 9d05a87b77 ("iotests: copy-before-write: add cases for cbw-timeout option")
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

As John and Thomas noted, the new iotests fail for FreeBSD and macOS.
Here is a fix. Would be great if someone can test it.


Thanks, seems to work now:

 https://gitlab.com/thuth/qemu/-/jobs/2683487160#L3256
 https://gitlab.com/thuth/qemu/-/jobs/2683487162#L2897

Tested-by: Thomas Huth 



I tried to push it by

   git push --force  -o ci.variable="QEMU_CI=1"

to my block branch, I get a blocked pipeline
   https://gitlab.com/vsementsov/qemu/-/pipelines/580573238
but it has neither freebsd nor macos jobs. How do I get them?


The instructions are a little bit hidden - you can find them in the 
.gitlab-ci.d/cirrus/README.rst file in your git checkout.


 HTH,
  Thomas




Re: [PATCH 2/3] Revert "main-loop: Disable block backend global state assertion on Cocoa"

2022-07-06 Thread Emanuele Giuseppe Esposito



Am 06/07/2022 um 04:13 schrieb Akihiko Odaki:
> This reverts commit 47281859f66bdab1974fb122cab2cbb4a1c9af7f.
> 
> Signed-off-by: Akihiko Odaki 
> ---
>  include/qemu/main-loop.h | 13 -
>  1 file changed, 13 deletions(-)

Reviewed-by: Emanuele Giuseppe Esposito 




Re: [PATCH] iotests: fix copy-before-write for macOS and FreeBSD

2022-07-06 Thread Thomas Huth

On 05/07/2022 19.22, Richard Henderson wrote:

On 7/5/22 21:07, Vladimir Sementsov-Ogievskiy wrote:

strerror() represents ETIMEDOUT a bit differently on Linux than on
macOS / FreeBSD. Let's support the latter too.

Fixes: 9d05a87b77 ("iotests: copy-before-write: add cases for cbw-timeout option")

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

As John and Thomas noted, the new iotests fail for FreeBSD and macOS.
Here is a fix. Would be great if someone can test it.

I tried to push it by

   git push --force  -o ci.variable="QEMU_CI=1"

to my block branch, I get a blocked pipeline
   https://gitlab.com/vsementsov/qemu/-/pipelines/580573238
but it has neither freebsd nor macos jobs. How do I get them?


You'd have to have an attached cirrus token.
Better to just use 'make vm-build-freebsd'.


  def test_timeout_break_guest(self):
  log = self.do_cbw_timeout('break-guest-write')
+    # macOS and FreeBSD tend to represent ETIMEDOUT as
+    # "Operation timed out", when Linux prefer
+    # "Connection timed out"
+    log = log.replace('Operation timed out',
+  'Connection timed out')


Um, really?  Matching strerror text?  This is ultra-fragile.
Dare I say broken already.


Many of the iotests rely on output text matching. It's very fragile, always 
has been and always will be (unless we rewrite the whole test suite to not 
use output text matching anymore). For example, the iotests also do not work 
with the libc from Alpine Linux (where one of the error messages is "I/O 
error" instead of "Input/Output error"), so we only run check-unit and 
check-qtest in the CI there. It's a pity, but that's the way it is 
currently. I think it's still better to tweak some of the text strings here 
instead of not running the tests at all.


 Thomas




Re: [PATCH 2/9] target/ppc: add errp to kvmppc_read_int_cpu_dt()

2022-07-06 Thread Cédric Le Goater

On 7/5/22 08:57, Cédric Le Goater wrote:

On 7/5/22 08:51, Mark Cave-Ayland wrote:

On 04/07/2022 18:34, Cédric Le Goater wrote:


On 7/2/22 15:34, Daniel Henrique Barboza wrote:



On 7/2/22 03:24, Cédric Le Goater wrote:

On 6/30/22 21:42, Daniel Henrique Barboza wrote:

The function can't just return 0 when an error happens and call it a
day. We must provide a way of letting callers know whether a zero
return is legitimate or due to an error.

Add an Error pointer to kvmppc_read_int_cpu_dt() that will be filled
with an appropriate error, if one occurs. Callers are then free to pass
an Error pointer and handle it.

Signed-off-by: Daniel Henrique Barboza 
---
  target/ppc/kvm.c | 16 +---
  1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 109823136d..bc17437097 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -1925,15 +1925,17 @@ static uint64_t kvmppc_read_int_dt(const char *filename)
  /*
   * Read a CPU node property from the host device tree that's a single
- * integer (32-bit or 64-bit).  Returns 0 if anything goes wrong
- * (can't find or open the property, or doesn't understand the format)
+ * integer (32-bit or 64-bit).  Returns 0 and sets errp if anything goes
+ * wrong (can't find or open the property, or doesn't understand the
+ * format)
   */
-static uint64_t kvmppc_read_int_cpu_dt(const char *propname)
+static uint64_t kvmppc_read_int_cpu_dt(const char *propname, Error **errp)
  {
  char buf[PATH_MAX], *tmp;
  uint64_t val;
  if (kvmppc_find_cpu_dt(buf, sizeof(buf))) {
+    error_setg(errp, "Failed to read CPU property %s", propname);
  return 0;
  }
@@ -1946,12 +1948,12 @@ static uint64_t kvmppc_read_int_cpu_dt(const char *propname)
  uint64_t kvmppc_get_clockfreq(void)
  {
-    return kvmppc_read_int_cpu_dt("clock-frequency");
+    return kvmppc_read_int_cpu_dt("clock-frequency", NULL);



This should be fatal. no ?



I'm not sure. I went under the assumption that there might be some weird
condition where 'clock-frequency' might be missing from the DT, and this
is why we're not exiting out immediately.

That said, the advantage of turning this into fatal is that we won't need
all the other patches that handles error on this function. We're going to
assume that if 'clock-frequency' is missing then we can't boot. I'm okay
with that.
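
If it does become fatal, the caller change could be as small as this
(a sketch, not what the series currently does; &error_fatal makes QEMU
exit with a message instead of returning 0 silently):

    uint64_t kvmppc_get_clockfreq(void)
    {
        return kvmppc_read_int_cpu_dt("clock-frequency", &error_fatal);
    }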


I think so. Some machines behave badly when 'clock-frequency' is bogus,
division by zero, no console, etc. We could check easily with pseries
which is the only KVM PPC platform.


Well not quite true ;)  I haven't tested it during the last release cycle, but 
the Mac machines were still working fine to boot OS X with KVM-PR on my G4 Mac 
Mini last time I checked.


Oh, sorry. And I still have access to a real G5 running the latest Debian.
I should give it a try one day.


I gave KVM a try on a :

cpu : PPC970MP, altivec supported
clock   : 2000.00MHz
revision: 1.0 (pvr 0044 0100)

processor	: 1

cpu : PPC970MP, altivec supported
clock   : 2000.00MHz
revision: 1.0 (pvr 0044 0100)

timebase	: 

platform: PowerMac
model   : PowerMac11,2
machine : PowerMac11,2
motherboard : PowerMac11,2 MacRISC4 Power Macintosh
detected as : 337 (PowerMac G5 Dual Core)
pmac flags  : 
L2 cache: 1024K unified
pmac-generation : NewWorld



running debian with kernel 5.18.0-2-powerpc64. With the installed QEMU 7.0.0,

qemu-system-ppc64 -M mac99 -cpu host -accel kvm ...

doesn't go very far. Program exception is quickly reached and host says:

[56450.118422] Couldn't emulate instruction 0x (op 0 xop 0)
[56450.119060] kvmppc_exit_pr_progint: emulation at 100 failed ()

Anything special I should know ?

Thanks,

C.



Re: Re: [PATCH v2 1/1] qga: add command 'guest-get-cpustats'

2022-07-06 Thread zhenwei pi




On 7/6/22 15:20, Marc-André Lureau wrote:

Hi

On Wed, Jul 6, 2022 at 7:09 AM zhenwei pi wrote:


On 7/4/22 16:00, zhenwei pi wrote:
 >
 >
 >>     +##
 >>     +# @GuestOsType:
 >>     +#
 >>     +# An enumeration of OS type
 >>     +#
 >>     +# Since: 7.1
 >>     +##
 >>     +{ 'enum': 'GuestOsType',
 >>     +  'data': [ 'linuxos', 'windowsos' ] }
 >>
 >>
 >> I would rather keep this enum specific to GuestCpuStats,
 >> "GuestLinuxCpuStatsType"?
 >>
 >
 > Hi,
 >
 > 'GuestOsType' may be re-used in the future, not only for the CPU
 > statistics case.
 >
 >> I would also drop the "os" suffix
 >>
 > I'm afraid we can not drop "os" suffix, build this without "os"
suffix:
 > qga/qga-qapi-types.h:948:28: error: expected member name or ';'
after
 > declaration specifiers
 >      GuestLinuxCpuStats linux;
 >      ~~ ^
 > :336:15: note: expanded from here
 > #define linux 1
 >

Hi, Marc

Could you please give any hint about this issue?


Yes, it looks like we need to add "linux" to the "polluted_words":



OK, I'll fix this in the next version.

By the way, 'GuestCpuStatsType' seems to be used for CPU statistics
only, but 'data': [ 'linux', 'windows' ] is quite common and may be
used for other OS-specific commands in the future. Should I use
'GuestCpuStatsType' instead of 'GuestOsType'?



diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index 489273574aee..737b059e6291 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -114,7 +114,7 @@ def c_name(name: str, protect: bool = True) -> str:
                       'and', 'and_eq', 'bitand', 'bitor', 'compl', 'not',
                       'not_eq', 'or', 'or_eq', 'xor', 'xor_eq'])
      # namespace pollution:
-    polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386'])
+    polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386', 'linux'])



 >>     +
 >>     +
 >>
 >>
 >>
 >> Looks good to me otherwise.
 >> thanks
 >>
 >> --
 >> Marc-André Lureau
 >

-- 
zhenwei pi




--
Marc-André Lureau


--
zhenwei pi



Re: [PATCH v2] io_uring: fix short read slow path

2022-07-06 Thread Stefan Hajnoczi
On Wed, Jul 06, 2022 at 04:26:59PM +0900, Dominique Martinet wrote:
> Stefan Hajnoczi wrote on Wed, Jul 06, 2022 at 08:17:42AM +0100:
> > Great! I've already queued your fix.
> 
> Thanks!
> 
> > Do you want to send a follow-up that updates the comment?
> 
> I don't think I'd add much value at this point, leaving it to you unless
> you really would prefer me to send it.

That's fine. I'll send a patch. Thanks!

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2 1/9] hw/i2c/pca954x: Add method to get channels

2022-07-06 Thread Peter Delevoryas
On Wed, Jul 06, 2022 at 08:06:34AM +0200, Cédric Le Goater wrote:
> On 7/5/22 23:44, Peter Delevoryas wrote:
> > On Tue, Jul 05, 2022 at 02:40:32PM -0700, Peter Delevoryas wrote:
> > > On Tue, Jul 05, 2022 at 03:06:24PM -0500, Corey Minyard wrote:
> > > > On Tue, Jul 05, 2022 at 12:13:52PM -0700, Peter Delevoryas wrote:
> > > > > I added this helper in the Aspeed machine file a while ago to help
> > > > > initialize fuji-bmc i2c devices. This moves it to the official pca954x
> > > > > file so that other files can use it.
> > > > > 
> > > > > This does something very similar to pca954x_i2c_get_bus, but I think
> > > > > this is useful when you have a very complicated dts with a lot of
> > > > > switches, like the fuji dts.
> > > > > 
> > > > > This convenience method lets you write code that produces a flat array
> > > > > of I2C buses that matches the naming in the dts. After that you can 
> > > > > just
> > > > > add individual sensors using the flat array of I2C buses.
> > > > 
> > > > This is an improvment, I think.  But it really needs to be two patches,
> > > > one with the I2C portion, and one with the aspeed portion.
> > > > 
> > > > Also, the name is a little misleading, you might want to name it
> > > > pca954x_i2c_create_get_channels
> > > 
> > > You're right: Cedric, you can just ignore the pca954x patch then if you'd 
> > > like,
> > > I can resubmit it with the future I2C series that uses it. I probably 
> > > shouldn't
> > > have submitted it quite yet.
> > > 
> > > I can also resubmit the series with that patch removed, not sure if that's
> > > necessary or not.
> > 
> > This was hastily written, what I meant to say was:
> > 
> > Cedric, feel free to remove this patch from the series. If you'd like, I can
> > also resubmit this series as v3 with the patch removed.
> 
> 
> I moved it at the end of the series to come just before the other patches
> needing it, the last three patches of :
> 
>   http://patchwork.ozlabs.org/project/qemu-devel/list/?series=307305
> 
> You can resend all of them when fixed.

That sounds good, thanks.

> 
> 
> I think we are done with the initial fby35.
> 
> Next time, please add a cover letter and keep the Reviewed-by tags
> of the previous version. It helps the reviewers. I re-added them
> manually for this spin.

Yeah that's my bad again, I need to get in a better habit of adding those.
Thanks for reminding me again.

> Thanks,
> 
> C.
> 
> > 
> > > 
> > > > 
> > > > -corey
> > > > 
> > > > > 
> > > > > See fuji_bmc_i2c_init to understand this point further.
> > > > > 
> > > > > The fuji dts is here for reference:
> > > > > 
> > > > > https://github.com/torvalds/linux/blob/40cb6373b46/arch/arm/boot/dts/aspeed-bmc-facebook-fuji.dts
> > > > > 
> > > > > Signed-off-by: Peter Delevoryas 
> > > > > ---
> > > > >   hw/arm/aspeed.c  | 29 +
> > > > >   hw/i2c/i2c_mux_pca954x.c | 10 ++
> > > > >   include/hw/i2c/i2c_mux_pca954x.h | 13 +
> > > > >   3 files changed, 32 insertions(+), 20 deletions(-)
> > > > > 
> > > > > diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> > > > > index 6fe9b13548..bee8a748ec 100644
> > > > > --- a/hw/arm/aspeed.c
> > > > > +++ b/hw/arm/aspeed.c
> > > > > @@ -793,15 +793,6 @@ static void 
> > > > > rainier_bmc_i2c_init(AspeedMachineState *bmc)
> > > > >   create_pca9552(soc, 15, 0x60);
> > > > >   }
> > > > > -static void get_pca9548_channels(I2CBus *bus, uint8_t mux_addr,
> > > > > - I2CBus **channels)
> > > > > -{
> > > > > -I2CSlave *mux = i2c_slave_create_simple(bus, "pca9548", 
> > > > > mux_addr);
> > > > > -for (int i = 0; i < 8; i++) {
> > > > > -channels[i] = pca954x_i2c_get_bus(mux, i);
> > > > > -}
> > > > > -}
> > > > > -
> > > > >   #define TYPE_LM75 TYPE_TMP105
> > > > >   #define TYPE_TMP75 TYPE_TMP105
> > > > >   #define TYPE_TMP422 "tmp422"
> > > > > @@ -814,20 +805,18 @@ static void 
> > > > > fuji_bmc_i2c_init(AspeedMachineState *bmc)
> > > > >   for (int i = 0; i < 16; i++) {
> > > > >   i2c[i] = aspeed_i2c_get_bus(&soc->i2c, i);
> > > > >   }
> > > > > -I2CBus *i2c180 = i2c[2];
> > > > > -I2CBus *i2c480 = i2c[8];
> > > > > -I2CBus *i2c600 = i2c[11];
> > > > > -get_pca9548_channels(i2c180, 0x70, &i2c[16]);
> > > > > -get_pca9548_channels(i2c480, 0x70, &i2c[24]);
> > > > > +pca954x_i2c_get_channels(i2c[2], 0x70, "pca9548", &i2c[16]);
> > > > > +pca954x_i2c_get_channels(i2c[8], 0x70, "pca9548", &i2c[24]);
> > > > >   /* NOTE: The device tree skips [32, 40) in the alias numbering 
> > > > > */
> > > > > -get_pca9548_channels(i2c600, 0x77, &i2c[40]);
> > > > > -get_pca9548_channels(i2c[24], 0x71, &i2c[48]);
> > > > > -get_pca9548_channels(i2c[25], 0x72, &i2c[56]);
> > > > > -get_pca9548_channels(i2c[26], 0x76, &i2c[64]);
> > > > > -get_pca9548_channels(i2c[27], 0x76, &i2c[72]);
> > > > > +pca954x_i2c_get_channels(i2c[11], 0x77, "pc

Re: [PATCH v2 9/9] docs: aspeed: Add fby35 multi-SoC machine section

2022-07-06 Thread Peter Delevoryas
On Wed, Jul 06, 2022 at 07:58:44AM +0200, Cédric Le Goater wrote:
> On 7/5/22 21:14, Peter Delevoryas wrote:
> > Signed-off-by: Peter Delevoryas 
> 
> Reviewed-by: Cédric Le Goater 
> 
> I fixed the URL links inline and moved the section to the end of the file.
> 
> Thanks,
> 
> C.

Thanks for that!

> 
> > ---
> >   docs/system/arm/aspeed.rst | 48 ++
> >   1 file changed, 48 insertions(+)
> > 
> > diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
> > index 5d0a7865d3..b233191b67 100644
> > --- a/docs/system/arm/aspeed.rst
> > +++ b/docs/system/arm/aspeed.rst
> > @@ -136,6 +136,54 @@ AST1030 SoC based machines :
> >   - ``ast1030-evb``  Aspeed AST1030 Evaluation board (Cortex-M4F)
> > +Facebook Yosemite v3.5 Platform and CraterLake Server (``fby35``)
> > +==================================================================
> > +
> > +Facebook has a series of multi-node compute server designs named
> > +Yosemite. The most recent version released was
> > +`Yosemite v3 
> > `.
> > +
> > +Yosemite v3.5 is an iteration on this design, and is very similar: there's 
> > a
> > +baseboard with a BMC, and 4 server slots. The new server board design 
> > termed
> > +"CraterLake" includes a Bridge IC (BIC), with room for expansion boards to
> > +include various compute accelerators (video, inferencing, etc). At the 
> > moment,
> > +only the first server slot's BIC is included.
> > +
> > +Yosemite v3.5 is itself a sled which fits into a 40U chassis, and 3 sleds
> > +can be fit into a chassis. See `here 
> > `
> > +for an example.
> > +
> > +In this generation, the BMC is an AST2600 and each BIC is an AST1030. The 
> > BMC
> > +runs `OpenBMC `, and the BIC runs
> > +`OpenBIC `.
> > +
> > +Firmware images can be retrieved from the Github releases or built from the
> > +source code, see the README's for instructions on that. This image uses the
> > +"fby35" machine recipe from OpenBMC, and the "yv35-cl" target from OpenBIC.
> > +Some reference images can also be found here:
> > +
> > +.. code-block:: bash
> > +
> > +$ wget 
> > https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
> > +$ wget 
> > https://github.com/peterdelevoryas/OpenBIC/releases/download/oby35-cl-2022.13.01/Y35BCL.elf
> > +
> > +Since this machine has multiple SoC's, each with their own serial console, 
> > the
> > +recommended way to run it is to allocate a pseudoterminal for each serial
> > +console and let the monitor use stdio. Also, starting in a paused state is
> > +useful because it allows you to attach to the pseudoterminals before the 
> > boot
> > +process starts.
> > +
> > +.. code-block:: bash
> > +
> > +$ qemu-system-arm -machine fby35 \
> > +-drive file=fby35.mtd,format=raw,if=mtd \
> > +-device loader,file=Y35BCL.elf,addr=0,cpu-num=2 \
> > +-serial pty -serial pty -serial mon:stdio \
> > +-display none -S
> > +$ screen /dev/tty0 # In a separate TMUX pane, terminal window, etc.
> > +$ screen /dev/tty1
> > +$ (qemu) c  # Start the boot process once screen is set up.
> > +
> >   Supported devices
> >   -
> 



Re: [PATCH RESEND] python/machine: Fix AF_UNIX path too long on macOS

2022-07-06 Thread Daniel P . Berrangé
On Tue, Jul 05, 2022 at 02:46:59PM -0700, Peter Delevoryas wrote:
> I noticed that I can't run any avocado tests on macOS because the QMP
> unix socket path is too long:


> I think the path limit for unix sockets on macOS might be 104 [1]

All platforms have a very limited path limit, so it isn't really
a macOS specific problem, rather

> 
> /*
>  * [XSI] Definitions for UNIX IPC domain.
>  */
> struct  sockaddr_un {
> unsigned char   sun_len;/* sockaddr len including null */
> sa_family_t sun_family; /* [XSI] AF_UNIX */
> charsun_path[104];  /* [XSI] path name (gag) */
> };
> 
> The path we're using is exactly 105 characters:
> 
> $ python
> Python 2.7.10 (default, Jan 19 2016, 22:24:01)
> [GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> len('/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/avo_qemu_sock_uh3w_dgc/qemu-37331-10bacf110-monitor.sock')
> 105

It is a problem related to where the test suite is creating the
paths.

/var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/avo_qemu_sock_uh3w_dgc/

is way too deep a directory location.
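
A quick way to check the constraint (a standalone sketch; the 104 comes
from the macOS struct quoted above, while Linux uses 108):

    /* Does a candidate socket path fit in sun_path (incl. NUL)? */
    #include <stdio.h>
    #include <string.h>
    #include <sys/un.h>

    int main(void)
    {
        const char *path = "/tmp/qemu-37331-monitor.sock";  /* example */
        struct sockaddr_un sa;

        if (strlen(path) >= sizeof(sa.sun_path)) {
            printf("too long: %zu >= %zu\n",
                   strlen(path), sizeof(sa.sun_path));
            return 1;
        }
        printf("fits: %zu < %zu\n", strlen(path), sizeof(sa.sun_path));
        return 0;
    }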

It seems we just create this location using 'tempfile.TemporaryDirectory'
to get a standard tmp dir.

Do you know why python is choosing

  /var/folders/d7/rz20f6hd709c1ty8f6_6y_z4gn/T/

as the temp dir ? Is that a standard location on macOS or is it
from some env variable you have set ?

> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index 37191f433b..93451774e3 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -157,7 +157,7 @@ def __init__(self,
>  self._wrapper = wrapper
>  self._qmp_timer = qmp_timer
>  
> -self._name = name or f"qemu-{os.getpid()}-{id(self):02x}"
> +self._name = name or f"{os.getpid()}{id(self):02x}"

I don't think this is the right fix really, because IMHO the problem
is the hugely long path, rather than the final socket name.

That said, there is redundancy in the path - avocado is passing in
a directory created using 'tempfile.TemporaryDirectory' so there is no
reason why we need to add more entropy via the PID and the 'id(self)'
hex string.

IMHO avocado should pass in the 'name' parameter explicitly, using a
plain name and thus get a shorter string.

>  self._temp_dir: Optional[str] = None
>  self._base_temp_dir = base_temp_dir
>  self._sock_dir = sock_dir
> @@ -167,7 +167,7 @@ def __init__(self,
>  self._monitor_address = monitor_address
>  else:
>  self._monitor_address = os.path.join(
> -self.sock_dir, f"{self._name}-monitor.sock"
> +self.sock_dir, f"{self._name}.sock"
>  )
>  
>  self._console_log_path = console_log
> -- 
> 2.37.0
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH] block/io_uring: clarify that short reads can happen

2022-07-06 Thread Stefan Hajnoczi
Jens Axboe has confirmed that short reads are rare but can happen:
https://lore.kernel.org/io-uring/YsU%2FCGkl9ZXUI+Tj@stefanha-x1.localdomain/T/#m729963dc577d709b709c191922e98ec79d7eef54

The luring_resubmit_short_read() comment claimed they were only due to a
specific io_uring bug that was fixed in Linux commit 9d93a3f5a0c
("io_uring: punt short reads to async context"), which is wrong.
Dominique Martinet found that a btrfs bug also causes short reads. There
may be more kernel code paths that result in short reads.

Let's consider short reads fair game.

Cc: Dominique Martinet 
Based-on: <20220630010137.2518851-1-dominique.marti...@atmark-techno.com>
Signed-off-by: Stefan Hajnoczi 
---
 block/io_uring.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index b238661740..f8a19fd97f 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -73,12 +73,8 @@ static void luring_resubmit(LuringState *s, LuringAIOCB *luringcb)
 /**
  * luring_resubmit_short_read:
  *
- * Before Linux commit 9d93a3f5a0c ("io_uring: punt short reads to async
- * context") a buffered I/O request with the start of the file range in the
- * page cache could result in a short read.  Applications need to resubmit the
- * remaining read request.
- *
- * This is a slow path but recent kernels never take it.
+ * Short reads are rare but may occur. The remaining read request needs to be
+ * resubmitted.
  */
 static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
int nread)
-- 
2.36.1




Re: [PATCH v2 9/9] docs: aspeed: Add fby35 multi-SoC machine section

2022-07-06 Thread Joel Stanley
On Tue, 5 Jul 2022 at 19:14, Peter Delevoryas  wrote:
>
> Signed-off-by: Peter Delevoryas 

Reviewed-by: Joel Stanley 

> ---
>  docs/system/arm/aspeed.rst | 48 ++
>  1 file changed, 48 insertions(+)
>
> diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
> index 5d0a7865d3..b233191b67 100644
> --- a/docs/system/arm/aspeed.rst
> +++ b/docs/system/arm/aspeed.rst
> @@ -136,6 +136,54 @@ AST1030 SoC based machines :
>
>  - ``ast1030-evb``  Aspeed AST1030 Evaluation board (Cortex-M4F)
>
> +Facebook Yosemite v3.5 Platform and CraterLake Server (``fby35``)
> +==================================================================
> +
> +Facebook has a series of multi-node compute server designs named
> +Yosemite. The most recent version released was
> +`Yosemite v3 
> `.
> +
> +Yosemite v3.5 is an iteration on this design, and is very similar: there's a
> +baseboard with a BMC, and 4 server slots. The new server board design termed
> +"CraterLake" includes a Bridge IC (BIC), with room for expansion boards to
> +include various compute accelerators (video, inferencing, etc). At the 
> moment,
> +only the first server slot's BIC is included.
> +
> +Yosemite v3.5 is itself a sled which fits into a 40U chassis, and 3 sleds
> +can be fit into a chassis. See `here 
> `
> +for an example.
> +
> +In this generation, the BMC is an AST2600 and each BIC is an AST1030. The BMC
> +runs `OpenBMC `, and the BIC runs
> +`OpenBIC `.
> +
> +Firmware images can be retrieved from the Github releases or built from the
> +source code, see the README's for instructions on that. This image uses the
> +"fby35" machine recipe from OpenBMC, and the "yv35-cl" target from OpenBIC.
> +Some reference images can also be found here:
> +
> +.. code-block:: bash
> +
> +$ wget 
> https://github.com/facebook/openbmc/releases/download/openbmc-e2294ff5d31d/fby35.mtd
> +$ wget 
> https://github.com/peterdelevoryas/OpenBIC/releases/download/oby35-cl-2022.13.01/Y35BCL.elf
> +
> +Since this machine has multiple SoC's, each with their own serial console, 
> the
> +recommended way to run it is to allocate a pseudoterminal for each serial
> +console and let the monitor use stdio. Also, starting in a paused state is
> +useful because it allows you to attach to the pseudoterminals before the boot
> +process starts.
> +
> +.. code-block:: bash
> +
> +$ qemu-system-arm -machine fby35 \
> +-drive file=fby35.mtd,format=raw,if=mtd \
> +-device loader,file=Y35BCL.elf,addr=0,cpu-num=2 \

I came across a quirk of the qemu commandline when testing.

-drive knows how to expand ~ in a path, but -device loader does not.
Something for someone to look into on a rainy day!

eg:

$ build/qemu-system-arm -M fby35 -drive
file=~/tmp/fby35.mtd,format=raw,if=mtd -device
loader,file=~/tmp/Y35BCL.elf,addr=0,cpu-num=2 -serial pty -serial pty
-serial mon:stdio -display none -S
char device redirected to /dev/pts/3 (label serial0)
char device redirected to /dev/pts/5 (label serial1)
qemu-system-arm: warning: Aspeed iBT has no chardev backend
~/tmp/Y35BCL.elf: No such file or directory
qemu-system-arm: -device
loader,file=~/tmp/Y35BCL.elf,addr=0,cpu-num=2: Cannot load specified
image ~/tmp/Y35BCL.elf

loader uses open(2) in load_elf_ram_sym.

The call stack for -drive looks like this (using a bad path to make it
easier to identify what's going on):

#0  __libc_open64 (file=file@entry=0xac009250
"/home/joel/tmp/fby35.mtda", oflag=oflag@entry=524288) at
../sysdeps/unix/sysv/linux/open64.c:37
#1  0xab3b18bc in open64 (__oflag=524288,
__path=0xac009250 "/home/joel/tmp/fby35.mtda") at
/usr/include/aarch64-linux-gnu/bits/fcntl2.h:59
#2  qemu_open_cloexec (mode=0, flags=0, name=0xac009250
"/home/joel/tmp/fby35.mtda") at ../util/osdep.c:286
#3  qemu_open_internal (name=name@entry=0xac009250
"/home/joel/tmp/fby35.mtda", flags=flags@entry=0, mode=mode@entry=0,
errp=errp@entry=0xeb70) at ../util/osdep.c:330
#4  0xab3b1f30 in qemu_open (name=name@entry=0xac009250
"/home/joel/tmp/fby35.mtda", flags=flags@entry=0,
errp=errp@entry=0xeb70) at ../util/osdep.c:360
#5  0xab30d9d8 in raw_open_common (bs=0xac002c40,
options=, bdrv_flags=155650, open_flags=, device=, errp=0xeb70)
at ../block/file-posix.c:680
#6  0xab2a53d0 in bdrv_open_driver
(bs=bs@entry=0xac002c40, drv=drv@entry=0xabc13250 ,
node_name=, options=options@entry=0xac008230,
open_flags=open_flags@entry=155650,
errp=errp@entry=0xec18) at ../block.c:1625
#7  0xab2a9454 in bdrv_open_common (errp=0xec18,
options=0xac008230, file=0x0, bs=0xac002c40) at
../block.c:1922
#8  bdrv_open_inherit (filename=,
filen

[PATCH v7 01/14] mm: Add F_SEAL_AUTO_ALLOCATE seal to memfd

2022-07-06 Thread Chao Peng
Normally, a write to unallocated space of a file or to the hole of a
sparse file automatically causes space allocation; for memfd, this
equates to memory allocation. This new seal prevents such automatic
allocation, whether it comes from a direct write() or from a write to a
previously mmap-ed area. The seal does not prevent fallocate(), so an
explicit fallocate() can still allocate and can be used to reserve
memory.

This is used to prevent unintentional allocation from userspace on a
stray or careless write; any intentional allocation should use an
explicit fallocate(). One of the main use cases is to avoid double
memory allocation for confidential computing, where we use two memfds
to back guest memory and at any given time only one memfd is live, and
we want to prevent memory allocation through the other memfd, which may
have been mmap-ed previously. More discussion can be found at:

  https://lkml.org/lkml/2022/6/14/1255
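
The intended usage, as a sketch (assumes a kernel carrying this series;
error handling elided):

    #define _GNU_SOURCE
    #include <fcntl.h>      /* fcntl(), F_ADD_SEALS, fallocate() */
    #include <sys/mman.h>   /* memfd_create(), MFD_ALLOW_SEALING */

    static int make_sealed_guest_memfd(size_t size)
    {
        int fd = memfd_create("guest-mem", MFD_ALLOW_SEALING);

        /* An explicit fallocate() is still allowed and reserves memory... */
        fallocate(fd, 0, 0, size);
        /* ...but after sealing, a stray write into a hole fails instead
         * of allocating pages behind the user's back. */
        fcntl(fd, F_ADD_SEALS, F_SEAL_AUTO_ALLOCATE);
        return fd;
    }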

Suggested-by: Sean Christopherson 
Signed-off-by: Chao Peng 
---
 include/uapi/linux/fcntl.h |  1 +
 mm/memfd.c |  3 ++-
 mm/shmem.c | 16 ++--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 2f86b2ad6d7e..98bdabc8e309 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -43,6 +43,7 @@
 #define F_SEAL_GROW0x0004  /* prevent file from growing */
 #define F_SEAL_WRITE   0x0008  /* prevent writes */
 #define F_SEAL_FUTURE_WRITE0x0010  /* prevent future writes while mapped */
+#define F_SEAL_AUTO_ALLOCATE   0x0020  /* prevent allocation for writes */
 /* (1U << 31) is reserved for signed error codes */
 
 /*
diff --git a/mm/memfd.c b/mm/memfd.c
index 08f5f8304746..2afd898798e4 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
 F_SEAL_SHRINK | \
 F_SEAL_GROW | \
 F_SEAL_WRITE | \
-F_SEAL_FUTURE_WRITE)
+F_SEAL_FUTURE_WRITE | \
+F_SEAL_AUTO_ALLOCATE)
 
 static int memfd_add_seals(struct file *file, unsigned int seals)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index a6f565308133..6c8aef15a17d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2051,6 +2051,8 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct inode *inode = file_inode(vma->vm_file);
gfp_t gfp = mapping_gfp_mask(inode->i_mapping);
+   struct shmem_inode_info *info = SHMEM_I(inode);
+   enum sgp_type sgp;
int err;
vm_fault_t ret = VM_FAULT_LOCKED;
 
@@ -2113,7 +2115,12 @@ static vm_fault_t shmem_fault(struct vm_fault *vmf)
spin_unlock(&inode->i_lock);
}
 
-   err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, SGP_CACHE,
+   if (unlikely(info->seals & F_SEAL_AUTO_ALLOCATE))
+   sgp = SGP_NOALLOC;
+   else
+   sgp = SGP_CACHE;
+
+   err = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, sgp,
  gfp, vma, vmf, &ret);
if (err)
return vmf_error(err);
@@ -2459,6 +2466,7 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
struct inode *inode = mapping->host;
struct shmem_inode_info *info = SHMEM_I(inode);
pgoff_t index = pos >> PAGE_SHIFT;
+   enum sgp_type sgp;
int ret = 0;
 
/* i_rwsem is held by caller */
@@ -2470,7 +2478,11 @@ shmem_write_begin(struct file *file, struct address_space *mapping,
return -EPERM;
}
 
-   ret = shmem_getpage(inode, index, pagep, SGP_WRITE);
+   if (unlikely(info->seals & F_SEAL_AUTO_ALLOCATE))
+   sgp = SGP_NOALLOC;
+   else
+   sgp = SGP_WRITE;
+   ret = shmem_getpage(inode, index, pagep, sgp);
 
if (ret)
return ret;
-- 
2.25.1




[PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-07-06 Thread Chao Peng
This is v7 of this series, which implements fd-based KVM guest private
memory. The patches are based on the latest kvm/queue branch commit:

  b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
split_desc_cache only by default capacity

Introduction

In general this patch series introduces fd-based memslots which provide
guest memory through a memory file descriptor fd[offset,size] instead
of hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc., which we refer to as the memory backing
store. KVM and the memory backing store exchange callbacks when such a
memslot gets created. At runtime KVM will call into callbacks provided
by the backing store to get the pfn with the fd+offset. The memory
backing store will also call into KVM callbacks when userspace punches
a hole in the fd to notify KVM to unmap secondary MMU page table
entries.

Compared to the existing hva-based memslots, this new type of memslot
allows guest memory to be unmapped from host userspace like QEMU and
even the kernel itself, therefore reducing the attack surface and
preventing bugs.

Based on this fd-based memslot, we can build guest private memory that
is going to be used in confidential computing environments such as Intel
TDX and AMD SEV. When supported, the memory backing store can provide
more enforcement on the fd and KVM can use a single memslot to hold both
the private and shared part of the guest memory. 

mm extension
-
Introduces a new MFD_INACCESSIBLE flag for memfd_create(); a file
created with this flag cannot be read(), written or mmap()-ed via
normal MMU operations. The file content can only be used through the
newly introduced memfile_notifier extension.

The memfile_notifier extension provides two sets of callbacks for KVM to
interact with the memory backing store:
  - memfile_notifier_ops: callbacks for memory backing store to notify
KVM when memory gets invalidated.
  - backing store callbacks: callbacks for KVM to call into memory
backing store to request memory pages for guest private memory.

The memfile_notifier extension also provides APIs for memory backing
store to register/unregister itself and to trigger the notifier when the
bookmarked memory gets invalidated.
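
In outline, the two callback sets could be pictured as below; the exact
names and signatures live in the patches themselves, so treat these
shapes as assumptions inferred from the description above:

    struct memfile_notifier;

    struct memfile_notifier_ops {
        /* backing store -> KVM: a range was invalidated (hole punch) */
        void (*invalidate)(struct memfile_notifier *notifier,
                           pgoff_t start, pgoff_t end);
    };

    struct memfile_backing_store {
        /* KVM -> backing store: resolve fd+offset to a page for guest use */
        long (*get_pfn)(struct inode *inode, pgoff_t offset, int *order);
        void (*put_pfn)(long pfn);
    };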

The patchset also introduces a new memfd seal F_SEAL_AUTO_ALLOCATE to
prevent double allocation caused by an unintentional guest access when
only one side of the shared/private memfd pair is effective.

memslot extension
-
Add the private fd and the fd offset to existing 'shared' memslot so
that both private/shared guest memory can live in one single memslot.
A page in the memslot is either private or shared. Whether a guest page
is private or shared is maintained through reusing existing SEV ioctls
KVM_MEMORY_ENCRYPT_{UN,}REG_REGION.
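
Concretely, the extended memslot could be pictured like this (field
names are assumptions based on the description above, not the final
uAPI):

    struct kvm_userspace_memory_region_ext {
        struct kvm_userspace_memory_region region; /* existing hva part */
        __u64 private_offset;  /* offset into the private memfd */
        __u32 private_fd;      /* fd of the private-memory backing store */
        __u32 pad1;
        __u64 pad2[14];
    };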

Test

To test the new functionality of this patchset, the TDX patchset is
needed. Since the TDX patchset has not been merged, I did two kinds of
tests:

-  Regression test on kvm/queue (this patchset)
   Most new code is not covered. Code also in below repo:
   https://github.com/chao-p/linux/tree/privmem-v7

-  New functional test on latest TDX code
   The patches were rebased onto the latest TDX code and the new
   functionality was tested. See below repos:
   Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx
   QEMU: https://github.com/chao-p/qemu/tree/privmem-v7

An example QEMU command line for TDX test:
-object tdx-guest,id=tdx,debug=off,sept-ve-disable=off \
-machine confidential-guest-support=tdx \
-object memory-backend-memfd-private,id=ram1,size=${mem} \
-machine memory-backend=ram1

Changelog
--
v7:
  - Move the private/shared info from backing store to KVM.
  - Introduce F_SEAL_AUTO_ALLOCATE to avoid double allocation.
  - Rework on the sync mechanism between zap/page fault paths.
  - Addressed other comments in v6.
v6:
  - Re-organized patches for both mm/KVM parts.
  - Added flags for memfile_notifier so its consumers can state their
features and memory backing store can check against these flags.
  - Put a backing store reference in the memfile_notifier and move pfn_ops
into backing store.
  - Only support boot time backing store register.
  - Overall KVM part improvement suggested by Sean and some others.
v5:
  - Removed userspace visible F_SEAL_INACCESSIBLE, instead using an
in-kernel flag (SHM_F_INACCESSIBLE for shmem). Private fd can only
be created by MFD_INACCESSIBLE.
  - Introduced new APIs for backing store to register itself to
memfile_notifier instead of direct function call.
  - Added the accounting and restriction for MFD_INACCESSIBLE memory.
  - Added KVM API doc for new memslot extensions and man page for the new
MFD_INACCESSIBLE flag.
  - Removed the overlap check for mapping the same file+offset into
multiple gfns due to perf consideration, warned in document.
  - Addressed other comments in v4.
v4:
  - Decoupled the callbacks between KVM/mm from memfd and use new
name 'memfile_notifier'.
  

[PATCH v5 01/45] target/arm: Handle SME in aarch64_cpu_dump_state

2022-07-06 Thread Richard Henderson
Dump SVCR, plus use the correct access check for Streaming Mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index ae6dca2f01..9c58be8b14 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -878,6 +878,7 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 int i;
 int el = arm_current_el(env);
 const char *ns_status;
+bool sve;
 
 qemu_fprintf(f, " PC=%016" PRIx64 " ", env->pc);
 for (i = 0; i < 32; i++) {
@@ -904,6 +905,12 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
  el,
  psr & PSTATE_SP ? 'h' : 't');
 
+if (cpu_isar_feature(aa64_sme, cpu)) {
+qemu_fprintf(f, "  SVCR=%08" PRIx64 " %c%c",
+ env->svcr,
+ (FIELD_EX64(env->svcr, SVCR, ZA) ? 'Z' : '-'),
+ (FIELD_EX64(env->svcr, SVCR, SM) ? 'S' : '-'));
+}
 if (cpu_isar_feature(aa64_bti, cpu)) {
 qemu_fprintf(f, "  BTYPE=%d", (psr & PSTATE_BTYPE) >> 10);
 }
@@ -918,7 +925,15 @@ static void aarch64_cpu_dump_state(CPUState *cs, FILE *f, int flags)
 qemu_fprintf(f, " FPCR=%08x FPSR=%08x\n",
  vfp_get_fpcr(env), vfp_get_fpsr(env));
 
-if (cpu_isar_feature(aa64_sve, cpu) && sve_exception_el(env, el) == 0) {
+if (cpu_isar_feature(aa64_sme, cpu) && FIELD_EX64(env->svcr, SVCR, SM)) {
+sve = sme_exception_el(env, el) == 0;
+} else if (cpu_isar_feature(aa64_sve, cpu)) {
+sve = sve_exception_el(env, el) == 0;
+} else {
+sve = false;
+}
+
+if (sve) {
 int j, zcr_len = sve_vqm1_for_el(env, el);
 
 for (i = 0; i <= FFR_PRED_NUM; i++) {
-- 
2.34.1




Re: [PATCH v8 06/20] job.h: define functions called without job lock held

2022-07-06 Thread Emanuele Giuseppe Esposito



Am 05/07/2022 um 12:54 schrieb Vladimir Sementsov-Ogievskiy:
> On the subject: hmm, the commit doesn't define any function..
> 
mark functions called without job lock held?




[PATCH v5 03/45] target/arm: Trap non-streaming usage when Streaming SVE is active

2022-07-06 Thread Richard Henderson
This new behaviour is in the ARM pseudocode function
AArch64.CheckFPAdvSIMDEnabled, which applies to AArch32
via AArch32.CheckAdvSIMDOrFPEnabled when the EL to which
the trap would be delivered is in AArch64 mode.

Given that ARMv9 drops support for AArch32 outside EL0, the trap EL
detection ought to be trivially true, but the pseudocode still contains
a number of conditions, and QEMU has not yet committed to dropping A32
support for EL[12] when v9 features are present.

Since the computation of SME_TRAP_NONSTREAMING is necessarily different
for the two modes, we might as well preserve bits within TBFLAG_ANY and
allocate separate bits within TBFLAG_A32 and TBFLAG_A64 instead.

Note that DDI0616A.a has typos for bits [22:21] of LD1RO in the table
of instructions illegal in streaming mode.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h   |  7 +++
 target/arm/translate.h |  4 ++
 target/arm/sme-fa64.decode | 90 ++
 target/arm/helper.c| 41 +
 target/arm/translate-a64.c | 40 -
 target/arm/translate-vfp.c | 12 +
 target/arm/translate.c |  2 +
 target/arm/meson.build |  1 +
 8 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 target/arm/sme-fa64.decode

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 4a4342f262..9e12669c12 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3146,6 +3146,11 @@ FIELD(TBFLAG_A32, HSTR_ACTIVE, 9, 1)
  * the same thing as the current security state of the processor!
  */
 FIELD(TBFLAG_A32, NS, 10, 1)
+/*
+ * Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not.
+ * This requires an SME trap from AArch32 mode when using NEON.
+ */
+FIELD(TBFLAG_A32, SME_TRAP_NONSTREAMING, 11, 1)
 
 /*
  * Bit usage when in AArch32 state, for M-profile only.
@@ -3183,6 +3188,8 @@ FIELD(TBFLAG_A64, SMEEXC_EL, 20, 2)
 FIELD(TBFLAG_A64, PSTATE_SM, 22, 1)
 FIELD(TBFLAG_A64, PSTATE_ZA, 23, 1)
 FIELD(TBFLAG_A64, SVL, 24, 4)
+/* Indicates that SME Streaming mode is active, and SMCR_ELx.FA64 is not. */
+FIELD(TBFLAG_A64, SME_TRAP_NONSTREAMING, 28, 1)
 
 /*
  * Helpers for using the above.
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 22fd882368..cbc907c751 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -102,6 +102,10 @@ typedef struct DisasContext {
 bool pstate_sm;
 /* True if PSTATE.ZA is set. */
 bool pstate_za;
+/* True if non-streaming insns should raise an SME Streaming exception. */
+bool sme_trap_nonstreaming;
+/* True if the current instruction is non-streaming. */
+bool is_nonstreaming;
 /* True if MVE insns are definitely not predicated by VPR or LTPSIZE */
 bool mve_no_pred;
 /*
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
new file mode 100644
index 00..3d90837fc7
--- /dev/null
+++ b/target/arm/sme-fa64.decode
@@ -0,0 +1,90 @@
+# AArch64 SME allowed instruction decoding
+#
+#  Copyright (c) 2022 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see .
+
+#
+# This file is processed by scripts/decodetree.py
+#
+
+# These patterns are taken from Appendix E1.1 of DDI0616 A.a,
+# Arm Architecture Reference Manual Supplement,
+# The Scalable Matrix Extension (SME), for Armv9-A
+
+{
+  [
+OK  0-00 1110  0001 0010 11--     # SMOV W|Xd,Vn.B[0]
+OK  0-00 1110  0010 0010 11--     # SMOV W|Xd,Vn.H[0]
+OK  0100 1110  0100 0010 11--     # SMOV Xd,Vn.S[0]
+OK   1110  0001 0011 11--     # UMOV Wd,Vn.B[0]
+OK   1110  0010 0011 11--     # UMOV Wd,Vn.H[0]
+OK   1110  0100 0011 11--     # UMOV Wd,Vn.S[0]
+OK  0100 1110  1000 0011 11--     # UMOV Xd,Vn.D[0]
+  ]
+  FAIL  0--0 111-         # Advanced SIMD vector operations
+}
+
+{
+  [
+OK  0101 1110 --1-  11-1 11--     # FMULX/FRECPS/FRSQRTS (scalar)
+OK  0101 1110 -10-  00-1 11--     # FMULX/FRECPS/FRSQRTS (scalar, FP16)
+OK  01-1 1110 1-10 0001 11-1 10--     # FRECPE/FRSQRTE/FRECPX (scalar)
+OK  01-1 1110  1001 11-1 10--     # FRECPE/FRSQRTE/FRECPX (scalar, FP16)
+  ]
+  FAIL  01-1 111-         # Advanced SIMD single-element operations
+}
+
+FAIL0-0

Re: [PATCH v8 06/20] job.h: define functions called without job lock held

2022-07-06 Thread Emanuele Giuseppe Esposito



Am 05/07/2022 um 12:53 schrieb Vladimir Sementsov-Ogievskiy:
> On 6/29/22 17:15, Emanuele Giuseppe Esposito wrote:
>> These functions don't need a _locked() counterpart, since
>> they are all called outside job.c and take the lock only
>> internally.
>>
>> Update also the comments in blockjob.c (and move them in job.c).
> 
> Still, that would be better as a separate patch.
> 
>>
>> Note: at this stage, job_{lock/unlock} and job lock guard macros
>> are *nop*.
>>
>> No functional change intended.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito 
>> ---
>>   blockjob.c | 20 
>>   include/qemu/job.h | 37 ++---
>>   job.c  | 15 +++
>>   3 files changed, 49 insertions(+), 23 deletions(-)
>>
>> diff --git a/blockjob.c b/blockjob.c
>> index 4868453d74..7da59a1f1c 100644
>> --- a/blockjob.c
>> +++ b/blockjob.c
>> @@ -36,21 +36,6 @@
>>   #include "qemu/main-loop.h"
>>   #include "qemu/timer.h"
>>   -/*
>> - * The block job API is composed of two categories of functions.
>> - *
>> - * The first includes functions used by the monitor.  The monitor is
>> - * peculiar in that it accesses the block job list with
>> block_job_get, and
>> - * therefore needs consistency across block_job_get and the actual
>> operation
>> - * (e.g. block_job_set_speed).  The consistency is achieved with
>> - * aio_context_acquire/release.  These functions are declared in
>> blockjob.h.
>> - *
>> - * The second includes functions used by the block job drivers and
>> sometimes
>> - * by the core block layer.  These do not care about locking, because
>> the
>> - * whole coroutine runs under the AioContext lock, and are declared in
>> - * blockjob_int.h.
>> - */
>> -
>>   static bool is_block_job(Job *job)
>>   {
>>   return job_type(job) == JOB_TYPE_BACKUP ||
>> @@ -433,11 +418,6 @@ static void block_job_event_ready(Notifier *n,
>> void *opaque)
>>   }
>>     -/*
>> - * API for block job drivers and the block layer.  These functions are
>> - * declared in blockjob_int.h.
>> - */
>> -
>>   void *block_job_create(const char *job_id, const BlockJobDriver
>> *driver,
>>  JobTxn *txn, BlockDriverState *bs, uint64_t
>> perm,
>>  uint64_t shared_perm, int64_t speed, int flags,
>> diff --git a/include/qemu/job.h b/include/qemu/job.h
>> index 99960cc9a3..b714236c1a 100644
>> --- a/include/qemu/job.h
>> +++ b/include/qemu/job.h
>> @@ -363,6 +363,7 @@ void job_txn_unref_locked(JobTxn *txn);
>>     /**
>>    * Create a new long-running job and return it.
>> + * Called with job_mutex *not* held.
>>    *
>>    * @job_id: The id of the newly-created job, or %NULL for internal jobs
>>    * @driver: The class object for the newly-created job.
>> @@ -400,6 +401,8 @@ void job_unref_locked(Job *job);
>>    * @done: How much progress the job made since the last call
>>    *
>>    * Updates the progress counter of the job.
>> + *
>> + * Progress API is thread safe.
> 
> This tells the function user nothing. Eventually the whole job_ API will be
> thread safe, won't it?
> 
> I think here we need simply "called with mutex not held". (Or even "may
> be called with mutex held or not held" if we need it, or just nothing)
> 
> and the note about the progress API should be somewhere in job.c, as
> that's an implementation detail.

What about "Progress API is thread safe. Can be called with job mutex
held or not"?

> 
[...]
> 
> I'd merge all the new comments in job.h into the previous commit, as they
> are related to the questions raised by it.

I disagree; I think it will be a mess of functions again if we mix the
ones that don't need the lock held with the ones that need it.

You understand it because you got the logic of this series, but others
may not.
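
For readers following along, here is a minimal sketch of the monitor-side
pattern being described; JOB_LOCK_GUARD() and the _locked names come from
this series, but the exact signatures may differ in the final version:

#include "qemu/osdep.h"
#include "qemu/job.h"
#include "qapi/error.h"

/* Sketch only: a monitor-style caller holds job_mutex across the whole
 * lookup + operation, using the _locked variants throughout. */
static void monitor_cancel_job(const char *id, Error **errp)
{
    JOB_LOCK_GUARD();                   /* take job_mutex for the whole op */
    Job *job = job_get_locked(id);      /* list lookup under the lock */

    if (!job) {
        error_setg(errp, "Job not found: %s", id);
        return;
    }
    job_user_cancel_locked(job, true, errp);    /* operation, still locked */
}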

> 
> 
>>   void job_cancel_sync_all(void);
>>     /**
>> diff --git a/job.c b/job.c
>> index dd44fac8dd..7a3cc93f66 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -32,12 +32,27 @@
>>   #include "trace/trace-root.h"
>>   #include "qapi/qapi-events-job.h"
>>   +/*
>> + * The job API is composed of two categories of functions.
>> + *
>> + * The first includes functions used by the monitor.  The monitor is
>> + * peculiar in that it accesses the block job list with job_get, and
>> + * therefore needs consistency across job_get and the actual operation
>> + * (e.g. job_user_cancel). To achieve this consistency, the caller
>> + * calls job_lock/job_unlock itself around the whole operation.
>> + *
>> + *
>> + * The second includes functions used by the block job drivers and
>> sometimes
>> + * by the core block layer. These delegate the locking to the callee
>> instead.
>> + */
>> +
>>   /*
>>    * job_mutex protects the jobs list, but also makes the
>>    * struct job fields thread-safe.
>>    */
>>   QemuMutex job_mutex;
>>   +/* Protected by job_mutex */
>>   static QLIST_HEAD(, Job) jobs = QLIST_HEAD_INITIALIZER(jobs);
>>     /* Job State Transition Table */
> 
> 
> So the logic is: the function that doesn't ha

[PATCH v7 02/14] selftests/memfd: Add tests for F_SEAL_AUTO_ALLOCATE

2022-07-06 Thread Chao Peng
Add tests to verify that sealing memfds with F_SEAL_AUTO_ALLOCATE works
as expected.
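
For context, the seal under test is applied through fcntl(F_ADD_SEALS).
A minimal userspace sketch follows; the F_SEAL_AUTO_ALLOCATE fallback
value is an assumption taken from an earlier patch in this series:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef F_SEAL_AUTO_ALLOCATE
#define F_SEAL_AUTO_ALLOCATE 0x0020     /* assumed value, see patch 01 */
#endif

int main(void)
{
    int fd = memfd_create("demo", MFD_CLOEXEC | MFD_ALLOW_SEALING);

    if (fd < 0 || fcntl(fd, F_ADD_SEALS, F_SEAL_AUTO_ALLOCATE) < 0) {
        perror("memfd/seal");
        return 1;
    }
    /* From here on, writes that would allocate a new page are expected to
     * fail (write() returns an error, mmaped stores raise SIGBUS), which
     * is what the tests below verify. */
    return 0;
}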

Signed-off-by: Chao Peng 
---
 tools/testing/selftests/memfd/memfd_test.c | 166 +
 1 file changed, 166 insertions(+)

diff --git a/tools/testing/selftests/memfd/memfd_test.c 
b/tools/testing/selftests/memfd/memfd_test.c
index 94df2692e6e4..b849ece295fd 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -232,6 +233,31 @@ static void mfd_fail_open(int fd, int flags, mode_t mode)
}
 }
 
+static void mfd_assert_fallocate(int fd)
+{
+   int r;
+
+   r = fallocate(fd, 0, 0, mfd_def_size);
+   if (r < 0) {
+   printf("fallocate(ALLOC) failed: %m\n");
+   abort();
+   }
+}
+
+static void mfd_assert_punch_hole(int fd)
+{
+   int r;
+
+   r = fallocate(fd,
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ 0,
+ mfd_def_size);
+   if (r < 0) {
+   printf("fallocate(PUNCH_HOLE) failed: %m\n");
+   abort();
+   }
+}
+
 static void mfd_assert_read(int fd)
 {
char buf[16];
@@ -594,6 +620,94 @@ static void mfd_fail_grow_write(int fd)
}
 }
 
+static void mfd_assert_hole_write(int fd)
+{
+   ssize_t l;
+   void *p;
+   char *p1;
+
+   /*
+* hugetlbfs does not support write, but we want to
+* verify everything else here.
+*/
+   if (!hugetlbfs_test) {
+   /* verify direct write() succeeds */
+   l = write(fd, "\0\0\0\0", 4);
+   if (l != 4) {
+   printf("write() failed: %m\n");
+   abort();
+   }
+   }
+
+   /* verify mmaped write succeeds */
+   p = mmap(NULL,
+mfd_def_size,
+PROT_READ | PROT_WRITE,
+MAP_SHARED,
+fd,
+0);
+   if (p == MAP_FAILED) {
+   printf("mmap() failed: %m\n");
+   abort();
+   }
+   p1 = (char *)p + mfd_def_size - 1;
+   *p1 = 'H';
+   if (*p1 != 'H') {
+   printf("mmaped write failed: %m\n");
+   abort();
+
+   }
+   munmap(p, mfd_def_size);
+}
+
+sigjmp_buf jbuf, *sigbuf;
+static void sig_handler(int sig, siginfo_t *siginfo, void *ptr)
+{
+   if (sig == SIGBUS) {
+   if (sigbuf)
+   siglongjmp(*sigbuf, 1);
+   abort();
+   }
+}
+
+static void mfd_fail_hole_write(int fd)
+{
+   ssize_t l;
+   void *p;
+   char *p1;
+
+   /* verify direct write() fails */
+   l = write(fd, "data", 4);
+   if (l > 0) {
+   printf("expected failure on write(), but got %d: %m\n", (int)l);
+   abort();
+   }
+
+   /* verify mmaped write fails */
+   p = mmap(NULL,
+mfd_def_size,
+PROT_READ | PROT_WRITE,
+MAP_SHARED,
+fd,
+0);
+   if (p == MAP_FAILED) {
+   printf("mmap() failed: %m\n");
+   abort();
+   }
+
+   sigbuf = &jbuf;
+   if (sigsetjmp(*sigbuf, 1))
+   goto out;
+
+   /* The write below should trigger a SIGBUS signal */
+   p1 = (char *)p + mfd_def_size - 1;
+   *p1 = 'H';
+   printf("failed to receive SIGBUS for mmaped write: %m\n");
+   abort();
+out:
+   munmap(p, mfd_def_size);
+}
+
 static int idle_thread_fn(void *arg)
 {
sigset_t set;
@@ -880,6 +994,57 @@ static void test_seal_resize(void)
close(fd);
 }
 
+/*
+ * Test F_SEAL_AUTO_ALLOCATE
+ * Test whether F_SEAL_AUTO_ALLOCATE actually prevents allocation.
+ */
+static void test_seal_auto_allocate(void)
+{
+   struct sigaction act;
+   int fd;
+
+   printf("%s SEAL-AUTO-ALLOCATE\n", memfd_str);
+
+   memset(&act, 0, sizeof(act));
+   act.sa_sigaction = sig_handler;
+   act.sa_flags = SA_SIGINFO;
+   if (sigaction(SIGBUS, &act, 0)) {
+   printf("sigaction() failed: %m\n");
+   abort();
+   }
+
+   fd = mfd_assert_new("kern_memfd_seal_auto_allocate",
+   mfd_def_size,
+   MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+   /* read/write should pass if F_SEAL_AUTO_ALLOCATE not set */
+   mfd_assert_read(fd);
+   mfd_assert_hole_write(fd);
+
+   mfd_assert_has_seals(fd, 0);
+   mfd_assert_add_seals(fd, F_SEAL_AUTO_ALLOCATE);
+   mfd_assert_has_seals(fd, F_SEAL_AUTO_ALLOCATE);
+
+   /* read/write should pass for pre-allocated area */
+   mfd_assert_read(fd);
+   mfd_assert_hole_write(fd);
+
+   mfd_assert_punch_hole(fd);
+
+   /* read should pass, write should fail in hole */
+   mfd_assert_read(fd);
+   mfd_fail_hole_write(fd);
+
+   mfd_

[PATCH v7 03/14] mm: Introduce memfile_notifier

2022-07-06 Thread Chao Peng
This patch introduces the memfile_notifier facility so that existing memory
file subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages to a third
kernel component, allowing it to make use of memory bookmarked in the memory
file and to get notified when pages in the memory file become invalidated.

It will be used by KVM, which will take a file descriptor as the guest
memory backing store and use this memfile_notifier interface to interact
with memory file subsystems. In the future there might be other consumers
(e.g. VFIO with encrypted device memory).

It consists of the components below:
 - memfile_backing_store: Each supported memory file subsystem can be
   implemented as a memory backing store which bookmarks memory and
   provides callbacks for other kernel systems (memfile_notifier
   consumers) to interact with.
 - memfile_notifier: memfile_notifier consumers define callbacks and
   associate them with a file using memfile_register_notifier().
 - memfile_node: A memfile_node is associated with the file (inode) from
   the backing store and includes feature flags and a list of registered
   memfile_notifiers to notify.

In the KVM usage, userspace is in charge of the guest memory lifecycle: it
first allocates pages in the memory backing store, then passes the fd to
KVM, and lets KVM register the memory slot with the memory backing store
via memfile_register_notifier.
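
To make the registration flow concrete, here is a minimal consumer sketch
written against the API declared below; my_invalidate, my_ops, my_notifier
and my_attach are illustrative names, not part of this patch:

#include <linux/memfile_notifier.h>

/* Hypothetical consumer: drop cached state when pages are invalidated. */
static void my_invalidate(struct memfile_notifier *notifier,
                          pgoff_t start, pgoff_t end)
{
        /* Tear down anything derived from [start, end) of this file. */
}

static struct memfile_notifier_ops my_ops = {
        .invalidate = my_invalidate,
};

static struct memfile_notifier my_notifier = {
        .ops = &my_ops,
};

static int my_attach(struct file *file)
{
        /* Request unmovable, userspace-inaccessible backing pages. */
        return memfile_register_notifier(file,
                                         MEMFILE_F_USER_INACCESSIBLE |
                                         MEMFILE_F_UNMOVABLE,
                                         &my_notifier);
}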

Co-developed-by: Kirill A. Shutemov 
Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Chao Peng 
---
 include/linux/memfile_notifier.h |  93 
 mm/Kconfig   |   4 +
 mm/Makefile  |   1 +
 mm/memfile_notifier.c| 121 +++
 4 files changed, 219 insertions(+)
 create mode 100644 include/linux/memfile_notifier.h
 create mode 100644 mm/memfile_notifier.c

diff --git a/include/linux/memfile_notifier.h b/include/linux/memfile_notifier.h
new file mode 100644
index 000000000000..c5d66fd8ba53
--- /dev/null
+++ b/include/linux/memfile_notifier.h
@@ -0,0 +1,93 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMFILE_NOTIFIER_H
+#define _LINUX_MEMFILE_NOTIFIER_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* memory in the file is inaccessible from userspace (e.g. read/write/mmap) */
+#define MEMFILE_F_USER_INACCESSIBLE     BIT(0)
+/* memory in the file is unmovable (e.g. via page migration) */
+#define MEMFILE_F_UNMOVABLE             BIT(1)
+/* memory in the file is unreclaimable (e.g. via kswapd) */
+#define MEMFILE_F_UNRECLAIMABLE         BIT(2)
+
+#define MEMFILE_F_ALLOWED_MASK (MEMFILE_F_USER_INACCESSIBLE | \
+   MEMFILE_F_UNMOVABLE | \
+   MEMFILE_F_UNRECLAIMABLE)
+
+struct memfile_node {
+       struct list_head        notifiers;      /* registered notifiers */
+       unsigned long           flags;          /* MEMFILE_F_* flags */
+};
+
+struct memfile_backing_store {
+   struct list_head list;
+   spinlock_t lock;
+   struct memfile_node* (*lookup_memfile_node)(struct file *file);
+   int (*get_pfn)(struct file *file, pgoff_t offset, pfn_t *pfn,
+  int *order);
+   void (*put_pfn)(pfn_t pfn);
+};
+
+struct memfile_notifier;
+struct memfile_notifier_ops {
+   void (*invalidate)(struct memfile_notifier *notifier,
+  pgoff_t start, pgoff_t end);
+};
+
+struct memfile_notifier {
+   struct list_head list;
+   struct memfile_notifier_ops *ops;
+   struct memfile_backing_store *bs;
+};
+
+static inline void memfile_node_init(struct memfile_node *node)
+{
+   INIT_LIST_HEAD(&node->notifiers);
+   node->flags = 0;
+}
+
+#ifdef CONFIG_MEMFILE_NOTIFIER
+/* APIs for backing stores */
+extern void memfile_register_backing_store(struct memfile_backing_store *bs);
+extern int memfile_node_set_flags(struct file *file, unsigned long flags);
+extern void memfile_notifier_invalidate(struct memfile_node *node,
+   pgoff_t start, pgoff_t end);
+/* APIs for notifier consumers */
+extern int memfile_register_notifier(struct file *file, unsigned long flags,
+struct memfile_notifier *notifier);
+extern void memfile_unregister_notifier(struct memfile_notifier *notifier);
+
+#else /* !CONFIG_MEMFILE_NOTIFIER */
+static inline void memfile_register_backing_store(struct memfile_backing_store 
*bs)
+{
+}
+
+static inline int memfile_node_set_flags(struct file *file, unsigned long 
flags)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline void memfile_notifier_invalidate(struct memfile_node *node,
+  pgoff_t start, pgoff_t end)
+{
+}
+
+static inline int memfile_register_notifier(struct file *file,
+   unsigned long flags,
+   struct memfile_notifier *notifier)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline void m

[PATCH v5 09/45] target/arm: Mark SMMLA, UMMLA, USMMLA as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  1 -
 target/arm/translate-sve.c | 12 ++--
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index b5eaa2d0fa..3260ea2d64 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,6 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    0100 0101 --0- ---- 1001 10-- ---- ----   # SMMLA, UMMLA, USMMLA
 FAIL    0100 0101 --1- ---- 1--- ---- ---- ----   # SVE2 string/histo/crypto instructions
 FAIL    1000 010- -00- ---- 10-- ---- ---- ----   # SVE2 32-bit gather NT load (vector+scalar)
 FAIL    1000 010- -00- ---- 111- ---- ---- ----   # SVE 32-bit gather prefetch (vector+imm)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d5aad53923..9bbf44f008 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -7302,12 +7302,12 @@ TRANS_FEAT(FMLALT_zzxw, aa64_sve2, do_FMLAL_zzxw, a, 
false, true)
 TRANS_FEAT(FMLSLB_zzxw, aa64_sve2, do_FMLAL_zzxw, a, true, false)
 TRANS_FEAT(FMLSLT_zzxw, aa64_sve2, do_FMLAL_zzxw, a, true, true)
 
-TRANS_FEAT(SMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
-           gen_helper_gvec_smmla_b, a, 0)
-TRANS_FEAT(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
-           gen_helper_gvec_usmmla_b, a, 0)
-TRANS_FEAT(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
-           gen_helper_gvec_ummla_b, a, 0)
+TRANS_FEAT_NONSTREAMING(SMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
+                        gen_helper_gvec_smmla_b, a, 0)
+TRANS_FEAT_NONSTREAMING(USMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
+                        gen_helper_gvec_usmmla_b, a, 0)
+TRANS_FEAT_NONSTREAMING(UMMLA, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
+                        gen_helper_gvec_ummla_b, a, 0)
 
 TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
            gen_helper_gvec_bfdot, a, 0)
-- 
2.34.1




[PATCH v5 00/45] target/arm: Scalable Matrix Extension

2022-07-06 Thread Richard Henderson
Changes for v5:
  * Use macros for vertical tile slice addressing.
  * Other misc adjustments per review.

Patches without r-b:
  03-target-arm-Trap-non-streaming-usage-when-Streamin.patch
  07-target-arm-Mark-PMULL-FMMLA-as-non-streaming.patch
  19-target-arm-Implement-SME-MOVA.patch
  20-target-arm-Implement-SME-LD1-ST1.patch
  23-target-arm-Implement-SME-ADDHA-ADDVA.patch
  24-target-arm-Implement-FMOPA-FMOPS-non-widening.patch
  25-target-arm-Implement-BFMOPA-BFMOPS.patch
  26-target-arm-Implement-FMOPA-FMOPS-widening.patch
  35-linux-user-aarch64-Add-SM-bit-to-SVE-signal-conte.patch
  37-linux-user-aarch64-Do-not-allow-duplicate-or-shor.patch


r~


Richard Henderson (45):
  target/arm: Handle SME in aarch64_cpu_dump_state
  target/arm: Add infrastructure for disas_sme
  target/arm: Trap non-streaming usage when Streaming SVE is active
  target/arm: Mark ADR as non-streaming
  target/arm: Mark RDFFR, WRFFR, SETFFR as non-streaming
  target/arm: Mark BDEP, BEXT, BGRP, COMPACT, FEXPA, FTSSEL as
non-streaming
  target/arm: Mark PMULL, FMMLA as non-streaming
  target/arm: Mark FTSMUL, FTMAD, FADDA as non-streaming
  target/arm: Mark SMMLA, UMMLA, USMMLA as non-streaming
  target/arm: Mark string/histo/crypto as non-streaming
  target/arm: Mark gather/scatter load/store as non-streaming
  target/arm: Mark gather prefetch as non-streaming
  target/arm: Mark LDFF1 and LDNF1 as non-streaming
  target/arm: Mark LD1RO as non-streaming
  target/arm: Add SME enablement checks
  target/arm: Handle SME in sve_access_check
  target/arm: Implement SME RDSVL, ADDSVL, ADDSPL
  target/arm: Implement SME ZERO
  target/arm: Implement SME MOVA
  target/arm: Implement SME LD1, ST1
  target/arm: Export unpredicated ld/st from translate-sve.c
  target/arm: Implement SME LDR, STR
  target/arm: Implement SME ADDHA, ADDVA
  target/arm: Implement FMOPA, FMOPS (non-widening)
  target/arm: Implement BFMOPA, BFMOPS
  target/arm: Implement FMOPA, FMOPS (widening)
  target/arm: Implement SME integer outer product
  target/arm: Implement PSEL
  target/arm: Implement REVD
  target/arm: Implement SCLAMP, UCLAMP
  target/arm: Reset streaming sve state on exception boundaries
  target/arm: Enable SME for -cpu max
  linux-user/aarch64: Clear tpidr2_el0 if CLONE_SETTLS
  linux-user/aarch64: Reset PSTATE.SM on syscalls
  linux-user/aarch64: Add SM bit to SVE signal context
  linux-user/aarch64: Tidy target_restore_sigframe error return
  linux-user/aarch64: Do not allow duplicate or short sve records
  linux-user/aarch64: Verify extra record lock succeeded
  linux-user/aarch64: Move sve record checks into restore
  linux-user/aarch64: Implement SME signal handling
  linux-user: Rename sve prctls
  linux-user/aarch64: Implement PR_SME_GET_VL, PR_SME_SET_VL
  target/arm: Only set ZEN in reset if SVE present
  target/arm: Enable SME for user-only
  linux-user/aarch64: Add SME related hwcap entries

 docs/system/arm/emulation.rst |4 +
 linux-user/aarch64/target_cpu.h   |5 +-
 linux-user/aarch64/target_prctl.h |   56 +-
 target/arm/cpu.h  |7 +
 target/arm/helper-sme.h   |  126 
 target/arm/helper-sve.h   |4 +
 target/arm/helper.h   |   18 +
 target/arm/translate-a64.h|   45 ++
 target/arm/translate.h|   16 +
 target/arm/sme-fa64.decode|   60 ++
 target/arm/sme.decode |   88 +++
 target/arm/sve.decode |   41 +-
 linux-user/aarch64/cpu_loop.c |9 +
 linux-user/aarch64/signal.c   |  243 ++-
 linux-user/elfload.c  |   20 +
 linux-user/syscall.c  |   28 +-
 target/arm/cpu.c  |   35 +-
 target/arm/cpu64.c|   11 +
 target/arm/helper.c   |   56 +-
 target/arm/sme_helper.c   | 1124 +
 target/arm/sve_helper.c   |   28 +
 target/arm/translate-a64.c|  103 ++-
 target/arm/translate-sme.c|  373 ++
 target/arm/translate-sve.c|  393 --
 target/arm/translate-vfp.c|   12 +
 target/arm/translate.c|2 +
 target/arm/vec_helper.c   |   24 +
 target/arm/meson.build|3 +
 28 files changed, 2799 insertions(+), 135 deletions(-)
 create mode 100644 target/arm/sme-fa64.decode
 create mode 100644 target/arm/sme.decode
 create mode 100644 target/arm/translate-sme.c

-- 
2.34.1




[PATCH v7 04/14] mm/shmem: Support memfile_notifier

2022-07-06 Thread Chao Peng
From: "Kirill A. Shutemov" 

Implement shmem as a memfile_notifier backing store. Essentially it
honors the memfile_notifier feature flags for userspace access, page
migration and page reclaim, and implements the necessary
memfile_backing_store callbacks.
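
Since the diff below is truncated in this archive, here is the shape of
the backing-store hookup for orientation; the shmem_* callback names are
assumed from context rather than quoted from the patch:

#include <linux/memfile_notifier.h>

/* Assumed callback names; the real definitions live in the truncated
 * part of the diff. */
static struct memfile_node *shmem_lookup_memfile_node(struct file *file);
static int shmem_get_pfn(struct file *file, pgoff_t offset, pfn_t *pfn,
                         int *order);
static void shmem_put_pfn(pfn_t pfn);

static struct memfile_backing_store shmem_backing_store = {
        .lookup_memfile_node    = shmem_lookup_memfile_node,
        .get_pfn                = shmem_get_pfn,
        .put_pfn                = shmem_put_pfn,
};

/* Registered once during shmem initialization. */
static int __init shmem_memfile_init(void)
{
        memfile_register_backing_store(&shmem_backing_store);
        return 0;
}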

Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Chao Peng 
---
 include/linux/shmem_fs.h |   2 +
 mm/shmem.c   | 109 ++-
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index a68f982f22d1..6031c0b08d26 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* inode in-kernel data */
 
@@ -25,6 +26,7 @@ struct shmem_inode_info {
 	struct simple_xattrs    xattrs;         /* list of xattrs */
 	atomic_t                stop_eviction;  /* hold when working on inode */
 	struct timespec64       i_crtime;       /* file creation time */
+	struct memfile_node     memfile_node;   /* memfile node */
 	struct inode            vfs_inode;
 };
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 6c8aef15a17d..627e315c3b4d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -905,6 +905,17 @@ static struct folio *shmem_get_partial_folio(struct inode 
*inode, pgoff_t index)
return page ? page_folio(page) : NULL;
 }
 
+static void notify_invalidate(struct inode *inode, struct folio *folio,
+  pgoff_t start, pgoff_t end)
+{
+   struct shmem_inode_info *info = SHMEM_I(inode);
+
+   start = max(start, folio->index);
+   end = min(end, folio->index + folio_nr_pages(folio));
+
+   memfile_notifier_invalidate(&info->memfile_node, start, end);
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -948,6 +959,8 @@ static void shmem_undo_range(struct inode *inode, loff_t 
lstart, loff_t lend,
}
index += folio_nr_pages(folio) - 1;
 
+   notify_invalidate(inode, folio, start, end);
+
if (!unfalloc || !folio_test_uptodate(folio))
truncate_inode_folio(mapping, folio);
folio_unlock(folio);
@@ -1021,6 +1034,9 @@ static void shmem_undo_range(struct inode *inode, loff_t 
lstart, loff_t lend,
index--;
break;
}
+
+   notify_invalidate(inode, folio, start, end);
+
VM_BUG_ON_FOLIO(folio_test_writeback(folio),
folio);
truncate_inode_folio(mapping, folio);
@@ -1092,6 +1108,13 @@ static int shmem_setattr(struct user_namespace 
*mnt_userns,
(newsize > oldsize && (info->seals & F_SEAL_GROW)))
return -EPERM;
 
+   if (info->memfile_node.flags & MEMFILE_F_USER_INACCESSIBLE) {
+   if (oldsize)
+   return -EPERM;
+   if (!PAGE_ALIGNED(newsize))
+   return -EINVAL;
+   }
+
if (newsize != oldsize) {
error = shmem_reacct_size(SHMEM_I(inode)->flags,
oldsize, newsize);
@@ -1336,6 +1359,8 @@ static int shmem_writepage(struct page *page, struct 
writeback_control *wbc)
goto redirty;
if (!total_swap_pages)
goto redirty;
+   if (info->memfile_node.flags & MEMFILE_F_UNRECLAIMABLE)
+   goto redirty;
 
/*
 * Our capabilities prevent regular writeback or sync from ever calling
@@ -2271,6 +2296,9 @@ static int shmem_mmap(struct file *file, struct 
vm_area_struct *vma)
if (ret)
return ret;
 
+   if (info->memfile_node.flags & MEMFILE_F_USER_INACCESSIBLE)
+   return -EPERM;
+
/* arm64 - allow memory tagging on RAM-based files */
vma->vm_flags |= VM_MTE_ALLOWED;
 
@@ -2306,6 +2334,7 @@ static struct inode *shmem_get_inode(struct super_block 
*sb, const struct inode
info->i_crtime = inode->i_mtime;
INIT_LIST_HEAD(&info->shrinklist);
INIT_LIST_HEAD(&info->swaplist);
+   memfile_node_init(&info->memfile_node);
simple_xattrs_init(&info->xattrs);
cache_no_acl(inode);
mapping_set_large_folios(inode->i_mapping);
@@ -2477,6 +2506,8 @@ shmem_write_begin(struct file *file, struct address_space 
*mapping,
if ((info->seals & F_SEAL_GROW) && pos + len > inode->i_size)
return -EPERM;
}
+   if (unlikely(info->memfile_node.flags & MEMFILE_F_USER_

[PATCH v5 08/45] target/arm: Mark FTSMUL, FTMAD, FADDA as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  3 ---
 target/arm/translate-sve.c | 15 +++
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 4ff2df82e5..b5eaa2d0fa 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,9 +59,6 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    0110 0101 --0- ---- 0000 11-- ---- ----   # FTSMUL
-FAIL    0110 0101 --01 0--- 100- ---- ---- ----   # FTMAD
-FAIL    0110 0101 --01 1--- 001- ---- ---- ----   # FADDA
 FAIL    0100 0101 --0- ---- 1001 10-- ---- ----   # SMMLA, UMMLA, USMMLA
 FAIL    0100 0101 --1- ---- 1--- ---- ---- ----   # SVE2 string/histo/crypto instructions
 FAIL    1000 010- -00- ---- 10-- ---- ---- ----   # SVE2 32-bit gather NT load (vector+scalar)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 4ff2102fc8..d5aad53923 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -3861,9 +3861,9 @@ static gen_helper_gvec_3_ptr * const ftmad_fns[4] = {
 NULL,   gen_helper_sve_ftmad_h,
 gen_helper_sve_ftmad_s, gen_helper_sve_ftmad_d,
 };
-TRANS_FEAT(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
-   ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
-   a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
+TRANS_FEAT_NONSTREAMING(FTMAD, aa64_sve, gen_gvec_fpst_zzz,
+ftmad_fns[a->esz], a->rd, a->rn, a->rm, a->imm,
+a->esz == MO_16 ? FPST_FPCR_F16 : FPST_FPCR)
 
 /*
  *** SVE Floating Point Accumulating Reduction Group
@@ -3886,6 +3886,7 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz *a)
 if (a->esz == 0 || !dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -3923,12 +3924,18 @@ static bool trans_FADDA(DisasContext *s, arg_rprr_esz 
*a)
 DO_FP3(FADD_zzz, fadd)
 DO_FP3(FSUB_zzz, fsub)
 DO_FP3(FMUL_zzz, fmul)
-DO_FP3(FTSMUL, ftsmul)
 DO_FP3(FRECPS, recps)
 DO_FP3(FRSQRTS, rsqrts)
 
 #undef DO_FP3
 
+static gen_helper_gvec_3_ptr * const ftsmul_fns[4] = {
+NULL, gen_helper_gvec_ftsmul_h,
+gen_helper_gvec_ftsmul_s, gen_helper_gvec_ftsmul_d
+};
+TRANS_FEAT_NONSTREAMING(FTSMUL, aa64_sve, gen_gvec_fpst_arg_zzz,
+ftsmul_fns[a->esz], a, 0)
+
 /*
  *** SVE Floating Point Arithmetic - Predicated Group
  */
-- 
2.34.1




[PATCH v5 02/45] target/arm: Add infrastructure for disas_sme

2022-07-06 Thread Richard Henderson
This includes the build rules for the decoder, and the
new file for translation, but excludes any instructions.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.h |  1 +
 target/arm/sme.decode  | 20 
 target/arm/translate-a64.c |  7 ++-
 target/arm/translate-sme.c | 35 +++
 target/arm/meson.build |  2 ++
 5 files changed, 64 insertions(+), 1 deletion(-)
 create mode 100644 target/arm/sme.decode
 create mode 100644 target/arm/translate-sme.c

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index f0970c6b8c..789b6e8e78 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -146,6 +146,7 @@ static inline int pred_gvec_reg_size(DisasContext *s)
 }
 
 bool disas_sve(DisasContext *, uint32_t);
+bool disas_sme(DisasContext *, uint32_t);
 
 void gen_gvec_rax1(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
uint32_t rm_ofs, uint32_t opr_sz, uint32_t max_sz);
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
new file mode 100644
index 0000000000..c25c031a71
--- /dev/null
+++ b/target/arm/sme.decode
@@ -0,0 +1,20 @@
+# AArch64 SME instruction descriptions
+#
+#  Copyright (c) 2022 Linaro, Ltd
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, see <http://www.gnu.org/licenses/>.
+
+#
+# This file is processed by scripts/decodetree.py
+#
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c86b97b1d4..a5f8a6c771 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14806,7 +14806,12 @@ static void aarch64_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 }
 
 switch (extract32(insn, 25, 4)) {
-case 0x0: case 0x1: case 0x3: /* UNALLOCATED */
+case 0x0:
+if (!extract32(insn, 31, 1) || !disas_sme(s, insn)) {
+unallocated_encoding(s);
+}
+break;
+case 0x1: case 0x3: /* UNALLOCATED */
 unallocated_encoding(s);
 break;
 case 0x2:
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
new file mode 100644
index 0000000000..786c93fb2d
--- /dev/null
+++ b/target/arm/translate-sme.c
@@ -0,0 +1,35 @@
+/*
+ * AArch64 SME translation
+ *
+ * Copyright (c) 2022 Linaro, Ltd
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "tcg/tcg-op.h"
+#include "tcg/tcg-op-gvec.h"
+#include "tcg/tcg-gvec-desc.h"
+#include "translate.h"
+#include "exec/helper-gen.h"
+#include "translate-a64.h"
+#include "fpu/softfloat.h"
+
+
+/*
+ * Include the generated decoder.
+ */
+
+#include "decode-sme.c.inc"
diff --git a/target/arm/meson.build b/target/arm/meson.build
index 43dc600547..6dd7e93643 100644
--- a/target/arm/meson.build
+++ b/target/arm/meson.build
@@ -1,5 +1,6 @@
 gen = [
   decodetree.process('sve.decode', extra_args: '--decode=disas_sve'),
+  decodetree.process('sme.decode', extra_args: '--decode=disas_sme'),
   decodetree.process('neon-shared.decode', extra_args: 
'--decode=disas_neon_shared'),
   decodetree.process('neon-dp.decode', extra_args: '--decode=disas_neon_dp'),
   decodetree.process('neon-ls.decode', extra_args: '--decode=disas_neon_ls'),
@@ -50,6 +51,7 @@ arm_ss.add(when: 'TARGET_AARCH64', if_true: files(
   'sme_helper.c',
   'translate-a64.c',
   'translate-sve.c',
+  'translate-sme.c',
 ))
 
 arm_softmmu_ss = ss.source_set()
-- 
2.34.1




[PATCH v5 04/45] target/arm: Mark ADR as non-streaming

2022-07-06 Thread Richard Henderson
Mark ADR as a non-streaming instruction, which should trap
if full a64 support is not enabled in streaming mode.

Removing entries from sme-fa64.decode is an easy way to see
what remains to be done.
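
For readers skimming the diff, the new macro (added to translate.h below)
expands to an ordinary trans function that first flags the instruction as
non-streaming; hand-expanding one of the converted ADR patterns gives
roughly:

/* TRANS_FEAT_NONSTREAMING(ADR_p32, aa64_sve, do_adr, a, gen_helper_sve_adr_p32)
 * expands to approximately: */
static bool trans_ADR_p32(DisasContext *s, arg_ADR_p32 *a)
{
    s->is_nonstreaming = true;   /* consumed later by sve_access_check */
    return dc_isar_feature(aa64_sve, s) && do_adr(s, a, gen_helper_sve_adr_p32);
}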

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate.h | 7 +++
 target/arm/sme-fa64.decode | 1 -
 target/arm/translate-sve.c | 8 
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index cbc907c751..e2e619dab2 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -566,4 +566,11 @@ uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
 static bool trans_##NAME(DisasContext *s, arg_##NAME *a) \
 { return dc_isar_feature(FEAT, s) && FUNC(s, __VA_ARGS__); }
 
+#define TRANS_FEAT_NONSTREAMING(NAME, FEAT, FUNC, ...)\
+static bool trans_##NAME(DisasContext *s, arg_##NAME *a)  \
+{ \
+s->is_nonstreaming = true;\
+return dc_isar_feature(FEAT, s) && FUNC(s, __VA_ARGS__);  \
+}
+
 #endif /* TARGET_ARM_TRANSLATE_H */
diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 3d90837fc7..73c71abc46 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,6 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    0000 0100 --1- ---- 1010 ---- ---- ----   # ADR
 FAIL    0000 0100 --1- ---- 1011 -0-- ---- ----   # FTSSEL, FEXPA
 FAIL    0000 0101 --10 0001 100- ---- ---- ----   # COMPACT
 FAIL    0010 0101 --01 100- ---- 000- ---0 ----   # RDFFR, RDFFRS
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 62b5f3040c..5d1db0d3ff 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1320,10 +1320,10 @@ static bool do_adr(DisasContext *s, arg_rrri *a, 
gen_helper_gvec_3 *fn)
 return gen_gvec_ool_zzz(s, fn, a->rd, a->rn, a->rm, a->imm);
 }
 
-TRANS_FEAT(ADR_p32, aa64_sve, do_adr, a, gen_helper_sve_adr_p32)
-TRANS_FEAT(ADR_p64, aa64_sve, do_adr, a, gen_helper_sve_adr_p64)
-TRANS_FEAT(ADR_s32, aa64_sve, do_adr, a, gen_helper_sve_adr_s32)
-TRANS_FEAT(ADR_u32, aa64_sve, do_adr, a, gen_helper_sve_adr_u32)
+TRANS_FEAT_NONSTREAMING(ADR_p32, aa64_sve, do_adr, a, gen_helper_sve_adr_p32)
+TRANS_FEAT_NONSTREAMING(ADR_p64, aa64_sve, do_adr, a, gen_helper_sve_adr_p64)
+TRANS_FEAT_NONSTREAMING(ADR_s32, aa64_sve, do_adr, a, gen_helper_sve_adr_s32)
+TRANS_FEAT_NONSTREAMING(ADR_u32, aa64_sve, do_adr, a, gen_helper_sve_adr_u32)
 
 /*
  *** SVE Integer Misc - Unpredicated Group
-- 
2.34.1




[PATCH v5 07/45] target/arm: Mark PMULL, FMMLA as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  2 --
 target/arm/translate-sve.c | 24 +++-
 2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 4f515939d9..4ff2df82e5 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,8 +59,6 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    0100 0101 000- ---- 0110 1--- ---- ----   # PMULLB, PMULLT (128b result)
-FAIL    0110 0100 --1- ---- 1110 01-- ---- ----   # FMMLA, BFMMLA
 FAIL    0110 0101 --0- ---- 0000 11-- ---- ----   # FTSMUL
 FAIL    0110 0101 --01 0--- 100- ---- ---- ----   # FTMAD
 FAIL    0110 0101 --01 1--- 001- ---- ---- ----   # FADDA
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index ae48040aa4..4ff2102fc8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -6186,9 +6186,13 @@ static bool do_trans_pmull(DisasContext *s, arg_rrr_esz 
*a, bool sel)
 gen_helper_gvec_pmull_q, gen_helper_sve2_pmull_h,
 NULL,gen_helper_sve2_pmull_d,
 };
-if (a->esz == 0
-? !dc_isar_feature(aa64_sve2_pmull128, s)
-: !dc_isar_feature(aa64_sve, s)) {
+
+if (a->esz == 0) {
+if (!dc_isar_feature(aa64_sve2_pmull128, s)) {
+return false;
+}
+s->is_nonstreaming = true;
+} else if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
 return gen_gvec_ool_arg_zzz(s, fns[a->esz], a, sel);
@@ -7125,10 +7129,12 @@ DO_ZPZZ_FP(FMINP, aa64_sve2, sve2_fminp_zpzz)
  * SVE Integer Multiply-Add (unpredicated)
  */
 
-TRANS_FEAT(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz, gen_helper_fmmla_s,
-           a->rd, a->rn, a->rm, a->ra, 0, FPST_FPCR)
-TRANS_FEAT(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz, gen_helper_fmmla_d,
-           a->rd, a->rn, a->rm, a->ra, 0, FPST_FPCR)
+TRANS_FEAT_NONSTREAMING(FMMLA_s, aa64_sve_f32mm, gen_gvec_fpst_zzzz,
+                        gen_helper_fmmla_s, a->rd, a->rn, a->rm, a->ra,
+                        0, FPST_FPCR)
+TRANS_FEAT_NONSTREAMING(FMMLA_d, aa64_sve_f64mm, gen_gvec_fpst_zzzz,
+                        gen_helper_fmmla_d, a->rd, a->rn, a->rm, a->ra,
+                        0, FPST_FPCR)
 
 static gen_helper_gvec_4 * const sqdmlal_zzzw_fns[] = {
     NULL,                   gen_helper_sve2_sqdmlal_zzzw_h,
@@ -7301,8 +7307,8 @@ TRANS_FEAT(BFDOT_zzzz, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
 TRANS_FEAT(BFDOT_zzxz, aa64_sve_bf16, gen_gvec_ool_arg_zzxz,
            gen_helper_gvec_bfdot_idx, a)
 
-TRANS_FEAT(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
-           gen_helper_gvec_bfmmla, a, 0)
+TRANS_FEAT_NONSTREAMING(BFMMLA, aa64_sve_bf16, gen_gvec_ool_arg_zzzz,
+                        gen_helper_gvec_bfmmla, a, 0)
 
 static bool do_BFMLAL_zzzw(DisasContext *s, arg_rrrr_esz *a, bool sel)
 {
-- 
2.34.1




[PATCH v5 05/45] target/arm: Mark RDFFR, WRFFR, SETFFR as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 2 --
 target/arm/translate-sve.c | 9 ++---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 73c71abc46..fa2b5cbf1a 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -61,8 +61,6 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 
 FAIL    0000 0100 --1- ---- 1011 -0-- ---- ----   # FTSSEL, FEXPA
 FAIL    0000 0101 --10 0001 100- ---- ---- ----   # COMPACT
-FAIL    0010 0101 --01 100- ---- 000- ---0 ----   # RDFFR, RDFFRS
-FAIL    0010 0101 --10 1--- 1001 ---- ---- ----   # WRFFR, SETFFR
 FAIL    0100 0101 --0- ---- 1011 ---- ---- ----   # BDEP, BEXT, BGRP
 FAIL    0100 0101 000- ---- 0110 1--- ---- ----   # PMULLB, PMULLT (128b result)
 FAIL    0110 0100 --1- ---- 1110 01-- ---- ----   # FMMLA, BFMMLA
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 5d1db0d3ff..d6faec15fe 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1785,7 +1785,8 @@ static bool do_predset(DisasContext *s, int esz, int rd, 
int pat, bool setflag)
 TRANS_FEAT(PTRUE, aa64_sve, do_predset, a->esz, a->rd, a->pat, a->s)
 
 /* Note pat == 31 is #all, to set all elements.  */
-TRANS_FEAT(SETFFR, aa64_sve, do_predset, 0, FFR_PRED_NUM, 31, false)
+TRANS_FEAT_NONSTREAMING(SETFFR, aa64_sve,
+do_predset, 0, FFR_PRED_NUM, 31, false)
 
 /* Note pat == 32 is #unimp, to set no elements.  */
 TRANS_FEAT(PFALSE, aa64_sve, do_predset, 0, a->rd, 32, false)
@@ -1799,11 +1800,13 @@ static bool trans_RDFFR_p(DisasContext *s, arg_RDFFR_p 
*a)
 .rd = a->rd, .pg = a->pg, .s = a->s,
 .rn = FFR_PRED_NUM, .rm = FFR_PRED_NUM,
 };
+
+s->is_nonstreaming = true;
 return trans_AND_(s, &alt_a);
 }
 
-TRANS_FEAT(RDFFR, aa64_sve, do_mov_p, a->rd, FFR_PRED_NUM)
-TRANS_FEAT(WRFFR, aa64_sve, do_mov_p, FFR_PRED_NUM, a->rn)
+TRANS_FEAT_NONSTREAMING(RDFFR, aa64_sve, do_mov_p, a->rd, FFR_PRED_NUM)
+TRANS_FEAT_NONSTREAMING(WRFFR, aa64_sve, do_mov_p, FFR_PRED_NUM, a->rn)
 
 static bool do_pfirst_pnext(DisasContext *s, arg_rr_esz *a,
 void (*gen_fn)(TCGv_i32, TCGv_ptr,
-- 
2.34.1




[PATCH v5 14/45] target/arm: Mark LD1RO as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 3 ---
 target/arm/translate-sve.c | 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 2b5432bf85..47708ccc8d 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -58,6 +58,3 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --0- ---- ---- ---- ---- ----   # Load/store FP register (unscaled imm)
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
-
-FAIL    1010 010- -01- ---- 000- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+scalar)
-FAIL    1010 010- -010 ---- 001- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+imm)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 5182ee4c06..96e934c1ea 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5062,6 +5062,7 @@ static bool trans_LD1RO_zprr(DisasContext *s, 
arg_rprr_load *a)
 if (a->rm == 31) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 TCGv_i64 addr = new_tmp_a64(s);
 tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
@@ -5076,6 +5077,7 @@ static bool trans_LD1RO_zpri(DisasContext *s, 
arg_rpri_load *a)
 if (!dc_isar_feature(aa64_sve_f64mm, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 TCGv_i64 addr = new_tmp_a64(s);
 tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), a->imm * 32);
-- 
2.34.1




[PATCH v7 05/14] mm/memfd: Introduce MFD_INACCESSIBLE flag

2022-07-06 Thread Chao Peng
Introduce a new memfd_create() flag indicating that the content of the
created memfd is inaccessible from userspace through ordinary MMU access
(e.g., read/write/mmap). However, the file content can still be accessed
indirectly via a different mechanism (e.g. the KVM MMU).

It provides the semantics required for KVM guest private memory support:
a file descriptor with this flag set is going to be used as the source
of guest memory in confidential computing environments such as Intel
TDX/AMD SEV, but may not be accessible from host userspace.

The flag cannot coexist with MFD_ALLOW_SEALING, and future sealing is
likewise impossible for a memfd created with this flag.
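
A minimal userspace sketch of the intended usage; the fallback define
simply mirrors the uapi addition below, and error handling is abbreviated:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#ifndef MFD_INACCESSIBLE
#define MFD_INACCESSIBLE 0x0008U        /* mirrors the uapi change below */
#endif

int main(void)
{
    /* Ordinary mmap/read/write on this fd is expected to fail; the fd
     * would instead be handed to KVM as the guest memory backend. */
    int fd = memfd_create("guest-mem", MFD_CLOEXEC | MFD_INACCESSIBLE);

    if (fd < 0) {
        perror("memfd_create");
        return 1;
    }
    printf("inaccessible memfd created: fd=%d\n", fd);
    return 0;
}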

Signed-off-by: Chao Peng 
---
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c | 15 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..48750474b904 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -8,6 +8,7 @@
 #define MFD_CLOEXEC             0x0001U
 #define MFD_ALLOW_SEALING       0x0002U
 #define MFD_HUGETLB             0x0004U
+#define MFD_INACCESSIBLE        0x0008U
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 2afd898798e4..72d7139ccced 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -262,7 +263,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, 
unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
+  MFD_INACCESSIBLE)
 
 SYSCALL_DEFINE2(memfd_create,
const char __user *, uname,
@@ -284,6 +286,10 @@ SYSCALL_DEFINE2(memfd_create,
return -EINVAL;
}
 
+   /* Disallow sealing when MFD_INACCESSIBLE is set. */
+   if (flags & MFD_INACCESSIBLE && flags & MFD_ALLOW_SEALING)
+   return -EINVAL;
+
/* length includes terminating zero */
len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
if (len <= 0)
@@ -330,12 +336,19 @@ SYSCALL_DEFINE2(memfd_create,
if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
*file_seals &= ~F_SEAL_SEAL;
+   } else if (flags & MFD_INACCESSIBLE) {
+   error = memfile_node_set_flags(file,
+  MEMFILE_F_USER_INACCESSIBLE);
+   if (error)
+   goto err_file;
}
 
fd_install(fd, file);
kfree(name);
return fd;
 
+err_file:
+   fput(file);
 err_fd:
put_unused_fd(fd);
 err_name:
-- 
2.25.1




[PATCH v5 15/45] target/arm: Add SME enablement checks

2022-07-06 Thread Richard Henderson
These functions will be used to verify that the CPU is in the correct
state for a given instruction.
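
As a preview of how later patches in this series use these checks, a
sketch of a translator for a hypothetical ZA-touching instruction;
trans_SOME_ZA_OP and arg_some are placeholders, not real patterns:

static bool trans_SOME_ZA_OP(DisasContext *s, arg_some *a)
{
    if (!dc_isar_feature(aa64_sme, s)) {
        return false;               /* SME not implemented on this cpu */
    }
    if (!sme_za_enabled_check(s)) {
        return true;                /* exception already generated */
    }
    /* ... emit TCG ops that access the ZA storage here ... */
    return true;
}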

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.h | 21 +
 target/arm/translate-a64.c | 34 ++
 2 files changed, 55 insertions(+)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 789b6e8e78..02fb95e019 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -29,6 +29,27 @@ void write_fp_dreg(DisasContext *s, int reg, TCGv_i64 v);
 bool logic_imm_decode_wmask(uint64_t *result, unsigned int immn,
 unsigned int imms, unsigned int immr);
 bool sve_access_check(DisasContext *s);
+bool sme_enabled_check(DisasContext *s);
+bool sme_enabled_check_with_svcr(DisasContext *s, unsigned);
+
+/* This function corresponds to CheckStreamingSVEEnabled. */
+static inline bool sme_sm_enabled_check(DisasContext *s)
+{
+return sme_enabled_check_with_svcr(s, R_SVCR_SM_MASK);
+}
+
+/* This function corresponds to CheckSMEAndZAEnabled. */
+static inline bool sme_za_enabled_check(DisasContext *s)
+{
+return sme_enabled_check_with_svcr(s, R_SVCR_ZA_MASK);
+}
+
+/* Note that this function corresponds to CheckStreamingSVEAndZAEnabled. */
+static inline bool sme_smza_enabled_check(DisasContext *s)
+{
+return sme_enabled_check_with_svcr(s, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
+}
+
 TCGv_i64 clean_data_tbi(DisasContext *s, TCGv_i64 addr);
 TCGv_i64 gen_mte_check1(DisasContext *s, TCGv_i64 addr, bool is_write,
 bool tag_checked, int log2_size);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7fab7f64f8..b16d81bf19 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1216,6 +1216,40 @@ static bool sme_access_check(DisasContext *s)
 return true;
 }
 
+/* This function corresponds to CheckSMEEnabled. */
+bool sme_enabled_check(DisasContext *s)
+{
+/*
+ * Note that unlike sve_excp_el, we have not constrained sme_excp_el
+ * to be zero when fp_excp_el has priority.  This is because we need
+ * sme_excp_el by itself for cpregs access checks.
+ */
+if (!s->fp_excp_el || s->sme_excp_el < s->fp_excp_el) {
+s->fp_access_checked = true;
+return sme_access_check(s);
+}
+return fp_access_check_only(s);
+}
+
+/* Common subroutine for CheckSMEAnd*Enabled. */
+bool sme_enabled_check_with_svcr(DisasContext *s, unsigned req)
+{
+if (!sme_enabled_check(s)) {
+return false;
+}
+if (FIELD_EX64(req, SVCR, SM) && !s->pstate_sm) {
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_smetrap(SME_ET_NotStreaming, false));
+return false;
+}
+if (FIELD_EX64(req, SVCR, ZA) && !s->pstate_za) {
+gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+   syn_smetrap(SME_ET_InactiveZA, false));
+return false;
+}
+return true;
+}
+
 /*
  * This utility function is for doing register extension with an
  * optional shift. You will likely want to pass a temporary for the
-- 
2.34.1




[PATCH v5 11/45] target/arm: Mark gather/scatter load/store as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 9 -
 target/arm/translate-sve.c | 6 ++
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index fe462d2ccc..1acc3ae080 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,19 +59,10 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    1000 010- -00- ---- 10-- ---- ---- ----   # SVE2 32-bit gather NT load (vector+scalar)
 FAIL    1000 010- -00- ---- 111- ---- ---- ----   # SVE 32-bit gather prefetch (vector+imm)
 FAIL    1000 0100 0-1- ---- 0--- ---- ---- ----   # SVE 32-bit gather prefetch (scalar+vector)
-FAIL    1000 010- -01- ---- 1--- ---- ---- ----   # SVE 32-bit gather load (vector+imm)
-FAIL    1000 0100 0-0- ---- 0--- ---- ---- ----   # SVE 32-bit gather load byte (scalar+vector)
-FAIL    1000 0100 1--- ---- 0--- ---- ---- ----   # SVE 32-bit gather load half (scalar+vector)
-FAIL    1000 0101 0--- ---- 0--- ---- ---- ----   # SVE 32-bit gather load word (scalar+vector)
 FAIL    1010 010- ---- ---- 011- ---- ---- ----   # SVE contiguous FF load (scalar+scalar)
 FAIL    1010 010- ---1 ---- 101- ---- ---- ----   # SVE contiguous NF load (scalar+imm)
 FAIL    1010 010- -01- ---- 000- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+scalar)
 FAIL    1010 010- -010 ---- 001- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+imm)
 FAIL    1100 010- ---- ---- ---- ---- ---- ----   # SVE 64-bit gather load/prefetch
-FAIL    1110 010- -00- ---- 001- ---- ---- ----   # SVE2 64-bit scatter NT store (vector+scalar)
-FAIL    1110 010- -10- ---- 001- ---- ---- ----   # SVE2 32-bit scatter NT store (vector+scalar)
-FAIL    1110 010- ---- ---- 1-0- ---- ---- ----   # SVE scatter store (scalar+32-bit vector)
-FAIL    1110 010- ---- ---- 101- ---- ---- ----   # SVE scatter store (misc)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index f8e0716474..b23c6aa0bf 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5669,6 +5669,7 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5700,6 +5701,7 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5734,6 +5736,7 @@ static bool trans_LDNT1_zprz(DisasContext *s, 
arg_LD1_zprz *a)
 if (!dc_isar_feature(aa64_sve2, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5857,6 +5860,7 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5887,6 +5891,7 @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz 
*a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
@@ -5921,6 +5926,7 @@ static bool trans_STNT1_zprz(DisasContext *s, 
arg_ST1_zprz *a)
 if (!dc_isar_feature(aa64_sve2, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (!sve_access_check(s)) {
 return true;
 }
-- 
2.34.1




[PATCH v5 13/45] target/arm: Mark LDFF1 and LDNF1 as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode | 2 --
 target/arm/translate-sve.c | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 7d4c33fb5b..2b5432bf85 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,5 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    1010 010- ---- ---- 011- ---- ---- ----   # SVE contiguous FF load (scalar+scalar)
-FAIL    1010 010- ---1 ---- 101- ---- ---- ----   # SVE contiguous NF load (scalar+imm)
 FAIL    1010 010- -01- ---- 000- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+scalar)
 FAIL    1010 010- -010 ---- 001- ---- ---- ----   # SVE load & replicate 32 bytes (scalar+imm)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index bbf3bf2119..5182ee4c06 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4805,6 +4805,7 @@ static bool trans_LDFF1_zprr(DisasContext *s, 
arg_rprr_load *a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 TCGv_i64 addr = new_tmp_a64(s);
 tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
@@ -4906,6 +4907,7 @@ static bool trans_LDNF1_zpri(DisasContext *s, 
arg_rpri_load *a)
 if (!dc_isar_feature(aa64_sve, s)) {
 return false;
 }
+s->is_nonstreaming = true;
 if (sve_access_check(s)) {
 int vsz = vec_full_reg_size(s);
 int elements = vsz >> dtype_esz[a->dtype];
-- 
2.34.1




[PATCH v5 06/45] target/arm: Mark BDEP, BEXT, BGRP, COMPACT, FEXPA, FTSSEL as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  3 ---
 target/arm/translate-sve.c | 22 --
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index fa2b5cbf1a..4f515939d9 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,9 +59,6 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    0000 0100 --1- ---- 1011 -0-- ---- ----   # FTSSEL, FEXPA
-FAIL    0000 0101 --10 0001 100- ---- ---- ----   # COMPACT
-FAIL    0100 0101 --0- ---- 1011 ---- ---- ----   # BDEP, BEXT, BGRP
 FAIL    0100 0101 000- ---- 0110 1--- ---- ----   # PMULLB, PMULLT (128b result)
 FAIL    0110 0100 --1- ---- 1110 01-- ---- ----   # FMMLA, BFMMLA
 FAIL0110 0101 --0-   11--     # FTSMUL
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d6faec15fe..ae48040aa4 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1333,14 +1333,15 @@ static gen_helper_gvec_2 * const fexpa_fns[4] = {
 NULL,   gen_helper_sve_fexpa_h,
 gen_helper_sve_fexpa_s, gen_helper_sve_fexpa_d,
 };
-TRANS_FEAT(FEXPA, aa64_sve, gen_gvec_ool_zz,
-   fexpa_fns[a->esz], a->rd, a->rn, 0)
+TRANS_FEAT_NONSTREAMING(FEXPA, aa64_sve, gen_gvec_ool_zz,
+fexpa_fns[a->esz], a->rd, a->rn, 0)
 
 static gen_helper_gvec_3 * const ftssel_fns[4] = {
 NULL,gen_helper_sve_ftssel_h,
 gen_helper_sve_ftssel_s, gen_helper_sve_ftssel_d,
 };
-TRANS_FEAT(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz, ftssel_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(FTSSEL, aa64_sve, gen_gvec_ool_arg_zzz,
+ftssel_fns[a->esz], a, 0)
 
 /*
  *** SVE Predicate Logical Operations Group
@@ -2536,7 +2537,8 @@ TRANS_FEAT(TRN2_q, aa64_sve_f64mm, gen_gvec_ool_arg_zzz,
 static gen_helper_gvec_3 * const compact_fns[4] = {
 NULL, NULL, gen_helper_sve_compact_s, gen_helper_sve_compact_d
 };
-TRANS_FEAT(COMPACT, aa64_sve, gen_gvec_ool_arg_zpz, compact_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(COMPACT, aa64_sve, gen_gvec_ool_arg_zpz,
+compact_fns[a->esz], a, 0)
 
 /* Call the helper that computes the ARM LastActiveElement pseudocode
  * function, scaled by the element size.  This includes the not found
@@ -6374,22 +6376,22 @@ static gen_helper_gvec_3 * const bext_fns[4] = {
 gen_helper_sve2_bext_b, gen_helper_sve2_bext_h,
 gen_helper_sve2_bext_s, gen_helper_sve2_bext_d,
 };
-TRANS_FEAT(BEXT, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
-   bext_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(BEXT, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
+bext_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const bdep_fns[4] = {
 gen_helper_sve2_bdep_b, gen_helper_sve2_bdep_h,
 gen_helper_sve2_bdep_s, gen_helper_sve2_bdep_d,
 };
-TRANS_FEAT(BDEP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
-   bdep_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(BDEP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
+bdep_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const bgrp_fns[4] = {
 gen_helper_sve2_bgrp_b, gen_helper_sve2_bgrp_h,
 gen_helper_sve2_bgrp_s, gen_helper_sve2_bgrp_d,
 };
-TRANS_FEAT(BGRP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
-   bgrp_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(BGRP, aa64_sve2_bitperm, gen_gvec_ool_arg_zzz,
+bgrp_fns[a->esz], a, 0)
 
 static gen_helper_gvec_3 * const cadd_fns[4] = {
 gen_helper_sve2_cadd_b, gen_helper_sve2_cadd_h,
-- 
2.34.1




[PATCH v5 10/45] target/arm: Mark string/histo/crypto as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap
if full a64 support is not enabled in streaming mode.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  1 -
 target/arm/translate-sve.c | 35 ++-
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 3260ea2d64..fe462d2ccc 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,7 +59,6 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10   # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----   # Load/store FP register (scaled imm)
 
-FAIL    0100 0101 --1- ---- 1--- ---- ---- ----   # SVE2 string/histo/crypto instructions
 FAIL    1000 010- -00- ---- 10-- ---- ---- ----   # SVE2 32-bit gather NT load (vector+scalar)
 FAIL    1000 010- -00- ---- 111- ---- ---- ----   # SVE 32-bit gather prefetch (vector+imm)
 FAIL    1000 0100 0-1- ---- 0--- ---- ---- ----   # SVE 32-bit gather prefetch (scalar+vector)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 9bbf44f008..f8e0716474 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -7110,21 +7110,21 @@ DO_SVE2_ZZZ_NARROW(RSUBHNT, rsubhnt)
 static gen_helper_gvec_flags_4 * const match_fns[4] = {
 gen_helper_sve2_match_ppzz_b, gen_helper_sve2_match_ppzz_h, NULL, NULL
 };
-TRANS_FEAT(MATCH, aa64_sve2, do_ppzz_flags, a, match_fns[a->esz])
+TRANS_FEAT_NONSTREAMING(MATCH, aa64_sve2, do_ppzz_flags, a, match_fns[a->esz])
 
 static gen_helper_gvec_flags_4 * const nmatch_fns[4] = {
 gen_helper_sve2_nmatch_ppzz_b, gen_helper_sve2_nmatch_ppzz_h, NULL, NULL
 };
-TRANS_FEAT(NMATCH, aa64_sve2, do_ppzz_flags, a, nmatch_fns[a->esz])
+TRANS_FEAT_NONSTREAMING(NMATCH, aa64_sve2, do_ppzz_flags, a, 
nmatch_fns[a->esz])
 
 static gen_helper_gvec_4 * const histcnt_fns[4] = {
 NULL, NULL, gen_helper_sve2_histcnt_s, gen_helper_sve2_histcnt_d
 };
-TRANS_FEAT(HISTCNT, aa64_sve2, gen_gvec_ool_arg_zpzz,
-   histcnt_fns[a->esz], a, 0)
+TRANS_FEAT_NONSTREAMING(HISTCNT, aa64_sve2, gen_gvec_ool_arg_zpzz,
+histcnt_fns[a->esz], a, 0)
 
-TRANS_FEAT(HISTSEG, aa64_sve2, gen_gvec_ool_arg_zzz,
-   a->esz == 0 ? gen_helper_sve2_histseg : NULL, a, 0)
+TRANS_FEAT_NONSTREAMING(HISTSEG, aa64_sve2, gen_gvec_ool_arg_zzz,
+a->esz == 0 ? gen_helper_sve2_histseg : NULL, a, 0)
 
 DO_ZPZZ_FP(FADDP, aa64_sve2, sve2_faddp_zpzz)
 DO_ZPZZ_FP(FMAXNMP, aa64_sve2, sve2_fmaxnmp_zpzz)
@@ -7238,20 +7238,21 @@ TRANS_FEAT(SQRDCMLAH_zzzz, aa64_sve2, gen_gvec_ool_zzzz,
 TRANS_FEAT(USDOT_zzzz, aa64_sve_i8mm, gen_gvec_ool_arg_zzzz,
            a->esz == 2 ? gen_helper_gvec_usdot_b : NULL, a, 0)
 
-TRANS_FEAT(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
-   gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
+TRANS_FEAT_NONSTREAMING(AESMC, aa64_sve2_aes, gen_gvec_ool_zz,
+gen_helper_crypto_aesmc, a->rd, a->rd, a->decrypt)
 
-TRANS_FEAT(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_aese, a, false)
-TRANS_FEAT(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_aese, a, true)
+TRANS_FEAT_NONSTREAMING(AESE, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_aese, a, false)
+TRANS_FEAT_NONSTREAMING(AESD, aa64_sve2_aes, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_aese, a, true)
 
-TRANS_FEAT(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_sm4e, a, 0)
-TRANS_FEAT(SM4EKEY, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
-   gen_helper_crypto_sm4ekey, a, 0)
+TRANS_FEAT_NONSTREAMING(SM4E, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_sm4e, a, 0)
+TRANS_FEAT_NONSTREAMING(SM4EKEY, aa64_sve2_sm4, gen_gvec_ool_arg_zzz,
+gen_helper_crypto_sm4ekey, a, 0)
 
-TRANS_FEAT(RAX1, aa64_sve2_sha3, gen_gvec_fn_arg_zzz, gen_gvec_rax1, a)
+TRANS_FEAT_NONSTREAMING(RAX1, aa64_sve2_sha3, gen_gvec_fn_arg_zzz,
+gen_gvec_rax1, a)
 
 TRANS_FEAT(FCVTNT_sh, aa64_sve2, gen_gvec_fpst_arg_zpz,
gen_helper_sve2_fcvtnt_sh, a, 0, FPST_FPCR)
-- 
2.34.1




[PATCH v5 12/45] target/arm: Mark gather prefetch as non-streaming

2022-07-06 Thread Richard Henderson
Mark these as non-streaming instructions, which should trap if full
a64 support is not enabled in streaming mode.  In this case, introduce
PRF_ns (prefetch non-streaming) to handle the checks.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme-fa64.decode |  3 ---
 target/arm/sve.decode  | 10 +-
 target/arm/translate-sve.c | 11 +++
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/target/arm/sme-fa64.decode b/target/arm/sme-fa64.decode
index 1acc3ae080..7d4c33fb5b 100644
--- a/target/arm/sme-fa64.decode
+++ b/target/arm/sme-fa64.decode
@@ -59,10 +59,7 @@ FAIL    0001 1110 0111 1110 0000 00-- ---- ----     # FJCVTZS
 #       --11 1100 --1- ---- ---- ---- ---- --10     # Load/store FP register (register offset)
 #       --11 1101 ---- ---- ---- ---- ---- ----     # Load/store FP register (scaled imm)
 
-FAIL    1000 010- -00- ---- 111- ---- ---- ----     # SVE 32-bit gather prefetch (vector+imm)
-FAIL    1000 0100 0-1- ---- 0--- ---- ---- ----     # SVE 32-bit gather prefetch (scalar+vector)
 FAIL    1010 010- ---- ---- 011- ---- ---- ----     # SVE contiguous FF load (scalar+scalar)
 FAIL    1010 010- ---1 ---- 101- ---- ---- ----     # SVE contiguous NF load (scalar+imm)
 FAIL    1010 010- -01- ---- 000- ---- ---- ----     # SVE load & replicate 32 bytes (scalar+scalar)
 FAIL    1010 010- -010 ---- 001- ---- ---- ----     # SVE load & replicate 32 bytes (scalar+imm)
-FAIL    1100 010- ---- ---- ---- ---- ---- ----     # SVE 64-bit gather load/prefetch
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index a54feb2f61..908643d7d9 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1183,10 +1183,10 @@ LD1RO_zpri  1010010 .. 01 0..... 001 ... ..... ..... \
                 @rpri_load_msz nreg=0
 
 # SVE 32-bit gather prefetch (scalar plus 32-bit scaled offsets)
-PRF 110 00 -1 - 0-- --- - 0 
+PRF_ns  110 00 -1 - 0-- --- - 0 
 
 # SVE 32-bit gather prefetch (vector plus immediate)
-PRF 110 -- 00 - 111 --- - 0 
+PRF_ns  110 -- 00 - 111 --- - 0 
 
 # SVE contiguous prefetch (scalar plus immediate)
 PRF 110 11 1- - 0-- --- - 0 
@@ -1223,13 +1223,13 @@ LD1_zpiz1100010 .. 01 . 1.. ... . . 
\
 @rpri_g_load esz=3
 
 # SVE 64-bit gather prefetch (scalar plus 64-bit scaled offsets)
-PRF 1100010 00 11 - 1-- --- - 0 
+PRF_ns  1100010 00 11 - 1-- --- - 0 
 
 # SVE 64-bit gather prefetch (scalar plus unpacked 32-bit scaled offsets)
-PRF 1100010 00 -1 - 0-- --- - 0 
+PRF_ns  1100010 00 -1 - 0-- --- - 0 
 
 # SVE 64-bit gather prefetch (vector plus immediate)
-PRF 1100010 -- 00 - 111 --- - 0 
+PRF_ns  1100010 -- 00 - 111 --- - 0 
 
 ### SVE Memory Store Group
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index b23c6aa0bf..bbf3bf2119 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5971,6 +5971,17 @@ static bool trans_PRF_rr(DisasContext *s, arg_PRF_rr *a)
 return true;
 }
 
+static bool trans_PRF_ns(DisasContext *s, arg_PRF_ns *a)
+{
+if (!dc_isar_feature(aa64_sve, s)) {
+return false;
+}
+/* Prefetch is a nop within QEMU.  */
+s->is_nonstreaming = true;
+(void)sve_access_check(s);
+return true;
+}
+
 /*
  * Move Prefix
  *
-- 
2.34.1




[PATCH v5 16/45] target/arm: Handle SME in sve_access_check

2022-07-06 Thread Richard Henderson
The pseudocode for CheckSVEEnabled gains a check for Streaming
SVE mode, and for SME present but SVE absent.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b16d81bf19..b7b64f7358 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1183,21 +1183,31 @@ static bool fp_access_check(DisasContext *s)
 return true;
 }
 
-/* Check that SVE access is enabled.  If it is, return true.
+/*
+ * Check that SVE access is enabled.  If it is, return true.
  * If not, emit code to generate an appropriate exception and return false.
+ * This function corresponds to CheckSVEEnabled().
  */
 bool sve_access_check(DisasContext *s)
 {
-if (s->sve_excp_el) {
-assert(!s->sve_access_checked);
-s->sve_access_checked = true;
-
+if (s->pstate_sm || !dc_isar_feature(aa64_sve, s)) {
+assert(dc_isar_feature(aa64_sme, s));
+if (!sme_sm_enabled_check(s)) {
+goto fail_exit;
+}
+} else if (s->sve_excp_el) {
 gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
   syn_sve_access_trap(), s->sve_excp_el);
-return false;
+goto fail_exit;
 }
 s->sve_access_checked = true;
 return fp_access_check(s);
+
+ fail_exit:
+/* Assert that we only raise one exception per instruction. */
+assert(!s->sve_access_checked);
+s->sve_access_checked = true;
+return false;
 }
 
 /*
-- 
2.34.1

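The resulting order of checks can be summarized as follows (a
pseudocode sketch of the decision tree above, using the helper names
from this series):

    /* sve_access_check(), after this patch: */
    if (PSTATE.SM || !HaveSVE()) {       /* streaming mode, or SME-only */
        assert(HaveSME());
        if (!sme_sm_enabled_check())     /* may raise an SME trap */
            return false;
    } else if (sve_excp_el) {
        SVEAccessTrap(sve_excp_el);      /* syn_sve_access_trap() */
        return false;
    }
    return fp_access_check();            /* FP enable checks still apply */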



[PATCH v5 18/45] target/arm: Implement SME ZERO

2022-07-06 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
v4: Fix ZA[] comment in helper_sme_zero.
---
 target/arm/helper-sme.h|  2 ++
 target/arm/sme.decode  |  4 
 target/arm/sme_helper.c| 25 +
 target/arm/translate-sme.c | 13 +
 4 files changed, 44 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 3bd48c235f..c4ee1f09e4 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -19,3 +19,5 @@
 
 DEF_HELPER_FLAGS_2(set_pstate_sm, TCG_CALL_NO_RWG, void, env, i32)
 DEF_HELPER_FLAGS_2(set_pstate_za, TCG_CALL_NO_RWG, void, env, i32)
+
+DEF_HELPER_FLAGS_3(sme_zero, TCG_CALL_NO_RWG, void, env, i32, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index c25c031a71..6e4483fdce 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -18,3 +18,7 @@
 #
 # This file is processed by scripts/decodetree.py
 #
+
+### SME Misc
+
+ZERO1100 00 001 000 imm:8
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index b215725594..eef2df73e1 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -59,3 +59,28 @@ void helper_set_pstate_za(CPUARMState *env, uint32_t i)
 memset(env->zarray, 0, sizeof(env->zarray));
 }
 }
+
+void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl)
+{
+uint32_t i;
+
+/*
+ * Special case clearing the entire ZA space.
+ * This falls into the CONSTRAINED UNPREDICTABLE zeroing of any
+ * parts of the ZA storage outside of SVL.
+ */
+if (imm == 0xff) {
+memset(env->zarray, 0, sizeof(env->zarray));
+return;
+}
+
+/*
+ * Recall that ZAnH.D[m] is spread across ZA[n+8*m],
+ * so each row is discontiguous within ZA[].
+ */
+for (i = 0; i < svl; i++) {
+if (imm & (1 << (i % 8))) {
+memset(&env->zarray[i], 0, svl);
+}
+}
+}
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
index 786c93fb2d..971504559b 100644
--- a/target/arm/translate-sme.c
+++ b/target/arm/translate-sme.c
@@ -33,3 +33,16 @@
  */
 
 #include "decode-sme.c.inc"
+
+
+static bool trans_ZERO(DisasContext *s, arg_ZERO *a)
+{
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (sme_za_enabled_check(s)) {
+gen_helper_sme_zero(cpu_env, tcg_constant_i32(a->imm),
+tcg_constant_i32(streaming_vec_reg_size(s)));
+}
+return true;
+}
-- 
2.34.1

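A worked example of the row selection above: because tiles of a given
element size interleave by rows, bit k of imm clears every row r with
r % 8 == k.  A host-side sketch of the same selection:

    /* For svl = 32 (SVL = 256 bits) and imm = 0x11 (bits 0 and 4 set),
     * the rows cleared are 0, 8, 16, 24 and 4, 12, 20, 28. */
    for (uint32_t i = 0; i < svl; i++) {
        if (imm & (1u << (i % 8))) {
            printf("row %u cleared\n", i);
        }
    }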



[PATCH v5 17/45] target/arm: Implement SME RDSVL, ADDSVL, ADDSPL

2022-07-06 Thread Richard Henderson
These SME instructions are nominally within the SVE decode space,
so we add them to sve.decode and translate-sve.c.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
v4: Add streaming_{vec,pred}_reg_size.
---
 target/arm/translate-a64.h | 12 
 target/arm/sve.decode  |  5 -
 target/arm/translate-sve.c | 38 ++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 02fb95e019..099d3d11d6 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -128,6 +128,12 @@ static inline int vec_full_reg_size(DisasContext *s)
 return s->vl;
 }
 
+/* Return the byte size of the vector register, SVL / 8. */
+static inline int streaming_vec_reg_size(DisasContext *s)
+{
+return s->svl;
+}
+
 /*
  * Return the offset info CPUARMState of the predicate vector register Pn.
  * Note for this purpose, FFR is P16.
@@ -143,6 +149,12 @@ static inline int pred_full_reg_size(DisasContext *s)
 return s->vl >> 3;
 }
 
+/* Return the byte size of the predicate register, SVL / 64.  */
+static inline int streaming_pred_reg_size(DisasContext *s)
+{
+return s->svl >> 3;
+}
+
 /*
  * Round up the size of a register to a size allowed by
  * the tcg vector infrastructure.  Any operation which uses this
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 908643d7d9..95af08c139 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -449,14 +449,17 @@ INDEX_ri0100 esz:2 1 imm:s5 010001 rn:5 rd:5
 # SVE index generation (register start, register increment)
 INDEX_rr0100 .. 1 . 010011 . .  @rd_rn_rm
 
-### SVE Stack Allocation Group
+### SVE / Streaming SVE Stack Allocation Group
 
 # SVE stack frame adjustment
 ADDVL   0100 001 . 01010 .. .   @rd_rn_i6
+ADDSVL  0100 001 . 01011 .. .   @rd_rn_i6
 ADDPL   0100 011 . 01010 .. .   @rd_rn_i6
+ADDSPL  0100 011 . 01011 .. .   @rd_rn_i6
 
 # SVE stack frame size
 RDVL0100 101 1 01010 imm:s6 rd:5
+RDSVL   0100 101 1 01011 imm:s6 rd:5
 
 ### SVE Bitwise Shift - Unpredicated Group
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 96e934c1ea..95016e49e9 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1286,6 +1286,19 @@ static bool trans_ADDVL(DisasContext *s, arg_ADDVL *a)
 return true;
 }
 
+static bool trans_ADDSVL(DisasContext *s, arg_ADDSVL *a)
+{
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (sme_enabled_check(s)) {
+TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+tcg_gen_addi_i64(rd, rn, a->imm * streaming_vec_reg_size(s));
+}
+return true;
+}
+
 static bool trans_ADDPL(DisasContext *s, arg_ADDPL *a)
 {
 if (!dc_isar_feature(aa64_sve, s)) {
@@ -1299,6 +1312,19 @@ static bool trans_ADDPL(DisasContext *s, arg_ADDPL *a)
 return true;
 }
 
+static bool trans_ADDSPL(DisasContext *s, arg_ADDSPL *a)
+{
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (sme_enabled_check(s)) {
+TCGv_i64 rd = cpu_reg_sp(s, a->rd);
+TCGv_i64 rn = cpu_reg_sp(s, a->rn);
+tcg_gen_addi_i64(rd, rn, a->imm * streaming_pred_reg_size(s));
+}
+return true;
+}
+
 static bool trans_RDVL(DisasContext *s, arg_RDVL *a)
 {
 if (!dc_isar_feature(aa64_sve, s)) {
@@ -1311,6 +1337,18 @@ static bool trans_RDVL(DisasContext *s, arg_RDVL *a)
 return true;
 }
 
+static bool trans_RDSVL(DisasContext *s, arg_RDSVL *a)
+{
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (sme_enabled_check(s)) {
+TCGv_i64 reg = cpu_reg(s, a->rd);
+tcg_gen_movi_i64(reg, a->imm * streaming_vec_reg_size(s));
+}
+return true;
+}
+
 /*
  *** SVE Compute Vector Address Group
  */
-- 
2.34.1

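The scaling here is easy to misread: the *SVL forms scale the immediate
by the streaming vector length in bytes (SVL / 8), while ADDSPL scales
by the streaming predicate length (SVL / 64).  For example, with
SVL = 512 bits (s->svl == 64):

    /* RDSVL  X0, #2      =>  X0  = 2 * 64       = 128
     * ADDSVL X1, X1, #-1 =>  X1 -= 64
     * ADDSPL X2, X2, #3  =>  X2 += 3 * (64 / 8) = 24  */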



[PATCH v7 07/14] KVM: Use gfn instead of hva for mmu_notifier_retry

2022-07-06 Thread Chao Peng
Currently, in the mmu_notifier invalidation path, an hva range is
recorded and later checked by mmu_notifier_retry_hva() from the page
fault path. However, for the soon-to-be-introduced private memory, a
page fault may not have an associated hva, so checking the gfn (gpa)
makes more sense. For the existing non-private memory case, gfn is
expected to continue to work.

The patch also fixes a potential bug in kvm_zap_gfn_range(), which
already passes gfns when calling kvm_inc/dec_notifier_count() in the
current code.

Signed-off-by: Chao Peng 
---
 arch/x86/kvm/mmu/mmu.c   |  2 +-
 include/linux/kvm_host.h | 18 --
 virt/kvm/kvm_main.c  |  6 +++---
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f7fa4c31b7c5..0d882fad4bc1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4182,7 +4182,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
return true;
 
return fault->slot &&
-  mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
+  mmu_notifier_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn);
 }
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault 
*fault)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0bdb6044e316..e9153b54e2a4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -767,8 +767,8 @@ struct kvm {
struct mmu_notifier mmu_notifier;
unsigned long mmu_notifier_seq;
long mmu_notifier_count;
-   unsigned long mmu_notifier_range_start;
-   unsigned long mmu_notifier_range_end;
+   gfn_t mmu_notifier_range_start;
+   gfn_t mmu_notifier_range_end;
 #endif
struct list_head devices;
u64 manual_dirty_log_protect;
@@ -1362,10 +1362,8 @@ void kvm_mmu_free_memory_cache(struct 
kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
 
-void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start,
-  unsigned long end);
-void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start,
-  unsigned long end);
+void kvm_inc_notifier_count(struct kvm *kvm, gfn_t start, gfn_t end);
+void kvm_dec_notifier_count(struct kvm *kvm, gfn_t start, gfn_t end);
 
 long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
@@ -1923,9 +1921,9 @@ static inline int mmu_notifier_retry(struct kvm *kvm, 
unsigned long mmu_seq)
return 0;
 }
 
-static inline int mmu_notifier_retry_hva(struct kvm *kvm,
+static inline int mmu_notifier_retry_gfn(struct kvm *kvm,
 unsigned long mmu_seq,
-unsigned long hva)
+gfn_t gfn)
 {
lockdep_assert_held(&kvm->mmu_lock);
/*
@@ -1935,8 +1933,8 @@ static inline int mmu_notifier_retry_hva(struct kvm *kvm,
 * positives, due to shortcuts when handing concurrent invalidations.
 */
if (unlikely(kvm->mmu_notifier_count) &&
-   hva >= kvm->mmu_notifier_range_start &&
-   hva < kvm->mmu_notifier_range_end)
+   gfn >= kvm->mmu_notifier_range_start &&
+   gfn < kvm->mmu_notifier_range_end)
return 1;
if (kvm->mmu_notifier_seq != mmu_seq)
return 1;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index da263c370d00..4d7f0e72366f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -536,8 +536,7 @@ static void kvm_mmu_notifier_invalidate_range(struct 
mmu_notifier *mn,
 
 typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
-typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
-unsigned long end);
+typedef void (*on_lock_fn_t)(struct kvm *kvm, gfn_t start, gfn_t end);
 
 typedef void (*on_unlock_fn_t)(struct kvm *kvm);
 
@@ -624,7 +623,8 @@ static __always_inline int __kvm_handle_hva_range(struct 
kvm *kvm,
locked = true;
KVM_MMU_LOCK(kvm);
if (!IS_KVM_NULL_FN(range->on_lock))
-   range->on_lock(kvm, range->start, 
range->end);
+   range->on_lock(kvm, gfn_range.start,
+   gfn_range.end);
if (IS_KVM_NULL_FN(range->handler))
break;
}
-- 
2.25.1

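For context, the check keeps the usual KVM fault pattern, now keyed by
gfn rather than hva.  A simplified sketch of the caller side (condensed
from the direct_page_fault() path; names as in the patch):

    mmu_seq = kvm->mmu_notifier_seq;     /* snapshot before translating */
    smp_rmb();
    pfn = __gfn_to_pfn_memslot(...);     /* may sleep, outside mmu_lock */

    write_lock(&kvm->mmu_lock);
    if (mmu_notifier_retry_gfn(kvm, mmu_seq, fault->gfn))
        goto retry;                      /* raced with an invalidation */
    /* ... install the mapping ... */
    write_unlock(&kvm->mmu_lock);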



[PATCH v5 24/45] target/arm: Implement FMOPA, FMOPS (non-widening)

2022-07-06 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h|  5 +++
 target/arm/sme.decode  |  9 ++
 target/arm/sme_helper.c| 63 ++
 target/arm/translate-sme.c | 32 +++
 4 files changed, 109 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 753e9e624c..f50d0fe1d6 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -120,3 +120,8 @@ DEF_HELPER_FLAGS_5(sme_addha_s, TCG_CALL_NO_RWG, void, ptr, 
ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index 8cb6c4053c..ba4774d174 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -64,3 +64,12 @@ ADDHA_s 1100 10 01000 0 ... ... . 000 .. 
   @adda_32
 ADDVA_s 1100 10 01000 1 ... ... . 000 ..@adda_32
 ADDHA_d 1100 11 01000 0 ... ... . 00 ...@adda_64
 ADDVA_d 1100 11 01000 1 ... ... . 00 ...@adda_64
+
+### SME Outer Product
+
+&op zad zn zm pm pn sub:bool
+@op_32   ... zm:5 pm:3 pn:3 zn:5 sub:1 .. zad:2 &op
+@op_64   ... zm:5 pm:3 pn:3 zn:5 sub:1 .  zad:3 &op
+
+FMOPA_s 1000 100 . ... ... . . 00 ..@op_32
+FMOPA_d 1000 110 . ... ... . . 0 ...@op_64
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index 10b7c1ad68..78ba34f3d2 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -25,6 +25,7 @@
 #include "exec/cpu_ldst.h"
 #include "exec/exec-all.h"
 #include "qemu/int128.h"
+#include "fpu/softfloat.h"
 #include "vec_internal.h"
 #include "sve_ldst_internal.h"
 
@@ -918,3 +919,65 @@ void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn,
 }
 }
 }
+
+void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
+ void *vpm, void *vst, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_maxsz(desc);
+uint32_t neg = simd_data(desc) << 31;
+uint16_t *pn = vpn, *pm = vpm;
+float_status fpst = *(float_status *)vst;
+
+set_default_nan_mode(true, &fpst);
+
+for (row = 0; row < oprsz; ) {
+uint16_t pa = pn[H2(row >> 4)];
+do {
+if (pa & 1) {
+void *vza_row = vza + tile_vslice_offset(row);
+uint32_t n = *(uint32_t *)(vzn + row) ^ neg;
+
+for (col = 0; col < oprsz; ) {
+uint16_t pb = pm[H2(col >> 4)];
+do {
+if (pb & 1) {
+uint32_t *a = vza_row + col;
+uint32_t *m = vzm + col;
+*a = float32_muladd(n, *m, *a, 0, &fpst);
+}
+col += 4;
+pb >>= 4;
+} while (col & 15);
+}
+}
+row += 4;
+pa >>= 4;
+} while (row & 15);
+}
+}
+
+void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, void *vpn,
+ void *vpm, void *vst, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
+uint64_t neg = (uint64_t)simd_data(desc) << 63;
+uint64_t *za = vza, *zn = vzn, *zm = vzm;
+uint8_t *pn = vpn, *pm = vpm;
+float_status fpst = *(float_status *)vst;
+
+set_default_nan_mode(true, &fpst);
+
+for (row = 0; row < oprsz; ++row) {
+if (pn[H1(row)] & 1) {
+uint64_t *za_row = &za[tile_vslice_index(row)];
+uint64_t n = zn[row] ^ neg;
+
+for (col = 0; col < oprsz; ++col) {
+if (pm[H1(col)] & 1) {
+uint64_t *a = &za_row[col];
+*a = float64_muladd(n, zm[col], *a, 0, &fpst);
+}
+}
+}
+}
+}
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
index d3b9cdd5c4..fa8f343a7d 100644
--- a/target/arm/translate-sme.c
+++ b/target/arm/translate-sme.c
@@ -298,3 +298,35 @@ TRANS_FEAT(ADDHA_s, aa64_sme, do_adda, a, MO_32, 
gen_helper_sme_addha_s)
 TRANS_FEAT(ADDVA_s, aa64_sme, do_adda, a, MO_32, gen_helper_sme_addva_s)
 TRANS_FEAT(ADDHA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addha_d)
 TRANS_FEAT(ADDVA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addva_d)
+
+static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
+gen_helper_gvec_5_ptr *fn)
+{
+int svl = streaming_vec_reg_size(s);

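As a reading aid for the helpers above: SVE predicates carry one bit
per byte, so for 32-bit elements only every fourth bit governs an
element, which is why the loops step row/col by 4 and shift the
predicate word right by 4.  A scalar reference model of the operation
(an editor's sketch, with pred_n()/pred_m() as illustrative helpers):

    /* ZA[r][c] += (sub ? -Zn[r] : Zn[r]) * Zm[c] for active (r, c);
     * the helper negates by flipping the sign bit of Zn[r]. */
    for (r = 0; r < elems; r++) {
        if (!pred_n(r)) continue;
        float32 n = zn[r] ^ (sub << 31);
        for (c = 0; c < elems; c++) {
            if (pred_m(c)) {
                za[r][c] = float32_muladd(n, zm[c], za[r][c], 0, &fpst);
            }
        }
    }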
[PATCH v7 06/14] KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS

2022-07-06 Thread Chao Peng
KVM_INTERNAL_MEM_SLOTS better reflects the fact that those slots are
not exposed to userspace, and avoids confusion with the real private
slots that are going to be added.

Signed-off-by: Chao Peng 
---
 arch/mips/include/asm/kvm_host.h | 2 +-
 arch/x86/include/asm/kvm_host.h  | 2 +-
 include/linux/kvm_host.h | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 717716cc51c5..45a978c805bc 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -85,7 +85,7 @@
 
 #define KVM_MAX_VCPUS  16
 /* memory slots that does not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS  0
+#define KVM_INTERNAL_MEM_SLOTS 0
 
 #define KVM_HALT_POLL_NS_DEFAULT 50
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index de5a149d0971..dae190e19fce 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -53,7 +53,7 @@
 #define KVM_MAX_VCPU_IDS (KVM_MAX_VCPUS * KVM_VCPU_ID_RATIO)
 
 /* memory slots that are not exposed to userspace */
-#define KVM_PRIVATE_MEM_SLOTS 3
+#define KVM_INTERNAL_MEM_SLOTS 3
 
 #define KVM_HALT_POLL_NS_DEFAULT 20
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 3b40f8d68fbb..0bdb6044e316 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -656,12 +656,12 @@ struct kvm_irq_routing_table {
 };
 #endif
 
-#ifndef KVM_PRIVATE_MEM_SLOTS
-#define KVM_PRIVATE_MEM_SLOTS 0
+#ifndef KVM_INTERNAL_MEM_SLOTS
+#define KVM_INTERNAL_MEM_SLOTS 0
 #endif
 
 #define KVM_MEM_SLOTS_NUM SHRT_MAX
-#define KVM_USER_MEM_SLOTS (KVM_MEM_SLOTS_NUM - KVM_PRIVATE_MEM_SLOTS)
+#define KVM_USER_MEM_SLOTS (KVM_MEM_SLOTS_NUM - KVM_INTERNAL_MEM_SLOTS)
 
 #ifndef __KVM_VCPU_MULTIPLE_ADDRESS_SPACE
 static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
-- 
2.25.1




[PATCH v5 21/45] target/arm: Export unpredicated ld/st from translate-sve.c

2022-07-06 Thread Richard Henderson
Add a TCGv_ptr base argument, which will be cpu_env for SVE.
We will reuse this for SME save and restore array insns.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.h |  3 +++
 target/arm/translate-sve.c | 48 --
 2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 2a7fe6e9e7..ad3762d1ac 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -195,4 +195,7 @@ void gen_gvec_xar(unsigned vece, uint32_t rd_ofs, uint32_t 
rn_ofs,
   uint32_t rm_ofs, int64_t shift,
   uint32_t opr_sz, uint32_t max_sz);
 
+void gen_sve_ldr(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int 
imm);
+void gen_sve_str(DisasContext *s, TCGv_ptr, int vofs, int len, int rn, int 
imm);
+
 #endif /* TARGET_ARM_TRANSLATE_A64_H */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 95016e49e9..fd1a173637 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4306,7 +4306,8 @@ TRANS_FEAT(UCVTF_dd, aa64_sve, gen_gvec_fpst_arg_zpz,
  * The load should begin at the address Rn + IMM.
  */
 
-static void do_ldr(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
+void gen_sve_ldr(DisasContext *s, TCGv_ptr base, int vofs,
+ int len, int rn, int imm)
 {
 int len_align = QEMU_ALIGN_DOWN(len, 8);
 int len_remain = len % 8;
@@ -4332,7 +4333,7 @@ static void do_ldr(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 t0 = tcg_temp_new_i64();
 for (i = 0; i < len_align; i += 8) {
 tcg_gen_qemu_ld_i64(t0, clean_addr, midx, MO_LEUQ);
-tcg_gen_st_i64(t0, cpu_env, vofs + i);
+tcg_gen_st_i64(t0, base, vofs + i);
 tcg_gen_addi_i64(clean_addr, clean_addr, 8);
 }
 tcg_temp_free_i64(t0);
@@ -4345,6 +4346,12 @@ static void do_ldr(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 clean_addr = new_tmp_a64_local(s);
 tcg_gen_mov_i64(clean_addr, t0);
 
+if (base != cpu_env) {
+TCGv_ptr b = tcg_temp_local_new_ptr();
+tcg_gen_mov_ptr(b, base);
+base = b;
+}
+
 gen_set_label(loop);
 
 t0 = tcg_temp_new_i64();
@@ -4352,7 +4359,7 @@ static void do_ldr(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 tcg_gen_addi_i64(clean_addr, clean_addr, 8);
 
 tp = tcg_temp_new_ptr();
-tcg_gen_add_ptr(tp, cpu_env, i);
+tcg_gen_add_ptr(tp, base, i);
 tcg_gen_addi_ptr(i, i, 8);
 tcg_gen_st_i64(t0, tp, vofs);
 tcg_temp_free_ptr(tp);
@@ -4360,6 +4367,11 @@ static void do_ldr(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 
 tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop);
 tcg_temp_free_ptr(i);
+
+if (base != cpu_env) {
+tcg_temp_free_ptr(base);
+assert(len_remain == 0);
+}
 }
 
 /*
@@ -4388,13 +4400,14 @@ static void do_ldr(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 default:
 g_assert_not_reached();
 }
-tcg_gen_st_i64(t0, cpu_env, vofs + len_align);
+tcg_gen_st_i64(t0, base, vofs + len_align);
 tcg_temp_free_i64(t0);
 }
 }
 
 /* Similarly for stores.  */
-static void do_str(DisasContext *s, uint32_t vofs, int len, int rn, int imm)
+void gen_sve_str(DisasContext *s, TCGv_ptr base, int vofs,
+ int len, int rn, int imm)
 {
 int len_align = QEMU_ALIGN_DOWN(len, 8);
 int len_remain = len % 8;
@@ -4420,7 +4433,7 @@ static void do_str(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 
 t0 = tcg_temp_new_i64();
 for (i = 0; i < len_align; i += 8) {
-tcg_gen_ld_i64(t0, cpu_env, vofs + i);
+tcg_gen_ld_i64(t0, base, vofs + i);
 tcg_gen_qemu_st_i64(t0, clean_addr, midx, MO_LEUQ);
 tcg_gen_addi_i64(clean_addr, clean_addr, 8);
 }
@@ -4434,11 +4447,17 @@ static void do_str(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 clean_addr = new_tmp_a64_local(s);
 tcg_gen_mov_i64(clean_addr, t0);
 
+if (base != cpu_env) {
+TCGv_ptr b = tcg_temp_local_new_ptr();
+tcg_gen_mov_ptr(b, base);
+base = b;
+}
+
 gen_set_label(loop);
 
 t0 = tcg_temp_new_i64();
 tp = tcg_temp_new_ptr();
-tcg_gen_add_ptr(tp, cpu_env, i);
+tcg_gen_add_ptr(tp, base, i);
 tcg_gen_ld_i64(t0, tp, vofs);
 tcg_gen_addi_ptr(i, i, 8);
 tcg_temp_free_ptr(tp);
@@ -4449,12 +4468,17 @@ static void do_str(DisasContext *s, uint32_t vofs, int 
len, int rn, int imm)
 
 tcg_gen_brcondi_ptr(TCG_COND_LTU, i, len_align, loop);
 tcg_temp_free_ptr(i);
+
+if (base != cpu_env) {
+tcg_temp_free_ptr(base);

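One detail worth calling out in this patch: ordinary TCG temporaries do
not keep their values across branches, so once the variable-length loop
(gen_set_label()/tcg_gen_brcondi_ptr()) is involved, the incoming base
pointer has to be copied into a local temporary first.  That is what
the tcg_temp_local_new_ptr() hunks do:

    /* TCG rule of thumb (editor's sketch):
     *   t  = tcg_temp_new_ptr();        -- value dies at a branch
     *   lt = tcg_temp_local_new_ptr();  -- value survives labels and
     *                                      branches, at the cost of a
     *                                      stack slot */
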
[PATCH v5 23/45] target/arm: Implement SME ADDHA, ADDVA

2022-07-06 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
v4: Drop restrict.
---
 target/arm/helper-sme.h|  5 +++
 target/arm/sme.decode  | 11 +
 target/arm/sme_helper.c| 90 ++
 target/arm/translate-sme.c | 31 +
 4 files changed, 137 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 95f6e88bdd..753e9e624c 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -115,3 +115,8 @@ DEF_HELPER_FLAGS_5(sme_st1q_be_h_mte, TCG_CALL_NO_WG, void, 
env, ptr, ptr, tl, i
 DEF_HELPER_FLAGS_5(sme_st1q_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
 DEF_HELPER_FLAGS_5(sme_st1q_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
 DEF_HELPER_FLAGS_5(sme_st1q_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_addha_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index f1ebd857a5..8cb6c4053c 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -53,3 +53,14 @@ LDST1   111 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 
za_imm:4  \
 
 LDR 111 100 0 00 .. 000 . 0 @ldstr
 STR 111 100 1 00 .. 000 . 0 @ldstr
+
+### SME Add Vector to Array
+
+&adda   zad zn pm pn
+@adda_32 .. . . pm:3 pn:3 zn:5 ... zad:2&adda
+@adda_64 .. . . pm:3 pn:3 zn:5 ..  zad:3&adda
+
+ADDHA_s 1100 10 01000 0 ... ... . 000 ..@adda_32
+ADDVA_s 1100 10 01000 1 ... ... . 000 ..@adda_32
+ADDHA_d 1100 11 01000 0 ... ... . 00 ...@adda_64
+ADDVA_d 1100 11 01000 1 ... ... . 00 ...@adda_64
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index e8895143c1..10b7c1ad68 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -828,3 +828,93 @@ DO_ST(q, _be, MO_128)
 DO_ST(q, _le, MO_128)
 
 #undef DO_ST
+
+void HELPER(sme_addha_s)(void *vzda, void *vzn, void *vpn,
+ void *vpm, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
+uint64_t *pn = vpn, *pm = vpm;
+uint32_t *zda = vzda, *zn = vzn;
+
+for (row = 0; row < oprsz; ) {
+uint64_t pa = pn[row >> 4];
+do {
+if (pa & 1) {
+for (col = 0; col < oprsz; ) {
+uint64_t pb = pm[col >> 4];
+do {
+if (pb & 1) {
+zda[tile_vslice_index(row) + col] += zn[col];
+}
+pb >>= 4;
+} while (++col & 15);
+}
+}
+pa >>= 4;
+} while (++row & 15);
+}
+}
+
+void HELPER(sme_addha_d)(void *vzda, void *vzn, void *vpn,
+ void *vpm, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
+uint8_t *pn = vpn, *pm = vpm;
+uint64_t *zda = vzda, *zn = vzn;
+
+for (row = 0; row < oprsz; ++row) {
+if (pn[H1(row)] & 1) {
+for (col = 0; col < oprsz; ++col) {
+if (pm[H1(col)] & 1) {
+zda[tile_vslice_index(row) + col] += zn[col];
+}
+}
+}
+}
+}
+
+void HELPER(sme_addva_s)(void *vzda, void *vzn, void *vpn,
+ void *vpm, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 4;
+uint64_t *pn = vpn, *pm = vpm;
+uint32_t *zda = vzda, *zn = vzn;
+
+for (row = 0; row < oprsz; ) {
+uint64_t pa = pn[row >> 4];
+do {
+if (pa & 1) {
+uint32_t zn_row = zn[row];
+for (col = 0; col < oprsz; ) {
+uint64_t pb = pm[col >> 4];
+do {
+if (pb & 1) {
+zda[tile_vslice_index(row) + col] += zn_row;
+}
+pb >>= 4;
+} while (++col & 15);
+}
+}
+pa >>= 4;
+} while (++row & 15);
+}
+}
+
+void HELPER(sme_addva_d)(void *vzda, void *vzn, void *vpn,
+ void *vpm, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
+uint8_t *pn = vpn, *pm = vpm;
+uint64_t *zda = vzda, *zn = vzn;
+
+for (row = 0; row < oprsz; ++row) {
+if (pn[H1(row)] & 1) {
+uint64_t zn_row = zn[row];
+for (col = 0; col < oprsz; ++col) {
+if (pm[H1(col)] & 1) {
+zda[tile_vslice_index(row) + col] += zn_row;
+}
+}
+   

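The predicate indexing in the 32-bit helpers above is compact; spelled
out, each 64-bit predicate word covers 16 four-byte elements, and the
governing bit for element i is bit 4 * (i % 16).  The loops are an
incremental form of this direct test (word/active are illustrative
names):

    uint64_t word = pn[row >> 4];               /* 64 bits = 16 elements */
    bool active   = (word >> ((row & 15) * 4)) & 1;
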
[PATCH v7 09/14] KVM: Extend the memslot to support fd-based private memory

2022-07-06 Thread Chao Peng
Extend the memslot definition to provide guest private memory through a
file descriptor (fd) instead of userspace_addr (hva). Such guest private
memory (fd) may never be mapped into userspace, so no userspace_addr
(hva) can be used. Instead, add two new fields
(private_fd/private_offset), which together with the existing
memory_size represent the private memory range. Such a memslot can
still have the existing userspace_addr (hva). In use, a single memslot
can maintain both private memory through the private fd
(private_fd/private_offset) and shared memory through the hva
(userspace_addr). Whether the private or the shared part is effective
for a given guest GPA is maintained by other KVM code.

Since there is no userspace mapping for a private fd, we cannot rely on
get_user_pages() to get the pfn in KVM. Instead, we add a new
memfile_notifier to the memslot and rely on it to get the pfn by
interacting with the memory backing store's callbacks through the
fd/offset.

This new extension is indicated by a new flag, KVM_MEM_PRIVATE. At
compile time, a new config option HAVE_KVM_PRIVATE_MEM is added; right
now it is selected on X86_64 for Intel TDX usage.

To keep KVM simple, internally we use a binary-compatible alias struct,
kvm_user_mem_region, to handle both the normal and the '_ext' variants.

Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 Documentation/virt/kvm/api.rst | 38 
 arch/x86/kvm/Kconfig   |  2 ++
 arch/x86/kvm/x86.c |  2 +-
 include/linux/kvm_host.h   | 13 +--
 include/uapi/linux/kvm.h   | 28 +++
 virt/kvm/Kconfig   |  3 ++
 virt/kvm/kvm_main.c| 64 +-
 7 files changed, 132 insertions(+), 18 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index bafaeedd455c..4f27c973a952 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1319,7 +1319,7 @@ yet and must be cleared on entry.
 :Capability: KVM_CAP_USER_MEMORY
 :Architectures: all
 :Type: vm ioctl
-:Parameters: struct kvm_userspace_memory_region (in)
+:Parameters: struct kvm_userspace_memory_region(_ext) (in)
 :Returns: 0 on success, -1 on error
 
 ::
@@ -1332,9 +1332,18 @@ yet and must be cleared on entry.
__u64 userspace_addr; /* start of the userspace allocated memory */
   };
 
+  struct kvm_userspace_memory_region_ext {
+   struct kvm_userspace_memory_region region;
+   __u64 private_offset;
+   __u32 private_fd;
+   __u32 pad1;
+   __u64 pad2[14];
+};
+
   /* for kvm_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES  (1UL << 0)
   #define KVM_MEM_READONLY (1UL << 1)
+  #define KVM_MEM_PRIVATE  (1UL << 2)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1365,12 +1374,27 @@ It is recommended that the lower 21 bits of 
guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
-writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
-to make a new slot read-only.  In this case, writes to this memory will be
-posted to userspace as KVM_EXIT_MMIO exits.
+kvm_userspace_memory_region_ext includes all the kvm_userspace_memory_region
+fields. It also includes additional fields for some specific features. See
+the description of the flags field below for more information. It's
+recommended to use kvm_userspace_memory_region_ext in new userspace code.
+
+The flags field supports the following flags:
+
+- KVM_MEM_LOG_DIRTY_PAGES can be set to instruct KVM to keep track of writes to
+  memory within the slot.  See the KVM_GET_DIRTY_LOG ioctl to know how to use it.
+
+- KVM_MEM_READONLY can be set, if the KVM_CAP_READONLY_MEM capability allows it,
+  to make a new slot read-only.  In this case, writes to this memory will be
+  posted to userspace as KVM_EXIT_MMIO exits.
+
+- KVM_MEM_PRIVATE can be set to indicate a new slot has private memory backed by
+  a file descriptor (fd); the content of the private memory is invisible to
+  userspace. In this case, userspace should use private_fd/private_offset in
+  kvm_userspace_memory_region_ext to instruct KVM to provide private memory to
+  the guest. Userspace should guarantee not to map the same pfn indicated by
+  private_fd/private_offset to different gfns with multiple memslots. Failure
+  to do this may result in undefined behavior.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region are automatically reflected into the guest.  For example, an
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index e3cbd7706136..1f160801e2a7 100644
--- a/arch/x86/kvm/Kconfig

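To make the new ABI concrete, here is a rough sketch of how userspace
might register a private slot with the fields added above.  The
MFD_INACCESSIBLE memfd flag comes from an earlier patch in this series;
treat the exact flag and ioctl plumbing as assumptions:

    int memfd = memfd_create("guest-private", MFD_INACCESSIBLE);
    ftruncate(memfd, 0x200000);

    struct kvm_userspace_memory_region_ext region = {
        .region = {
            .slot            = 0,
            .flags           = KVM_MEM_PRIVATE,
            .guest_phys_addr = 0x100000000ULL,
            .memory_size     = 0x200000,
            .userspace_addr  = (__u64)shared_va,  /* shared part, hva */
        },
        .private_fd     = memfd,
        .private_offset = 0,
    };
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
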
[PATCH v5 20/45] target/arm: Implement SME LD1, ST1

2022-07-06 Thread Richard Henderson
We cannot reuse the SVE functions for LD[1-4] and ST[1-4],
because those functions accept only a Zreg register number.
For SME, we want to pass a pointer into ZA storage.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h|  82 +
 target/arm/sme.decode  |   9 +
 target/arm/sme_helper.c| 595 +
 target/arm/translate-sme.c |  70 +
 4 files changed, 756 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 154bc73d2e..95f6e88bdd 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -33,3 +33,85 @@ DEF_HELPER_FLAGS_4(sme_mova_cz_d, TCG_CALL_NO_RWG, void, 
ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sme_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sme_mova_cz_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sme_mova_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(sme_ld1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1b_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_ld1h_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1h_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1h_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1h_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1h_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1h_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1h_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1h_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_ld1s_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1s_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1s_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1s_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1s_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1s_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1s_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1s_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_ld1d_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1d_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1d_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1d_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1d_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1d_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1d_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1d_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_ld1q_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1q_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1q_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1q_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_ld1q_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1q_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1q_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_ld1q_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_st1b_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_st1b_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_st1b_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_st1b_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_st1h_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_st1h_le_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_st1h_be_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_st1h_le_v, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_5(sme_st1h_be_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_st1h_le_h_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_st1h_be_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+DEF_HELPER_FLAGS_5(sme_st1h_le_v_mte, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, 
i32)
+
+DEF_HELPER_FLAGS_5(sme_st1s_be_h, TCG_CALL_NO_WG, void, env, ptr, ptr, tl, i32)

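The helper list is long but systematic: each name encodes the element
size (b/h/s/d/q), the endianness (be/le), whether the slice is
horizontal or vertical (h/v), and whether MTE checking is included
(_mte).  For example:

    /* sme_ld1s_be_v_mte: load 32-bit ("s") elements, big-endian,
     * into a vertical tile slice, with MTE tag checks enabled. */
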
[PATCH v5 27/45] target/arm: Implement SME integer outer product

2022-07-06 Thread Richard Henderson
This is SMOPA, SUMOPA, USMOPA_s, UMOPA, for both Int8 and Int16.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h| 16 
 target/arm/sme.decode  | 10 +
 target/arm/sme_helper.c| 82 ++
 target/arm/translate-sme.c | 10 +
 4 files changed, 118 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 4d5d05db3a..d2d544a696 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -129,3 +129,19 @@ DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_6(sme_bfmopa, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_smopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_umopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_sumopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_usmopa_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_smopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_umopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_sumopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_usmopa_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index e8d27fd8a0..628804e37a 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -76,3 +76,13 @@ FMOPA_d 1000 110 . ... ... . . 0 ... 
   @op_64
 
 BFMOPA  1001 100 . ... ... . . 00 ..@op_32
 FMOPA_h 1001 101 . ... ... . . 00 ..@op_32
+
+SMOPA_s 101 0 10 0 . ... ... . . 00 ..  @op_32
+SUMOPA_s101 0 10 1 . ... ... . . 00 ..  @op_32
+USMOPA_s101 1 10 0 . ... ... . . 00 ..  @op_32
+UMOPA_s 101 1 10 1 . ... ... . . 00 ..  @op_32
+
+SMOPA_d 101 0 11 0 . ... ... . . 0 ...  @op_64
+SUMOPA_d101 0 11 1 . ... ... . . 0 ...  @op_64
+USMOPA_d101 1 11 0 . ... ... . . 0 ...  @op_64
+UMOPA_d 101 1 11 1 . ... ... . . 0 ...  @op_64
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index e92f53ecab..be5e79af70 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -1101,3 +1101,85 @@ void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, 
void *vpn,
 } while (row & 15);
 }
 }
+
+typedef uint64_t IMOPFn(uint64_t, uint64_t, uint64_t, uint8_t, bool);
+
+static inline void do_imopa(uint64_t *za, uint64_t *zn, uint64_t *zm,
+uint8_t *pn, uint8_t *pm,
+uint32_t desc, IMOPFn *fn)
+{
+intptr_t row, col, oprsz = simd_oprsz(desc) / 8;
+bool neg = simd_data(desc);
+
+for (row = 0; row < oprsz; ++row) {
+uint8_t pa = pn[H1(row)];
+uint64_t *za_row = &za[tile_vslice_index(row)];
+uint64_t n = zn[row];
+
+for (col = 0; col < oprsz; ++col) {
+uint8_t pb = pm[H1(col)];
+uint64_t *a = &za_row[col];
+
+*a = fn(n, zm[col], *a, pa & pb, neg);
+}
+}
+}
+
+#define DEF_IMOP_32(NAME, NTYPE, MTYPE) \
+static uint64_t NAME(uint64_t n, uint64_t m, uint64_t a, uint8_t p, bool neg) \
+{   \
+uint32_t sum0 = 0, sum1 = 0;\
+/* Apply P to N as a mask, making the inactive elements 0. */   \
+n &= expand_pred_b(p);  \
+sum0 += (NTYPE)(n >> 0) * (MTYPE)(m >> 0);  \
+sum0 += (NTYPE)(n >> 8) * (MTYPE)(m >> 8);  \
+sum0 += (NTYPE)(n >> 16) * (MTYPE)(m >> 16);\
+sum0 += (NTYPE)(n >> 24) * (MTYPE)(m >> 24);\
+sum1 += (NTYPE)(n >> 32) * (MTYPE)(m >> 32);\
+sum1 += (NTYPE)(n >> 40) * (MTYPE)(m >> 40);\
+sum1 += (NTYPE)(n >> 48) * (MTYPE)(m >> 48);\
+sum1 += (NTYPE)(n >> 56) * (MTYPE)(m >> 56);\
+if (neg) {  \
+sum0 = (uint32_t)a - sum0, sum1 = (uint32_t)(a >> 32) - sum1;   \
+} else {\
+sum0 = (uint32_t)a + sum0, sum1 = (uint32_t)(a >> 32) + sum1;   \
+}  

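The masking step above leans on expand_pred_b(), which widens each
predicate bit into a full byte of mask: p = 0x05 (bits 0 and 2 set)
expands to 0x0000000000ff00ff, so only bytes 0 and 2 of n survive.  A
reference sketch of that expansion:

    /* Illustrative only; QEMU's version is table-driven. */
    static uint64_t expand_pred_b_ref(uint8_t p)
    {
        uint64_t m = 0;
        for (int i = 0; i < 8; i++) {
            if (p & (1 << i)) {
                m |= 0xffull << (i * 8);
            }
        }
        return m;
    }
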
[PATCH v5 30/45] target/arm: Implement SCLAMP, UCLAMP

2022-07-06 Thread Richard Henderson
This is an SVE instruction that operates using the SVE vector
length but that is present only if SME is implemented.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.h|  18 +++
 target/arm/sve.decode  |   5 ++
 target/arm/translate-sve.c | 102 +
 target/arm/vec_helper.c|  24 +
 4 files changed, 149 insertions(+)

diff --git a/target/arm/helper.h b/target/arm/helper.h
index 3a8ce42ab0..92f36d9dbb 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -1019,6 +1019,24 @@ DEF_HELPER_FLAGS_6(gvec_bfmlal, TCG_CALL_NO_RWG,
 DEF_HELPER_FLAGS_6(gvec_bfmlal_idx, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_5(gvec_sclamp_b, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sclamp_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sclamp_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_sclamp_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(gvec_uclamp_b, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uclamp_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uclamp_s, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(gvec_uclamp_d, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
+
 #ifdef TARGET_AARCH64
 #include "helper-a64.h"
 #include "helper-sve.h"
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index a9e48f07b4..14b3a69c36 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1695,3 +1695,8 @@ PSEL        00100101 .. 1 100 .. 01 .... 0 .... 0 ....  \
                 @psel esz=2 imm=%psel_imm_s
 PSEL            00100101 .1 1 000 .. 01 .... 0 .... 0 ....  \
                 @psel esz=3 imm=%psel_imm_d
+
+### SVE clamp
+
+SCLAMP  01000100 .. 0 . 11 . .  @rda_rn_rm
+UCLAMP  01000100 .. 0 . 110001 . .  @rda_rn_rm
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 9ed3b267fd..41f8b12259 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -7478,3 +7478,105 @@ static bool trans_PSEL(DisasContext *s, arg_psel *a)
 tcg_temp_free_ptr(ptr);
 return true;
 }
+
+static void gen_sclamp_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_i32 a)
+{
+tcg_gen_smax_i32(d, a, n);
+tcg_gen_smin_i32(d, d, m);
+}
+
+static void gen_sclamp_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_i64 a)
+{
+tcg_gen_smax_i64(d, a, n);
+tcg_gen_smin_i64(d, d, m);
+}
+
+static void gen_sclamp_vec(unsigned vece, TCGv_vec d, TCGv_vec n,
+   TCGv_vec m, TCGv_vec a)
+{
+tcg_gen_smax_vec(vece, d, a, n);
+tcg_gen_smin_vec(vece, d, d, m);
+}
+
+static void gen_sclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
+   uint32_t a, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop[] = {
+INDEX_op_smin_vec, INDEX_op_smax_vec, 0
+};
+static const GVecGen4 ops[4] = {
+{ .fniv = gen_sclamp_vec,
+  .fno  = gen_helper_gvec_sclamp_b,
+  .opt_opc = vecop,
+  .vece = MO_8 },
+{ .fniv = gen_sclamp_vec,
+  .fno  = gen_helper_gvec_sclamp_h,
+  .opt_opc = vecop,
+  .vece = MO_16 },
+{ .fni4 = gen_sclamp_i32,
+  .fniv = gen_sclamp_vec,
+  .fno  = gen_helper_gvec_sclamp_s,
+  .opt_opc = vecop,
+  .vece = MO_32 },
+{ .fni8 = gen_sclamp_i64,
+  .fniv = gen_sclamp_vec,
+  .fno  = gen_helper_gvec_sclamp_d,
+  .opt_opc = vecop,
+  .vece = MO_64,
+  .prefer_i64 = TCG_TARGET_REG_BITS == 64 }
+};
+tcg_gen_gvec_4(d, n, m, a, oprsz, maxsz, &ops[vece]);
+}
+
+TRANS_FEAT(SCLAMP, aa64_sme, gen_gvec_fn_arg_, gen_sclamp, a)
+
+static void gen_uclamp_i32(TCGv_i32 d, TCGv_i32 n, TCGv_i32 m, TCGv_i32 a)
+{
+tcg_gen_umax_i32(d, a, n);
+tcg_gen_umin_i32(d, d, m);
+}
+
+static void gen_uclamp_i64(TCGv_i64 d, TCGv_i64 n, TCGv_i64 m, TCGv_i64 a)
+{
+tcg_gen_umax_i64(d, a, n);
+tcg_gen_umin_i64(d, d, m);
+}
+
+static void gen_uclamp_vec(unsigned vece, TCGv_vec d, TCGv_vec n,
+   TCGv_vec m, TCGv_vec a)
+{
+tcg_gen_umax_vec(vece, d, a, n);
+tcg_gen_umin_vec(vece, d, d, m);
+}
+
+static void gen_uclamp(unsigned vece, uint32_t d, uint32_t n, uint32_t m,
+   uint32_t a, uint32_t oprsz, uint32_t maxsz)
+{
+static const TCGOpcode vecop[] = {
+INDEX_op_umin_vec, INDEX_op_umax_vec, 0
+};
+static const GVecGen4 ops[4] = {
+{ .fniv = gen_uclamp_vec,
+  .fno  = gen_helper_gvec_uclamp_b,
+  .opt_opc = vecop,
+  .vece = MO_8 },

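Per element, the max-then-min pair above computes an ordinary signed
clamp of the accumulator into [n, m]; the same operation in scalar form
(a sketch):

    static int32_t sclamp_s(int32_t n, int32_t m, int32_t a)
    {
        int32_t t = a > n ? a : n;    /* smax(a, n) */
        return t < m ? t : m;         /* smin(t, m) */
    }
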
[PATCH v5 19/45] target/arm: Implement SME MOVA

2022-07-06 Thread Richard Henderson
We can reuse the SVE functions for implementing moves to/from
horizontal tile slices, but we need new ones for moves to/from
vertical tile slices.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h|  12 +++
 target/arm/helper-sve.h|   2 +
 target/arm/translate-a64.h |   8 ++
 target/arm/translate.h |   5 ++
 target/arm/sme.decode  |  15 
 target/arm/sme_helper.c| 151 -
 target/arm/sve_helper.c|  12 +++
 target/arm/translate-sme.c | 127 +++
 8 files changed, 331 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index c4ee1f09e4..154bc73d2e 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -21,3 +21,15 @@ DEF_HELPER_FLAGS_2(set_pstate_sm, TCG_CALL_NO_RWG, void, 
env, i32)
 DEF_HELPER_FLAGS_2(set_pstate_za, TCG_CALL_NO_RWG, void, env, i32)
 
 DEF_HELPER_FLAGS_3(sme_zero, TCG_CALL_NO_RWG, void, env, i32, i32)
+
+/* Move to/from vertical array slices, i.e. columns, so 'c'.  */
+DEF_HELPER_FLAGS_4(sme_mova_cz_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_zc_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_cz_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_zc_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_cz_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_zc_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_cz_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_zc_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_cz_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sme_mova_zc_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index dc629f851a..ab0333400f 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -325,6 +325,8 @@ DEF_HELPER_FLAGS_5(sve_sel_zpzz_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_sel_zpzz_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_5(sve_sel_zpzz_q, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_5(sve2_addp_zpzz_b, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 099d3d11d6..2a7fe6e9e7 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -178,6 +178,14 @@ static inline int pred_gvec_reg_size(DisasContext *s)
 return size_for_gvec(pred_full_reg_size(s));
 }
 
+/* Return a newly allocated pointer to the predicate register.  */
+static inline TCGv_ptr pred_full_reg_ptr(DisasContext *s, int regno)
+{
+TCGv_ptr ret = tcg_temp_new_ptr();
+tcg_gen_addi_ptr(ret, cpu_env, pred_full_reg_offset(s, regno));
+return ret;
+}
+
 bool disas_sve(DisasContext *, uint32_t);
 bool disas_sme(DisasContext *, uint32_t);
 
diff --git a/target/arm/translate.h b/target/arm/translate.h
index e2e619dab2..af5d4a7086 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -156,6 +156,11 @@ static inline int plus_2(DisasContext *s, int x)
 return x + 2;
 }
 
+static inline int plus_12(DisasContext *s, int x)
+{
+return x + 12;
+}
+
 static inline int times_2(DisasContext *s, int x)
 {
 return x * 2;
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index 6e4483fdce..241b4895b7 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -22,3 +22,18 @@
 ### SME Misc
 
 ZERO1100 00 001 000 imm:8
+
+### SME Move into/from Array
+
+%mova_rs13:2 !function=plus_12
+&mova   esz rs pg zr za_imm v:bool to_vec:bool
+
+MOVA1100 esz:2 0 0 v:1 .. pg:3 zr:5 0 za_imm:4  \
+&mova to_vec=0 rs=%mova_rs
+MOVA1100 110 1 v:1 .. pg:3 zr:5 0 za_imm:4  \
+&mova to_vec=0 rs=%mova_rs esz=4
+
+MOVA1100 esz:2 1 0 v:1 .. pg:3 0 za_imm:4 zr:5  \
+&mova to_vec=1 rs=%mova_rs
+MOVA1100 111 1 v:1 .. pg:3 0 za_imm:4 zr:5  \
+&mova to_vec=1 rs=%mova_rs esz=4
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index eef2df73e1..5389e361c3 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -19,8 +19,10 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
-#include "internals.h"
+#include "tcg/tcg-gvec-desc.h"
 #include "exec/helper-proto.h"
+#include "qemu/int128.h"
+#include "vec_internal.h"
 
 /* ResetSVEState */
 void arm_reset_sve_state(CPUARMState *env)
@@ -84,3 +86,150 @@ void helper_sme_zero(CPUARMState *env, uint32_t imm, 
uint32_t svl)
 }
 }
 }
+
+
+/*
+ * When considering the ZA storage as an array of elements of
+ * type T, the index within that array of the Nth element of

[PATCH v5 26/45] target/arm: Implement FMOPA, FMOPS (widening)

2022-07-06 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h|  2 ++
 target/arm/sme.decode  |  1 +
 target/arm/sme_helper.c| 68 ++
 target/arm/translate-sme.c |  1 +
 4 files changed, 72 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index 1d68fb8c74..4d5d05db3a 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -121,6 +121,8 @@ DEF_HELPER_FLAGS_5(sme_addva_s, TCG_CALL_NO_RWG, void, ptr, 
ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index afd9c0dffd..e8d27fd8a0 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -75,3 +75,4 @@ FMOPA_s 1000 100 . ... ... . . 00 ..  
  @op_32
 FMOPA_d 1000 110 . ... ... . . 0 ...@op_64
 
 BFMOPA  1001 100 . ... ... . . 00 ..@op_32
+FMOPA_h 1001 101 . ... ... . . 00 ..@op_32
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index 4b437bb913..e92f53ecab 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -998,6 +998,74 @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, 
uint32_t pg, uint32_t neg)
 return pair;
 }
 
+static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2,
+  float_status *s_std, float_status *s_odd)
+{
+float64 e1r = float16_to_float64(e1 & 0x, true, s_std);
+float64 e1c = float16_to_float64(e1 >> 16, true, s_std);
+float64 e2r = float16_to_float64(e2 & 0x, true, s_std);
+float64 e2c = float16_to_float64(e2 >> 16, true, s_std);
+float64 t64;
+float32 t32;
+
+/*
+ * The ARM pseudocode function FPDot performs both multiplies
+ * and the add with a single rounding operation.  Emulate this
+ * by performing the first multiply in round-to-odd, then doing
+ * the second multiply as fused multiply-add, and rounding to
+ * float32 all in one step.
+ */
+t64 = float64_mul(e1r, e2r, s_odd);
+t64 = float64r32_muladd(e1c, e2c, t64, 0, s_std);
+
+/* This conversion is exact, because we've already rounded. */
+t32 = float64_to_float32(t64, s_std);
+
+/* The final accumulation step is not fused. */
+return float32_add(sum, t32, s_std);
+}
+
+void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn,
+ void *vpm, void *vst, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_maxsz(desc);
+uint32_t neg = simd_data(desc) << 15;
+uint16_t *pn = vpn, *pm = vpm;
+float_status fpst_odd, fpst_std = *(float_status *)vst;
+
+set_default_nan_mode(true, &fpst_std);
+fpst_odd = fpst_std;
+set_float_rounding_mode(float_round_to_odd, &fpst_odd);
+
+for (row = 0; row < oprsz; ) {
+uint16_t pa = pn[H2(row >> 4)];
+do {
+void *vza_row = vza + tile_vslice_offset(row);
+uint32_t n = *(uint32_t *)(vzn + row);
+
+n = f16mop_adj_pair(n, pa, neg);
+
+for (col = 0; col < oprsz; ) {
+uint16_t pb = pm[H2(col >> 4)];
+do {
+if ((pa & 0b0101) == 0b0101 || (pb & 0b0101) == 0b0101) {
+uint32_t *a = vza_row + col;
+uint32_t m = *(uint32_t *)(vzm + col);
+
+m = f16mop_adj_pair(m, pb, neg);
+*a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd);
+}
+col += 4;
+pb >>= 4;
+} while (col & 15);
+}
+row += 4;
+pa >>= 4;
+} while (row & 15);
+}
+}
+
 void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
 void *vpm, uint32_t desc)
 {
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
index ecb7583c55..c2953b22ce 100644
--- a/target/arm/translate-sme.c
+++ b/target/arm/translate-sme.c
@@ -355,6 +355,7 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, 
MemOp esz,
 return true;
 }
 
+TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, MO_32, 
gen_helper_sme_fmopa_h)
 TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, 
gen_helper_sme_fmopa_s)
 TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, 
gen_helper_sme_fmopa_d)
 
-- 
2.34.1

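The round-to-odd step above is worth a note: rounding the first product
to odd keeps the inexact ("sticky") information in the least
significant bit, so the single correct rounding in float64r32_muladd()
cannot suffer a double-rounding error.  In outline:

    /* round-to-odd (von Neumann rounding), sketched:
     *     r = truncate(x);
     *     if (x != r) r |= 1;       -- force the LSB when inexact
     * float64 carries 53 mantissa bits >= 2 * 24 + 2, which is the
     * classical condition for the final float32 rounding to be safe. */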



[PATCH v5 22/45] target/arm: Implement SME LDR, STR

2022-07-06 Thread Richard Henderson
We can reuse the SVE functions for LDR and STR, passing in the
base of the ZA vector and a zero offset.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sme.decode  |  7 +++
 target/arm/translate-sme.c | 24 
 2 files changed, 31 insertions(+)

diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index 900e3f2a07..f1ebd857a5 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -46,3 +46,10 @@ LDST1   111 0 esz:2 st:1 rm:5 v:1 .. pg:3 rn:5 0 
za_imm:4  \
 &ldst rs=%mova_rs
 LDST1   111 111 st:1 rm:5 v:1 .. pg:3 rn:5 0 za_imm:4  \
 &ldst esz=4 rs=%mova_rs
+
+&ldstr  rv rn imm
+@ldstr  ... ... . .. .. ... rn:5 . imm:4 \
+&ldstr rv=%mova_rs
+
+LDR 111 100 0 00 .. 000 . 0 @ldstr
+STR 111 100 1 00 .. 000 . 0 @ldstr
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
index 42d14b883a..35c2644812 100644
--- a/target/arm/translate-sme.c
+++ b/target/arm/translate-sme.c
@@ -243,3 +243,27 @@ static bool trans_LDST1(DisasContext *s, arg_LDST1 *a)
 tcg_temp_free_i64(addr);
 return true;
 }
+
+typedef void GenLdStR(DisasContext *, TCGv_ptr, int, int, int, int);
+
+static bool do_ldst_r(DisasContext *s, arg_ldstr *a, GenLdStR *fn)
+{
+int svl = streaming_vec_reg_size(s);
+int imm = a->imm;
+TCGv_ptr base;
+
+if (!sme_za_enabled_check(s)) {
+return true;
+}
+
+/* ZA[n] equates to ZA0H.B[n]. */
+base = get_tile_rowcol(s, MO_8, a->rv, imm, false);
+
+fn(s, base, 0, svl, a->rn, imm * svl);
+
+tcg_temp_free_ptr(base);
+return true;
+}
+
+TRANS_FEAT(LDR, aa64_sme, do_ldst_r, a, gen_sve_ldr)
+TRANS_FEAT(STR, aa64_sme, do_ldst_r, a, gen_sve_str)
-- 
2.34.1
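
A quick worked example of the addressing above (numbers mine, not from the
patch): with SVL = 64 bytes (vq = 4), executing ldr za[w12, 3], [x1, #3,
mul vl] with W12 = 5 selects ZA row (5 + 3) mod 64 = 8, i.e. ZA0H.B[8],
and the call fn(s, base, 0, svl, a->rn, imm * svl) transfers one
svl = 64-byte slice starting at byte offset 3 * 64 = 192 from X1.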




[PATCH v5 28/45] target/arm: Implement PSEL

2022-07-06 Thread Richard Henderson
This is an SVE instruction that operates using the SVE vector
length, but is present only if SME is implemented.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/sve.decode  | 20 +
 target/arm/translate-sve.c | 57 ++
 2 files changed, 77 insertions(+)

diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 95af08c139..966803cbb7 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -1674,3 +1674,23 @@ BFMLALT_zzxw01100100 11 1 . 0100.1 . .   
  @rrxr_3a esz=2
 
 ### SVE2 floating-point bfloat16 dot-product (indexed)
 BFDOT_zzxz  01100100 01 1 . 01 . . @rrxr_2 esz=2
+
+### SVE broadcast predicate element
+
+&psel           esz pd pn pm rv imm
+%psel_rv        16:2 !function=plus_12
+%psel_imm_b     22:2 19:2
+%psel_imm_h     22:2 20:1
+%psel_imm_s     22:2
+%psel_imm_d     23:1
+@psel           ........ .. . ... .. .. pn:4 . pm:4 . pd:4  \
+                &psel rv=%psel_rv
+
+PSEL            00100101 .. 1 ..1 .. 01 .... 0 .... 0 ....  \
+                @psel esz=0 imm=%psel_imm_b
+PSEL            00100101 .. 1 .10 .. 01 .... 0 .... 0 ....  \
+                @psel esz=1 imm=%psel_imm_h
+PSEL            00100101 .. 1 100 .. 01 .... 0 .... 0 ....  \
+                @psel esz=2 imm=%psel_imm_s
+PSEL            00100101 .1 1 000 .. 01 .... 0 .... 0 ....  \
+                @psel esz=3 imm=%psel_imm_d
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index fd1a173637..24ffb69a2a 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -7419,3 +7419,60 @@ static bool do_BFMLAL_zzxw(DisasContext *s, arg_rrxr_esz 
*a, bool sel)
 
 TRANS_FEAT(BFMLALB_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, false)
 TRANS_FEAT(BFMLALT_zzxw, aa64_sve_bf16, do_BFMLAL_zzxw, a, true)
+
+static bool trans_PSEL(DisasContext *s, arg_psel *a)
+{
+int vl = vec_full_reg_size(s);
+int pl = pred_gvec_reg_size(s);
+int elements = vl >> a->esz;
+TCGv_i64 tmp, didx, dbit;
+TCGv_ptr ptr;
+
+if (!dc_isar_feature(aa64_sme, s)) {
+return false;
+}
+if (!sve_access_check(s)) {
+return true;
+}
+
+tmp = tcg_temp_new_i64();
+dbit = tcg_temp_new_i64();
+didx = tcg_temp_new_i64();
+ptr = tcg_temp_new_ptr();
+
+/* Compute the predicate element. */
+tcg_gen_addi_i64(tmp, cpu_reg(s, a->rv), a->imm);
+if (is_power_of_2(elements)) {
+tcg_gen_andi_i64(tmp, tmp, elements - 1);
+} else {
+tcg_gen_remu_i64(tmp, tmp, tcg_constant_i64(elements));
+}
+
+/* Extract the predicate byte and bit indices. */
+tcg_gen_shli_i64(tmp, tmp, a->esz);
+tcg_gen_andi_i64(dbit, tmp, 7);
+tcg_gen_shri_i64(didx, tmp, 3);
+if (HOST_BIG_ENDIAN) {
+tcg_gen_xori_i64(didx, didx, 7);
+}
+
+/* Load the predicate word. */
+tcg_gen_trunc_i64_ptr(ptr, didx);
+tcg_gen_add_ptr(ptr, ptr, cpu_env);
+tcg_gen_ld8u_i64(tmp, ptr, pred_full_reg_offset(s, a->pm));
+
+/* Extract the predicate bit and replicate to MO_64. */
+tcg_gen_shr_i64(tmp, tmp, dbit);
+tcg_gen_andi_i64(tmp, tmp, 1);
+tcg_gen_neg_i64(tmp, tmp);
+
+/* Apply to either copy the source, or write zeros. */
+tcg_gen_gvec_ands(MO_64, pred_full_reg_offset(s, a->pd),
+  pred_full_reg_offset(s, a->pn), tmp, pl, pl);
+
+tcg_temp_free_i64(tmp);
+tcg_temp_free_i64(dbit);
+tcg_temp_free_i64(didx);
+tcg_temp_free_ptr(ptr);
+return true;
+}
-- 
2.34.1
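
To make the TCG sequence above easier to follow, here is a plain-C
reference model of what PSEL computes. This is a sketch written for this
digest (little-endian predicate layout assumed), not code from the patch:

    #include <stdint.h>
    #include <string.h>

    /*
     * pd, pn, pm: SVE predicate registers, one bit per byte of vector
     * length, i.e. vl/8 bytes each.  esz = log2(element size in bytes).
     */
    static void psel_ref(uint8_t *pd, const uint8_t *pn, const uint8_t *pm,
                         uint64_t wv, uint64_t imm, int esz, int vl)
    {
        int elements = vl >> esz;
        uint64_t idx = (wv + imm) % elements;   /* selected element */
        uint64_t bit = idx << esz;              /* its predicate bit */
        int active = (pm[bit >> 3] >> (bit & 7)) & 1;

        if (active) {
            memcpy(pd, pn, vl / 8);             /* PD = PN */
        } else {
            memset(pd, 0, vl / 8);              /* PD = all-false */
        }
    }

The generated code does the same thing branchlessly: it extracts the
selected bit of PM, negates it into an all-ones/all-zeros mask, and ANDs
that mask with PN into PD.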




[PATCH v7 13/14] KVM: Enable and expose KVM_MEM_PRIVATE

2022-07-06 Thread Chao Peng
Register the private memslot with the fd-based memory backing store and
handle the memfile notifiers to zap the existing mappings.

Currently the registration happens at memslot creation time, and the
initial support does not include page migration/swap.

KVM_MEM_PRIVATE is not exposed by default; architecture code can turn
it on by implementing kvm_arch_private_mem_supported().

A 'kvm' reference is added in the memslot structure since in the
memfile_notifier callbacks we can only obtain a memslot reference, while
kvm is needed to do the zapping.

Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h |   1 +
 virt/kvm/kvm_main.c  | 117 ---
 2 files changed, 109 insertions(+), 9 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8f56426aa1e3..4e5a0db68799 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -584,6 +584,7 @@ struct kvm_memory_slot {
struct file *private_file;
loff_t private_offset;
struct memfile_notifier notifier;
+   struct kvm *kvm;
 };
 
 static inline bool kvm_slot_can_be_private(const struct kvm_memory_slot *slot)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bb714c2a4b06..d6f7e074cab2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -941,6 +941,63 @@ static int kvm_vm_ioctl_set_encrypted_region(struct kvm 
*kvm, unsigned int ioctl
 
return r;
 }
+
+static void kvm_memfile_notifier_invalidate(struct memfile_notifier *notifier,
+   pgoff_t start, pgoff_t end)
+{
+   struct kvm_memory_slot *slot = container_of(notifier,
+   struct kvm_memory_slot,
+   notifier);
+   unsigned long base_pgoff = slot->private_offset >> PAGE_SHIFT;
+   gfn_t start_gfn = slot->base_gfn;
+   gfn_t end_gfn = slot->base_gfn + slot->npages;
+
+
+   if (start > base_pgoff)
+   start_gfn = slot->base_gfn + start - base_pgoff;
+
+   if (end < base_pgoff + slot->npages)
+   end_gfn = slot->base_gfn + end - base_pgoff;
+
+   if (start_gfn >= end_gfn)
+   return;
+
+   kvm_zap_gfn_range(slot->kvm, start_gfn, end_gfn);
+}
+
+static struct memfile_notifier_ops kvm_memfile_notifier_ops = {
+   .invalidate = kvm_memfile_notifier_invalidate,
+};
+
+#define KVM_MEMFILE_FLAGS (MEMFILE_F_USER_INACCESSIBLE | \
+  MEMFILE_F_UNMOVABLE | \
+  MEMFILE_F_UNRECLAIMABLE)
+
+static inline int kvm_private_mem_register(struct kvm_memory_slot *slot)
+{
+   slot->notifier.ops = &kvm_memfile_notifier_ops;
+   return memfile_register_notifier(slot->private_file, KVM_MEMFILE_FLAGS,
+&slot->notifier);
+}
+
+static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot)
+{
+   memfile_unregister_notifier(&slot->notifier);
+}
+
+#else /* !CONFIG_HAVE_KVM_PRIVATE_MEM */
+
+static inline int kvm_private_mem_register(struct kvm_memory_slot *slot)
+{
+   WARN_ON_ONCE(1);
+   return -EOPNOTSUPP;
+}
+
+static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot)
+{
+   WARN_ON_ONCE(1);
+}
+
 #endif /* CONFIG_HAVE_KVM_PRIVATE_MEM */
 
 #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
@@ -987,6 +1044,11 @@ static void kvm_destroy_dirty_bitmap(struct 
kvm_memory_slot *memslot)
 /* This does not remove the slot from struct kvm_memslots data structures */
 static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
 {
+   if (slot->flags & KVM_MEM_PRIVATE) {
+   kvm_private_mem_unregister(slot);
+   fput(slot->private_file);
+   }
+
kvm_destroy_dirty_bitmap(slot);
 
kvm_arch_free_memslot(kvm, slot);
@@ -1548,10 +1610,16 @@ bool __weak kvm_arch_private_mem_supported(struct kvm 
*kvm)
return false;
 }
 
-static int check_memory_region_flags(const struct kvm_user_mem_region *mem)
+static int check_memory_region_flags(struct kvm *kvm,
+const struct kvm_user_mem_region *mem)
 {
u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
 
+#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
+   if (kvm_arch_private_mem_supported(kvm))
+   valid_flags |= KVM_MEM_PRIVATE;
+#endif
+
 #ifdef __KVM_HAVE_READONLY_MEM
valid_flags |= KVM_MEM_READONLY;
 #endif
@@ -1627,6 +1695,12 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 {
int r;
 
+   if (change == KVM_MR_CREATE && new->flags & KVM_MEM_PRIVATE) {
+   r = kvm_private_mem_register(new);
+   if (r)
+   return r;
+   }
+
/*
 * If dirty logging is disabled, nullify the bitmap; the old bitmap
 * will be freed on "commit".  If logging is enabled in both old and
@@ -1655,6 +1729,9 @@ static int kvm_prepare_memo
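
The gfn clamping in kvm_memfile_notifier_invalidate() above is easiest to
see with concrete numbers (mine, for illustration): suppose base_gfn =
0x1000, private_offset covers 2 pages (base_pgoff = 2) and npages = 8, and
the backing store invalidates pgoff range [4, 20). Then start_gfn =
0x1000 + 4 - 2 = 0x1002, while the end stays clamped at end_gfn =
0x1000 + 8 = 0x1008 because 20 >= base_pgoff + npages; exactly the
overlapping range [0x1002, 0x1008) is zapped.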

[PATCH v7 10/14] KVM: Add KVM_EXIT_MEMORY_FAULT exit

2022-07-06 Thread Chao Peng
This new KVM exit allows userspace to handle memory-related errors. It
indicates that an error happened in KVM at guest memory range [gpa, gpa+size).
The 'flags' field includes additional information for userspace to handle
the error. Currently bit 0 is defined as 'private memory', where '1' indicates
the error happened due to a private memory access and '0' indicates it
happened due to a shared memory access.

After private memory is enabled, this new exit will be used by KVM to exit
to userspace for shared memory <-> private memory conversion in memory
encryption usage.

In such usage, typically there are two kinds of memory conversion:
  - explicit conversion: happens when the guest explicitly calls into KVM
    to map a range (as private or shared); KVM then exits to userspace to
    do the map/unmap operations.
  - implicit conversion: happens in the KVM page fault handler.
    * if the fault is due to a private memory access, it causes a
      userspace exit for a shared->private conversion when the page
      is recognized as shared by KVM.
    * if the fault is due to a shared memory access, it causes a
      userspace exit for a private->shared conversion when the page
      is recognized as private by KVM.

Suggested-by: Sean Christopherson 
Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 Documentation/virt/kvm/api.rst | 22 ++
 include/uapi/linux/kvm.h   |  9 +
 2 files changed, 31 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4f27c973a952..5ecfc7fbe0ee 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6583,6 +6583,28 @@ array field represents return values. The userspace 
should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+   /* KVM_EXIT_MEMORY_FAULT */
+   struct {
+  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0)
+   __u32 flags;
+   __u32 padding;
+   __u64 gpa;
+   __u64 size;
+   } memory;
+If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has
+encountered a memory error which is not handled by the KVM kernel module and
+which userspace may choose to handle. The 'flags' field indicates the memory
+properties of the exit.
+
+ - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error was caused by a
+   private memory access when the bit is set; otherwise it was caused by a
+   shared memory access.
+
+'gpa' and 'size' indicate the memory range the error occurs at. The userspace
+may handle the error and return to KVM to retry the previous memory access.
+
 ::
 
 /* KVM_EXIT_NOTIFY */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c467c69b7ad7..83c278f284dd 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -299,6 +299,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_XEN  34
 #define KVM_EXIT_RISCV_SBI35
 #define KVM_EXIT_NOTIFY   36
+#define KVM_EXIT_MEMORY_FAULT 37
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -530,6 +531,14 @@ struct kvm_run {
 #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
__u32 flags;
} notify;
+   /* KVM_EXIT_MEMORY_FAULT */
+   struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1 << 0)
+   __u32 flags;
+   __u32 padding;
+   __u64 gpa;
+   __u64 size;
+   } memory;
/* Fix the size of the union. */
char padding[256];
};
-- 
2.25.1
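
For orientation, the userspace half of an implicit conversion could look
roughly like the fragment below. This is a hypothetical VMM sketch written
for this digest, not code from the series; it assumes kernel headers with
this series applied (KVM_EXIT_MEMORY_FAULT and the 'memory' exit struct
come from this patch, and the GPA-based conversion ioctls from patch 11):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* run is the vCPU's mmap'ed struct kvm_run, vm_fd the VM fd. */
    static int handle_exit(int vm_fd, struct kvm_run *run)
    {
        switch (run->exit_reason) {
        case KVM_EXIT_MEMORY_FAULT: {
            bool to_private = run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE;
            struct kvm_enc_region region = {
                .addr = run->memory.gpa,    /* a GPA in this usage */
                .size = run->memory.size,
            };

            /* Convert the range to the view the guest faulted on. */
            if (ioctl(vm_fd, to_private ? KVM_MEMORY_ENCRYPT_REG_REGION
                                        : KVM_MEMORY_ENCRYPT_UNREG_REGION,
                      &region) < 0) {
                perror("convert");
                return -1;
            }
            return 0;   /* re-enter the guest with KVM_RUN to retry */
        }
        default:
            fprintf(stderr, "unhandled exit %d\n", run->exit_reason);
            return -1;
        }
    }

A real VMM would also move page contents between the shared and private
backing stores before retrying; that bookkeeping is omitted here.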




[PATCH v7 08/14] KVM: Rename mmu_notifier_*

2022-07-06 Thread Chao Peng
The sync mechanism between mmu_notifier and the page fault handler employs
the fields mmu_notifier_seq/count and mmu_notifier_range_start/end. For the
to-be-added private memory, the same mechanism is needed but does not rely
on mmu_notifier (it uses the newly introduced memfile_notifier). This patch
renames the existing fields and related helper functions to the neutral
name mmu_updating_* so that private memory can reuse them.

No functional change intended.

Signed-off-by: Chao Peng 
---
 arch/arm64/kvm/mmu.c |  8 ++---
 arch/mips/kvm/mmu.c  | 10 +++---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_host.c|  4 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c  |  4 +--
 arch/powerpc/kvm/book3s_64_mmu_radix.c   |  6 ++--
 arch/powerpc/kvm/book3s_hv_nested.c  |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  |  8 ++---
 arch/powerpc/kvm/e500_mmu_host.c |  4 +--
 arch/riscv/kvm/mmu.c |  4 +--
 arch/x86/kvm/mmu/mmu.c   | 14 
 arch/x86/kvm/mmu/paging_tmpl.h   |  4 +--
 include/linux/kvm_host.h | 38 ++---
 virt/kvm/kvm_main.c  | 42 +++-
 virt/kvm/pfncache.c  | 14 
 15 files changed, 81 insertions(+), 83 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 87f1cd0df36e..7ee6fafc24ee 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -993,7 +993,7 @@ transparent_hugepage_adjust(struct kvm *kvm, struct 
kvm_memory_slot *memslot,
 * THP doesn't start to split while we are adjusting the
 * refcounts.
 *
-* We are sure this doesn't happen, because mmu_notifier_retry
+* We are sure this doesn't happen, because mmu_updating_retry
 * was successful and we are holding the mmu_lock, so if this
 * THP is trying to split, it will be blocked in the mmu
 * notifier before touching any of the pages, specifically
@@ -1188,9 +1188,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
return ret;
}
 
-   mmu_seq = vcpu->kvm->mmu_notifier_seq;
+   mmu_seq = vcpu->kvm->mmu_updating_seq;
/*
-* Ensure the read of mmu_notifier_seq happens before we call
+* Ensure the read of mmu_updating_seq happens before we call
 * gfn_to_pfn_prot (which calls get_user_pages), so that we don't risk
 * the page we just got a reference to gets unmapped before we have a
 * chance to grab the mmu_lock, which ensure that if the page gets
@@ -1246,7 +1246,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
phys_addr_t fault_ipa,
else
write_lock(&kvm->mmu_lock);
pgt = vcpu->arch.hw_mmu->pgt;
-   if (mmu_notifier_retry(kvm, mmu_seq))
+   if (mmu_updating_retry(kvm, mmu_seq))
goto out_unlock;
 
/*
diff --git a/arch/mips/kvm/mmu.c b/arch/mips/kvm/mmu.c
index 1bfd1b501d82..abd468c6a749 100644
--- a/arch/mips/kvm/mmu.c
+++ b/arch/mips/kvm/mmu.c
@@ -615,17 +615,17 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, 
unsigned long gpa,
 * Used to check for invalidations in progress, of the pfn that is
 * returned by pfn_to_pfn_prot below.
 */
-   mmu_seq = kvm->mmu_notifier_seq;
+   mmu_seq = kvm->mmu_updating_seq;
/*
-* Ensure the read of mmu_notifier_seq isn't reordered with PTE reads in
+* Ensure the read of mmu_updating_seq isn't reordered with PTE reads in
 * gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
 * risk the page we get a reference to getting unmapped before we have a
-* chance to grab the mmu_lock without mmu_notifier_retry() noticing.
+* chance to grab the mmu_lock without mmu_updating_retry () noticing.
 *
 * This smp_rmb() pairs with the effective smp_wmb() of the combination
 * of the pte_unmap_unlock() after the PTE is zapped, and the
 * spin_lock() in kvm_mmu_notifier_invalidate_() before
-* mmu_notifier_seq is incremented.
+* mmu_updating_seq is incremented.
 */
smp_rmb();
 
@@ -638,7 +638,7 @@ static int kvm_mips_map_page(struct kvm_vcpu *vcpu, 
unsigned long gpa,
 
spin_lock(&kvm->mmu_lock);
/* Check if an invalidation has taken place since we got pfn */
-   if (mmu_notifier_retry(kvm, mmu_seq)) {
+   if (mmu_updating_retry(kvm, mmu_seq)) {
/*
 * This can happen when mappings are changed asynchronously, but
 * also synchronously if a COW is triggered by
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h 
b/arch/powerpc/include/asm/kvm_book3s_64.h
index 4def2bd17b9b..4d35fb913de5 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/pow
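
The renamed fields implement the usual sequence-count race check between
invalidation and fault-in; schematically, the pattern every architecture's
fault handler follows looks like this (a sketch of the idiom only, not
code from the patch):

    /* Sketch: install a translation without racing against an unmap. */
    static int fault_in_gfn(struct kvm *kvm, gfn_t gfn)
    {
        unsigned long mmu_seq;
        kvm_pfn_t pfn;

    retry:
        mmu_seq = kvm->mmu_updating_seq;
        smp_rmb();                   /* pairs with the updater's smp_wmb() */

        pfn = gfn_to_pfn(kvm, gfn);  /* sleepable; may race with an unmap */

        spin_lock(&kvm->mmu_lock);
        if (mmu_updating_retry(kvm, mmu_seq)) {
            /* An invalidation ran (or is running) since the sample:
             * drop the now-stale pfn and try again. */
            spin_unlock(&kvm->mmu_lock);
            kvm_release_pfn_clean(pfn);
            goto retry;
        }
        /* ... install the mapping while still holding mmu_lock ... */
        spin_unlock(&kvm->mmu_lock);
        return 0;
    }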

[PATCH v5 32/45] target/arm: Enable SME for -cpu max

2022-07-06 Thread Richard Henderson
Note that SME remains effectively disabled for user-only,
because we do not yet set CPACR_EL1.SMEN.  This needs to
wait until the kernel ABI is implemented.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 docs/system/arm/emulation.rst |  4 
 target/arm/cpu64.c| 11 +++
 2 files changed, 15 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 83b4410065..8e494c8bea 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -65,6 +65,10 @@ the following architecture extensions:
 - FEAT_SHA512 (Advanced SIMD SHA512 instructions)
 - FEAT_SM3 (Advanced SIMD SM3 instructions)
 - FEAT_SM4 (Advanced SIMD SM4 instructions)
+- FEAT_SME (Scalable Matrix Extension)
+- FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode)
+- FEAT_SME_F64F64 (Double-precision floating-point outer product instructions)
+- FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product 
instructions)
 - FEAT_SPECRES (Speculation restriction instructions)
 - FEAT_SSBS (Speculative Store Bypass Safe)
 - FEAT_TLBIOS (TLB invalidate instructions in Outer Shareable domain)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 19188d6cc2..40a0f043d0 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -1018,6 +1018,7 @@ static void aarch64_max_initfn(Object *obj)
  */
 t = FIELD_DP64(t, ID_AA64PFR1, MTE, 3);   /* FEAT_MTE3 */
 t = FIELD_DP64(t, ID_AA64PFR1, RAS_FRAC, 0);  /* FEAT_RASv1p1 + 
FEAT_DoubleFault */
+t = FIELD_DP64(t, ID_AA64PFR1, SME, 1);   /* FEAT_SME */
 t = FIELD_DP64(t, ID_AA64PFR1, CSV2_FRAC, 0); /* FEAT_CSV2_2 */
 cpu->isar.id_aa64pfr1 = t;
 
@@ -1068,6 +1069,16 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64DFR0, PMUVER, 5);/* FEAT_PMUv3p4 */
 cpu->isar.id_aa64dfr0 = t;
 
+t = cpu->isar.id_aa64smfr0;
+t = FIELD_DP64(t, ID_AA64SMFR0, F32F32, 1);   /* FEAT_SME */
+t = FIELD_DP64(t, ID_AA64SMFR0, B16F32, 1);   /* FEAT_SME */
+t = FIELD_DP64(t, ID_AA64SMFR0, F16F32, 1);   /* FEAT_SME */
+t = FIELD_DP64(t, ID_AA64SMFR0, I8I32, 0xf);  /* FEAT_SME */
+t = FIELD_DP64(t, ID_AA64SMFR0, F64F64, 1);   /* FEAT_SME_F64F64 */
+t = FIELD_DP64(t, ID_AA64SMFR0, I16I64, 0xf); /* FEAT_SME_I16I64 */
+t = FIELD_DP64(t, ID_AA64SMFR0, FA64, 1); /* FEAT_SME_FA64 */
+cpu->isar.id_aa64smfr0 = t;
+
 /* Replicate the same data to the 32-bit id registers.  */
 aa32_max_features(cpu);
 
-- 
2.34.1




[PATCH v5 44/45] target/arm: Enable SME for user-only

2022-07-06 Thread Richard Henderson
Enable SME, TPIDR2_EL0, and FA64 if supported by the cpu.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 9b54443843..5de7e097e9 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -210,6 +210,17 @@ static void arm_cpu_reset(DeviceState *dev)
  CPACR_EL1, ZEN, 3);
 env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
 }
+/* and for SME instructions, with default vector length, and TPIDR2 */
+if (cpu_isar_feature(aa64_sme, cpu)) {
+env->cp15.sctlr_el[1] |= SCTLR_EnTP2;
+env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
+ CPACR_EL1, SMEN, 3);
+env->vfp.smcr_el[1] = cpu->sme_default_vq - 1;
+if (cpu_isar_feature(aa64_sme_fa64, cpu)) {
+env->vfp.smcr_el[1] = FIELD_DP64(env->vfp.smcr_el[1],
+ SMCR, FA64, 1);
+}
+}
 /*
  * Enable 48-bit address space (TODO: take reserved_va into account).
  * Enable TBI0 but not TBI1.
-- 
2.34.1




[PATCH v5 35/45] linux-user/aarch64: Add SM bit to SVE signal context

2022-07-06 Thread Richard Henderson
Make sure to zero the currently reserved fields.

Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/signal.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 7da0e36c6d..3cef2f44cf 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -78,7 +78,8 @@ struct target_extra_context {
 struct target_sve_context {
 struct target_aarch64_ctx head;
 uint16_t vl;
-uint16_t reserved[3];
+uint16_t flags;
+uint16_t reserved[2];
 /* The actual SVE data immediately follows.  It is laid out
  * according to TARGET_SVE_SIG_{Z,P}REG_OFFSET, based off of
  * the original struct pointer.
@@ -101,6 +102,8 @@ struct target_sve_context {
 #define TARGET_SVE_SIG_CONTEXT_SIZE(VQ) \
 (TARGET_SVE_SIG_PREG_OFFSET(VQ, 17))
 
+#define TARGET_SVE_SIG_FLAG_SM  1
+
 struct target_rt_sigframe {
 struct target_siginfo info;
 struct target_ucontext uc;
@@ -177,9 +180,13 @@ static void target_setup_sve_record(struct 
target_sve_context *sve,
 {
 int i, j;
 
+memset(sve, 0, sizeof(*sve));
 __put_user(TARGET_SVE_MAGIC, &sve->head.magic);
 __put_user(size, &sve->head.size);
 __put_user(vq * TARGET_SVE_VQ_BYTES, &sve->vl);
+if (FIELD_EX64(env->svcr, SVCR, SM)) {
+__put_user(TARGET_SVE_SIG_FLAG_SM, &sve->flags);
+}
 
 /* Note that SVE regs are stored as a byte stream, with each byte element
  * at a subsequent address.  This corresponds to a little-endian store
-- 
2.34.1




[PATCH v5 25/45] target/arm: Implement BFMOPA, BFMOPS

2022-07-06 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sme.h|  2 ++
 target/arm/sme.decode  |  2 ++
 target/arm/sme_helper.c| 52 ++
 target/arm/translate-sme.c | 30 ++
 4 files changed, 86 insertions(+)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index f50d0fe1d6..1d68fb8c74 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -125,3 +125,5 @@ DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG,
void, ptr, ptr, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_6(sme_bfmopa, TCG_CALL_NO_RWG,
+   void, ptr, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sme.decode b/target/arm/sme.decode
index ba4774d174..afd9c0dffd 100644
--- a/target/arm/sme.decode
+++ b/target/arm/sme.decode
@@ -73,3 +73,5 @@ ADDVA_d 1100 11 01000 1 ... ... . 00 ...  
  @adda_64
 
 FMOPA_s         10000000 100 ..... ... ... ..... . 00 ..        @op_32
 FMOPA_d         10000000 110 ..... ... ... ..... . 0 ...        @op_64
+
+BFMOPA          10000001 100 ..... ... ... ..... . 00 ..        @op_32
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index 78ba34f3d2..4b437bb913 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -981,3 +981,55 @@ void HELPER(sme_fmopa_d)(void *vza, void *vzn, void *vzm, 
void *vpn,
 }
 }
 }
+
+/*
+ * Alter PAIR as needed for controlling predicates being false,
+ * and for NEG on an enabled row element.
+ */
+static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t 
neg)
+{
+pair ^= neg;
+if (!(pg & 1)) {
+pair &= 0xffff0000u;
+}
+if (!(pg & 4)) {
+pair &= 0x0000ffffu;
+}
+return pair;
+}
+
+void HELPER(sme_bfmopa)(void *vza, void *vzn, void *vzm, void *vpn,
+void *vpm, uint32_t desc)
+{
+intptr_t row, col, oprsz = simd_maxsz(desc);
+uint32_t neg = simd_data(desc) << 15;
+uint16_t *pn = vpn, *pm = vpm;
+
+for (row = 0; row < oprsz; ) {
+uint16_t pa = pn[H2(row >> 4)];
+do {
+void *vza_row = vza + tile_vslice_offset(row);
+uint32_t n = *(uint32_t *)(vzn + row);
+
+n = f16mop_adj_pair(n, pa, neg);
+
+for (col = 0; col < oprsz; ) {
+uint16_t pb = pm[H2(col >> 4)];
+do {
+if ((pa & 0b0101) == 0b0101 || (pb & 0b0101) == 0b0101) {
+uint32_t *a = vza_row + col;
+uint32_t m = *(uint32_t *)(vzm + col);
+
+m = f16mop_adj_pair(m, pb, neg);
+*a = bfdotadd(*a, n, m);
+}
+col += 4;
+pb >>= 4;
+} while (col & 15);
+}
+row += 4;
+pa >>= 4;
+} while (row & 15);
+}
+}
diff --git a/target/arm/translate-sme.c b/target/arm/translate-sme.c
index fa8f343a7d..ecb7583c55 100644
--- a/target/arm/translate-sme.c
+++ b/target/arm/translate-sme.c
@@ -299,6 +299,33 @@ TRANS_FEAT(ADDVA_s, aa64_sme, do_adda, a, MO_32, 
gen_helper_sme_addva_s)
 TRANS_FEAT(ADDHA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addha_d)
 TRANS_FEAT(ADDVA_d, aa64_sme_i16i64, do_adda, a, MO_64, gen_helper_sme_addva_d)
 
+static bool do_outprod(DisasContext *s, arg_op *a, MemOp esz,
+   gen_helper_gvec_5 *fn)
+{
+int svl = streaming_vec_reg_size(s);
+uint32_t desc = simd_desc(svl, svl, a->sub);
+TCGv_ptr za, zn, zm, pn, pm;
+
+if (!sme_smza_enabled_check(s)) {
+return true;
+}
+
+/* Sum XZR+zad to find ZAd. */
+za = get_tile_rowcol(s, esz, 31, a->zad, false);
+zn = vec_full_reg_ptr(s, a->zn);
+zm = vec_full_reg_ptr(s, a->zm);
+pn = pred_full_reg_ptr(s, a->pn);
+pm = pred_full_reg_ptr(s, a->pm);
+
+fn(za, zn, zm, pn, pm, tcg_constant_i32(desc));
+
+tcg_temp_free_ptr(za);
+tcg_temp_free_ptr(zn);
+tcg_temp_free_ptr(pn);
+tcg_temp_free_ptr(pm);
+return true;
+}
+
 static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz,
 gen_helper_gvec_5_ptr *fn)
 {
@@ -330,3 +357,6 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, 
MemOp esz,
 
 TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, 
gen_helper_sme_fmopa_s)
 TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a, MO_64, 
gen_helper_sme_fmopa_d)
+
+/* TODO: FEAT_EBF16 */
+TRANS_FEAT(BFMOPA, aa64_sme, do_outprod, a, MO_32, gen_helper_sme_bfmopa)
-- 
2.34.1
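
To see concretely what f16mop_adj_pair() does, consider one packed pair
(values mine; the low f16 element sits in bits 15:0 of the uint32_t, the
high element in bits 31:16):

    #include <stdint.h>
    #include <stdio.h>

    /* Copy of the helper from the patch, masks spelled out. */
    static uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg)
    {
        pair ^= neg;
        if (!(pg & 1)) {
            pair &= 0xffff0000u;  /* low element inactive: zero bits 15:0 */
        }
        if (!(pg & 4)) {
            pair &= 0x0000ffffu;  /* high element inactive: zero bits 31:16 */
        }
        return pair;
    }

    int main(void)
    {
        uint32_t pair = 0x3c00bc00;  /* high = +1.0, low = -1.0 */

        /* With neg = 1 << 15 (as in this version of the patch), only the
         * low element's sign bit flips: -1.0 becomes +1.0. */
        printf("%08x\n", f16mop_adj_pair(pair, 0x5, 1u << 15)); /* 3c003c00 */

        /* Low element's predicate bit clear: its half is zeroed, so an
         * inactive element contributes +0.0 to the dot product. */
        printf("%08x\n", f16mop_adj_pair(pair, 0x4, 0));        /* 3c000000 */
        return 0;
    }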




[PATCH v7 11/14] KVM: Register/unregister the guest private memory regions

2022-07-06 Thread Chao Peng
If CONFIG_HAVE_KVM_PRIVATE_MEM=y, userspace can register/unregister the
guest private memory regions through the KVM_MEMORY_ENCRYPT_{UN,}REG_REGION
ioctls. The patch reuses the existing SEV ioctls but differs in that the
address in the region is a GPA for private memory, while in the SEV case it
is an HVA.

The private memory regions are stored as an xarray in KVM for memory
efficiency in normal usage, and zapping existing memory mappings is also
a side effect of these two ioctls.

Signed-off-by: Chao Peng 
---
 Documentation/virt/kvm/api.rst  | 17 +++---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/Kconfig|  1 +
 arch/x86/kvm/mmu.h  |  2 --
 include/linux/kvm_host.h|  8 +
 virt/kvm/kvm_main.c | 57 +
 6 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 5ecfc7fbe0ee..dfb4caecab73 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4715,10 +4715,19 @@ Documentation/virt/kvm/amd-memory-encryption.rst.
 This ioctl can be used to register a guest memory region which may
 contain encrypted data (e.g. guest RAM, SMRAM etc).
 
-It is used in the SEV-enabled guest. When encryption is enabled, a guest
-memory region may contain encrypted data. The SEV memory encryption
-engine uses a tweak such that two identical plaintext pages, each at
-different locations will have differing ciphertexts. So swapping or
+Currently this ioctl supports registering memory regions for two usages:
+private memory and SEV-encrypted memory.
+
+When private memory is enabled, this ioctl is used to register a guest private
+memory region, and the addr/size of kvm_enc_region represents a guest physical
+address (GPA). In this usage, this ioctl zaps the existing guest memory
+mappings in KVM that fall into the region.
+
+When SEV-encrypted memory is enabled, this ioctl is used to register guest
+memory region which may contain encrypted data for a SEV-enabled guest. The
+addr/size of kvm_enc_region represents userspace address (HVA). The SEV
+memory encryption engine uses a tweak such that two identical plaintext pages,
+each at different locations will have differing ciphertexts. So swapping or
 moving ciphertext of those pages will not result in plaintext being
 swapped. So relocating (or migrating) physical backing pages for the SEV
 guest will require some additional steps.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dae190e19fce..92120e3a224e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -37,6 +37,7 @@
 #include 
 
 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
+#define __KVM_HAVE_ZAP_GFN_RANGE
 
 #define KVM_MAX_VCPUS 1024
 
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 1f160801e2a7..05861b9656a4 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -50,6 +50,7 @@ config KVM
select HAVE_KVM_PM_NOTIFIER if PM
select HAVE_KVM_PRIVATE_MEM if X86_64
select MEMFILE_NOTIFIER if HAVE_KVM_PRIVATE_MEM
+   select XARRAY_MULTI if HAVE_KVM_PRIVATE_MEM
help
  Support hosting fully virtualized guest machines using hardware
  virtualization extensions.  You will need a fairly recent
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index a99acec925eb..428cd2e88cbd 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -209,8 +209,6 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, 
struct kvm_mmu *mmu,
return -(u32)fault & errcode;
 }
 
-void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
-
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_post_init_vm(struct kvm *kvm);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1b203c8aa696..da33f8828456 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -260,6 +260,10 @@ bool kvm_test_age_gfn(struct kvm *kvm, struct 
kvm_gfn_range *range);
 bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range);
 #endif
 
+#ifdef __KVM_HAVE_ZAP_GFN_RANGE
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+#endif
+
 enum {
OUTSIDE_GUEST_MODE,
IN_GUEST_MODE,
@@ -795,6 +799,9 @@ struct kvm {
struct notifier_block pm_notifier;
 #endif
char stats_id[KVM_STATS_NAME_SIZE];
+#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
+   struct xarray mem_attr_array;
+#endif
 };
 
 #define kvm_err(fmt, ...) \
@@ -1459,6 +1466,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu 
*vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+bool kvm_arch_private_mem_supported(struct kvm *kvm);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 230c8ff9659c..bb714c2a4b06 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/k

[PATCH v5 38/45] linux-user/aarch64: Verify extra record lock succeeded

2022-07-06 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/signal.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 8fbe98d72f..9ff79da4be 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -340,6 +340,9 @@ static int target_restore_sigframe(CPUARMState *env,
 __get_user(extra_size,
&((struct target_extra_context *)ctx)->size);
 extra = lock_user(VERIFY_READ, extra_datap, extra_size, 0);
+if (!extra) {
+return 1;
+}
 break;
 
 default:
-- 
2.34.1




[PATCH v5 29/45] target/arm: Implement REVD

2022-07-06 Thread Richard Henderson
This is an SVE instruction that operates using the SVE vector
length, but is present only if SME is implemented.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper-sve.h|  2 ++
 target/arm/sve.decode  |  1 +
 target/arm/sve_helper.c| 16 
 target/arm/translate-sve.c |  2 ++
 4 files changed, 21 insertions(+)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index ab0333400f..cc4e1d8948 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -719,6 +719,8 @@ DEF_HELPER_FLAGS_4(sve_revh_d, TCG_CALL_NO_RWG, void, ptr, 
ptr, ptr, i32)
 
 DEF_HELPER_FLAGS_4(sve_revw_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sme_revd_q, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_4(sve_rbit_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_rbit_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_4(sve_rbit_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index 966803cbb7..a9e48f07b4 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -652,6 +652,7 @@ REVB            00000101 .. 1001 00 100 ... ..... .....        @rd_pg_rn
 REVH            00000101 .. 1001 01 100 ... ..... .....        @rd_pg_rn
 REVW            00000101 .. 1001 10 100 ... ..... .....        @rd_pg_rn
 RBIT            00000101 .. 1001 11 100 ... ..... .....        @rd_pg_rn
+REVD            00000101 00 1011 10 100 ... ..... .....        @rd_pg_rn_e0
 
 # SVE vector splice (predicated, destructive)
 SPLICE  0101 .. 101 100 100 ... . . @rdn_pg_rm
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 9a26f253e0..5de82696b5 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -931,6 +931,22 @@ DO_ZPZ_D(sve_revh_d, uint64_t, hswap64)
 
 DO_ZPZ_D(sve_revw_d, uint64_t, wswap64)
 
+void HELPER(sme_revd_q)(void *vd, void *vn, void *vg, uint32_t desc)
+{
+intptr_t i, opr_sz = simd_oprsz(desc) / 8;
+uint64_t *d = vd, *n = vn;
+uint8_t *pg = vg;
+
+for (i = 0; i < opr_sz; i += 2) {
+if (pg[H1(i)] & 1) {
+uint64_t n0 = n[i + 0];
+uint64_t n1 = n[i + 1];
+d[i + 0] = n1;
+d[i + 1] = n0;
+}
+}
+}
+
 DO_ZPZ(sve_rbit_b, uint8_t, H1, revbit8)
 DO_ZPZ(sve_rbit_h, uint16_t, H1_2, revbit16)
 DO_ZPZ(sve_rbit_s, uint32_t, H1_4, revbit32)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 24ffb69a2a..9ed3b267fd 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2901,6 +2901,8 @@ TRANS_FEAT(REVH, aa64_sve, gen_gvec_ool_arg_zpz, 
revh_fns[a->esz], a, 0)
 TRANS_FEAT(REVW, aa64_sve, gen_gvec_ool_arg_zpz,
a->esz == 3 ? gen_helper_sve_revw_d : NULL, a, 0)
 
+TRANS_FEAT(REVD, aa64_sme, gen_gvec_ool_arg_zpz, gen_helper_sme_revd_q, a, 0)
+
 TRANS_FEAT(SPLICE, aa64_sve, gen_gvec_ool_arg_zpzz,
gen_helper_sve_splice, a, a->esz)
 
-- 
2.34.1
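
A concrete view of sme_revd_q above (example mine): within each 128-bit
element whose governing predicate bit is set, the two 64-bit halves swap;
inactive elements are left untouched:

    #include <stdint.h>
    #include <stdio.h>

    /* Reference model of the helper for one vector; sketch, not patch
     * code.  pg holds one predicate byte per 8 bytes of vector. */
    static void revd_ref(uint64_t *d, const uint64_t *n, const uint8_t *pg,
                         int opr_sz_bytes)
    {
        for (int i = 0; i < opr_sz_bytes / 8; i += 2) {
            if (pg[i] & 1) {        /* bit for the element's first byte */
                uint64_t n0 = n[i], n1 = n[i + 1];
                d[i] = n1;
                d[i + 1] = n0;
            }
        }
    }

    int main(void)
    {
        uint64_t n[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };
        uint64_t d[4] = { 0, 0, 0, 0 };
        uint8_t pg[4] = { 1, 0, 0, 0 };  /* only the first quadword active */

        revd_ref(d, n, pg, sizeof(n));
        printf("%llx %llx %llx %llx\n",
               (unsigned long long)d[0], (unsigned long long)d[1],
               (unsigned long long)d[2], (unsigned long long)d[3]);
        /* -> 2222 1111 0 0 */
        return 0;
    }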




[PATCH v7 12/14] KVM: Handle page fault for private memory

2022-07-06 Thread Chao Peng
A page fault can carry the private/shared information for a
KVM_MEM_PRIVATE memslot; this can be filled in by architecture code (like
TDX code). To handle a page fault for such an access, KVM maps the page
only when this private property matches the host's view of the page.

For a successful match, the private pfn is obtained with the
memfile_notifier callbacks from the private fd, and the shared pfn is
obtained with the existing get_user_pages().

For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
userspace. Userspace can then convert the memory between private/shared
from the host's view and retry the access.

Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 arch/x86/kvm/mmu/mmu.c  | 60 -
 arch/x86/kvm/mmu/mmu_internal.h | 18 ++
 arch/x86/kvm/mmu/mmutrace.h |  1 +
 include/linux/kvm_host.h| 35 ++-
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 545eb74305fe..27dbdd4fe8d1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3004,6 +3004,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
 
+   if (kvm_mem_is_private(kvm, gfn))
+   return max_level;
+
host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
return min(host_level, max_level);
 }
@@ -4101,10 +4104,52 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, 
struct kvm_async_pf *work)
kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }
 
+static inline u8 order_to_level(int order)
+{
+   enum pg_level level;
+
+   for (level = KVM_MAX_HUGEPAGE_LEVEL; level > PG_LEVEL_4K; level--)
+   if (order >= page_level_shift(level) - PAGE_SHIFT)
+   return level;
+   return level;
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+  struct kvm_page_fault *fault)
+{
+   int order;
+   struct kvm_memory_slot *slot = fault->slot;
+   bool private_exist = kvm_mem_is_private(vcpu->kvm, fault->gfn);
+
+   if (fault->is_private != private_exist) {
+   vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+   if (fault->is_private)
+   vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
+   else
+   vcpu->run->memory.flags = 0;
+   vcpu->run->memory.padding = 0;
+   vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+   vcpu->run->memory.size = PAGE_SIZE;
+   return RET_PF_USER;
+   }
+
+   if (fault->is_private) {
+   if (kvm_private_mem_get_pfn(slot, fault->gfn, &fault->pfn, 
&order))
+   return RET_PF_RETRY;
+   fault->max_level = min(order_to_level(order), fault->max_level);
+   fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
+   return RET_PF_FIXED;
+   }
+
+   /* Fault is shared, fallthrough. */
+   return RET_PF_CONTINUE;
+}
+
 static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
struct kvm_memory_slot *slot = fault->slot;
bool async;
+   int r;
 
/*
 * Retry the page fault if the gfn hit a memslot that is being deleted
@@ -4133,6 +4178,12 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct 
kvm_page_fault *fault)
return RET_PF_EMULATE;
}
 
+   if (kvm_slot_can_be_private(slot)) {
+   r = kvm_faultin_pfn_private(vcpu, fault);
+   if (r != RET_PF_CONTINUE)
+   return r == RET_PF_FIXED ? RET_PF_CONTINUE : r;
+   }
+
async = false;
fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
  fault->write, &fault->map_writable,
@@ -4241,7 +4292,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, 
struct kvm_page_fault *fault
read_unlock(&vcpu->kvm->mmu_lock);
else
write_unlock(&vcpu->kvm->mmu_lock);
-   kvm_release_pfn_clean(fault->pfn);
+
+   if (fault->is_private)
+   kvm_private_mem_put_pfn(fault->slot, fault->pfn);
+   else
+   kvm_release_pfn_clean(fault->pfn);
return r;
 }
 
@@ -5518,6 +5573,9 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, 
gpa_t cr2_or_gpa, u64 err
return -EIO;
}
 
+   if (r == RET_PF_USER)
+   return 0;
+
if (r < 0)
return r;
if (r != RET_PF_EMULATE)
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ae2d660e2dab..fb9c298abcf0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -188,6 +188,7 @@ struct kvm_page_fault {
 
/* Derived from mmu and global state.  */

Re: [PATCH] iotests: fix copy-before-write for macOS and FreeBSD

2022-07-06 Thread Peter Maydell
On Wed, 6 Jul 2022 at 08:39, Thomas Huth  wrote:
> Many of the iotests rely on output text matching. It's very fragile, always
> has been and always will be (unless we rewrite the whole test suite to not
> use output text matching anymore).

Maybe you could have a pre-pass over the "expected results" files
that substituted in the strerror text for the host it was running on.
That's probably a reasonably doable amount of work compared to
a complete testsuite rewrite.

-- PMM



[PATCH v5 39/45] linux-user/aarch64: Move sve record checks into restore

2022-07-06 Thread Richard Henderson
Move the checks out of the parsing loop and into the
restore function.  This more closely mirrors the code
structure in the kernel, and is slightly clearer.

Reject rather than silently skip incorrect VL and SVE record sizes,
bringing our checks into line with those the kernel does.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/signal.c | 51 +
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 9ff79da4be..22d0b8b4ec 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -250,12 +250,36 @@ static void target_restore_fpsimd_record(CPUARMState *env,
 }
 }
 
-static void target_restore_sve_record(CPUARMState *env,
-  struct target_sve_context *sve, int vq)
+static bool target_restore_sve_record(CPUARMState *env,
+  struct target_sve_context *sve,
+  int size)
 {
-int i, j;
+int i, j, vl, vq;
 
-/* Note that SVE regs are stored as a byte stream, with each byte element
+if (!cpu_isar_feature(aa64_sve, env_archcpu(env))) {
+return false;
+}
+
+__get_user(vl, &sve->vl);
+vq = sve_vq(env);
+
+/* Reject mismatched VL. */
+if (vl != vq * TARGET_SVE_VQ_BYTES) {
+return false;
+}
+
+/* Accept empty record -- used to clear PSTATE.SM. */
+if (size <= sizeof(*sve)) {
+return true;
+}
+
+/* Reject non-empty but incomplete record. */
+if (size < TARGET_SVE_SIG_CONTEXT_SIZE(vq)) {
+return false;
+}
+
+/*
+ * Note that SVE regs are stored as a byte stream, with each byte element
  * at a subsequent address.  This corresponds to a little-endian load
  * of our 64-bit hunks.
  */
@@ -277,6 +301,7 @@ static void target_restore_sve_record(CPUARMState *env,
 }
 }
 }
+return true;
 }
 
 static int target_restore_sigframe(CPUARMState *env,
@@ -287,7 +312,7 @@ static int target_restore_sigframe(CPUARMState *env,
 struct target_sve_context *sve = NULL;
 uint64_t extra_datap = 0;
 bool used_extra = false;
-int vq = 0, sve_size = 0;
+int sve_size = 0;
 
 target_restore_general_frame(env, sf);
 
@@ -321,15 +346,9 @@ static int target_restore_sigframe(CPUARMState *env,
 if (sve || size < sizeof(struct target_sve_context)) {
 goto err;
 }
-if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
-vq = sve_vq(env);
-sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
-if (size == sve_size) {
-sve = (struct target_sve_context *)ctx;
-break;
-}
-}
-goto err;
+sve = (struct target_sve_context *)ctx;
+sve_size = size;
+break;
 
 case TARGET_EXTRA_MAGIC:
 if (extra || size != sizeof(struct target_extra_context)) {
@@ -362,8 +381,8 @@ static int target_restore_sigframe(CPUARMState *env,
 }
 
 /* SVE data, if present, overwrites FPSIMD data.  */
-if (sve) {
-target_restore_sve_record(env, sve, vq);
+if (sve && !target_restore_sve_record(env, sve, sve_size)) {
+goto err;
 }
 unlock_user(extra, extra_datap, 0);
 return 0;
-- 
2.34.1




Re: Re: [PATCH v2 1/1] qga: add command 'guest-get-cpustats'

2022-07-06 Thread Marc-André Lureau
Hi

On Wed, Jul 6, 2022 at 11:49 AM zhenwei pi  wrote:

>
>
> On 7/6/22 15:20, Marc-André Lureau wrote:
> > Hi
> >
> > On Wed, Jul 6, 2022 at 7:09 AM zhenwei pi  > > wrote:
> >
> > On 7/4/22 16:00, zhenwei pi wrote:
> >  >
> >  >
> >  >> +##
> >  >> +# @GuestOsType:
> >  >> +#
> >  >> +# An enumeration of OS type
> >  >> +#
> >  >> +# Since: 7.1
> >  >> +##
> >  >> +{ 'enum': 'GuestOsType',
> >  >> +  'data': [ 'linuxos', 'windowsos' ] }
> >  >>
> >  >>
> >  >> I would rather keep this enum specific to GuestCpuStats,
> >  >> "GuestLinuxCpuStatsType"?
> >  >>
> >  >
> >  > Hi,
> >  >
> >  > 'GuestOsType' may be re-used in the future, not only for the CPU
> >  > statistics case.
> >  >
> >  >> I would also drop the "os" suffix
> >  >>
> >  > I'm afraid we can not drop "os" suffix, build this without "os"
> > suffix:
> >  > qga/qga-qapi-types.h:948:28: error: expected member name or ';'
> > after
> >  > declaration specifiers
> >  >  GuestLinuxCpuStats linux;
> >  >  ~~ ^
> >  > :336:15: note: expanded from here
> >  > #define linux 1
> >  >
> >
> > Hi, Marc
> >
> > Could you please give any hint about this issue?
> >
> >
> > Yes, it looks like we need to add "linux" to the "polluted_words":
> >
>
> OK, I'll fix this in the next version.
>
> By the way, 'GuestCpuStatsType' seems to be used for CPU statistics
> only, but 'data': [ 'linux', 'windows' ] } is quite common; it may be
> used for other OS-specific commands in the future. Should I use
> 'GuestCpuStatsType' instead of 'GuestOsType'?
>

We can always generalize later, but I think this may just create confusion
about the usage of the enum; I'd make it GuestCpuStatsType for now.

(for example GuestOSInfo can't use GuestOsType)


>
> > diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
> > index 489273574aee..737b059e6291 100644
> > --- a/scripts/qapi/common.py
> > +++ b/scripts/qapi/common.py
> > @@ -114,7 +114,7 @@ def c_name(name: str, protect: bool = True) -> str:
> >'and', 'and_eq', 'bitand', 'bitor', 'compl',
> 'not',
> >'not_eq', 'or', 'or_eq', 'xor', 'xor_eq'])
> >   # namespace pollution:
> > -polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386'])
> > +polluted_words = set(['unix', 'errno', 'mips', 'sparc', 'i386',
> > 'linux'])
> >
> >
> >  >> +
> >  >> +
> >  >>
> >  >>
> >  >>
> >  >> Looks good to me otherwise.
> >  >> thanks
> >  >>
> >  >> --
> >  >> Marc-André Lureau
> >  >
> >
> > --
> > zhenwei pi
> >
> >
> >
> > --
> > Marc-André Lureau
>
> --
> zhenwei pi
>


-- 
Marc-André Lureau


[PATCH v7 14/14] memfd_create.2: Describe MFD_INACCESSIBLE flag

2022-07-06 Thread Chao Peng
Signed-off-by: Chao Peng 
---
 man2/memfd_create.2 | 13 +
 1 file changed, 13 insertions(+)

diff --git a/man2/memfd_create.2 b/man2/memfd_create.2
index 89e9c4136..2698222ae 100644
--- a/man2/memfd_create.2
+++ b/man2/memfd_create.2
@@ -101,6 +101,19 @@ meaning that no other seals can be set on the file.
 .\" FIXME Why is the MFD_ALLOW_SEALING behavior not simply the default?
 .\" Is it worth adding some text explaining this?
 .TP
+.BR MFD_INACCESSIBLE
+Disallow userspace access through ordinary MMU accesses via
+.BR read (2),
+.BR write (2)
+and
+.BR mmap (2).
+The file size cannot be changed once initialized.
+This flag cannot coexist with
+.B MFD_ALLOW_SEALING
+and when this flag is set, the initial set of seals will be
+.B F_SEAL_SEAL,
+meaning that no other seals can be set on the file.
+.TP
 .BR MFD_HUGETLB " (since Linux 4.14)"
 .\" commit 749df87bd7bee5a79cef073f5d032ddb2b211de8
 The anonymous file will be created in the hugetlbfs filesystem using
-- 
2.17.1
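
A hypothetical usage sketch for the flag documented above, written for
this digest: MFD_INACCESSIBLE only exists with this series applied, and
the fallback value below is an assumption based on the existing MFD_*
bits, not something the patch states:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <unistd.h>
    #include <stdio.h>

    #ifndef MFD_INACCESSIBLE
    #define MFD_INACCESSIBLE 0x0008U   /* assumed value */
    #endif

    int main(void)
    {
        /* Backing store the creator itself cannot read(), write() or
         * mmap(); only in-kernel consumers (e.g. KVM private memslots)
         * can reach the pages. */
        int fd = memfd_create("guest-private", MFD_INACCESSIBLE);
        if (fd < 0) {
            perror("memfd_create");
            return 1;
        }

        /* The size may be set once; afterwards it cannot change. */
        if (ftruncate(fd, 2 * 1024 * 1024) < 0) {
            perror("ftruncate");
            return 1;
        }

        /* mmap() is expected to fail for an inaccessible memfd. */
        if (mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0) == MAP_FAILED) {
            perror("mmap (expected to fail)");
        }
        return 0;
    }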




[PATCH v5 31/45] target/arm: Reset streaming sve state on exception boundaries

2022-07-06 Thread Richard Henderson
We can handle both exception entry and exception return by
hooking into aarch64_sve_change_el.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 67f8ca98f2..11f70725e5 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -11753,6 +11753,19 @@ void aarch64_sve_change_el(CPUARMState *env, int 
old_el,
 return;
 }
 
+old_a64 = old_el ? arm_el_is_aa64(env, old_el) : el0_a64;
+new_a64 = new_el ? arm_el_is_aa64(env, new_el) : el0_a64;
+
+/*
+ * Both AArch64.TakeException and AArch64.ExceptionReturn
+ * invoke ResetSVEState when taking an exception from, or
+ * returning to, AArch32 state when PSTATE.SM is enabled.
+ */
+if (old_a64 != new_a64 && FIELD_EX64(env->svcr, SVCR, SM)) {
+arm_reset_sve_state(env);
+return;
+}
+
 /*
  * DDI0584A.d sec 3.2: "If SVE instructions are disabled or trapped
  * at ELx, or not available because the EL is in AArch32 state, then
@@ -11765,10 +11778,8 @@ void aarch64_sve_change_el(CPUARMState *env, int 
old_el,
  * we already have the correct register contents when encountering the
  * vq0->vq0 transition between EL0->EL1.
  */
-old_a64 = old_el ? arm_el_is_aa64(env, old_el) : el0_a64;
 old_len = (old_a64 && !sve_exception_el(env, old_el)
? sve_vqm1_for_el(env, old_el) : 0);
-new_a64 = new_el ? arm_el_is_aa64(env, new_el) : el0_a64;
 new_len = (new_a64 && !sve_exception_el(env, new_el)
? sve_vqm1_for_el(env, new_el) : 0);
 
-- 
2.34.1




[PATCH v5 40/45] linux-user/aarch64: Implement SME signal handling

2022-07-06 Thread Richard Henderson
Set the SM bit in the SVE record on signal delivery, create the ZA record.
Restore SM and ZA state according to the records present on return.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/signal.c | 167 +---
 1 file changed, 154 insertions(+), 13 deletions(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 22d0b8b4ec..6a2c6e06d2 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -104,6 +104,22 @@ struct target_sve_context {
 
 #define TARGET_SVE_SIG_FLAG_SM  1
 
+#define TARGET_ZA_MAGIC 0x54366345
+
+struct target_za_context {
+struct target_aarch64_ctx head;
+uint16_t vl;
+uint16_t reserved[3];
+/* The actual ZA data immediately follows. */
+};
+
+#define TARGET_ZA_SIG_REGS_OFFSET \
+QEMU_ALIGN_UP(sizeof(struct target_za_context), TARGET_SVE_VQ_BYTES)
+#define TARGET_ZA_SIG_ZAV_OFFSET(VQ, N) \
+(TARGET_ZA_SIG_REGS_OFFSET + (VQ) * TARGET_SVE_VQ_BYTES * (N))
+#define TARGET_ZA_SIG_CONTEXT_SIZE(VQ) \
+TARGET_ZA_SIG_ZAV_OFFSET(VQ, VQ * TARGET_SVE_VQ_BYTES)
+
 struct target_rt_sigframe {
 struct target_siginfo info;
 struct target_ucontext uc;
@@ -176,9 +192,9 @@ static void target_setup_end_record(struct 
target_aarch64_ctx *end)
 }
 
 static void target_setup_sve_record(struct target_sve_context *sve,
-CPUARMState *env, int vq, int size)
+CPUARMState *env, int size)
 {
-int i, j;
+int i, j, vq = sve_vq(env);
 
 memset(sve, 0, sizeof(*sve));
 __put_user(TARGET_SVE_MAGIC, &sve->head.magic);
@@ -207,6 +223,35 @@ static void target_setup_sve_record(struct 
target_sve_context *sve,
 }
 }
 
+static void target_setup_za_record(struct target_za_context *za,
+   CPUARMState *env, int size)
+{
+int vq = sme_vq(env);
+int vl = vq * TARGET_SVE_VQ_BYTES;
+int i, j;
+
+memset(za, 0, sizeof(*za));
+__put_user(TARGET_ZA_MAGIC, &za->head.magic);
+__put_user(size, &za->head.size);
+__put_user(vl, &za->vl);
+
+if (size == TARGET_ZA_SIG_CONTEXT_SIZE(0)) {
+return;
+}
+assert(size == TARGET_ZA_SIG_CONTEXT_SIZE(vq));
+
+/*
+ * Note that ZA vectors are stored as a byte stream,
+ * with each byte element at a subsequent address.
+ */
+for (i = 0; i < vl; ++i) {
+uint64_t *z = (void *)za + TARGET_ZA_SIG_ZAV_OFFSET(vq, i);
+for (j = 0; j < vq * 2; ++j) {
+__put_user_e(env->zarray[i].d[j], z + j, le);
+}
+}
+}
+
 static void target_restore_general_frame(CPUARMState *env,
  struct target_rt_sigframe *sf)
 {
@@ -252,16 +297,28 @@ static void target_restore_fpsimd_record(CPUARMState *env,
 
 static bool target_restore_sve_record(CPUARMState *env,
   struct target_sve_context *sve,
-  int size)
+  int size, int *svcr)
 {
-int i, j, vl, vq;
+int i, j, vl, vq, flags;
+bool sm;
 
-if (!cpu_isar_feature(aa64_sve, env_archcpu(env))) {
+__get_user(vl, &sve->vl);
+__get_user(flags, &sve->flags);
+
+sm = flags & TARGET_SVE_SIG_FLAG_SM;
+
+/* The cpu must support Streaming or Non-streaming SVE. */
+if (sm
+? !cpu_isar_feature(aa64_sme, env_archcpu(env))
+: !cpu_isar_feature(aa64_sve, env_archcpu(env))) {
 return false;
 }
 
-__get_user(vl, &sve->vl);
-vq = sve_vq(env);
+/*
+ * Note that we cannot use sve_vq() because that depends on the
+ * current setting of PSTATE.SM, not the state to be restored.
+ */
+vq = sve_vqm1_for_el_sm(env, 0, sm) + 1;
 
 /* Reject mismatched VL. */
 if (vl != vq * TARGET_SVE_VQ_BYTES) {
@@ -278,6 +335,8 @@ static bool target_restore_sve_record(CPUARMState *env,
 return false;
 }
 
+*svcr = FIELD_DP64(*svcr, SVCR, SM, sm);
+
 /*
  * Note that SVE regs are stored as a byte stream, with each byte element
  * at a subsequent address.  This corresponds to a little-endian load
@@ -304,15 +363,57 @@ static bool target_restore_sve_record(CPUARMState *env,
 return true;
 }
 
+static bool target_restore_za_record(CPUARMState *env,
+ struct target_za_context *za,
+ int size, int *svcr)
+{
+int i, j, vl, vq;
+
+if (!cpu_isar_feature(aa64_sme, env_archcpu(env))) {
+return false;
+}
+
+__get_user(vl, &za->vl);
+vq = sme_vq(env);
+
+/* Reject mismatched VL. */
+if (vl != vq * TARGET_SVE_VQ_BYTES) {
+return false;
+}
+
+/* Accept empty record -- used to clear PSTATE.ZA. */
+if (size <= TARGET_ZA_SIG_CONTEXT_SIZE(0)) {
+return true;
+}
+
+/* Reject non-empty but incomplete record. */
+if (size < TARGET_ZA_SIG_CONTEXT_S

[PATCH v5 41/45] linux-user: Rename sve prctls

2022-07-06 Thread Richard Henderson
Add "sve" to the sve prctl functions, to distinguish
them from the coming "sme" prctls with similar names.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_prctl.h |  8 
 linux-user/syscall.c  | 12 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/linux-user/aarch64/target_prctl.h 
b/linux-user/aarch64/target_prctl.h
index 1d440ffbea..40481e6663 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -6,7 +6,7 @@
 #ifndef AARCH64_TARGET_PRCTL_H
 #define AARCH64_TARGET_PRCTL_H
 
-static abi_long do_prctl_get_vl(CPUArchState *env)
+static abi_long do_prctl_sve_get_vl(CPUArchState *env)
 {
 ARMCPU *cpu = env_archcpu(env);
 if (cpu_isar_feature(aa64_sve, cpu)) {
@@ -14,9 +14,9 @@ static abi_long do_prctl_get_vl(CPUArchState *env)
 }
 return -TARGET_EINVAL;
 }
-#define do_prctl_get_vl do_prctl_get_vl
+#define do_prctl_sve_get_vl do_prctl_sve_get_vl
 
-static abi_long do_prctl_set_vl(CPUArchState *env, abi_long arg2)
+static abi_long do_prctl_sve_set_vl(CPUArchState *env, abi_long arg2)
 {
 /*
  * We cannot support either PR_SVE_SET_VL_ONEXEC or PR_SVE_VL_INHERIT.
@@ -47,7 +47,7 @@ static abi_long do_prctl_set_vl(CPUArchState *env, abi_long 
arg2)
 }
 return -TARGET_EINVAL;
 }
-#define do_prctl_set_vl do_prctl_set_vl
+#define do_prctl_sve_set_vl do_prctl_sve_set_vl
 
 static abi_long do_prctl_reset_keys(CPUArchState *env, abi_long arg2)
 {
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 669add74c1..cbde82c907 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6362,11 +6362,11 @@ static abi_long do_prctl_inval1(CPUArchState *env, 
abi_long arg2)
 #ifndef do_prctl_set_fp_mode
 #define do_prctl_set_fp_mode do_prctl_inval1
 #endif
-#ifndef do_prctl_get_vl
-#define do_prctl_get_vl do_prctl_inval0
+#ifndef do_prctl_sve_get_vl
+#define do_prctl_sve_get_vl do_prctl_inval0
 #endif
-#ifndef do_prctl_set_vl
-#define do_prctl_set_vl do_prctl_inval1
+#ifndef do_prctl_sve_set_vl
+#define do_prctl_sve_set_vl do_prctl_inval1
 #endif
 #ifndef do_prctl_reset_keys
 #define do_prctl_reset_keys do_prctl_inval1
@@ -6431,9 +6431,9 @@ static abi_long do_prctl(CPUArchState *env, abi_long 
option, abi_long arg2,
 case PR_SET_FP_MODE:
 return do_prctl_set_fp_mode(env, arg2);
 case PR_SVE_GET_VL:
-return do_prctl_get_vl(env);
+return do_prctl_sve_get_vl(env);
 case PR_SVE_SET_VL:
-return do_prctl_set_vl(env, arg2);
+return do_prctl_sve_set_vl(env, arg2);
 case PR_PAC_RESET_KEYS:
 if (arg3 || arg4 || arg5) {
 return -TARGET_EINVAL;
-- 
2.34.1




[PATCH v5 33/45] linux-user/aarch64: Clear tpidr2_el0 if CLONE_SETTLS

2022-07-06 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_cpu.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/linux-user/aarch64/target_cpu.h b/linux-user/aarch64/target_cpu.h
index 97a477bd3e..f90359faf2 100644
--- a/linux-user/aarch64/target_cpu.h
+++ b/linux-user/aarch64/target_cpu.h
@@ -34,10 +34,13 @@ static inline void cpu_clone_regs_parent(CPUARMState *env, 
unsigned flags)
 
 static inline void cpu_set_tls(CPUARMState *env, target_ulong newtls)
 {
-/* Note that AArch64 Linux keeps the TLS pointer in TPIDR; this is
+/*
+ * Note that AArch64 Linux keeps the TLS pointer in TPIDR; this is
  * different from AArch32 Linux, which uses TPIDRRO.
  */
 env->cp15.tpidr_el[0] = newtls;
+/* TPIDR2_EL0 is cleared with CLONE_SETTLS. */
+env->cp15.tpidr2_el0 = 0;
 }
 
 static inline abi_ulong get_sp_from_cpustate(CPUARMState *state)
-- 
2.34.1




[PATCH v5 34/45] linux-user/aarch64: Reset PSTATE.SM on syscalls

2022-07-06 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/cpu_loop.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
index f7ef36cd9f..9875d609a9 100644
--- a/linux-user/aarch64/cpu_loop.c
+++ b/linux-user/aarch64/cpu_loop.c
@@ -89,6 +89,15 @@ void cpu_loop(CPUARMState *env)
 
 switch (trapnr) {
 case EXCP_SWI:
+/*
+ * On syscall, PSTATE.ZA is preserved, along with the ZA matrix.
+ * PSTATE.SM is cleared, per SMSTOP, which does ResetSVEState.
+ */
+if (FIELD_EX64(env->svcr, SVCR, SM)) {
+env->svcr = FIELD_DP64(env->svcr, SVCR, SM, 0);
+arm_rebuild_hflags(env);
+arm_reset_sve_state(env);
+}
 ret = do_syscall(env,
  env->xregs[8],
  env->xregs[0],
-- 
2.34.1




[PATCH v5 45/45] linux-user/aarch64: Add SME related hwcap entries

2022-07-06 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 1de77c7959..ce902dbd56 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -605,6 +605,18 @@ enum {
 ARM_HWCAP2_A64_RNG  = 1 << 16,
 ARM_HWCAP2_A64_BTI  = 1 << 17,
 ARM_HWCAP2_A64_MTE  = 1 << 18,
+ARM_HWCAP2_A64_ECV  = 1 << 19,
+ARM_HWCAP2_A64_AFP  = 1 << 20,
+ARM_HWCAP2_A64_RPRES= 1 << 21,
+ARM_HWCAP2_A64_MTE3 = 1 << 22,
+ARM_HWCAP2_A64_SME  = 1 << 23,
+ARM_HWCAP2_A64_SME_I16I64   = 1 << 24,
+ARM_HWCAP2_A64_SME_F64F64   = 1 << 25,
+ARM_HWCAP2_A64_SME_I8I32= 1 << 26,
+ARM_HWCAP2_A64_SME_F16F32   = 1 << 27,
+ARM_HWCAP2_A64_SME_B16F32   = 1 << 28,
+ARM_HWCAP2_A64_SME_F32F32   = 1 << 29,
+ARM_HWCAP2_A64_SME_FA64 = 1 << 30,
 };
 
 #define ELF_HWCAP   get_elf_hwcap()
@@ -674,6 +686,14 @@ static uint32_t get_elf_hwcap2(void)
 GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
 GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
 GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
+GET_FEATURE_ID(aa64_sme, (ARM_HWCAP2_A64_SME |
+  ARM_HWCAP2_A64_SME_F32F32 |
+  ARM_HWCAP2_A64_SME_B16F32 |
+  ARM_HWCAP2_A64_SME_F16F32 |
+  ARM_HWCAP2_A64_SME_I8I32));
+GET_FEATURE_ID(aa64_sme_f64f64, ARM_HWCAP2_A64_SME_F64F64);
+GET_FEATURE_ID(aa64_sme_i16i64, ARM_HWCAP2_A64_SME_I16I64);
+GET_FEATURE_ID(aa64_sme_fa64, ARM_HWCAP2_A64_SME_FA64);
 
 return hwcaps;
 }
-- 
2.34.1




Re: [RFC PATCH v2 4/8] qapi: golang: Generate qapi's union types in Go

2022-07-06 Thread Andrea Bolognani
On Tue, Jul 05, 2022 at 05:35:26PM +0100, Daniel P. Berrangé wrote:
> On Tue, Jul 05, 2022 at 08:45:30AM -0700, Andrea Bolognani wrote:
> > All this string manipulation looks sketchy. Is there some reason that
> > I'm not seeing preventing you from doing something like the untested
> > code below?
> >
> >   func (s GuestPanicInformation) MarshalJSON() ([]byte, error) {
> >   if s.HyperV != nil {
> >   type union struct {
> >   Discriminator string  `json:"type"`
> >   HyperV GuestPanicInformationHyperV `json:"hyper-v"`
> >   }
> >   tmp := union {
> >   Discriminator: "hyper-v",
> >   HyperV: s.HyperV,
> >   }
> >   return json.Marshal(tmp)
> >   } else if s.S390 != nil {
> >   type union struct {
> >   Discriminator string  `json:"type"`
> >   S390  GuestPanicInformationS390 `json:"s390"`
> >   }
> >   tmp := union {
> >   Discriminator: "s390",
> >   S390:  s.S390,
> >   }
> >   return json.Marshal(tmp)
> >   }
> >   return nil, errors.New("...")
> >   }
>
> Using these dummy structs is the way I've approached the
> discriminated union issue in the libvirt Golang XML bindings
> and it works well. It is the bit I like the least, but it was
> the lesser of many evils, and on the plus side in the QEMU case
> it'll be auto-generated code.

It appears to be the standard way to approach the problem in Go. It
sort of comes naturally given how the APIs for marshal/unmarshal have
been defined.

> > > func (s *GuestPanicInformation) UnmarshalJSON(data []byte) error {
> > > type Alias GuestPanicInformation
> > > peek := struct {
> > > Alias
> > > Driver string `json:"type"`
> > > }{}
> > >
> > > if err := json.Unmarshal(data, &peek); err != nil {
> > > return err
> > > }
> > > *s = GuestPanicInformation(peek.Alias)
> > >
> > > switch peek.Driver {
> > >
> > > case "hyper-v":
> > > s.HyperV = new(GuestPanicInformationHyperV)
> > > if err := json.Unmarshal(data, s.HyperV); err != nil {
> > > s.HyperV = nil
> > > return err
> > > }
> > > case "s390":
> > > s.S390 = new(GuestPanicInformationS390)
> > > if err := json.Unmarshal(data, s.S390); err != nil {
> > > s.S390 = nil
> > > return err
> > > }
> > > }
> > > // Unrecognized drivers are silently ignored.
> > > return nil
> >
> > This looks pretty reasonable, but since you're only using "peek" to
> > look at the discriminator you should be able to leave out the Alias
> > type entirely and perform the initial Unmarshal operation while
> > ignoring all other fields.
>
> Once you've defined the dummy structs for the Marshal case
> though, you might as well use them for Unmarshal too, so you're
> not parsing the JSON twice.

You're right, that is undesirable. What about something like this?

  type GuestPanicInformation struct {
  HyperV *GuestPanicInformationHyperV
  S390   *GuestPanicInformationS390
  }

  type jsonGuestPanicInformation struct {
  Discriminator string   `json:"type"`
  HyperV *GuestPanicInformationHyperV `json:"hyper-v"`
  S390  *GuestPanicInformationS390   `json:"s390"`
  }

  func (s GuestPanicInformation) MarshalJSON() ([]byte, error) {
  if (s.HyperV != nil && s.S390 != nil) ||
  (s.HyperV == nil && s.S390 == nil) {
  // client hasn't filled in the struct properly
  return nil, errors.New("...")
  }

  tmp := jsonGuestPanicInformation{}

  if s.HyperV != nil {
  tmp.Discriminator = "hyper-v"
  tmp.HyperV = s.HyperV
  } else if s.S390 != nil {
  tmp.Discriminator = "s390"
  tmp.S390 = s.S390
  }

  return json.Marshal(tmp)
  }

  func (s *GuestPanicInformation) UnmarshalJSON(data []byte) error {
  tmp := jsonGuestPanicInformation{}

  err := json.Unmarshal(data, &tmp)
  if err != nil {
  return err
  }

  switch tmp.Discriminator {
  case "hyper-v":
  if tmp.HyperV == nil {
  return errors.New("...")
  }
  s.HyperV = tmp.HyperV
  case "s390":
  if tmp.S390 == nil {
  return errors.New("...")
  }
  s.S390 = tmp.S390
  }
  // if we hit none of the cases above, that means the
  // server has produced a variant we don't know about

  return nil
  }

This avoid parsing the JSON twice as well as having to define
multiple dummy structs, which keeps the code shorter and more
readable.

I've also thrown in some additional error checking for good measure,
ensuring that we abort when the input is completely nonsensical from
a semantical standpoint.

-- 
Andrea Bolognani / Red Hat

Re: [PATCH v2 17/18] block: Reorganize some declarations in block-backend-io.h

2022-07-06 Thread Hanna Reitz

On 05.07.22 18:15, Alberto Faria wrote:

Keep generated_co_wrapper and coroutine_fn pairs together. This should
make it clear that each I/O function has these two versions.

Also move blk_co_{pread,pwrite}()'s implementations out of the header
file for consistency.

Signed-off-by: Alberto Faria 
Reviewed-by: Paolo Bonzini 
---
  block/block-backend.c | 22 +
  include/sysemu/block-backend-io.h | 77 +--
  2 files changed, 54 insertions(+), 45 deletions(-)


Reviewed-by: Hanna Reitz 
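
A minimal sketch of the pairing convention the patch describes, with the
post-series signatures abridged (treat the exact parameter lists as an
assumption):

    /* Non-coroutine entry point, generated from the coroutine version. */
    int generated_co_wrapper blk_pread(BlockBackend *blk, int64_t offset,
                                       int64_t bytes, void *buf,
                                       BdrvRequestFlags flags);

    /* Native coroutine version, declared directly next to its wrapper. */
    int coroutine_fn blk_co_pread(BlockBackend *blk, int64_t offset,
                                  int64_t bytes, void *buf,
                                  BdrvRequestFlags flags);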




Re: [PATCH v2 01/18] block: Make blk_{pread,pwrite}() return 0 on success

2022-07-06 Thread Hanna Reitz

On 05.07.22 18:15, Alberto Faria wrote:

They currently return the value of their 'bytes' parameter on success.

Make them return 0 instead, for consistency with other I/O functions and
in preparation to implement them using generated_co_wrapper. This also
makes it clear that short reads/writes are not possible.

Signed-off-by: Alberto Faria 
---
  block.c  |  8 +---
  block/block-backend.c|  7 ++-
  block/qcow.c |  6 +++---
  hw/block/m25p80.c|  2 +-
  hw/misc/mac_via.c|  4 ++--
  hw/misc/sifive_u_otp.c   |  2 +-
  hw/nvram/eeprom_at24c.c  |  8 
  hw/nvram/spapr_nvram.c   | 14 +++---
  hw/ppc/pnv_pnor.c|  2 +-
  qemu-img.c   | 25 +
  qemu-io-cmds.c   | 18 --
  tests/unit/test-block-iothread.c |  4 ++--
  12 files changed, 49 insertions(+), 51 deletions(-)


Reviewed-by: Hanna Reitz 
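
A sketch of what the changed convention means for callers (the caller is
hypothetical and the pre-series argument order is assumed):

    static int read_header(BlockBackend *blk, void *buf, int len)
    {
        int ret = blk_pread(blk, 0, buf, len);
        if (ret < 0) {
            return ret;     /* failure is still a negative errno */
        }
        /*
         * Success used to return 'len'; it now returns 0, so callers
         * must stop comparing the result against the byte count.
         */
        return 0;
    }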




Re: [RFC PATCH v2 4/8] qapi: golang: Generate qapi's union types in Go

2022-07-06 Thread Daniel P . Berrangé
On Wed, Jul 06, 2022 at 04:28:16AM -0500, Andrea Bolognani wrote:
> On Tue, Jul 05, 2022 at 05:35:26PM +0100, Daniel P. Berrangé wrote:
> > On Tue, Jul 05, 2022 at 08:45:30AM -0700, Andrea Bolognani wrote:
> > > All this string manipulation looks sketchy. Is there some reason that
> > > I'm not seeing preventing you from doing something like the untested
> > > code below?
> > >
> > >   func (s GuestPanicInformation) MarshalJSON() ([]byte, error) {
> > >   if s.HyperV != nil {
> > >   type union struct {
> > >   Discriminator string  `json:"type"`
> > >   HyperV GuestPanicInformationHyperV `json:"hyper-v"`
> > >   }
> > >   tmp := union {
> > >   Discriminator: "hyper-v",
> > >   HyperV: s.HyperV,
> > >   }
> > >   return json.Marshal(tmp)
> > >   } else if s.S390 != nil {
> > >   type union struct {
> > >   Discriminator string  `json:"type"`
> > >   S390  GuestPanicInformationS390 `json:"s390"`
> > >   }
> > >   tmp := union {
> > >   Discriminator: "s390",
> > >   S390:  s.S390,
> > >   }
> > >   return json.Marshal(tmp)
> > >   }
> > >   return nil, errors.New("...")
> > >   }
> >
> > Using these dummy structs is the way I've approached the
> > discriminated union issue in the libvirt Golang XML bindings
> > and it works well. It is the bit I like the least, but it was
> > the lesser of many evils, and on the plus side in the QEMU case
> > it'll be auto-generated code.
> 
> It appears to be the standard way to approach the problem in Go. It
> sort of comes naturally given how the APIs for marshal/unmarshal have
> been defined.
> 
> > > > func (s *GuestPanicInformation) UnmarshalJSON(data []byte) error {
> > > > type Alias GuestPanicInformation
> > > > peek := struct {
> > > > Alias
> > > > Driver string `json:"type"`
> > > > }{}
> > > >
> > > > if err := json.Unmarshal(data, &peek); err != nil {
> > > > return err
> > > > }
> > > > *s = GuestPanicInformation(peek.Alias)
> > > >
> > > > switch peek.Driver {
> > > >
> > > > case "hyper-v":
> > > > s.HyperV = new(GuestPanicInformationHyperV)
> > > > if err := json.Unmarshal(data, s.HyperV); err != nil {
> > > > s.HyperV = nil
> > > > return err
> > > > }
> > > > case "s390":
> > > > s.S390 = new(GuestPanicInformationS390)
> > > > if err := json.Unmarshal(data, s.S390); err != nil {
> > > > s.S390 = nil
> > > > return err
> > > > }
> > > > }
> > > > // Unrecognized drivers are silently ignored.
> > > > return nil
> > >
> > > This looks pretty reasonable, but since you're only using "peek" to
> > > look at the discriminator you should be able to leave out the Alias
> > > type entirely and perform the initial Unmarshal operation while
> > > ignoring all other fields.
> >
> > Once you've defined the dummy structs for the Marshal case
> > though, you might as well use them for Unmarshal too, so you're
> > not parsing the JSON twice.
> 
> You're right, that is undesirable. What about something like this?
> 
>   type GuestPanicInformation struct {
>   HyperV *GuestPanicInformationHyperV
>   S390   *GuestPanicInformationS390
>   }
> 
>   type jsonGuestPanicInformation struct {
>   Discriminator string   `json:"type"`
>   HyperV *GuestPanicInformationHyperV `json:"hyper-v"`
>   S390  *GuestPanicInformationS390   `json:"s390"`
>   }

It can possibly be even simpler with just embedding the real
struct

   type jsonGuestPanicInformation struct {
   Discriminator string
   GuestPanicInformation
   }

> 
>   func (s GuestPanicInformation) MarshalJSON() ([]byte, error) {
>   if (s.HyperV != nil && s.S390 != nil) ||
>   (s.HyperV == nil && s.S390 == nil) {
>   // client hasn't filled in the struct properly
>   return nil, errors.New("...")
>   }
> 
>   tmp := jsonGuestPanicInformation{}
> 
>   if s.HyperV != nil {
>   tmp.Discriminator = "hyper-v"
>   tmp.HyperV = s.HyperV
>   } else if s.S390 != nil {
>   tmp.Discriminator = "s390"
>   tmp.S390 = s.S390
>   }
> 
>   return json.Marshal(tmp)
>   }

And...

   var discriminator string
   if s.HyperV != nil {
   discriminator = "hyper-v"
   } else if s.S390 != nil {
   discriminator = "s390"
   }

   tmp := jsonGuestPanicInformation{ discriminator, s}
   return json.Marshal(tmp)

> 
>   func (s *GuestPanicInformation) UnmarshalJSON(data []byte) error {
>   tmp := jsonGuestPanicInformation{}
> 
>   err := json.Unmarshal(data, &tmp)
>   if err != nil {
>   return err
>   }
> 
>   switch tmp

[PATCH v5 36/45] linux-user/aarch64: Tidy target_restore_sigframe error return

2022-07-06 Thread Richard Henderson
Fold the return value setting into the goto, so each
point of failure need not do both.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/signal.c | 26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 3cef2f44cf..8b352abb97 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -287,7 +287,6 @@ static int target_restore_sigframe(CPUARMState *env,
 struct target_sve_context *sve = NULL;
 uint64_t extra_datap = 0;
 bool used_extra = false;
-bool err = false;
 int vq = 0, sve_size = 0;
 
 target_restore_general_frame(env, sf);
@@ -301,8 +300,7 @@ static int target_restore_sigframe(CPUARMState *env,
 switch (magic) {
 case 0:
 if (size != 0) {
-err = true;
-goto exit;
+goto err;
 }
 if (used_extra) {
 ctx = NULL;
@@ -314,8 +312,7 @@ static int target_restore_sigframe(CPUARMState *env,
 
 case TARGET_FPSIMD_MAGIC:
 if (fpsimd || size != sizeof(struct target_fpsimd_context)) {
-err = true;
-goto exit;
+goto err;
 }
 fpsimd = (struct target_fpsimd_context *)ctx;
 break;
@@ -329,13 +326,11 @@ static int target_restore_sigframe(CPUARMState *env,
 break;
 }
 }
-err = true;
-goto exit;
+goto err;
 
 case TARGET_EXTRA_MAGIC:
 if (extra || size != sizeof(struct target_extra_context)) {
-err = true;
-goto exit;
+goto err;
 }
 __get_user(extra_datap,
&((struct target_extra_context *)ctx)->datap);
@@ -348,8 +343,7 @@ static int target_restore_sigframe(CPUARMState *env,
 /* Unknown record -- we certainly didn't generate it.
  * Did we in fact get out of sync?
  */
-err = true;
-goto exit;
+goto err;
 }
 ctx = (void *)ctx + size;
 }
@@ -358,17 +352,19 @@ static int target_restore_sigframe(CPUARMState *env,
 if (fpsimd) {
 target_restore_fpsimd_record(env, fpsimd);
 } else {
-err = true;
+goto err;
 }
 
 /* SVE data, if present, overwrites FPSIMD data.  */
 if (sve) {
 target_restore_sve_record(env, sve, vq);
 }
-
- exit:
 unlock_user(extra, extra_datap, 0);
-return err;
+return 0;
+
+ err:
+unlock_user(extra, extra_datap, 0);
+return 1;
 }
 
 static abi_ulong get_sigframe(struct target_sigaction *ka,
-- 
2.34.1




[PATCH v5 37/45] linux-user/aarch64: Do not allow duplicate or short sve records

2022-07-06 Thread Richard Henderson
In parse_user_sigframe, the kernel rejects duplicate sve records,
or records that are smaller than the header.  We were silently
allowing these cases to pass, dropping the record.

Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/signal.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 8b352abb97..8fbe98d72f 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -318,10 +318,13 @@ static int target_restore_sigframe(CPUARMState *env,
 break;
 
 case TARGET_SVE_MAGIC:
+if (sve || size < sizeof(struct target_sve_context)) {
+goto err;
+}
 if (cpu_isar_feature(aa64_sve, env_archcpu(env))) {
 vq = sve_vq(env);
 sve_size = QEMU_ALIGN_UP(TARGET_SVE_SIG_CONTEXT_SIZE(vq), 16);
-if (!sve && size == sve_size) {
+if (size == sve_size) {
 sve = (struct target_sve_context *)ctx;
 break;
 }
-- 
2.34.1




Re: [PATCH v2 00/18] Make block-backend-io.h API more consistent

2022-07-06 Thread Hanna Reitz

On 05.07.22 18:15, Alberto Faria wrote:

Adjust existing pairs of non-coroutine and coroutine functions to share
the same calling convention, and add non-coroutine/coroutine
counterparts where they don't exist.

Also make the non-coroutine versions generated_co_wrappers.

This series sits on top of "[PATCH v5 00/10] Implement
bdrv_{pread,pwrite,pwrite_sync,pwrite_zeroes}() using
generated_co_wrapper":

 
https://lore.kernel.org/qemu-devel/20220609152744.3891847-1-afa...@redhat.com/

Based-on: <20220609152744.3891847-1-afa...@redhat.com>

v2:
   - Avoid using variables named 'len' or similar to hold return values
 from blk_{pread,pwrite}(), as they don't return a length anymore.
   - Drop variables in_ret and out_ret in qemu-img.c:img_dd().
   - Initialize buf in test_sync_op_blk_pwritev_part().
   - Keep blk_co_copy_range() in the "I/O API functions" section of
 include/sysemu/block-backend-io.h.


Thanks!  Applied to my block branch:

https://gitlab.com/hreitz/qemu/-/commits/block

Hanna




Re: [PATCH v8 06/20] job.h: define functions called without job lock held

2022-07-06 Thread Vladimir Sementsov-Ogievskiy

On 7/6/22 11:22, Emanuele Giuseppe Esposito wrote:



Am 05/07/2022 um 12:53 schrieb Vladimir Sementsov-Ogievskiy:

On 6/29/22 17:15, Emanuele Giuseppe Esposito wrote:

These functions don't need a _locked() counterpart, since
they are all called outside job.c and take the lock only
internally.

Update also the comments in blockjob.c (and move them in job.c).


Still, that would be better as a separate patch.



Note: at this stage, job_{lock/unlock} and job lock guard macros
are *nop*.

No functional change intended.

Signed-off-by: Emanuele Giuseppe Esposito 
---
   blockjob.c | 20 
   include/qemu/job.h | 37 ++---
   job.c  | 15 +++
   3 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 4868453d74..7da59a1f1c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -36,21 +36,6 @@
   #include "qemu/main-loop.h"
   #include "qemu/timer.h"
   -/*
- * The block job API is composed of two categories of functions.
- *
- * The first includes functions used by the monitor.  The monitor is
- * peculiar in that it accesses the block job list with
block_job_get, and
- * therefore needs consistency across block_job_get and the actual
operation
- * (e.g. block_job_set_speed).  The consistency is achieved with
- * aio_context_acquire/release.  These functions are declared in
blockjob.h.
- *
- * The second includes functions used by the block job drivers and
sometimes
- * by the core block layer.  These do not care about locking, because
the
- * whole coroutine runs under the AioContext lock, and are declared in
- * blockjob_int.h.
- */
-
   static bool is_block_job(Job *job)
   {
   return job_type(job) == JOB_TYPE_BACKUP ||
@@ -433,11 +418,6 @@ static void block_job_event_ready(Notifier *n,
void *opaque)
   }
     -/*
- * API for block job drivers and the block layer.  These functions are
- * declared in blockjob_int.h.
- */
-
   void *block_job_create(const char *job_id, const BlockJobDriver
*driver,
  JobTxn *txn, BlockDriverState *bs, uint64_t
perm,
  uint64_t shared_perm, int64_t speed, int flags,
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 99960cc9a3..b714236c1a 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -363,6 +363,7 @@ void job_txn_unref_locked(JobTxn *txn);
     /**
    * Create a new long-running job and return it.
+ * Called with job_mutex *not* held.
    *
    * @job_id: The id of the newly-created job, or %NULL for internal jobs
    * @driver: The class object for the newly-created job.
@@ -400,6 +401,8 @@ void job_unref_locked(Job *job);
    * @done: How much progress the job made since the last call
    *
    * Updates the progress counter of the job.
+ *
+ * Progress API is thread safe.


This tells the function's user nothing. In the end the whole job_ API will be
thread safe, won't it?

I think here we need simply "called with mutex not held". (Or even "may
be called with mutex held or not held" if we need it, or just nothing)

and the note about the progress API should be somewhere in job.c, as that's
an implementation detail.


What about "Progress API is thread safe. Can be called with job mutex
held or not"?


OK, if you like, that's not critical. Still, I think that after this series the whole job 
API should be thread safe, which makes a comment about the progress API misleading: users 
will think "hmm.. OK, progress related functions are thread safe. Others are not?"






[...]


I'd merge all the new comments in job.h into the previous commit, as they are
related to the questions raised by it.


I disagree, I think it will be a mess of functions again if we mix the
ones that don't need the lock held with the ones that need it.

You understand it because you got the logic of this series, but others
may not.



That's not critical. Here's why it seems better to me in one patch:

For a patch like 05 I anyway have to review the whole job.c/job.h checking that 
everything is correct. When I see that something was not updated, it looks like 
a mistake to me. Then I find the missed part in the next commit.






   void job_cancel_sync_all(void);
     /**
diff --git a/job.c b/job.c
index dd44fac8dd..7a3cc93f66 100644
--- a/job.c
+++ b/job.c
@@ -32,12 +32,27 @@
   #include "trace/trace-root.h"
   #include "qapi/qapi-events-job.h"
   +/*
+ * The job API is composed of two categories of functions.
+ *
+ * The first includes functions used by the monitor.  The monitor is
+ * peculiar in that it accesses the block job list with job_get, and
+ * therefore needs consistency across job_get and the actual operation
+ * (e.g. job_user_cancel). To achieve this consistency, the caller
+ * calls job_lock/job_unlock itself around the whole operation.
+ *
+ *
+ * The second includes functions used by the block job drivers and
sometimes
+ * by the core block layer. These delegate the locking to the callee
instead.
+ */
+
   /*
    * job_mutex protects th

Re: [PATCH v2] m68k: virt: pass RNG seed via bootinfo block

2022-07-06 Thread Geert Uytterhoeven
On Sun, Jun 26, 2022 at 1:18 PM Jason A. Donenfeld  wrote:
> This commit wires up bootinfo's RNG seed attribute so that Linux VMs can
> have their RNG seeded from the earliest possible time in boot, just like
> the "rng-seed" device tree property on those platforms. The link
> contains the corresponding Linux patch.
>
> Link: https://lore.kernel.org/lkml/20220626111509.330159-1-ja...@zx2c4.com/
> Based-on: <20220625152318.120849-1-ja...@zx2c4.com>
> Reviewed-by: Laurent Vivier 
> Signed-off-by: Jason A. Donenfeld 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds



[PATCH v5 42/45] linux-user/aarch64: Implement PR_SME_GET_VL, PR_SME_SET_VL

2022-07-06 Thread Richard Henderson
These prctls set the Streaming SVE vector length, which may
be completely different from the Normal SVE vector length.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 linux-user/aarch64/target_prctl.h | 48 +++
 linux-user/syscall.c  | 16 +++
 2 files changed, 64 insertions(+)

diff --git a/linux-user/aarch64/target_prctl.h 
b/linux-user/aarch64/target_prctl.h
index 40481e6663..f8f8f88992 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -10,6 +10,7 @@ static abi_long do_prctl_sve_get_vl(CPUArchState *env)
 {
 ARMCPU *cpu = env_archcpu(env);
 if (cpu_isar_feature(aa64_sve, cpu)) {
+/* PSTATE.SM is always unset on syscall entry. */
 return sve_vq(env) * 16;
 }
 return -TARGET_EINVAL;
@@ -27,6 +28,7 @@ static abi_long do_prctl_sve_set_vl(CPUArchState *env, 
abi_long arg2)
 && arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
 uint32_t vq, old_vq;
 
+/* PSTATE.SM is always unset on syscall entry. */
 old_vq = sve_vq(env);
 
 /*
@@ -49,6 +51,52 @@ static abi_long do_prctl_sve_set_vl(CPUArchState *env, 
abi_long arg2)
 }
 #define do_prctl_sve_set_vl do_prctl_sve_set_vl
 
+static abi_long do_prctl_sme_get_vl(CPUArchState *env)
+{
+ARMCPU *cpu = env_archcpu(env);
+if (cpu_isar_feature(aa64_sme, cpu)) {
+return sme_vq(env) * 16;
+}
+return -TARGET_EINVAL;
+}
+#define do_prctl_sme_get_vl do_prctl_sme_get_vl
+
+static abi_long do_prctl_sme_set_vl(CPUArchState *env, abi_long arg2)
+{
+/*
+ * We cannot support either PR_SME_SET_VL_ONEXEC or PR_SME_VL_INHERIT.
+ * Note the kernel definition of sve_vl_valid allows for VQ=512,
+ * i.e. VL=8192, even though the architectural maximum is VQ=16.
+ */
+if (cpu_isar_feature(aa64_sme, env_archcpu(env))
+&& arg2 >= 0 && arg2 <= 512 * 16 && !(arg2 & 15)) {
+int vq, old_vq;
+
+old_vq = sme_vq(env);
+
+/*
+ * Bound the value of vq, so that we know that it fits into
+ * the 4-bit field in SMCR_EL1.  Because PSTATE.SM is cleared
+ * on syscall entry, we are not modifying the current SVE
+ * vector length.
+ */
+vq = MAX(arg2 / 16, 1);
+vq = MIN(vq, 16);
+env->vfp.smcr_el[1] =
+FIELD_DP64(env->vfp.smcr_el[1], SMCR, LEN, vq - 1);
+vq = sme_vq(env);
+
+if (old_vq != vq) {
+/* PSTATE.ZA state is cleared on any change to VQ. */
+env->svcr = FIELD_DP64(env->svcr, SVCR, ZA, 0);
+arm_rebuild_hflags(env);
+}
+return vq * 16;
+}
+return -TARGET_EINVAL;
+}
+#define do_prctl_sme_set_vl do_prctl_sme_set_vl
+
 static abi_long do_prctl_reset_keys(CPUArchState *env, abi_long arg2)
 {
 ARMCPU *cpu = env_archcpu(env);
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index cbde82c907..991b85e6b4 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6343,6 +6343,12 @@ abi_long do_arch_prctl(CPUX86State *env, int code, 
abi_ulong addr)
 #ifndef PR_SET_SYSCALL_USER_DISPATCH
 # define PR_SET_SYSCALL_USER_DISPATCH 59
 #endif
+#ifndef PR_SME_SET_VL
+# define PR_SME_SET_VL  63
+# define PR_SME_GET_VL  64
# define PR_SME_VL_LEN_MASK  0xffff
+# define PR_SME_VL_INHERIT   (1 << 17)
+#endif
 
 #include "target_prctl.h"
 
@@ -6383,6 +6389,12 @@ static abi_long do_prctl_inval1(CPUArchState *env, 
abi_long arg2)
 #ifndef do_prctl_set_unalign
 #define do_prctl_set_unalign do_prctl_inval1
 #endif
+#ifndef do_prctl_sme_get_vl
+#define do_prctl_sme_get_vl do_prctl_inval0
+#endif
+#ifndef do_prctl_sme_set_vl
+#define do_prctl_sme_set_vl do_prctl_inval1
+#endif
 
 static abi_long do_prctl(CPUArchState *env, abi_long option, abi_long arg2,
  abi_long arg3, abi_long arg4, abi_long arg5)
@@ -6434,6 +6446,10 @@ static abi_long do_prctl(CPUArchState *env, abi_long 
option, abi_long arg2,
 return do_prctl_sve_get_vl(env);
 case PR_SVE_SET_VL:
 return do_prctl_sve_set_vl(env, arg2);
+case PR_SME_GET_VL:
+return do_prctl_sme_get_vl(env);
+case PR_SME_SET_VL:
+return do_prctl_sme_set_vl(env, arg2);
 case PR_PAC_RESET_KEYS:
 if (arg3 || arg4 || arg5) {
 return -TARGET_EINVAL;
-- 
2.34.1




Re: [RFC PATCH v2 4/8] qapi: golang: Generate qapi's union types in Go

2022-07-06 Thread Daniel P . Berrangé
On Wed, Jul 06, 2022 at 10:37:54AM +0100, Daniel P. Berrangé wrote:
> On Wed, Jul 06, 2022 at 04:28:16AM -0500, Andrea Bolognani wrote:
> > On Tue, Jul 05, 2022 at 05:35:26PM +0100, Daniel P. Berrangé wrote:
> > > On Tue, Jul 05, 2022 at 08:45:30AM -0700, Andrea Bolognani wrote:
> > > > All this string manipulation looks sketchy. Is there some reason that
> > > > I'm not seeing preventing you from doing something like the untested
> > > > code below?
> > > >
> > > >   func (s GuestPanicInformation) MarshalJSON() ([]byte, error) {
> > > >   if s.HyperV != nil {
> > > >   type union struct {
> > > >   Discriminator string  `json:"type"`
> > > >   HyperV GuestPanicInformationHyperV `json:"hyper-v"`
> > > >   }
> > > >   tmp := union {
> > > >   Discriminator: "hyper-v",
> > > >   HyperV: s.HyperV,
> > > >   }
> > > >   return json.Marshal(tmp)
> > > >   } else if s.S390 != nil {
> > > >   type union struct {
> > > >   Discriminator string  `json:"type"`
> > > >   S390  GuestPanicInformationS390 `json:"s390"`
> > > >   }
> > > >   tmp := union {
> > > >   Discriminator: "s390",
> > > >   S390:  s.S390,
> > > >   }
> > > >   return json.Marshal(tmp)
> > > >   }
> > > >   return nil, errors.New("...")
> > > >   }
> > >
> > > Using these dummy structs is the way I've approached the
> > > discriminated union issue in the libvirt Golang XML bindings
> > > and it works well. It is the bit I like the least, but it was
> > > the lesser of many evils, and on the plus side in the QEMU case
> > > it'll be auto-generated code.
> > 
> > It appears to be the standard way to approach the problem in Go. It
> > sort of comes naturally given how the APIs for marshal/unmarshal have
> > been defined.
> > 
> > > > > func (s *GuestPanicInformation) UnmarshalJSON(data []byte) error {
> > > > > type Alias GuestPanicInformation
> > > > > peek := struct {
> > > > > Alias
> > > > > Driver string `json:"type"`
> > > > > }{}
> > > > >
> > > > > if err := json.Unmarshal(data, &peek); err != nil {
> > > > > return err
> > > > > }
> > > > > *s = GuestPanicInformation(peek.Alias)
> > > > >
> > > > > switch peek.Driver {
> > > > >
> > > > > case "hyper-v":
> > > > > s.HyperV = new(GuestPanicInformationHyperV)
> > > > > if err := json.Unmarshal(data, s.HyperV); err != nil {
> > > > > s.HyperV = nil
> > > > > return err
> > > > > }
> > > > > case "s390":
> > > > > s.S390 = new(GuestPanicInformationS390)
> > > > > if err := json.Unmarshal(data, s.S390); err != nil {
> > > > > s.S390 = nil
> > > > > return err
> > > > > }
> > > > > }
> > > > > // Unrecognized drivers are silently ignored.
> > > > > return nil
> > > >
> > > > This looks pretty reasonable, but since you're only using "peek" to
> > > > look at the discriminator you should be able to leave out the Alias
> > > > type entirely and perform the initial Unmarshal operation while
> > > > ignoring all other fields.
> > >
> > > Once you've defined the dummy structs for the Marshal case
> > > though, you might as well use them for Unmarshal too, so you're
> > > not parsing the JSON twice.
> > 
> > You're right, that is undesirable. What about something like this?
> > 
> >   type GuestPanicInformation struct {
> >   HyperV *GuestPanicInformationHyperV
> >   S390   *GuestPanicInformationS390
> >   }
> > 
> >   type jsonGuestPanicInformation struct {
> >   Discriminator string   `json:"type"`
> >   HyperV *GuestPanicInformationHyperV `json:"hyper-v"`
> >   S390  *GuestPanicInformationS390   `json:"s390"`
> >   }
> 
> It can possibly be even simpler with just embedding the real
> struct
> 
>type jsonGuestPanicInformation struct {
>Discriminator string
>GuestPanicInformation
>}
> 
> > 
> >   func (s GuestPanicInformation) MarshalJSON() ([]byte, error) {
> >   if (s.HyperV != nil && s.S390 != nil) ||
> >   (s.HyperV == nil && s.S390 == nil) {
> >   // client hasn't filled in the struct properly
> >   return nil, errors.New("...")
> >   }
> > 
> >   tmp := jsonGuestPanicInformation{}
> > 
> >   if s.HyperV != nil {
> >   tmp.Discriminator = "hyper-v"
> >   tmp.HyperV = s.HyperV
> >   } else if s.S390 != nil {
> >   tmp.Discriminator = "s390"
> >   tmp.S390 = s.S390
> >   }
> > 
> >   return json.Marshal(tmp)
> >   }
> 
> And...
> 
>var discriminator string
>if s.HyperV != nil {
>discriminator = "hyper-v"
>} else if s.S390 != nil {
>discriminator = "s390"
>}
> 

Re: [RFC 0/8] Introduce an extensible static analyzer

2022-07-06 Thread Alberto Faria
On Tue, Jul 5, 2022 at 5:12 PM Daniel P. Berrangé  wrote:
> On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> > On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé  
> > wrote:
> > >  for i in `git ls-tree --name-only -r HEAD:`
> > >  do
> > > clang-tidy $i 1>/dev/null 2>&1
> > >  done
> >
> > All of those invocations are probably failing quickly due to missing
> > includes and other problems, since the location of the compilation
> > database and some other arguments haven't been specified.
>
> Oops yes, I was way too minimalist in testing that.
>
> >
> > Accounting for those problems (and enabling just one random C++ check):
> >
> > $ time clang-tidy -p build \
> > --extra-arg-before=-Wno-unknown-warning-option \
> > --extra-arg='-isystem [...]' \
> > --checks='-*,clang-analyzer-cplusplus.Move' \
> > $( find block -name '*.c' )
> > [...]
> >
> > real3m0.260s
> > user2m58.041s
> > sys 0m1.467s
>
> Only analysing the block tree, but if we consider a static analysis
> framework is desirable to use for whole of qemu.git, lets see the
> numbers for everything.
>
> What follows was done on  my P1 Gen2 thinkpad with 6 core / 12 threads,
> where I use 'make -j 12' normally.
>
> First as a benchmark, lets see a full compile of whole of QEMU (with
> GCC) on Fedora 36 x86_64
>
> => 14 minutes
>
>
> I find this way too slow though, so I typically configure QEMU with
> --target-list=x86_64-softmmu since that suffices 90% of the time.
>
>=> 2 minutes
>
>
> A 'make check' on this x86_64-only build takes another 2 minutes.
>
>
> Now, a static analysis baseline across the whole tree with default
> tests enabled
>
>  $ clang-tidy --quiet -p build $(git ls-tree -r --name-only HEAD: | grep 
> '\.c$')
>
>   => 45 minutes
>
> wow, wasn't expecting it to be that slow !
>
> Lets restrict to just the block/ dir
>
>  $ clang-tidy --quiet -p build $(find block -name '*.c')
>
>   => 4 minutes
>
> And further restrict to just 1 simple check
>
>  $ clang-tidy --quiet   --checks='-*,clang-analyzer-cplusplus.Move'  -p build 
> $(find block -name '*.c')
>   => 2 minutes 30
>
>
> So extrapolated just that single check would probably work out
> at 30 mins for the whole tree.
>
> Overall this isn't cheap, and in the same order of magnitude
> as a full compile. I guess this shouldn't be that surprising
> really.
>
>
>
> > Single-threaded static-analyzer.py without any checks:
> >
> > $ time ./static-analyzer.py build block -j 1
> > Analyzed 79 translation units in 16.0 seconds.
> >
> > real0m16.665s
> > user0m15.967s
> > sys 0m0.604s
> >
> > And with just the 'return-value-never-used' check enabled for a
> > somewhat fairer comparison:
> >
> > $ time ./static-analyzer.py build block -j 1 \
> > -c return-value-never-used
> > Analyzed 79 translation units in 61.5 seconds.
> >
> > real1m2.080s
> > user1m1.372s
> > sys 0m0.513s
> >
> > Which is good news!

(Well, good news for the Python libclang approach vs others like
clang-tidy plugins; bad news in absolute terms.)

>
> On my machine, a whole tree analysis allowing parallel execution
> (iiuc no -j arg means use all cores):
>
>   ./static-analyzer.py build  $(git ls-tree -r --name-only HEAD: | grep '\.c$')
>
>=> 13 minutes
>
> Or just the block layer
>
>   ./static-analyzer.py build  $(find block -name '*.c')
>
>=> 45 seconds
>
>
> So your script is faster than clang-tidy, which suggests python probably
> isn't the major dominating factor in speed, at least at this point in
> time.
>
>
> Still, a full tree analysis time of 13 minutes, compared to  my normal
> 'make' time of 2 minutes is an order of magnitude.

There goes my 10% overhead target...

>
>
> One thing that I noticed when doing this is that we can only really
> do static analysis of files that we can actually compile on the host.
> IOW, if on a Linux host, we don't get static analysis of code that
> is targeted at FreeBSD / Windows platforms. Obvious really, since
> libclang has to do a full parse and this will lack header files
> for those platforms. That's just the tradeoff you have to accept
> when using a compiler for static analysis, vs a tool that does
> "dumb" string based regex matching to detect mistakes. Both kinds
> of tools likely have their place for different tasks.

Right, I don't think there's anything reasonable we can do about this
limitation.

>
>
> Overall I think a libclang based analysis tool will be useful, but
> I can't see us enabling it as a standard part of 'make check'
> given the time penalty.
>
>
> Feels like something that'll have to be opt-in to a large degree
> for regular contributors. In terms of gating CI though, it is less
> of an issue, since we massively parallelize jobs. As long as we
> have a dedicated build job just for running this static analysis
> check in isolation, and NOT as 'make check' in all existing jobs,
> it can happen in parallel with all the other build jobs, and we
> won't notice the speed.

[PATCH v5 43/45] target/arm: Only set ZEN in reset if SVE present

2022-07-06 Thread Richard Henderson
There's no reason to set CPACR_EL1.ZEN if SVE disabled.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 9c58be8b14..9b54443843 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -204,11 +204,10 @@ static void arm_cpu_reset(DeviceState *dev)
 /* and to the FP/Neon instructions */
 env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
  CPACR_EL1, FPEN, 3);
-/* and to the SVE instructions */
-env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
- CPACR_EL1, ZEN, 3);
-/* with reasonable vector length */
+/* and to the SVE instructions, with default vector length */
 if (cpu_isar_feature(aa64_sve, cpu)) {
+env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
+ CPACR_EL1, ZEN, 3);
 env->vfp.zcr_el[1] = cpu->sve_default_vq - 1;
 }
 /*
-- 
2.34.1




Re: [PATCH v3 00/14] scsi: add quirks and features to support m68k Macs

2022-07-06 Thread Mark Cave-Ayland

On 22/06/2022 11:53, Mark Cave-Ayland wrote:


Here is the next set of patches from my ongoing work to allow the q800
machine to boot MacOS, relating to SCSI devices.

Patch 1 adds a new quirks bitmap to SCSIDiskState to allow buggy and/or
legacy features to be enabled on an individual device basis. Once the quirks
bitmap has been added, patch 2 uses the quirks feature to implement an
Apple-specific mode page which is required to allow the disk to be recognised
and used by Apple HD SC Setup.

Patch 3 adds compat_props to the q800 machine which enable the new
MODE_PAGE_APPLE_VENDOR quirk for all scsi-cd devices attached to the machine.

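A rough sketch of how the quirks bitmap and the compat_props glue fit together
(the field layout and array name are assumptions; the quirk and property names
come from the patch titles):

    /* scsi-disk: one bit per quirk, settable via a device property. */
    #define SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR  0

    typedef struct SCSIDiskState {
        /* ... existing fields ... */
        uint32_t quirks;
    } SCSIDiskState;

    /* q800: enable the quirk for every scsi-cd device on the machine. */
    static GlobalProperty compat[] = {
        { "scsi-cd", "quirk_mode_page_apple_vendor", "on" },
    };

    /* In the MODE SENSE handler: */
    if (s->quirks & (1 << SCSI_DISK_QUIRK_MODE_PAGE_APPLE_VENDOR)) {
        /* expose the Apple vendor-specific mode page */
    }
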
Patch 4 adds a new quirk to force SCSI CDROMs to always honour the block
descriptor for a MODE SENSE command which is expected by A/UX, whilst patch 5
enables the quirk for all scsi-cd devices on the q800 machine.

Patches 6 and 7 implement a new MODE_PAGE_VENDOR_SPECIFIC_APPLE quirk to
allow PF=0 MODE SELECT requests which are used by both MacOS and A/UX, along
with a MODE_PAGE_VENDOR_SPECIFIC (0x0) mode page compatible with MacOS. Once
again this quirk is only enabled for SCSI devices on the q800 machine.

Patch 8 implements a dummy FORMAT UNIT command which is used by the Apple HD SC
Setup program when preparing an empty disk to install MacOS.

Patches 9 and 10 add support for allowing truncated MODE SELECT requests which
are sent by A/UX when enumerating a SCSI CDROM device. Allowing these broken
requests is protected by a new MODE_PAGE_TRUNCATED quirk which is only enabled
for SCSI CDROM devices attached to the q800 machine.

Patch 11 allows the MODE_PAGE_R_W_ERROR AWRE bit to be changeable since the A/UX
MODE SELECT request sets this bit to 0 rather than the QEMU default which is 1.

Patch 12 adds support for setting the SCSI block size via a MODE SELECT request
which is most commonly used by older CDROMs to allow the block size to be
changed from the default of 2048 bytes to 512 bytes for compatibility purposes.
This is used by A/UX which otherwise fails with SCSI errors if the block size
is not set to 512 bytes when accessing CDROMs.

Finally patches 13 and 14 augment the compat_props to set the default vendor,
product and version information for all scsi-hd and scsi-cd devices attached
to the q800 machine, taken from real drives. This is because MacOS will only
allow a known set of SCSI devices to be detected and initialised during the
installation process.

Signed-off-by: Mark Cave-Ayland 

v3:
[Note from v2: this series has changed in structure and functionality based upon
bug reports from Howard off-list regarding detection/changing of CDROM media in
both A/UX and MacOS]

- Rearrange order to aid bisecting differences between CDROM and DISK quirks
- Add R-B tags from Laurent and Phil
- Replace %zd with %zu in trace-events in patch 8
- Add a new SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED quirk to handle truncated
   MODE SELECT requests
- Rename SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD quirk to
   SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD since, due to additional changes in
   this series, the DBD bit can be honoured rather than forced off
- Add support for PF=0 MODE SELECT commands and a new
   MODE_PAGE_VENDOR_SPECIFIC (0x0) page with a suitable implementation for
   MacOS, protected by a new SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE
   quirk (this fixes detection of CDROM media in some cases)
- Allow the SCSI block size to be set for both CDROMs and DISKs as requested
   by Paolo

v2:
- Change patchset title from "scsi: add support for FORMAT UNIT command and
   quirks" to "scsi: add quirks and features to support m68k Macs"
- Fix missing shift in patch 2 as pointed out by Fam
- Rename MODE_PAGE_APPLE to MODE_PAGE_APPLE_VENDOR
- Add SCSI_DISK_QUIRK_MODE_SENSE_ROM_FORCE_DBD quirk
- Add support for truncated MODE SELECT requests
- Allow MODE_PAGE_R_W_ERROR AWRE bit to be changeable for CDROM devices
- Allow the MODE SELECT block descriptor to set the CDROM block size


Mark Cave-Ayland (14):
   scsi-disk: add new quirks bitmap to SCSIDiskState
   scsi-disk: add MODE_PAGE_APPLE_VENDOR quirk for Macintosh
   q800: implement compat_props to enable quirk_mode_page_apple_vendor
 for scsi-cd devices
   scsi-disk: add SCSI_DISK_QUIRK_MODE_SENSE_ROM_USE_DBD quirk for
 Macintosh
   q800: implement compat_props to enable quirk_mode_sense_rom_use_dbd
 for scsi-cd devices
   scsi-disk: add SCSI_DISK_QUIRK_MODE_PAGE_VENDOR_SPECIFIC_APPLE quirk
 for Macintosh
   q800: implement compat_props to enable
 quirk_mode_page_vendor_specific_apple for scsi devices
   scsi-disk: add FORMAT UNIT command
   scsi-disk: add SCSI_DISK_QUIRK_MODE_PAGE_TRUNCATED quirk for Macintosh
   q800: implement compat_props to enable quirk_mode_page_truncated for
 scsi-cd devices
   scsi-disk: allow the MODE_PAGE_R_W_ERROR AWRE bit to be changeable for
 CDROM drives
   scsi-disk: allow MODE SELECT block descriptor to set the block size
    q800: add default vendor and product information

[PATCH v3 0/1] qga: add command 'guest-get-cpustats'

2022-07-06 Thread zhenwei pi
v2 -> v3:
- Rename 'GuestOsType' to 'GuestCpuStatsType'.
- Add 'linux' into polluted_words, rename 'linuxos' to 'linux'. Remove
  'windows' from 'GuestCpuStatsType', because currently we don't use it.

v1 -> v2:
- Konstantin & Marc-André pointed out that the structure 'GuestCpuStats'
  is too *linux style*, so re-define it to 'GuestLinuxCpuStats', and use
  an union type of 'GuestCpuStats'.

- Modify comment info from 'man proc', also add linux version information.

- Test sscanf return value by '(i == EOF)' (To Marc-André: name is declared
  as 'char name[64];', so we can't test '!name').

- Suggested by Marc-André, use 'int clk_tck = sysconf(_SC_CLK_TCK);'
  instead of hard code.

v1:
- Implement guest agent command 'guest-get-cpustats'

Zhenwei Pi (1):
  qga: add command 'guest-get-cpustats'

 qga/commands-posix.c   | 89 ++
 qga/commands-win32.c   |  6 +++
 qga/qapi-schema.json   | 81 ++
 scripts/qapi/common.py |  2 +-
 4 files changed, 177 insertions(+), 1 deletion(-)

-- 
2.20.1




Re: [PATCH v8 13/20] jobs: group together API calls under the same job lock

2022-07-06 Thread Stefan Hajnoczi
On Tue, Jul 05, 2022 at 04:22:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 7/5/22 16:01, Emanuele Giuseppe Esposito wrote:
> > 
> > 
> > Am 05/07/2022 um 10:17 schrieb Emanuele Giuseppe Esposito:
> > > 
> > > 
> > > Am 05/07/2022 um 10:14 schrieb Stefan Hajnoczi:
> > > > On Wed, Jun 29, 2022 at 10:15:31AM -0400, Emanuele Giuseppe Esposito 
> > > > wrote:
> > > > > diff --git a/blockdev.c b/blockdev.c
> > > > > index 71f793c4ab..5b79093155 100644
> > > > > --- a/blockdev.c
> > > > > +++ b/blockdev.c
> > > > > @@ -150,12 +150,15 @@ void blockdev_mark_auto_del(BlockBackend *blk)
> > > > >   return;
> > > > >   }
> > > > > -for (job = block_job_next(NULL); job; job = block_job_next(job)) 
> > > > > {
> > > > > +JOB_LOCK_GUARD();
> > > > > +
> > > > > +for (job = block_job_next_locked(NULL); job;
> > > > > + job = block_job_next_locked(job)) {
> > > > >   if (block_job_has_bdrv(job, blk_bs(blk))) {
> > > > >   AioContext *aio_context = job->job.aio_context;
> > > > >   aio_context_acquire(aio_context);
> > > > 
> > > > Is there a lock ordering rule for job_mutex and the AioContext lock? I
> > > > haven't audited the code, but there might be ABBA lock ordering issues.
> > > 
> > > Doesn't really matter here, as the lock is a nop. To be honest I forgot which
> > > one should go first, probably job_lock because the aiocontext lock can
> > > be taken and released in callbacks.
> > > 
> > > Should I resend with ordering fixed? Just to have a consistent logic
> > 
> > Well actually how do I fix that? I would just add useless additional
> > changes into the diff, because for example in the case below I am not
> > even sure what exactly the aiocontext lock is protecting.
> > 
> > So I guess I'll leave it as it is. I will just update the commit message to
> > make sure it is clear that the lock is nop and ordering is mixed.
> > 
> 
> Yes, I think it's OK.
> 
> As far as I understand, our final ordering rule is that job_mutex can be 
> taken under aio context lock but not visa-versa.

I'm also fine with resolving the ordering in a later patch.

Stefan
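
For reference, a minimal sketch of that ordering rule using the series'
locked API (illustrative only; the function and loop body are invented):

    static void walk_block_jobs(AioContext *ctx)
    {
        BlockJob *job;

        /* OK: the job lock is taken while the AioContext lock is held. */
        aio_context_acquire(ctx);
        WITH_JOB_LOCK_GUARD() {
            for (job = block_job_next_locked(NULL); job;
                 job = block_job_next_locked(job)) {
                /* inspect the job under both locks */
            }
        }
        aio_context_release(ctx);

        /*
         * Not OK: taking the AioContext lock while the job lock is
         * already held inverts the order and can deadlock against
         * the path above.
         */
    }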




[PATCH v2 0/3] Fix some coverity issues on VDUSE

2022-07-06 Thread Xie Yongji
This series fixes some issues reported by coverity.

Patch 1 fixes an incorrect function name.

Patch 2 fixes Coverity CID 1490224.

Patch 3 fixes Coverity CID 1490226, 1490223.

V1 to V2:
- Drop the patch to fix Coverity CID 1490222, 1490227 [Markus]
- Add some commit log to explain why we don't use g_strlcpy() [Markus]

Xie Yongji (3):
  libvduse: Fix the incorrect function name
  libvduse: Replace strcpy() with strncpy()
  libvduse: Pass positive value to strerror()

 subprojects/libvduse/libvduse.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)
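
A sketch of the bounded-copy pattern patch 2 describes (the buffer and
constant names are assumptions; strncpy() alone does not guarantee NUL
termination, hence the explicit terminator):

    strncpy(dev_config->name, name, VDUSE_NAME_MAX - 1);
    dev_config->name[VDUSE_NAME_MAX - 1] = '\0';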

-- 
2.20.1




Re: [PATCH v8 06/20] job.h: define functions called without job lock held

2022-07-06 Thread Vladimir Sementsov-Ogievskiy

On 7/6/22 11:23, Emanuele Giuseppe Esposito wrote:



Am 05/07/2022 um 12:54 schrieb Vladimir Sementsov-Ogievskiy:

Regarding the subject: hmm, the commit doesn't define any functions..


mark functions called without job lock held?



Yes, that's better)

--
Best regards,
Vladimir



Re: [RFC 0/8] Introduce an extensible static analyzer

2022-07-06 Thread Daniel P . Berrangé
On Wed, Jul 06, 2022 at 10:54:51AM +0100, Alberto Faria wrote:
> On Tue, Jul 5, 2022 at 5:12 PM Daniel P. Berrangé  wrote:
> > On Tue, Jul 05, 2022 at 12:28:55PM +0100, Alberto Faria wrote:
> > > On Tue, Jul 5, 2022 at 8:16 AM Daniel P. Berrangé  
> > > wrote:
> >
> > Overall I think a libclang based analysis tool will be useful, but
> > I can't see us enabling it as a standard part of 'make check'
> > given the time penalty.
> >
> >
> > Feels like something that'll have to be opt-in to a large degree
> > for regular contributors. In terms of gating CI though, it is less
> > of an issue, since we massively parallelize jobs. As long as we
> > have a dedicated build job just for running this static analysis
> > check in isolation, and NOT as 'make check' in all existing jobs,
> > it can happen in parallel with all the other build jobs, and we
> > won't notice the speed.
> >
> > In summary, I think this approach is viable despite the speed
> penalty provided we don't wire it into 'make check' by default.
> 
> Agreed. Thanks for gathering these numbers.
> 
> Making the script use build dependency information, to avoid
> re-analyzing translation units that weren't modified since the last
> analysis, should make it fast enough to be usable iteratively during
> development. Header precompilation could also be worth looking into.
> Doing that + running a full analysis in CI should be good enough.

For clang-tidy, I've been trying it out integrated into emacs
via eglot and clangd. This means I get clang-tidy errors reported
interactively as I write code, so wouldn't need to run a full
tree analysis. Unfortunately, unless I'm missing something, there's
no way to extend clangd to plugin extra checks.  So it would need
to re-implement something equivalent to clangd for our custom checks,
and then integrate that into eglot (or equiv for other editors).


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v3 1/1] qga: add command 'guest-get-cpustats'

2022-07-06 Thread zhenwei pi
A vCPU thread always reaches 100% utilization when:
- guest uses idle=poll
- disable HLT vm-exit
- enable MWAIT

Add a new guest agent command 'guest-get-cpustats' to get guest CPU
statistics, so we can see the guest workload and how busy the CPU is.

To avoid a compile error like:
qga/qga-qapi-types.h:948:28: error: expected member name or ';'
 after declaration specifiers
GuestLinuxCpuStats linux;
~~ ^
:336:15: note: expanded from here

Also add 'linux' into polluted_words.

Signed-off-by: zhenwei pi 
---
 qga/commands-posix.c   | 89 ++
 qga/commands-win32.c   |  6 +++
 qga/qapi-schema.json   | 81 ++
 scripts/qapi/common.py |  2 +-
 4 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 0469dc409d..f18530d85f 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2893,6 +2893,90 @@ GuestDiskStatsInfoList *qmp_guest_get_diskstats(Error 
**errp)
 return guest_get_diskstats(errp);
 }
 
+GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
+{
+GuestCpuStatsList *head = NULL, **tail = &head;
+const char *cpustats = "/proc/stat";
+int clk_tck = sysconf(_SC_CLK_TCK);
+FILE *fp;
+size_t n;
+char *line = NULL;
+
+fp = fopen(cpustats, "r");
+if (fp  == NULL) {
+error_setg_errno(errp, errno, "open(\"%s\")", cpustats);
+return NULL;
+}
+
+while (getline(&line, &n, fp) != -1) {
+GuestCpuStats *cpustat = NULL;
+GuestLinuxCpuStats *linuxcpustat;
+int i;
+unsigned long user, system, idle, iowait, irq, softirq, steal, guest;
+unsigned long nice, guest_nice;
+char name[64];
+
+i = sscanf(line, "%s %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu",
+   name, &user, &nice, &system, &idle, &iowait, &irq, &softirq,
+   &steal, &guest, &guest_nice);
+
+/* drop "cpu 1 2 3 ...", get "cpuX 1 2 3 ..." only */
+if ((i == EOF) || strncmp(name, "cpu", 3) || (name[3] == '\0')) {
+continue;
+}
+
+if (i < 5) {
+slog("Parsing cpu stat from %s failed, see \"man proc\"", 
cpustats);
+break;
+}
+
+cpustat = g_new0(GuestCpuStats, 1);
+cpustat->type = GUEST_CPU_STATS_TYPE_LINUX;
+
+linuxcpustat = &cpustat->u.q_linux;
+linuxcpustat->cpu = atoi(&name[3]);
+linuxcpustat->user = user * 1000 / clk_tck;
+linuxcpustat->nice = nice * 1000 / clk_tck;
+linuxcpustat->system = system * 1000 / clk_tck;
+linuxcpustat->idle = idle * 1000 / clk_tck;
+
+if (i > 5) {
+linuxcpustat->has_iowait = true;
+linuxcpustat->iowait = iowait * 1000 / clk_tck;
+}
+
+if (i > 6) {
+linuxcpustat->has_irq = true;
+linuxcpustat->irq = irq * 1000 / clk_tck;
+linuxcpustat->has_softirq = true;
+linuxcpustat->softirq = softirq * 1000 / clk_tck;
+}
+
+if (i > 8) {
+linuxcpustat->has_steal = true;
+linuxcpustat->steal = steal * 1000 / clk_tck;
+}
+
+if (i > 9) {
+linuxcpustat->has_guest = true;
+linuxcpustat->guest = guest * 1000 / clk_tck;
+}
+
+if (i > 10) {
+linuxcpustat->has_guest = true;
+linuxcpustat->guest = guest * 1000 / clk_tck;
+linuxcpustat->has_guestnice = true;
+linuxcpustat->guestnice = guest_nice * 1000 / clk_tck;
+}
+
+QAPI_LIST_APPEND(tail, cpustat);
+}
+
+free(line);
+fclose(fp);
+return head;
+}
+
 #else /* defined(__linux__) */
 
 void qmp_guest_suspend_disk(Error **errp)
@@ -3247,6 +3331,11 @@ GuestDiskStatsInfoList *qmp_guest_get_diskstats(Error 
**errp)
 return NULL;
 }
 
+GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+return NULL;
+}
 
 #endif /* CONFIG_FSFREEZE */
 
diff --git a/qga/commands-win32.c b/qga/commands-win32.c
index 36f94c0f9c..7ed7664715 100644
--- a/qga/commands-win32.c
+++ b/qga/commands-win32.c
@@ -2543,3 +2543,9 @@ GuestDiskStatsInfoList *qmp_guest_get_diskstats(Error 
**errp)
 error_setg(errp, QERR_UNSUPPORTED);
 return NULL;
 }
+
+GuestCpuStatsList *qmp_guest_get_cpustats(Error **errp)
+{
+error_setg(errp, QERR_UNSUPPORTED);
+return NULL;
+}
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 9fa20e791b..869399ea1a 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1576,3 +1576,84 @@
 { 'command': 'guest-get-diskstats',
   'returns': ['GuestDiskStatsInfo']
 }
+
+##
+# @GuestCpuStatsType:
+#
+# An enumeration of OS type
+#
+# Since: 7.1
+##
+{ 'enum': 'GuestCpuStatsType',
+  'data': [ 'linux' ] }
+
+
+##
+# @GuestLinuxCpuStats:
+#
+# CPU statistics of Linux
+#
+# @cpu: CPU index in guest OS
+#
# @user: Time spent in user mode

Re: [RFC 0/8] Introduce an extensible static analyzer

2022-07-06 Thread Alberto Faria
On Tue, Jul 5, 2022 at 5:13 PM Daniel P. Berrangé  wrote:
> FWIW, after applying this series 'make check' throws lots of failures
> and hangs for me in the block I/O tests, so something appears not quite
> correct here. I didn't bother to investigate/debug since you marked this
> as just an RFC

Thanks, it appears some coroutine_fn functions are being called from
non-coroutine context, so some call conversions from bdrv_... to
bdrv_co_... introduce problems. These changes are only intended as
examples of using the tool for the time being.

Alberto
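
A minimal illustration of the class of bug being described (bdrv_co_flush()
stands in for any coroutine_fn; the caller is hypothetical):

    int coroutine_fn bdrv_co_flush(BlockDriverState *bs);

    static void not_a_coroutine(BlockDriverState *bs)
    {
        /*
         * Invalid: this runs in plain function context, so the
         * coroutine_fn has no coroutine to yield from. This is exactly
         * what a converted caller can trip over after a bdrv_ to
         * bdrv_co_ switch.
         */
        bdrv_co_flush(bs);
    }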



