Re: Boot failure after QEMU's upgrade to OpenSBI v1.3 (was Re: [PATCH for-8.2 6/7] target/riscv: add 'max' CPU type)

2023-07-15 Thread Atish Patra
On Fri, Jul 14, 2023 at 5:29 AM Conor Dooley  wrote:
>
> On Fri, Jul 14, 2023 at 11:19:34AM +0100, Conor Dooley wrote:
> > On Fri, Jul 14, 2023 at 10:00:19AM +0530, Anup Patel wrote:
> >
> > > > > OpenSBI v1.3
> > > > > >    ____                    _____ ____ _____
> > > > > >   / __ \                  / ____|  _ \_   _|
> > > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > > >         | |
> > > > > >         |_|
> > > > >
> > > > > init_coldboot: ipi init failed (error -1009)
> > > > >
> > > > > Just to note, because we use our own firmware that vendors in OpenSBI
> > > > > and compiles only a significantly cut down number of files from it, we
> > > > > > do not use the fw_dynamic etc flow on our hardware. As a result, we
> > > > > > have not tested v1.3, nor do we have any immediate plans to change our
> > > > > > platform firmware to vendor v1.3 either.
> > > > >
> > > > > > Unless there's something obvious to you, it sounds like I will need
> > > > > > to go and bisect OpenSBI. That's a job for another day though, given
> > > > > > the time.
> > > > >
> > >
> > > The real issue is some CPU/HART DT nodes marked as disabled in the
> > > DT passed to OpenSBI 1.3.
> > >
> > > This issue does not exist in any of the DTs generated by QEMU but some
> > > of the DTs in the kernel (such as microchip and SiFive board DTs) have
> > > the E-core disabled.
> > >
> > > > I had discovered this issue in a totally different context after the
> > > > OpenSBI 1.3 release happened. This issue is already fixed in the latest
> > > > OpenSBI by the
> > > following commit c6a35733b74aeff612398f274ed19a74f81d1f37 ("lib: utils:
> > > Fix sbi_hartid_to_scratch() usage in ACLINT drivers").
> >
> > Great, thanks Anup! I thought I had tested tip-of-tree too, but
> > obviously not.
> >
> > > I always assumed that Microchip hss.bin is the preferred BIOS for the
> > > QEMU microchip-icicle-kit machine but I guess that's not true.
> >
> > > Unfortunately the HSS has not worked in QEMU for a long time, and while
> > > I would love to fix it, I am pretty stretched for spare time to begin
> > > with.
> > I usually just do direct kernel boots, which use the OpenSBI that comes
> > with QEMU, as I am sure you already know :)
> >
> > > At this point, you can either:
> > > 1) Use latest OpenSBI on QEMU microchip-icicle-kit machine
>
> I forgot to reply to this point, wondering what should be done with
> QEMU. Bumping to v1.3 in QEMU introduces a regression here, regardless
> of whether I can go and build a fixed version of OpenSBI.
>
FYI: The no-map fix went into OpenSBI v1.3. Without the upgrade, any
user running a recent kernel (> v6.4) may hit the linear-map-related
issues (in the hibernation or EFI boot path).

There are three possible scenarios:

1. Upgrade to OpenSBI v1.3: Any user of the microchip-icicle-kit or
sifive fu540 machines may hit this issue if the device tree has a
disabled hart (E-core).
2. Stay on OpenSBI v1.2 (no upgrade): Any user using hibernation or UEFI
may have issues [1].
3. Include a non-release version of OpenSBI in QEMU with the fix, as an
exception.

#3 probably deviates from policy and sets a bad precedent, so I am not
advocating for it ;)
For both #1 & #2, the solution would be to pass the latest OpenSBI via
the -bios argument instead of using the stock one.
I could be wrong, but my guess is that the number of users facing #2
would be higher than #1.

[1] https://lore.kernel.org/linux-riscv/20230625140931.1266216-1-songshuaish...@tinylab.org/
> > > 2) Ensure CPU0 DT node is enabled in DT when booting on QEMU
> > > microchip-icicle-kit machine with OpenSBI 1.3
> >
> > Will OpenSBI disable it? If not, I think option 2) needs to be "remove
> > the DT node". I'll just use tip-of-tree myself & up to the
>
> Clearly didn't finish this comment. It was meant to say "up to the QEMU
> maintainers what they want to do on the QEMU side of things".
>
> Thanks,
> Conor.



-- 
Regards,
Atish



[PATCH] For curses display, recognize a few more control keys

2023-07-15 Thread Sean Estabrooks
The curses display handles most control-X keys, and translates
them into their corresponding keycode.  Here we recognize
a few that are missing, Ctrl-@ (null), Ctrl-\ (backslash),
Ctrl-] (right bracket), Ctrl-^ (caret), Ctrl-_ (underscore).

Signed-off-by: Sean Estabrooks 
---
 ui/curses_keys.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/ui/curses_keys.h b/ui/curses_keys.h
index 71e04acdc7..88a2208ed1 100644
--- a/ui/curses_keys.h
+++ b/ui/curses_keys.h
@@ -210,6 +210,12 @@ static const int _curses2keycode[CURSES_CHARS] = {
 ['N' - '@'] = 49 | CNTRL, /* Control + n */
 /* Control + m collides with the keycode for Enter */

+['@' - '@']  =  3 | CNTRL, /* Control + @ */
+/* Control + [ collides with the keycode for Escape */
+['\\' - '@'] = 43 | CNTRL, /* Control + Backslash */
+[']' - '@']  = 27 | CNTRL, /* Control + ] */
+['^' - '@']  =  7 | CNTRL, /* Control + ^ */
+['_' - '@']  = 12 | CNTRL, /* Control + Underscore */
 };

 static const int _curseskey2keycode[CURSES_KEYS] = {
-- 
2.40.1



[PULL 06/47] linux-user: Use abi_llong not int64_t in syscall_defs.h

2023-07-15 Thread Richard Henderson
Be careful not to change linux_dirent64, which is a host structure.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 0af7249330..2846a8cfa5 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1455,8 +1455,8 @@ struct target_stat64 {
 unsigned char   __pad2[6];
 unsigned short  st_rdev;
 
-int64_t st_size;
-int64_t st_blksize;
+abi_llong   st_size;
+abi_llong   st_blksize;
 
 unsigned char   __pad4[4];
 unsigned int    st_blocks;
@@ -1514,7 +1514,7 @@ struct target_stat64 {
 
 unsigned char   __pad3[8];
 
-int64_t st_size;
+abi_llong   st_size;
 unsigned int    st_blksize;
 
 unsigned char   __pad4[8];
@@ -1630,10 +1630,10 @@ struct QEMU_PACKED target_stat64 {
 abi_ullong st_rdev;
 abi_ullong __pad1;
 
-int64_t  st_size;
+abi_llong st_size;
 abi_int  st_blksize;
 abi_uint __pad2;
-int64_t st_blocks;  /* Number 512-byte blocks allocated. */
+abi_llong st_blocks;
 
 int            target_st_atime;
 unsigned int   target_st_atime_nsec;
@@ -1760,7 +1760,7 @@ struct target_stat {
 int  st_gid;
 abi_ulong st_rdev;
 abi_ulong st_pad1[3]; /* Reserved for st_rdev expansion */
-int64_t  st_size;
+abi_llong st_size;
 abi_long target_st_atime;
 abi_ulong target_st_atime_nsec; /* Reserved for st_atime expansion */
 abi_long target_st_mtime;
@@ -1769,7 +1769,7 @@ struct target_stat {
 abi_ulong target_st_ctime_nsec; /* Reserved for st_ctime expansion */
 abi_ulong st_blksize;
 abi_ulong st_pad2;
-int64_t  st_blocks;
+abi_llong st_blocks;
 };
 
 #elif defined(TARGET_ABI_MIPSO32)
@@ -1824,7 +1824,7 @@ struct target_stat64 {
 abi_ulong   st_rdev;
 abi_ulong   st_pad1[3]; /* Reserved for st_rdev expansion  */
 
-int64_t st_size;
+abi_llong   st_size;
 
 /*
  * Actually this should be timestruc_t st_atime, st_mtime and st_ctime
@@ -1842,7 +1842,7 @@ struct target_stat64 {
 abi_ulong   st_blksize;
 abi_ulong   st_pad2;
 
-int64_t st_blocks;
+abi_llong   st_blocks;
 };
 
 #elif defined(TARGET_ALPHA)
@@ -2051,7 +2051,7 @@ struct target_stat64  {
 unsigned int  st_uid;   /* User ID of the file's owner. */
 unsigned int  st_gid;   /* Group ID of the file's group. */
 abi_ullong st_rdev; /* Device number, if device. */
-int64_t st_size;/* Size of file, in bytes. */
+abi_llong st_size;  /* Size of file, in bytes. */
 abi_ulong st_blksize;   /* Optimal block size for I/O. */
 abi_ulong __unused2;
 abi_ullong st_blocks;   /* Number 512-byte blocks allocated. */
@@ -2105,10 +2105,10 @@ struct target_stat64 {
 unsigned int st_gid;
 abi_ullong st_rdev;
 abi_ullong __pad1;
-int64_t st_size;
+abi_llong st_size;
 int st_blksize;
 int __pad2;
-int64_t st_blocks;
+abi_llong st_blocks;
 int target_st_atime;
 unsigned int target_st_atime_nsec;
 int target_st_mtime;
@@ -2165,9 +2165,9 @@ struct target_stat64 {
 abi_uint   st_gid;
 abi_ullong st_rdev;
 abi_uint   _pad2;
-int64_t    st_size;
+abi_llong  st_size;
 abi_int    st_blksize;
-int64_t    st_blocks;
+abi_llong  st_blocks;
 abi_int    target_st_atime;
 abi_uint   target_st_atime_nsec;
 abi_int    target_st_mtime;
@@ -2790,7 +2790,7 @@ struct target_user_cap_data {
 #define TARGET_SYSLOG_ACTION_SIZE_BUFFER   10
 
 struct target_statx_timestamp {
-int64_t tv_sec;
+abi_llong tv_sec;
 abi_uint tv_nsec;
 abi_int __reserved;
 };
-- 
2.34.1




[PULL 15/47] include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for nios2

2023-07-15 Thread Richard Henderson
Based on gcc's nios2.h setting BIGGEST_ALIGNMENT to 32 bits.

Signed-off-by: Richard Henderson 
---
 include/exec/user/abitypes.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/user/abitypes.h b/include/exec/user/abitypes.h
index beba0a48c7..6191ce9f74 100644
--- a/include/exec/user/abitypes.h
+++ b/include/exec/user/abitypes.h
@@ -17,7 +17,8 @@
 
 #if (defined(TARGET_I386) && !defined(TARGET_X86_64)) \
 || defined(TARGET_SH4) \
-|| defined(TARGET_MICROBLAZE)
+|| defined(TARGET_MICROBLAZE) \
+|| defined(TARGET_NIOS2)
 #define ABI_LLONG_ALIGNMENT 4
 #endif
 
-- 
2.34.1




[PULL 05/47] linux-user: Use abi_ullong not uint64_t in syscall_defs.h

2023-07-15 Thread Richard Henderson
Be careful not to change linux_dirent64, which is a host structure.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 72 +++
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index caaa895bec..0af7249330 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1444,8 +1444,8 @@ struct target_stat64 {
 unsigned char   __pad0[6];
 unsigned short  st_dev;
 
-uint64_t    st_ino;
-uint64_t    st_nlink;
+abi_ullong  st_ino;
+abi_ullong  st_nlink;
 
 unsigned int    st_mode;
 
@@ -1501,7 +1501,7 @@ struct target_stat64 {
 unsigned char   __pad0[6];
 unsigned short  st_dev;
 
-uint64_t st_ino;
+abi_ullong  st_ino;
 
 unsigned int    st_mode;
 unsigned int    st_nlink;
@@ -1618,7 +1618,7 @@ struct target_stat {
 /* FIXME: Microblaze no-mmu user-space has a difference stat64 layout...  */
 #define TARGET_HAS_STRUCT_STAT64
 struct QEMU_PACKED target_stat64 {
-uint64_t st_dev;
+abi_ullong st_dev;
 #define TARGET_STAT64_HAS_BROKEN_ST_INO 1
 abi_uint pad0;
 abi_uint __st_ino;
@@ -1627,8 +1627,8 @@ struct QEMU_PACKED target_stat64 {
 abi_uint st_nlink;
 abi_uint st_uid;
 abi_uint st_gid;
-uint64_t st_rdev;
-uint64_t __pad1;
+abi_ullong st_rdev;
+abi_ullong __pad1;
 
 int64_t  st_size;
 abi_int  st_blksize;
@@ -1641,7 +1641,7 @@ struct QEMU_PACKED target_stat64 {
 unsigned int   target_st_mtime_nsec;
 int            target_st_ctime;
 unsigned int   target_st_ctime_nsec;
-uint64_t st_ino;
+abi_ullong st_ino;
 };
 
 #elif defined(TARGET_M68K)
@@ -1753,7 +1753,7 @@ struct target_stat {
 struct target_stat {
 abi_ulong    st_dev;
 abi_ulong    st_pad0[3]; /* Reserved for st_dev expansion */
-uint64_t st_ino;
+abi_ullong   st_ino;
 unsigned int st_mode;
 unsigned int st_nlink;
 int  st_uid;
@@ -1813,7 +1813,7 @@ struct target_stat64 {
 abi_ulong   st_dev;
 abi_ulong   st_pad0[3]; /* Reserved for st_dev expansion  */
 
-uint64_t    st_ino;
+abi_ullong  st_ino;
 
 unsigned int    st_mode;
 unsigned int    st_nlink;
@@ -2044,17 +2044,17 @@ struct target_stat {
 
 #define TARGET_HAS_STRUCT_STAT64
 struct target_stat64  {
-uint64_t st_dev;/* Device */
-uint64_t st_ino;/* File serial number */
+abi_ullong st_dev;  /* Device */
+abi_ullong st_ino;  /* File serial number */
 unsigned int  st_mode;  /* File mode. */
 unsigned int  st_nlink; /* Link count. */
 unsigned int  st_uid;   /* User ID of the file's owner. */
 unsigned int  st_gid;   /* Group ID of the file's group. */
-uint64_t st_rdev;   /* Device number, if device. */
+abi_ullong st_rdev; /* Device number, if device. */
 int64_t st_size;/* Size of file, in bytes. */
 abi_ulong st_blksize;   /* Optimal block size for I/O. */
 abi_ulong __unused2;
-uint64_t st_blocks; /* Number 512-byte blocks allocated. */
+abi_ullong st_blocks;   /* Number 512-byte blocks allocated. */
 abi_ulong target_st_atime;  /* Time of last access. */
 abi_ulong target_st_atime_nsec;
 abi_ulong target_st_mtime;  /* Time of last modification. */
@@ -2097,14 +2097,14 @@ struct target_stat {
 #if !defined(TARGET_RISCV64)
 #define TARGET_HAS_STRUCT_STAT64
 struct target_stat64 {
-uint64_t st_dev;
-uint64_t st_ino;
+abi_ullong st_dev;
+abi_ullong st_ino;
 unsigned int st_mode;
 unsigned int st_nlink;
 unsigned int st_uid;
 unsigned int st_gid;
-uint64_t st_rdev;
-uint64_t __pad1;
+abi_ullong st_rdev;
+abi_ullong __pad1;
 int64_t st_size;
 int st_blksize;
 int __pad2;
@@ -2156,14 +2156,14 @@ struct target_stat {
 
 #define TARGET_HAS_STRUCT_STAT64
 struct target_stat64 {
-uint64_t   st_dev;
+abi_ullong st_dev;
 abi_uint   _pad1;
 abi_uint   _res1;
 abi_uint   st_mode;
 abi_uint   st_nlink;
 abi_uint   st_uid;
 abi_uint   st_gid;
-uint64_t   st_rdev;
+abi_ullong st_rdev;
 abi_uint   _pad2;
 int64_t    st_size;
 abi_int    st_blksize;
@@ -2174,7 +2174,7 @@ struct target_stat64 {
 abi_uint   target_st_mtime_nsec;
 abi_int    target_st_ctime;
 abi_uint   target_st_ctime_nsec;
-uint64_t   st_ino;
+abi_ullong st_ino;
 };
 
 #elif defined(TARGET_LOONGARCH64)
@@ -2231,11 +2231,11 @@ struct target_statfs64 {
 abi_uint    f_bsize;
 abi_uint    f_frsize;   /* Fragment size - unsupported */
 abi_uint    __pad;
-uint64_t    f_blocks;
-uint64_t    f_bfree;
-uint64_t    f_files;
-uint64_t    f_ffree;
-uint64_t    f_bavail;
+abi_ullong  f_blocks;
+a

[PULL 11/47] linux-user: Use abi_ushort not unsigned short in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 90 +++
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 442a8aefe3..21ca03b0f4 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -432,7 +432,7 @@ typedef struct {
 struct target_dirent {
 abi_long    d_ino;
 abi_long    d_off;
-unsigned short  d_reclen;
+abi_ushort  d_reclen;
 char        d_name[];
 };
 
@@ -1210,19 +1210,19 @@ struct target_rtc_pll_info {
 
 #define TARGET_NCC 8
 struct target_termio {
-unsigned short c_iflag; /* input mode flags */
-unsigned short c_oflag; /* output mode flags */
-unsigned short c_cflag; /* control mode flags */
-unsigned short c_lflag; /* local mode flags */
+abi_ushort c_iflag; /* input mode flags */
+abi_ushort c_oflag; /* output mode flags */
+abi_ushort c_cflag; /* control mode flags */
+abi_ushort c_lflag; /* local mode flags */
 unsigned char c_line;   /* line discipline */
 unsigned char c_cc[TARGET_NCC]; /* control characters */
 };
 
 struct target_winsize {
-unsigned short ws_row;
-unsigned short ws_col;
-unsigned short ws_xpixel;
-unsigned short ws_ypixel;
+abi_ushort ws_row;
+abi_ushort ws_col;
+abi_ushort ws_xpixel;
+abi_ushort ws_ypixel;
 };
 
 #include "termbits.h"
@@ -1328,15 +1328,15 @@ struct target_winsize {
 || defined(TARGET_CRIS)
 #define TARGET_STAT_HAVE_NSEC
 struct target_stat {
-unsigned short st_dev;
-unsigned short __pad1;
+abi_ushort st_dev;
+abi_ushort __pad1;
 abi_ulong st_ino;
-unsigned short st_mode;
-unsigned short st_nlink;
-unsigned short st_uid;
-unsigned short st_gid;
-unsigned short st_rdev;
-unsigned short __pad2;
+abi_ushort st_mode;
+abi_ushort st_nlink;
+abi_ushort st_uid;
+abi_ushort st_gid;
+abi_ushort st_rdev;
+abi_ushort __pad2;
 abi_ulong  st_size;
 abi_ulong  st_blksize;
 abi_ulong  st_blocks;
@@ -1355,7 +1355,7 @@ struct target_stat {
  */
 #define TARGET_HAS_STRUCT_STAT64
 struct target_stat64 {
-unsigned short  st_dev;
+abi_ushort  st_dev;
 unsigned char   __pad0[10];
 
 #define TARGET_STAT64_HAS_BROKEN_ST_INO 1
@@ -1367,7 +1367,7 @@ struct target_stat64 {
 abi_ulong   st_uid;
 abi_ulong   st_gid;
 
-unsigned short  st_rdev;
+abi_ushort  st_rdev;
 unsigned char   __pad3[10];
 
 abi_llong   st_size;
@@ -1442,7 +1442,7 @@ struct target_stat {
 #define TARGET_HAS_STRUCT_STAT64
 struct target_stat64 {
 unsigned char   __pad0[6];
-unsigned short  st_dev;
+abi_ushort  st_dev;
 
 abi_ullong  st_ino;
 abi_ullong  st_nlink;
@@ -1453,7 +1453,7 @@ struct target_stat64 {
 abi_uint    st_gid;
 
 unsigned char   __pad2[6];
-unsigned short  st_rdev;
+abi_ushort  st_rdev;
 
 abi_llong   st_size;
 abi_llong   st_blksize;
@@ -1477,13 +1477,13 @@ struct target_stat64 {
 
 #define TARGET_STAT_HAVE_NSEC
 struct target_stat {
-unsigned short  st_dev;
+abi_ushort  st_dev;
 abi_ulong   st_ino;
-unsigned short  st_mode;
+abi_ushort  st_mode;
 short   st_nlink;
-unsigned short  st_uid;
-unsigned short  st_gid;
-unsigned short  st_rdev;
+abi_ushort  st_uid;
+abi_ushort  st_gid;
+abi_ushort  st_rdev;
 abi_long    st_size;
 abi_long    target_st_atime;
 abi_ulong   target_st_atime_nsec;
@@ -1499,7 +1499,7 @@ struct target_stat {
 #define TARGET_HAS_STRUCT_STAT64
 struct target_stat64 {
 unsigned char   __pad0[6];
-unsigned short  st_dev;
+abi_ushort  st_dev;
 
 abi_ullong  st_ino;
 
@@ -1510,7 +1510,7 @@ struct target_stat64 {
 abi_uint    st_gid;
 
 unsigned char   __pad2[6];
-unsigned short  st_rdev;
+abi_ushort  st_rdev;
 
 unsigned char   __pad3[8];
 
@@ -1544,7 +1544,7 @@ struct target_stat {
 abi_uint  st_mode;
 #else
 abi_uint  st_mode;
-unsigned short st_nlink;
+abi_ushort st_nlink;
 #endif
 abi_uint   st_uid;
 abi_uint   st_gid;
@@ -1598,7 +1598,7 @@ struct target_stat {
 abi_ulong st_dev;
 abi_ulong st_ino;
 abi_uint st_mode;
-unsigned short st_nlink;
+abi_ushort st_nlink;
 abi_uint st_uid;
 abi_uint st_gid;
 abi_ulong  st_rdev;
@@ -1647,15 +1647,15 @@ struct QEMU_PACKED target_stat64 {
 #elif defined(TARGET_M68K)
 
 struct target_stat {
-unsigned short st_dev;
-unsigned short __pad1;
-abi_ulong st_ino;
-unsigned short st_mode;
-unsigned short st_nlink;
-unsigned short st_uid;
-unsigned short st_gid;
-unsigned short st_rdev;
-unsigned short __pad2;
+abi_u

[PULL 00/47] tcg + linux-user patch queue

2023-07-15 Thread Richard Henderson
The following changes since commit 4633c1e2c576fbabfe5c8c93f4b842504b69c096:

  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-07-14 16:39:46 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230715

for you to fetch changes up to 76f9d6ad19494290eb2f00d33c6a582ce3447991:

  tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128 (2023-07-15 08:02:49 +0100)


tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128
accel/tcg: Introduce page_check_range_empty
accel/tcg: Introduce page_find_range_empty
accel/tcg: Accept more page flags in page_check_range
accel/tcg: Return bool from page_check_range
accel/tcg: Always lock pages before translation
linux-user: Use abi_* types for target structures in syscall_defs.h
linux-user: Fix abi_llong alignment for microblaze and nios2
linux-user: Fix do_shmat type errors
linux-user: Implement execve without execveat
linux-user: Make sure initial brk is aligned
linux-user: Use a mask with strace flags
linux-user: Implement MAP_FIXED_NOREPLACE
linux-user: Widen target_mmap offset argument to off_t
linux-user: Use page_find_range_empty for mmap_find_vma_reserved
linux-user: Use 'last' instead of 'end' in target_mmap and subroutines
linux-user: Remove can_passthrough_madvise
linux-user: Simplify target_madvise
linux-user: Drop uint and ulong types
linux-user/arm: Do not allocate a commpage at all for M-profile CPUs
bsd-user: Use page_check_range_empty for MAP_EXCL
bsd-user: Use page_find_range_empty for mmap_find_vma_reserved


Andreas Schwab (1):
  linux-user: Make sure initial brk(0) is page-aligned

Juan Quintela (1):
  linux-user: Drop uint and ulong

Philippe Mathieu-Daudé (1):
  linux-user/arm: Do not allocate a commpage at all for M-profile CPUs

Pierrick Bouvier (1):
  linux-user/syscall: Implement execve without execveat

Richard Henderson (43):
  linux-user: Reformat syscall_defs.h
  linux-user: Remove #if 0 block in syscall_defs.h
  linux-user: Use abi_uint not uint32_t in syscall_defs.h
  linux-user: Use abi_int not int32_t in syscall_defs.h
  linux-user: Use abi_ullong not uint64_t in syscall_defs.h
  linux-user: Use abi_llong not int64_t in syscall_defs.h
  linux-user: Use abi_uint not unsigned int in syscall_defs.h
  linux-user: Use abi_ullong not unsigned long long in syscall_defs.h
  linux-user: Use abi_llong not long long in syscall_defs.h
  linux-user: Use abi_int not int in syscall_defs.h
  linux-user: Use abi_ushort not unsigned short in syscall_defs.h
  linux-user: Use abi_short not short in syscall_defs.h
  linux-user: Use abi_uint not unsigned in syscall_defs.h
  include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for microblaze
  include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for nios2
  linux-user: Fix do_shmat type errors
  accel/tcg: Split out cpu_exec_longjmp_cleanup
  tcg: Fix info_in_idx increment in layout_arg_by_ref
  linux-user: Fix formatting of mmap.c
  linux-user/strace: Expand struct flags to hold a mask
  linux-user: Split TARGET_MAP_* out of syscall_defs.h
  linux-user: Split TARGET_PROT_* out of syscall_defs.h
  linux-user: Populate more bits in mmap_flags_tbl
  accel/tcg: Introduce page_check_range_empty
  bsd-user: Use page_check_range_empty for MAP_EXCL
  linux-user: Implement MAP_FIXED_NOREPLACE
  linux-user: Split out target_to_host_prot
  linux-user: Widen target_mmap offset argument to off_t
  linux-user: Rewrite target_mprotect
  linux-user: Rewrite mmap_frag
  accel/tcg: Introduce page_find_range_empty
  bsd-user: Use page_find_range_empty for mmap_find_vma_reserved
  linux-user: Use page_find_range_empty for mmap_find_vma_reserved
  linux-user: Use 'last' instead of 'end' in target_mmap
  linux-user: Rewrite mmap_reserve
  linux-user: Rename mmap_reserve to mmap_reserve_or_unmap
  linux-user: Simplify target_munmap
  accel/tcg: Accept more page flags in page_check_range
  accel/tcg: Return bool from page_check_range
  linux-user: Remove can_passthrough_madvise
  linux-user: Simplify target_madvise
  accel/tcg: Always lock pages before translation
  tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128

 accel/tcg/internal.h   |   30 +-
 accel/tcg/tcg-runtime.h|2 +-
 bsd-user/qemu.h|2 +-
 include/exec/cpu-all.h |   40 +-
 include/exec/helper-proto-common.h |2 +
 include/exec/user/abitypes.h   |5 +-
 linux-user/aarch64/target_mman.h   |8 +
 linux-user/alpha/target_mman.h |   13 +
 linux-user/generic/target_mman.h   |   58 +
 linux-user/hppa/target_mman.h  |   10 +
 linux-user/mips/target_mman.h  |

[PULL 39/47] linux-user: Simplify target_munmap

2023-07-15 Thread Richard Henderson
All of the guest to host page adjustment is handled by
mmap_reserve_or_unmap; there is no need to duplicate that.
There are no failure modes for munmap after alignment and
guest address range have been validated.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-23-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 47 ---
 1 file changed, 4 insertions(+), 43 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 22c2869be8..c0946322fb 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -789,9 +789,6 @@ static void mmap_reserve_or_unmap(abi_ulong start, abi_ulong len)
 
 int target_munmap(abi_ulong start, abi_ulong len)
 {
-abi_ulong end, real_start, real_end, addr;
-int prot, ret;
-
 trace_target_munmap(start, len);
 
 if (start & ~TARGET_PAGE_MASK) {
@@ -803,47 +800,11 @@ int target_munmap(abi_ulong start, abi_ulong len)
 }
 
 mmap_lock();
-end = start + len;
-real_start = start & qemu_host_page_mask;
-real_end = HOST_PAGE_ALIGN(end);
-
-if (start > real_start) {
-/* handle host page containing start */
-prot = 0;
-for (addr = real_start; addr < start; addr += TARGET_PAGE_SIZE) {
-prot |= page_get_flags(addr);
-}
-if (real_end == real_start + qemu_host_page_size) {
-for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) {
-prot |= page_get_flags(addr);
-}
-end = real_end;
-}
-if (prot != 0) {
-real_start += qemu_host_page_size;
-}
-}
-if (end < real_end) {
-prot = 0;
-for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) {
-prot |= page_get_flags(addr);
-}
-if (prot != 0) {
-real_end -= qemu_host_page_size;
-}
-}
-
-ret = 0;
-/* unmap what we can */
-if (real_start < real_end) {
-mmap_reserve_or_unmap(real_start, real_end - real_start);
-}
-
-if (ret == 0) {
-page_set_flags(start, start + len - 1, 0);
-}
+mmap_reserve_or_unmap(start, len);
+page_set_flags(start, start + len - 1, 0);
 mmap_unlock();
-return ret;
+
+return 0;
 }
 
 abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
-- 
2.34.1




[PULL 04/47] linux-user: Use abi_int not int32_t in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 60 +++
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 414d88a9ec..caaa895bec 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -501,7 +501,7 @@ int do_sigaction(int sig, const struct target_sigaction *act,
 #endif
 
 #if defined(TARGET_ALPHA)
-typedef int32_t target_old_sa_flags;
+typedef abi_int target_old_sa_flags;
 #else
 typedef abi_ulong target_old_sa_flags;
 #endif
@@ -1631,7 +1631,7 @@ struct QEMU_PACKED target_stat64 {
 uint64_t __pad1;
 
 int64_t  st_size;
-int32_t  st_blksize;
+abi_int  st_blksize;
 abi_uint __pad2;
 int64_t st_blocks;  /* Number 512-byte blocks allocated. */
 
@@ -2192,20 +2192,20 @@ typedef struct {
 #ifdef TARGET_MIPS
 #ifdef TARGET_ABI_MIPSN32
 struct target_statfs {
-int32_t f_type;
-int32_t f_bsize;
-int32_t f_frsize;   /* Fragment size - unsupported */
-int32_t f_blocks;
-int32_t f_bfree;
-int32_t f_files;
-int32_t f_ffree;
-int32_t f_bavail;
+abi_int f_type;
+abi_int f_bsize;
+abi_int f_frsize;   /* Fragment size - unsupported */
+abi_int f_blocks;
+abi_int f_bfree;
+abi_int f_files;
+abi_int f_ffree;
+abi_int f_bavail;
 
 /* Linux specials */
 target_fsid_t   f_fsid;
-int32_t f_namelen;
-int32_t f_flags;
-int32_t f_spare[5];
+abi_int f_namelen;
+abi_int f_flags;
+abi_int f_spare[5];
 };
 #else
 struct target_statfs {
@@ -2276,34 +2276,34 @@ struct target_statfs64 {
 };
 #elif defined(TARGET_S390X)
 struct target_statfs {
-int32_t  f_type;
-int32_t  f_bsize;
+abi_int  f_type;
+abi_int  f_bsize;
 abi_long f_blocks;
 abi_long f_bfree;
 abi_long f_bavail;
 abi_long f_files;
 abi_long f_ffree;
 kernel_fsid_t f_fsid;
-int32_t  f_namelen;
-int32_t  f_frsize;
-int32_t  f_flags;
-int32_t  f_spare[4];
+abi_int  f_namelen;
+abi_int  f_frsize;
+abi_int  f_flags;
+abi_int  f_spare[4];
 
 };
 
 struct target_statfs64 {
-int32_t  f_type;
-int32_t  f_bsize;
+abi_int  f_type;
+abi_int  f_bsize;
 abi_long f_blocks;
 abi_long f_bfree;
 abi_long f_bavail;
 abi_long f_files;
 abi_long f_ffree;
 kernel_fsid_t f_fsid;
-int32_t  f_namelen;
-int32_t  f_frsize;
-int32_t  f_flags;
-int32_t  f_spare[4];
+abi_int  f_namelen;
+abi_int  f_frsize;
+abi_int  f_flags;
+abi_int  f_spare[4];
 };
 #else
 struct target_statfs {
@@ -2718,21 +2718,21 @@ struct target_ucred {
 abi_uint gid;
 };
 
-typedef int32_t target_timer_t;
+typedef abi_int target_timer_t;
 
 #define TARGET_SIGEV_MAX_SIZE 64
 
 /* This is architecture-specific but most architectures use the default */
 #ifdef TARGET_MIPS
-#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(int32_t) * 2 + sizeof(abi_long))
+#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(abi_int) * 2 + sizeof(abi_long))
 #else
-#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(int32_t) * 2 \
+#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(abi_int) * 2 \
 + sizeof(target_sigval_t))
 #endif
 
 #define TARGET_SIGEV_PAD_SIZE ((TARGET_SIGEV_MAX_SIZE   \
 - TARGET_SIGEV_PREAMBLE_SIZE)   \
-   / sizeof(int32_t))
+   / sizeof(abi_int))
 
 struct target_sigevent {
 target_sigval_t sigev_value;
@@ -2792,7 +2792,7 @@ struct target_user_cap_data {
 struct target_statx_timestamp {
 int64_t tv_sec;
 abi_uint tv_nsec;
-int32_t __reserved;
+abi_int __reserved;
 };
 
 struct target_statx {
-- 
2.34.1




[PULL 18/47] accel/tcg: Split out cpu_exec_longjmp_cleanup

2023-07-15 Thread Richard Henderson
Share the setjmp cleanup between cpu_exec_step_atomic
and cpu_exec_setjmp.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard W.M. Jones 
Signed-off-by: Richard Henderson 
---
 accel/tcg/cpu-exec.c | 43 +++
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index ba1890a373..31aa320513 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -526,6 +526,23 @@ static void cpu_exec_exit(CPUState *cpu)
 }
 }
 
+static void cpu_exec_longjmp_cleanup(CPUState *cpu)
+{
+/* Non-buggy compilers preserve this; assert the correct value. */
+g_assert(cpu == current_cpu);
+
+#ifdef CONFIG_USER_ONLY
+clear_helper_retaddr();
+if (have_mmap_lock()) {
+mmap_unlock();
+}
+#endif
+if (qemu_mutex_iothread_locked()) {
+qemu_mutex_unlock_iothread();
+}
+assert_no_pages_locked();
+}
+
 void cpu_exec_step_atomic(CPUState *cpu)
 {
 CPUArchState *env = cpu->env_ptr;
@@ -568,16 +585,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 cpu_tb_exec(cpu, tb, &tb_exit);
 cpu_exec_exit(cpu);
 } else {
-#ifdef CONFIG_USER_ONLY
-clear_helper_retaddr();
-if (have_mmap_lock()) {
-mmap_unlock();
-}
-#endif
-if (qemu_mutex_iothread_locked()) {
-qemu_mutex_unlock_iothread();
-}
-assert_no_pages_locked();
+cpu_exec_longjmp_cleanup(cpu);
 }
 
 /*
@@ -1023,20 +1031,7 @@ static int cpu_exec_setjmp(CPUState *cpu, SyncClocks *sc)
 {
 /* Prepare setjmp context for exception handling. */
 if (unlikely(sigsetjmp(cpu->jmp_env, 0) != 0)) {
-/* Non-buggy compilers preserve this; assert the correct value. */
-g_assert(cpu == current_cpu);
-
-#ifdef CONFIG_USER_ONLY
-clear_helper_retaddr();
-if (have_mmap_lock()) {
-mmap_unlock();
-}
-#endif
-if (qemu_mutex_iothread_locked()) {
-qemu_mutex_unlock_iothread();
-}
-
-assert_no_pages_locked();
+cpu_exec_longjmp_cleanup(cpu);
 }
 
 return cpu_exec_loop(cpu, sc);
-- 
2.34.1




[PULL 02/47] linux-user: Remove #if 0 block in syscall_defs.h

2023-07-15 Thread Richard Henderson
These definitions are in sparc/signal.c.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 24 
 1 file changed, 24 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index e80d54780b..a4e4df8d3e 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -547,30 +547,6 @@ typedef union target_sigval {
 int sival_int;
 abi_ulong sival_ptr;
 } target_sigval_t;
-#if 0
-#if defined (TARGET_SPARC)
-typedef struct {
-struct {
-abi_ulong psr;
-abi_ulong pc;
-abi_ulong npc;
-abi_ulong y;
-abi_ulong u_regs[16]; /* globals and ins */
-}   si_regs;
-int si_mask;
-} __siginfo_t;
-
-typedef struct {
-unsigned   long si_float_regs [32];
-unsigned   long si_fsr;
-unsigned   long si_fpqdepth;
-struct {
-unsigned long *insn_addr;
-unsigned long insn;
-} si_fpqueue [16];
-} __siginfo_fpu_t;
-#endif
-#endif
 
 #define TARGET_SI_MAX_SIZE  128
 
-- 
2.34.1




[PULL 30/47] linux-user: Widen target_mmap offset argument to off_t

2023-07-15 Thread Richard Henderson
We build with _FILE_OFFSET_BITS=64, so off_t = off64_t = int64_t.
With an extra cast, this fixes emulation of mmap2, which could
overflow the computation of the full value of offset.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-14-richard.hender...@linaro.org>
---
 linux-user/user-mmap.h |  2 +-
 linux-user/mmap.c  | 14 --
 linux-user/syscall.c   |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h
index 480ce1c114..3fc986f92f 100644
--- a/linux-user/user-mmap.h
+++ b/linux-user/user-mmap.h
@@ -20,7 +20,7 @@
 
 int target_mprotect(abi_ulong start, abi_ulong len, int prot);
 abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
- int flags, int fd, abi_ulong offset);
+ int flags, int fd, off_t offset);
 int target_munmap(abi_ulong start, abi_ulong len);
 abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
abi_ulong new_size, unsigned long flags,
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 12b1308a83..b2c2d85857 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -196,7 +196,7 @@ error:
 /* map an incomplete host page */
 static int mmap_frag(abi_ulong real_start,
  abi_ulong start, abi_ulong end,
- int prot, int flags, int fd, abi_ulong offset)
+ int prot, int flags, int fd, off_t offset)
 {
 abi_ulong real_end, addr;
 void *host_start;
@@ -463,11 +463,12 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
 
 /* NOTE: all the constants are the HOST ones */
 abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
- int flags, int fd, abi_ulong offset)
+ int flags, int fd, off_t offset)
 {
-abi_ulong ret, end, real_start, real_end, retaddr, host_offset, host_len,
+abi_ulong ret, end, real_start, real_end, retaddr, host_len,
   passthrough_start = -1, passthrough_end = -1;
 int page_flags;
+off_t host_offset;
 
 mmap_lock();
 trace_target_mmap(start, len, target_prot, flags, fd, offset);
@@ -559,7 +560,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 }
 
 if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
-unsigned long host_start;
+uintptr_t host_start;
 int host_prot;
 void *p;
 
@@ -578,7 +579,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 goto fail;
 }
 /* update start so that it points to the file position at 'offset' */
-host_start = (unsigned long)p;
+host_start = (uintptr_t)p;
 if (!(flags & MAP_ANONYMOUS)) {
 p = mmap(g2h_untagged(start), len, host_prot,
  flags | MAP_FIXED, fd, host_offset);
@@ -681,7 +682,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 /* map the middle (easier) */
 if (real_start < real_end) {
 void *p;
-unsigned long offset1;
+off_t offset1;
+
 if (flags & MAP_ANONYMOUS) {
 offset1 = 0;
 } else {
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 3a89f6b408..a80d33ecf2 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -10591,7 +10591,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 #endif
 ret = target_mmap(arg1, arg2, arg3,
   target_to_host_bitmask(arg4, mmap_flags_tbl),
-  arg5, arg6 << MMAP_SHIFT);
+  arg5, (off_t)(abi_ulong)arg6 << MMAP_SHIFT);
 return get_errno(ret);
 #endif
 case TARGET_NR_munmap:
-- 
2.34.1




[PULL 10/47] linux-user: Use abi_int not int in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
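
Note for readers: a minimal sketch of why a guest-ABI integer type can
differ from the host's plain int. It assumes abi_int is a fixed-width
32-bit type that carries the guest's alignment, and uses a guest with
2-byte int alignment as the example. The demo_* names are hypothetical,
and the aligned attribute on a typedef is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest int: 32 bits wide, but only 2-byte aligned. */
typedef int32_t demo_abi_int __attribute__((aligned(2)));

/* Same fields, once with the host's int32_t and once with the guest type. */
struct demo_host_layout  { int16_t tag; int32_t      value; };
struct demo_guest_layout { int16_t tag; demo_abi_int value; };
```

In the guest-style struct the value field sits at offset 2 and the whole
struct is 6 bytes, whereas a typical host pads the field out to its own
int alignment.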
 linux-user/syscall_defs.h | 216 +++---
 1 file changed, 108 insertions(+), 108 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index e4fcbd16d2..442a8aefe3 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -361,7 +361,7 @@ struct target_iovec {
 
 struct target_msghdr {
 abi_long msg_name;   /* Socket name */
-int  msg_namelen;/* Length of name  */
+abi_int  msg_namelen;/* Length of name  */
 abi_long msg_iov;/* Data blocks */
 abi_long msg_iovlen; /* Number of blocks*/
 abi_long msg_control;/* Per protocol magic (eg BSD file descriptor passing) */
@@ -371,8 +371,8 @@ struct target_msghdr {
 
 struct target_cmsghdr {
 abi_long cmsg_len;
-int  cmsg_level;
-int  cmsg_type;
+abi_int  cmsg_level;
+abi_int  cmsg_type;
 };
 
 #define TARGET_CMSG_DATA(cmsg) ((unsigned char *) ((struct target_cmsghdr *) (cmsg) + 1))
@@ -426,7 +426,7 @@ struct  target_rusage {
 };
 
 typedef struct {
-int val[2];
+abi_int val[2];
 } kernel_fsid_t;
 
 struct target_dirent {
@@ -544,7 +544,7 @@ struct target_sigaction {
 #endif
 
 typedef union target_sigval {
-int sival_int;
+abi_int sival_int;
 abi_ulong sival_ptr;
 } target_sigval_t;
 
@@ -575,17 +575,17 @@ typedef union target_sigval {
 
 typedef struct target_siginfo {
 #ifdef TARGET_MIPS
-int si_signo;
-int si_code;
-int si_errno;
+abi_int si_signo;
+abi_int si_code;
+abi_int si_errno;
 #else
-int si_signo;
-int si_errno;
-int si_code;
+abi_int si_signo;
+abi_int si_errno;
+abi_int si_code;
 #endif
 
 union {
-int _pad[TARGET_SI_PAD_SIZE];
+abi_int _pad[TARGET_SI_PAD_SIZE];
 
 /* kill() */
 struct {
@@ -610,7 +610,7 @@ typedef struct target_siginfo {
 struct {
 pid_t _pid; /* which child */
 uid_t _uid; /* sender's uid */
-int _status;/* exit code */
+abi_int _status;/* exit code */
 target_clock_t _utime;
 target_clock_t _stime;
 } _sigchld;
@@ -622,8 +622,8 @@ typedef struct target_siginfo {
 
 /* SIGPOLL */
 struct {
-int _band;  /* POLL_IN, POLL_OUT, POLL_MSG */
-int _fd;
+abi_int _band;   /* POLL_IN, POLL_OUT, POLL_MSG */
+abi_int _fd;
 } _sigpoll;
 } _sifields;
 } target_siginfo_t;
@@ -701,7 +701,7 @@ typedef struct target_siginfo {
 #include "target_resource.h"
 
 struct target_pollfd {
-int fd;   /* file descriptor */
+abi_int fd;   /* file descriptor */
 short events; /* requested events */
 short revents;/* returned events */
 };
@@ -722,12 +722,12 @@ struct target_pollfd {
 #define TARGET_KDSIGACCEPT 0x4B4E
 
 struct target_rtc_pll_info {
-int pll_ctrl;
-int pll_value;
-int pll_max;
-int pll_min;
-int pll_posmult;
-int pll_negmult;
+abi_int pll_ctrl;
+abi_int pll_value;
+abi_int pll_max;
+abi_int pll_min;
+abi_int pll_posmult;
+abi_int pll_negmult;
 abi_long pll_clock;
 };
 
@@ -754,14 +754,14 @@ struct target_rtc_pll_info {
struct target_rtc_pll_info)
 #define TARGET_RTC_PLL_SET  TARGET_IOW('p', 0x12,   \
struct target_rtc_pll_info)
-#define TARGET_RTC_VL_READ  TARGET_IOR('p', 0x13, int)
+#define TARGET_RTC_VL_READ  TARGET_IOR('p', 0x13, abi_int)
 #define TARGET_RTC_VL_CLR   TARGET_IO('p', 0x14)
 
 #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) || \
 defined(TARGET_XTENSA)
-#define TARGET_FIOGETOWN   TARGET_IOR('f', 123, int)
-#define TARGET_FIOSETOWN   TARGET_IOW('f', 124, int)
-#define TARGET_SIOCATMARK  TARGET_IOR('s', 7, int)
+#define TARGET_FIOGETOWN   TARGET_IOR('f', 123, abi_int)
+#define TARGET_FIOSETOWN   TARGET_IOW('f', 124, abi_int)
+#define TARGET_SIOCATMARK  TARGET_IOR('s', 7, abi_int)
 #define TARGET_SIOCSPGRP   TARGET_IOW('s', 8, pid_t)
 #define TARGET_SIOCGPGRP   TARGET_IOR('s', 9, pid_t)
 #else
@@ -851,40 +851,40 @@ struct target_rtc_pll_info {
 
 /* From  */
 
-#define TARGET_TUNSETDEBUGTARGET_IOW('T', 201, int)
-#define TARGET_TUNSETIFF  TARGET_IOW('T', 202, int)
-#define TARGET_TUNSETPERSIST  TARGET_IOW('T', 203, int)
-#define TARGET_TUNSETOWNERTARGET_IOW('T', 204, int)
-#define TARGET_TUNSETLINK TARGET_IOW('T', 205, int)
-#define TARGET_TUNSETGROUPTARGET_IOW('T', 206, int)
+#define TARGET_TUNSETDEBUGTARGET_IOW('T', 201, abi_int)
+#defin

[PULL 16/47] linux-user/syscall: Implement execve without execveat

2023-07-15 Thread Richard Henderson
From: Pierrick Bouvier 

Support for the execveat syscall was implemented in 55bbe4 and has been
available since QEMU 8.0.0. It relies on host execveat, which is widely
available on most Linux kernels today.

However, this change breaks qemu-user self-emulation if the "host" qemu
version is older than 8.0.0, because that version does not yet implement
execveat. This unusual use case arises on most distributions today,
since they ship with binfmt support.

With a concrete failing example:
$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented
-> not implemented means execve returned ENOSYS

qemu-user-static 7.2 and 8.0 can be conveniently grabbed from the Debian
packages qemu-user-static* [1].

One use of this is running wine-arm64 from linux-x64 (details in [2]).
We ran into this issue while updating the qemu embedded in a docker
image.

Updating the host qemu is not always possible: it may be complicated,
it may require recompiling qemu, or the host may simply not be accessible
(GitLab CI, GitHub Actions). Thus, it is worth implementing execve
without relying on execveat, which is the goal of this patch.

This patch was tested with the example presented in this commit message.

[1] http://ftp.us.debian.org/debian/pool/main/q/qemu/
[2] https://www.linaro.org/blog/emulate-windows-on-arm/

Signed-off-by: Pierrick Bouvier 
Reviewed-by: Richard Henderson 
Reviewed-by: Michael Tokarev 
Message-Id: <20230705121023.973284-1-pierrick.bouv...@linaro.org>
Signed-off-by: Richard Henderson 
---
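
Note for readers: a minimal host-side sketch of the dispatch described
above, assuming a Linux host. demo_exec is a hypothetical helper, not the
QEMU function; it only shows that the execve path never touches execveat,
so it keeps working where execveat is unimplemented.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: pick the host syscall according to is_execveat. */
static int demo_exec(int dirfd, const char *path, char **argv, char **envp,
                     int flags, bool is_execveat)
{
    if (is_execveat) {
#ifdef SYS_execveat
        return (int)syscall(SYS_execveat, dirfd, path, argv, envp, flags);
#else
        /* Headers without execveat: report it as unimplemented. */
        errno = ENOSYS;
        return -1;
#endif
    }
    /* Plain execve does not depend on execveat being available. */
    return execve(path, argv, envp);
}
```

Calling demo_exec(AT_FDCWD, path, argv, envp, 0, false) on a path that
does not exist returns -1 like any failed execve, without going through
execveat at all.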
 linux-user/syscall.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 420bab7c68..c15d9ad743 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -659,6 +659,7 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
 #endif
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
   int, options, struct rusage *, rusage)
+safe_syscall3(int, execve, const char *, filename, char **, argv, char **, envp)
 safe_syscall5(int, execveat, int, dirfd, const char *, filename,
   char **, argv, char **, envp, int, flags)
 #if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
@@ -8629,9 +8630,9 @@ ssize_t do_guest_readlink(const char *pathname, char *buf, size_t bufsiz)
 return ret;
 }
 
-static int do_execveat(CPUArchState *cpu_env, int dirfd,
-   abi_long pathname, abi_long guest_argp,
-   abi_long guest_envp, int flags)
+static int do_execv(CPUArchState *cpu_env, int dirfd,
+abi_long pathname, abi_long guest_argp,
+abi_long guest_envp, int flags, bool is_execveat)
 {
 int ret;
 char **argp, **envp;
@@ -8710,11 +8711,14 @@ static int do_execveat(CPUArchState *cpu_env, int dirfd,
 goto execve_efault;
 }
 
+const char *exe = p;
 if (is_proc_myself(p, "exe")) {
-ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, flags));
-} else {
-ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
+exe = exec_path;
 }
+ret = is_execveat
+? safe_execveat(dirfd, exe, argp, envp, flags)
+: safe_execve(exe, argp, envp);
+ret = get_errno(ret);
 
 unlock_user(p, pathname, 0);
 
@@ -9406,9 +9410,9 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 return ret;
 #endif
 case TARGET_NR_execveat:
-return do_execveat(cpu_env, arg1, arg2, arg3, arg4, arg5);
+return do_execv(cpu_env, arg1, arg2, arg3, arg4, arg5, true);
 case TARGET_NR_execve:
-return do_execveat(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0);
+return do_execv(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0, false);
 case TARGET_NR_chdir:
 if (!(p = lock_user_string(arg1)))
 return -TARGET_EFAULT;
-- 
2.34.1




[PULL 23/47] linux-user: Split TARGET_MAP_* out of syscall_defs.h

2023-07-15 Thread Richard Henderson
Move the values into the per-target target_mman.h headers.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-7-richard.hender...@linaro.org>
---
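
Note for readers: a minimal sketch of the override pattern this split
relies on, assuming the per-target header is seen before the generic one.
The DEMO_* names are hypothetical; the two values are the MIPS and generic
MAP_ANONYMOUS values that appear in the hunks of this patch.

```c
#include <assert.h>

/* Stand-in for a per-target header: the target supplies its own value. */
#define DEMO_MAP_ANONYMOUS 0x0800

/* Stand-in for the generic header: defaults apply only if not overridden. */
#ifndef DEMO_MAP_ANONYMOUS
#define DEMO_MAP_ANONYMOUS 0x20
#endif
#ifndef DEMO_MAP_FIXED
#define DEMO_MAP_FIXED 0x10
#endif

/* Test a guest flag word against whichever value ended up defined. */
static int demo_is_anonymous(int flags)
{
    return (flags & DEMO_MAP_ANONYMOUS) != 0;
}
```

Here the target's 0x0800 wins over the generic 0x20, while DEMO_MAP_FIXED
falls back to the generic default because the target did not define it.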
 linux-user/alpha/target_mman.h   | 13 +
 linux-user/generic/target_mman.h | 54 
 linux-user/hppa/target_mman.h| 10 
 linux-user/mips/target_mman.h| 16 ++
 linux-user/mips64/target_mman.h  |  2 +-
 linux-user/ppc/target_mman.h |  8 +++
 linux-user/sparc/target_mman.h   |  9 
 linux-user/syscall_defs.h| 85 +---
 linux-user/xtensa/target_mman.h  | 16 ++
 9 files changed, 128 insertions(+), 85 deletions(-)

diff --git a/linux-user/alpha/target_mman.h b/linux-user/alpha/target_mman.h
index 051544f5ab..6bb03e7336 100644
--- a/linux-user/alpha/target_mman.h
+++ b/linux-user/alpha/target_mman.h
@@ -1,6 +1,19 @@
 #ifndef ALPHA_TARGET_MMAN_H
 #define ALPHA_TARGET_MMAN_H
 
+#define TARGET_MAP_ANONYMOUS0x10
+#define TARGET_MAP_FIXED0x100
+#define TARGET_MAP_GROWSDOWN0x01000
+#define TARGET_MAP_DENYWRITE0x02000
+#define TARGET_MAP_EXECUTABLE   0x04000
+#define TARGET_MAP_LOCKED   0x08000
+#define TARGET_MAP_NORESERVE0x1
+#define TARGET_MAP_POPULATE 0x2
+#define TARGET_MAP_NONBLOCK 0x4
+#define TARGET_MAP_STACK0x8
+#define TARGET_MAP_HUGETLB  0x10
+#define TARGET_MAP_FIXED_NOREPLACE  0x20
+
 #define TARGET_MADV_DONTNEED 6
 
 #define TARGET_MS_ASYNC 1
diff --git a/linux-user/generic/target_mman.h b/linux-user/generic/target_mman.h
index 32bf1a52d0..7b888fb7f8 100644
--- a/linux-user/generic/target_mman.h
+++ b/linux-user/generic/target_mman.h
@@ -1,6 +1,60 @@
 #ifndef LINUX_USER_TARGET_MMAN_H
 #define LINUX_USER_TARGET_MMAN_H
 
+/* These are defined in linux/mmap.h */
+#define TARGET_MAP_SHARED   0x01
+#define TARGET_MAP_PRIVATE  0x02
+#define TARGET_MAP_SHARED_VALIDATE  0x03
+
+/* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */
+#ifndef TARGET_MAP_GROWSDOWN
+#define TARGET_MAP_GROWSDOWN0x0100
+#endif
+#ifndef TARGET_MAP_DENYWRITE
+#define TARGET_MAP_DENYWRITE0x0800
+#endif
+#ifndef TARGET_MAP_EXECUTABLE
+#define TARGET_MAP_EXECUTABLE   0x1000
+#endif
+#ifndef TARGET_MAP_LOCKED
+#define TARGET_MAP_LOCKED   0x2000
+#endif
+#ifndef TARGET_MAP_NORESERVE
+#define TARGET_MAP_NORESERVE0x4000
+#endif
+
+/* Other MAP flags are defined in asm-generic/mman-common.h */
+#ifndef TARGET_MAP_TYPE
+#define TARGET_MAP_TYPE 0x0f
+#endif
+#ifndef TARGET_MAP_FIXED
+#define TARGET_MAP_FIXED0x10
+#endif
+#ifndef TARGET_MAP_ANONYMOUS
+#define TARGET_MAP_ANONYMOUS0x20
+#endif
+#ifndef TARGET_MAP_POPULATE
+#define TARGET_MAP_POPULATE 0x008000
+#endif
+#ifndef TARGET_MAP_NONBLOCK
+#define TARGET_MAP_NONBLOCK 0x01
+#endif
+#ifndef TARGET_MAP_STACK
+#define TARGET_MAP_STACK0x02
+#endif
+#ifndef TARGET_MAP_HUGETLB
+#define TARGET_MAP_HUGETLB  0x04
+#endif
+#ifndef TARGET_MAP_SYNC
+#define TARGET_MAP_SYNC 0x08
+#endif
+#ifndef TARGET_MAP_FIXED_NOREPLACE
+#define TARGET_MAP_FIXED_NOREPLACE  0x10
+#endif
+#ifndef TARGET_MAP_UNINITIALIZED
+#define TARGET_MAP_UNINITIALIZED0x400
+#endif
+
 #ifndef TARGET_MADV_NORMAL
 #define TARGET_MADV_NORMAL 0
 #endif
diff --git a/linux-user/hppa/target_mman.h b/linux-user/hppa/target_mman.h
index f9b6b97032..97f87d042a 100644
--- a/linux-user/hppa/target_mman.h
+++ b/linux-user/hppa/target_mman.h
@@ -1,6 +1,16 @@
 #ifndef HPPA_TARGET_MMAN_H
 #define HPPA_TARGET_MMAN_H
 
+#define TARGET_MAP_TYPE 0x2b
+#define TARGET_MAP_FIXED0x04
+#define TARGET_MAP_ANONYMOUS0x10
+#define TARGET_MAP_GROWSDOWN0x8000
+#define TARGET_MAP_POPULATE 0x1
+#define TARGET_MAP_NONBLOCK 0x2
+#define TARGET_MAP_STACK0x4
+#define TARGET_MAP_HUGETLB  0x8
+#define TARGET_MAP_UNINITIALIZED0
+
 #define TARGET_MADV_MERGEABLE 65
 #define TARGET_MADV_UNMERGEABLE 66
 #define TARGET_MADV_HUGEPAGE 67
diff --git a/linux-user/mips/target_mman.h b/linux-user/mips/target_mman.h
index e7ba6070fe..cd566c24b6 100644
--- a/linux-user/mips/target_mman.h
+++ b/linux-user/mips/target_mman.h
@@ -1 +1,17 @@
+#ifndef MIPS_TARGET_MMAN_H
+#define MIPS_TARGET_MMAN_H
+
+#define TARGET_MAP_NORESERVE0x0400
+#define TARGET_MAP_ANONYMOUS0x0800
+#define TARGET_MAP_GROWSDOWN0x1000
+#define TARGET_MAP_DENYWRITE0x2000
+#define TARGET_MAP_EXECUTABLE   0x4000
+#define TARGET_MAP_LOCKED   0x8000
+#define TARGET_MAP_POPULATE 0x1
+#define TARGET_MAP_NONBLOCK 0x2
+#define TARGET_MAP_STACK   

[PULL 01/47] linux-user: Reformat syscall_defs.h

2023-07-15 Thread Richard Henderson
Untabify and re-indent.
We had a mix of 2, 3, 4, and 8 space indentation.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 1948 ++---
 1 file changed, 974 insertions(+), 974 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index cc37054cb5..e80d54780b 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -33,18 +33,18 @@
 #define TARGET_SYS_SENDMMSG 20/* sendmmsg()*/
 
 #define IPCOP_CALL(VERSION, OP) ((VERSION) << 16 | (OP))
-#define IPCOP_semop1
-#define IPCOP_semget   2
-#define IPCOP_semctl   3
-#define IPCOP_semtimedop   4
-#define IPCOP_msgsnd   11
-#define IPCOP_msgrcv   12
-#define IPCOP_msgget   13
-#define IPCOP_msgctl   14
-#define IPCOP_shmat21
-#define IPCOP_shmdt22
-#define IPCOP_shmget   23
-#define IPCOP_shmctl   24
+#define IPCOP_semop 1
+#define IPCOP_semget2
+#define IPCOP_semctl3
+#define IPCOP_semtimedop4
+#define IPCOP_msgsnd11
+#define IPCOP_msgrcv12
+#define IPCOP_msgget13
+#define IPCOP_msgctl14
+#define IPCOP_shmat 21
+#define IPCOP_shmdt 22
+#define IPCOP_shmget23
+#define IPCOP_shmctl24
 
 #define TARGET_SEMOPM 500
 
@@ -56,42 +56,42 @@
  * this explicit here.  Please be sure to use the decoding macros
  * below from now on.
  */
-#define TARGET_IOC_NRBITS  8
-#define TARGET_IOC_TYPEBITS8
+#define TARGET_IOC_NRBITS   8
+#define TARGET_IOC_TYPEBITS 8
 
-#if (defined(TARGET_I386) && defined(TARGET_ABI32)) \
-|| (defined(TARGET_ARM) && defined(TARGET_ABI32)) \
-|| (defined(TARGET_SPARC) && defined(TARGET_ABI32)) \
+#if (defined(TARGET_I386) && defined(TARGET_ABI32)) \
+|| (defined(TARGET_ARM) && defined(TARGET_ABI32))   \
+|| (defined(TARGET_SPARC) && defined(TARGET_ABI32)) \
 || defined(TARGET_M68K) || defined(TARGET_SH4) || defined(TARGET_CRIS)
-/* 16 bit uid wrappers emulation */
+/* 16 bit uid wrappers emulation */
 #define USE_UID16
 #define target_id uint16_t
 #else
 #define target_id uint32_t
 #endif
 
-#if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SH4) \
-|| defined(TARGET_M68K) || defined(TARGET_CRIS) \
-|| defined(TARGET_S390X) || defined(TARGET_OPENRISC) \
-|| defined(TARGET_NIOS2) || defined(TARGET_RISCV) \
+#if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SH4)  \
+|| defined(TARGET_M68K) || defined(TARGET_CRIS) \
+|| defined(TARGET_S390X) || defined(TARGET_OPENRISC)\
+|| defined(TARGET_NIOS2) || defined(TARGET_RISCV)   \
 || defined(TARGET_XTENSA) || defined(TARGET_LOONGARCH64)
 
-#define TARGET_IOC_SIZEBITS14
-#define TARGET_IOC_DIRBITS 2
+#define TARGET_IOC_SIZEBITS 14
+#define TARGET_IOC_DIRBITS  2
 
-#define TARGET_IOC_NONE  0U
+#define TARGET_IOC_NONE   0U
 #define TARGET_IOC_WRITE  1U
-#define TARGET_IOC_READ  2U
+#define TARGET_IOC_READ   2U
 
-#elif defined(TARGET_PPC) || defined(TARGET_ALPHA) || \
-  defined(TARGET_SPARC) || defined(TARGET_MICROBLAZE) || \
-  defined(TARGET_MIPS)
+#elif defined(TARGET_PPC) || defined(TARGET_ALPHA) ||   \
+defined(TARGET_SPARC) || defined(TARGET_MICROBLAZE) ||  \
+defined(TARGET_MIPS)
 
-#define TARGET_IOC_SIZEBITS13
-#define TARGET_IOC_DIRBITS 3
+#define TARGET_IOC_SIZEBITS 13
+#define TARGET_IOC_DIRBITS  3
 
-#define TARGET_IOC_NONE  1U
-#define TARGET_IOC_READ  2U
+#define TARGET_IOC_NONE   1U
+#define TARGET_IOC_READ   2U
 #define TARGET_IOC_WRITE  4U
 
 #elif defined(TARGET_HPPA)
@@ -115,32 +115,32 @@
 #error unsupported CPU
 #endif
 
-#define TARGET_IOC_NRMASK  ((1 << TARGET_IOC_NRBITS)-1)
-#define TARGET_IOC_TYPEMASK((1 << TARGET_IOC_TYPEBITS)-1)
-#define TARGET_IOC_SIZEMASK((1 << TARGET_IOC_SIZEBITS)-1)
-#define TARGET_IOC_DIRMASK ((1 << TARGET_IOC_DIRBITS)-1)
+#define TARGET_IOC_NRMASK   ((1 << TARGET_IOC_NRBITS)-1)
+#define TARGET_IOC_TYPEMASK ((1 << TARGET_IOC_TYPEBITS)-1)
+#define TARGET_IOC_SIZEMASK ((1 << TARGET_IOC_SIZEBITS)-1)
+#define TARGET_IOC_DIRMASK  ((1 << TARGET_IOC_DIRBITS)-1)
 
-#define TARGET_IOC_NRSHIFT 0
-#define TARGET_IOC_TYPESHIFT   (TARGET_IOC_NRSHIFT+TARGET_IOC_NRBITS)
-#define TARGET_IOC_SIZESHIFT   (TARGET_IOC_TYPESHIFT+TARGET_IOC_TYPEBITS)
-#define TARGET_IOC_DIRSHIFT(TARGET_IOC_SIZESHIFT+TARGET_IOC_SIZEBITS)
+#define TARGET_IOC_NRSHIFT  0
+#define TARGET_IOC_TYPESHIFT(TARGET_IOC_NRSHIFT+TARGET_IOC_NRBITS)
+#define TARGET_IOC_SIZESHIFT(TARGET_IOC_TYPESHIFT+TARGET_IOC_TYPEBITS)
+#define TARGET_IOC_DIRSHIFT (TARGET_IOC_SIZESHIFT+TARGET_IOC_

[PULL 34/47] bsd-user: Use page_find_range_empty for mmap_find_vma_reserved

2023-07-15 Thread Richard Henderson
Use the interval tree to find empty space, rather than
probing each page in turn.

Cc: Warner Losh 
Cc: Kyle Evans 
Signed-off-by: Richard Henderson 
Reviewed-by: Warner Losh 
Message-Id: <20230707204054.8792-18-richard.hender...@linaro.org>
---
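
Note for readers: a minimal sketch of the two-pass search used in the new
mmap_find_vma_reserved, written over a small page-occupancy array instead
of the interval tree. All demo_* names are hypothetical stand-ins;
demo_find_empty only imitates the contract of page_find_range_empty and
assumes len >= 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NPAGES 16
#define DEMO_FAIL   ((size_t)-1)

/* First run of 'len' free pages lying entirely within [min, max]. */
static size_t demo_find_empty(const bool *used, size_t min, size_t max,
                              size_t len)
{
    for (size_t base = min; base <= max && len - 1 <= max - base; base++) {
        size_t i = 0;
        while (i < len && !used[base + i]) {
            i++;
        }
        if (i == len) {
            return base;
        }
    }
    return DEMO_FAIL;
}

/* Search [start, top] first; on failure restart at 'low' up to start - 1. */
static size_t demo_find_wrapped(const bool *used, size_t start, size_t low,
                                size_t top, size_t len)
{
    size_t ret = demo_find_empty(used, start, top, len);
    if (ret == DEMO_FAIL && start > low) {
        ret = demo_find_empty(used, low, start - 1, len);
    }
    return ret;
}
```

If every page from the hint upwards is in use, the second pass still finds
a hole below the hint, which is the behaviour the "Restart at the
beginning of the address space" branch provides.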
 bsd-user/mmap.c | 48 +++-
 1 file changed, 7 insertions(+), 41 deletions(-)

diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 07b5b8055e..aca8764356 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -222,50 +222,16 @@ unsigned long last_brk;
 static abi_ulong mmap_find_vma_reserved(abi_ulong start, abi_ulong size,
 abi_ulong alignment)
 {
-abi_ulong addr;
-abi_ulong end_addr;
-int prot;
-int looped = 0;
+abi_ulong ret;
 
-if (size > reserved_va) {
-return (abi_ulong)-1;
+ret = page_find_range_empty(start, reserved_va, size, alignment);
+if (ret == -1 && start > TARGET_PAGE_SIZE) {
+/* Restart at the beginning of the address space. */
+ret = page_find_range_empty(TARGET_PAGE_SIZE, start - 1,
+size, alignment);
 }
 
-size = HOST_PAGE_ALIGN(size) + alignment;
-end_addr = start + size;
-if (end_addr > reserved_va) {
-end_addr = reserved_va + 1;
-}
-addr = end_addr - qemu_host_page_size;
-
-while (1) {
-if (addr > end_addr) {
-if (looped) {
-return (abi_ulong)-1;
-}
-end_addr = reserved_va + 1;
-addr = end_addr - qemu_host_page_size;
-looped = 1;
-continue;
-}
-prot = page_get_flags(addr);
-if (prot) {
-end_addr = addr;
-}
-if (end_addr - addr >= size) {
-break;
-}
-addr -= qemu_host_page_size;
-}
-
-if (start == mmap_next_start) {
-mmap_next_start = addr;
-}
-/* addr is sufficiently low to align it up */
-if (alignment != 0) {
-addr = (addr + alignment) & ~(alignment - 1);
-}
-return addr;
+return ret;
 }
 
 /*
-- 
2.34.1




[PULL 35/47] linux-user: Use page_find_range_empty for mmap_find_vma_reserved

2023-07-15 Thread Richard Henderson
Use the interval tree to find empty space, rather than
probing each page in turn.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-19-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 52 ++-
 1 file changed, 6 insertions(+), 46 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index c4b2515271..738b9b797d 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -318,55 +318,15 @@ unsigned long last_brk;
 static abi_ulong mmap_find_vma_reserved(abi_ulong start, abi_ulong size,
 abi_ulong align)
 {
-abi_ulong addr, end_addr, incr = qemu_host_page_size;
-int prot;
-bool looped = false;
+target_ulong ret;
 
-if (size > reserved_va) {
-return (abi_ulong)-1;
+ret = page_find_range_empty(start, reserved_va, size, align);
+if (ret == -1 && start > mmap_min_addr) {
+/* Restart at the beginning of the address space. */
+ret = page_find_range_empty(mmap_min_addr, start - 1, size, align);
 }
 
-/* Note that start and size have already been aligned by mmap_find_vma. */
-
-end_addr = start + size;
-/*
- * Start at the top of the address space, ignoring the last page.
- * If reserved_va == UINT32_MAX, then end_addr wraps to 0,
- * throwing the rest of the calculations off.
- * TODO: rewrite using last_addr instead.
- * TODO: use the interval tree instead of probing every page.
- */
-if (start > reserved_va - size) {
-end_addr = ((reserved_va - size) & -align) + size;
-looped = true;
-}
-
-/* Search downward from END_ADDR, checking to see if a page is in use.  */
-addr = end_addr;
-while (1) {
-addr -= incr;
-if (addr > end_addr) {
-if (looped) {
-/* Failure.  The entire address space has been searched.  */
-return (abi_ulong)-1;
-}
-/* Re-start at the top of the address space (see above). */
-addr = end_addr = ((reserved_va - size) & -align) + size;
-looped = true;
-} else {
-prot = page_get_flags(addr);
-if (prot) {
-/* Page in use.  Restart below this page.  */
-addr = end_addr = ((addr - size) & -align) + size;
-} else if (addr && addr + size == end_addr) {
-/* Success!  All pages between ADDR and END_ADDR are free.  */
-if (start == mmap_next_start) {
-mmap_next_start = addr;
-}
-return addr;
-}
-}
-}
+return ret;
 }
 
 /*
-- 
2.34.1




[PULL 33/47] accel/tcg: Introduce page_find_range_empty

2023-07-15 Thread Richard Henderson
Use the interval tree to locate an unused range in the VM.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-17-richard.hender...@linaro.org>
---
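
Note for readers: a minimal standalone sketch of the narrowing loop in
page_find_range_empty, using a plain array of inclusive ranges in place of
the interval tree. DemoRange and the demo_* functions are hypothetical;
demo_find_overlap merely plays the role of pageflags_find, and the sketch
assumes len != 0, a power-of-2 align, and addresses small enough that
aligning min cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, last; } DemoRange;   /* inclusive bounds */

/* Return some range overlapping [start, last], or NULL if there is none. */
static const DemoRange *demo_find_overlap(const DemoRange *r, size_t n,
                                          uint64_t start, uint64_t last)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].start <= last && start <= r[i].last) {
            return &r[i];
        }
    }
    return NULL;
}

/* Lowest aligned x with [x, x+len) free inside [min, max], else UINT64_MAX. */
static uint64_t demo_find_range_empty(const DemoRange *r, size_t n,
                                      uint64_t min, uint64_t max,
                                      uint64_t len, uint64_t align)
{
    uint64_t len_m1 = len - 1, align_m1 = align - 1;

    for (;;) {
        const DemoRange *p;

        /* Align min and check that enough space remains. */
        min = (min + align_m1) & ~align_m1;
        if (min > max || len_m1 > max - min) {
            return UINT64_MAX;
        }
        p = demo_find_overlap(r, n, min, min + len_m1);
        if (p == NULL) {
            return min;                 /* found a hole */
        }
        if (max <= p->last) {
            return UINT64_MAX;          /* allocation covers the rest */
        }
        min = p->last + 1;              /* skip across the allocation */
    }
}
```

Each iteration either succeeds, fails, or moves min past one existing
allocation, so the number of lookups is bounded by the number of mappings
rather than by the number of pages.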
 include/exec/cpu-all.h | 15 +++
 accel/tcg/user-exec.c  | 41 +
 2 files changed, 56 insertions(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 94f828b109..eb1c54701a 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -236,6 +236,21 @@ int page_check_range(target_ulong start, target_ulong len, int flags);
  */
 bool page_check_range_empty(target_ulong start, target_ulong last);
 
+/**
+ * page_find_range_empty
+ * @min: first byte of search range
+ * @max: last byte of search range
+ * @len: size of the hole required
+ * @align: alignment of the hole required (power of 2)
+ *
+ * If there is a range [x, x+@len) within [@min, @max] such that
+ * x % @align == 0, then return x.  Otherwise return -1.
+ * The memory lock must be held, as the caller will want to ensure
+ * the returned range stays empty until a new mapping can be installed.
+ */
+target_ulong page_find_range_empty(target_ulong min, target_ulong max,
+   target_ulong len, target_ulong align);
+
 /**
  * page_get_target_data(address)
  * @address: guest virtual address
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index ab684a3ea2..e4f9563730 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -605,6 +605,47 @@ bool page_check_range_empty(target_ulong start, target_ulong last)
 return pageflags_find(start, last) == NULL;
 }
 
+target_ulong page_find_range_empty(target_ulong min, target_ulong max,
+   target_ulong len, target_ulong align)
+{
+target_ulong len_m1, align_m1;
+
+assert(min <= max);
+assert(max <= GUEST_ADDR_MAX);
+assert(len != 0);
+assert(is_power_of_2(align));
+assert_memory_lock();
+
+len_m1 = len - 1;
+align_m1 = align - 1;
+
+/* Iteratively narrow the search region. */
+while (1) {
+PageFlagsNode *p;
+
+/* Align min and double-check there's enough space remaining. */
+min = (min + align_m1) & ~align_m1;
+if (min > max) {
+return -1;
+}
+if (len_m1 > max - min) {
+return -1;
+}
+
+p = pageflags_find(min, min + len_m1);
+if (p == NULL) {
+/* Found! */
+return min;
+}
+if (max <= p->itree.last) {
+/* Existing allocation fills the remainder of the search region. */
+return -1;
+}
+/* Skip across existing allocation. */
+min = p->itree.last + 1;
+}
+}
+
 void page_protect(tb_page_addr_t address)
 {
 PageFlagsNode *p;
-- 
2.34.1




[PULL 08/47] linux-user: Use abi_ullong not unsigned long long in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 20986bd1d3..45ebacd4b4 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1385,13 +1385,13 @@ struct target_stat64 {
 abi_ulong   target_st_ctime;
 abi_ulong   target_st_ctime_nsec;
 
-unsigned long long  st_ino;
+abi_ullong  st_ino;
 } QEMU_PACKED;
 
 #ifdef TARGET_ARM
 #define TARGET_HAS_STRUCT_STAT64
 struct target_eabi_stat64 {
-unsigned long long st_dev;
+abi_ullong   st_dev;
 abi_uint __pad1;
 abi_ulong__st_ino;
 abi_uint st_mode;
@@ -1400,13 +1400,13 @@ struct target_eabi_stat64 {
 abi_ulongst_uid;
 abi_ulongst_gid;
 
-unsigned long long st_rdev;
+abi_ullong   st_rdev;
 abi_uint __pad2[2];
 
 long long   st_size;
 abi_ulongst_blksize;
 abi_uint __pad3;
-unsigned long long st_blocks;
+abi_ullong   st_blocks;
 
 abi_ulongtarget_st_atime;
 abi_ulongtarget_st_atime_nsec;
@@ -1417,7 +1417,7 @@ struct target_eabi_stat64 {
 abi_ulongtarget_st_ctime;
 abi_ulongtarget_st_ctime_nsec;
 
-unsigned long long st_ino;
+abi_ullong   st_ino;
 } QEMU_PACKED;
 #endif
 
@@ -1568,14 +1568,14 @@ struct target_stat {
 #if !defined(TARGET_PPC64)
 #define TARGET_HAS_STRUCT_STAT64
 struct QEMU_PACKED target_stat64 {
-unsigned long long st_dev;
-unsigned long long st_ino;
+abi_ullong st_dev;
+abi_ullong st_ino;
 abi_uint st_mode;
 abi_uint st_nlink;
 abi_uint st_uid;
 abi_uint st_gid;
-unsigned long long st_rdev;
-unsigned long long __pad0;
+abi_ullong st_rdev;
+abi_ullong __pad0;
 long long  st_size;
 intst_blksize;
 abi_uint   __pad1;
@@ -1674,7 +1674,7 @@ struct target_stat {
  */
 #define TARGET_HAS_STRUCT_STAT64
 struct target_stat64 {
-unsigned long long  st_dev;
+abi_ullong  st_dev;
 unsigned char   __pad1[2];
 
 #define TARGET_STAT64_HAS_BROKEN_ST_INO 1
@@ -1686,7 +1686,7 @@ struct target_stat64 {
 abi_ulong   st_uid;
 abi_ulong   st_gid;
 
-unsigned long long  st_rdev;
+abi_ullong  st_rdev;
 unsigned char   __pad3[2];
 
 long long   st_size;
@@ -1704,7 +1704,7 @@ struct target_stat64 {
 abi_ulong   target_st_ctime;
 abi_ulong   target_st_ctime_nsec;
 
-unsigned long long  st_ino;
+abi_ullong  st_ino;
 } QEMU_PACKED;
 
 #elif defined(TARGET_ABI_MIPSN64)
@@ -1918,7 +1918,7 @@ struct target_stat {
  */
 #define TARGET_HAS_STRUCT_STAT64
 struct QEMU_PACKED target_stat64 {
-unsigned long long  st_dev;
+abi_ullong  st_dev;
 unsigned char   __pad0[4];
 
 #define TARGET_STAT64_HAS_BROKEN_ST_INO 1
@@ -1930,13 +1930,13 @@ struct QEMU_PACKED target_stat64 {
 abi_ulong   st_uid;
 abi_ulong   st_gid;
 
-unsigned long long  st_rdev;
+abi_ullong  st_rdev;
 unsigned char   __pad3[4];
 
 long long   st_size;
 abi_ulong   st_blksize;
 
-unsigned long long  st_blocks;  /* Number 512-byte blocks allocated. */
+abi_ullong  st_blocks;  /* Number 512-byte blocks allocated. */
 
 abi_ulong   target_st_atime;
 abi_ulong   target_st_atime_nsec;
@@ -1947,7 +1947,7 @@ struct QEMU_PACKED target_stat64 {
 abi_ulong   target_st_ctime;
 abi_ulong   target_st_ctime_nsec;
 
-unsigned long long  st_ino;
+abi_ullong  st_ino;
 };
 
 #elif defined(TARGET_I386) && !defined(TARGET_ABI32)
-- 
2.34.1




[PULL 19/47] tcg: Fix info_in_idx increment in layout_arg_by_ref

2023-07-15 Thread Richard Henderson
Off-by-one error: we failed to take into account that layout_arg_1
has already incremented info_in_idx for the first piece.  We only
need to account for the n-1 TCG_CALL_ARG_BY_REF_N pieces here.

Cc: qemu-sta...@nongnu.org
Fixes: 313bdea84d2 ("tcg: Add TCG_CALL_{RET,ARG}_BY_REF")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1751
Signed-off-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Tested-by: Peter Maydell 
---
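
Note for readers: a minimal model of the index accounting that this fix
restores. DemoCum and the demo_* functions are hypothetical and only mimic
the two counters involved; they are not the TCG structures.

```c
#include <assert.h>

typedef struct {
    int info_in_idx;    /* next free argument-info entry */
    int ref_slot;       /* next free by-reference slot */
} DemoCum;

/* Models layout_arg_1: records one entry and advances the index by one. */
static void demo_layout_arg_1(DemoCum *cum)
{
    cum->info_in_idx += 1;
}

/* Models the fixed layout_arg_by_ref for an argument split into n pieces. */
static void demo_layout_arg_by_ref(DemoCum *cum, int n)
{
    demo_layout_arg_1(cum);         /* piece 0 is counted here */
    cum->info_in_idx += n - 1;      /* pieces 1..n-1 */
    cum->ref_slot += n;
}
```

After laying out an n-piece argument the index has advanced by exactly n;
adding n a second time, as before the fix, would skip one entry per
by-reference argument.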
 tcg/tcg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index a0628fe424..652e8ea6b9 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1083,7 +1083,7 @@ static void layout_arg_by_ref(TCGCumulativeArgs *cum, TCGHelperInfo *info)
 .ref_slot = cum->ref_slot + i,
 };
 }
-cum->info_in_idx += n;
+cum->info_in_idx += n - 1;  /* i=0 accounted for in layout_arg_1 */
 cum->ref_slot += n;
 }
 
-- 
2.34.1




[PULL 32/47] linux-user: Rewrite mmap_frag

2023-07-15 Thread Richard Henderson
Use 'last' variables instead of 'end' variables.
Always zero MAP_ANONYMOUS fragments, which we previously
failed to do if they were not writable; exit early when
we allocate a new page from the kernel, which is known to be zero-filled.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-16-richard.hender...@linaro.org>
---
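
Note for readers: a minimal sketch of why an inclusive 'last' is more
robust than an exclusive 'end', assuming a 32-bit address space,
page-aligned start and len >= 1. The demo_* names and DEMO_PAGE_SIZE are
hypothetical and not taken from QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x1000u

/* Exclusive end: start + len wraps to 0 for a range ending at the top. */
static unsigned demo_count_pages_end(uint32_t start, uint32_t len)
{
    uint32_t end = start + len;
    unsigned n = 0;

    for (uint32_t a = start; a < end; a += DEMO_PAGE_SIZE) {
        n++;
    }
    return n;
}

/* Inclusive last: start + len - 1 never wraps for a valid range. */
static unsigned demo_count_pages_last(uint32_t start, uint32_t len)
{
    uint32_t last = start + len - 1;
    unsigned n = 0;

    for (uint32_t a = start; ; a += DEMO_PAGE_SIZE) {
        n++;
        if (last - a < DEMO_PAGE_SIZE) {
            break;                  /* 'a' is the final page */
        }
    }
    return n;
}
```

For the topmost page (start 0xfffff000, len 0x1000) the 'end' form sees
end == 0 and visits no pages, while the 'last' form correctly visits one.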
 linux-user/mmap.c | 123 +++---
 1 file changed, 62 insertions(+), 61 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index d02d74d279..c4b2515271 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -222,73 +222,76 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 }
 
 /* map an incomplete host page */
-static int mmap_frag(abi_ulong real_start,
- abi_ulong start, abi_ulong end,
- int prot, int flags, int fd, off_t offset)
+static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last,
+  int prot, int flags, int fd, off_t offset)
 {
-abi_ulong real_end, addr;
+abi_ulong real_last;
 void *host_start;
-int prot1, prot_new;
+int prot_old, prot_new;
+int host_prot_old, host_prot_new;
 
-real_end = real_start + qemu_host_page_size;
-host_start = g2h_untagged(real_start);
-
-/* get the protection of the target pages outside the mapping */
-prot1 = 0;
-for (addr = real_start; addr < real_end; addr++) {
-if (addr < start || addr >= end) {
-prot1 |= page_get_flags(addr);
-}
+if (!(flags & MAP_ANONYMOUS)
+&& (flags & MAP_TYPE) == MAP_SHARED
+&& (prot & PROT_WRITE)) {
+/*
+ * msync() won't work with the partial page, so we return an
+ * error if write is possible while it is a shared mapping.
+ */
+errno = EINVAL;
+return false;
 }
 
-if (prot1 == 0) {
-/* no page was there, so we allocate one */
+real_last = real_start + qemu_host_page_size - 1;
+host_start = g2h_untagged(real_start);
+
+/* Get the protection of the target pages outside the mapping. */
+prot_old = 0;
+for (abi_ulong a = real_start; a < start; a += TARGET_PAGE_SIZE) {
+prot_old |= page_get_flags(a);
+}
+for (abi_ulong a = real_last; a > last; a -= TARGET_PAGE_SIZE) {
+prot_old |= page_get_flags(a);
+}
+
+if (prot_old == 0) {
+/*
+ * Since !(prot_old & PAGE_VALID), there were no guest pages
+ * outside of the fragment we need to map.  Allocate a new host
+ * page to cover, discarding whatever else may have been present.
+ */
 void *p = mmap(host_start, qemu_host_page_size,
target_to_host_prot(prot),
flags | MAP_ANONYMOUS, -1, 0);
 if (p == MAP_FAILED) {
-return -1;
+return false;
 }
-prot1 = prot;
+prot_old = prot;
 }
-prot1 &= PAGE_BITS;
+prot_new = prot | prot_old;
 
-prot_new = prot | prot1;
-if (!(flags & MAP_ANONYMOUS)) {
-/*
- * msync() won't work here, so we return an error if write is
- * possible while it is a shared mapping.
- */
-if ((flags & MAP_TYPE) == MAP_SHARED && (prot & PROT_WRITE)) {
-return -1;
-}
+host_prot_old = target_to_host_prot(prot_old);
+host_prot_new = target_to_host_prot(prot_new);
 
-/* adjust protection to be able to read */
-if (!(prot1 & PROT_WRITE)) {
-mprotect(host_start, qemu_host_page_size,
- target_to_host_prot(prot1) | PROT_WRITE);
-}
+/* Adjust protection to be able to write. */
+if (!(host_prot_old & PROT_WRITE)) {
+host_prot_old |= PROT_WRITE;
+mprotect(host_start, qemu_host_page_size, host_prot_old);
+}
 
-/* read the corresponding file data */
-if (pread(fd, g2h_untagged(start), end - start, offset) == -1) {
-return -1;
-}
-
-/* put final protection */
-if (prot_new != (prot1 | PROT_WRITE)) {
-mprotect(host_start, qemu_host_page_size,
- target_to_host_prot(prot_new));
-}
+/* Read or zero the new guest pages. */
+if (flags & MAP_ANONYMOUS) {
+memset(g2h_untagged(start), 0, last - start + 1);
 } else {
-if (prot_new != prot1) {
-mprotect(host_start, qemu_host_page_size,
- target_to_host_prot(prot_new));
-}
-if (prot_new & PROT_WRITE) {
-memset(g2h_untagged(start), 0, end - start);
+if (pread(fd, g2h_untagged(start), last - start + 1, offset) == -1) {
+return false;
 }
 }
-return 0;
+
+/* Put final protection */
+if (host_prot_new != host_prot_old) {
+mprotect(host_start, qemu_host_page_size, host_prot_new);
+}
+return true;
 }
 
 #if HOST_LONG_BITS == 64 && TARGET_ABI_BITS == 

[PULL 14/47] include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for microblaze

2023-07-15 Thread Richard Henderson
Based on gcc's microblaze.h setting BIGGEST_ALIGNMENT to 32 bits.

Signed-off-by: Richard Henderson 
---
 include/exec/user/abitypes.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/exec/user/abitypes.h b/include/exec/user/abitypes.h
index 743b8bb9ea..beba0a48c7 100644
--- a/include/exec/user/abitypes.h
+++ b/include/exec/user/abitypes.h
@@ -15,7 +15,9 @@
 #define ABI_LLONG_ALIGNMENT 2
 #endif
 
-#if (defined(TARGET_I386) && !defined(TARGET_X86_64)) || defined(TARGET_SH4)
+#if (defined(TARGET_I386) && !defined(TARGET_X86_64)) \
+|| defined(TARGET_SH4) \
+|| defined(TARGET_MICROBLAZE)
 #define ABI_LLONG_ALIGNMENT 4
 #endif
 
-- 
2.34.1
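
To illustrate why the alignment value matters for guest structure layout, here is a small sketch. It is not taken from QEMU: the type and struct names are hypothetical, and it assumes a GCC or Clang host where int is 4 bytes and long long is 8 bytes. It relies on the documented GCC behaviour that an aligned attribute on a typedef may lower the alignment as well as raise it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical guest type: a 64-bit integer that needs only 4-byte
 * alignment, as on a target whose biggest alignment is 32 bits.
 */
typedef long long guest_llong4 __attribute__((aligned(4)));

struct guest_record {
    int tag;             /* 4 bytes on the assumed host */
    guest_llong4 value;  /* placed at offset 4, with no padding before it */
};

/* Offset of the 64-bit member inside the record. */
static size_t value_offset(void)
{
    return offsetof(struct guest_record, value);
}

/* Total size of the record, including any tail padding. */
static size_t record_size(void)
{
    return sizeof(struct guest_record);
}
```

With a naturally aligned long long on a typical 64-bit host, the same struct would place value at offset 8 and occupy 16 bytes, which is the kind of mismatch an incorrect ABI_LLONG_ALIGNMENT would produce when copying structures to or from the guest.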




[PULL 17/47] linux-user: Fix do_shmat type errors

2023-07-15 Thread Richard Henderson
The guest address, raddr, should be unsigned, aka abi_ulong.
The host addresses should be cast via intptr_t/uintptr_t, not long.
Drop the inline and fix two other whitespace issues.

Signed-off-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Anton Johansson 
Message-Id: <20230626140250.69572-1-richard.hender...@linaro.org>
---
 linux-user/syscall.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c15d9ad743..b78eb686d8 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4539,14 +4539,14 @@ static inline abi_ulong target_shmlba(CPUArchState *cpu_env)
 }
 #endif
 
-static inline abi_ulong do_shmat(CPUArchState *cpu_env,
- int shmid, abi_ulong shmaddr, int shmflg)
+static abi_ulong do_shmat(CPUArchState *cpu_env, int shmid,
+  abi_ulong shmaddr, int shmflg)
 {
 CPUState *cpu = env_cpu(cpu_env);
-abi_long raddr;
+abi_ulong raddr;
 void *host_raddr;
 struct shmid_ds shm_info;
-int i,ret;
+int i, ret;
 abi_ulong shmlba;
 
 /* shmat pointers are always untagged */
@@ -4602,9 +4602,9 @@ static inline abi_ulong do_shmat(CPUArchState *cpu_env,
 
 if (host_raddr == (void *)-1) {
 mmap_unlock();
-return get_errno((long)host_raddr);
+return get_errno((intptr_t)host_raddr);
 }
-raddr=h2g((unsigned long)host_raddr);
+raddr = h2g((uintptr_t)host_raddr);
 
 page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
PAGE_VALID | PAGE_RESET | PAGE_READ |
@@ -4621,7 +4621,6 @@ static inline abi_ulong do_shmat(CPUArchState *cpu_env,
 
 mmap_unlock();
 return raddr;
-
 }
 
 static inline abi_long do_shmdt(abi_ulong shmaddr)
-- 
2.34.1
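
As a brief illustration of the cast choice discussed above: intptr_t and uintptr_t are the integer types intended for holding a pointer value, whereas long carries no such guarantee (for example, it is 32 bits wide on 64-bit Windows). The helper names below are hypothetical and the sketch is not QEMU code. It assumes a mainstream host where (void *)-1 converts to the integer -1.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero if p is the (void *)-1 value that shmat() returns on failure. */
static int is_shmat_failure(void *p)
{
    /* intptr_t is wide enough for a pointer; long is not guaranteed to be. */
    return (intptr_t)p == -1;
}

/* Converts a host pointer to an unsigned integer of pointer width. */
static uintptr_t host_addr(void *p)
{
    return (uintptr_t)p;
}
```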




[PULL 25/47] linux-user: Populate more bits in mmap_flags_tbl

2023-07-15 Thread Richard Henderson
Fix translation of TARGET_MAP_SHARED and TARGET_MAP_PRIVATE,
which are types not single bits.  Add TARGET_MAP_SHARED_VALIDATE,
TARGET_MAP_SYNC, TARGET_MAP_NONBLOCK, TARGET_MAP_POPULATE,
TARGET_MAP_FIXED_NOREPLACE, and TARGET_MAP_UNINITIALIZED.

Update strace to match.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-9-richard.hender...@linaro.org>
---
 linux-user/strace.c  | 23 ++-
 linux-user/syscall.c | 21 +++--
 2 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 9228b235da..bbd29148d4 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -1094,28 +1094,25 @@ UNUSED static const struct flags mmap_prot_flags[] = {
 };
 
 UNUSED static const struct flags mmap_flags[] = {
-FLAG_TARGET(MAP_SHARED),
-FLAG_TARGET(MAP_PRIVATE),
+FLAG_TARGET_MASK(MAP_SHARED, MAP_TYPE),
+FLAG_TARGET_MASK(MAP_PRIVATE, MAP_TYPE),
+FLAG_TARGET_MASK(MAP_SHARED_VALIDATE, MAP_TYPE),
 FLAG_TARGET(MAP_ANONYMOUS),
 FLAG_TARGET(MAP_DENYWRITE),
-FLAG_TARGET(MAP_FIXED),
-FLAG_TARGET(MAP_GROWSDOWN),
 FLAG_TARGET(MAP_EXECUTABLE),
-#ifdef MAP_LOCKED
+FLAG_TARGET(MAP_FIXED),
+FLAG_TARGET(MAP_FIXED_NOREPLACE),
+FLAG_TARGET(MAP_GROWSDOWN),
+FLAG_TARGET(MAP_HUGETLB),
 FLAG_TARGET(MAP_LOCKED),
-#endif
-#ifdef MAP_NONBLOCK
 FLAG_TARGET(MAP_NONBLOCK),
-#endif
 FLAG_TARGET(MAP_NORESERVE),
-#ifdef MAP_POPULATE
 FLAG_TARGET(MAP_POPULATE),
-#endif
-#if defined(TARGET_MAP_UNINITIALIZED) && TARGET_MAP_UNINITIALIZED != 0
+FLAG_TARGET(MAP_STACK),
+FLAG_TARGET(MAP_SYNC),
+#if TARGET_MAP_UNINITIALIZED != 0
 FLAG_TARGET(MAP_UNINITIALIZED),
 #endif
-FLAG_TARGET(MAP_HUGETLB),
-FLAG_TARGET(MAP_STACK),
 FLAG_END,
 };
 
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 02d3b6c90a..3a89f6b408 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6012,9 +6012,19 @@ static const StructEntry struct_termios_def = {
 .print = print_termios,
 };
 
+/* If the host does not provide these bits, they may be safely discarded. */
+#ifndef MAP_SYNC
+#define MAP_SYNC 0
+#endif
+#ifndef MAP_UNINITIALIZED
+#define MAP_UNINITIALIZED 0
+#endif
+
 static const bitmask_transtbl mmap_flags_tbl[] = {
-{ TARGET_MAP_SHARED, TARGET_MAP_SHARED, MAP_SHARED, MAP_SHARED },
-{ TARGET_MAP_PRIVATE, TARGET_MAP_PRIVATE, MAP_PRIVATE, MAP_PRIVATE },
+{ TARGET_MAP_TYPE, TARGET_MAP_SHARED, MAP_TYPE, MAP_SHARED },
+{ TARGET_MAP_TYPE, TARGET_MAP_PRIVATE, MAP_TYPE, MAP_PRIVATE },
+{ TARGET_MAP_TYPE, TARGET_MAP_SHARED_VALIDATE,
+  MAP_TYPE, MAP_SHARED_VALIDATE },
 { TARGET_MAP_FIXED, TARGET_MAP_FIXED, MAP_FIXED, MAP_FIXED },
 { TARGET_MAP_ANONYMOUS, TARGET_MAP_ANONYMOUS,
   MAP_ANONYMOUS, MAP_ANONYMOUS },
@@ -6032,6 +6042,13 @@ static const bitmask_transtbl mmap_flags_tbl[] = {
Recognize it for the target insofar as we do not want to pass
it through to the host.  */
 { TARGET_MAP_STACK, TARGET_MAP_STACK, 0, 0 },
+{ TARGET_MAP_SYNC, TARGET_MAP_SYNC, MAP_SYNC, MAP_SYNC },
+{ TARGET_MAP_NONBLOCK, TARGET_MAP_NONBLOCK, MAP_NONBLOCK, MAP_NONBLOCK },
+{ TARGET_MAP_POPULATE, TARGET_MAP_POPULATE, MAP_POPULATE, MAP_POPULATE },
+{ TARGET_MAP_FIXED_NOREPLACE, TARGET_MAP_FIXED_NOREPLACE,
+  MAP_FIXED_NOREPLACE, MAP_FIXED_NOREPLACE },
+{ TARGET_MAP_UNINITIALIZED, TARGET_MAP_UNINITIALIZED,
+  MAP_UNINITIALIZED, MAP_UNINITIALIZED },
 { 0, 0, 0, 0 }
 };
 
-- 
2.34.1
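
The point about MAP_SHARED and MAP_PRIVATE being values of a field, not independent bits, can be sketched as follows. This is a simplified model, not the QEMU translation routine: the constants are hypothetical encodings chosen for the example, and guest_to_host only mirrors the four-column {mask, bits, mask, bits} shape of the table in the patch.

```c
#include <assert.h>

/* Hypothetical guest (G_) and host (H_) encodings for the example. */
enum {
    G_MAP_SHARED = 0x01, G_MAP_PRIVATE = 0x02, G_MAP_SHARED_VALIDATE = 0x03,
    G_MAP_TYPE = 0x0f, G_MAP_FIXED = 0x100,
    H_MAP_SHARED = 0x02, H_MAP_PRIVATE = 0x01, H_MAP_SHARED_VALIDATE = 0x04,
    H_MAP_TYPE = 0x0f, H_MAP_FIXED = 0x10
};

struct xlat {
    unsigned g_mask, g_bits;  /* match when (guest & g_mask) == g_bits */
    unsigned h_mask, h_bits;  /* then OR h_bits into the host value */
};

static const struct xlat demo_tbl[] = {
    { G_MAP_TYPE, G_MAP_SHARED, H_MAP_TYPE, H_MAP_SHARED },
    { G_MAP_TYPE, G_MAP_PRIVATE, H_MAP_TYPE, H_MAP_PRIVATE },
    { G_MAP_TYPE, G_MAP_SHARED_VALIDATE, H_MAP_TYPE, H_MAP_SHARED_VALIDATE },
    { G_MAP_FIXED, G_MAP_FIXED, H_MAP_FIXED, H_MAP_FIXED },
    { 0, 0, 0, 0 }
};

/* Translates guest flags by comparing each masked field against the table. */
static unsigned guest_to_host(unsigned g)
{
    unsigned h = 0;
    for (const struct xlat *t = demo_tbl; t->g_mask != 0; t++) {
        if ((g & t->g_mask) == t->g_bits) {
            h |= t->h_bits;
        }
    }
    return h;
}
```

Because the whole type field is masked before the comparison, exactly one of the three type rows can match, whatever the individual bit patterns happen to be.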




[PULL 28/47] linux-user: Implement MAP_FIXED_NOREPLACE

2023-07-15 Thread Richard Henderson
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-12-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 639921dba0..9dc34fc29d 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -509,7 +509,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
  * If the user is asking for the kernel to find a location, do that
  * before we truncate the length for mapping files below.
  */
-if (!(flags & MAP_FIXED)) {
+if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
 host_len = len + offset - host_offset;
 host_len = HOST_PAGE_ALIGN(host_len);
 start = mmap_find_vma(real_start, host_len, TARGET_PAGE_SIZE);
@@ -551,7 +551,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 }
 }
 
-if (!(flags & MAP_FIXED)) {
+if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
 unsigned long host_start;
 void *p;
 
@@ -600,6 +600,13 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 goto fail;
 }
 
+/* Validate that the chosen range is empty. */
+if ((flags & MAP_FIXED_NOREPLACE)
+&& !page_check_range_empty(start, end - 1)) {
+errno = EEXIST;
+goto fail;
+}
+
 /*
  * worst case: we cannot map the file because the offset is not
  * aligned, so we read it
@@ -615,7 +622,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 goto fail;
 }
 retaddr = target_mmap(start, len, target_prot | PROT_WRITE,
-  MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
+  (flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))
+  | MAP_PRIVATE | MAP_ANONYMOUS,
   -1, 0);
 if (retaddr == -1) {
 goto fail;
-- 
2.34.1




[PULL 20/47] linux-user: Make sure initial brk(0) is page-aligned

2023-07-15 Thread Richard Henderson
From: Andreas Schwab 

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab 
Message-Id: 
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b78eb686d8..02d3b6c90a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -806,7 +806,7 @@ static abi_ulong brk_page;
 
 void target_set_brk(abi_ulong new_brk)
 {
-target_brk = new_brk;
+target_brk = TARGET_PAGE_ALIGN(new_brk);
 brk_page = HOST_PAGE_ALIGN(target_brk);
 }
 
-- 
2.34.1




[PULL 27/47] bsd-user: Use page_check_range_empty for MAP_EXCL

2023-07-15 Thread Richard Henderson
The previous check returned -1 when any page within
[start, start+len) is unmapped, not when all are unmapped.

Cc: Warner Losh 
Cc: Kyle Evans 
Signed-off-by: Richard Henderson 
Reviewed-by: Warner Losh 
Message-Id: <20230707204054.8792-11-richard.hender...@linaro.org>
---
 bsd-user/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 565b9f97ed..07b5b8055e 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -609,7 +609,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
 }
 
 /* Reject the mapping if any page within the range is mapped */
-if ((flags & MAP_EXCL) && page_check_range(start, len, 0) < 0) {
+if ((flags & MAP_EXCL) && !page_check_range_empty(start, end - 1)) {
 errno = EINVAL;
 goto fail;
 }
-- 
2.34.1




[PULL 26/47] accel/tcg: Introduce page_check_range_empty

2023-07-15 Thread Richard Henderson
Examine the interval tree to validate that a region
has no existing mappings.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-10-richard.hender...@linaro.org>
---
 include/exec/cpu-all.h | 12 
 accel/tcg/user-exec.c  |  7 +++
 2 files changed, 19 insertions(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 472fe9ad9c..94f828b109 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -224,6 +224,18 @@ void page_set_flags(target_ulong start, target_ulong last, int flags);
 void page_reset_target_data(target_ulong start, target_ulong last);
 int page_check_range(target_ulong start, target_ulong len, int flags);
 
+/**
+ * page_check_range_empty:
+ * @start: first byte of range
+ * @last: last byte of range
+ * Context: holding mmap lock
+ *
+ * Return true if the entire range [@start, @last] is unmapped.
+ * The memory lock must be held so that the caller can ensure
+ * the result stays true until a new mapping can be installed.
+ */
+bool page_check_range_empty(target_ulong start, target_ulong last);
+
 /**
  * page_get_target_data(address)
  * @address: guest virtual address
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index d95b875a6a..ab684a3ea2 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -598,6 +598,13 @@ int page_check_range(target_ulong start, target_ulong len, int flags)
 return ret;
 }
 
+bool page_check_range_empty(target_ulong start, target_ulong last)
+{
+assert(last >= start);
+assert_memory_lock();
+return pageflags_find(start, last) == NULL;
+}
+
 void page_protect(tb_page_addr_t address)
 {
 PageFlagsNode *p;
-- 
2.34.1




[PULL 47/47] tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128

2023-07-15 Thread Richard Henderson
We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
CONFIG_ATOMIC128_OPT in atomic128.h.  It is difficult
to tell when those changes have been applied with the
ifdef we must use with CONFIG_CMPXCHG128.  So instead
use HAVE_CMPXCHG128, which triggers -Werror=undef when
the proper header has not been included.

Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
requires CONFIG_ATOMIC128_OPT.  Without this we fall back
to EXCP_ATOMIC to single-step 128-bit atomics, which is
slow enough to cause some tests to time out.

Reported-by: Thomas Huth 
Tested-by: Thomas Huth 
Signed-off-by: Richard Henderson 
---
 accel/tcg/tcg-runtime.h| 2 +-
 include/exec/helper-proto-common.h | 2 ++
 accel/tcg/cputlb.c | 2 +-
 accel/tcg/user-exec.c  | 2 +-
 tcg/tcg-op-ldst.c  | 2 +-
 accel/tcg/atomic_common.c.inc  | 2 +-
 6 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 39e68007f9..186899a2c7 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -58,7 +58,7 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
 DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
i64, env, i64, i64, i64, i32)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_be, TCG_CALL_NO_WG,
i128, env, i64, i128, i128, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_le, TCG_CALL_NO_WG,
diff --git a/include/exec/helper-proto-common.h b/include/exec/helper-proto-common.h
index 4d4b022668..8b67170a22 100644
--- a/include/exec/helper-proto-common.h
+++ b/include/exec/helper-proto-common.h
@@ -7,6 +7,8 @@
 #ifndef HELPER_PROTO_COMMON_H
 #define HELPER_PROTO_COMMON_H
 
+#include "qemu/atomic128.h"  /* for HAVE_CMPXCHG128 */
+
 #define HELPER_H "accel/tcg/tcg-runtime.h"
 #include "exec/helper-proto.h.inc"
 #undef  HELPER_H
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index c2b81ec569..e0079c9a9d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -3105,7 +3105,7 @@ void cpu_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index df60c7d673..ac38c2bf96 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -1433,7 +1433,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
index 0fcc1618e5..d54c305598 100644
--- a/tcg/tcg-op-ldst.c
+++ b/tcg/tcg-op-ldst.c
@@ -778,7 +778,7 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv_i64,
 #else
 # define WITH_ATOMIC64(X)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 # define WITH_ATOMIC128(X) X,
 #else
 # define WITH_ATOMIC128(X)
diff --git a/accel/tcg/atomic_common.c.inc b/accel/tcg/atomic_common.c.inc
index ee222fd7e7..95a5c5ff12 100644
--- a/accel/tcg/atomic_common.c.inc
+++ b/accel/tcg/atomic_common.c.inc
@@ -41,7 +41,7 @@ CMPXCHG_HELPER(cmpxchgq_be, uint64_t)
 CMPXCHG_HELPER(cmpxchgq_le, uint64_t)
 #endif
 
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 CMPXCHG_HELPER(cmpxchgo_be, Int128)
 CMPXCHG_HELPER(cmpxchgo_le, Int128)
 #endif
-- 
2.34.1




[PULL 37/47] linux-user: Rewrite mmap_reserve

2023-07-15 Thread Richard Henderson
Use 'last' variables instead of 'end' variables; be careful
about avoiding overflow.  Assert that the mmap succeeded.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-21-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 68 +--
 1 file changed, 42 insertions(+), 26 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index bb9cbe52cd..6308787942 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -722,47 +722,63 @@ fail:
 return -1;
 }
 
-static void mmap_reserve(abi_ulong start, abi_ulong size)
+static void mmap_reserve(abi_ulong start, abi_ulong len)
 {
 abi_ulong real_start;
-abi_ulong real_end;
-abi_ulong addr;
-abi_ulong end;
+abi_ulong real_last;
+abi_ulong real_len;
+abi_ulong last;
+abi_ulong a;
+void *host_start, *ptr;
 int prot;
 
+last = start + len - 1;
 real_start = start & qemu_host_page_mask;
-real_end = HOST_PAGE_ALIGN(start + size);
-end = start + size;
-if (start > real_start) {
-/* handle host page containing start */
+real_last = HOST_PAGE_ALIGN(last) - 1;
+
+/*
+ * If guest pages remain on the first or last host pages,
+ * adjust the deallocation to retain those guest pages.
+ * The single page special case is required for the last page,
+ * lest real_start overflow to zero.
+ */
+if (real_last - real_start < qemu_host_page_size) {
 prot = 0;
-for (addr = real_start; addr < start; addr += TARGET_PAGE_SIZE) {
-prot |= page_get_flags(addr);
+for (a = real_start; a < start; a += TARGET_PAGE_SIZE) {
+prot |= page_get_flags(a);
 }
-if (real_end == real_start + qemu_host_page_size) {
-for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) {
-prot |= page_get_flags(addr);
-}
-end = real_end;
+for (a = last; a < real_last; a += TARGET_PAGE_SIZE) {
+prot |= page_get_flags(a + 1);
+}
+if (prot != 0) {
+return;
+}
+} else {
+for (prot = 0, a = real_start; a < start; a += TARGET_PAGE_SIZE) {
+prot |= page_get_flags(a);
 }
 if (prot != 0) {
 real_start += qemu_host_page_size;
 }
-}
-if (end < real_end) {
-prot = 0;
-for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) {
-prot |= page_get_flags(addr);
+
+for (prot = 0, a = last; a < real_last; a += TARGET_PAGE_SIZE) {
+prot |= page_get_flags(a + 1);
 }
 if (prot != 0) {
-real_end -= qemu_host_page_size;
+real_last -= qemu_host_page_size;
+}
+
+if (real_last < real_start) {
+return;
 }
 }
-if (real_start != real_end) {
-mmap(g2h_untagged(real_start), real_end - real_start, PROT_NONE,
- MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE,
- -1, 0);
-}
+
+real_len = real_last - real_start + 1;
+host_start = g2h_untagged(real_start);
+
+ptr = mmap(host_start, real_len, PROT_NONE,
+   MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
+assert(ptr == host_start);
 }
 
 int target_munmap(abi_ulong start, abi_ulong len)
-- 
2.34.1




[PULL 41/47] accel/tcg: Return bool from page_check_range

2023-07-15 Thread Richard Henderson
Replace the 0/-1 result with true/false.
Invert the sense of the test of all callers.
Document the function.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-25-richard.hender...@linaro.org>
---
 bsd-user/qemu.h|  2 +-
 include/exec/cpu-all.h | 13 -
 linux-user/qemu.h  |  2 +-
 accel/tcg/user-exec.c  | 22 +++---
 linux-user/syscall.c   |  2 +-
 target/hppa/op_helper.c|  2 +-
 target/riscv/vector_helper.c   |  2 +-
 target/sparc/ldst_helper.c |  2 +-
 accel/tcg/ldst_atomicity.c.inc |  4 ++--
 9 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
index 41d84e0b81..edf9602f9b 100644
--- a/bsd-user/qemu.h
+++ b/bsd-user/qemu.h
@@ -267,7 +267,7 @@ abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2);
 
 static inline bool access_ok(int type, abi_ulong addr, abi_ulong size)
 {
-return page_check_range((target_ulong)addr, size, type) == 0;
+return page_check_range((target_ulong)addr, size, type);
 }
 
 /*
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index eb1c54701a..94f44f1f59 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -222,7 +222,18 @@ int walk_memory_regions(void *, walk_memory_regions_fn);
 int page_get_flags(target_ulong address);
 void page_set_flags(target_ulong start, target_ulong last, int flags);
 void page_reset_target_data(target_ulong start, target_ulong last);
-int page_check_range(target_ulong start, target_ulong len, int flags);
+
+/**
+ * page_check_range
+ * @start: first byte of range
+ * @len: length of range
+ * @flags: flags required for each page
+ *
+ * Return true if every page in [@start, @start+@len) has @flags set.
+ * Return false if any page is unmapped.  Thus testing flags == 0 is
+ * equivalent to testing for flags == PAGE_VALID.
+ */
+bool page_check_range(target_ulong start, target_ulong len, int flags);
 
 /**
  * page_check_range_empty:
diff --git a/linux-user/qemu.h b/linux-user/qemu.h
index 9b8e0860d7..802794db63 100644
--- a/linux-user/qemu.h
+++ b/linux-user/qemu.h
@@ -182,7 +182,7 @@ static inline bool access_ok_untagged(int type, abi_ulong addr, abi_ulong size)
 : !guest_range_valid_untagged(addr, size)) {
 return false;
 }
-return page_check_range((target_ulong)addr, size, type) == 0;
+return page_check_range((target_ulong)addr, size, type);
 }
 
 static inline bool access_ok(CPUState *cpu, int type,
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 1e8fcaf6b0..df60c7d673 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -520,19 +520,19 @@ void page_set_flags(target_ulong start, target_ulong last, int flags)
 }
 }
 
-int page_check_range(target_ulong start, target_ulong len, int flags)
+bool page_check_range(target_ulong start, target_ulong len, int flags)
 {
 target_ulong last;
 int locked;  /* tri-state: =0: unlocked, +1: global, -1: local */
-int ret;
+bool ret;
 
 if (len == 0) {
-return 0;  /* trivial length */
+return true;  /* trivial length */
 }
 
 last = start + len - 1;
 if (last < start) {
-return -1; /* wrap around */
+return false; /* wrap around */
 }
 
 locked = have_mmap_lock();
@@ -551,33 +551,33 @@ int page_check_range(target_ulong start, target_ulong len, int flags)
 p = pageflags_find(start, last);
 }
 if (!p) {
-ret = -1; /* entire region invalid */
+ret = false; /* entire region invalid */
 break;
 }
 }
 if (start < p->itree.start) {
-ret = -1; /* initial bytes invalid */
+ret = false; /* initial bytes invalid */
 break;
 }
 
 missing = flags & ~p->flags;
 if (missing & ~PAGE_WRITE) {
-ret = -1; /* page doesn't match */
+ret = false; /* page doesn't match */
 break;
 }
 if (missing & PAGE_WRITE) {
 if (!(p->flags & PAGE_WRITE_ORG)) {
-ret = -1; /* page not writable */
+ret = false; /* page not writable */
 break;
 }
 /* Asking about writable, but has been protected: undo. */
 if (!page_unprotect(start, 0)) {
-ret = -1;
+ret = false;
 break;
 }
 /* TODO: page_unprotect should take a range, not a single page. */
 if (last - start < TARGET_PAGE_SIZE) {
-ret = 0; /* ok */
+ret = true; /* ok */
 break;
 }
 start += TARGET_PAGE_SIZE;
@@ -585,7 +585,7 @@ int page_check_range(target_ulong start, target_ulong len, int flags)
 }
 
 if (last <= p->itree.last) {
-ret = 0; /* ok */
+ret = tru

[PULL 38/47] linux-user: Rename mmap_reserve to mmap_reserve_or_unmap

2023-07-15 Thread Richard Henderson
If !reserved_va, munmap instead and assert success.
Update all callers.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-22-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 6308787942..22c2869be8 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -722,14 +722,14 @@ fail:
 return -1;
 }
 
-static void mmap_reserve(abi_ulong start, abi_ulong len)
+static void mmap_reserve_or_unmap(abi_ulong start, abi_ulong len)
 {
 abi_ulong real_start;
 abi_ulong real_last;
 abi_ulong real_len;
 abi_ulong last;
 abi_ulong a;
-void *host_start, *ptr;
+void *host_start;
 int prot;
 
 last = start + len - 1;
@@ -776,9 +776,15 @@ static void mmap_reserve(abi_ulong start, abi_ulong len)
 real_len = real_last - real_start + 1;
 host_start = g2h_untagged(real_start);
 
-ptr = mmap(host_start, real_len, PROT_NONE,
-   MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
-assert(ptr == host_start);
+if (reserved_va) {
+void *ptr = mmap(host_start, real_len, PROT_NONE,
+ MAP_FIXED | MAP_ANONYMOUS
+ | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
+assert(ptr == host_start);
+} else {
+int ret = munmap(host_start, real_len);
+assert(ret == 0);
+}
 }
 
 int target_munmap(abi_ulong start, abi_ulong len)
@@ -830,11 +836,7 @@ int target_munmap(abi_ulong start, abi_ulong len)
 ret = 0;
 /* unmap what we can */
 if (real_start < real_end) {
-if (reserved_va) {
-mmap_reserve(real_start, real_end - real_start);
-} else {
-ret = munmap(g2h_untagged(real_start), real_end - real_start);
-}
+mmap_reserve_or_unmap(real_start, real_end - real_start);
 }
 
 if (ret == 0) {
@@ -871,7 +873,7 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
  * If new and old addresses overlap then the above mremap will
  * already have failed with EINVAL.
  */
-mmap_reserve(old_addr, old_size);
+mmap_reserve_or_unmap(old_addr, old_size);
 }
 } else if (flags & MREMAP_MAYMOVE) {
 abi_ulong mmap_start;
@@ -886,7 +888,7 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
flags | MREMAP_FIXED,
g2h_untagged(mmap_start));
 if (reserved_va) {
-mmap_reserve(old_addr, old_size);
+mmap_reserve_or_unmap(old_addr, old_size);
 }
 }
 } else {
@@ -912,7 +914,8 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
 errno = ENOMEM;
 host_addr = MAP_FAILED;
 } else if (reserved_va && old_size > new_size) {
-mmap_reserve(old_addr + old_size, old_size - new_size);
+mmap_reserve_or_unmap(old_addr + old_size,
+  old_size - new_size);
 }
 }
 } else {
-- 
2.34.1




[PULL 09/47] linux-user: Use abi_llong not long long in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 45ebacd4b4..e4fcbd16d2 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1370,7 +1370,7 @@ struct target_stat64 {
 unsigned short  st_rdev;
 unsigned char   __pad3[10];
 
-long long   st_size;
+abi_llong   st_size;
 abi_ulong   st_blksize;
 
 abi_ulong   st_blocks;  /* Number 512-byte blocks allocated. */
@@ -1403,7 +1403,7 @@ struct target_eabi_stat64 {
 abi_ullong   st_rdev;
 abi_uint __pad2[2];
 
-long long   st_size;
+abi_llong   st_size;
 abi_ulongst_blksize;
 abi_uint __pad3;
 abi_ullong   st_blocks;
@@ -1576,10 +1576,10 @@ struct QEMU_PACKED target_stat64 {
 abi_uint st_gid;
 abi_ullong st_rdev;
 abi_ullong __pad0;
-long long  st_size;
+abi_llong  st_size;
 intst_blksize;
 abi_uint   __pad1;
-long long  st_blocks;   /* Number 512-byte blocks allocated. */
+abi_llong  st_blocks;   /* Number 512-byte blocks allocated. */
 inttarget_st_atime;
 abi_uint   target_st_atime_nsec;
 inttarget_st_mtime;
@@ -1689,7 +1689,7 @@ struct target_stat64 {
 abi_ullong  st_rdev;
 unsigned char   __pad3[2];
 
-long long   st_size;
+abi_llong   st_size;
 abi_ulong   st_blksize;
 
 abi_ulong   __pad4; /* future possible st_blocks high bits */
@@ -1933,7 +1933,7 @@ struct QEMU_PACKED target_stat64 {
 abi_ullong  st_rdev;
 unsigned char   __pad3[4];
 
-long long   st_size;
+abi_llong   st_size;
 abi_ulong   st_blksize;
 
 abi_ullong  st_blocks;  /* Number 512-byte blocks allocated. */
-- 
2.34.1




[PULL 46/47] accel/tcg: Always lock pages before translation

2023-07-15 Thread Richard Henderson
We had done this for user-mode by invoking page_protect
within the translator loop.  Extend this to handle system
mode as well.  Move page locking out of tb_link_page.

Reported-by: Liren Wei 
Reported-by: Richard W.M. Jones 
Signed-off-by: Richard Henderson 
Tested-by: Richard W.M. Jones 
---
 accel/tcg/internal.h  |  30 -
 accel/tcg/cpu-exec.c  |  20 
 accel/tcg/tb-maint.c  | 242 --
 accel/tcg/translate-all.c |  43 ++-
 accel/tcg/translator.c|  34 --
 5 files changed, 236 insertions(+), 133 deletions(-)

diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h
index 650c3ac53f..e8cbbde581 100644
--- a/accel/tcg/internal.h
+++ b/accel/tcg/internal.h
@@ -10,6 +10,7 @@
 #define ACCEL_TCG_INTERNAL_H
 
 #include "exec/exec-all.h"
+#include "exec/translate-all.h"
 
 /*
  * Access to the various translations structures need to be serialised
@@ -35,6 +36,32 @@ static inline void page_table_config_init(void) { }
 void page_table_config_init(void);
 #endif
 
+#ifdef CONFIG_USER_ONLY
+/*
+ * For user-only, page_protect sets the page read-only.
+ * Since most execution is already on read-only pages, and we'd need to
+ * account for other TBs on the same page, defer undoing any page protection
+ * until we receive the write fault.
+ */
+static inline void tb_lock_page0(tb_page_addr_t p0)
+{
+page_protect(p0);
+}
+
+static inline void tb_lock_page1(tb_page_addr_t p0, tb_page_addr_t p1)
+{
+page_protect(p1);
+}
+
+static inline void tb_unlock_page1(tb_page_addr_t p0, tb_page_addr_t p1) { }
+static inline void tb_unlock_pages(TranslationBlock *tb) { }
+#else
+void tb_lock_page0(tb_page_addr_t);
+void tb_lock_page1(tb_page_addr_t, tb_page_addr_t);
+void tb_unlock_page1(tb_page_addr_t, tb_page_addr_t);
+void tb_unlock_pages(TranslationBlock *);
+#endif
+
 #ifdef CONFIG_SOFTMMU
 void tb_invalidate_phys_range_fast(ram_addr_t ram_addr,
unsigned size,
@@ -48,8 +75,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, vaddr pc,
 void page_init(void);
 void tb_htable_init(void);
 void tb_reset_jump(TranslationBlock *tb, int n);
-TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
-   tb_page_addr_t phys_page2);
+TranslationBlock *tb_link_page(TranslationBlock *tb);
 bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc);
 void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
uintptr_t host_pc);
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 31aa320513..fdd6d3e0e4 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -536,6 +536,26 @@ static void cpu_exec_longjmp_cleanup(CPUState *cpu)
 if (have_mmap_lock()) {
 mmap_unlock();
 }
+#else
+/*
+ * For softmmu, a tlb_fill fault during translation will land here,
+ * and we need to release any page locks held.  In system mode we
+ * have one tcg_ctx per thread, so we know it was this cpu doing
+ * the translation.
+ *
+ * Alternative 1: Install a cleanup to be called via an exception
+ * handling safe longjmp.  It seems plausible that all our hosts
+ * support such a thing.  We'd have to properly register unwind info
+ * for the JIT for EH, rather than just for GDB.
+ *
+ * Alternative 2: Set and restore cpu->jmp_env in tb_gen_code to
+ * capture the cpu_loop_exit longjmp, perform the cleanup, and
+ * jump again to arrive here.
+ */
+if (tcg_ctx->gen_tb) {
+tb_unlock_pages(tcg_ctx->gen_tb);
+tcg_ctx->gen_tb = NULL;
+}
 #endif
 if (qemu_mutex_iothread_locked()) {
 qemu_mutex_unlock_iothread();
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 9566224d18..c406b2f7b7 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -70,17 +70,7 @@ typedef struct PageDesc PageDesc;
  */
 #define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock())
 
-static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
-  PageDesc **ret_p2, tb_page_addr_t phys2,
-  bool alloc)
-{
-*ret_p1 = NULL;
-*ret_p2 = NULL;
-}
-
-static inline void page_unlock(PageDesc *pd) { }
-static inline void page_lock_tb(const TranslationBlock *tb) { }
-static inline void page_unlock_tb(const TranslationBlock *tb) { }
+static inline void tb_lock_pages(const TranslationBlock *tb) { }
 
 /*
  * For user-only, since we are protecting all of memory with a single lock,
@@ -96,7 +86,7 @@ static void tb_remove_all(void)
 }
 
 /* Call with mmap_lock held. */
-static void tb_record(TranslationBlock *tb, PageDesc *p1, PageDesc *p2)
+static void tb_record(TranslationBlock *tb)
 {
 vaddr addr;
 int flags;
@@ -391,12 +381,108 @@ static void page_lock(PageDesc *pd)
 qemu_spin_lock(&pd->lock);
 }
 
+/* Like qemu_spin_trylock, returns false on success */
+static bool p

[PULL 24/47] linux-user: Split TARGET_PROT_* out of syscall_defs.h

2023-07-15 Thread Richard Henderson
Move the values into the per-target target_mman.h headers

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-8-richard.hender...@linaro.org>
---
 linux-user/aarch64/target_mman.h |  8 
 linux-user/generic/target_mman.h |  6 +-
 linux-user/mips/target_mman.h|  2 ++
 linux-user/syscall_defs.h| 11 ---
 linux-user/xtensa/target_mman.h  |  2 ++
 5 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/linux-user/aarch64/target_mman.h b/linux-user/aarch64/target_mman.h
index e7ba6070fe..f721295fe1 100644
--- a/linux-user/aarch64/target_mman.h
+++ b/linux-user/aarch64/target_mman.h
@@ -1 +1,9 @@
+#ifndef AARCH64_TARGET_MMAN_H
+#define AARCH64_TARGET_MMAN_H
+
+#define TARGET_PROT_BTI 0x10
+#define TARGET_PROT_MTE 0x20
+
 #include "../generic/target_mman.h"
+
+#endif
diff --git a/linux-user/generic/target_mman.h b/linux-user/generic/target_mman.h
index 7b888fb7f8..ec76a91b46 100644
--- a/linux-user/generic/target_mman.h
+++ b/linux-user/generic/target_mman.h
@@ -23,7 +23,11 @@
 #define TARGET_MAP_NORESERVE0x4000
 #endif
 
-/* Other MAP flags are defined in asm-generic/mman-common.h */
+/* Defined in asm-generic/mman-common.h */
+#ifndef TARGET_PROT_SEM
+#define TARGET_PROT_SEM 0x08
+#endif
+
 #ifndef TARGET_MAP_TYPE
 #define TARGET_MAP_TYPE 0x0f
 #endif
diff --git a/linux-user/mips/target_mman.h b/linux-user/mips/target_mman.h
index cd566c24b6..e97694aa4e 100644
--- a/linux-user/mips/target_mman.h
+++ b/linux-user/mips/target_mman.h
@@ -1,6 +1,8 @@
 #ifndef MIPS_TARGET_MMAN_H
 #define MIPS_TARGET_MMAN_H
 
+#define TARGET_PROT_SEM 0x10
+
 #define TARGET_MAP_NORESERVE0x0400
 #define TARGET_MAP_ANONYMOUS0x0800
 #define TARGET_MAP_GROWSDOWN0x1000
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 041105b7a7..77ba343c85 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1227,17 +1227,6 @@ struct target_winsize {
 
 #include "termbits.h"
 
-#if defined(TARGET_MIPS) || defined(TARGET_XTENSA)
-#define TARGET_PROT_SEM 0x10
-#else
-#define TARGET_PROT_SEM 0x08
-#endif
-
-#ifdef TARGET_AARCH64
-#define TARGET_PROT_BTI 0x10
-#define TARGET_PROT_MTE 0x20
-#endif
-
 #include "target_mman.h"
 
 #if (defined(TARGET_I386) && defined(TARGET_ABI32)) \
diff --git a/linux-user/xtensa/target_mman.h b/linux-user/xtensa/target_mman.h
index 3891bb5e07..3933771b5b 100644
--- a/linux-user/xtensa/target_mman.h
+++ b/linux-user/xtensa/target_mman.h
@@ -1,6 +1,8 @@
 #ifndef XTENSA_TARGET_MMAN_H
 #define XTENSA_TARGET_MMAN_H
 
+#define TARGET_PROT_SEM 0x10
+
 #define TARGET_MAP_NORESERVE0x0400
 #define TARGET_MAP_ANONYMOUS0x0800
 #define TARGET_MAP_GROWSDOWN0x1000
-- 
2.34.1
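
The override mechanism this patch relies on can be sketched in a single file. This is a minimal illustration, not the actual header layout: it assumes a per-target header defines its own value first and then pulls in the generic header, as the aarch64 header in the diff above does. demo_prot_sem() is a hypothetical helper added only to make the result observable.

```c
/* Stands in for a per-target header such as linux-user/mips/target_mman.h. */
#define TARGET_PROT_SEM 0x10

/*
 * Stands in for linux-user/generic/target_mman.h, assumed to be included
 * after the per-target definitions.  The #ifndef guard means the generic
 * default only applies when the target did not supply a value.
 */
#ifndef TARGET_PROT_SEM
#define TARGET_PROT_SEM 0x08
#endif

/* Hypothetical helper: reports which definition won. */
static int demo_prot_sem(void)
{
    return TARGET_PROT_SEM;
}
```

Targets that define nothing fall through to the generic 0x08, which matches the #else branch removed from syscall_defs.h.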




[PULL 45/47] linux-user/arm: Do not allocate a commpage at all for M-profile CPUs

2023-07-15 Thread Richard Henderson
From: Philippe Mathieu-Daudé 

Since commit fbd3c4cff6 ("linux-user/arm: Mark the commpage
executable") executing bare-metal (linked with rdimon.specs)
cortex-M code fails as:

  $ qemu-arm -cpu cortex-m3 ~/hello.exe.m3
  qemu-arm: ../../accel/tcg/user-exec.c:492: page_set_flags: Assertion `last <= GUEST_ADDR_MAX' failed.
  Aborted (core dumped)

Commit 4f5c67f8df ("linux-user/arm: Take more care allocating
commpage") already took care of not allocating a commpage for
M-profile CPUs, however it had to be reverted as commit 6cda41daa2.

Re-introduce the M-profile fix from commit 4f5c67f8df.

Fixes: fbd3c4cff6 ("linux-user/arm: Mark the commpage executable")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1755
Reported-by: Christophe Lyon 
Suggested-by: Richard Henderson 
Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Anton Johansson 
Reviewed-by: Richard Henderson 
Message-Id: <20230711153408.68389-1-phi...@linaro.org>
Signed-off-by: Richard Henderson 
---
 linux-user/elfload.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index d3d1352c4e..a26200d9f3 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -424,10 +424,23 @@ enum {
 
 static bool init_guest_commpage(void)
 {
-abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size;
-void *want = g2h_untagged(commpage);
-void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
-  MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+ARMCPU *cpu = ARM_CPU(thread_cpu);
+abi_ptr commpage;
+void *want;
+void *addr;
+
+/*
+ * M-profile allocates maximum of 2GB address space, so can never
+ * allocate the commpage.  Skip it.
+ */
+if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+return true;
+}
+
+commpage = HI_COMMPAGE & -qemu_host_page_size;
+want = g2h_untagged(commpage);
+addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
+MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
 
 if (addr == MAP_FAILED) {
 perror("Allocating guest commpage");
-- 
2.34.1
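
Why the commpage cannot fit for M-profile can be shown with a little address arithmetic. This is a hedged sketch: the commpage address 0xffff0f00 and the 2GB limit expressed as 0x7fffffff are assumptions for illustration, and all demo_* names are hypothetical rather than QEMU's.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HI_COMMPAGE   UINT32_C(0xffff0f00)  /* assumed commpage address */
#define DEMO_M_PROFILE_MAX UINT32_C(0x7fffffff)  /* assumed 2GB guest limit */

/* Round the commpage address down to a host page boundary. */
static uint32_t demo_commpage_base(uint32_t host_page_size)
{
    /* Unsigned negation of a power of two yields the page mask. */
    return DEMO_HI_COMMPAGE & -host_page_size;
}

/* True if the whole host page holding the commpage is addressable. */
static bool demo_commpage_fits(uint32_t guest_addr_max, uint32_t host_page_size)
{
    uint32_t base = demo_commpage_base(host_page_size);
    uint32_t last = base + host_page_size - 1;   /* inclusive last byte */

    return last <= guest_addr_max;
}
```

With a 2GB guest limit the page lies far above the last valid guest address, which is the condition the page_set_flags assertion in the commit message trips over.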




[PULL 13/47] linux-user: Use abi_uint not unsigned in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 9dc41828cf..c8ffb4f785 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1776,14 +1776,14 @@ struct target_stat {
 
 #define TARGET_STAT_HAVE_NSEC
 struct target_stat {
-unsignedst_dev;
+abi_uintst_dev;
 abi_longst_pad1[3]; /* Reserved for network id */
 abi_ulong   st_ino;
 abi_uintst_mode;
 abi_uintst_nlink;
 abi_int st_uid;
 abi_int st_gid;
-unsignedst_rdev;
+abi_uintst_rdev;
 abi_longst_pad2[2];
 abi_longst_size;
 abi_longst_pad3;
-- 
2.34.1




[PULL 21/47] linux-user: Fix formatting of mmap.c

2023-07-15 Thread Richard Henderson
Fix all checkpatch.pl errors within mmap.c.

Reviewed-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-5-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 199 --
 1 file changed, 122 insertions(+), 77 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 2692936773..639921dba0 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -56,10 +56,11 @@ void mmap_fork_start(void)
 
 void mmap_fork_end(int child)
 {
-if (child)
+if (child) {
 pthread_mutex_init(&mmap_mutex, NULL);
-else
+} else {
 pthread_mutex_unlock(&mmap_mutex);
+}
 }
 
 /*
@@ -203,40 +204,47 @@ static int mmap_frag(abi_ulong real_start,
 
 /* get the protection of the target pages outside the mapping */
 prot1 = 0;
-for(addr = real_start; addr < real_end; addr++) {
-if (addr < start || addr >= end)
+for (addr = real_start; addr < real_end; addr++) {
+if (addr < start || addr >= end) {
 prot1 |= page_get_flags(addr);
+}
 }
 
 if (prot1 == 0) {
 /* no page was there, so we allocate one */
 void *p = mmap(host_start, qemu_host_page_size, prot,
flags | MAP_ANONYMOUS, -1, 0);
-if (p == MAP_FAILED)
+if (p == MAP_FAILED) {
 return -1;
+}
 prot1 = prot;
 }
 prot1 &= PAGE_BITS;
 
 prot_new = prot | prot1;
 if (!(flags & MAP_ANONYMOUS)) {
-/* msync() won't work here, so we return an error if write is
-   possible while it is a shared mapping */
-if ((flags & MAP_TYPE) == MAP_SHARED &&
-(prot & PROT_WRITE))
+/*
+ * msync() won't work here, so we return an error if write is
+ * possible while it is a shared mapping.
+ */
+if ((flags & MAP_TYPE) == MAP_SHARED && (prot & PROT_WRITE)) {
 return -1;
+}
 
 /* adjust protection to be able to read */
-if (!(prot1 & PROT_WRITE))
+if (!(prot1 & PROT_WRITE)) {
 mprotect(host_start, qemu_host_page_size, prot1 | PROT_WRITE);
+}
 
 /* read the corresponding file data */
-if (pread(fd, g2h_untagged(start), end - start, offset) == -1)
+if (pread(fd, g2h_untagged(start), end - start, offset) == -1) {
 return -1;
+}
 
 /* put final protection */
-if (prot_new != (prot1 | PROT_WRITE))
+if (prot_new != (prot1 | PROT_WRITE)) {
 mprotect(host_start, qemu_host_page_size, prot_new);
+}
 } else {
 if (prot_new != prot1) {
 mprotect(host_start, qemu_host_page_size, prot_new);
@@ -265,8 +273,10 @@ abi_ulong mmap_next_start = TASK_UNMAPPED_BASE;
 
 unsigned long last_brk;
 
-/* Subroutine of mmap_find_vma, used when we have pre-allocated a chunk
-   of guest address space.  */
+/*
+ * Subroutine of mmap_find_vma, used when we have pre-allocated
+ * a chunk of guest address space.
+ */
 static abi_ulong mmap_find_vma_reserved(abi_ulong start, abi_ulong size,
 abi_ulong align)
 {
@@ -362,15 +372,17 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
  *  - shmat() with SHM_REMAP flag
  */
 ptr = mmap(g2h_untagged(addr), size, PROT_NONE,
-   MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE, -1, 0);
+   MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0);
 
 /* ENOMEM, if host address space has no memory */
 if (ptr == MAP_FAILED) {
 return (abi_ulong)-1;
 }
 
-/* Count the number of sequential returns of the same address.
-   This is used to modify the search algorithm below.  */
+/*
+ * Count the number of sequential returns of the same address.
+ * This is used to modify the search algorithm below.
+ */
 repeat = (ptr == prev ? repeat + 1 : 0);
 
 if (h2g_valid(ptr + size - 1)) {
@@ -387,14 +399,18 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
 /* The address is not properly aligned for the target.  */
 switch (repeat) {
 case 0:
-/* Assume the result that the kernel gave us is the
-   first with enough free space, so start again at the
-   next higher target page.  */
+/*
+ * Assume the result that the kernel gave us is the
+ * first with enough free space, so start again at the
+ * next higher target page.
+ */
 addr = ROUND_UP(addr, align);
 break;
 case 1:
-/* Sometimes the kernel decides to perform the allocation
-   at the top end of memory instead.  */
+/*

[PULL 40/47] accel/tcg: Accept more page flags in page_check_range

2023-07-15 Thread Richard Henderson
Only PAGE_WRITE needs special attention; all others can be
handled as we do for PAGE_READ.  Adjust the mask.

Signed-off-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230707204054.8792-24-richard.hender...@linaro.org>
---
 accel/tcg/user-exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index e4f9563730..1e8fcaf6b0 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -561,8 +561,8 @@ int page_check_range(target_ulong start, target_ulong len, int flags)
 }
 
 missing = flags & ~p->flags;
-if (missing & PAGE_READ) {
-ret = -1; /* page not readable */
+if (missing & ~PAGE_WRITE) {
+ret = -1; /* page doesn't match */
 break;
 }
 if (missing & PAGE_WRITE) {
-- 
2.34.1
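
The adjusted mask can be sketched as a per-page predicate. This is a minimal sketch under stated assumptions: the DEMO_PAGE_* bit values are illustrative, and the handling of a missing write bit (accept only if the page was originally writable) is an assumption about the code that follows the hunk, which the diff does not show.

```c
#include <stdbool.h>

/* Illustrative bit values; the real PAGE_* constants may differ. */
enum {
    DEMO_PAGE_READ      = 1 << 0,
    DEMO_PAGE_WRITE     = 1 << 1,
    DEMO_PAGE_EXEC      = 1 << 2,
    DEMO_PAGE_WRITE_ORG = 1 << 3,  /* writable before write-protection */
};

/* Return true if a page with flags 'have' satisfies the request 'want'. */
static bool demo_page_flags_ok(int have, int want)
{
    int missing = want & ~have;

    /* Any missing bit other than WRITE is a plain mismatch. */
    if (missing & ~DEMO_PAGE_WRITE) {
        return false;
    }
    /*
     * WRITE is special: the page may only be temporarily protected.
     * Assumed behaviour: accept if it was writable originally (a real
     * implementation would also undo the protection here).
     */
    if (missing & DEMO_PAGE_WRITE) {
        return (have & DEMO_PAGE_WRITE_ORG) != 0;
    }
    return true;
}
```

Before the change, only a missing read bit was rejected in the first test, so a request for an exec bit the page lacks would have passed; with the inverted mask it fails.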




[PULL 12/47] linux-user: Use abi_short not short in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 21ca03b0f4..9dc41828cf 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -702,8 +702,8 @@ typedef struct target_siginfo {
 
 struct target_pollfd {
 abi_int fd;   /* file descriptor */
-short events; /* requested events */
-short revents;/* returned events */
+abi_short events; /* requested events */
+abi_short revents;/* returned events */
 };
 
 /* virtual terminal ioctls */
@@ -1480,7 +1480,7 @@ struct target_stat {
 abi_ushort  st_dev;
 abi_ulong   st_ino;
 abi_ushort  st_mode;
-short   st_nlink;
+abi_short   st_nlink;
 abi_ushort  st_uid;
 abi_ushort  st_gid;
 abi_ushort  st_rdev;
-- 
2.34.1




[PULL 44/47] linux-user: Drop uint and ulong

2023-07-15 Thread Richard Henderson
From: Juan Quintela 

These types are no longer used anywhere else.

Signed-off-by: Juan Quintela 
Reviewed-by: Richard Henderson 
Reviewed-by: Laurent Vivier 
Reviewed-by: Philippe Mathieu-Daudé 
Message-id: <20230511085056.13809-1-quint...@redhat.com>
Signed-off-by: Richard Henderson 
---
 linux-user/syscall.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 33bc242e6a..1464151826 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -309,16 +309,16 @@ _syscall0(int, sys_gettid)
 #endif
 
 #if defined(TARGET_NR_getdents) && defined(EMULATE_GETDENTS_WITH_GETDENTS)
-_syscall3(int, sys_getdents, uint, fd, struct linux_dirent *, dirp, uint, count);
+_syscall3(int, sys_getdents, unsigned int, fd, struct linux_dirent *, dirp, unsigned int, count);
 #endif
 #if (defined(TARGET_NR_getdents) && \
   !defined(EMULATE_GETDENTS_WITH_GETDENTS)) || \
 (defined(TARGET_NR_getdents64) && defined(__NR_getdents64))
-_syscall3(int, sys_getdents64, uint, fd, struct linux_dirent64 *, dirp, uint, count);
+_syscall3(int, sys_getdents64, unsigned int, fd, struct linux_dirent64 *, dirp, unsigned int, count);
 #endif
 #if defined(TARGET_NR__llseek) && defined(__NR_llseek)
-_syscall5(int, _llseek,  uint,  fd, ulong, hi, ulong, lo,
-  loff_t *, res, uint, wh);
+_syscall5(int, _llseek,  unsigned int,  fd, unsigned long, hi, unsigned long, lo,
+  loff_t *, res, unsigned int, wh);
 #endif
 _syscall3(int, sys_rt_sigqueueinfo, pid_t, pid, int, sig, siginfo_t *, uinfo)
 _syscall4(int, sys_rt_tgsigqueueinfo, pid_t, pid, pid_t, tid, int, sig,
-- 
2.34.1




[PULL 31/47] linux-user: Rewrite target_mprotect

2023-07-15 Thread Richard Henderson
Use 'last' variables instead of 'end' variables.
When host page size > guest page size, detect when
adjacent host pages have the same protection and
merge that expanded host range into fewer syscalls.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-15-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 106 +-
 1 file changed, 67 insertions(+), 39 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index b2c2d85857..d02d74d279 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -120,8 +120,11 @@ static int target_to_host_prot(int prot)
 /* NOTE: all the constants are the HOST ones, but addresses are target. */
 int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 {
-abi_ulong end, host_start, host_end, addr;
-int prot1, ret, page_flags;
+abi_ulong starts[3];
+abi_ulong lens[3];
+int prots[3];
+abi_ulong host_start, host_last, last;
+int prot1, ret, page_flags, nranges;
 
 trace_target_mprotect(start, len, target_prot);
 
@@ -132,63 +135,88 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 if (!page_flags) {
 return -TARGET_EINVAL;
 }
-len = TARGET_PAGE_ALIGN(len);
-end = start + len;
-if (!guest_range_valid_untagged(start, len)) {
-return -TARGET_ENOMEM;
-}
 if (len == 0) {
 return 0;
 }
+len = TARGET_PAGE_ALIGN(len);
+if (!guest_range_valid_untagged(start, len)) {
+return -TARGET_ENOMEM;
+}
+
+last = start + len - 1;
+host_start = start & qemu_host_page_mask;
+host_last = HOST_PAGE_ALIGN(last) - 1;
+nranges = 0;
 
 mmap_lock();
-host_start = start & qemu_host_page_mask;
-host_end = HOST_PAGE_ALIGN(end);
-if (start > host_start) {
-/* handle host page containing start */
+
+if (host_last - host_start < qemu_host_page_size) {
+/* Single host page contains all guest pages: sum the prot. */
 prot1 = target_prot;
-for (addr = host_start; addr < start; addr += TARGET_PAGE_SIZE) {
-prot1 |= page_get_flags(addr);
+for (abi_ulong a = host_start; a < start; a += TARGET_PAGE_SIZE) {
+prot1 |= page_get_flags(a);
 }
-if (host_end == host_start + qemu_host_page_size) {
-for (addr = end; addr < host_end; addr += TARGET_PAGE_SIZE) {
-prot1 |= page_get_flags(addr);
+for (abi_ulong a = last; a < host_last; a += TARGET_PAGE_SIZE) {
+prot1 |= page_get_flags(a + 1);
+}
+starts[nranges] = host_start;
+lens[nranges] = qemu_host_page_size;
+prots[nranges] = prot1;
+nranges++;
+} else {
+if (host_start < start) {
+/* Host page contains more than one guest page: sum the prot. */
+prot1 = target_prot;
+for (abi_ulong a = host_start; a < start; a += TARGET_PAGE_SIZE) {
+prot1 |= page_get_flags(a);
+}
+/* If the resulting sum differs, create a new range. */
+if (prot1 != target_prot) {
+starts[nranges] = host_start;
+lens[nranges] = qemu_host_page_size;
+prots[nranges] = prot1;
+nranges++;
+host_start += qemu_host_page_size;
 }
-end = host_end;
 }
-ret = mprotect(g2h_untagged(host_start), qemu_host_page_size,
-   target_to_host_prot(prot1));
-if (ret != 0) {
-goto error;
+
+if (last < host_last) {
+/* Host page contains more than one guest page: sum the prot. */
+prot1 = target_prot;
+for (abi_ulong a = last; a < host_last; a += TARGET_PAGE_SIZE) {
+prot1 |= page_get_flags(a + 1);
+}
+/* If the resulting sum differs, create a new range. */
+if (prot1 != target_prot) {
+host_last -= qemu_host_page_size;
+starts[nranges] = host_last + 1;
+lens[nranges] = qemu_host_page_size;
+prots[nranges] = prot1;
+nranges++;
+}
 }
-host_start += qemu_host_page_size;
-}
-if (end < host_end) {
-prot1 = target_prot;
-for (addr = end; addr < host_end; addr += TARGET_PAGE_SIZE) {
-prot1 |= page_get_flags(addr);
+
+/* Create a range for the middle, if any remains. */
+if (host_start < host_last) {
+starts[nranges] = host_start;
+lens[nranges] = host_last - host_start + 1;
+prots[nranges] = target_prot;
+nranges++;
 }
-ret = mprotect(g2h_untagged(host_end - qemu_host_page_size),
-   qemu_host_page_size, target_to_host_prot(prot1));
-if (ret != 0) {
-goto error;
-}
-host_end -= qemu_host_page_size;
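
The move from 'end' to 'last' variables can be illustrated with the bounds computation alone. This is a minimal sketch, assuming a 32-bit guest and 64 KiB host pages; the demo_* names are hypothetical, and the inclusive last byte is computed with a bitwise OR rather than the HOST_PAGE_ALIGN(last) - 1 form used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HOST_PAGE_SIZE UINT32_C(0x10000)  /* assumed 64 KiB host pages */

/* Inclusive bounds of the host pages covering a guest range. */
struct demo_span {
    uint32_t first;  /* first byte of the first host page */
    uint32_t last;   /* last byte of the last host page (inclusive) */
};

/* Compute the covering host span; len must be non-zero. */
static struct demo_span demo_cover(uint32_t start, uint32_t len)
{
    uint32_t last = start + len - 1;  /* inclusive, so it never wraps to 0 */
    struct demo_span s;

    s.first = start & ~(DEMO_HOST_PAGE_SIZE - 1);
    s.last = last | (DEMO_HOST_PAGE_SIZE - 1);
    return s;
}

/* Mirrors the "single host page contains all guest pages" test. */
static bool demo_single_host_page(struct demo_span s)
{
    return s.last - s.first < DEMO_HOST_PAGE_SIZE;
}
```

For a range ending at the very top of the address space, an exclusive end would wrap to 0, while the inclusive last byte stays at 0xffffffff and the comparisons remain valid.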
 

[PULL 43/47] linux-user: Simplify target_madvise

2023-07-15 Thread Richard Henderson
The trivial length 0 check can be moved up, simplifying some
of the other cases.  The end < start test is handled by
guest_range_valid_untagged.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-27-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 49cfa873e0..44b53bd446 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -900,28 +900,17 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
 
 abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
 {
-abi_ulong len, end;
+abi_ulong len;
 int ret = 0;
 
 if (start & ~TARGET_PAGE_MASK) {
 return -TARGET_EINVAL;
 }
-len = TARGET_PAGE_ALIGN(len_in);
-
-if (len_in && !len) {
-return -TARGET_EINVAL;
-}
-
-end = start + len;
-if (end < start) {
-return -TARGET_EINVAL;
-}
-
-if (end == start) {
+if (len_in == 0) {
 return 0;
 }
-
-if (!guest_range_valid_untagged(start, len)) {
+len = TARGET_PAGE_ALIGN(len_in);
+if (len == 0 || !guest_range_valid_untagged(start, len)) {
 return -TARGET_EINVAL;
 }
 
-- 
2.34.1
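
The reordered checks can be sketched in isolation. This is a hedged sketch for a 32-bit guest with 4 KiB pages: demo_range_valid() is a hypothetical stand-in for guest_range_valid_untagged(), and the return values 0 and -1 stand in for success and -TARGET_EINVAL.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE UINT32_C(4096)
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Round up to a page boundary; wraps to 0 near the top of the space. */
static uint32_t demo_page_align(uint32_t len)
{
    return (len + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;
}

/* Hypothetical range check: start + len - 1 must not overflow. */
static bool demo_range_valid(uint32_t start, uint32_t len)
{
    return len - 1 <= UINT32_MAX - start;
}

/* Returns 0 when the request is acceptable, -1 when it is invalid. */
static int demo_madvise_check(uint32_t start, uint32_t len_in)
{
    uint32_t len;

    if (start & ~DEMO_PAGE_MASK) {
        return -1;                    /* start is not page aligned */
    }
    if (len_in == 0) {
        return 0;                     /* trivial case handled first */
    }
    len = demo_page_align(len_in);
    /* len == 0 here means the alignment wrapped around. */
    if (len == 0 || !demo_range_valid(start, len)) {
        return -1;
    }
    return 0;
}
```

Handling the zero length first means an aligned length of 0 can only come from wraparound, and the range check subsumes the old end < start test.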




[PULL 42/47] linux-user: Remove can_passthrough_madvise

2023-07-15 Thread Richard Henderson
Use page_check_range instead, which uses the interval tree
instead of checking each page individually.

Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-26-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 24 +++-
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index c0946322fb..49cfa873e0 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -898,23 +898,6 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
 return new_addr;
 }
 
-static bool can_passthrough_madvise(abi_ulong start, abi_ulong end)
-{
-ulong addr;
-
-if ((start | end) & ~qemu_host_page_mask) {
-return false;
-}
-
-for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
-if (!(page_get_flags(addr) & PAGE_PASSTHROUGH)) {
-return false;
-}
-}
-
-return true;
-}
-
 abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
 {
 abi_ulong len, end;
@@ -964,9 +947,8 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
  *
  * A straight passthrough for those may not be safe because qemu sometimes
  * turns private file-backed mappings into anonymous mappings.
- * can_passthrough_madvise() helps to check if a passthrough is possible by
- * comparing mappings that are known to have the same semantics in the host
- * and the guest. In this case passthrough is safe.
+ * If all guest pages have PAGE_PASSTHROUGH set, mappings have the
+ * same semantics for the host as for the guest.
  *
  * We pass through MADV_WIPEONFORK and MADV_KEEPONFORK if possible and
  * return failure if not.
@@ -984,7 +966,7 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
 ret = -EINVAL;
 /* fall through */
 case MADV_DONTNEED:
-if (can_passthrough_madvise(start, end)) {
+if (page_check_range(start, len, PAGE_PASSTHROUGH)) {
 ret = get_errno(madvise(g2h_untagged(start), len, advice));
 if ((advice == MADV_DONTNEED) && (ret == 0)) {
 page_reset_target_data(start, start + len - 1);
-- 
2.34.1




[PULL 22/47] linux-user/strace: Expand struct flags to hold a mask

2023-07-15 Thread Richard Henderson
A zero bit value does not make sense -- it must relate to
some field in some way.

Define FLAG_BASIC with a build-time sanity check.
Adjust FLAG_GENERIC and FLAG_TARGET to use it.
Add FLAG_GENERIC_MASK and FLAG_TARGET_MASK.

Fix up the existing flag definitions for build errors.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-6-richard.hender...@linaro.org>
---
 linux-user/strace.c | 40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 669200c4a4..9228b235da 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -46,15 +46,21 @@ struct syscallname {
  */
 struct flags {
 abi_longf_value;  /* flag */
+abi_longf_mask;   /* mask */
 const char  *f_string; /* stringified flag */
 };
 
+/* No 'struct flags' element should have a zero mask. */
+#define FLAG_BASIC(V, M, N)  { V, M | QEMU_BUILD_BUG_ON_ZERO(!(M)), N }
+
 /* common flags for all architectures */
-#define FLAG_GENERIC(name) { name, #name }
+#define FLAG_GENERIC_MASK(V, M)  FLAG_BASIC(V, M, #V)
+#define FLAG_GENERIC(V)  FLAG_BASIC(V, V, #V)
 /* target specific flags (syscall_defs.h has TARGET_) */
-#define FLAG_TARGET(name)  { TARGET_ ## name, #name }
+#define FLAG_TARGET_MASK(V, M)   FLAG_BASIC(TARGET_##V, TARGET_##M, #V)
+#define FLAG_TARGET(V)   FLAG_BASIC(TARGET_##V, TARGET_##V, #V)
 /* end of flags array */
-#define FLAG_END   { 0, NULL }
+#define FLAG_END   { 0, 0, NULL }
 
 /* Structure used to translate enumerated values into strings */
 struct enums {
@@ -963,7 +969,7 @@ print_syscall_ret_ioctl(CPUArchState *cpu_env, const struct syscallname *name,
 #endif
 
 UNUSED static const struct flags access_flags[] = {
-FLAG_GENERIC(F_OK),
+FLAG_GENERIC_MASK(F_OK, R_OK | W_OK | X_OK),
 FLAG_GENERIC(R_OK),
 FLAG_GENERIC(W_OK),
 FLAG_GENERIC(X_OK),
@@ -999,9 +1005,9 @@ UNUSED static const struct flags mode_flags[] = {
 };
 
 UNUSED static const struct flags open_access_flags[] = {
-FLAG_TARGET(O_RDONLY),
-FLAG_TARGET(O_WRONLY),
-FLAG_TARGET(O_RDWR),
+FLAG_TARGET_MASK(O_RDONLY, O_ACCMODE),
+FLAG_TARGET_MASK(O_WRONLY, O_ACCMODE),
+FLAG_TARGET_MASK(O_RDWR, O_ACCMODE),
 FLAG_END,
 };
 
@@ -1010,7 +1016,9 @@ UNUSED static const struct flags open_flags[] = {
 FLAG_TARGET(O_CREAT),
 FLAG_TARGET(O_DIRECTORY),
 FLAG_TARGET(O_EXCL),
+#if TARGET_O_LARGEFILE != 0
 FLAG_TARGET(O_LARGEFILE),
+#endif
 FLAG_TARGET(O_NOCTTY),
 FLAG_TARGET(O_NOFOLLOW),
 FLAG_TARGET(O_NONBLOCK),  /* also O_NDELAY */
@@ -1075,7 +1083,7 @@ UNUSED static const struct flags umount2_flags[] = {
 };
 
 UNUSED static const struct flags mmap_prot_flags[] = {
-FLAG_GENERIC(PROT_NONE),
+FLAG_GENERIC_MASK(PROT_NONE, PROT_READ | PROT_WRITE | PROT_EXEC),
 FLAG_GENERIC(PROT_EXEC),
 FLAG_GENERIC(PROT_READ),
 FLAG_GENERIC(PROT_WRITE),
@@ -1103,7 +1111,7 @@ UNUSED static const struct flags mmap_flags[] = {
 #ifdef MAP_POPULATE
 FLAG_TARGET(MAP_POPULATE),
 #endif
-#ifdef TARGET_MAP_UNINITIALIZED
+#if defined(TARGET_MAP_UNINITIALIZED) && TARGET_MAP_UNINITIALIZED != 0
 FLAG_TARGET(MAP_UNINITIALIZED),
 #endif
 FLAG_TARGET(MAP_HUGETLB),
@@ -1201,13 +1209,13 @@ UNUSED static const struct flags statx_flags[] = {
 FLAG_GENERIC(AT_SYMLINK_NOFOLLOW),
 #endif
 #ifdef AT_STATX_SYNC_AS_STAT
-FLAG_GENERIC(AT_STATX_SYNC_AS_STAT),
+FLAG_GENERIC_MASK(AT_STATX_SYNC_AS_STAT, AT_STATX_SYNC_TYPE),
 #endif
 #ifdef AT_STATX_FORCE_SYNC
-FLAG_GENERIC(AT_STATX_FORCE_SYNC),
+FLAG_GENERIC_MASK(AT_STATX_FORCE_SYNC, AT_STATX_SYNC_TYPE),
 #endif
 #ifdef AT_STATX_DONT_SYNC
-FLAG_GENERIC(AT_STATX_DONT_SYNC),
+FLAG_GENERIC_MASK(AT_STATX_DONT_SYNC, AT_STATX_SYNC_TYPE),
 #endif
 FLAG_END,
 };
@@ -1481,14 +1489,10 @@ print_flags(const struct flags *f, abi_long flags, int last)
 const char *sep = "";
 int n;
 
-if ((flags == 0) && (f->f_value == 0)) {
-qemu_log("%s%s", f->f_string, get_comma(last));
-return;
-}
 for (n = 0; f->f_string != NULL; f++) {
-if ((f->f_value != 0) && ((flags & f->f_value) == f->f_value)) {
+if ((flags & f->f_mask) == f->f_value) {
 qemu_log("%s%s", sep, f->f_string);
-flags &= ~f->f_value;
+flags &= ~f->f_mask;
 sep = "|";
 n++;
 }
-- 
2.34.1
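
The value-plus-mask matching that print_flags adopts in this patch can be sketched outside QEMU. This is a minimal sketch with hypothetical names: struct flag_desc, the X_* constants and format_flags() are illustrative, the access-mode values are assumed to follow the common Linux encoding, and the access modes and O_CREAT are merged into one table for brevity although the patch keeps them in separate tables.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for 'struct flags' after the patch. */
struct flag_desc {
    long value;        /* expected value of the masked field */
    long mask;         /* bits forming the field; never zero */
    const char *name;  /* NULL terminates the table */
};

/* Assumed encoding: the access mode is a 2-bit field, not single bits. */
enum { X_RDONLY = 0, X_WRONLY = 1, X_RDWR = 2, X_ACCMODE = 3, X_CREAT = 0x40 };

static const struct flag_desc open_flags_demo[] = {
    { X_RDONLY, X_ACCMODE, "O_RDONLY" },
    { X_WRONLY, X_ACCMODE, "O_WRONLY" },
    { X_RDWR,   X_ACCMODE, "O_RDWR"   },
    { X_CREAT,  X_CREAT,   "O_CREAT"  },
    { 0, 0, NULL },
};

/* Write "A|B|..." into buf; return the bits that no entry consumed. */
static long format_flags(const struct flag_desc *f, long flags,
                         char *buf, size_t len)
{
    const char *sep = "";
    size_t used = 0;

    buf[0] = '\0';
    for (; f->name != NULL; f++) {
        /* Compare the whole field, so a zero value can match too. */
        if ((flags & f->mask) == f->value) {
            int n = snprintf(buf + used, len - used, "%s%s", sep, f->name);
            if (n < 0 || (size_t)n >= len - used) {
                break;                /* output truncated: stop early */
            }
            used += (size_t)n;
            flags &= ~f->mask;        /* consume the whole field */
            sep = "|";
        }
    }
    return flags;
}
```

Because each entry carries its own mask, a zero-valued flag such as O_RDONLY no longer needs the special case that the patch removes from print_flags.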




[PULL 03/47] linux-user: Use abi_uint not uint32_t in syscall_defs.h

2023-07-15 Thread Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 108 +++---
 1 file changed, 54 insertions(+), 54 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index a4e4df8d3e..414d88a9ec 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -67,7 +67,7 @@
 #define USE_UID16
 #define target_id uint16_t
 #else
-#define target_id uint32_t
+#define target_id abi_uint
 #endif
 
 #if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SH4)  \
@@ -215,9 +215,9 @@ struct target_ip_mreqn {
 
 struct target_ip_mreq_source {
 /* big endian */
-uint32_t imr_multiaddr;
-uint32_t imr_interface;
-uint32_t imr_sourceaddr;
+abi_uint imr_multiaddr;
+abi_uint imr_interface;
+abi_uint imr_sourceaddr;
 };
 
 struct target_linger {
@@ -508,9 +508,9 @@ typedef abi_ulong target_old_sa_flags;
 
 #if defined(TARGET_MIPS)
 struct target_sigaction {
-uint32_tsa_flags;
+abi_uintsa_flags;
 #if defined(TARGET_ABI_MIPSN32)
-uint32_t_sa_handler;
+abi_uint_sa_handler;
 #else
 abi_ulong   _sa_handler;
 #endif
@@ -1620,19 +1620,19 @@ struct target_stat {
 struct QEMU_PACKED target_stat64 {
 uint64_t st_dev;
 #define TARGET_STAT64_HAS_BROKEN_ST_INO 1
-uint32_t pad0;
-uint32_t __st_ino;
+abi_uint pad0;
+abi_uint __st_ino;
 
-uint32_t st_mode;
-uint32_t st_nlink;
-uint32_t st_uid;
-uint32_t st_gid;
+abi_uint st_mode;
+abi_uint st_nlink;
+abi_uint st_uid;
+abi_uint st_gid;
 uint64_t st_rdev;
 uint64_t __pad1;
 
 int64_t  st_size;
 int32_t  st_blksize;
-uint32_t __pad2;
+abi_uint __pad2;
 int64_t st_blocks;  /* Number 512-byte blocks allocated. */
 
 inttarget_st_atime;
@@ -2227,19 +2227,19 @@ struct target_statfs {
 #endif
 
 struct target_statfs64 {
-uint32_tf_type;
-uint32_tf_bsize;
-uint32_tf_frsize;   /* Fragment size - unsupported */
-uint32_t__pad;
+abi_uintf_type;
+abi_uintf_bsize;
+abi_uintf_frsize;   /* Fragment size - unsupported */
+abi_uint__pad;
 uint64_tf_blocks;
 uint64_tf_bfree;
 uint64_tf_files;
 uint64_tf_ffree;
 uint64_tf_bavail;
 target_fsid_t   f_fsid;
-uint32_tf_namelen;
-uint32_tf_flags;
-uint32_tf_spare[5];
+abi_uintf_namelen;
+abi_uintf_flags;
+abi_uintf_spare[5];
 };
 #elif (defined(TARGET_PPC64) || defined(TARGET_X86_64) ||   \
defined(TARGET_SPARC64) || defined(TARGET_AARCH64) ||\
@@ -2307,33 +2307,33 @@ struct target_statfs64 {
 };
 #else
 struct target_statfs {
-uint32_t f_type;
-uint32_t f_bsize;
-uint32_t f_blocks;
-uint32_t f_bfree;
-uint32_t f_bavail;
-uint32_t f_files;
-uint32_t f_ffree;
+abi_uint f_type;
+abi_uint f_bsize;
+abi_uint f_blocks;
+abi_uint f_bfree;
+abi_uint f_bavail;
+abi_uint f_files;
+abi_uint f_ffree;
 target_fsid_t f_fsid;
-uint32_t f_namelen;
-uint32_t f_frsize;
-uint32_t f_flags;
-uint32_t f_spare[4];
+abi_uint f_namelen;
+abi_uint f_frsize;
+abi_uint f_flags;
+abi_uint f_spare[4];
 };
 
 struct target_statfs64 {
-uint32_t f_type;
-uint32_t f_bsize;
+abi_uint f_type;
+abi_uint f_bsize;
 uint64_t f_blocks;
 uint64_t f_bfree;
 uint64_t f_bavail;
 uint64_t f_files;
 uint64_t f_ffree;
 target_fsid_t f_fsid;
-uint32_t f_namelen;
-uint32_t f_frsize;
-uint32_t f_flags;
-uint32_t f_spare[4];
+abi_uint f_namelen;
+abi_uint f_frsize;
+abi_uint f_flags;
+abi_uint f_spare[4];
 };
 #endif
 
@@ -2713,9 +2713,9 @@ struct target_epoll_event {
 #endif
 
 struct target_ucred {
-uint32_t pid;
-uint32_t uid;
-uint32_t gid;
+abi_uint pid;
+abi_uint uid;
+abi_uint gid;
 };
 
 typedef int32_t target_timer_t;
@@ -2754,14 +2754,14 @@ struct target_sigevent {
 };
 
 struct target_user_cap_header {
-uint32_t version;
+abi_uint version;
 int pid;
 };
 
 struct target_user_cap_data {
-uint32_t effective;
-uint32_t permitted;
-uint32_t inheritable;
+abi_uint effective;
+abi_uint permitted;
+abi_uint inheritable;
 };
 
 /* from kernel's include/linux/syslog.h */
@@ -2791,19 +2791,19 @@ struct target_user_cap_data {
 
 struct target_statx_timestamp {
 int64_t tv_sec;
-uint32_t tv_nsec;
+abi_uint tv_nsec;
 int32_t __reserved;
 };
 
 struct target_statx {
 /* 0x00 */
-uint32_t stx_mask;   /* What results were written [uncond] */
-uint32_t stx_blksize;/* Preferred general I/O size [uncond] */
+abi_uint stx_mask;   /* What results were written [uncond] */
+abi_uint stx_blksize;/* Preferre

[PULL 29/47] linux-user: Split out target_to_host_prot

2023-07-15 Thread Richard Henderson
Split out from validate_prot_to_pageflags, as there is not
one single host_prot for the entire range.  We need to adjust
prot for every host page that overlaps multiple guest pages.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
Message-Id: <20230707204054.8792-13-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 78 ++-
 1 file changed, 44 insertions(+), 34 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 9dc34fc29d..12b1308a83 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -69,24 +69,11 @@ void mmap_fork_end(int child)
  * Return 0 if the target prot bitmask is invalid, otherwise
  * the internal qemu page_flags (which will include PAGE_VALID).
  */
-static int validate_prot_to_pageflags(int *host_prot, int prot)
+static int validate_prot_to_pageflags(int prot)
 {
 int valid = PROT_READ | PROT_WRITE | PROT_EXEC | TARGET_PROT_SEM;
 int page_flags = (prot & PAGE_BITS) | PAGE_VALID;
 
-/*
- * For the host, we need not pass anything except read/write/exec.
- * While PROT_SEM is allowed by all hosts, it is also ignored, so
- * don't bother transforming guest bit to host bit.  Any other
- * target-specific prot bits will not be understood by the host
- * and will need to be encoded into page_flags for qemu emulation.
- *
- * Pages that are executable by the guest will never be executed
- * by the host, but the host will need to be able to read them.
- */
-*host_prot = (prot & (PROT_READ | PROT_WRITE))
-   | (prot & PROT_EXEC ? PROT_READ : 0);
-
 #ifdef TARGET_AARCH64
 {
 ARMCPU *cpu = ARM_CPU(thread_cpu);
@@ -114,18 +101,34 @@ static int validate_prot_to_pageflags(int *host_prot, int prot)
 return prot & ~valid ? 0 : page_flags;
 }
 
+/*
+ * For the host, we need not pass anything except read/write/exec.
+ * While PROT_SEM is allowed by all hosts, it is also ignored, so
+ * don't bother transforming guest bit to host bit.  Any other
+ * target-specific prot bits will not be understood by the host
+ * and will need to be encoded into page_flags for qemu emulation.
+ *
+ * Pages that are executable by the guest will never be executed
+ * by the host, but the host will need to be able to read them.
+ */
+static int target_to_host_prot(int prot)
+{
+return (prot & (PROT_READ | PROT_WRITE)) |
+   (prot & PROT_EXEC ? PROT_READ : 0);
+}
+
 /* NOTE: all the constants are the HOST ones, but addresses are target. */
 int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 {
 abi_ulong end, host_start, host_end, addr;
-int prot1, ret, page_flags, host_prot;
+int prot1, ret, page_flags;
 
 trace_target_mprotect(start, len, target_prot);
 
 if ((start & ~TARGET_PAGE_MASK) != 0) {
 return -TARGET_EINVAL;
 }
-page_flags = validate_prot_to_pageflags(&host_prot, target_prot);
+page_flags = validate_prot_to_pageflags(target_prot);
 if (!page_flags) {
 return -TARGET_EINVAL;
 }
@@ -143,7 +146,7 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 host_end = HOST_PAGE_ALIGN(end);
 if (start > host_start) {
 /* handle host page containing start */
-prot1 = host_prot;
+prot1 = target_prot;
 for (addr = host_start; addr < start; addr += TARGET_PAGE_SIZE) {
 prot1 |= page_get_flags(addr);
 }
@@ -154,19 +157,19 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 end = host_end;
 }
 ret = mprotect(g2h_untagged(host_start), qemu_host_page_size,
-   prot1 & PAGE_BITS);
+   target_to_host_prot(prot1));
 if (ret != 0) {
 goto error;
 }
 host_start += qemu_host_page_size;
 }
 if (end < host_end) {
-prot1 = host_prot;
+prot1 = target_prot;
 for (addr = end; addr < host_end; addr += TARGET_PAGE_SIZE) {
 prot1 |= page_get_flags(addr);
 }
 ret = mprotect(g2h_untagged(host_end - qemu_host_page_size),
-   qemu_host_page_size, prot1 & PAGE_BITS);
+   qemu_host_page_size, target_to_host_prot(prot1));
 if (ret != 0) {
 goto error;
 }
@@ -175,8 +178,8 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot)
 
 /* handle the pages in the middle */
 if (host_start < host_end) {
-ret = mprotect(g2h_untagged(host_start),
-   host_end - host_start, host_prot);
+ret = mprotect(g2h_untagged(host_start), host_end - host_start,
+   target_to_host_prot(target_prot));
 if (ret != 0) {
 goto error;
 }
@@ -212,7 +215,8 @@ static int mmap_frag(abi_ulong real_start,
 
 if (prot1 == 0) {
 /* no page was there, so we allocate one */
-void *p = mmap(host_start,

[PULL 07/47] linux-user: Use abi_uint not unsigned int in syscall_defs.h

2023-07-15 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 linux-user/syscall_defs.h | 290 +++---
 1 file changed, 145 insertions(+), 145 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 2846a8cfa5..20986bd1d3 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -366,7 +366,7 @@ struct target_msghdr {
 abi_long msg_iovlen; /* Number of blocks*/
 abi_long msg_control;/* Per protocol magic (eg BSD file descriptor passing) */
 abi_long msg_controllen; /* Length of cmsg list */
-unsigned int msg_flags;
+abi_uint msg_flags;
 };
 
 struct target_cmsghdr {
@@ -403,7 +403,7 @@ __target_cmsg_nxthdr(struct target_msghdr *__mhdr,
 
 struct target_mmsghdr {
 struct target_msghdr msg_hdr;  /* Message header */
-unsigned int msg_len;  /* Number of bytes transmitted */
+abi_uint msg_len;  /* Number of bytes transmitted */
 };
 
 struct  target_rusage {
@@ -595,8 +595,8 @@ typedef struct target_siginfo {
 
 /* POSIX.1b timers */
 struct {
-unsigned int _timer1;
-unsigned int _timer2;
+abi_uint _timer1;
+abi_uint _timer2;
 } _timer;
 
 /* POSIX.1b signals */
@@ -857,10 +857,10 @@ struct target_rtc_pll_info {
 #define TARGET_TUNSETOWNERTARGET_IOW('T', 204, int)
 #define TARGET_TUNSETLINK TARGET_IOW('T', 205, int)
 #define TARGET_TUNSETGROUPTARGET_IOW('T', 206, int)
-#define TARGET_TUNGETFEATURES TARGET_IOR('T', 207, unsigned int)
-#define TARGET_TUNSETOFFLOAD  TARGET_IOW('T', 208, unsigned int)
-#define TARGET_TUNSETTXFILTER TARGET_IOW('T', 209, unsigned int)
-#define TARGET_TUNGETIFF  TARGET_IOR('T', 210, unsigned int)
+#define TARGET_TUNGETFEATURES TARGET_IOR('T', 207, abi_uint)
+#define TARGET_TUNSETOFFLOAD  TARGET_IOW('T', 208, abi_uint)
+#define TARGET_TUNSETTXFILTER TARGET_IOW('T', 209, abi_uint)
+#define TARGET_TUNGETIFF  TARGET_IOR('T', 210, abi_uint)
 #define TARGET_TUNGETSNDBUF   TARGET_IOR('T', 211, int)
 #define TARGET_TUNSETSNDBUF   TARGET_IOW('T', 212, int)
 /*
@@ -870,7 +870,7 @@ struct target_rtc_pll_info {
 #define TARGET_TUNGETVNETHDRSZTARGET_IOR('T', 215, int)
 #define TARGET_TUNSETVNETHDRSZTARGET_IOW('T', 216, int)
 #define TARGET_TUNSETQUEUETARGET_IOW('T', 217, int)
-#define TARGET_TUNSETIFINDEX  TARGET_IOW('T', 218, unsigned int)
+#define TARGET_TUNSETIFINDEX  TARGET_IOW('T', 218, abi_uint)
 /* TUNGETFILTER is not supported: see TUNATTACHFILTER. */
 #define TARGET_TUNSETVNETLE   TARGET_IOW('T', 220, int)
 #define TARGET_TUNGETVNETLE   TARGET_IOR('T', 221, int)
@@ -1361,8 +1361,8 @@ struct target_stat64 {
 #define TARGET_STAT64_HAS_BROKEN_ST_INO 1
 abi_ulong   __st_ino;
 
-unsigned intst_mode;
-unsigned intst_nlink;
+abi_uintst_mode;
+abi_uintst_nlink;
 
 abi_ulong   st_uid;
 abi_ulong   st_gid;
@@ -1392,20 +1392,20 @@ struct target_stat64 {
 #define TARGET_HAS_STRUCT_STAT64
 struct target_eabi_stat64 {
 unsigned long long st_dev;
-unsigned int__pad1;
+abi_uint __pad1;
 abi_ulong__st_ino;
-unsigned intst_mode;
-unsigned intst_nlink;
+abi_uint st_mode;
+abi_uint st_nlink;
 
 abi_ulongst_uid;
 abi_ulongst_gid;
 
 unsigned long long st_rdev;
-unsigned int__pad2[2];
+abi_uint __pad2[2];
 
 long long   st_size;
 abi_ulongst_blksize;
-unsigned int__pad3;
+abi_uint __pad3;
 unsigned long long st_blocks;
 
 abi_ulongtarget_st_atime;
@@ -1423,13 +1423,13 @@ struct target_eabi_stat64 {
 
 #elif defined(TARGET_SPARC64) && !defined(TARGET_ABI32)
 struct target_stat {
-unsigned intst_dev;
+abi_uintst_dev;
 abi_ulong   st_ino;
-unsigned intst_mode;
-unsigned intst_nlink;
-unsigned intst_uid;
-unsigned intst_gid;
-unsigned intst_rdev;
+abi_uintst_mode;
+abi_uintst_nlink;
+abi_uintst_uid;
+abi_uintst_gid;
+abi_uintst_rdev;
 abi_longst_size;
 abi_longtarget_st_atime;
 abi_longtarget_st_mtime;
@@ -1447,10 +1447,10 @@ struct target_stat64 {
 abi_ullong  st_ino;
 abi_ullong  st_nlink;
 
-unsigned intst_mode;
+abi_uintst_mode;
 
-unsigned intst_uid;
-unsigned intst_gid;
+abi_uintst_uid;
+abi_uintst_gid;
 
 unsigned char   __pad2[6];
 unsigned short  st_rdev;
@@ -1459,7 +1459,7 @@ struct target_stat64 {
 abi_llong   st_blksize;
 
 unsigned char   __pad4[4];
-unsigned intst_blocks;
+abi_uintst_blocks;
 
 abi_ulong   target_st_atime;
 abi_ulong   target_st_atime

[PULL 36/47] linux-user: Use 'last' instead of 'end' in target_mmap

2023-07-15 Thread Richard Henderson
Complete the transition within the mmap functions to a formulation
that does not overflow at the end of the address space.
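
The overflow in question can be shown in isolation. The following is a minimal sketch, not code from this patch: it uses uint32_t as a stand-in for abi_ulong on a 32-bit guest, and the helper names range_end, range_last and range_wraps are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Exclusive end: one past the final byte.  For a range that touches the
 * top of a 32-bit address space, start + len wraps around to 0.
 */
static uint32_t range_end(uint32_t start, uint32_t len)
{
    return (uint32_t)(start + len);
}

/*
 * Inclusive last: the final byte of the range.  This stays representable
 * even when the range ends at the very top of the address space.
 * The length must be non-zero.
 */
static uint32_t range_last(uint32_t start, uint32_t len)
{
    assert(len != 0);
    return (uint32_t)(start + len - 1);
}

/* With an inclusive bound, a range has wrapped exactly when last < start. */
static bool range_wraps(uint32_t start, uint32_t len)
{
    return range_last(start, len) < start;
}
```

For the last page of the address space (start 0xfffff000, len 0x1000), range_end() yields 0 while range_last() yields 0xffffffff, so comparisons against the inclusive bound keep working where comparisons against the exclusive one do not.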

Signed-off-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230707204054.8792-20-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 45 +++--
 1 file changed, 23 insertions(+), 22 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 738b9b797d..bb9cbe52cd 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -456,8 +456,8 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align)
 abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
  int flags, int fd, off_t offset)
 {
-abi_ulong ret, end, real_start, real_end, retaddr, host_len,
-  passthrough_start = -1, passthrough_end = -1;
+abi_ulong ret, last, real_start, real_last, retaddr, host_len;
+abi_ulong passthrough_start = -1, passthrough_last = 0;
 int page_flags;
 off_t host_offset;
 
@@ -581,29 +581,30 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 host_start += offset - host_offset;
 }
 start = h2g(host_start);
+last = start + len - 1;
 passthrough_start = start;
-passthrough_end = start + len;
+passthrough_last = last;
 } else {
 if (start & ~TARGET_PAGE_MASK) {
 errno = EINVAL;
 goto fail;
 }
-end = start + len;
-real_end = HOST_PAGE_ALIGN(end);
+last = start + len - 1;
+real_last = HOST_PAGE_ALIGN(last) - 1;
 
 /*
  * Test if requested memory area fits target address space
  * It can fail only on 64-bit host with 32-bit target.
  * On any other target/host host mmap() handles this error correctly.
  */
-if (end < start || !guest_range_valid_untagged(start, len)) {
+if (last < start || !guest_range_valid_untagged(start, len)) {
 errno = ENOMEM;
 goto fail;
 }
 
 /* Validate that the chosen range is empty. */
 if ((flags & MAP_FIXED_NOREPLACE)
-&& !page_check_range_empty(start, end - 1)) {
+&& !page_check_range_empty(start, last)) {
 errno = EEXIST;
 goto fail;
 }
@@ -642,9 +643,9 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 
 /* handle the start of the mapping */
 if (start > real_start) {
-if (real_end == real_start + qemu_host_page_size) {
+if (real_last == real_start + qemu_host_page_size - 1) {
 /* one single host page */
-if (!mmap_frag(real_start, start, end - 1,
+if (!mmap_frag(real_start, start, last,
target_prot, flags, fd, offset)) {
 goto fail;
 }
@@ -658,18 +659,18 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 real_start += qemu_host_page_size;
 }
 /* handle the end of the mapping */
-if (end < real_end) {
-if (!mmap_frag(real_end - qemu_host_page_size,
-   real_end - qemu_host_page_size, end - 1,
+if (last < real_last) {
+abi_ulong real_page = real_last - qemu_host_page_size + 1;
+if (!mmap_frag(real_page, real_page, last,
target_prot, flags, fd,
-   offset + real_end - qemu_host_page_size - start)) {
+   offset + real_page - start)) {
 goto fail;
 }
-real_end -= qemu_host_page_size;
+real_last -= qemu_host_page_size;
 }
 
 /* map the middle (easier) */
-if (real_start < real_end) {
+if (real_start < real_last) {
 void *p;
 off_t offset1;
 
@@ -678,13 +679,13 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 } else {
 offset1 = offset + real_start - start;
 }
-p = mmap(g2h_untagged(real_start), real_end - real_start,
+p = mmap(g2h_untagged(real_start), real_last - real_start + 1,
  target_to_host_prot(target_prot), flags, fd, offset1);
 if (p == MAP_FAILED) {
 goto fail;
 }
 passthrough_start = real_start;
-passthrough_end = real_end;
+passthrough_last = real_last;
 }
 }
  the_end1:
@@ -692,16 +693,16 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
 page_flags |= PAGE_ANON;
 }
 page_flags |= PAGE_RESET;
-if (passthrough_start == passthrough_end) {
-page_set_flags(start, start + len - 1, page_flags);
+if (passthrough_start > passthrough_last) {
+page_set_flags(start, last, 

[PATCH v5 1/5] i386/tcg: implement x2APIC registers MSR access

2023-07-15 Thread Bui Quang Minh
This commit refactors apic_mem_read/write to support both MMIO access in
xAPIC and MSR access in x2APIC.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/intc/apic.c   | 79 ++--
 hw/intc/trace-events |  4 +-
 include/hw/i386/apic.h   |  3 ++
 target/i386/cpu.h|  3 ++
 target/i386/tcg/sysemu/misc_helper.c | 27 ++
 5 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index ac3d47d231..cb8c20de93 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -288,6 +288,13 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
 apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+bool is_x2apic_mode(DeviceState *dev)
+{
+APICCommonState *s = APIC(dev);
+
+return s->apicbase & MSR_IA32_APICBASE_EXTD;
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
 s->apicbase = (val & 0xf000) |
@@ -636,16 +643,11 @@ static void apic_timer(void *opaque)
 apic_timer_update(s, s->next_time);
 }
 
-static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+uint64_t apic_register_read(int index)
 {
 DeviceState *dev;
 APICCommonState *s;
-uint32_t val;
-int index;
-
-if (size < 4) {
-return 0;
-}
+uint64_t val;
 
 dev = cpu_get_current_apic();
 if (!dev) {
@@ -653,7 +655,6 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
 }
 s = APIC(dev);
 
-index = (addr >> 4) & 0xff;
 switch(index) {
 case 0x02: /* id */
 val = s->id << 24;
@@ -720,7 +721,23 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
 val = 0;
 break;
 }
-trace_apic_mem_readl(addr, val);
+
+trace_apic_register_read(index, val);
+return val;
+}
+
+static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+uint32_t val;
+int index;
+
+if (size < 4) {
+return 0;
+}
+
+index = (addr >> 4) & 0xff;
+val = (uint32_t)apic_register_read(index);
+
 return val;
 }
 
@@ -737,27 +754,10 @@ static void apic_send_msi(MSIMessage *msi)
 apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
 
-static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
-   unsigned size)
+void apic_register_write(int index, uint64_t val)
 {
 DeviceState *dev;
 APICCommonState *s;
-int index = (addr >> 4) & 0xff;
-
-if (size < 4) {
-return;
-}
-
-if (addr > 0xfff || !index) {
-/* MSI and MMIO APIC are at the same memory location,
- * but actually not on the global bus: MSI is on PCI bus
- * APIC is connected directly to the CPU.
- * Mapping them on the global bus happens to work because
- * MSI registers are reserved in APIC MMIO and vice versa. */
-MSIMessage msi = { .address = addr, .data = val };
-apic_send_msi(&msi);
-return;
-}
 
 dev = cpu_get_current_apic();
 if (!dev) {
@@ -765,7 +765,7 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 }
 s = APIC(dev);
 
-trace_apic_mem_writel(addr, val);
+trace_apic_register_write(index, val);
 
 switch(index) {
 case 0x02:
@@ -843,6 +843,29 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 }
 }
 
+static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
+   unsigned size)
+{
+int index = (addr >> 4) & 0xff;
+
+if (size < 4) {
+return;
+}
+
+if (addr > 0xfff || !index) {
+/* MSI and MMIO APIC are at the same memory location,
+ * but actually not on the global bus: MSI is on PCI bus
+ * APIC is connected directly to the CPU.
+ * Mapping them on the global bus happens to work because
+ * MSI registers are reserved in APIC MMIO and vice versa. */
+MSIMessage msi = { .address = addr, .data = val };
+apic_send_msi(&msi);
+return;
+}
+
+apic_register_write(index, val);
+}
+
 static void apic_pre_save(APICCommonState *s)
 {
 apic_sync_vapic(s, SYNC_FROM_VAPIC);
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 36ff71f947..1ef29d0256 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -14,8 +14,8 @@ cpu_get_apic_base(uint64_t val) "0x%016"PRIx64
 # apic.c
 apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d"
 apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d"
-apic_mem_readl(uint64_t addr, uint32_t val)  "0x%"PRIx64" = 0x%08x"
-apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x"
+apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x

[PATCH v5 3/5] apic, i386/tcg: add x2apic transitions

2023-07-15 Thread Bui Quang Minh
This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.
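
The transition rules enforced on a write to MSR_IA32_APICBASE can be summarized as a pure predicate. This is a minimal sketch mirroring the four checks in apic_set_base_check below, not the code from the patch: the SKETCH_ constants and the function name are hypothetical, and where the patch raises #GP this sketch simply returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_APICBASE_EXTD   (1u << 10)  /* x2APIC mode enable */
#define SKETCH_APICBASE_ENABLE (1u << 11)  /* APIC global enable */

/* Return true if writing 'val' over 'cur' is an allowed mode transition. */
static bool sketch_apicbase_write_ok(uint64_t cur, uint64_t val,
                                     bool cpu_has_x2apic)
{
    bool cur_en = cur & SKETCH_APICBASE_ENABLE;
    bool cur_extd = cur & SKETCH_APICBASE_EXTD;
    bool val_en = val & SKETCH_APICBASE_ENABLE;
    bool val_extd = val & SKETCH_APICBASE_EXTD;

    /* The current state is assumed never to be EXTD without ENABLE. */
    assert(cur_en || !cur_extd);

    if (val_extd && !cpu_has_x2apic) {
        return false;   /* x2APIC requested but not supported by the CPU */
    }
    if (val_extd && !val_en) {
        return false;   /* EXTD without ENABLE is an invalid state */
    }
    if (!cur_en && !cur_extd && val_en && val_extd) {
        return false;   /* disabled -> x2APIC must go through xAPIC */
    }
    if (cur_en && cur_extd && val_en && !val_extd) {
        return false;   /* x2APIC -> xAPIC must go through disabled */
    }
    return true;
}
```

Under these rules, xAPIC to x2APIC and x2APIC to disabled are accepted, while disabled to x2APIC and x2APIC to xAPIC are rejected.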

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/intc/apic.c   | 50 
 hw/intc/apic_common.c|  7 ++--
 target/i386/cpu-sysemu.c | 10 ++
 target/i386/cpu.c|  8 ++---
 target/i386/cpu.h|  6 
 target/i386/tcg/sysemu/misc_helper.c |  4 +++
 6 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 9f741794a7..b8f56836a6 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -309,8 +309,41 @@ bool is_x2apic_mode(DeviceState *dev)
 return s->apicbase & MSR_IA32_APICBASE_EXTD;
 }
 
+static void apic_set_base_check(APICCommonState *s, uint64_t val)
+{
+/* Enable x2apic when x2apic is not supported by CPU */
+if (!cpu_has_x2apic_feature(&s->cpu->env) &&
+val & MSR_IA32_APICBASE_EXTD)
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+/*
+ * Transition into invalid state
+ * (s->apicbase & MSR_IA32_APICBASE_ENABLE == 0) &&
+ * (s->apicbase & MSR_IA32_APICBASE_EXTD) == 1
+ */
+if (!(val & MSR_IA32_APICBASE_ENABLE) &&
+(val & MSR_IA32_APICBASE_EXTD))
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+/* Invalid transition from disabled mode to x2APIC */
+if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+(val & MSR_IA32_APICBASE_ENABLE) &&
+(val & MSR_IA32_APICBASE_EXTD))
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+/* Invalid transition from x2APIC to xAPIC */
+if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+(val & MSR_IA32_APICBASE_ENABLE) &&
+!(val & MSR_IA32_APICBASE_EXTD))
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
+apic_set_base_check(s, val);
+
 s->apicbase = (val & 0xf000) |
 (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
 /* if disabled, cannot be enabled again */
@@ -319,6 +352,23 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 cpu_clear_apic_feature(&s->cpu->env);
 s->spurious_vec &= ~APIC_SV_ENABLE;
 }
+
+/* Transition from disabled mode to xAPIC */
+if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+(val & MSR_IA32_APICBASE_ENABLE)) {
+s->apicbase |= MSR_IA32_APICBASE_ENABLE;
+cpu_set_apic_feature(&s->cpu->env);
+}
+
+/* Transition from xAPIC to x2APIC */
+if (cpu_has_x2apic_feature(&s->cpu->env) &&
+!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+(val & MSR_IA32_APICBASE_EXTD)) {
+s->apicbase |= MSR_IA32_APICBASE_EXTD;
+
+s->log_dest = ((s->initial_apic_id & 0x0) << 16) |
+  (1 << (s->initial_apic_id & 0xf));
+}
 }
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index d95914066e..396f828be8 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -43,11 +43,8 @@ void cpu_set_apic_base(DeviceState *dev, uint64_t val)
 if (dev) {
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
-/* switching to x2APIC, reset possibly modified xAPIC ID */
-if (!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
-(val & MSR_IA32_APICBASE_EXTD)) {
-s->id = s->initial_apic_id;
-}
+/* Reset possibly modified xAPIC ID */
+s->id = s->initial_apic_id;
 info->set_base(s, val);
 }
 }
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index a9ff10c517..f6bbe33372 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -235,6 +235,16 @@ void cpu_clear_apic_feature(CPUX86State *env)
 env->features[FEAT_1_EDX] &= ~CPUID_APIC;
 }
 
+void cpu_set_apic_feature(CPUX86State *env)
+{
+env->features[FEAT_1_EDX] |= CPUID_APIC;
+}
+
+bool cpu_has_x2apic_feature(CPUX86State *env)
+{
+return env->features[FEAT_1_ECX] & CPUID_EXT_X2APIC;
+}
+
 bool cpu_is_bsp(X86CPU *cpu)
 {
 return cpu_get_apic_base(cpu->apic_state) & MSR_IA32_APICBASE_BSP;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 97ad229d8b..240a1f9737 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -630,8 +630,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
  * in CPL=3; remove them if they are ever implemented for system emulation.
  */
 #if defined CONFIG_USER_ONLY
-#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CPUID_EXT_TSC_DEADLINE_TIMER | \
- CPUID_EXT_X2APIC)
+#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | 
CP

[PATCH v5 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC

2023-07-15 Thread Bui Quang Minh
As userspace APIC now supports x2APIC, intel interrupt remapping
hardware can be set to EIM mode when userspace local APIC is used.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/i386/intel_iommu.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dcc334060c..5e576f6059 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4043,17 +4043,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
   && x86_iommu_ir_supported(x86_iommu) ?
   ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
 }
-if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
-if (!kvm_irqchip_is_split()) {
-error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
-return false;
-}
-if (!kvm_enable_x2apic()) {
-error_setg(errp, "eim=on requires support on the KVM side"
- "(X2APIC_API, first shipped in v4.7)");
-return false;
-}
-}
 
 /* Currently only address widths supported are 39 and 48 bits */
 if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-- 
2.25.1




[PATCH v5 0/5] Support x2APIC mode with TCG accelerator

2023-07-15 Thread Bui Quang Minh
Hi everyone,

This series implements x2APIC mode in userspace local APIC and the
RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
series, we can now boot the Linux kernel into x2APIC mode with the TCG
accelerator using either the Intel or the AMD IOMMU.

I tested by booting a self-built Linux 6.3.0-rc2: the kernel boots
successfully with x2APIC enabled and can enumerate the CPU with APIC ID 257.

Using Intel IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device intel-iommu,intremap=on,eim=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

Using AMD IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device amd-iommu,intremap=on,xtsup=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

To test the emulated userspace APIC with kvm-unit-tests, disable the test
device with this patch:

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)
 
if ((str = getenv("TEST_DEVICE")))
no_test_device = !atol(str);
+   no_test_device = true;
 
if ((str = getenv("MEMLIMIT")))
fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
./run_tests.sh -v -g apic 

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split
FAIL apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp 1 -cpu qemu64 -machine kernel_irqchip=split
PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline
FAIL x2apic (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off
FAIL xapic (43 tests, 6 unexpected failures, 2 skipped)

  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0
  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0 
  FAIL: apicbase: relocate apic

These failures occur because we do not disable the MMIO region when
switching to x2APIC and do not support relocating the MMIO region yet. The
difficulty is that the MMIO region is currently the same for all CPUs; to
support these cases we need to figure out how to allocate and manage a
separate MMIO region for each CPU. This can be a future improvement.

  FAIL: nmi-after-sti
  FAIL: multiple nmi

These failures come from the way we handle CPU_INTERRUPT_NMI in core TCG.

  FAIL: TMCCT should stay at zero

This failure is related to the APIC timer and should be addressed in a
separate patch.

Version 5 changes,
- Patch 3:
  + Rebase to master and fix conflict
- Patch 5:
  + Create a helper function to get amdvi extended feature register instead
  of storing it in AMDVIState

Version 4 changes,
- Patch 5:
  + Instead of replacing IVHD type 0x10 with type 0x11, export both types
  for backward compatibility with old guest operating system
  + Flip the xtsup feature check condition in amdvi_int_remap_ga for
  readability

Version 3 changes,
- Patch 2:
  + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
  + Make physical destination mode IPI which has destination id 0x
  a broadcast to xAPIC CPUs
  + Make cluster address 0xf in cluster model of xAPIC logical destination
  mode a broadcast to all clusters
  + Create new extended_log_dest to store APIC_LDR information in x2APIC
  instead of extending log_dest for backward compatibility in vmstate

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC support
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
  i386/tcg: implement x2APIC registers MSR access
  apic: add support for x2APIC mode
  apic, i386/tcg: add x2apic transitions
  intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  amd_iommu: report x2APIC support to the operating system

 hw/i386/acpi-build.c | 127 +
 hw/i386/amd_iommu.c  |  30 +-
 hw/i386/amd_iommu.h  |  16 +-
 hw/i386/intel_iommu.c|  11 -
 hw/i386/x86.c|   8 +-
 hw/intc/apic.c   | 395 +++

[PATCH v5 5/5] amd_iommu: report x2APIC support to the operating system

2023-07-15 Thread Bui Quang Minh
This commit adds an XTSup configuration option that lets the user choose
whether to enable this feature. When XTSup is enabled, additional bytes in
the IRTE with guest virtual APIC enabled are used to support a 32-bit
destination ID.

Additionally, this commit exports IVHD type 0x11 alongside the old IVHD type
0x10 in the ACPI table. IVHD type 0x10 does not report the full set of IOMMU
features, only the legacy ones, so an operating system (e.g. Linux) may
detect x2APIC support only if IVHD type 0x11 is available. IVHD type 0x10
is kept so that older operating systems that parse only type 0x10 can still
detect the IOMMU device.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/i386/acpi-build.c | 127 ++-
 hw/i386/amd_iommu.c  |  30 +-
 hw/i386/amd_iommu.h  |  16 --
 3 files changed, 117 insertions(+), 56 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9c74fa17ad..aeb41d917f 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2336,30 +2336,23 @@ static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
 const char *oem_table_id)
 {
-int ivhd_table_len = 24;
 AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
 GArray *ivhd_blob = g_array_new(false, true, 1);
 AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
 .oem_table_id = oem_table_id };
+uint64_t feature_report;
 
 acpi_table_begin(&table, table_data);
 /* IVinfo - IO virtualization information common to all
  * IOMMU units in a system
  */
-build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
+build_append_int_noprefix(table_data,
+ (1UL << 0) | /* EFRSup */
+ (40UL << 8), /* PASize */
+ 4);
 /* reserved */
 build_append_int_noprefix(table_data, 0, 8);
 
-/* IVHD definition - type 10h */
-build_append_int_noprefix(table_data, 0x10, 1);
-/* virtualization flags */
-build_append_int_noprefix(table_data,
- (1UL << 0) | /* HtTunEn  */
- (1UL << 4) | /* iotblSup */
- (1UL << 6) | /* PrefSup  */
- (1UL << 7),  /* PPRSup   */
- 1);
-
 /*
  * A PCI bus walk, for each PCI host bridge, is necessary to create a
  * complete set of IVHD entries.  Do this into a separate blob so that we
@@ -2379,56 +2372,92 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
 build_append_int_noprefix(ivhd_blob, 0x001, 4);
 }
 
-ivhd_table_len += ivhd_blob->len;
-
 /*
  * When interrupt remapping is supported, we add a special IVHD device
- * for type IO-APIC.
- */
-if (x86_iommu_ir_supported(x86_iommu_get_default())) {
-ivhd_table_len += 8;
-}
-
-/* IVHD length */
-build_append_int_noprefix(table_data, ivhd_table_len, 2);
-/* DeviceID */
-build_append_int_noprefix(table_data,
-  object_property_get_int(OBJECT(&s->pci), "addr",
-  &error_abort), 2);
-/* Capability offset */
-build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
-/* IOMMU base address */
-build_append_int_noprefix(table_data, s->mmio.addr, 8);
-/* PCI Segment Group */
-build_append_int_noprefix(table_data, 0, 2);
-/* IOMMU info */
-build_append_int_noprefix(table_data, 0, 2);
-/* IOMMU Feature Reporting */
-build_append_int_noprefix(table_data,
- (48UL << 30) | /* HATS   */
- (48UL << 28) | /* GATS   */
- (1UL << 2)   | /* GTSup  */
- (1UL << 6),/* GASup  */
- 4);
-
-/* IVHD entries as found above */
-g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
-g_array_free(ivhd_blob, TRUE);
-
-/*
- * Add a special IVHD device type.
+ * for type IO-APIC
  * Refer to spec - Table 95: IVHD device entry type codes
  *
  * Linux IOMMU driver checks for the special IVHD device (type IO-APIC).
  * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059'
  */
 if (x86_iommu_ir_supported(x86_iommu_get_default())) {
-build_append_int_noprefix(table_data,
+build_append_int_noprefix(ivhd_blob,
  (0x1ull << 56) |   /* type IOAPIC */
  (IOAPIC_SB_DEVID << 40) |  /* IOAPIC devid */
  0x48,  /* special device */
  8);
 }
+
+/* IVHD definition - type 10h */
+build_append_int_noprefix(table_data, 0x10, 1);
+/* virtualization

[PATCH v5 2/5] apic: add support for x2APIC mode

2023-07-15 Thread Bui Quang Minh
This commit extends the APIC ID to 32 bits and removes the 255 max APIC ID
limit in the userspace APIC. The array that manages local APICs is now
dynamically allocated based on the max APIC ID of the created x86 machine.
Also, the new x2APIC IPI destination determination scheme, self IPI, and
x2APIC mode register access are supported.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/i386/x86.c   |   8 +-
 hw/intc/apic.c  | 266 
 hw/intc/apic_common.c   |   9 ++
 include/hw/i386/apic.h  |   3 +-
 include/hw/i386/apic_internal.h |   7 +-
 target/i386/cpu-sysemu.c|   8 +-
 6 files changed, 231 insertions(+), 70 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123..8b70f0a6ea 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
  * Can we support APIC ID 255 or higher?
  *
  * Under Xen: yes.
- * With userspace emulated lapic: no
+ * With userspace emulated lapic: checked later in apic_common_set_id.
  * With KVM's in-kernel lapic: only if X2APIC API is enabled.
  */
 if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
-(!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
+kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
 error_report("current -smp configuration requires kernel "
  "irqchip and X2APIC API support.");
 exit(EXIT_FAILURE);
@@ -146,6 +146,10 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 kvm_set_max_apic_id(x86ms->apic_id_limit);
 }
 
+if (!kvm_irqchip_in_kernel()) {
+apic_set_max_apic_id(x86ms->apic_id_limit);
+}
+
 possible_cpus = mc->possible_cpu_arch_ids(ms);
 for (i = 0; i < ms->smp.cpus; i++) {
 x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index cb8c20de93..9f741794a7 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -31,15 +31,15 @@
 #include "hw/i386/apic-msidef.h"
 #include "qapi/error.h"
 #include "qom/object.h"
-
-#define MAX_APICS 255
-#define MAX_APIC_WORDS 8
+#include "tcg/helper-tcg.h"
 
 #define SYNC_FROM_VAPIC 0x1
 #define SYNC_TO_VAPIC   0x2
 #define SYNC_ISR_IRR_TO_VAPIC   0x4
 
-static APICCommonState *local_apics[MAX_APICS + 1];
+static APICCommonState **local_apics;
+static uint32_t max_apics;
+static uint32_t max_apic_words;
 
 #define TYPE_APIC "apic"
 /*This is reusing the APICCommonState typedef from APIC_COMMON */
@@ -49,7 +49,19 @@ DECLARE_INSTANCE_CHECKER(APICCommonState, APIC,
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
 static void apic_update_irq(APICCommonState *s);
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-  uint8_t dest, uint8_t dest_mode);
+  uint32_t dest, uint8_t dest_mode);
+
+void apic_set_max_apic_id(uint32_t max_apic_id)
+{
+int word_size = 32;
+
+/* round up the max apic id to next multiple of words */
+max_apics = (max_apic_id + word_size - 1) & ~(word_size - 1);
+
+local_apics = g_malloc0(sizeof(*local_apics) * max_apics);
+max_apic_words = max_apics >> 5;
+}
+
 
 /* Find first bit starting from msb */
 static int apic_fls_bit(uint32_t value)
@@ -199,7 +211,7 @@ static void apic_external_nmi(APICCommonState *s)
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j;\
-for(__i = 0; __i < MAX_APIC_WORDS; __i++) {\
+for(__i = 0; __i < max_apic_words; __i++) {\
 uint32_t __mask = deliver_bitmask[__i];\
 if (__mask) {\
 for(__j = 0; __j < 32; __j++) {\
@@ -226,7 +238,7 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
 {
 int i, d;
 d = -1;
-for(i = 0; i < MAX_APIC_WORDS; i++) {
+for(i = 0; i < max_apic_words; i++) {
 if (deliver_bitmask[i]) {
 d = i * 32 + apic_ffs_bit(deliver_bitmask[i]);
 break;
@@ -276,16 +288,18 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
  apic_set_irq(apic_iter, vector_num, trigger_mode) );
 }
 
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
-  uint8_t vector_num, uint8_t trigger_mode)
+static void apic_deliver_irq(uint32_t dest, uint8_t dest_mode,
+ uint8_t delivery_mode, uint8_t vector_num,
+ uint8_t trigger_mode)
 {
-uint32_t deliver_bitmask[MAX_APIC_WORDS];
+uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t));
 
 trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num,
trigger_mode);
 
 

Re: [PATCH v5 0/5] Support x2APIC mode with TCG accelerator

2023-07-15 Thread Bui Quang Minh

On 7/15/23 21:28, Bui Quang Minh wrote:

Hi everyone,

This series implements x2APIC mode in userspace local APIC and the
RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
using either Intel or AMD iommu.

Testing with my own build of Linux 6.3.0-rc2, the kernel boots successfully
with x2APIC enabled and can enumerate a CPU with APIC ID 257.

Using Intel IOMMU

qemu/build/qemu-system-x86_64 \
   -smp 2,maxcpus=260 \
   -cpu qemu64,x2apic=on \
   -machine q35 \
   -device intel-iommu,intremap=on,eim=on \
   -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
   -m 2G \
   -kernel $KERNEL_DIR \
   -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
   -drive file=$IMAGE_DIR,format=raw \
   -nographic \
   -s

Using AMD IOMMU

qemu/build/qemu-system-x86_64 \
   -smp 2,maxcpus=260 \
   -cpu qemu64,x2apic=on \
   -machine q35 \
   -device amd-iommu,intremap=on,xtsup=on \
   -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
   -m 2G \
   -kernel $KERNEL_DIR \
   -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
   -drive file=$IMAGE_DIR,format=raw \
   -nographic \
   -s

To test the emulated userspace APIC with kvm-unit-tests, disable the test
device with this patch:

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)
  
 if ((str = getenv("TEST_DEVICE")))

 no_test_device = !atol(str);
+   no_test_device = true;
  
 if ((str = getenv("MEMLIMIT")))

 fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
./run_tests.sh -v -g apic

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2 \
  -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split
FAIL apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp 1 \
  -cpu qemu64 -machine kernel_irqchip=split
PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 \
  -cpu qemu64,+x2apic,+tsc-deadline
FAIL x2apic (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 \
  -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off
FAIL xapic (43 tests, 6 unexpected failures, 2 skipped)

   FAIL: apic_disable: *0xfee00030: 50014
   FAIL: apic_disable: *0xfee00080: f0
   FAIL: apic_disable: *0xfee00030: 50014
   FAIL: apic_disable: *0xfee00080: f0
   FAIL: apicbase: relocate apic

These errors occur because we don't disable the MMIO region when switching
to x2APIC and don't support relocating the MMIO region yet. The difficulty
is that the MMIO region is the same for all CPUs; to support these cases we
need to figure out how to allocate and manage a separate MMIO region for
each CPU. This can be a future improvement.

   FAIL: nmi-after-sti
   FAIL: multiple nmi

These errors come from the way we handle CPU_INTERRUPT_NMI in core TCG.

   FAIL: TMCCT should stay at zero

This error is related to the APIC timer, which should be addressed in a
separate patch.

Version 5 changes,
- Patch 3:
   + Rebase to master and fix conflict
- Patch 5:
   + Create a helper function to get amdvi extended feature register instead
   of storing it in AMDVIState

Version 4 changes,
- Patch 5:
   + Instead of replacing IVHD type 0x10 with type 0x11, export both types
   for backward compatibility with older guest operating systems
   + Flip the xtsup feature check condition in amdvi_int_remap_ga for
   readability

Version 3 changes,
- Patch 2:
   + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
   + Make physical destination mode IPI which has destination id 0x
   a broadcast to xAPIC CPUs
   + Make cluster address 0xf in cluster model of xAPIC logical destination
   mode a broadcast to all clusters
   + Create new extended_log_dest to store APIC_LDR information in x2APIC
   instead of extending log_dest for backward compatibility in vmstate

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC support
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
   i386/tcg: implement x2APIC registers MSR access
   apic: add support for x2APIC mode
   apic, i386/tcg: add x2apic transitions
   intel_iommu: allow Extended Interrupt Mode when using userspace APIC
   amd_iommu: report x2APIC support to the operating system

  hw/i386/acpi-build.c | 127 +
  hw/i386/amd_iommu.c  |  30 +-
  hw/i386/amd_iommu.h  |  16 +-
  hw/i386/intel_iommu.c|  11

[PATCH v6 0/5] Support x2APIC mode with TCG accelerator

2023-07-15 Thread Bui Quang Minh
Hi everyone,

This series implements x2APIC mode in userspace local APIC and the
RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
and AMD iommu are adjusted to support x2APIC interrupt remapping. With this
series, we can now boot Linux kernel into x2APIC mode with TCG accelerator
using either Intel or AMD iommu.

Testing with my own build of Linux 6.3.0-rc2, the kernel boots successfully
with x2APIC enabled and can enumerate a CPU with APIC ID 257.

Using Intel IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device intel-iommu,intremap=on,eim=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

Using AMD IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device amd-iommu,intremap=on,xtsup=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

To test the emulated userspace APIC with kvm-unit-tests, disable the test
device with this patch:

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)

if ((str = getenv("TEST_DEVICE")))
no_test_device = !atol(str);
+   no_test_device = true;

if ((str = getenv("MEMLIMIT")))
fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
./run_tests.sh -v -g apic

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2 \
  -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split
FAIL apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp 1 \
  -cpu qemu64 -machine kernel_irqchip=split
PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 \
  -cpu qemu64,+x2apic,+tsc-deadline
FAIL x2apic (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 \
  -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off
FAIL xapic (43 tests, 6 unexpected failures, 2 skipped)

  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0
  FAIL: apic_disable: *0xfee00030: 50014
  FAIL: apic_disable: *0xfee00080: f0
  FAIL: apicbase: relocate apic

These errors occur because we don't disable the MMIO region when switching
to x2APIC and don't support relocating the MMIO region yet. The difficulty
is that the MMIO region is the same for all CPUs; to support these cases we
need to figure out how to allocate and manage a separate MMIO region for
each CPU. This can be a future improvement.

  FAIL: nmi-after-sti
  FAIL: multiple nmi

These errors come from the way we handle CPU_INTERRUPT_NMI in core TCG.

  FAIL: TMCCT should stay at zero

This error is related to the APIC timer, which should be addressed in a
separate patch.

Version 6 changes,
- Patch 5:
  + Make all places use the amdvi_extended_feature_register to get extended
  feature register

Version 5 changes,
- Patch 3:
  + Rebase to master and fix conflict
- Patch 5:
  + Create a helper function to get amdvi extended feature register instead
  of storing it in AMDVIState

Version 4 changes,
- Patch 5:
  + Instead of replacing IVHD type 0x10 with type 0x11, export both types
  for backward compatibility with older guest operating systems
  + Flip the xtsup feature check condition in amdvi_int_remap_ga for
  readability

Version 3 changes,
- Patch 2:
  + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
  + Make physical destination mode IPI which has destination id 0x
  a broadcast to xAPIC CPUs
  + Make cluster address 0xf in cluster model of xAPIC logical destination
  mode a broadcast to all clusters
  + Create new extended_log_dest to store APIC_LDR information in x2APIC
  instead of extending log_dest for backward compatibility in vmstate

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC support
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
  i386/tcg: implement x2APIC registers MSR access
  apic: add support for x2APIC mode
  apic, i386/tcg: add x2apic transitions
  intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  amd_iommu: report x2APIC support to the operating system

 hw/i386/acpi-build.c | 129 +
 hw/i386/amd_iommu.c  |  29 +-
 hw/i386/amd_iommu.h  |  16 +-
 hw/i386/intel_iommu.c

[PATCH v6 3/5] apic, i386/tcg: add x2apic transitions

2023-07-15 Thread Bui Quang Minh
This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.
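
The transition rules this commit enforces can be summarized as a small
predicate. The following is only a sketch, not the patch's code: the names
apicbase_transition_ok and APICBASE_* are hypothetical, the bit positions
(bit 10 for EXTD, bit 11 for ENABLE) are the architectural ones, and the
real code raises #GP instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural IA32_APIC_BASE bits; the macro names are illustrative. */
#define APICBASE_EXTD   (UINT64_C(1) << 10)  /* x2APIC mode enable */
#define APICBASE_ENABLE (UINT64_C(1) << 11)  /* APIC global enable */

/*
 * Hypothetical helper: returns true if replacing old_val with new_val
 * is a legal APIC mode change, false where a #GP would be raised.
 */
static bool apicbase_transition_ok(uint64_t old_val, uint64_t new_val,
                                   bool cpu_has_x2apic)
{
    bool old_en = (old_val & APICBASE_ENABLE) != 0;
    bool old_x2 = (old_val & APICBASE_EXTD) != 0;
    bool new_en = (new_val & APICBASE_ENABLE) != 0;
    bool new_x2 = (new_val & APICBASE_EXTD) != 0;

    if (new_x2 && !cpu_has_x2apic) {
        return false;               /* x2APIC not supported by the CPU */
    }
    if (!new_en && new_x2) {
        return false;               /* ENABLE=0, EXTD=1 is an invalid state */
    }
    if (!old_en && !old_x2 && new_en && new_x2) {
        return false;               /* disabled -> x2APIC directly */
    }
    if (old_en && old_x2 && new_en && !new_x2) {
        return false;               /* x2APIC -> xAPIC directly */
    }
    return true;
}

/* Usage examples; each assert documents one rule. */
static void apicbase_examples(void)
{
    const uint64_t xapic  = APICBASE_ENABLE;
    const uint64_t x2apic = APICBASE_ENABLE | APICBASE_EXTD;

    assert(apicbase_transition_ok(xapic, x2apic, true));   /* xAPIC -> x2APIC */
    assert(!apicbase_transition_ok(0, x2apic, true));      /* disabled -> x2APIC */
    assert(!apicbase_transition_ok(x2apic, xapic, true));  /* x2APIC -> xAPIC */
}
```

Leaving x2APIC mode therefore has to go through the disabled state first,
which this predicate accepts.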

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/intc/apic.c   | 50 
 hw/intc/apic_common.c|  7 ++--
 target/i386/cpu-sysemu.c | 10 ++
 target/i386/cpu.c|  8 ++---
 target/i386/cpu.h|  6 
 target/i386/tcg/sysemu/misc_helper.c |  4 +++
 6 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 9f741794a7..b8f56836a6 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -309,8 +309,41 @@ bool is_x2apic_mode(DeviceState *dev)
 return s->apicbase & MSR_IA32_APICBASE_EXTD;
 }
 
+static void apic_set_base_check(APICCommonState *s, uint64_t val)
+{
+/* Enable x2apic when x2apic is not supported by CPU */
+if (!cpu_has_x2apic_feature(&s->cpu->env) &&
+val & MSR_IA32_APICBASE_EXTD)
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+/*
+ * Transition into invalid state
+ * (s->apicbase & MSR_IA32_APICBASE_ENABLE == 0) &&
+ * (s->apicbase & MSR_IA32_APICBASE_EXTD) == 1
+ */
+if (!(val & MSR_IA32_APICBASE_ENABLE) &&
+(val & MSR_IA32_APICBASE_EXTD))
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+/* Invalid transition from disabled mode to x2APIC */
+if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+(val & MSR_IA32_APICBASE_ENABLE) &&
+(val & MSR_IA32_APICBASE_EXTD))
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+
+/* Invalid transition from x2APIC to xAPIC */
+if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+(val & MSR_IA32_APICBASE_ENABLE) &&
+!(val & MSR_IA32_APICBASE_EXTD))
+raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC());
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
+apic_set_base_check(s, val);
+
 s->apicbase = (val & 0xf000) |
 (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE));
 /* if disabled, cannot be enabled again */
@@ -319,6 +352,23 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 cpu_clear_apic_feature(&s->cpu->env);
 s->spurious_vec &= ~APIC_SV_ENABLE;
 }
+
+/* Transition from disabled mode to xAPIC */
+if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) &&
+(val & MSR_IA32_APICBASE_ENABLE)) {
+s->apicbase |= MSR_IA32_APICBASE_ENABLE;
+cpu_set_apic_feature(&s->cpu->env);
+}
+
+/* Transition from xAPIC to x2APIC */
+if (cpu_has_x2apic_feature(&s->cpu->env) &&
+!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
+(val & MSR_IA32_APICBASE_EXTD)) {
+s->apicbase |= MSR_IA32_APICBASE_EXTD;
+
+s->log_dest = ((s->initial_apic_id & 0x0) << 16) |
+  (1 << (s->initial_apic_id & 0xf));
+}
 }
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index d95914066e..396f828be8 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -43,11 +43,8 @@ void cpu_set_apic_base(DeviceState *dev, uint64_t val)
 if (dev) {
 APICCommonState *s = APIC_COMMON(dev);
 APICCommonClass *info = APIC_COMMON_GET_CLASS(s);
-/* switching to x2APIC, reset possibly modified xAPIC ID */
-if (!(s->apicbase & MSR_IA32_APICBASE_EXTD) &&
-(val & MSR_IA32_APICBASE_EXTD)) {
-s->id = s->initial_apic_id;
-}
+/* Reset possibly modified xAPIC ID */
+s->id = s->initial_apic_id;
 info->set_base(s, val);
 }
 }
diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c
index a9ff10c517..f6bbe33372 100644
--- a/target/i386/cpu-sysemu.c
+++ b/target/i386/cpu-sysemu.c
@@ -235,6 +235,16 @@ void cpu_clear_apic_feature(CPUX86State *env)
 env->features[FEAT_1_EDX] &= ~CPUID_APIC;
 }
 
+void cpu_set_apic_feature(CPUX86State *env)
+{
+env->features[FEAT_1_EDX] |= CPUID_APIC;
+}
+
+bool cpu_has_x2apic_feature(CPUX86State *env)
+{
+return env->features[FEAT_1_ECX] & CPUID_EXT_X2APIC;
+}
+
 bool cpu_is_bsp(X86CPU *cpu)
 {
 return cpu_get_apic_base(cpu->apic_state) & MSR_IA32_APICBASE_BSP;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 97ad229d8b..240a1f9737 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -630,8 +630,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
  * in CPL=3; remove them if they are ever implemented for system emulation.
  */
 #if defined CONFIG_USER_ONLY
-#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CPUID_EXT_TSC_DEADLINE_TIMER | \
- CPUID_EXT_X2APIC)
+#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | 
CP

[PATCH v6 5/5] amd_iommu: report x2APIC support to the operating system

2023-07-15 Thread Bui Quang Minh
This commit adds the XTSup configuration option to let the user choose
whether to enable this feature. When XTSup is enabled, additional bytes in
the IRTE with guest virtual APIC enabled are used to support a 32-bit
destination ID.

Additionally, this commit exports IVHD type 0x11 alongside the old IVHD type
0x10 in the ACPI table. IVHD type 0x10 reports only the legacy IOMMU
features rather than the full set, so an operating system (e.g. Linux) may
detect x2APIC support only if IVHD type 0x11 is available. IVHD type 0x10
is kept so that older operating systems that only parse type 0x10 can still
detect the IOMMU device.
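
To make the table encoding in this patch easier to follow, here is a rough
sketch of how two of the values are laid out in memory. It is not QEMU code:
append_le is a hypothetical stand-in for build_append_int_noprefix (which
emits little-endian integers), and the device ID 0xa0 used in the usage note
is only an assumed example value for IOAPIC_SB_DEVID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: append `size` bytes of `value`, little-endian. */
static void append_le(uint8_t *buf, size_t *len, uint64_t value, int size)
{
    assert(size >= 1 && size <= 8);
    for (int i = 0; i < size; i++) {
        buf[(*len)++] = (uint8_t)(value >> (8 * i));
    }
}

/* IVinfo as built in the patch: EFRSup in bit 0, PASize starting at bit 8. */
static uint32_t ivinfo(void)
{
    return (UINT32_C(1) << 0) |   /* EFRSup */
           (UINT32_C(40) << 8);   /* PASize */
}

/* 8-byte special IVHD device entry for the IO-APIC, as in the patch. */
static uint64_t ivhd_special_ioapic(uint16_t ioapic_devid)
{
    return (UINT64_C(1) << 56) |              /* type IOAPIC */
           ((uint64_t)ioapic_devid << 40) |   /* IOAPIC devid */
           0x48;                              /* special device */
}
```

With these helpers, ivinfo() is 0x2801 and is emitted as bytes 01 28 00 00.
Assuming a device ID of 0xa0, the special entry is emitted as
48 00 00 00 00 a0 00 01, so the entry type 0x48 lands in the first byte.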

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/i386/acpi-build.c | 129 +++
 hw/i386/amd_iommu.c  |  29 +-
 hw/i386/amd_iommu.h  |  16 --
 3 files changed, 117 insertions(+), 57 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9c74fa17ad..4231b80f25 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2336,30 +2336,23 @@ static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
 const char *oem_table_id)
 {
-int ivhd_table_len = 24;
 AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
 GArray *ivhd_blob = g_array_new(false, true, 1);
 AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
 .oem_table_id = oem_table_id };
+uint64_t feature_report;
 
 acpi_table_begin(&table, table_data);
 /* IVinfo - IO virtualization information common to all
  * IOMMU units in a system
  */
-build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
+build_append_int_noprefix(table_data,
+ (1UL << 0) | /* EFRSup */
+ (40UL << 8), /* PASize */
+ 4);
 /* reserved */
 build_append_int_noprefix(table_data, 0, 8);
 
-/* IVHD definition - type 10h */
-build_append_int_noprefix(table_data, 0x10, 1);
-/* virtualization flags */
-build_append_int_noprefix(table_data,
- (1UL << 0) | /* HtTunEn  */
- (1UL << 4) | /* iotblSup */
- (1UL << 6) | /* PrefSup  */
- (1UL << 7),  /* PPRSup   */
- 1);
-
 /*
  * A PCI bus walk, for each PCI host bridge, is necessary to create a
  * complete set of IVHD entries.  Do this into a separate blob so that we
@@ -2379,56 +2372,94 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
 build_append_int_noprefix(ivhd_blob, 0x001, 4);
 }
 
-ivhd_table_len += ivhd_blob->len;
-
 /*
  * When interrupt remapping is supported, we add a special IVHD device
- * for type IO-APIC.
- */
-if (x86_iommu_ir_supported(x86_iommu_get_default())) {
-ivhd_table_len += 8;
-}
-
-/* IVHD length */
-build_append_int_noprefix(table_data, ivhd_table_len, 2);
-/* DeviceID */
-build_append_int_noprefix(table_data,
-  object_property_get_int(OBJECT(&s->pci), "addr",
-  &error_abort), 2);
-/* Capability offset */
-build_append_int_noprefix(table_data, s->pci.capab_offset, 2);
-/* IOMMU base address */
-build_append_int_noprefix(table_data, s->mmio.addr, 8);
-/* PCI Segment Group */
-build_append_int_noprefix(table_data, 0, 2);
-/* IOMMU info */
-build_append_int_noprefix(table_data, 0, 2);
-/* IOMMU Feature Reporting */
-build_append_int_noprefix(table_data,
- (48UL << 30) | /* HATS   */
- (48UL << 28) | /* GATS   */
- (1UL << 2)   | /* GTSup  */
- (1UL << 6),/* GASup  */
- 4);
-
-/* IVHD entries as found above */
-g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
-g_array_free(ivhd_blob, TRUE);
-
-/*
- * Add a special IVHD device type.
+ * for type IO-APIC
  * Refer to spec - Table 95: IVHD device entry type codes
  *
  * Linux IOMMU driver checks for the special IVHD device (type IO-APIC).
  * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059'
  */
 if (x86_iommu_ir_supported(x86_iommu_get_default())) {
-build_append_int_noprefix(table_data,
+build_append_int_noprefix(ivhd_blob,
  (0x1ull << 56) |   /* type IOAPIC */
  (IOAPIC_SB_DEVID << 40) |  /* IOAPIC devid */
  0x48,  /* special device */
  8);
 }
+
+/* IVHD definition - type 10h */
+build_append_int_noprefix(table_data, 0x10, 1);
+/* virtualization

[PATCH v6 2/5] apic: add support for x2APIC mode

2023-07-15 Thread Bui Quang Minh
This commit extends the APIC ID to 32 bits and removes the 255 max APIC
ID limit in the userspace APIC. The array that manages local APICs is now
dynamically allocated based on the max APIC ID of the created x86 machine.
Also, the new x2APIC IPI destination determination scheme, self IPI, and
x2APIC mode register access are supported.
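
As a rough illustration of the sizing described above (a sketch under
stated assumptions, not the patch's code): the APIC ID limit is rounded up
to a multiple of 32 so that a delivery bitmask of whole 32-bit words covers
every possible ID. All helper names below are hypothetical, and plain
calloc stands in for the GLib allocators QEMU uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define APIC_WORD_BITS 32u  /* bits in one uint32_t word of the bitmask */

/* Round an APIC ID limit up to the next multiple of 32. */
static uint32_t round_up_apic_limit(uint32_t apic_id_limit)
{
    return (apic_id_limit + APIC_WORD_BITS - 1) & ~(APIC_WORD_BITS - 1);
}

/* Number of 32-bit words needed to cover that many APIC IDs. */
static uint32_t bitmask_words(uint32_t apic_id_limit)
{
    return round_up_apic_limit(apic_id_limit) / APIC_WORD_BITS;
}

/* Allocate a zeroed delivery bitmask; the caller releases it with free(). */
static uint32_t *alloc_bitmask(uint32_t apic_id_limit)
{
    return calloc(bitmask_words(apic_id_limit), sizeof(uint32_t));
}

/* Mark one APIC ID as a delivery target. */
static void bitmask_set(uint32_t *mask, uint32_t apic_id)
{
    assert(mask != NULL);
    mask[apic_id / APIC_WORD_BITS] |= UINT32_C(1) << (apic_id % APIC_WORD_BITS);
}

/* Return 1 if the APIC ID is marked as a delivery target, else 0. */
static int bitmask_test(const uint32_t *mask, uint32_t apic_id)
{
    assert(mask != NULL);
    return (int)((mask[apic_id / APIC_WORD_BITS] >> (apic_id % APIC_WORD_BITS)) & 1u);
}
```

For example, assuming a limit of 258 (highest APIC ID 257), the limit
rounds up to 288 and the bitmask needs 9 words; APIC ID 257 is bit 1 of
word 8.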

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/i386/x86.c   |   8 +-
 hw/intc/apic.c  | 266 
 hw/intc/apic_common.c   |   9 ++
 include/hw/i386/apic.h  |   3 +-
 include/hw/i386/apic_internal.h |   7 +-
 target/i386/cpu-sysemu.c|   8 +-
 6 files changed, 231 insertions(+), 70 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123..8b70f0a6ea 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
  * Can we support APIC ID 255 or higher?
  *
  * Under Xen: yes.
- * With userspace emulated lapic: no
+ * With userspace emulated lapic: checked later in apic_common_set_id.
  * With KVM's in-kernel lapic: only if X2APIC API is enabled.
  */
 if (x86ms->apic_id_limit > 255 && !xen_enabled() &&
-(!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) {
+kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) {
 error_report("current -smp configuration requires kernel "
  "irqchip and X2APIC API support.");
 exit(EXIT_FAILURE);
@@ -146,6 +146,10 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 kvm_set_max_apic_id(x86ms->apic_id_limit);
 }
 
+if (!kvm_irqchip_in_kernel()) {
+apic_set_max_apic_id(x86ms->apic_id_limit);
+}
+
 possible_cpus = mc->possible_cpu_arch_ids(ms);
 for (i = 0; i < ms->smp.cpus; i++) {
 x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index cb8c20de93..9f741794a7 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -31,15 +31,15 @@
 #include "hw/i386/apic-msidef.h"
 #include "qapi/error.h"
 #include "qom/object.h"
-
-#define MAX_APICS 255
-#define MAX_APIC_WORDS 8
+#include "tcg/helper-tcg.h"
 
 #define SYNC_FROM_VAPIC 0x1
 #define SYNC_TO_VAPIC   0x2
 #define SYNC_ISR_IRR_TO_VAPIC   0x4
 
-static APICCommonState *local_apics[MAX_APICS + 1];
+static APICCommonState **local_apics;
+static uint32_t max_apics;
+static uint32_t max_apic_words;
 
 #define TYPE_APIC "apic"
 /*This is reusing the APICCommonState typedef from APIC_COMMON */
@@ -49,7 +49,19 @@ DECLARE_INSTANCE_CHECKER(APICCommonState, APIC,
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
 static void apic_update_irq(APICCommonState *s);
 static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask,
-  uint8_t dest, uint8_t dest_mode);
+  uint32_t dest, uint8_t dest_mode);
+
+void apic_set_max_apic_id(uint32_t max_apic_id)
+{
+int word_size = 32;
+
+/* round up the max apic id to next multiple of words */
+max_apics = (max_apic_id + word_size - 1) & ~(word_size - 1);
+
+local_apics = g_malloc0(sizeof(*local_apics) * max_apics);
+max_apic_words = max_apics >> 5;
+}
+
 
 /* Find first bit starting from msb */
 static int apic_fls_bit(uint32_t value)
@@ -199,7 +211,7 @@ static void apic_external_nmi(APICCommonState *s)
 #define foreach_apic(apic, deliver_bitmask, code) \
 {\
 int __i, __j;\
-for(__i = 0; __i < MAX_APIC_WORDS; __i++) {\
+for(__i = 0; __i < max_apic_words; __i++) {\
 uint32_t __mask = deliver_bitmask[__i];\
 if (__mask) {\
 for(__j = 0; __j < 32; __j++) {\
@@ -226,7 +238,7 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
 {
 int i, d;
 d = -1;
-for(i = 0; i < MAX_APIC_WORDS; i++) {
+for(i = 0; i < max_apic_words; i++) {
 if (deliver_bitmask[i]) {
 d = i * 32 + apic_ffs_bit(deliver_bitmask[i]);
 break;
@@ -276,16 +288,18 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask,
  apic_set_irq(apic_iter, vector_num, trigger_mode) );
 }
 
-void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
-  uint8_t vector_num, uint8_t trigger_mode)
+static void apic_deliver_irq(uint32_t dest, uint8_t dest_mode,
+ uint8_t delivery_mode, uint8_t vector_num,
+ uint8_t trigger_mode)
 {
-uint32_t deliver_bitmask[MAX_APIC_WORDS];
+uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t));
 
 trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num,
trigger_mode);
 
 

[PATCH v6 1/5] i386/tcg: implement x2APIC registers MSR access

2023-07-15 Thread Bui Quang Minh
This commit refactors apic_mem_read/write to support both MMIO access in
xAPIC and MSR access in x2APIC.
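
The reason one register-level read/write path can serve both modes is that
both address the same register index. A minimal sketch of the two mappings,
assuming the architectural x2APIC MSR range 0x800-0x8ff; the macro and
function names are hypothetical and are not taken from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u  /* first x2APIC MSR */
#define X2APIC_MSR_LAST 0x8ffu  /* last x2APIC MSR */

/* xAPIC: registers are 16 bytes apart inside the 4 KiB MMIO page. */
static int xapic_mmio_to_index(uint64_t offset)
{
    assert(offset < 0x1000);
    return (int)((offset >> 4) & 0xff);
}

/* x2APIC: the index is the MSR number minus 0x800; -1 if out of range. */
static int x2apic_msr_to_index(uint32_t msr)
{
    if (msr < X2APIC_MSR_BASE || msr > X2APIC_MSR_LAST) {
        return -1;
    }
    return (int)(msr - X2APIC_MSR_BASE);
}
```

For instance, the APIC ID register at MMIO offset 0x20 and MSR 0x802 both
map to index 0x02. The ICR is the notable exception to a one-to-one
mapping: xAPIC splits it across offsets 0x300 and 0x310, while x2APIC
exposes it as a single 64-bit MSR at 0x830.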

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/intc/apic.c   | 79 ++--
 hw/intc/trace-events |  4 +-
 include/hw/i386/apic.h   |  3 ++
 target/i386/cpu.h|  3 ++
 target/i386/tcg/sysemu/misc_helper.c | 27 ++
 5 files changed, 86 insertions(+), 30 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index ac3d47d231..cb8c20de93 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -288,6 +288,13 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
 apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode);
 }
 
+bool is_x2apic_mode(DeviceState *dev)
+{
+APICCommonState *s = APIC(dev);
+
+return s->apicbase & MSR_IA32_APICBASE_EXTD;
+}
+
 static void apic_set_base(APICCommonState *s, uint64_t val)
 {
 s->apicbase = (val & 0xf000) |
@@ -636,16 +643,11 @@ static void apic_timer(void *opaque)
 apic_timer_update(s, s->next_time);
 }
 
-static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+uint64_t apic_register_read(int index)
 {
 DeviceState *dev;
 APICCommonState *s;
-uint32_t val;
-int index;
-
-if (size < 4) {
-return 0;
-}
+uint64_t val;
 
 dev = cpu_get_current_apic();
 if (!dev) {
@@ -653,7 +655,6 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
 }
 s = APIC(dev);
 
-index = (addr >> 4) & 0xff;
 switch(index) {
 case 0x02: /* id */
 val = s->id << 24;
@@ -720,7 +721,23 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
 val = 0;
 break;
 }
-trace_apic_mem_readl(addr, val);
+
+trace_apic_register_read(index, val);
+return val;
+}
+
+static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size)
+{
+uint32_t val;
+int index;
+
+if (size < 4) {
+return 0;
+}
+
+index = (addr >> 4) & 0xff;
+val = (uint32_t)apic_register_read(index);
+
 return val;
 }
 
@@ -737,27 +754,10 @@ static void apic_send_msi(MSIMessage *msi)
 apic_deliver_irq(dest, dest_mode, delivery, vector, trigger_mode);
 }
 
-static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
-   unsigned size)
+void apic_register_write(int index, uint64_t val)
 {
 DeviceState *dev;
 APICCommonState *s;
-int index = (addr >> 4) & 0xff;
-
-if (size < 4) {
-return;
-}
-
-if (addr > 0xfff || !index) {
-/* MSI and MMIO APIC are at the same memory location,
- * but actually not on the global bus: MSI is on PCI bus
- * APIC is connected directly to the CPU.
- * Mapping them on the global bus happens to work because
- * MSI registers are reserved in APIC MMIO and vice versa. */
-MSIMessage msi = { .address = addr, .data = val };
-apic_send_msi(&msi);
-return;
-}
 
 dev = cpu_get_current_apic();
 if (!dev) {
@@ -765,7 +765,7 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 }
 s = APIC(dev);
 
-trace_apic_mem_writel(addr, val);
+trace_apic_register_write(index, val);
 
 switch(index) {
 case 0x02:
@@ -843,6 +843,29 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
 }
 }
 
+static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val,
+   unsigned size)
+{
+int index = (addr >> 4) & 0xff;
+
+if (size < 4) {
+return;
+}
+
+if (addr > 0xfff || !index) {
+/* MSI and MMIO APIC are at the same memory location,
+ * but actually not on the global bus: MSI is on PCI bus
+ * APIC is connected directly to the CPU.
+ * Mapping them on the global bus happens to work because
+ * MSI registers are reserved in APIC MMIO and vice versa. */
+MSIMessage msi = { .address = addr, .data = val };
+apic_send_msi(&msi);
+return;
+}
+
+apic_register_write(index, val);
+}
+
 static void apic_pre_save(APICCommonState *s)
 {
 apic_sync_vapic(s, SYNC_FROM_VAPIC);
diff --git a/hw/intc/trace-events b/hw/intc/trace-events
index 36ff71f947..1ef29d0256 100644
--- a/hw/intc/trace-events
+++ b/hw/intc/trace-events
@@ -14,8 +14,8 @@ cpu_get_apic_base(uint64_t val) "0x%016"PRIx64
 # apic.c
 apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d"
 apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d"
-apic_mem_readl(uint64_t addr, uint32_t val)  "0x%"PRIx64" = 0x%08x"
-apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x"
+apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x

[PATCH v6 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC

2023-07-15 Thread Bui Quang Minh
As the userspace APIC now supports x2APIC, Intel interrupt remapping
hardware can be set to EIM mode when the userspace local APIC is used.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Bui Quang Minh 
---
 hw/i386/intel_iommu.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dcc334060c..5e576f6059 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4043,17 +4043,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
   && x86_iommu_ir_supported(x86_iommu) ?
   ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
 }
-if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
-if (!kvm_irqchip_is_split()) {
-error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
-return false;
-}
-if (!kvm_enable_x2apic()) {
-error_setg(errp, "eim=on requires support on the KVM side"
- "(X2APIC_API, first shipped in v4.7)");
-return false;
-}
-}
 
 /* Currently only address widths supported are 39 and 48 bits */
 if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-- 
2.25.1
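
For readers skimming the hunk above, here is a standalone sketch of the
decision that the removed block used to make, and of what is left once it is
gone. This is not QEMU code: EimConfig, eim_check_before() and
eim_check_after() are hypothetical names, and the two booleans merely stand in
for the results of kvm_irqchip_is_split() and kvm_enable_x2apic(). It models
only the removed block; whether other checks in the series still apply is not
shown here.

```c
/* Standalone sketch, not QEMU code. Every name below is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EIM_AUTO, EIM_ON, EIM_OFF } EimSetting;

typedef struct {
    EimSetting intr_eim;   /* stands in for s->intr_eim */
    bool buggy_eim;        /* stands in for s->buggy_eim */
    bool irqchip_split;    /* stands in for kvm_irqchip_is_split() */
    bool kvm_x2apic_api;   /* stands in for kvm_enable_x2apic() succeeding */
} EimConfig;

/* Pre-patch: NULL means accepted, otherwise the reason for rejection. */
static const char *eim_check_before(const EimConfig *c)
{
    if (c->intr_eim == EIM_ON && !c->buggy_eim) {
        if (!c->irqchip_split) {
            return "eim=on requires accel=kvm,kernel-irqchip=split";
        }
        if (!c->kvm_x2apic_api) {
            return "eim=on requires support on the KVM side";
        }
    }
    return NULL;
}

/* Post-patch, for the removed block only: nothing is rejected here. */
static const char *eim_check_after(const EimConfig *c)
{
    (void)c;
    return NULL;
}

/* Small self-check contrasting a userspace APIC with a split irqchip. */
static void eim_sketch_selftest(void)
{
    EimConfig userspace_apic = { EIM_ON, false, false, false };
    EimConfig split_irqchip = { EIM_ON, false, true, true };

    assert(eim_check_before(&userspace_apic) != NULL);
    assert(eim_check_before(&split_irqchip) == NULL);
    assert(eim_check_after(&userspace_apic) == NULL);
}
```

In this model, a configuration with eim=on and a userspace APIC is rejected
before the patch and accepted after it, which is the behaviour change the
commit message describes.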




Re: [PATCH v1 6/9] gfxstream + rutabaga: add initial support for gfxstream

2023-07-15 Thread Bernhard Beschow



On 11 July 2023 at 02:56:46 UTC, Gurchetan Singh wrote:
>This adds initial support for gfxstream and cross-domain.  Both
>features rely on virtio-gpu blob resources and context types, which
>are also implemented in this patch.
>
>gfxstream has a long and illustrious history in Android graphics
>paravirtualization.  It has been powering graphics in the Android
>Studio Emulator for more than a decade, which is the main developer
>platform.
>
>Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
>The key design characteristic was a 1:1 threading model and
>auto-generation, which fit nicely with the OpenGLES spec.  It also
>allowed easy layering with ANGLE on the host, which provides the GLES
>implementations on Windows or MacOS environments.
>
>gfxstream has traditionally been maintained by a single engineer, and
>between 2015 and 2021, the goldfish throne passed to Frank Yang.
>Historians often remark this glorious reign ("pax gfxstreama" is the
>academic term) was comparable to that of Augustus and both Queen
>Elizabeths.  Just to name a few accomplishments in a resplendent
>panoply: higher versions of GLES, address space graphics, snapshot
>support and CTS compliant Vulkan [b].
>
>One major drawback was the use of out-of-tree goldfish drivers.
>Android engineers didn't know much about DRM/KMS and especially TTM so
>a simple guest to host pipe was conceived.
>
>Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
>the Mesa/virglrenderer communities.  In 2018, the initial virtio-gpu
>port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
>It was a symbol compatible replacement of virglrenderer [c] and named
>"AVDVirglrenderer".  This implementation forms the basis of the
>current gfxstream host implementation still in use today.
>
>cross-domain support follows a similar arc.  Originally conceived by
>Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
>2018, it initially relied on the downstream "virtio-wl" device.
>
>In 2020 and 2021, virtio-gpu was extended to include blob resources
>and multiple timelines by yours truly, features gfxstream/cross-domain
>both require to function correctly.
>
>Right now, we stand at the precipice of a truly fantastic possibility:
>the Android Emulator powered by upstream QEMU and upstream Linux
>kernel.  gfxstream will then be packaged properly, and app
>developers can even fix gfxstream bugs on their own if they encounter
>them.
>
>It's been quite the ride, my friends.  Where will gfxstream head next,
>nobody really knows.  I wouldn't be surprised if it's around for
>another decade, maintained by a new generation of Android graphics
>enthusiasts.

AFAIU gfxstream is a substitute for virglrenderer and relies on an 
auto-generated interface based on OpenGL/Vulkan between host and guest. I would 
like to use it in QEMU (Windows host, Linux guest).

So I tried to test your series on a Linux host first. So far, I haven't
gotten past either aborts with generic error messages or blank screens with
no error messages at all. My Linux host might not provide a recent enough
environment, though.

Some technical review comments follow below.

>
>Technical details:
>  - Very simple initial display integration: just used Pixman
>  - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
>calls
>
>[a] https://android-review.googlesource.com/c/platform/development/+/34470
>[b] https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
>[c] https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
>
>Signed-off-by: Gurchetan Singh 
>---
>v2: Incorporated various suggestions by Akihiko Odaki and Bernhard Beschow
>- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
>- Used error_report(..)
>- Used g_autofree to fix leaks on error paths
>- Removed unnecessary casts
>- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files
>
> hw/display/virtio-gpu-pci-rutabaga.c |   48 ++
> hw/display/virtio-gpu-rutabaga.c | 1088 ++
> hw/display/virtio-vga-rutabaga.c |   52 ++
> 3 files changed, 1188 insertions(+)
> create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
> create mode 100644 hw/display/virtio-gpu-rutabaga.c
> create mode 100644 hw/display/virtio-vga-rutabaga.c
>
>diff --git a/hw/display/virtio-gpu-pci-rutabaga.c b/hw/display/virtio-gpu-pci-rutabaga.c
>new file mode 100644
>index 00..5765bef266
>--- /dev/null
>+++ b/hw/display/virtio-gpu-pci-rutabaga.c
>@@ -0,0 +1,48 @@
>+// SPDX-License-Identifier: GPL-2.0
>+
>+#include "qemu/osdep.h"
>+#include "qapi/error.h"
>+#include "qemu/module.h"
>+#include "hw/pci/pci.h"
>+#include "hw/qdev-properties.h"
>+#include "hw/virtio/virtio.h"
>+#include "hw/virtio/virtio-bus.h"
>+#include "hw/virtio/virtio-gpu-pci.h"
>+#include "qom/object.h"
>+
>+#define TYPE_VIRTIO_GPU_RUTABAGA_PCI "virtio-gpu-rutabaga-pci"
>+typedef struct VirtIOGPURUTABAGAPCI VirtIOGPURUTABA