Re: Boot failure after QEMU's upgrade to OpenSBI v1.3 (was Re: [PATCH for-8.2 6/7] target/riscv: add 'max' CPU type)
On Fri, Jul 14, 2023 at 5:29 AM Conor Dooley wrote:
>
> On Fri, Jul 14, 2023 at 11:19:34AM +0100, Conor Dooley wrote:
> > On Fri, Jul 14, 2023 at 10:00:19AM +0530, Anup Patel wrote:
> >
> > > > > OpenSBI v1.3
> > > > >    ____                    _____ ____ _____
> > > > >   / __ \                  / ____|  _ \_   _|
> > > > >  | |  | |_ __   ___ _ __ | (___ | |_) || |
> > > > >  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
> > > > >  | |__| | |_) |  __/ | | |____) | |_) || |_
> > > > >   \____/| .__/ \___|_| |_|_____/|____/_____|
> > > > >         | |
> > > > >         |_|
> > > > >
> > > > > init_coldboot: ipi init failed (error -1009)
> > > > >
> > > > > Just to note, because we use our own firmware that vendors in OpenSBI
> > > > > and compiles only a significantly cut down number of files from it, we
> > > > > do not use the fw_dynamic etc flow on our hardware. As a result, we
> > > > > have
> > > > > not tested v1.3, nor do we have any immediate plans to change our
> > > > > platform firmware to vendor v1.3 either.
> > > > >
> > > > > I unless there's something obvious to you, it sounds like I will need
> > > > > to
> > > > > go and bisect OpenSBI. That's a job for another day though, given the
> > > > > time.
> > > > >
> > >
> > > The real issue is some CPU/HART DT nodes marked as disabled in the
> > > DT passed to OpenSBI 1.3.
> > >
> > > This issue does not exist in any of the DTs generated by QEMU but some
> > > of the DTs in the kernel (such as microchip and SiFive board DTs) have
> > > the E-core disabled.
> > >
> > > I had discovered this issue in a totally different context after the
> > > OpenSBI 1.3
> > > release happened. This issue is already fixed in the latest OpenSBI by the
> > > following commit c6a35733b74aeff612398f274ed19a74f81d1f37 ("lib: utils:
> > > Fix sbi_hartid_to_scratch() usage in ACLINT drivers").
> >
> > Great, thanks Anup! I thought I had tested tip-of-tree too, but
> > obviously not.
> >
> > > I always assumed that Microchip hss.bin is the preferred BIOS for the
> > > QEMU microchip-icicle-kit machine but I guess that's not true.
> >
> > Unfortunately the HSS has not worked in QEMU for a long time, and while
> > I would love to fix it, but am pretty stretched for spare time to begin
> > with.
> > I usually just do direct kernel boots, which use the OpenSBI that comes
> > with QEMU, as I am sure you already know :)
> >
> > > At this point, you can either:
> > > 1) Use latest OpenSBI on QEMU microchip-icicle-kit machine
>
> I forgot to reply to this point, wondering what should be done with
> QEMU. Bumping to v1.3 in QEMU introduces a regression here, regardless
> of whether I can go and build a fixed version of OpenSBI.
>

FYI: The no-map fix went into OpenSBI v1.3. Without the upgrade, any user
running a recent kernel (> v6.4) may hit the random linear-map related
issues (in the hibernation or EFI boot path).

There are three possible scenarios:

1. Upgrade to OpenSBI v1.3: any user of the microchip-icicle-kit or
   sifive fu540 machines may hit this issue if the device tree has a
   disabled hart (the E-core).
2. No upgrade (stay on OpenSBI v1.2): any user using hibernation or UEFI
   may have issues [1].
3. Include a non-release version of OpenSBI in QEMU with the fix, as an
   exception.

#3 probably deviates from policy and sets a bad precedent, so I am not
advocating for it ;)

For both #1 & #2, the solution would be to pass the latest OpenSBI in the
-bios argument instead of the stock one. I could be wrong, but my guess is
that the number of users facing #2 would be higher than #1.

[1] https://lore.kernel.org/linux-riscv/20230625140931.1266216-1-songshuaish...@tinylab.org/

> > > 2) Ensure CPU0 DT node is enabled in DT when booting on QEMU
> > > microchip-icicle-kit machine with OpenSBI 1.3
> >
> > Will OpenSBI disable it? If not, I think option 2) needs to be remove
> > the DT node. I'll just use tip-of-tree myself & up to the
>
> Clearly didn't finish this comment. It was meant to say "up to the QEMU
> maintainers what they want to do on the QEMU side of things".
>
> Thanks,
> Conor.

--
Regards,
Atish
[PATCH] For curses display, recognize a few more control keys
The curses display handles most control-X keys, and translates
them into their corresponding keycode. Here we recognize
a few that are missing, Ctrl-@ (null), Ctrl-\ (backslash),
Ctrl-] (right bracket), Ctrl-^ (caret), Ctrl-_ (underscore).

Signed-off-by: Sean Estabrooks
---
 ui/curses_keys.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/ui/curses_keys.h b/ui/curses_keys.h
index 71e04acdc7..88a2208ed1 100644
--- a/ui/curses_keys.h
+++ b/ui/curses_keys.h
@@ -210,6 +210,12 @@ static const int _curses2keycode[CURSES_CHARS] = {
     ['N' - '@'] = 49 | CNTRL, /* Control + n */
     /* Control + m collides with the keycode for Enter */
 
+    ['@' - '@'] = 3 | CNTRL, /* Control + @ */
+    /* Control + [ collides with the keycode for Escape */
+    ['\\' - '@'] = 43 | CNTRL, /* Control + Backslash */
+    [']' - '@'] = 27 | CNTRL, /* Control + ] */
+    ['^' - '@'] = 7 | CNTRL, /* Control + ^ */
+    ['_' - '@'] = 12 | CNTRL, /* Control + Underscore */
 };
 
 static const int _curseskey2keycode[CURSES_KEYS] = {
--
2.40.1
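For readers following the index arithmetic: the `'X' - '@'` expressions in the table rely on the ASCII layout, where the control characters 0 to 31 are the characters '@' through '_' with 0x40 subtracted. Below is a minimal sketch of that arithmetic only. It is not QEMU code, and `ctrl_code` is a hypothetical helper name introduced for illustration.

```c
/*
 * Hypothetical helper (not part of QEMU): the ASCII code a terminal
 * sends for Ctrl + c, where c is one of '@', 'A'..'Z', '[', '\\',
 * ']', '^' or '_'.  Returns -1 for characters outside that range.
 */
static int ctrl_code(char c)
{
    if (c < '@' || c > '_') {
        return -1;
    }
    return c - '@';
}
```

With this helper, Ctrl-[ comes out as 27 (Escape) and Ctrl-M as 13 (carriage return), which matches the two collisions the table comments mention.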
[PULL 06/47] linux-user: Use abi_llong not int64_t in syscall_defs.h
Be careful not to change linux_dirent64, which is a host structure. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 0af7249330..2846a8cfa5 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -1455,8 +1455,8 @@ struct target_stat64 { unsigned char __pad2[6]; unsigned short st_rdev; -int64_t st_size; -int64_t st_blksize; +abi_llong st_size; +abi_llong st_blksize; unsigned char __pad4[4]; unsigned intst_blocks; @@ -1514,7 +1514,7 @@ struct target_stat64 { unsigned char __pad3[8]; -int64_t st_size; +abi_llong st_size; unsigned intst_blksize; unsigned char __pad4[8]; @@ -1630,10 +1630,10 @@ struct QEMU_PACKED target_stat64 { abi_ullong st_rdev; abi_ullong __pad1; -int64_t st_size; +abi_llong st_size; abi_int st_blksize; abi_uint __pad2; -int64_t st_blocks; /* Number 512-byte blocks allocated. 
*/ +abi_llong st_blocks; inttarget_st_atime; unsigned int target_st_atime_nsec; @@ -1760,7 +1760,7 @@ struct target_stat { int st_gid; abi_ulongst_rdev; abi_ulongst_pad1[3]; /* Reserved for st_rdev expansion */ -int64_t st_size; +abi_llongst_size; abi_long target_st_atime; abi_ulongtarget_st_atime_nsec; /* Reserved for st_atime expansion */ abi_long target_st_mtime; @@ -1769,7 +1769,7 @@ struct target_stat { abi_ulongtarget_st_ctime_nsec; /* Reserved for st_ctime expansion */ abi_ulongst_blksize; abi_ulongst_pad2; -int64_t st_blocks; +abi_llongst_blocks; }; #elif defined(TARGET_ABI_MIPSO32) @@ -1824,7 +1824,7 @@ struct target_stat64 { abi_ulong st_rdev; abi_ulong st_pad1[3]; /* Reserved for st_rdev expansion */ -int64_t st_size; +abi_llong st_size; /* * Actually this should be timestruc_t st_atime, st_mtime and st_ctime @@ -1842,7 +1842,7 @@ struct target_stat64 { abi_ulong st_blksize; abi_ulong st_pad2; -int64_t st_blocks; +abi_llong st_blocks; }; #elif defined(TARGET_ALPHA) @@ -2051,7 +2051,7 @@ struct target_stat64 { unsigned int st_uid; /* User ID of the file's owner. */ unsigned int st_gid; /* Group ID of the file's group. */ abi_ullong st_rdev; /* Device number, if device. */ -int64_t st_size;/* Size of file, in bytes. */ +abi_llong st_size; /* Size of file, in bytes. */ abi_ulong st_blksize; /* Optimal block size for I/O. */ abi_ulong __unused2; abi_ullong st_blocks; /* Number 512-byte blocks allocated. 
*/ @@ -2105,10 +2105,10 @@ struct target_stat64 { unsigned int st_gid; abi_ullong st_rdev; abi_ullong __pad1; -int64_t st_size; +abi_llong st_size; int st_blksize; int __pad2; -int64_t st_blocks; +abi_llong st_blocks; int target_st_atime; unsigned int target_st_atime_nsec; int target_st_mtime; @@ -2165,9 +2165,9 @@ struct target_stat64 { abi_uint st_gid; abi_ullong st_rdev; abi_uint _pad2; -int64_tst_size; +abi_llong st_size; abi_intst_blksize; -int64_tst_blocks; +abi_llong st_blocks; abi_inttarget_st_atime; abi_uint target_st_atime_nsec; abi_inttarget_st_mtime; @@ -2790,7 +2790,7 @@ struct target_user_cap_data { #define TARGET_SYSLOG_ACTION_SIZE_BUFFER 10 struct target_statx_timestamp { -int64_t tv_sec; +abi_llong tv_sec; abi_uint tv_nsec; abi_int __reserved; }; -- 2.34.1
[PULL 15/47] include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for nios2
Based on gcc's nios2.h setting BIGGEST_ALIGNMENT to 32 bits.

Signed-off-by: Richard Henderson
---
 include/exec/user/abitypes.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/exec/user/abitypes.h b/include/exec/user/abitypes.h
index beba0a48c7..6191ce9f74 100644
--- a/include/exec/user/abitypes.h
+++ b/include/exec/user/abitypes.h
@@ -17,7 +17,8 @@
 
 #if (defined(TARGET_I386) && !defined(TARGET_X86_64)) \
     || defined(TARGET_SH4) \
-    || defined(TARGET_MICROBLAZE)
+    || defined(TARGET_MICROBLAZE) \
+    || defined(TARGET_NIOS2)
 #define ABI_LLONG_ALIGNMENT 4
 #endif
 
--
2.34.1
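To illustrate why the alignment value matters for guest structure layout, here is a small sketch assuming a GCC or Clang host compiler. The type and struct names (`llong_align4`, `llong_align8`, `demo_align4`, `demo_align8`) are invented for this example and do not appear in QEMU. The sketch only shows how a 64-bit member lands at a different offset when its alignment is 4 rather than 8.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: GCC/Clang "aligned" attribute on a typedef, which may
 * lower as well as raise the alignment of the underlying type.
 */
typedef int64_t llong_align4 __attribute__((aligned(4)));
typedef int64_t llong_align8 __attribute__((aligned(8)));

/* A 32-bit field followed by a 64-bit field aligned to 4 bytes. */
struct demo_align4 {
    int32_t tag;
    llong_align4 size;   /* no padding inserted before this member */
};

/* The same fields, with the 64-bit field aligned to 8 bytes. */
struct demo_align8 {
    int32_t tag;
    llong_align8 size;   /* 4 bytes of padding precede this member */
};
```

If a guest ABI aligns 64-bit integers to 4 bytes and the emulator's struct uses 8, every field after the first 64-bit member would be read from the wrong offset, which is the kind of mismatch the one-line change above is meant to avoid.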
[PULL 05/47] linux-user: Use abi_ullong not uint64_t in syscall_defs.h
Be careful not to change linux_dirent64, which is a host structure. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 72 +++ 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index caaa895bec..0af7249330 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -1444,8 +1444,8 @@ struct target_stat64 { unsigned char __pad0[6]; unsigned short st_dev; -uint64_tst_ino; -uint64_tst_nlink; +abi_ullong st_ino; +abi_ullong st_nlink; unsigned intst_mode; @@ -1501,7 +1501,7 @@ struct target_stat64 { unsigned char __pad0[6]; unsigned short st_dev; -uint64_t st_ino; +abi_ullong st_ino; unsigned intst_mode; unsigned intst_nlink; @@ -1618,7 +1618,7 @@ struct target_stat { /* FIXME: Microblaze no-mmu user-space has a difference stat64 layout... */ #define TARGET_HAS_STRUCT_STAT64 struct QEMU_PACKED target_stat64 { -uint64_t st_dev; +abi_ullong st_dev; #define TARGET_STAT64_HAS_BROKEN_ST_INO 1 abi_uint pad0; abi_uint __st_ino; @@ -1627,8 +1627,8 @@ struct QEMU_PACKED target_stat64 { abi_uint st_nlink; abi_uint st_uid; abi_uint st_gid; -uint64_t st_rdev; -uint64_t __pad1; +abi_ullong st_rdev; +abi_ullong __pad1; int64_t st_size; abi_int st_blksize; @@ -1641,7 +1641,7 @@ struct QEMU_PACKED target_stat64 { unsigned int target_st_mtime_nsec; inttarget_st_ctime; unsigned int target_st_ctime_nsec; -uint64_t st_ino; +abi_ullong st_ino; }; #elif defined(TARGET_M68K) @@ -1753,7 +1753,7 @@ struct target_stat { struct target_stat { abi_ulongst_dev; abi_ulongst_pad0[3]; /* Reserved for st_dev expansion */ -uint64_t st_ino; +abi_ullong st_ino; unsigned int st_mode; unsigned int st_nlink; int st_uid; @@ -1813,7 +1813,7 @@ struct target_stat64 { abi_ulong st_dev; abi_ulong st_pad0[3]; /* Reserved for st_dev expansion */ -uint64_tst_ino; +abi_ullong st_ino; unsigned intst_mode; unsigned intst_nlink; @@ -2044,17 +2044,17 @@ struct target_stat { #define 
TARGET_HAS_STRUCT_STAT64 struct target_stat64 { -uint64_t st_dev;/* Device */ -uint64_t st_ino;/* File serial number */ +abi_ullong st_dev; /* Device */ +abi_ullong st_ino; /* File serial number */ unsigned int st_mode; /* File mode. */ unsigned int st_nlink; /* Link count. */ unsigned int st_uid; /* User ID of the file's owner. */ unsigned int st_gid; /* Group ID of the file's group. */ -uint64_t st_rdev; /* Device number, if device. */ +abi_ullong st_rdev; /* Device number, if device. */ int64_t st_size;/* Size of file, in bytes. */ abi_ulong st_blksize; /* Optimal block size for I/O. */ abi_ulong __unused2; -uint64_t st_blocks; /* Number 512-byte blocks allocated. */ +abi_ullong st_blocks; /* Number 512-byte blocks allocated. */ abi_ulong target_st_atime; /* Time of last access. */ abi_ulong target_st_atime_nsec; abi_ulong target_st_mtime; /* Time of last modification. */ @@ -2097,14 +2097,14 @@ struct target_stat { #if !defined(TARGET_RISCV64) #define TARGET_HAS_STRUCT_STAT64 struct target_stat64 { -uint64_t st_dev; -uint64_t st_ino; +abi_ullong st_dev; +abi_ullong st_ino; unsigned int st_mode; unsigned int st_nlink; unsigned int st_uid; unsigned int st_gid; -uint64_t st_rdev; -uint64_t __pad1; +abi_ullong st_rdev; +abi_ullong __pad1; int64_t st_size; int st_blksize; int __pad2; @@ -2156,14 +2156,14 @@ struct target_stat { #define TARGET_HAS_STRUCT_STAT64 struct target_stat64 { -uint64_t st_dev; +abi_ullong st_dev; abi_uint _pad1; abi_uint _res1; abi_uint st_mode; abi_uint st_nlink; abi_uint st_uid; abi_uint st_gid; -uint64_t st_rdev; +abi_ullong st_rdev; abi_uint _pad2; int64_tst_size; abi_intst_blksize; @@ -2174,7 +2174,7 @@ struct target_stat64 { abi_uint target_st_mtime_nsec; abi_inttarget_st_ctime; abi_uint target_st_ctime_nsec; -uint64_t st_ino; +abi_ullong st_ino; }; #elif defined(TARGET_LOONGARCH64) @@ -2231,11 +2231,11 @@ struct target_statfs64 { abi_uintf_bsize; abi_uintf_frsize; /* Fragment size - unsupported */ abi_uint__pad; -uint64_tf_blocks; 
-uint64_tf_bfree; -uint64_tf_files; -uint64_tf_ffree; -uint64_tf_bavail; +abi_ullong f_blocks; +a
[PULL 11/47] linux-user: Use abi_ushort not unsigned short in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 90 +++ 1 file changed, 45 insertions(+), 45 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 442a8aefe3..21ca03b0f4 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -432,7 +432,7 @@ typedef struct { struct target_dirent { abi_longd_ino; abi_longd_off; -unsigned short d_reclen; +abi_ushort d_reclen; chard_name[]; }; @@ -1210,19 +1210,19 @@ struct target_rtc_pll_info { #define TARGET_NCC 8 struct target_termio { -unsigned short c_iflag; /* input mode flags */ -unsigned short c_oflag; /* output mode flags */ -unsigned short c_cflag; /* control mode flags */ -unsigned short c_lflag; /* local mode flags */ +abi_ushort c_iflag; /* input mode flags */ +abi_ushort c_oflag; /* output mode flags */ +abi_ushort c_cflag; /* control mode flags */ +abi_ushort c_lflag; /* local mode flags */ unsigned char c_line; /* line discipline */ unsigned char c_cc[TARGET_NCC]; /* control characters */ }; struct target_winsize { -unsigned short ws_row; -unsigned short ws_col; -unsigned short ws_xpixel; -unsigned short ws_ypixel; +abi_ushort ws_row; +abi_ushort ws_col; +abi_ushort ws_xpixel; +abi_ushort ws_ypixel; }; #include "termbits.h" @@ -1328,15 +1328,15 @@ struct target_winsize { || defined(TARGET_CRIS) #define TARGET_STAT_HAVE_NSEC struct target_stat { -unsigned short st_dev; -unsigned short __pad1; +abi_ushort st_dev; +abi_ushort __pad1; abi_ulong st_ino; -unsigned short st_mode; -unsigned short st_nlink; -unsigned short st_uid; -unsigned short st_gid; -unsigned short st_rdev; -unsigned short __pad2; +abi_ushort st_mode; +abi_ushort st_nlink; +abi_ushort st_uid; +abi_ushort st_gid; +abi_ushort st_rdev; +abi_ushort __pad2; abi_ulong st_size; abi_ulong st_blksize; abi_ulong st_blocks; @@ -1355,7 +1355,7 @@ struct target_stat { */ #define TARGET_HAS_STRUCT_STAT64 struct target_stat64 { -unsigned short st_dev; +abi_ushort 
st_dev; unsigned char __pad0[10]; #define TARGET_STAT64_HAS_BROKEN_ST_INO 1 @@ -1367,7 +1367,7 @@ struct target_stat64 { abi_ulong st_uid; abi_ulong st_gid; -unsigned short st_rdev; +abi_ushort st_rdev; unsigned char __pad3[10]; abi_llong st_size; @@ -1442,7 +1442,7 @@ struct target_stat { #define TARGET_HAS_STRUCT_STAT64 struct target_stat64 { unsigned char __pad0[6]; -unsigned short st_dev; +abi_ushort st_dev; abi_ullong st_ino; abi_ullong st_nlink; @@ -1453,7 +1453,7 @@ struct target_stat64 { abi_uintst_gid; unsigned char __pad2[6]; -unsigned short st_rdev; +abi_ushort st_rdev; abi_llong st_size; abi_llong st_blksize; @@ -1477,13 +1477,13 @@ struct target_stat64 { #define TARGET_STAT_HAVE_NSEC struct target_stat { -unsigned short st_dev; +abi_ushort st_dev; abi_ulong st_ino; -unsigned short st_mode; +abi_ushort st_mode; short st_nlink; -unsigned short st_uid; -unsigned short st_gid; -unsigned short st_rdev; +abi_ushort st_uid; +abi_ushort st_gid; +abi_ushort st_rdev; abi_longst_size; abi_longtarget_st_atime; abi_ulong target_st_atime_nsec; @@ -1499,7 +1499,7 @@ struct target_stat { #define TARGET_HAS_STRUCT_STAT64 struct target_stat64 { unsigned char __pad0[6]; -unsigned short st_dev; +abi_ushort st_dev; abi_ullong st_ino; @@ -1510,7 +1510,7 @@ struct target_stat64 { abi_uintst_gid; unsigned char __pad2[6]; -unsigned short st_rdev; +abi_ushort st_rdev; unsigned char __pad3[8]; @@ -1544,7 +1544,7 @@ struct target_stat { abi_uint st_mode; #else abi_uint st_mode; -unsigned short st_nlink; +abi_ushort st_nlink; #endif abi_uint st_uid; abi_uint st_gid; @@ -1598,7 +1598,7 @@ struct target_stat { abi_ulong st_dev; abi_ulong st_ino; abi_uint st_mode; -unsigned short st_nlink; +abi_ushort st_nlink; abi_uint st_uid; abi_uint st_gid; abi_ulong st_rdev; @@ -1647,15 +1647,15 @@ struct QEMU_PACKED target_stat64 { #elif defined(TARGET_M68K) struct target_stat { -unsigned short st_dev; -unsigned short __pad1; -abi_ulong st_ino; -unsigned short st_mode; -unsigned short st_nlink; 
-unsigned short st_uid; -unsigned short st_gid; -unsigned short st_rdev; -unsigned short __pad2; +abi_u
[PULL 00/47] tcg + linux-user patch queue
The following changes since commit 4633c1e2c576fbabfe5c8c93f4b842504b69c096:

  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging (2023-07-14 16:39:46 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230715

for you to fetch changes up to 76f9d6ad19494290eb2f00d33c6a582ce3447991:

  tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128 (2023-07-15 08:02:49 +0100)

tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128
accel/tcg: Introduce page_check_range_empty
accel/tcg: Introduce page_find_range_empty
accel/tcg: Accept more page flags in page_check_range
accel/tcg: Return bool from page_check_range
accel/tcg: Always lock pages before translation
linux-user: Use abi_* types for target structures in syscall_defs.h
linux-user: Fix abi_llong alignment for microblaze and nios2
linux-user: Fix do_shmat type errors
linux-user: Implement execve without execveat
linux-user: Make sure initial brk is aligned
linux-user: Use a mask with strace flags
linux-user: Implement MAP_FIXED_NOREPLACE
linux-user: Widen target_mmap offset argument to off_t
linux-user: Use page_find_range_empty for mmap_find_vma_reserved
linux-user: Use 'last' instead of 'end' in target_mmap and subroutines
linux-user: Remove can_passthrough_madvise
linux-user: Simplify target_madvise
linux-user: Drop uint and ulong types
linux-user/arm: Do not allocate a commpage at all for M-profile CPUs
bsd-user: Use page_check_range_empty for MAP_EXCL
bsd-user: Use page_find_range_empty for mmap_find_vma_reserved

Andreas Schwab (1):
      linux-user: Make sure initial brk(0) is page-aligned

Juan Quintela (1):
      linux-user: Drop uint and ulong

Philippe Mathieu-Daudé (1):
      linux-user/arm: Do not allocate a commpage at all for M-profile CPUs

Pierrick Bouvier (1):
      linux-user/syscall: Implement execve without execveat

Richard Henderson (43):
      linux-user: Reformat syscall_defs.h
      linux-user: Remove #if 0 block in syscall_defs.h
      linux-user: Use abi_uint not uint32_t in syscall_defs.h
      linux-user: Use abi_int not int32_t in syscall_defs.h
      linux-user: Use abi_ullong not uint64_t in syscall_defs.h
      linux-user: Use abi_llong not int64_t in syscall_defs.h
      linux-user: Use abi_uint not unsigned int in syscall_defs.h
      linux-user: Use abi_ullong not unsigned long long in syscall_defs.h
      linux-user: Use abi_llong not long long in syscall_defs.h
      linux-user: Use abi_int not int in syscall_defs.h
      linux-user: Use abi_ushort not unsigned short in syscall_defs.h
      linux-user: Use abi_short not short in syscall_defs.h
      linux-user: Use abi_uint not unsigned in syscall_defs.h
      include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for microblaze
      include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for nios2
      linux-user: Fix do_shmat type errors
      accel/tcg: Split out cpu_exec_longjmp_cleanup
      tcg: Fix info_in_idx increment in layout_arg_by_ref
      linux-user: Fix formatting of mmap.c
      linux-user/strace: Expand struct flags to hold a mask
      linux-user: Split TARGET_MAP_* out of syscall_defs.h
      linux-user: Split TARGET_PROT_* out of syscall_defs.h
      linux-user: Populate more bits in mmap_flags_tbl
      accel/tcg: Introduce page_check_range_empty
      bsd-user: Use page_check_range_empty for MAP_EXCL
      linux-user: Implement MAP_FIXED_NOREPLACE
      linux-user: Split out target_to_host_prot
      linux-user: Widen target_mmap offset argument to off_t
      linux-user: Rewrite target_mprotect
      linux-user: Rewrite mmap_frag
      accel/tcg: Introduce page_find_range_empty
      bsd-user: Use page_find_range_empty for mmap_find_vma_reserved
      linux-user: Use page_find_range_empty for mmap_find_vma_reserved
      linux-user: Use 'last' instead of 'end' in target_mmap
      linux-user: Rewrite mmap_reserve
      linux-user: Rename mmap_reserve to mmap_reserve_or_unmap
      linux-user: Simplify target_munmap
      accel/tcg: Accept more page flags in page_check_range
      accel/tcg: Return bool from page_check_range
      linux-user: Remove can_passthrough_madvise
      linux-user: Simplify target_madvise
      accel/tcg: Always lock pages before translation
      tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128

 accel/tcg/internal.h               | 30 +-
 accel/tcg/tcg-runtime.h            |  2 +-
 bsd-user/qemu.h                    |  2 +-
 include/exec/cpu-all.h             | 40 +-
 include/exec/helper-proto-common.h |  2 +
 include/exec/user/abitypes.h       |  5 +-
 linux-user/aarch64/target_mman.h   |  8 +
 linux-user/alpha/target_mman.h     | 13 +
 linux-user/generic/target_mman.h   | 58 +
 linux-user/hppa/target_mman.h      | 10 +
 linux-user/mips/target_mman.h      |
[PULL 39/47] linux-user: Simplify target_munmap
All of the guest to host page adjustment is handled by
mmap_reserve_or_unmap; there is no need to duplicate that.
There are no failure modes for munmap after alignment and
guest address range have been validated.

Signed-off-by: Richard Henderson
Message-Id: <20230707204054.8792-23-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 47 ++++-------------------------------------------
 1 file changed, 4 insertions(+), 43 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 22c2869be8..c0946322fb 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -789,9 +789,6 @@ static void mmap_reserve_or_unmap(abi_ulong start, abi_ulong len)
 
 int target_munmap(abi_ulong start, abi_ulong len)
 {
-    abi_ulong end, real_start, real_end, addr;
-    int prot, ret;
-
     trace_target_munmap(start, len);
 
     if (start & ~TARGET_PAGE_MASK) {
@@ -803,47 +800,11 @@ int target_munmap(abi_ulong start, abi_ulong len)
     }
 
     mmap_lock();
-    end = start + len;
-    real_start = start & qemu_host_page_mask;
-    real_end = HOST_PAGE_ALIGN(end);
-
-    if (start > real_start) {
-        /* handle host page containing start */
-        prot = 0;
-        for (addr = real_start; addr < start; addr += TARGET_PAGE_SIZE) {
-            prot |= page_get_flags(addr);
-        }
-        if (real_end == real_start + qemu_host_page_size) {
-            for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) {
-                prot |= page_get_flags(addr);
-            }
-            end = real_end;
-        }
-        if (prot != 0) {
-            real_start += qemu_host_page_size;
-        }
-    }
-    if (end < real_end) {
-        prot = 0;
-        for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) {
-            prot |= page_get_flags(addr);
-        }
-        if (prot != 0) {
-            real_end -= qemu_host_page_size;
-        }
-    }
-
-    ret = 0;
-    /* unmap what we can */
-    if (real_start < real_end) {
-        mmap_reserve_or_unmap(real_start, real_end - real_start);
-    }
-
-    if (ret == 0) {
-        page_set_flags(start, start + len - 1, 0);
-    }
+    mmap_reserve_or_unmap(start, len);
+    page_set_flags(start, start + len - 1, 0);
     mmap_unlock();
-    return ret;
+
+    return 0;
 }
 
 abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
--
2.34.1
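The retained page_set_flags() call passes an inclusive upper bound (start + len - 1). One property of that convention can be sketched as follows, assuming a 32-bit guest address space. The names `guest_addr`, `range_end` and `range_last` are hypothetical and chosen for this illustration. This is not a claim about why the call is written that way in QEMU, only a demonstration of the arithmetic.

```c
#include <stdint.h>

typedef uint32_t guest_addr;   /* assumption: 32-bit guest addresses */

/* Exclusive end: wraps to 0 when the range reaches the top of memory. */
static guest_addr range_end(guest_addr start, guest_addr len)
{
    return start + len;
}

/* Inclusive last byte: stays representable for any non-empty range. */
static guest_addr range_last(guest_addr start, guest_addr len)
{
    return start + len - 1;
}
```

For the final 4 KiB page of a 32-bit space (start 0xFFFFF000, length 0x1000), the exclusive end wraps around to 0 while the inclusive last address is 0xFFFFFFFF, so comparisons against the inclusive bound keep working.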
[PULL 04/47] linux-user: Use abi_int not int32_t in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 60 +++ 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 414d88a9ec..caaa895bec 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -501,7 +501,7 @@ int do_sigaction(int sig, const struct target_sigaction *act, #endif #if defined(TARGET_ALPHA) -typedef int32_t target_old_sa_flags; +typedef abi_int target_old_sa_flags; #else typedef abi_ulong target_old_sa_flags; #endif @@ -1631,7 +1631,7 @@ struct QEMU_PACKED target_stat64 { uint64_t __pad1; int64_t st_size; -int32_t st_blksize; +abi_int st_blksize; abi_uint __pad2; int64_t st_blocks; /* Number 512-byte blocks allocated. */ @@ -2192,20 +2192,20 @@ typedef struct { #ifdef TARGET_MIPS #ifdef TARGET_ABI_MIPSN32 struct target_statfs { -int32_t f_type; -int32_t f_bsize; -int32_t f_frsize; /* Fragment size - unsupported */ -int32_t f_blocks; -int32_t f_bfree; -int32_t f_files; -int32_t f_ffree; -int32_t f_bavail; +abi_int f_type; +abi_int f_bsize; +abi_int f_frsize; /* Fragment size - unsupported */ +abi_int f_blocks; +abi_int f_bfree; +abi_int f_files; +abi_int f_ffree; +abi_int f_bavail; /* Linux specials */ target_fsid_t f_fsid; -int32_t f_namelen; -int32_t f_flags; -int32_t f_spare[5]; +abi_int f_namelen; +abi_int f_flags; +abi_int f_spare[5]; }; #else struct target_statfs { @@ -2276,34 +2276,34 @@ struct target_statfs64 { }; #elif defined(TARGET_S390X) struct target_statfs { -int32_t f_type; -int32_t f_bsize; +abi_int f_type; +abi_int f_bsize; abi_long f_blocks; abi_long f_bfree; abi_long f_bavail; abi_long f_files; abi_long f_ffree; kernel_fsid_t f_fsid; -int32_t f_namelen; -int32_t f_frsize; -int32_t f_flags; -int32_t f_spare[4]; +abi_int f_namelen; +abi_int f_frsize; +abi_int f_flags; +abi_int f_spare[4]; }; struct target_statfs64 { -int32_t f_type; -int32_t f_bsize; +abi_int f_type; +abi_int f_bsize; abi_long f_blocks; 
abi_long f_bfree; abi_long f_bavail; abi_long f_files; abi_long f_ffree; kernel_fsid_t f_fsid; -int32_t f_namelen; -int32_t f_frsize; -int32_t f_flags; -int32_t f_spare[4]; +abi_int f_namelen; +abi_int f_frsize; +abi_int f_flags; +abi_int f_spare[4]; }; #else struct target_statfs { @@ -2718,21 +2718,21 @@ struct target_ucred { abi_uint gid; }; -typedef int32_t target_timer_t; +typedef abi_int target_timer_t; #define TARGET_SIGEV_MAX_SIZE 64 /* This is architecture-specific but most architectures use the default */ #ifdef TARGET_MIPS -#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(int32_t) * 2 + sizeof(abi_long)) +#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(abi_int) * 2 + sizeof(abi_long)) #else -#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(int32_t) * 2 \ +#define TARGET_SIGEV_PREAMBLE_SIZE (sizeof(abi_int) * 2 \ + sizeof(target_sigval_t)) #endif #define TARGET_SIGEV_PAD_SIZE ((TARGET_SIGEV_MAX_SIZE \ - TARGET_SIGEV_PREAMBLE_SIZE) \ - / sizeof(int32_t)) + / sizeof(abi_int)) struct target_sigevent { target_sigval_t sigev_value; @@ -2792,7 +2792,7 @@ struct target_user_cap_data { struct target_statx_timestamp { int64_t tv_sec; abi_uint tv_nsec; -int32_t __reserved; +abi_int __reserved; }; struct target_statx { -- 2.34.1
[PULL 18/47] accel/tcg: Split out cpu_exec_longjmp_cleanup
Share the setjmp cleanup between cpu_exec_step_atomic
and cpu_exec_setjmp.

Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Reviewed-by: Richard W.M. Jones
Signed-off-by: Richard Henderson
---
 accel/tcg/cpu-exec.c | 43 +++++++++++++++++++------------------------------
 1 file changed, 19 insertions(+), 24 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index ba1890a373..31aa320513 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -526,6 +526,23 @@ static void cpu_exec_exit(CPUState *cpu)
     }
 }
 
+static void cpu_exec_longjmp_cleanup(CPUState *cpu)
+{
+    /* Non-buggy compilers preserve this; assert the correct value. */
+    g_assert(cpu == current_cpu);
+
+#ifdef CONFIG_USER_ONLY
+    clear_helper_retaddr();
+    if (have_mmap_lock()) {
+        mmap_unlock();
+    }
+#endif
+    if (qemu_mutex_iothread_locked()) {
+        qemu_mutex_unlock_iothread();
+    }
+    assert_no_pages_locked();
+}
+
 void cpu_exec_step_atomic(CPUState *cpu)
 {
     CPUArchState *env = cpu->env_ptr;
@@ -568,16 +585,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
         cpu_tb_exec(cpu, tb, &tb_exit);
         cpu_exec_exit(cpu);
     } else {
-#ifdef CONFIG_USER_ONLY
-        clear_helper_retaddr();
-        if (have_mmap_lock()) {
-            mmap_unlock();
-        }
-#endif
-        if (qemu_mutex_iothread_locked()) {
-            qemu_mutex_unlock_iothread();
-        }
-        assert_no_pages_locked();
+        cpu_exec_longjmp_cleanup(cpu);
     }
 
     /*
@@ -1023,20 +1031,7 @@ static int cpu_exec_setjmp(CPUState *cpu, SyncClocks *sc)
 {
     /* Prepare setjmp context for exception handling. */
     if (unlikely(sigsetjmp(cpu->jmp_env, 0) != 0)) {
-        /* Non-buggy compilers preserve this; assert the correct value. */
-        g_assert(cpu == current_cpu);
-
-#ifdef CONFIG_USER_ONLY
-        clear_helper_retaddr();
-        if (have_mmap_lock()) {
-            mmap_unlock();
-        }
-#endif
-        if (qemu_mutex_iothread_locked()) {
-            qemu_mutex_unlock_iothread();
-        }
-
-        assert_no_pages_locked();
+        cpu_exec_longjmp_cleanup(cpu);
     }
 
     return cpu_exec_loop(cpu, sc);
--
2.34.1
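The pattern being factored out here (release whatever was still held when control arrives via a long jump) can be sketched in isolation. This is a simplified illustration, not QEMU code: it uses plain setjmp/longjmp from ISO C rather than sigsetjmp, a boolean flag stands in for a real lock, and every `demo_` name is hypothetical.

```c
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf demo_env;        /* stand-in for a per-CPU jump buffer */
static bool demo_lock_held;     /* stand-in for a lock such as the mmap lock */

/* Shared cleanup, called from every landing site of the long jump. */
static void demo_longjmp_cleanup(void)
{
    if (demo_lock_held) {
        demo_lock_held = false;   /* "unlock" */
    }
}

/* Takes the lock, then bails out through longjmp without unlocking. */
static void demo_fault(void)
{
    demo_lock_held = true;
    longjmp(demo_env, 1);
}

/* Returns 1 if the jump path ran and the cleanup released the lock. */
static int demo_run(void)
{
    if (setjmp(demo_env) != 0) {
        demo_longjmp_cleanup();
        return demo_lock_held ? 0 : 1;
    }
    demo_fault();
    return 0;   /* not reached: demo_fault() never returns */
}
```

Keeping the cleanup in one function means a second landing site only needs a single call, which is the shape of the change in the diff above.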
[PULL 02/47] linux-user: Remove #if 0 block in syscall_defs.h
These definitions are in sparc/signal.c.

Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 linux-user/syscall_defs.h | 24 ------------------------
 1 file changed, 24 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index e80d54780b..a4e4df8d3e 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -547,30 +547,6 @@ typedef union target_sigval {
     int sival_int;
     abi_ulong sival_ptr;
 } target_sigval_t;
-#if 0
-#if defined (TARGET_SPARC)
-typedef struct {
-    struct {
-        abi_ulong psr;
-        abi_ulong pc;
-        abi_ulong npc;
-        abi_ulong y;
-        abi_ulong u_regs[16]; /* globals and ins */
-    } si_regs;
-    int si_mask;
-} __siginfo_t;
-
-typedef struct {
-    unsigned long si_float_regs [32];
-    unsigned long si_fsr;
-    unsigned long si_fpqdepth;
-    struct {
-        unsigned long *insn_addr;
-        unsigned long insn;
-    } si_fpqueue [16];
-} __siginfo_fpu_t;
-#endif
-#endif
 
 #define TARGET_SI_MAX_SIZE 128
 
--
2.34.1
[PULL 30/47] linux-user: Widen target_mmap offset argument to off_t
We build with _FILE_OFFSET_BITS=64, so off_t = off64_t = uint64_t. With an extra cast, this fixes emulation of mmap2, which could overflow the computation of the full value of offset. Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-14-richard.hender...@linaro.org> --- linux-user/user-mmap.h | 2 +- linux-user/mmap.c | 14 -- linux-user/syscall.c | 2 +- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/linux-user/user-mmap.h b/linux-user/user-mmap.h index 480ce1c114..3fc986f92f 100644 --- a/linux-user/user-mmap.h +++ b/linux-user/user-mmap.h @@ -20,7 +20,7 @@ int target_mprotect(abi_ulong start, abi_ulong len, int prot); abi_long target_mmap(abi_ulong start, abi_ulong len, int prot, - int flags, int fd, abi_ulong offset); + int flags, int fd, off_t offset); int target_munmap(abi_ulong start, abi_ulong len); abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size, abi_ulong new_size, unsigned long flags, diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 12b1308a83..b2c2d85857 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -196,7 +196,7 @@ error: /* map an incomplete host page */ static int mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong end, - int prot, int flags, int fd, abi_ulong offset) + int prot, int flags, int fd, off_t offset) { abi_ulong real_end, addr; void *host_start; @@ -463,11 +463,12 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align) /* NOTE: all the constants are the HOST ones */ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, - int flags, int fd, abi_ulong offset) + int flags, int fd, off_t offset) { -abi_ulong ret, end, real_start, real_end, retaddr, host_offset, host_len, +abi_ulong ret, end, real_start, real_end, retaddr, host_len, passthrough_start = -1, passthrough_end = -1; int page_flags; +off_t host_offset; mmap_lock(); trace_target_mmap(start, len, target_prot, flags, fd, offset); @@ -559,7 +560,7 @@ 
abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, } if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) { -unsigned long host_start; +uintptr_t host_start; int host_prot; void *p; @@ -578,7 +579,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, goto fail; } /* update start so that it points to the file position at 'offset' */ -host_start = (unsigned long)p; +host_start = (uintptr_t)p; if (!(flags & MAP_ANONYMOUS)) { p = mmap(g2h_untagged(start), len, host_prot, flags | MAP_FIXED, fd, host_offset); @@ -681,7 +682,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, /* map the middle (easier) */ if (real_start < real_end) { void *p; -unsigned long offset1; +off_t offset1; + if (flags & MAP_ANONYMOUS) { offset1 = 0; } else { diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 3a89f6b408..a80d33ecf2 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -10591,7 +10591,7 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1, #endif ret = target_mmap(arg1, arg2, arg3, target_to_host_bitmask(arg4, mmap_flags_tbl), - arg5, arg6 << MMAP_SHIFT); + arg5, (off_t)(abi_ulong)arg6 << MMAP_SHIFT); return get_errno(ret); #endif case TARGET_NR_munmap: -- 2.34.1
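The overflow described in the commit message can be reproduced in isolation. The sketch below is not QEMU code: `abi_ulong32`, `MMAP_SHIFT_DEMO`, `offset_narrow` and `offset_wide` are hypothetical names, and it assumes a 32-bit guest whose mmap2 passes the file offset in 4096-byte units. It uses `int64_t` in place of `off_t` so that it does not depend on `_FILE_OFFSET_BITS`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: a 32-bit guest word and a mmap2 unit of 4096 bytes. */
typedef uint32_t abi_ulong32;
#define MMAP_SHIFT_DEMO 12

/* Shift performed at guest width: the high bits are lost before widening. */
static int64_t offset_narrow(abi_ulong32 pgoff)
{
    abi_ulong32 truncated = pgoff << MMAP_SHIFT_DEMO;
    return (int64_t)truncated;
}

/* Widen to 64 bits first, then shift: the full byte offset is preserved. */
static int64_t offset_wide(abi_ulong32 pgoff)
{
    return (int64_t)pgoff << MMAP_SHIFT_DEMO;
}
```

For a page offset of 0x100000 (a 4 GiB byte offset), `offset_narrow` wraps to 0 while `offset_wide` yields 0x100000000, which is the effect of the extra cast added in syscall.c.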
[PULL 10/47] linux-user: Use abi_int not int in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 216 +++--- 1 file changed, 108 insertions(+), 108 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index e4fcbd16d2..442a8aefe3 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -361,7 +361,7 @@ struct target_iovec { struct target_msghdr { abi_long msg_name; /* Socket name */ -int msg_namelen;/* Length of name */ +abi_int msg_namelen;/* Length of name */ abi_long msg_iov;/* Data blocks */ abi_long msg_iovlen; /* Number of blocks*/ abi_long msg_control;/* Per protocol magic (eg BSD file descriptor passing) */ @@ -371,8 +371,8 @@ struct target_msghdr { struct target_cmsghdr { abi_long cmsg_len; -int cmsg_level; -int cmsg_type; +abi_int cmsg_level; +abi_int cmsg_type; }; #define TARGET_CMSG_DATA(cmsg) ((unsigned char *) ((struct target_cmsghdr *) (cmsg) + 1)) @@ -426,7 +426,7 @@ struct target_rusage { }; typedef struct { -int val[2]; +abi_int val[2]; } kernel_fsid_t; struct target_dirent { @@ -544,7 +544,7 @@ struct target_sigaction { #endif typedef union target_sigval { -int sival_int; +abi_int sival_int; abi_ulong sival_ptr; } target_sigval_t; @@ -575,17 +575,17 @@ typedef union target_sigval { typedef struct target_siginfo { #ifdef TARGET_MIPS -int si_signo; -int si_code; -int si_errno; +abi_int si_signo; +abi_int si_code; +abi_int si_errno; #else -int si_signo; -int si_errno; -int si_code; +abi_int si_signo; +abi_int si_errno; +abi_int si_code; #endif union { -int _pad[TARGET_SI_PAD_SIZE]; +abi_int _pad[TARGET_SI_PAD_SIZE]; /* kill() */ struct { @@ -610,7 +610,7 @@ typedef struct target_siginfo { struct { pid_t _pid; /* which child */ uid_t _uid; /* sender's uid */ -int _status;/* exit code */ +abi_int _status;/* exit code */ target_clock_t _utime; target_clock_t _stime; } _sigchld; @@ -622,8 +622,8 @@ typedef struct target_siginfo { /* SIGPOLL */ struct { -int _band; /* POLL_IN, POLL_OUT, POLL_MSG */ -int _fd; 
+abi_int _band; /* POLL_IN, POLL_OUT, POLL_MSG */ +abi_int _fd; } _sigpoll; } _sifields; } target_siginfo_t; @@ -701,7 +701,7 @@ typedef struct target_siginfo { #include "target_resource.h" struct target_pollfd { -int fd; /* file descriptor */ +abi_int fd; /* file descriptor */ short events; /* requested events */ short revents;/* returned events */ }; @@ -722,12 +722,12 @@ struct target_pollfd { #define TARGET_KDSIGACCEPT 0x4B4E struct target_rtc_pll_info { -int pll_ctrl; -int pll_value; -int pll_max; -int pll_min; -int pll_posmult; -int pll_negmult; +abi_int pll_ctrl; +abi_int pll_value; +abi_int pll_max; +abi_int pll_min; +abi_int pll_posmult; +abi_int pll_negmult; abi_long pll_clock; }; @@ -754,14 +754,14 @@ struct target_rtc_pll_info { struct target_rtc_pll_info) #define TARGET_RTC_PLL_SET TARGET_IOW('p', 0x12, \ struct target_rtc_pll_info) -#define TARGET_RTC_VL_READ TARGET_IOR('p', 0x13, int) +#define TARGET_RTC_VL_READ TARGET_IOR('p', 0x13, abi_int) #define TARGET_RTC_VL_CLR TARGET_IO('p', 0x14) #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) || \ defined(TARGET_XTENSA) -#define TARGET_FIOGETOWN TARGET_IOR('f', 123, int) -#define TARGET_FIOSETOWN TARGET_IOW('f', 124, int) -#define TARGET_SIOCATMARK TARGET_IOR('s', 7, int) +#define TARGET_FIOGETOWN TARGET_IOR('f', 123, abi_int) +#define TARGET_FIOSETOWN TARGET_IOW('f', 124, abi_int) +#define TARGET_SIOCATMARK TARGET_IOR('s', 7, abi_int) #define TARGET_SIOCSPGRP TARGET_IOW('s', 8, pid_t) #define TARGET_SIOCGPGRP TARGET_IOR('s', 9, pid_t) #else @@ -851,40 +851,40 @@ struct target_rtc_pll_info { /* From */ -#define TARGET_TUNSETDEBUGTARGET_IOW('T', 201, int) -#define TARGET_TUNSETIFF TARGET_IOW('T', 202, int) -#define TARGET_TUNSETPERSIST TARGET_IOW('T', 203, int) -#define TARGET_TUNSETOWNERTARGET_IOW('T', 204, int) -#define TARGET_TUNSETLINK TARGET_IOW('T', 205, int) -#define TARGET_TUNSETGROUPTARGET_IOW('T', 206, int) +#define TARGET_TUNSETDEBUGTARGET_IOW('T', 201, abi_int) +#defin
[PULL 16/47] linux-user/syscall: Implement execve without execveat
From: Pierrick Bouvier

Support for the execveat syscall was implemented in 55bbe4 and has been available since QEMU 8.0.0. It relies on the host's execveat, which is widely available on most Linux kernels today.

However, this change breaks qemu-user self emulation if the "host" qemu version is older than 8.0.0, because that version does not yet implement execveat. This strange use case arises on most distributions today, since they have binfmt support.

A concrete failing example:

$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented

-> "not implemented" means execve returned ENOSYS

qemu-user-static 7.2 and 8.0 can be conveniently grabbed from the Debian packages qemu-user-static* [1].

One usage of this is running wine-arm64 from linux-x64 (details in [2]). We ran into this issue while updating the qemu embedded in a docker image.

Updating the host qemu is not always possible: it may be complicated, it may require recompiling, or the host may simply not be accessible (GitLab CI, GitHub Actions). Thus, it is worth implementing execve without relying on execveat, which is the goal of this patch.

This patch was tested with the example presented in this commit message.

[1] http://ftp.us.debian.org/debian/pool/main/q/qemu/
[2] https://www.linaro.org/blog/emulate-windows-on-arm/

Signed-off-by: Pierrick Bouvier
Reviewed-by: Richard Henderson
Reviewed-by: Michael Tokarev
Message-Id: <20230705121023.973284-1-pierrick.bouv...@linaro.org>
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 20
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 420bab7c68..c15d9ad743 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -659,6 +659,7 @@ safe_syscall4(pid_t, wait4, pid_t, pid, int *, status, int, options, \
 #endif
 safe_syscall5(int, waitid, idtype_t, idtype, id_t, id, siginfo_t *, infop, \
               int, options, struct rusage *, rusage)
+safe_syscall3(int, execve, const char *, filename, char **, argv, char **, envp)
 safe_syscall5(int, execveat, int, dirfd, const char *, filename,
               char **, argv, char **, envp, int, flags)
 #if defined(TARGET_NR_select) || defined(TARGET_NR__newselect) || \
@@ -8629,9 +8630,9 @@ ssize_t do_guest_readlink(const char *pathname, char *buf, size_t bufsiz)
     return ret;
 }
 
-static int do_execveat(CPUArchState *cpu_env, int dirfd,
-                       abi_long pathname, abi_long guest_argp,
-                       abi_long guest_envp, int flags)
+static int do_execv(CPUArchState *cpu_env, int dirfd,
+                    abi_long pathname, abi_long guest_argp,
+                    abi_long guest_envp, int flags, bool is_execveat)
 {
     int ret;
     char **argp, **envp;
@@ -8710,11 +8711,14 @@ static int do_execveat(CPUArchState *cpu_env, int dirfd,
         goto execve_efault;
     }
 
+    const char *exe = p;
     if (is_proc_myself(p, "exe")) {
-        ret = get_errno(safe_execveat(dirfd, exec_path, argp, envp, flags));
-    } else {
-        ret = get_errno(safe_execveat(dirfd, p, argp, envp, flags));
+        exe = exec_path;
     }
+    ret = is_execveat
+        ? safe_execveat(dirfd, exe, argp, envp, flags)
+        : safe_execve(exe, argp, envp);
+    ret = get_errno(ret);
 
     unlock_user(p, pathname, 0);
 
@@ -9406,9 +9410,9 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
         return ret;
 #endif
     case TARGET_NR_execveat:
-        return do_execveat(cpu_env, arg1, arg2, arg3, arg4, arg5);
+        return do_execv(cpu_env, arg1, arg2, arg3, arg4, arg5, true);
     case TARGET_NR_execve:
-        return do_execveat(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0);
+        return do_execv(cpu_env, AT_FDCWD, arg1, arg2, arg3, 0, false);
     case TARGET_NR_chdir:
         if (!(p = lock_user_string(arg1)))
             return -TARGET_EFAULT;
-- 
2.34.1
[PULL 23/47] linux-user: Split TARGET_MAP_* out of syscall_defs.h
Move the values into the per-target target_mman.h headers Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-7-richard.hender...@linaro.org> --- linux-user/alpha/target_mman.h | 13 + linux-user/generic/target_mman.h | 54 linux-user/hppa/target_mman.h| 10 linux-user/mips/target_mman.h| 16 ++ linux-user/mips64/target_mman.h | 2 +- linux-user/ppc/target_mman.h | 8 +++ linux-user/sparc/target_mman.h | 9 linux-user/syscall_defs.h| 85 +--- linux-user/xtensa/target_mman.h | 16 ++ 9 files changed, 128 insertions(+), 85 deletions(-) diff --git a/linux-user/alpha/target_mman.h b/linux-user/alpha/target_mman.h index 051544f5ab..6bb03e7336 100644 --- a/linux-user/alpha/target_mman.h +++ b/linux-user/alpha/target_mman.h @@ -1,6 +1,19 @@ #ifndef ALPHA_TARGET_MMAN_H #define ALPHA_TARGET_MMAN_H +#define TARGET_MAP_ANONYMOUS0x10 +#define TARGET_MAP_FIXED0x100 +#define TARGET_MAP_GROWSDOWN0x01000 +#define TARGET_MAP_DENYWRITE0x02000 +#define TARGET_MAP_EXECUTABLE 0x04000 +#define TARGET_MAP_LOCKED 0x08000 +#define TARGET_MAP_NORESERVE0x1 +#define TARGET_MAP_POPULATE 0x2 +#define TARGET_MAP_NONBLOCK 0x4 +#define TARGET_MAP_STACK0x8 +#define TARGET_MAP_HUGETLB 0x10 +#define TARGET_MAP_FIXED_NOREPLACE 0x20 + #define TARGET_MADV_DONTNEED 6 #define TARGET_MS_ASYNC 1 diff --git a/linux-user/generic/target_mman.h b/linux-user/generic/target_mman.h index 32bf1a52d0..7b888fb7f8 100644 --- a/linux-user/generic/target_mman.h +++ b/linux-user/generic/target_mman.h @@ -1,6 +1,60 @@ #ifndef LINUX_USER_TARGET_MMAN_H #define LINUX_USER_TARGET_MMAN_H +/* These are defined in linux/mmap.h */ +#define TARGET_MAP_SHARED 0x01 +#define TARGET_MAP_PRIVATE 0x02 +#define TARGET_MAP_SHARED_VALIDATE 0x03 + +/* 0x0100 - 0x4000 flags are defined in asm-generic/mman.h */ +#ifndef TARGET_MAP_GROWSDOWN +#define TARGET_MAP_GROWSDOWN0x0100 +#endif +#ifndef TARGET_MAP_DENYWRITE +#define TARGET_MAP_DENYWRITE0x0800 +#endif +#ifndef TARGET_MAP_EXECUTABLE +#define 
TARGET_MAP_EXECUTABLE 0x1000 +#endif +#ifndef TARGET_MAP_LOCKED +#define TARGET_MAP_LOCKED 0x2000 +#endif +#ifndef TARGET_MAP_NORESERVE +#define TARGET_MAP_NORESERVE0x4000 +#endif + +/* Other MAP flags are defined in asm-generic/mman-common.h */ +#ifndef TARGET_MAP_TYPE +#define TARGET_MAP_TYPE 0x0f +#endif +#ifndef TARGET_MAP_FIXED +#define TARGET_MAP_FIXED0x10 +#endif +#ifndef TARGET_MAP_ANONYMOUS +#define TARGET_MAP_ANONYMOUS0x20 +#endif +#ifndef TARGET_MAP_POPULATE +#define TARGET_MAP_POPULATE 0x008000 +#endif +#ifndef TARGET_MAP_NONBLOCK +#define TARGET_MAP_NONBLOCK 0x01 +#endif +#ifndef TARGET_MAP_STACK +#define TARGET_MAP_STACK0x02 +#endif +#ifndef TARGET_MAP_HUGETLB +#define TARGET_MAP_HUGETLB 0x04 +#endif +#ifndef TARGET_MAP_SYNC +#define TARGET_MAP_SYNC 0x08 +#endif +#ifndef TARGET_MAP_FIXED_NOREPLACE +#define TARGET_MAP_FIXED_NOREPLACE 0x10 +#endif +#ifndef TARGET_MAP_UNINITIALIZED +#define TARGET_MAP_UNINITIALIZED0x400 +#endif + #ifndef TARGET_MADV_NORMAL #define TARGET_MADV_NORMAL 0 #endif diff --git a/linux-user/hppa/target_mman.h b/linux-user/hppa/target_mman.h index f9b6b97032..97f87d042a 100644 --- a/linux-user/hppa/target_mman.h +++ b/linux-user/hppa/target_mman.h @@ -1,6 +1,16 @@ #ifndef HPPA_TARGET_MMAN_H #define HPPA_TARGET_MMAN_H +#define TARGET_MAP_TYPE 0x2b +#define TARGET_MAP_FIXED0x04 +#define TARGET_MAP_ANONYMOUS0x10 +#define TARGET_MAP_GROWSDOWN0x8000 +#define TARGET_MAP_POPULATE 0x1 +#define TARGET_MAP_NONBLOCK 0x2 +#define TARGET_MAP_STACK0x4 +#define TARGET_MAP_HUGETLB 0x8 +#define TARGET_MAP_UNINITIALIZED0 + #define TARGET_MADV_MERGEABLE 65 #define TARGET_MADV_UNMERGEABLE 66 #define TARGET_MADV_HUGEPAGE 67 diff --git a/linux-user/mips/target_mman.h b/linux-user/mips/target_mman.h index e7ba6070fe..cd566c24b6 100644 --- a/linux-user/mips/target_mman.h +++ b/linux-user/mips/target_mman.h @@ -1 +1,17 @@ +#ifndef MIPS_TARGET_MMAN_H +#define MIPS_TARGET_MMAN_H + +#define TARGET_MAP_NORESERVE0x0400 +#define TARGET_MAP_ANONYMOUS0x0800 
+#define TARGET_MAP_GROWSDOWN0x1000 +#define TARGET_MAP_DENYWRITE0x2000 +#define TARGET_MAP_EXECUTABLE 0x4000 +#define TARGET_MAP_LOCKED 0x8000 +#define TARGET_MAP_POPULATE 0x1 +#define TARGET_MAP_NONBLOCK 0x2 +#define TARGET_MAP_STACK
[PULL 01/47] linux-user: Reformat syscall_defs.h
Untabify and re-indent. We had a mix of 2, 3, 4, and 8 space indentation. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 1948 ++--- 1 file changed, 974 insertions(+), 974 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index cc37054cb5..e80d54780b 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -33,18 +33,18 @@ #define TARGET_SYS_SENDMMSG 20/* sendmmsg()*/ #define IPCOP_CALL(VERSION, OP) ((VERSION) << 16 | (OP)) -#define IPCOP_semop1 -#define IPCOP_semget 2 -#define IPCOP_semctl 3 -#define IPCOP_semtimedop 4 -#define IPCOP_msgsnd 11 -#define IPCOP_msgrcv 12 -#define IPCOP_msgget 13 -#define IPCOP_msgctl 14 -#define IPCOP_shmat21 -#define IPCOP_shmdt22 -#define IPCOP_shmget 23 -#define IPCOP_shmctl 24 +#define IPCOP_semop 1 +#define IPCOP_semget2 +#define IPCOP_semctl3 +#define IPCOP_semtimedop4 +#define IPCOP_msgsnd11 +#define IPCOP_msgrcv12 +#define IPCOP_msgget13 +#define IPCOP_msgctl14 +#define IPCOP_shmat 21 +#define IPCOP_shmdt 22 +#define IPCOP_shmget23 +#define IPCOP_shmctl24 #define TARGET_SEMOPM 500 @@ -56,42 +56,42 @@ * this explicit here. Please be sure to use the decoding macros * below from now on. 
*/ -#define TARGET_IOC_NRBITS 8 -#define TARGET_IOC_TYPEBITS8 +#define TARGET_IOC_NRBITS 8 +#define TARGET_IOC_TYPEBITS 8 -#if (defined(TARGET_I386) && defined(TARGET_ABI32)) \ -|| (defined(TARGET_ARM) && defined(TARGET_ABI32)) \ -|| (defined(TARGET_SPARC) && defined(TARGET_ABI32)) \ +#if (defined(TARGET_I386) && defined(TARGET_ABI32)) \ +|| (defined(TARGET_ARM) && defined(TARGET_ABI32)) \ +|| (defined(TARGET_SPARC) && defined(TARGET_ABI32)) \ || defined(TARGET_M68K) || defined(TARGET_SH4) || defined(TARGET_CRIS) -/* 16 bit uid wrappers emulation */ +/* 16 bit uid wrappers emulation */ #define USE_UID16 #define target_id uint16_t #else #define target_id uint32_t #endif -#if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SH4) \ -|| defined(TARGET_M68K) || defined(TARGET_CRIS) \ -|| defined(TARGET_S390X) || defined(TARGET_OPENRISC) \ -|| defined(TARGET_NIOS2) || defined(TARGET_RISCV) \ +#if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SH4) \ +|| defined(TARGET_M68K) || defined(TARGET_CRIS) \ +|| defined(TARGET_S390X) || defined(TARGET_OPENRISC)\ +|| defined(TARGET_NIOS2) || defined(TARGET_RISCV) \ || defined(TARGET_XTENSA) || defined(TARGET_LOONGARCH64) -#define TARGET_IOC_SIZEBITS14 -#define TARGET_IOC_DIRBITS 2 +#define TARGET_IOC_SIZEBITS 14 +#define TARGET_IOC_DIRBITS 2 -#define TARGET_IOC_NONE 0U +#define TARGET_IOC_NONE 0U #define TARGET_IOC_WRITE 1U -#define TARGET_IOC_READ 2U +#define TARGET_IOC_READ 2U -#elif defined(TARGET_PPC) || defined(TARGET_ALPHA) || \ - defined(TARGET_SPARC) || defined(TARGET_MICROBLAZE) || \ - defined(TARGET_MIPS) +#elif defined(TARGET_PPC) || defined(TARGET_ALPHA) || \ +defined(TARGET_SPARC) || defined(TARGET_MICROBLAZE) || \ +defined(TARGET_MIPS) -#define TARGET_IOC_SIZEBITS13 -#define TARGET_IOC_DIRBITS 3 +#define TARGET_IOC_SIZEBITS 13 +#define TARGET_IOC_DIRBITS 3 -#define TARGET_IOC_NONE 1U -#define TARGET_IOC_READ 2U +#define TARGET_IOC_NONE 1U +#define TARGET_IOC_READ 2U #define 
TARGET_IOC_WRITE 4U #elif defined(TARGET_HPPA) @@ -115,32 +115,32 @@ #error unsupported CPU #endif -#define TARGET_IOC_NRMASK ((1 << TARGET_IOC_NRBITS)-1) -#define TARGET_IOC_TYPEMASK((1 << TARGET_IOC_TYPEBITS)-1) -#define TARGET_IOC_SIZEMASK((1 << TARGET_IOC_SIZEBITS)-1) -#define TARGET_IOC_DIRMASK ((1 << TARGET_IOC_DIRBITS)-1) +#define TARGET_IOC_NRMASK ((1 << TARGET_IOC_NRBITS)-1) +#define TARGET_IOC_TYPEMASK ((1 << TARGET_IOC_TYPEBITS)-1) +#define TARGET_IOC_SIZEMASK ((1 << TARGET_IOC_SIZEBITS)-1) +#define TARGET_IOC_DIRMASK ((1 << TARGET_IOC_DIRBITS)-1) -#define TARGET_IOC_NRSHIFT 0 -#define TARGET_IOC_TYPESHIFT (TARGET_IOC_NRSHIFT+TARGET_IOC_NRBITS) -#define TARGET_IOC_SIZESHIFT (TARGET_IOC_TYPESHIFT+TARGET_IOC_TYPEBITS) -#define TARGET_IOC_DIRSHIFT(TARGET_IOC_SIZESHIFT+TARGET_IOC_SIZEBITS) +#define TARGET_IOC_NRSHIFT 0 +#define TARGET_IOC_TYPESHIFT(TARGET_IOC_NRSHIFT+TARGET_IOC_NRBITS) +#define TARGET_IOC_SIZESHIFT(TARGET_IOC_TYPESHIFT+TARGET_IOC_TYPEBITS) +#define TARGET_IOC_DIRSHIFT (TARGET_IOC_SIZESHIFT+TARGET_IOC_
[PULL 34/47] bsd-user: Use page_find_range_empty for mmap_find_vma_reserved
Use the interval tree to find empty space, rather than probing each page in turn.

Cc: Warner Losh
Cc: Kyle Evans
Signed-off-by: Richard Henderson
Reviewed-by: Warner Losh
Message-Id: <20230707204054.8792-18-richard.hender...@linaro.org>
---
 bsd-user/mmap.c | 48 +++-
 1 file changed, 7 insertions(+), 41 deletions(-)

diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 07b5b8055e..aca8764356 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -222,50 +222,16 @@ unsigned long last_brk;
 static abi_ulong mmap_find_vma_reserved(abi_ulong start, abi_ulong size,
                                         abi_ulong alignment)
 {
-    abi_ulong addr;
-    abi_ulong end_addr;
-    int prot;
-    int looped = 0;
+    abi_ulong ret;
 
-    if (size > reserved_va) {
-        return (abi_ulong)-1;
+    ret = page_find_range_empty(start, reserved_va, size, alignment);
+    if (ret == -1 && start > TARGET_PAGE_SIZE) {
+        /* Restart at the beginning of the address space. */
+        ret = page_find_range_empty(TARGET_PAGE_SIZE, start - 1,
+                                    size, alignment);
     }
 
-    size = HOST_PAGE_ALIGN(size) + alignment;
-    end_addr = start + size;
-    if (end_addr > reserved_va) {
-        end_addr = reserved_va + 1;
-    }
-    addr = end_addr - qemu_host_page_size;
-
-    while (1) {
-        if (addr > end_addr) {
-            if (looped) {
-                return (abi_ulong)-1;
-            }
-            end_addr = reserved_va + 1;
-            addr = end_addr - qemu_host_page_size;
-            looped = 1;
-            continue;
-        }
-        prot = page_get_flags(addr);
-        if (prot) {
-            end_addr = addr;
-        }
-        if (end_addr - addr >= size) {
-            break;
-        }
-        addr -= qemu_host_page_size;
-    }
-
-    if (start == mmap_next_start) {
-        mmap_next_start = addr;
-    }
-    /* addr is sufficiently low to align it up */
-    if (alignment != 0) {
-        addr = (addr + alignment) & ~(alignment - 1);
-    }
-    return addr;
+    return ret;
 }
 
 /*
-- 
2.34.1
[PULL 35/47] linux-user: Use page_find_range_empty for mmap_find_vma_reserved
Use the interval tree to find empty space, rather than probing each page in turn. Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-19-richard.hender...@linaro.org> --- linux-user/mmap.c | 52 ++- 1 file changed, 6 insertions(+), 46 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index c4b2515271..738b9b797d 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -318,55 +318,15 @@ unsigned long last_brk; static abi_ulong mmap_find_vma_reserved(abi_ulong start, abi_ulong size, abi_ulong align) { -abi_ulong addr, end_addr, incr = qemu_host_page_size; -int prot; -bool looped = false; +target_ulong ret; -if (size > reserved_va) { -return (abi_ulong)-1; +ret = page_find_range_empty(start, reserved_va, size, align); +if (ret == -1 && start > mmap_min_addr) { +/* Restart at the beginning of the address space. */ +ret = page_find_range_empty(mmap_min_addr, start - 1, size, align); } -/* Note that start and size have already been aligned by mmap_find_vma. */ - -end_addr = start + size; -/* - * Start at the top of the address space, ignoring the last page. - * If reserved_va == UINT32_MAX, then end_addr wraps to 0, - * throwing the rest of the calculations off. - * TODO: rewrite using last_addr instead. - * TODO: use the interval tree instead of probing every page. - */ -if (start > reserved_va - size) { -end_addr = ((reserved_va - size) & -align) + size; -looped = true; -} - -/* Search downward from END_ADDR, checking to see if a page is in use. */ -addr = end_addr; -while (1) { -addr -= incr; -if (addr > end_addr) { -if (looped) { -/* Failure. The entire address space has been searched. */ -return (abi_ulong)-1; -} -/* Re-start at the top of the address space (see above). */ -addr = end_addr = ((reserved_va - size) & -align) + size; -looped = true; -} else { -prot = page_get_flags(addr); -if (prot) { -/* Page in use. Restart below this page. 
*/ -addr = end_addr = ((addr - size) & -align) + size; -} else if (addr && addr + size == end_addr) { -/* Success! All pages between ADDR and END_ADDR are free. */ -if (start == mmap_next_start) { -mmap_next_start = addr; -} -return addr; -} -} -} +return ret; } /* -- 2.34.1
[PULL 33/47] accel/tcg: Introduce page_find_range_empty
Use the interval tree to locate an unused range in the VM. Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-17-richard.hender...@linaro.org> --- include/exec/cpu-all.h | 15 +++ accel/tcg/user-exec.c | 41 + 2 files changed, 56 insertions(+) diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index 94f828b109..eb1c54701a 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -236,6 +236,21 @@ int page_check_range(target_ulong start, target_ulong len, int flags); */ bool page_check_range_empty(target_ulong start, target_ulong last); +/** + * page_find_range_empty + * @min: first byte of search range + * @max: last byte of search range + * @len: size of the hole required + * @align: alignment of the hole required (power of 2) + * + * If there is a range [x, x+@len) within [@min, @max] such that + * x % @align == 0, then return x. Otherwise return -1. + * The memory lock must be held, as the caller will want to ensure + * the returned range stays empty until a new mapping can be installed. + */ +target_ulong page_find_range_empty(target_ulong min, target_ulong max, + target_ulong len, target_ulong align); + /** * page_get_target_data(address) * @address: guest virtual address diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c index ab684a3ea2..e4f9563730 100644 --- a/accel/tcg/user-exec.c +++ b/accel/tcg/user-exec.c @@ -605,6 +605,47 @@ bool page_check_range_empty(target_ulong start, target_ulong last) return pageflags_find(start, last) == NULL; } +target_ulong page_find_range_empty(target_ulong min, target_ulong max, + target_ulong len, target_ulong align) +{ +target_ulong len_m1, align_m1; + +assert(min <= max); +assert(max <= GUEST_ADDR_MAX); +assert(len != 0); +assert(is_power_of_2(align)); +assert_memory_lock(); + +len_m1 = len - 1; +align_m1 = align - 1; + +/* Iteratively narrow the search region. */ +while (1) { +PageFlagsNode *p; + +/* Align min and double-check there's enough space remaining. 
*/ +min = (min + align_m1) & ~align_m1; +if (min > max) { +return -1; +} +if (len_m1 > max - min) { +return -1; +} + +p = pageflags_find(min, min + len_m1); +if (p == NULL) { +/* Found! */ +return min; +} +if (max <= p->itree.last) { +/* Existing allocation fills the remainder of the search region. */ +return -1; +} +/* Skip across existing allocation. */ +min = p->itree.last + 1; +} +} + void page_protect(tb_page_addr_t address) { PageFlagsNode *p; -- 2.34.1
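The narrowing loop in page_find_range_empty can be tried outside QEMU. The following is only a sketch under stated assumptions: `Range`, `range_find` and `find_range_empty` are hypothetical names, a linear scan over an array stands in for the interval-tree lookup done by pageflags_find, and `UINT64_MAX` is returned where the patch returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PageFlagsNode: one occupied range [start, last]. */
typedef struct {
    uint64_t start;
    uint64_t last;
} Range;

/*
 * Linear-scan replacement for pageflags_find(): return the first array
 * entry that overlaps [start, last], or NULL if none does.
 */
static const Range *range_find(const Range *r, size_t n,
                               uint64_t start, uint64_t last)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].start <= last && start <= r[i].last) {
            return &r[i];
        }
    }
    return NULL;
}

/* Same narrowing loop as the patch; UINT64_MAX plays the role of -1. */
static uint64_t find_range_empty(const Range *r, size_t n,
                                 uint64_t min, uint64_t max,
                                 uint64_t len, uint64_t align)
{
    uint64_t len_m1, align_m1;

    assert(min <= max);
    assert(len != 0);
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Keeps "min + align_m1" from wrapping; mirrors the GUEST_ADDR_MAX bound. */
    assert(max <= UINT64_MAX - align);

    len_m1 = len - 1;
    align_m1 = align - 1;

    for (;;) {
        const Range *p;

        /* Align min upward, then check that the hole still fits. */
        min = (min + align_m1) & ~align_m1;
        if (min > max || len_m1 > max - min) {
            return UINT64_MAX;
        }

        p = range_find(r, n, min, min + len_m1);
        if (p == NULL) {
            return min;         /* nothing overlaps: hole found */
        }
        if (max <= p->last) {
            return UINT64_MAX;  /* allocation covers the rest of the region */
        }
        /* Skip past the allocation that is in the way. */
        min = p->last + 1;
    }
}
```

Each iteration either succeeds, fails, or moves `min` strictly past an occupied range, so the loop runs at most once per occupied range plus one, instead of once per page.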
[PULL 08/47] linux-user: Use abi_ullong not unsigned long long in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 20986bd1d3..45ebacd4b4 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -1385,13 +1385,13 @@ struct target_stat64 { abi_ulong target_st_ctime; abi_ulong target_st_ctime_nsec; -unsigned long long st_ino; +abi_ullong st_ino; } QEMU_PACKED; #ifdef TARGET_ARM #define TARGET_HAS_STRUCT_STAT64 struct target_eabi_stat64 { -unsigned long long st_dev; +abi_ullong st_dev; abi_uint __pad1; abi_ulong__st_ino; abi_uint st_mode; @@ -1400,13 +1400,13 @@ struct target_eabi_stat64 { abi_ulongst_uid; abi_ulongst_gid; -unsigned long long st_rdev; +abi_ullong st_rdev; abi_uint __pad2[2]; long long st_size; abi_ulongst_blksize; abi_uint __pad3; -unsigned long long st_blocks; +abi_ullong st_blocks; abi_ulongtarget_st_atime; abi_ulongtarget_st_atime_nsec; @@ -1417,7 +1417,7 @@ struct target_eabi_stat64 { abi_ulongtarget_st_ctime; abi_ulongtarget_st_ctime_nsec; -unsigned long long st_ino; +abi_ullong st_ino; } QEMU_PACKED; #endif @@ -1568,14 +1568,14 @@ struct target_stat { #if !defined(TARGET_PPC64) #define TARGET_HAS_STRUCT_STAT64 struct QEMU_PACKED target_stat64 { -unsigned long long st_dev; -unsigned long long st_ino; +abi_ullong st_dev; +abi_ullong st_ino; abi_uint st_mode; abi_uint st_nlink; abi_uint st_uid; abi_uint st_gid; -unsigned long long st_rdev; -unsigned long long __pad0; +abi_ullong st_rdev; +abi_ullong __pad0; long long st_size; intst_blksize; abi_uint __pad1; @@ -1674,7 +1674,7 @@ struct target_stat { */ #define TARGET_HAS_STRUCT_STAT64 struct target_stat64 { -unsigned long long st_dev; +abi_ullong st_dev; unsigned char __pad1[2]; #define TARGET_STAT64_HAS_BROKEN_ST_INO 1 @@ -1686,7 +1686,7 @@ struct target_stat64 { abi_ulong st_uid; abi_ulong st_gid; -unsigned long long st_rdev; +abi_ullong st_rdev; unsigned char 
__pad3[2]; long long st_size; @@ -1704,7 +1704,7 @@ struct target_stat64 { abi_ulong target_st_ctime; abi_ulong target_st_ctime_nsec; -unsigned long long st_ino; +abi_ullong st_ino; } QEMU_PACKED; #elif defined(TARGET_ABI_MIPSN64) @@ -1918,7 +1918,7 @@ struct target_stat { */ #define TARGET_HAS_STRUCT_STAT64 struct QEMU_PACKED target_stat64 { -unsigned long long st_dev; +abi_ullong st_dev; unsigned char __pad0[4]; #define TARGET_STAT64_HAS_BROKEN_ST_INO 1 @@ -1930,13 +1930,13 @@ struct QEMU_PACKED target_stat64 { abi_ulong st_uid; abi_ulong st_gid; -unsigned long long st_rdev; +abi_ullong st_rdev; unsigned char __pad3[4]; long long st_size; abi_ulong st_blksize; -unsigned long long st_blocks; /* Number 512-byte blocks allocated. */ +abi_ullong st_blocks; /* Number 512-byte blocks allocated. */ abi_ulong target_st_atime; abi_ulong target_st_atime_nsec; @@ -1947,7 +1947,7 @@ struct QEMU_PACKED target_stat64 { abi_ulong target_st_ctime; abi_ulong target_st_ctime_nsec; -unsigned long long st_ino; +abi_ullong st_ino; }; #elif defined(TARGET_I386) && !defined(TARGET_ABI32) -- 2.34.1
[PULL 19/47] tcg: Fix info_in_idx increment in layout_arg_by_ref
Off by one error, failing to take into account that layout_arg_1 already incremented info_in_idx for the first piece. We only need to care for the n-1 TCG_CALL_ARG_BY_REF_N pieces here.

Cc: qemu-sta...@nongnu.org
Fixes: 313bdea84d2 ("tcg: Add TCG_CALL_{RET,ARG}_BY_REF")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1751
Signed-off-by: Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé
Tested-by: Peter Maydell
---
 tcg/tcg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index a0628fe424..652e8ea6b9 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1083,7 +1083,7 @@ static void layout_arg_by_ref(TCGCumulativeArgs *cum, TCGHelperInfo *info)
             .ref_slot = cum->ref_slot + i,
         };
     }
-    cum->info_in_idx += n;
+    cum->info_in_idx += n - 1;  /* i=0 accounted for in layout_arg_1 */
     cum->ref_slot += n;
 }
 
-- 
2.34.1
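The accounting error is easier to see in a reduced model. The sketch below is hypothetical and heavily simplified: `Cum`, `layout_one` and `layout_by_ref` are illustrative names that only mimic the index bookkeeping described in the commit message, not the real TCG structures.

```c
#include <assert.h>

#define MAX_SLOTS 16

/* Hypothetical, heavily simplified stand-in for TCGCumulativeArgs. */
typedef struct {
    int info_in_idx;        /* next free entry in the layout table */
    int kind[MAX_SLOTS];    /* 1 = first piece, 2 = following piece */
} Cum;

/* Models layout_arg_1: records one entry and advances the index itself. */
static void layout_one(Cum *cum, int kind)
{
    cum->kind[cum->info_in_idx] = kind;
    cum->info_in_idx++;
}

/* Models layout_arg_by_ref for an argument that is split into n pieces. */
static void layout_by_ref(Cum *cum, int n)
{
    int first = cum->info_in_idx;

    assert(n >= 1 && first + n <= MAX_SLOTS);

    layout_one(cum, 1);                 /* piece 0; index already advanced */
    for (int i = 1; i < n; i++) {
        cum->kind[first + i] = 2;       /* pieces 1 .. n-1 */
    }
    cum->info_in_idx += n - 1;          /* "+= n" would skip one entry */
}
```

In this model, adding `n` instead of `n - 1` would leave one unused entry after every by-reference argument, so the next argument would be recorded one slot too far along.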
[PULL 32/47] linux-user: Rewrite mmap_frag
Use 'last' variables instead of 'end' variables. Always zero MAP_ANONYMOUS fragments, which we previously failed to do if they were not writable; early exit in case we allocate a new page from the kernel, known zeros. Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-16-richard.hender...@linaro.org> --- linux-user/mmap.c | 123 +++--- 1 file changed, 62 insertions(+), 61 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index d02d74d279..c4b2515271 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -222,73 +222,76 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot) } /* map an incomplete host page */ -static int mmap_frag(abi_ulong real_start, - abi_ulong start, abi_ulong end, - int prot, int flags, int fd, off_t offset) +static bool mmap_frag(abi_ulong real_start, abi_ulong start, abi_ulong last, + int prot, int flags, int fd, off_t offset) { -abi_ulong real_end, addr; +abi_ulong real_last; void *host_start; -int prot1, prot_new; +int prot_old, prot_new; +int host_prot_old, host_prot_new; -real_end = real_start + qemu_host_page_size; -host_start = g2h_untagged(real_start); - -/* get the protection of the target pages outside the mapping */ -prot1 = 0; -for (addr = real_start; addr < real_end; addr++) { -if (addr < start || addr >= end) { -prot1 |= page_get_flags(addr); -} +if (!(flags & MAP_ANONYMOUS) +&& (flags & MAP_TYPE) == MAP_SHARED +&& (prot & PROT_WRITE)) { +/* + * msync() won't work with the partial page, so we return an + * error if write is possible while it is a shared mapping. + */ +errno = EINVAL; +return false; } -if (prot1 == 0) { -/* no page was there, so we allocate one */ +real_last = real_start + qemu_host_page_size - 1; +host_start = g2h_untagged(real_start); + +/* Get the protection of the target pages outside the mapping. 
*/ +prot_old = 0; +for (abi_ulong a = real_start; a < start; a += TARGET_PAGE_SIZE) { +prot_old |= page_get_flags(a); +} +for (abi_ulong a = real_last; a > last; a -= TARGET_PAGE_SIZE) { +prot_old |= page_get_flags(a); +} + +if (prot_old == 0) { +/* + * Since !(prot_old & PAGE_VALID), there were no guest pages + * outside of the fragment we need to map. Allocate a new host + * page to cover, discarding whatever else may have been present. + */ void *p = mmap(host_start, qemu_host_page_size, target_to_host_prot(prot), flags | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { -return -1; +return false; } -prot1 = prot; +prot_old = prot; } -prot1 &= PAGE_BITS; +prot_new = prot | prot_old; -prot_new = prot | prot1; -if (!(flags & MAP_ANONYMOUS)) { -/* - * msync() won't work here, so we return an error if write is - * possible while it is a shared mapping. - */ -if ((flags & MAP_TYPE) == MAP_SHARED && (prot & PROT_WRITE)) { -return -1; -} +host_prot_old = target_to_host_prot(prot_old); +host_prot_new = target_to_host_prot(prot_new); -/* adjust protection to be able to read */ -if (!(prot1 & PROT_WRITE)) { -mprotect(host_start, qemu_host_page_size, - target_to_host_prot(prot1) | PROT_WRITE); -} +/* Adjust protection to be able to write. */ +if (!(host_prot_old & PROT_WRITE)) { +host_prot_old |= PROT_WRITE; +mprotect(host_start, qemu_host_page_size, host_prot_old); +} -/* read the corresponding file data */ -if (pread(fd, g2h_untagged(start), end - start, offset) == -1) { -return -1; -} - -/* put final protection */ -if (prot_new != (prot1 | PROT_WRITE)) { -mprotect(host_start, qemu_host_page_size, - target_to_host_prot(prot_new)); -} +/* Read or zero the new guest pages. 
*/ +if (flags & MAP_ANONYMOUS) { +memset(g2h_untagged(start), 0, last - start + 1); } else { -if (prot_new != prot1) { -mprotect(host_start, qemu_host_page_size, - target_to_host_prot(prot_new)); -} -if (prot_new & PROT_WRITE) { -memset(g2h_untagged(start), 0, end - start); +if (pread(fd, g2h_untagged(start), last - start + 1, offset) == -1) { +return false; } } -return 0; + +/* Put final protection */ +if (host_prot_new != host_prot_old) { +mprotect(host_start, qemu_host_page_size, host_prot_new); +} +return true; } #if HOST_LONG_BITS == 64 && TARGET_ABI_BITS ==
[PULL 14/47] include/exec/user: Set ABI_LLONG_ALIGNMENT to 4 for microblaze
Based on gcc's microblaze.h setting BIGGEST_ALIGNMENT to 32 bits.

Signed-off-by: Richard Henderson
---
 include/exec/user/abitypes.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/exec/user/abitypes.h b/include/exec/user/abitypes.h
index 743b8bb9ea..beba0a48c7 100644
--- a/include/exec/user/abitypes.h
+++ b/include/exec/user/abitypes.h
@@ -15,7 +15,9 @@
 #define ABI_LLONG_ALIGNMENT 2
 #endif
 
-#if (defined(TARGET_I386) && !defined(TARGET_X86_64)) || defined(TARGET_SH4)
+#if (defined(TARGET_I386) && !defined(TARGET_X86_64)) \
+    || defined(TARGET_SH4) \
+    || defined(TARGET_MICROBLAZE)
 #define ABI_LLONG_ALIGNMENT 4
 #endif
 
-- 
2.34.1
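For readers who have not looked at what ABI_LLONG_ALIGNMENT controls, here is a small sketch. It is not QEMU code: it assumes a GCC or Clang compiler, and the names demo_llong4 and demo_stat are made up for illustration. Lowering the alignment of a 64-bit typedef to 4 changes where the member lands inside a guest-visible struct, which is why the value has to match the target ABI.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for abi_llong on a target whose ABI aligns
 * 64-bit integers to 4 bytes. On a typedef, the aligned attribute
 * may lower the alignment (GCC/Clang extension).
 */
typedef int64_t demo_llong4 __attribute__((aligned(4)));

/* Hypothetical guest struct: a 32-bit field followed by a 64-bit one. */
struct demo_stat {
    uint32_t mode;
    demo_llong4 size;
};

/* Offset of the 64-bit member: no padding is inserted after 'mode'. */
size_t demo_size_offset(void)
{
    return offsetof(struct demo_stat, size);
}

/* Total size: 4 + 8 bytes, with no tail padding. */
size_t demo_struct_size(void)
{
    return sizeof(struct demo_stat);
}
```

With natural 8-byte alignment on a typical 64-bit host, the same struct would put size at offset 8 and occupy 16 bytes, so a mismatch silently corrupts the layout seen by the guest.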
[PULL 17/47] linux-user: Fix do_shmat type errors
The guest address, raddr, should be unsigned, aka abi_ulong.
The host addresses should be cast via intptr_t or uintptr_t, not long.
Drop the inline and fix two other whitespace issues.

Signed-off-by: Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé
Reviewed-by: Anton Johansson
Message-Id: <20230626140250.69572-1-richard.hender...@linaro.org>
---
 linux-user/syscall.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index c15d9ad743..b78eb686d8 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4539,14 +4539,14 @@ static inline abi_ulong target_shmlba(CPUArchState *cpu_env)
 }
 #endif
 
-static inline abi_ulong do_shmat(CPUArchState *cpu_env,
-                                 int shmid, abi_ulong shmaddr, int shmflg)
+static abi_ulong do_shmat(CPUArchState *cpu_env, int shmid,
+                          abi_ulong shmaddr, int shmflg)
 {
     CPUState *cpu = env_cpu(cpu_env);
-    abi_long raddr;
+    abi_ulong raddr;
     void *host_raddr;
     struct shmid_ds shm_info;
-    int i,ret;
+    int i, ret;
     abi_ulong shmlba;
 
     /* shmat pointers are always untagged */
@@ -4602,9 +4602,9 @@ static inline abi_ulong do_shmat(CPUArchState *cpu_env,
 
     if (host_raddr == (void *)-1) {
         mmap_unlock();
-        return get_errno((long)host_raddr);
+        return get_errno((intptr_t)host_raddr);
     }
-    raddr=h2g((unsigned long)host_raddr);
+    raddr = h2g((uintptr_t)host_raddr);
 
     page_set_flags(raddr, raddr + shm_info.shm_segsz - 1,
                    PAGE_VALID | PAGE_RESET | PAGE_READ |
@@ -4621,7 +4621,6 @@ static inline abi_ulong do_shmat(CPUArchState *cpu_env,
 
     mmap_unlock();
     return raddr;
-
 }
 
 static inline abi_long do_shmdt(abi_ulong shmaddr)
-- 
2.34.1
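The cast change can be illustrated in isolation. The sketch below is not the QEMU code: host_ptr_is_error and host_ptr_to_guest are hypothetical helpers, and the guest address is modelled as a plain uint32_t relative to an assumed guest base. The point is that intptr_t and uintptr_t are pointer-sized on every host, whereas long is only 32 bits on LLP64 hosts.

```c
#include <stdbool.h>
#include <stdint.h>

/* shmat() and mmap() report failure as (void *)-1. */
bool host_ptr_is_error(void *host_ptr)
{
    /* intptr_t is pointer-sized by definition, so no bits are lost. */
    return (intptr_t)host_ptr == -1;
}

/*
 * Hypothetical h2g-like conversion: subtract an assumed guest base and
 * keep the result unsigned, so guest addresses with the top bit set
 * are not mistaken for negative error values.
 */
uint32_t host_ptr_to_guest(void *host_ptr, uintptr_t guest_base)
{
    return (uint32_t)((uintptr_t)host_ptr - guest_base);
}
```

Keeping the guest address unsigned matters because a valid address such as 0x90000000 would look negative if stored in a signed 32-bit type.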
[PULL 25/47] linux-user: Populate more bits in mmap_flags_tbl
Fix translation of TARGET_MAP_SHARED and TARGET_MAP_PRIVATE, which are types not single bits. Add TARGET_MAP_SHARED_VALIDATE, TARGET_MAP_SYNC, TARGET_MAP_NONBLOCK, TARGET_MAP_POPULATE, TARGET_MAP_FIXED_NOREPLACE, and TARGET_MAP_UNINITIALIZED. Update strace to match. Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-9-richard.hender...@linaro.org> --- linux-user/strace.c | 23 ++- linux-user/syscall.c | 21 +++-- 2 files changed, 29 insertions(+), 15 deletions(-) diff --git a/linux-user/strace.c b/linux-user/strace.c index 9228b235da..bbd29148d4 100644 --- a/linux-user/strace.c +++ b/linux-user/strace.c @@ -1094,28 +1094,25 @@ UNUSED static const struct flags mmap_prot_flags[] = { }; UNUSED static const struct flags mmap_flags[] = { -FLAG_TARGET(MAP_SHARED), -FLAG_TARGET(MAP_PRIVATE), +FLAG_TARGET_MASK(MAP_SHARED, MAP_TYPE), +FLAG_TARGET_MASK(MAP_PRIVATE, MAP_TYPE), +FLAG_TARGET_MASK(MAP_SHARED_VALIDATE, MAP_TYPE), FLAG_TARGET(MAP_ANONYMOUS), FLAG_TARGET(MAP_DENYWRITE), -FLAG_TARGET(MAP_FIXED), -FLAG_TARGET(MAP_GROWSDOWN), FLAG_TARGET(MAP_EXECUTABLE), -#ifdef MAP_LOCKED +FLAG_TARGET(MAP_FIXED), +FLAG_TARGET(MAP_FIXED_NOREPLACE), +FLAG_TARGET(MAP_GROWSDOWN), +FLAG_TARGET(MAP_HUGETLB), FLAG_TARGET(MAP_LOCKED), -#endif -#ifdef MAP_NONBLOCK FLAG_TARGET(MAP_NONBLOCK), -#endif FLAG_TARGET(MAP_NORESERVE), -#ifdef MAP_POPULATE FLAG_TARGET(MAP_POPULATE), -#endif -#if defined(TARGET_MAP_UNINITIALIZED) && TARGET_MAP_UNINITIALIZED != 0 +FLAG_TARGET(MAP_STACK), +FLAG_TARGET(MAP_SYNC), +#if TARGET_MAP_UNINITIALIZED != 0 FLAG_TARGET(MAP_UNINITIALIZED), #endif -FLAG_TARGET(MAP_HUGETLB), -FLAG_TARGET(MAP_STACK), FLAG_END, }; diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 02d3b6c90a..3a89f6b408 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -6012,9 +6012,19 @@ static const StructEntry struct_termios_def = { .print = print_termios, }; +/* If the host does not provide these bits, they may be safely discarded. 
*/ +#ifndef MAP_SYNC +#define MAP_SYNC 0 +#endif +#ifndef MAP_UNINITIALIZED +#define MAP_UNINITIALIZED 0 +#endif + static const bitmask_transtbl mmap_flags_tbl[] = { -{ TARGET_MAP_SHARED, TARGET_MAP_SHARED, MAP_SHARED, MAP_SHARED }, -{ TARGET_MAP_PRIVATE, TARGET_MAP_PRIVATE, MAP_PRIVATE, MAP_PRIVATE }, +{ TARGET_MAP_TYPE, TARGET_MAP_SHARED, MAP_TYPE, MAP_SHARED }, +{ TARGET_MAP_TYPE, TARGET_MAP_PRIVATE, MAP_TYPE, MAP_PRIVATE }, +{ TARGET_MAP_TYPE, TARGET_MAP_SHARED_VALIDATE, + MAP_TYPE, MAP_SHARED_VALIDATE }, { TARGET_MAP_FIXED, TARGET_MAP_FIXED, MAP_FIXED, MAP_FIXED }, { TARGET_MAP_ANONYMOUS, TARGET_MAP_ANONYMOUS, MAP_ANONYMOUS, MAP_ANONYMOUS }, @@ -6032,6 +6042,13 @@ static const bitmask_transtbl mmap_flags_tbl[] = { Recognize it for the target insofar as we do not want to pass it through to the host. */ { TARGET_MAP_STACK, TARGET_MAP_STACK, 0, 0 }, +{ TARGET_MAP_SYNC, TARGET_MAP_SYNC, MAP_SYNC, MAP_SYNC }, +{ TARGET_MAP_NONBLOCK, TARGET_MAP_NONBLOCK, MAP_NONBLOCK, MAP_NONBLOCK }, +{ TARGET_MAP_POPULATE, TARGET_MAP_POPULATE, MAP_POPULATE, MAP_POPULATE }, +{ TARGET_MAP_FIXED_NOREPLACE, TARGET_MAP_FIXED_NOREPLACE, + MAP_FIXED_NOREPLACE, MAP_FIXED_NOREPLACE }, +{ TARGET_MAP_UNINITIALIZED, TARGET_MAP_UNINITIALIZED, + MAP_UNINITIALIZED, MAP_UNINITIALIZED }, { 0, 0, 0, 0 } }; -- 2.34.1
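To make the "types not single bits" point concrete, here is a self-contained sketch. It is an illustration under assumptions, not the QEMU implementation: the demo_* names are invented, the lookup loop is simplified, and the flag values are examples only (the target side borrows the 0x0800 anonymous bit used by some targets, the host side the generic Linux values). With per-bit entries, the value 3 (MAP_SHARED_VALIDATE) matches both the shared and the private row; with entries keyed on the whole MAP_TYPE field it matches exactly one.

```c
/* Example flag values only; see the lead-in for the assumptions. */
enum {
    T_MAP_SHARED = 0x01, T_MAP_PRIVATE = 0x02, T_MAP_SHARED_VALIDATE = 0x03,
    T_MAP_TYPE = 0x0f, T_MAP_ANONYMOUS = 0x0800,
    H_MAP_SHARED = 0x01, H_MAP_PRIVATE = 0x02, H_MAP_SHARED_VALIDATE = 0x03,
    H_MAP_TYPE = 0x0f, H_MAP_ANONYMOUS = 0x20
};

typedef struct {
    unsigned target_mask, target_bits, host_mask, host_bits;
} demo_transtbl;

/* Old style: MAP_SHARED and MAP_PRIVATE treated as independent bits. */
static const demo_transtbl demo_per_bit_tbl[] = {
    { T_MAP_SHARED, T_MAP_SHARED, H_MAP_SHARED, H_MAP_SHARED },
    { T_MAP_PRIVATE, T_MAP_PRIVATE, H_MAP_PRIVATE, H_MAP_PRIVATE },
    { 0, 0, 0, 0 }
};

/* New style: compare the whole MAP_TYPE field against each type value. */
static const demo_transtbl demo_masked_tbl[] = {
    { T_MAP_TYPE, T_MAP_SHARED, H_MAP_TYPE, H_MAP_SHARED },
    { T_MAP_TYPE, T_MAP_PRIVATE, H_MAP_TYPE, H_MAP_PRIVATE },
    { T_MAP_TYPE, T_MAP_SHARED_VALIDATE, H_MAP_TYPE, H_MAP_SHARED_VALIDATE },
    { T_MAP_ANONYMOUS, T_MAP_ANONYMOUS, H_MAP_ANONYMOUS, H_MAP_ANONYMOUS },
    { 0, 0, 0, 0 }
};

/* Count how many table rows a target flag word matches. */
static int demo_count(unsigned flags, const demo_transtbl *tbl)
{
    int n = 0;
    for (; tbl->target_mask != 0; tbl++) {
        if ((flags & tbl->target_mask) == tbl->target_bits) {
            n++;
        }
    }
    return n;
}

int demo_matches_per_bit(unsigned flags)
{
    return demo_count(flags, demo_per_bit_tbl);
}

int demo_matches_masked(unsigned flags)
{
    return demo_count(flags, demo_masked_tbl);
}

/* Translate target flags to host flags using the masked table. */
unsigned demo_translate_masked(unsigned flags)
{
    unsigned host = 0;
    for (const demo_transtbl *t = demo_masked_tbl; t->target_mask != 0; t++) {
        if ((flags & t->target_mask) == t->target_bits) {
            host |= t->host_bits;
        }
    }
    return host;
}
```

The same ambiguity is what the strace table change addresses: a per-bit lookup would describe MAP_SHARED_VALIDATE as MAP_SHARED|MAP_PRIVATE.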
[PULL 28/47] linux-user: Implement MAP_FIXED_NOREPLACE
Signed-off-by: Richard Henderson
Message-Id: <20230707204054.8792-12-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 639921dba0..9dc34fc29d 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -509,7 +509,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
      * If the user is asking for the kernel to find a location, do that
      * before we truncate the length for mapping files below.
      */
-    if (!(flags & MAP_FIXED)) {
+    if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
         host_len = len + offset - host_offset;
         host_len = HOST_PAGE_ALIGN(host_len);
         start = mmap_find_vma(real_start, host_len, TARGET_PAGE_SIZE);
@@ -551,7 +551,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
         }
     }
 
-    if (!(flags & MAP_FIXED)) {
+    if (!(flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))) {
         unsigned long host_start;
         void *p;
 
@@ -600,6 +600,13 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
             goto fail;
         }
 
+        /* Validate that the chosen range is empty. */
+        if ((flags & MAP_FIXED_NOREPLACE)
+            && !page_check_range_empty(start, end - 1)) {
+            errno = EEXIST;
+            goto fail;
+        }
+
         /*
          * worst case: we cannot map the file because the offset is not
          * aligned, so we read it
@@ -615,7 +622,8 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot,
                 goto fail;
             }
             retaddr = target_mmap(start, len, target_prot | PROT_WRITE,
-                                  MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
+                                  (flags & (MAP_FIXED | MAP_FIXED_NOREPLACE))
+                                  | MAP_PRIVATE | MAP_ANONYMOUS,
                                   -1, 0);
             if (retaddr == -1) {
                 goto fail;
-- 
2.34.1
[PULL 20/47] linux-user: Make sure initial brk(0) is page-aligned
From: Andreas Schwab

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab
Message-Id:
Reviewed-by: Richard Henderson
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b78eb686d8..02d3b6c90a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -806,7 +806,7 @@ static abi_ulong brk_page;
 
 void target_set_brk(abi_ulong new_brk)
 {
-    target_brk = new_brk;
+    target_brk = TARGET_PAGE_ALIGN(new_brk);
     brk_page = HOST_PAGE_ALIGN(target_brk);
 }
 
-- 
2.34.1
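As a quick illustration of the two-stage rounding in target_set_brk, here is a sketch with made-up page sizes (4 KiB guest pages on a host with 64 KiB pages). The demo_* names and the page-size constants are assumptions for the example, not QEMU definitions.

```c
#include <stdint.h>

/* Hypothetical page sizes: 4 KiB guest pages, 64 KiB host pages. */
#define DEMO_TARGET_PAGE_SIZE 0x1000u
#define DEMO_HOST_PAGE_SIZE   0x10000u

/* Round addr up to a multiple of size, which must be a power of two. */
uint32_t demo_align_up(uint32_t addr, uint32_t size)
{
    return (addr + size - 1) & ~(size - 1);
}

typedef struct {
    uint32_t target_brk;  /* guest-visible break, guest-page aligned */
    uint32_t brk_page;    /* end of the host page backing the break */
} demo_brk_state;

/* Mirror of the patched logic: align to the guest page first, then host. */
demo_brk_state demo_set_brk(uint32_t new_brk)
{
    demo_brk_state s;
    s.target_brk = demo_align_up(new_brk, DEMO_TARGET_PAGE_SIZE);
    s.brk_page = demo_align_up(s.target_brk, DEMO_HOST_PAGE_SIZE);
    return s;
}
```

For an unaligned ELF break such as 0x12345, the initial brk(0) now reports 0x13000 rather than the raw value, and an already aligned input is left unchanged.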
[PULL 27/47] bsd-user: Use page_check_range_empty for MAP_EXCL
The previous check returned -1 when any page within [start, start+len)
is unmapped, not when all are unmapped.

Cc: Warner Losh
Cc: Kyle Evans
Signed-off-by: Richard Henderson
Reviewed-by: Warner Losh
Message-Id: <20230707204054.8792-11-richard.hender...@linaro.org>
---
 bsd-user/mmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 565b9f97ed..07b5b8055e 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -609,7 +609,7 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int prot,
     }
 
     /* Reject the mapping if any page within the range is mapped */
-    if ((flags & MAP_EXCL) && page_check_range(start, len, 0) < 0) {
+    if ((flags & MAP_EXCL) && !page_check_range_empty(start, end - 1)) {
         errno = EINVAL;
         goto fail;
     }
-- 
2.34.1
[PULL 26/47] accel/tcg: Introduce page_check_range_empty
Examine the interval tree to validate that a region has no existing
mappings.

Reviewed-by: Alex Bennée
Signed-off-by: Richard Henderson
Message-Id: <20230707204054.8792-10-richard.hender...@linaro.org>
---
 include/exec/cpu-all.h | 12 ++++++++++++
 accel/tcg/user-exec.c  |  7 +++++++
 2 files changed, 19 insertions(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 472fe9ad9c..94f828b109 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -224,6 +224,18 @@ void page_set_flags(target_ulong start, target_ulong last, int flags);
 void page_reset_target_data(target_ulong start, target_ulong last);
 int page_check_range(target_ulong start, target_ulong len, int flags);
 
+/**
+ * page_check_range_empty:
+ * @start: first byte of range
+ * @last: last byte of range
+ * Context: holding mmap lock
+ *
+ * Return true if the entire range [@start, @last] is unmapped.
+ * The memory lock must be held so that the caller can ensure
+ * the result stays true until a new mapping can be installed.
+ */
+bool page_check_range_empty(target_ulong start, target_ulong last);
+
 /**
  * page_get_target_data(address)
  * @address: guest virtual address
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index d95b875a6a..ab684a3ea2 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -598,6 +598,13 @@ int page_check_range(target_ulong start, target_ulong len, int flags)
     return ret;
 }
 
+bool page_check_range_empty(target_ulong start, target_ulong last)
+{
+    assert(last >= start);
+    assert_memory_lock();
+    return pageflags_find(start, last) == NULL;
+}
+
 void page_protect(tb_page_addr_t address)
 {
     PageFlagsNode *p;
-- 
2.34.1
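The semantics of the new helper can be sketched without the interval tree. The following is an illustration only: it uses a small fixed array of hypothetical mappings in place of QEMU's pageflags tree, omits locking entirely, and all demo_* names are invented. It shows the inclusive [start, last] convention, which lets a range reach the very top of the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One existing mapping, with both bounds inclusive. */
typedef struct {
    uint64_t start;
    uint64_t last;
} demo_mapping;

/* Hypothetical existing mappings standing in for the interval tree. */
static const demo_mapping demo_maps[] = {
    { 0x1000, 0x1fff },
    { 0xfffffffffffff000ull, 0xffffffffffffffffull },
};

/* Return true if no existing mapping overlaps [start, last]. */
bool demo_range_empty(uint64_t start, uint64_t last)
{
    assert(last >= start);
    for (size_t i = 0; i < sizeof(demo_maps) / sizeof(demo_maps[0]); i++) {
        /* Two inclusive intervals overlap iff each starts before the other ends. */
        if (demo_maps[i].start <= last && start <= demo_maps[i].last) {
            return false;
        }
    }
    return true;
}
```

A single overlapping byte is enough to make the range non-empty, which is the behaviour MAP_EXCL and MAP_FIXED_NOREPLACE need.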
[PULL 47/47] tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128
We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
CONFIG_ATOMIC128_OPT in atomic128.h. It is difficult to tell when
those changes have been applied with the ifdef we must use with
CONFIG_CMPXCHG128. So instead use HAVE_CMPXCHG128, which triggers
-Werror=undef when the proper header has not been included.

Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which requires
CONFIG_ATOMIC128_OPT. Without this we fall back to EXCP_ATOMIC to
single-step 128-bit atomics, which is slow enough to cause some
tests to time out.

Reported-by: Thomas Huth
Tested-by: Thomas Huth
Signed-off-by: Richard Henderson
---
 accel/tcg/tcg-runtime.h            | 2 +-
 include/exec/helper-proto-common.h | 2 ++
 accel/tcg/cputlb.c                 | 2 +-
 accel/tcg/user-exec.c              | 2 +-
 tcg/tcg-op-ldst.c                  | 2 +-
 accel/tcg/atomic_common.c.inc      | 2 +-
 6 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 39e68007f9..186899a2c7 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -58,7 +58,7 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
 DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
                    i64, env, i64, i64, i64, i32)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_be, TCG_CALL_NO_WG,
                    i128, env, i64, i128, i128, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_le, TCG_CALL_NO_WG,
diff --git a/include/exec/helper-proto-common.h b/include/exec/helper-proto-common.h
index 4d4b022668..8b67170a22 100644
--- a/include/exec/helper-proto-common.h
+++ b/include/exec/helper-proto-common.h
@@ -7,6 +7,8 @@
 #ifndef HELPER_PROTO_COMMON_H
 #define HELPER_PROTO_COMMON_H
 
+#include "qemu/atomic128.h"  /* for HAVE_CMPXCHG128 */
+
 #define HELPER_H "accel/tcg/tcg-runtime.h"
 #include "exec/helper-proto.h.inc"
 #undef HELPER_H
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index c2b81ec569..e0079c9a9d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -3105,7 +3105,7 @@ void cpu_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index df60c7d673..ac38c2bf96 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -1433,7 +1433,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
index 0fcc1618e5..d54c305598 100644
--- a/tcg/tcg-op-ldst.c
+++ b/tcg/tcg-op-ldst.c
@@ -778,7 +778,7 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv_i64,
 #else
 # define WITH_ATOMIC64(X)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 # define WITH_ATOMIC128(X) X,
 #else
 # define WITH_ATOMIC128(X)
diff --git a/accel/tcg/atomic_common.c.inc b/accel/tcg/atomic_common.c.inc
index ee222fd7e7..95a5c5ff12 100644
--- a/accel/tcg/atomic_common.c.inc
+++ b/accel/tcg/atomic_common.c.inc
@@ -41,7 +41,7 @@ CMPXCHG_HELPER(cmpxchgq_be, uint64_t)
 CMPXCHG_HELPER(cmpxchgq_le, uint64_t)
 #endif
 
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 CMPXCHG_HELPER(cmpxchgo_be, Int128)
 CMPXCHG_HELPER(cmpxchgo_le, Int128)
 #endif
-- 
2.34.1
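The difference between the two preprocessor tests is easy to miss, so here is a minimal sketch. DEMO_HAVE_WIDE_CAS is a made-up macro standing in for a HAVE_-style symbol that a header always defines to 0 or 1; the function names are likewise hypothetical. An #ifdef test only asks whether the name exists, while an #if test looks at the value, and with -Wundef an #if on a name that was never defined is diagnosed instead of silently evaluating to 0.

```c
/*
 * Pretend a header defined this to 0, meaning "header included,
 * feature not available". (Hypothetical macro for illustration.)
 */
#define DEMO_HAVE_WIDE_CAS 0

/* Tests only for existence: takes the "available" path by mistake. */
int demo_via_ifdef(void)
{
#ifdef DEMO_HAVE_WIDE_CAS
    return 1;
#else
    return 0;
#endif
}

/*
 * Tests the value. If the defining header were missing, a build with
 * -Wundef -Werror would fail here rather than quietly choosing 0.
 */
int demo_via_if(void)
{
#if DEMO_HAVE_WIDE_CAS
    return 1;
#else
    return 0;
#endif
}
```

This is the reason a HAVE_ macro that is always defined pairs well with #if: a forgotten include becomes a build error rather than a silently disabled fast path.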
[PULL 37/47] linux-user: Rewrite mmap_reserve
Use 'last' variables instead of 'end' variables; be careful about avoiding overflow. Assert that the mmap succeeded. Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-21-richard.hender...@linaro.org> --- linux-user/mmap.c | 68 +-- 1 file changed, 42 insertions(+), 26 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index bb9cbe52cd..6308787942 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -722,47 +722,63 @@ fail: return -1; } -static void mmap_reserve(abi_ulong start, abi_ulong size) +static void mmap_reserve(abi_ulong start, abi_ulong len) { abi_ulong real_start; -abi_ulong real_end; -abi_ulong addr; -abi_ulong end; +abi_ulong real_last; +abi_ulong real_len; +abi_ulong last; +abi_ulong a; +void *host_start, *ptr; int prot; +last = start + len - 1; real_start = start & qemu_host_page_mask; -real_end = HOST_PAGE_ALIGN(start + size); -end = start + size; -if (start > real_start) { -/* handle host page containing start */ +real_last = HOST_PAGE_ALIGN(last) - 1; + +/* + * If guest pages remain on the first or last host pages, + * adjust the deallocation to retain those guest pages. + * The single page special case is required for the last page, + * lest real_start overflow to zero. 
+ */ +if (real_last - real_start < qemu_host_page_size) { prot = 0; -for (addr = real_start; addr < start; addr += TARGET_PAGE_SIZE) { -prot |= page_get_flags(addr); +for (a = real_start; a < start; a += TARGET_PAGE_SIZE) { +prot |= page_get_flags(a); } -if (real_end == real_start + qemu_host_page_size) { -for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) { -prot |= page_get_flags(addr); -} -end = real_end; +for (a = last; a < real_last; a += TARGET_PAGE_SIZE) { +prot |= page_get_flags(a + 1); +} +if (prot != 0) { +return; +} +} else { +for (prot = 0, a = real_start; a < start; a += TARGET_PAGE_SIZE) { +prot |= page_get_flags(a); } if (prot != 0) { real_start += qemu_host_page_size; } -} -if (end < real_end) { -prot = 0; -for (addr = end; addr < real_end; addr += TARGET_PAGE_SIZE) { -prot |= page_get_flags(addr); + +for (prot = 0, a = last; a < real_last; a += TARGET_PAGE_SIZE) { +prot |= page_get_flags(a + 1); } if (prot != 0) { -real_end -= qemu_host_page_size; +real_last -= qemu_host_page_size; +} + +if (real_last < real_start) { +return; } } -if (real_start != real_end) { -mmap(g2h_untagged(real_start), real_end - real_start, PROT_NONE, - MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, - -1, 0); -} + +real_len = real_last - real_start + 1; +host_start = g2h_untagged(real_start); + +ptr = mmap(host_start, real_len, PROT_NONE, + MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0); +assert(ptr == host_start); } int target_munmap(abi_ulong start, abi_ulong len) -- 2.34.1
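The reason for preferring 'last' over 'end' shows up at the top of the address space. The sketch below is an illustration with a 32-bit address type and a hypothetical 4 KiB page size; the demo_* helpers are not QEMU functions, and they assume a page-aligned start and a non-zero length. An exclusive end wraps to zero for a range that finishes at the last byte, so an "a < end" loop never runs, whereas the inclusive last stays representable.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x1000u

/* Exclusive end: wraps to 0 when the range reaches the top of memory. */
uint32_t demo_range_end(uint32_t start, uint32_t len)
{
    return start + len;
}

/* Inclusive last byte: always representable for a non-zero len. */
uint32_t demo_range_last(uint32_t start, uint32_t len)
{
    return start + len - 1;
}

/* Page walk driven by 'end'; silently visits nothing if end wrapped. */
unsigned demo_count_pages_end(uint32_t start, uint32_t len)
{
    unsigned n = 0;
    uint32_t end = demo_range_end(start, len);
    for (uint32_t a = start; a < end; a += DEMO_PAGE_SIZE) {
        n++;
    }
    return n;
}

/* Page walk driven by 'last'; stops before the address can wrap. */
unsigned demo_count_pages_last(uint32_t start, uint32_t len)
{
    unsigned n = 0;
    uint32_t last = demo_range_last(start, len);
    for (uint32_t a = start; ; a += DEMO_PAGE_SIZE) {
        n++;
        if (last - a < DEMO_PAGE_SIZE) {
            break;  /* 'a' is the final page of the range */
        }
    }
    return n;
}
```

For start 0xfffff000 and length 0x1000, the 'end' walk counts zero pages and the 'last' walk counts one.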
[PULL 41/47] accel/tcg: Return bool from page_check_range
Replace the 0/-1 result with true/false. Invert the sense of the test of all callers. Document the function. Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-25-richard.hender...@linaro.org> --- bsd-user/qemu.h| 2 +- include/exec/cpu-all.h | 13 - linux-user/qemu.h | 2 +- accel/tcg/user-exec.c | 22 +++--- linux-user/syscall.c | 2 +- target/hppa/op_helper.c| 2 +- target/riscv/vector_helper.c | 2 +- target/sparc/ldst_helper.c | 2 +- accel/tcg/ldst_atomicity.c.inc | 4 ++-- 9 files changed, 31 insertions(+), 20 deletions(-) diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h index 41d84e0b81..edf9602f9b 100644 --- a/bsd-user/qemu.h +++ b/bsd-user/qemu.h @@ -267,7 +267,7 @@ abi_long do_freebsd_sysarch(void *cpu_env, abi_long arg1, abi_long arg2); static inline bool access_ok(int type, abi_ulong addr, abi_ulong size) { -return page_check_range((target_ulong)addr, size, type) == 0; +return page_check_range((target_ulong)addr, size, type); } /* diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index eb1c54701a..94f44f1f59 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -222,7 +222,18 @@ int walk_memory_regions(void *, walk_memory_regions_fn); int page_get_flags(target_ulong address); void page_set_flags(target_ulong start, target_ulong last, int flags); void page_reset_target_data(target_ulong start, target_ulong last); -int page_check_range(target_ulong start, target_ulong len, int flags); + +/** + * page_check_range + * @start: first byte of range + * @len: length of range + * @flags: flags required for each page + * + * Return true if every page in [@start, @start+@len) has @flags set. + * Return false if any page is unmapped. Thus testing flags == 0 is + * equivalent to testing for flags == PAGE_VALID. 
+ */ +bool page_check_range(target_ulong start, target_ulong last, int flags); /** * page_check_range_empty: diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 9b8e0860d7..802794db63 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -182,7 +182,7 @@ static inline bool access_ok_untagged(int type, abi_ulong addr, abi_ulong size) : !guest_range_valid_untagged(addr, size)) { return false; } -return page_check_range((target_ulong)addr, size, type) == 0; +return page_check_range((target_ulong)addr, size, type); } static inline bool access_ok(CPUState *cpu, int type, diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c index 1e8fcaf6b0..df60c7d673 100644 --- a/accel/tcg/user-exec.c +++ b/accel/tcg/user-exec.c @@ -520,19 +520,19 @@ void page_set_flags(target_ulong start, target_ulong last, int flags) } } -int page_check_range(target_ulong start, target_ulong len, int flags) +bool page_check_range(target_ulong start, target_ulong len, int flags) { target_ulong last; int locked; /* tri-state: =0: unlocked, +1: global, -1: local */ -int ret; +bool ret; if (len == 0) { -return 0; /* trivial length */ +return true; /* trivial length */ } last = start + len - 1; if (last < start) { -return -1; /* wrap around */ +return false; /* wrap around */ } locked = have_mmap_lock(); @@ -551,33 +551,33 @@ int page_check_range(target_ulong start, target_ulong len, int flags) p = pageflags_find(start, last); } if (!p) { -ret = -1; /* entire region invalid */ +ret = false; /* entire region invalid */ break; } } if (start < p->itree.start) { -ret = -1; /* initial bytes invalid */ +ret = false; /* initial bytes invalid */ break; } missing = flags & ~p->flags; if (missing & ~PAGE_WRITE) { -ret = -1; /* page doesn't match */ +ret = false; /* page doesn't match */ break; } if (missing & PAGE_WRITE) { if (!(p->flags & PAGE_WRITE_ORG)) { -ret = -1; /* page not writable */ +ret = false; /* page not writable */ break; } /* Asking about writable, but has been protected: undo. 
*/ if (!page_unprotect(start, 0)) { -ret = -1; +ret = false; break; } /* TODO: page_unprotect should take a range, not a single page. */ if (last - start < TARGET_PAGE_SIZE) { -ret = 0; /* ok */ +ret = true; /* ok */ break; } start += TARGET_PAGE_SIZE; @@ -585,7 +585,7 @@ int page_check_range(target_ulong start, target_ulong len, int flags) } if (last <= p->itree.last) { -ret = 0; /* ok */ +ret = tru
[PULL 38/47] linux-user: Rename mmap_reserve to mmap_reserve_or_unmap
If !reserved_va, munmap instead and assert success. Update all callers. Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-22-richard.hender...@linaro.org> --- linux-user/mmap.c | 29 - 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 6308787942..22c2869be8 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -722,14 +722,14 @@ fail: return -1; } -static void mmap_reserve(abi_ulong start, abi_ulong len) +static void mmap_reserve_or_unmap(abi_ulong start, abi_ulong len) { abi_ulong real_start; abi_ulong real_last; abi_ulong real_len; abi_ulong last; abi_ulong a; -void *host_start, *ptr; +void *host_start; int prot; last = start + len - 1; @@ -776,9 +776,15 @@ static void mmap_reserve(abi_ulong start, abi_ulong len) real_len = real_last - real_start + 1; host_start = g2h_untagged(real_start); -ptr = mmap(host_start, real_len, PROT_NONE, - MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0); -assert(ptr == host_start); +if (reserved_va) { +void *ptr = mmap(host_start, real_len, PROT_NONE, + MAP_FIXED | MAP_ANONYMOUS + | MAP_PRIVATE | MAP_NORESERVE, -1, 0); +assert(ptr == host_start); +} else { +int ret = munmap(host_start, real_len); +assert(ret == 0); +} } int target_munmap(abi_ulong start, abi_ulong len) @@ -830,11 +836,7 @@ int target_munmap(abi_ulong start, abi_ulong len) ret = 0; /* unmap what we can */ if (real_start < real_end) { -if (reserved_va) { -mmap_reserve(real_start, real_end - real_start); -} else { -ret = munmap(g2h_untagged(real_start), real_end - real_start); -} +mmap_reserve_or_unmap(real_start, real_end - real_start); } if (ret == 0) { @@ -871,7 +873,7 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size, * If new and old addresses overlap then the above mremap will * already have failed with EINVAL. 
*/ -mmap_reserve(old_addr, old_size); +mmap_reserve_or_unmap(old_addr, old_size); } } else if (flags & MREMAP_MAYMOVE) { abi_ulong mmap_start; @@ -886,7 +888,7 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size, flags | MREMAP_FIXED, g2h_untagged(mmap_start)); if (reserved_va) { -mmap_reserve(old_addr, old_size); +mmap_reserve_or_unmap(old_addr, old_size); } } } else { @@ -912,7 +914,8 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size, errno = ENOMEM; host_addr = MAP_FAILED; } else if (reserved_va && old_size > new_size) { -mmap_reserve(old_addr + old_size, old_size - new_size); +mmap_reserve_or_unmap(old_addr + old_size, + old_size - new_size); } } } else { -- 2.34.1
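The two ways of giving up a host range can be tried outside QEMU. This is a sketch under assumptions, not the patched function: it targets a Linux host, the demo_* names are invented, and the boolean parameter stands in for the reserved_va test. Re-mapping with PROT_NONE keeps the address range claimed so nothing else can land there, while munmap returns it to the kernel.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Release a host range. With keep_reserved set, replace it by an
 * inaccessible placeholder mapping; otherwise unmap it outright.
 * Returns 0 on success and -1 on failure.
 */
int demo_release_host_range(void *host_start, size_t len, bool keep_reserved)
{
    if (keep_reserved) {
        void *p = mmap(host_start, len, PROT_NONE,
                       MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE,
                       -1, 0);
        return p == host_start ? 0 : -1;
    }
    return munmap(host_start, len);
}

/* Map one scratch region, then exercise both release paths on it. */
int demo_round_trip(void)
{
    size_t len = 4096;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    if (demo_release_host_range(p, len, true) != 0) {
        return -1;
    }
    /* The placeholder is still a mapping, so it can be unmapped afterwards. */
    return demo_release_host_range(p, len, false);
}
```

The patch asserts success in both branches; the sketch returns an error code instead so the outcome can be checked by a caller.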
[PULL 09/47] linux-user: Use abi_llong not long long in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 45ebacd4b4..e4fcbd16d2 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -1370,7 +1370,7 @@ struct target_stat64 { unsigned short st_rdev; unsigned char __pad3[10]; -long long st_size; +abi_llong st_size; abi_ulong st_blksize; abi_ulong st_blocks; /* Number 512-byte blocks allocated. */ @@ -1403,7 +1403,7 @@ struct target_eabi_stat64 { abi_ullong st_rdev; abi_uint __pad2[2]; -long long st_size; +abi_llong st_size; abi_ulongst_blksize; abi_uint __pad3; abi_ullong st_blocks; @@ -1576,10 +1576,10 @@ struct QEMU_PACKED target_stat64 { abi_uint st_gid; abi_ullong st_rdev; abi_ullong __pad0; -long long st_size; +abi_llong st_size; intst_blksize; abi_uint __pad1; -long long st_blocks; /* Number 512-byte blocks allocated. */ +abi_llong st_blocks; /* Number 512-byte blocks allocated. */ inttarget_st_atime; abi_uint target_st_atime_nsec; inttarget_st_mtime; @@ -1689,7 +1689,7 @@ struct target_stat64 { abi_ullong st_rdev; unsigned char __pad3[2]; -long long st_size; +abi_llong st_size; abi_ulong st_blksize; abi_ulong __pad4; /* future possible st_blocks high bits */ @@ -1933,7 +1933,7 @@ struct QEMU_PACKED target_stat64 { abi_ullong st_rdev; unsigned char __pad3[4]; -long long st_size; +abi_llong st_size; abi_ulong st_blksize; abi_ullong st_blocks; /* Number 512-byte blocks allocated. */ -- 2.34.1
[PULL 46/47] accel/tcg: Always lock pages before translation
We had done this for user-mode by invoking page_protect within the translator loop. Extend this to handle system mode as well. Move page locking out of tb_link_page. Reported-by: Liren Wei Reported-by: Richard W.M. Jones Signed-off-by: Richard Henderson Tested-by: Richard W.M. Jones --- accel/tcg/internal.h | 30 - accel/tcg/cpu-exec.c | 20 accel/tcg/tb-maint.c | 242 -- accel/tcg/translate-all.c | 43 ++- accel/tcg/translator.c| 34 -- 5 files changed, 236 insertions(+), 133 deletions(-) diff --git a/accel/tcg/internal.h b/accel/tcg/internal.h index 650c3ac53f..e8cbbde581 100644 --- a/accel/tcg/internal.h +++ b/accel/tcg/internal.h @@ -10,6 +10,7 @@ #define ACCEL_TCG_INTERNAL_H #include "exec/exec-all.h" +#include "exec/translate-all.h" /* * Access to the various translations structures need to be serialised @@ -35,6 +36,32 @@ static inline void page_table_config_init(void) { } void page_table_config_init(void); #endif +#ifdef CONFIG_USER_ONLY +/* + * For user-only, page_protect sets the page read-only. + * Since most execution is already on read-only pages, and we'd need to + * account for other TBs on the same page, defer undoing any page protection + * until we receive the write fault. 
+ */ +static inline void tb_lock_page0(tb_page_addr_t p0) +{ +page_protect(p0); +} + +static inline void tb_lock_page1(tb_page_addr_t p0, tb_page_addr_t p1) +{ +page_protect(p1); +} + +static inline void tb_unlock_page1(tb_page_addr_t p0, tb_page_addr_t p1) { } +static inline void tb_unlock_pages(TranslationBlock *tb) { } +#else +void tb_lock_page0(tb_page_addr_t); +void tb_lock_page1(tb_page_addr_t, tb_page_addr_t); +void tb_unlock_page1(tb_page_addr_t, tb_page_addr_t); +void tb_unlock_pages(TranslationBlock *); +#endif + #ifdef CONFIG_SOFTMMU void tb_invalidate_phys_range_fast(ram_addr_t ram_addr, unsigned size, @@ -48,8 +75,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, vaddr pc, void page_init(void); void tb_htable_init(void); void tb_reset_jump(TranslationBlock *tb, int n); -TranslationBlock *tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, - tb_page_addr_t phys_page2); +TranslationBlock *tb_link_page(TranslationBlock *tb); bool tb_invalidate_phys_page_unwind(tb_page_addr_t addr, uintptr_t pc); void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb, uintptr_t host_pc); diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 31aa320513..fdd6d3e0e4 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -536,6 +536,26 @@ static void cpu_exec_longjmp_cleanup(CPUState *cpu) if (have_mmap_lock()) { mmap_unlock(); } +#else +/* + * For softmmu, a tlb_fill fault during translation will land here, + * and we need to release any page locks held. In system mode we + * have one tcg_ctx per thread, so we know it was this cpu doing + * the translation. + * + * Alternative 1: Install a cleanup to be called via an exception + * handling safe longjmp. It seems plausible that all our hosts + * support such a thing. We'd have to properly register unwind info + * for the JIT for EH, rather that just for GDB. 
+ * + * Alternative 2: Set and restore cpu->jmp_env in tb_gen_code to + * capture the cpu_loop_exit longjmp, perform the cleanup, and + * jump again to arrive here. + */ +if (tcg_ctx->gen_tb) { +tb_unlock_pages(tcg_ctx->gen_tb); +tcg_ctx->gen_tb = NULL; +} #endif if (qemu_mutex_iothread_locked()) { qemu_mutex_unlock_iothread(); diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c index 9566224d18..c406b2f7b7 100644 --- a/accel/tcg/tb-maint.c +++ b/accel/tcg/tb-maint.c @@ -70,17 +70,7 @@ typedef struct PageDesc PageDesc; */ #define assert_page_locked(pd) tcg_debug_assert(have_mmap_lock()) -static inline void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1, - PageDesc **ret_p2, tb_page_addr_t phys2, - bool alloc) -{ -*ret_p1 = NULL; -*ret_p2 = NULL; -} - -static inline void page_unlock(PageDesc *pd) { } -static inline void page_lock_tb(const TranslationBlock *tb) { } -static inline void page_unlock_tb(const TranslationBlock *tb) { } +static inline void tb_lock_pages(const TranslationBlock *tb) { } /* * For user-only, since we are protecting all of memory with a single lock, @@ -96,7 +86,7 @@ static void tb_remove_all(void) } /* Call with mmap_lock held. */ -static void tb_record(TranslationBlock *tb, PageDesc *p1, PageDesc *p2) +static void tb_record(TranslationBlock *tb) { vaddr addr; int flags; @@ -391,12 +381,108 @@ static void page_lock(PageDesc *pd) qemu_spin_lock(&pd->lock); } +/* Like qemu_spin_trylock, returns false on success */ +static bool p
[PULL 24/47] linux-user: Split TARGET_PROT_* out of syscall_defs.h
Move the values into the per-target target_mman.h headers

Reviewed-by: Alex Bennée
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
Message-Id: <20230707204054.8792-8-richard.hender...@linaro.org>
---
 linux-user/aarch64/target_mman.h |  8 ++++++++
 linux-user/generic/target_mman.h |  6 +++++-
 linux-user/mips/target_mman.h    |  2 ++
 linux-user/syscall_defs.h        | 11 -----------
 linux-user/xtensa/target_mman.h  |  2 ++
 5 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/linux-user/aarch64/target_mman.h b/linux-user/aarch64/target_mman.h
index e7ba6070fe..f721295fe1 100644
--- a/linux-user/aarch64/target_mman.h
+++ b/linux-user/aarch64/target_mman.h
@@ -1 +1,9 @@
+#ifndef AARCH64_TARGET_MMAN_H
+#define AARCH64_TARGET_MMAN_H
+
+#define TARGET_PROT_BTI   0x10
+#define TARGET_PROT_MTE   0x20
+
 #include "../generic/target_mman.h"
+
+#endif
diff --git a/linux-user/generic/target_mman.h b/linux-user/generic/target_mman.h
index 7b888fb7f8..ec76a91b46 100644
--- a/linux-user/generic/target_mman.h
+++ b/linux-user/generic/target_mman.h
@@ -23,7 +23,11 @@
 #define TARGET_MAP_NORESERVE            0x4000
 #endif

-/* Other MAP flags are defined in asm-generic/mman-common.h */
+/* Defined in asm-generic/mman-common.h */
+#ifndef TARGET_PROT_SEM
+#define TARGET_PROT_SEM                 0x08
+#endif
+
 #ifndef TARGET_MAP_TYPE
 #define TARGET_MAP_TYPE                 0x0f
 #endif
diff --git a/linux-user/mips/target_mman.h b/linux-user/mips/target_mman.h
index cd566c24b6..e97694aa4e 100644
--- a/linux-user/mips/target_mman.h
+++ b/linux-user/mips/target_mman.h
@@ -1,6 +1,8 @@
 #ifndef MIPS_TARGET_MMAN_H
 #define MIPS_TARGET_MMAN_H

+#define TARGET_PROT_SEM                 0x10
+
 #define TARGET_MAP_NORESERVE            0x0400
 #define TARGET_MAP_ANONYMOUS            0x0800
 #define TARGET_MAP_GROWSDOWN            0x1000
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 041105b7a7..77ba343c85 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1227,17 +1227,6 @@ struct target_winsize {

 #include "termbits.h"

-#if defined(TARGET_MIPS) || defined(TARGET_XTENSA)
-#define TARGET_PROT_SEM 0x10
-#else
-#define TARGET_PROT_SEM 0x08
-#endif
-
-#ifdef TARGET_AARCH64
-#define TARGET_PROT_BTI 0x10
-#define TARGET_PROT_MTE 0x20
-#endif
-
 #include "target_mman.h"

 #if (defined(TARGET_I386) && defined(TARGET_ABI32)) \
diff --git a/linux-user/xtensa/target_mman.h b/linux-user/xtensa/target_mman.h
index 3891bb5e07..3933771b5b 100644
--- a/linux-user/xtensa/target_mman.h
+++ b/linux-user/xtensa/target_mman.h
@@ -1,6 +1,8 @@
 #ifndef XTENSA_TARGET_MMAN_H
 #define XTENSA_TARGET_MMAN_H

+#define TARGET_PROT_SEM                 0x10
+
 #define TARGET_MAP_NORESERVE            0x0400
 #define TARGET_MAP_ANONYMOUS            0x0800
 #define TARGET_MAP_GROWSDOWN            0x1000
-- 
2.34.1
[PULL 45/47] linux-user/arm: Do not allocate a commpage at all for M-profile CPUs
From: Philippe Mathieu-Daudé

Since commit fbd3c4cff6 ("linux-user/arm: Mark the commpage
executable") executing bare-metal (linked with rdimon.specs)
cortex-M code fails as:

  $ qemu-arm -cpu cortex-m3 ~/hello.exe.m3
  qemu-arm: ../../accel/tcg/user-exec.c:492: page_set_flags: Assertion `last <= GUEST_ADDR_MAX' failed.
  Aborted (core dumped)

Commit 4f5c67f8df ("linux-user/arm: Take more care allocating
commpage") already took care of not allocating a commpage for
M-profile CPUs, however it had to be reverted as commit 6cda41daa2.

Re-introduce the M-profile fix from commit 4f5c67f8df.

Fixes: fbd3c4cff6 ("linux-user/arm: Mark the commpage executable")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1755
Reported-by: Christophe Lyon
Suggested-by: Richard Henderson
Signed-off-by: Philippe Mathieu-Daudé
Reviewed-by: Anton Johansson
Reviewed-by: Richard Henderson
Message-Id: <20230711153408.68389-1-phi...@linaro.org>
Signed-off-by: Richard Henderson
---
 linux-user/elfload.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index d3d1352c4e..a26200d9f3 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -424,10 +424,23 @@ enum {

 static bool init_guest_commpage(void)
 {
-    abi_ptr commpage = HI_COMMPAGE & -qemu_host_page_size;
-    void *want = g2h_untagged(commpage);
-    void *addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
-                      MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
+    ARMCPU *cpu = ARM_CPU(thread_cpu);
+    abi_ptr commpage;
+    void *want;
+    void *addr;
+
+    /*
+     * M-profile allocates maximum of 2GB address space, so can never
+     * allocate the commpage.  Skip it.
+     */
+    if (arm_feature(&cpu->env, ARM_FEATURE_M)) {
+        return true;
+    }
+
+    commpage = HI_COMMPAGE & -qemu_host_page_size;
+    want = g2h_untagged(commpage);
+    addr = mmap(want, qemu_host_page_size, PROT_READ | PROT_WRITE,
+                MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);

     if (addr == MAP_FAILED) {
         perror("Allocating guest commpage");
-- 
2.34.1
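A note for readers of the archive: the reason the commpage cannot fit under a 2GB guest limit can be checked with plain address arithmetic. The sketch below is only an illustration, not QEMU code. The `demo_` names are hypothetical, the value 0xffff0f00 is assumed to match QEMU's HI_COMMPAGE, and 0x7fffffff stands in for the 2GB M-profile limit mentioned in the patch comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match HI_COMMPAGE in linux-user/elfload.c. */
#define DEMO_HI_COMMPAGE 0xffff0f00u

/* Round the commpage address down to the start of its host page. */
static uint32_t demo_commpage_base(uint32_t host_page_size)
{
    /* The masking trick only works for power-of-two page sizes. */
    assert(host_page_size != 0 &&
           (host_page_size & (host_page_size - 1)) == 0);
    return DEMO_HI_COMMPAGE & ~(host_page_size - 1);
}

/* True if the whole host page holding the commpage lies within the guest. */
static bool demo_commpage_fits(uint32_t guest_addr_max, uint32_t host_page_size)
{
    uint32_t base = demo_commpage_base(host_page_size);
    uint32_t last = base + (host_page_size - 1);

    /* 'last >= base' guards against wrap-around of the 32-bit sum. */
    return last >= base && last <= guest_addr_max;
}
```

With a 4KiB host page the base is 0xffff0000, well above 0x7fffffff, which is consistent with the `last <= GUEST_ADDR_MAX` assertion quoted in the commit message.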
[PULL 13/47] linux-user: Use abi_uint not unsigned in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 linux-user/syscall_defs.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 9dc41828cf..c8ffb4f785 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -1776,14 +1776,14 @@ struct target_stat {

 #define TARGET_STAT_HAVE_NSEC
 struct target_stat {
-    unsigned    st_dev;
+    abi_uint    st_dev;
     abi_long    st_pad1[3];  /* Reserved for network id */
     abi_ulong   st_ino;
     abi_uint    st_mode;
     abi_uint    st_nlink;
     abi_int     st_uid;
     abi_int     st_gid;
-    unsigned    st_rdev;
+    abi_uint    st_rdev;
     abi_long    st_pad2[2];
     abi_long    st_size;
     abi_long    st_pad3;
-- 
2.34.1
[PULL 21/47] linux-user: Fix formatting of mmap.c
Fix all checkpatch.pl errors within mmap.c. Reviewed-by: Alex Bennée Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-5-richard.hender...@linaro.org> --- linux-user/mmap.c | 199 -- 1 file changed, 122 insertions(+), 77 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 2692936773..639921dba0 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -56,10 +56,11 @@ void mmap_fork_start(void) void mmap_fork_end(int child) { -if (child) +if (child) { pthread_mutex_init(&mmap_mutex, NULL); -else +} else { pthread_mutex_unlock(&mmap_mutex); +} } /* @@ -203,40 +204,47 @@ static int mmap_frag(abi_ulong real_start, /* get the protection of the target pages outside the mapping */ prot1 = 0; -for(addr = real_start; addr < real_end; addr++) { -if (addr < start || addr >= end) +for (addr = real_start; addr < real_end; addr++) { +if (addr < start || addr >= end) { prot1 |= page_get_flags(addr); +} } if (prot1 == 0) { /* no page was there, so we allocate one */ void *p = mmap(host_start, qemu_host_page_size, prot, flags | MAP_ANONYMOUS, -1, 0); -if (p == MAP_FAILED) +if (p == MAP_FAILED) { return -1; +} prot1 = prot; } prot1 &= PAGE_BITS; prot_new = prot | prot1; if (!(flags & MAP_ANONYMOUS)) { -/* msync() won't work here, so we return an error if write is - possible while it is a shared mapping */ -if ((flags & MAP_TYPE) == MAP_SHARED && -(prot & PROT_WRITE)) +/* + * msync() won't work here, so we return an error if write is + * possible while it is a shared mapping. 
+ */ +if ((flags & MAP_TYPE) == MAP_SHARED && (prot & PROT_WRITE)) { return -1; +} /* adjust protection to be able to read */ -if (!(prot1 & PROT_WRITE)) +if (!(prot1 & PROT_WRITE)) { mprotect(host_start, qemu_host_page_size, prot1 | PROT_WRITE); +} /* read the corresponding file data */ -if (pread(fd, g2h_untagged(start), end - start, offset) == -1) +if (pread(fd, g2h_untagged(start), end - start, offset) == -1) { return -1; +} /* put final protection */ -if (prot_new != (prot1 | PROT_WRITE)) +if (prot_new != (prot1 | PROT_WRITE)) { mprotect(host_start, qemu_host_page_size, prot_new); +} } else { if (prot_new != prot1) { mprotect(host_start, qemu_host_page_size, prot_new); @@ -265,8 +273,10 @@ abi_ulong mmap_next_start = TASK_UNMAPPED_BASE; unsigned long last_brk; -/* Subroutine of mmap_find_vma, used when we have pre-allocated a chunk - of guest address space. */ +/* + * Subroutine of mmap_find_vma, used when we have pre-allocated + * a chunk of guest address space. + */ static abi_ulong mmap_find_vma_reserved(abi_ulong start, abi_ulong size, abi_ulong align) { @@ -362,15 +372,17 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align) * - shmat() with SHM_REMAP flag */ ptr = mmap(g2h_untagged(addr), size, PROT_NONE, - MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE, -1, 0); + MAP_ANONYMOUS | MAP_PRIVATE | MAP_NORESERVE, -1, 0); /* ENOMEM, if host address space has no memory */ if (ptr == MAP_FAILED) { return (abi_ulong)-1; } -/* Count the number of sequential returns of the same address. - This is used to modify the search algorithm below. */ +/* + * Count the number of sequential returns of the same address. + * This is used to modify the search algorithm below. + */ repeat = (ptr == prev ? repeat + 1 : 0); if (h2g_valid(ptr + size - 1)) { @@ -387,14 +399,18 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align) /* The address is not properly aligned for the target. 
*/ switch (repeat) { case 0: -/* Assume the result that the kernel gave us is the - first with enough free space, so start again at the - next higher target page. */ +/* + * Assume the result that the kernel gave us is the + * first with enough free space, so start again at the + * next higher target page. + */ addr = ROUND_UP(addr, align); break; case 1: -/* Sometimes the kernel decides to perform the allocation - at the top end of memory instead. */ +/*
[PULL 40/47] accel/tcg: Accept more page flags in page_check_range
Only PAGE_WRITE needs special attention, all others can be
handled as we do for PAGE_READ.  Adjust the mask.

Signed-off-by: Richard Henderson
Reviewed-by: Philippe Mathieu-Daudé
Message-Id: <20230707204054.8792-24-richard.hender...@linaro.org>
---
 accel/tcg/user-exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index e4f9563730..1e8fcaf6b0 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -561,8 +561,8 @@ int page_check_range(target_ulong start, target_ulong len, int flags)
         }

         missing = flags & ~p->flags;
-        if (missing & PAGE_READ) {
-            ret = -1; /* page not readable */
+        if (missing & ~PAGE_WRITE) {
+            ret = -1; /* page doesn't match */
             break;
         }
         if (missing & PAGE_WRITE) {
-- 
2.34.1
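For readers following the mask change, the test can be modelled outside QEMU. This is a minimal sketch under stated assumptions: the `DEMO_PAGE_*` bit values and the function `demo_check_flags` are hypothetical stand-ins, and the write-protection recovery that the real function performs is reduced to a boolean output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; the real PAGE_* constants live in QEMU. */
enum {
    DEMO_PAGE_READ  = 1 << 0,
    DEMO_PAGE_WRITE = 1 << 1,
    DEMO_PAGE_EXEC  = 1 << 2,
};

/*
 * Return true when every requested bit other than DEMO_PAGE_WRITE is
 * present in 'have'.  A missing write bit does not fail the check here;
 * it is reported through *needs_write_fixup so that the caller can apply
 * whatever special handling writes require.
 */
static bool demo_check_flags(int have, int want, bool *needs_write_fixup)
{
    int missing = want & ~have;

    assert(needs_write_fixup != NULL);
    *needs_write_fixup = false;

    if (missing & ~DEMO_PAGE_WRITE) {
        return false;               /* page doesn't match */
    }
    if (missing & DEMO_PAGE_WRITE) {
        *needs_write_fixup = true;  /* only the write bit is absent */
    }
    return true;
}
```

Before the change, only a missing read bit was rejected at this point, so a request for any other flag would have passed unnoticed; with the inverted mask every non-write bit is treated uniformly.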
[PULL 12/47] linux-user: Use abi_short not short in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: Richard Henderson
---
 linux-user/syscall_defs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 21ca03b0f4..9dc41828cf 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -702,8 +702,8 @@ typedef struct target_siginfo {

 struct target_pollfd {
     abi_int fd;           /* file descriptor */
-    short events;         /* requested events */
-    short revents;        /* returned events */
+    abi_short events;     /* requested events */
+    abi_short revents;    /* returned events */
 };

 /* virtual terminal ioctls */
@@ -1480,7 +1480,7 @@ struct target_stat {
     abi_ushort st_dev;
     abi_ulong st_ino;
     abi_ushort st_mode;
-    short st_nlink;
+    abi_short st_nlink;
     abi_ushort st_uid;
     abi_ushort st_gid;
     abi_ushort st_rdev;
-- 
2.34.1
[PULL 44/47] linux-user: Drop uint and ulong
From: Juan Quintela

These are types not used anymore anywhere else.

Signed-off-by: Juan Quintela
Reviewed-by: Richard Henderson
Reviewed-by: Laurent Vivier
Reviewed-by: Philippe Mathieu-Daudé
Message-id: <20230511085056.13809-1-quint...@redhat.com>
Signed-off-by: Richard Henderson
---
 linux-user/syscall.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 33bc242e6a..1464151826 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -309,16 +309,16 @@ _syscall0(int, sys_gettid)
 #endif

 #if defined(TARGET_NR_getdents) && defined(EMULATE_GETDENTS_WITH_GETDENTS)
-_syscall3(int, sys_getdents, uint, fd, struct linux_dirent *, dirp, uint, count);
+_syscall3(int, sys_getdents, unsigned int, fd, struct linux_dirent *, dirp, unsigned int, count);
 #endif
 #if (defined(TARGET_NR_getdents) && \
       !defined(EMULATE_GETDENTS_WITH_GETDENTS)) || \
     (defined(TARGET_NR_getdents64) && defined(__NR_getdents64))
-_syscall3(int, sys_getdents64, uint, fd, struct linux_dirent64 *, dirp, uint, count);
+_syscall3(int, sys_getdents64, unsigned int, fd, struct linux_dirent64 *, dirp, unsigned int, count);
 #endif
 #if defined(TARGET_NR__llseek) && defined(__NR_llseek)
-_syscall5(int, _llseek, uint, fd, ulong, hi, ulong, lo,
-          loff_t *, res, uint, wh);
+_syscall5(int, _llseek, unsigned int, fd, unsigned long, hi, unsigned long, lo,
+          loff_t *, res, unsigned int, wh);
 #endif
 _syscall3(int, sys_rt_sigqueueinfo, pid_t, pid, int, sig, siginfo_t *, uinfo)
 _syscall4(int, sys_rt_tgsigqueueinfo, pid_t, pid, pid_t, tid, int, sig,
-- 
2.34.1
[PULL 31/47] linux-user: Rewrite target_mprotect
Use 'last' variables instead of 'end' variables. When host page size > guest page size, detect when adjacent host pages have the same protection and merge that expanded host range into fewer syscalls. Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-15-richard.hender...@linaro.org> --- linux-user/mmap.c | 106 +- 1 file changed, 67 insertions(+), 39 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index b2c2d85857..d02d74d279 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -120,8 +120,11 @@ static int target_to_host_prot(int prot) /* NOTE: all the constants are the HOST ones, but addresses are target. */ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot) { -abi_ulong end, host_start, host_end, addr; -int prot1, ret, page_flags; +abi_ulong starts[3]; +abi_ulong lens[3]; +int prots[3]; +abi_ulong host_start, host_last, last; +int prot1, ret, page_flags, nranges; trace_target_mprotect(start, len, target_prot); @@ -132,63 +135,88 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot) if (!page_flags) { return -TARGET_EINVAL; } -len = TARGET_PAGE_ALIGN(len); -end = start + len; -if (!guest_range_valid_untagged(start, len)) { -return -TARGET_ENOMEM; -} if (len == 0) { return 0; } +len = TARGET_PAGE_ALIGN(len); +if (!guest_range_valid_untagged(start, len)) { +return -TARGET_ENOMEM; +} + +last = start + len - 1; +host_start = start & qemu_host_page_mask; +host_last = HOST_PAGE_ALIGN(last) - 1; +nranges = 0; mmap_lock(); -host_start = start & qemu_host_page_mask; -host_end = HOST_PAGE_ALIGN(end); -if (start > host_start) { -/* handle host page containing start */ + +if (host_last - host_start < qemu_host_page_size) { +/* Single host page contains all guest pages: sum the prot. 
*/ prot1 = target_prot; -for (addr = host_start; addr < start; addr += TARGET_PAGE_SIZE) { -prot1 |= page_get_flags(addr); +for (abi_ulong a = host_start; a < start; a += TARGET_PAGE_SIZE) { +prot1 |= page_get_flags(a); } -if (host_end == host_start + qemu_host_page_size) { -for (addr = end; addr < host_end; addr += TARGET_PAGE_SIZE) { -prot1 |= page_get_flags(addr); +for (abi_ulong a = last; a < host_last; a += TARGET_PAGE_SIZE) { +prot1 |= page_get_flags(a + 1); +} +starts[nranges] = host_start; +lens[nranges] = qemu_host_page_size; +prots[nranges] = prot1; +nranges++; +} else { +if (host_start < start) { +/* Host page contains more than one guest page: sum the prot. */ +prot1 = target_prot; +for (abi_ulong a = host_start; a < start; a += TARGET_PAGE_SIZE) { +prot1 |= page_get_flags(a); +} +/* If the resulting sum differs, create a new range. */ +if (prot1 != target_prot) { +starts[nranges] = host_start; +lens[nranges] = qemu_host_page_size; +prots[nranges] = prot1; +nranges++; +host_start += qemu_host_page_size; } -end = host_end; } -ret = mprotect(g2h_untagged(host_start), qemu_host_page_size, - target_to_host_prot(prot1)); -if (ret != 0) { -goto error; + +if (last < host_last) { +/* Host page contains more than one guest page: sum the prot. */ +prot1 = target_prot; +for (abi_ulong a = last; a < host_last; a += TARGET_PAGE_SIZE) { +prot1 |= page_get_flags(a + 1); +} +/* If the resulting sum differs, create a new range. */ +if (prot1 != target_prot) { +host_last -= qemu_host_page_size; +starts[nranges] = host_last + 1; +lens[nranges] = qemu_host_page_size; +prots[nranges] = prot1; +nranges++; +} } -host_start += qemu_host_page_size; -} -if (end < host_end) { -prot1 = target_prot; -for (addr = end; addr < host_end; addr += TARGET_PAGE_SIZE) { -prot1 |= page_get_flags(addr); + +/* Create a range for the middle, if any remains. 
*/ +if (host_start < host_last) { +starts[nranges] = host_start; +lens[nranges] = host_last - host_start + 1; +prots[nranges] = target_prot; +nranges++; } -ret = mprotect(g2h_untagged(host_end - qemu_host_page_size), - qemu_host_page_size, target_to_host_prot(prot1)); -if (ret != 0) { -goto error; -} -host_end -= qemu_host_page_size;
[PULL 43/47] linux-user: Simplify target_madvise
The trivial length 0 check can be moved up, simplifying some
of the other cases.  The end < start test is handled by
guest_range_valid_untagged.

Signed-off-by: Richard Henderson
Message-Id: <20230707204054.8792-27-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 19 ++++---------------
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 49cfa873e0..44b53bd446 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -900,28 +900,17 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,

 abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
 {
-    abi_ulong len, end;
+    abi_ulong len;
     int ret = 0;

     if (start & ~TARGET_PAGE_MASK) {
         return -TARGET_EINVAL;
     }
-    len = TARGET_PAGE_ALIGN(len_in);
-
-    if (len_in && !len) {
-        return -TARGET_EINVAL;
-    }
-
-    end = start + len;
-    if (end < start) {
-        return -TARGET_EINVAL;
-    }
-
-    if (end == start) {
+    if (len_in == 0) {
         return 0;
     }
-
-    if (!guest_range_valid_untagged(start, len)) {
+    len = TARGET_PAGE_ALIGN(len_in);
+    if (len == 0 || !guest_range_valid_untagged(start, len)) {
         return -TARGET_EINVAL;
     }

-- 
2.34.1
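The reordered checks can be exercised in a standalone model. The sketch below assumes a 32-bit guest with 4KiB pages; `demo_madvise_precheck`, `demo_page_align`, and the `DEMO_` constants are hypothetical, and the final range test is only a simplified stand-in for guest_range_valid_untagged, not its actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_EINVAL    22

/* Round up to a page boundary; wraps to 0 for lengths near UINT32_MAX. */
static uint32_t demo_page_align(uint32_t len)
{
    return (len + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/*
 * Returns 0 when there is nothing to do, 1 when the caller should go on
 * with *len_out, and -DEMO_EINVAL for a rejected request.
 */
static int demo_madvise_precheck(uint32_t start, uint32_t len_in,
                                 uint32_t *len_out)
{
    uint32_t len;

    assert(len_out != NULL);

    if (start & (DEMO_PAGE_SIZE - 1)) {
        return -DEMO_EINVAL;        /* unaligned start */
    }
    if (len_in == 0) {
        return 0;                   /* trivial case, checked first */
    }
    len = demo_page_align(len_in);
    /*
     * len == 0 means the rounding wrapped.  The second test rejects a
     * range whose last byte would lie beyond the 32-bit address space.
     */
    if (len == 0 || len - 1 > UINT32_MAX - start) {
        return -DEMO_EINVAL;
    }
    *len_out = len;
    return 1;
}
```

Checking `len_in == 0` before rounding means that a zero result from the rounding step can only come from overflow, which is why the separate `len_in && !len` and `end < start` tests become unnecessary.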
[PULL 42/47] linux-user: Remove can_passthrough_madvise
Use page_check_range instead, which uses the interval tree
instead of checking each page individually.

Signed-off-by: Richard Henderson
Message-Id: <20230707204054.8792-26-richard.hender...@linaro.org>
---
 linux-user/mmap.c | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index c0946322fb..49cfa873e0 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -898,23 +898,6 @@ abi_long target_mremap(abi_ulong old_addr, abi_ulong old_size,
     return new_addr;
 }

-static bool can_passthrough_madvise(abi_ulong start, abi_ulong end)
-{
-    ulong addr;
-
-    if ((start | end) & ~qemu_host_page_mask) {
-        return false;
-    }
-
-    for (addr = start; addr < end; addr += TARGET_PAGE_SIZE) {
-        if (!(page_get_flags(addr) & PAGE_PASSTHROUGH)) {
-            return false;
-        }
-    }
-
-    return true;
-}
-
 abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
 {
     abi_ulong len, end;
@@ -964,9 +947,8 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
      *
      * A straight passthrough for those may not be safe because qemu sometimes
      * turns private file-backed mappings into anonymous mappings.
-     * can_passthrough_madvise() helps to check if a passthrough is possible by
-     * comparing mappings that are known to have the same semantics in the host
-     * and the guest. In this case passthrough is safe.
+     * If all guest pages have PAGE_PASSTHROUGH set, mappings have the
+     * same semantics for the host as for the guest.
      *
      * We pass through MADV_WIPEONFORK and MADV_KEEPONFORK if possible and
      * return failure if not.
@@ -984,7 +966,7 @@ abi_long target_madvise(abi_ulong start, abi_ulong len_in, int advice)
         ret = -EINVAL;
         /* fall through */
     case MADV_DONTNEED:
-        if (can_passthrough_madvise(start, end)) {
+        if (page_check_range(start, len, PAGE_PASSTHROUGH)) {
             ret = get_errno(madvise(g2h_untagged(start), len, advice));
             if ((advice == MADV_DONTNEED) && (ret == 0)) {
                 page_reset_target_data(start, start + len - 1);
-- 
2.34.1
[PULL 22/47] linux-user/strace: Expand struct flags to hold a mask
A zero bit value does not make sense -- it must relate to
some field in some way.

Define FLAG_BASIC with a build-time sanity check.
Adjust FLAG_GENERIC and FLAG_TARGET to use it.
Add FLAG_GENERIC_MASK and FLAG_TARGET_MASK.

Fix up the existing flag definitions for build errors.

Reviewed-by: Alex Bennée
Signed-off-by: Richard Henderson
Message-Id: <20230707204054.8792-6-richard.hender...@linaro.org>
---
 linux-user/strace.c | 40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/linux-user/strace.c b/linux-user/strace.c
index 669200c4a4..9228b235da 100644
--- a/linux-user/strace.c
+++ b/linux-user/strace.c
@@ -46,15 +46,21 @@ struct syscallname {
  */
 struct flags {
     abi_long    f_value;  /* flag */
+    abi_long    f_mask;   /* mask */
     const char  *f_string; /* stringified flag */
 };

+/* No 'struct flags' element should have a zero mask. */
+#define FLAG_BASIC(V, M, N)      { V, M | QEMU_BUILD_BUG_ON_ZERO(!(M)), N }
+
 /* common flags for all architectures */
-#define FLAG_GENERIC(name) { name, #name }
+#define FLAG_GENERIC_MASK(V, M)  FLAG_BASIC(V, M, #V)
+#define FLAG_GENERIC(V)          FLAG_BASIC(V, V, #V)
 /* target specific flags (syscall_defs.h has TARGET_) */
-#define FLAG_TARGET(name)  { TARGET_ ## name, #name }
+#define FLAG_TARGET_MASK(V, M)   FLAG_BASIC(TARGET_##V, TARGET_##M, #V)
+#define FLAG_TARGET(V)           FLAG_BASIC(TARGET_##V, TARGET_##V, #V)
 /* end of flags array */
-#define FLAG_END           { 0, NULL }
+#define FLAG_END                 { 0, 0, NULL }

 /* Structure used to translate enumerated values into strings */
 struct enums {
@@ -963,7 +969,7 @@ print_syscall_ret_ioctl(CPUArchState *cpu_env, const struct syscallname *name,
 #endif

 UNUSED static const struct flags access_flags[] = {
-    FLAG_GENERIC(F_OK),
+    FLAG_GENERIC_MASK(F_OK, R_OK | W_OK | X_OK),
     FLAG_GENERIC(R_OK),
     FLAG_GENERIC(W_OK),
     FLAG_GENERIC(X_OK),
@@ -999,9 +1005,9 @@ UNUSED static const struct flags mode_flags[] = {
 };

 UNUSED static const struct flags open_access_flags[] = {
-    FLAG_TARGET(O_RDONLY),
-    FLAG_TARGET(O_WRONLY),
-    FLAG_TARGET(O_RDWR),
+    FLAG_TARGET_MASK(O_RDONLY, O_ACCMODE),
+    FLAG_TARGET_MASK(O_WRONLY, O_ACCMODE),
+    FLAG_TARGET_MASK(O_RDWR, O_ACCMODE),
     FLAG_END,
 };

@@ -1010,7 +1016,9 @@ UNUSED static const struct flags open_flags[] = {
     FLAG_TARGET(O_CREAT),
     FLAG_TARGET(O_DIRECTORY),
     FLAG_TARGET(O_EXCL),
+#if TARGET_O_LARGEFILE != 0
     FLAG_TARGET(O_LARGEFILE),
+#endif
     FLAG_TARGET(O_NOCTTY),
     FLAG_TARGET(O_NOFOLLOW),
     FLAG_TARGET(O_NONBLOCK),      /* also O_NDELAY */
@@ -1075,7 +1083,7 @@ UNUSED static const struct flags umount2_flags[] = {
 };

 UNUSED static const struct flags mmap_prot_flags[] = {
-    FLAG_GENERIC(PROT_NONE),
+    FLAG_GENERIC_MASK(PROT_NONE, PROT_READ | PROT_WRITE | PROT_EXEC),
     FLAG_GENERIC(PROT_EXEC),
     FLAG_GENERIC(PROT_READ),
     FLAG_GENERIC(PROT_WRITE),
@@ -1103,7 +,7 @@ UNUSED static const struct flags mmap_flags[] = {
 #ifdef MAP_POPULATE
     FLAG_TARGET(MAP_POPULATE),
 #endif
-#ifdef TARGET_MAP_UNINITIALIZED
+#if defined(TARGET_MAP_UNINITIALIZED) && TARGET_MAP_UNINITIALIZED != 0
     FLAG_TARGET(MAP_UNINITIALIZED),
 #endif
     FLAG_TARGET(MAP_HUGETLB),
@@ -1201,13 +1209,13 @@ UNUSED static const struct flags statx_flags[] = {
     FLAG_GENERIC(AT_SYMLINK_NOFOLLOW),
 #endif
 #ifdef AT_STATX_SYNC_AS_STAT
-    FLAG_GENERIC(AT_STATX_SYNC_AS_STAT),
+    FLAG_GENERIC_MASK(AT_STATX_SYNC_AS_STAT, AT_STATX_SYNC_TYPE),
 #endif
 #ifdef AT_STATX_FORCE_SYNC
-    FLAG_GENERIC(AT_STATX_FORCE_SYNC),
+    FLAG_GENERIC_MASK(AT_STATX_FORCE_SYNC, AT_STATX_SYNC_TYPE),
 #endif
 #ifdef AT_STATX_DONT_SYNC
-    FLAG_GENERIC(AT_STATX_DONT_SYNC),
+    FLAG_GENERIC_MASK(AT_STATX_DONT_SYNC, AT_STATX_SYNC_TYPE),
 #endif
     FLAG_END,
 };
@@ -1481,14 +1489,10 @@ print_flags(const struct flags *f, abi_long flags, int last)
     const char *sep = "";
     int n;

-    if ((flags == 0) && (f->f_value == 0)) {
-        qemu_log("%s%s", f->f_string, get_comma(last));
-        return;
-    }
     for (n = 0; f->f_string != NULL; f++) {
-        if ((f->f_value != 0) && ((flags & f->f_value) == f->f_value)) {
+        if ((flags & f->f_mask) == f->f_value) {
             qemu_log("%s%s", sep, f->f_string);
-            flags &= ~f->f_value;
+            flags &= ~f->f_mask;
             sep = "|";
             n++;
         }
-- 
2.34.1
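To see why a per-entry mask helps with multi-bit fields such as an access mode, the matching loop can be tried on a toy table. This is a sketch only: `struct demo_flag`, `demo_open_flags`, `demo_format_flags`, and all numeric values are hypothetical, and the output goes to a caller-supplied buffer instead of qemu_log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct demo_flag {
    long value;        /* expected bits after masking */
    long mask;         /* which bits this entry looks at; never zero */
    const char *name;  /* NULL terminates the table */
};

/* Hypothetical table: a two-bit access-mode field plus one plain flag. */
#define DEMO_ACCMODE 0x3
static const struct demo_flag demo_open_flags[] = {
    { 0x00, DEMO_ACCMODE, "RDONLY" },
    { 0x01, DEMO_ACCMODE, "WRONLY" },
    { 0x02, DEMO_ACCMODE, "RDWR" },
    { 0x40, 0x40,         "CREAT" },
    { 0,    0,            NULL },
};

/*
 * Write the names of all matching entries into 'out', joined by '|'.
 * Returns the bits of 'flags' that no entry accounted for.
 */
static long demo_format_flags(const struct demo_flag *f, long flags,
                              char *out, size_t outsz)
{
    const char *sep = "";
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    for (; f->name != NULL; f++) {
        if ((flags & f->mask) == f->value) {
            int n = snprintf(out + used, outsz - used, "%s%s", sep, f->name);
            if (n < 0 || (size_t)n >= outsz - used) {
                break;              /* output would be truncated; stop */
            }
            used += (size_t)n;
            flags &= ~f->mask;      /* the whole field is now consumed */
            sep = "|";
        }
    }
    return flags;
}
```

Because the zero-valued entry is matched against its own mask, a value such as RDONLY is reported only when the whole field is zero, which removes the need for the old `flags == 0` special case.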
[PULL 03/47] linux-user: Use abi_uint not uint32_t in syscall_defs.h
Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 108 +++--- 1 file changed, 54 insertions(+), 54 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index a4e4df8d3e..414d88a9ec 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -67,7 +67,7 @@ #define USE_UID16 #define target_id uint16_t #else -#define target_id uint32_t +#define target_id abi_uint #endif #if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SH4) \ @@ -215,9 +215,9 @@ struct target_ip_mreqn { struct target_ip_mreq_source { /* big endian */ -uint32_t imr_multiaddr; -uint32_t imr_interface; -uint32_t imr_sourceaddr; +abi_uint imr_multiaddr; +abi_uint imr_interface; +abi_uint imr_sourceaddr; }; struct target_linger { @@ -508,9 +508,9 @@ typedef abi_ulong target_old_sa_flags; #if defined(TARGET_MIPS) struct target_sigaction { -uint32_tsa_flags; +abi_uintsa_flags; #if defined(TARGET_ABI_MIPSN32) -uint32_t_sa_handler; +abi_uint_sa_handler; #else abi_ulong _sa_handler; #endif @@ -1620,19 +1620,19 @@ struct target_stat { struct QEMU_PACKED target_stat64 { uint64_t st_dev; #define TARGET_STAT64_HAS_BROKEN_ST_INO 1 -uint32_t pad0; -uint32_t __st_ino; +abi_uint pad0; +abi_uint __st_ino; -uint32_t st_mode; -uint32_t st_nlink; -uint32_t st_uid; -uint32_t st_gid; +abi_uint st_mode; +abi_uint st_nlink; +abi_uint st_uid; +abi_uint st_gid; uint64_t st_rdev; uint64_t __pad1; int64_t st_size; int32_t st_blksize; -uint32_t __pad2; +abi_uint __pad2; int64_t st_blocks; /* Number 512-byte blocks allocated. 
*/ inttarget_st_atime; @@ -2227,19 +2227,19 @@ struct target_statfs { #endif struct target_statfs64 { -uint32_tf_type; -uint32_tf_bsize; -uint32_tf_frsize; /* Fragment size - unsupported */ -uint32_t__pad; +abi_uintf_type; +abi_uintf_bsize; +abi_uintf_frsize; /* Fragment size - unsupported */ +abi_uint__pad; uint64_tf_blocks; uint64_tf_bfree; uint64_tf_files; uint64_tf_ffree; uint64_tf_bavail; target_fsid_t f_fsid; -uint32_tf_namelen; -uint32_tf_flags; -uint32_tf_spare[5]; +abi_uintf_namelen; +abi_uintf_flags; +abi_uintf_spare[5]; }; #elif (defined(TARGET_PPC64) || defined(TARGET_X86_64) || \ defined(TARGET_SPARC64) || defined(TARGET_AARCH64) ||\ @@ -2307,33 +2307,33 @@ struct target_statfs64 { }; #else struct target_statfs { -uint32_t f_type; -uint32_t f_bsize; -uint32_t f_blocks; -uint32_t f_bfree; -uint32_t f_bavail; -uint32_t f_files; -uint32_t f_ffree; +abi_uint f_type; +abi_uint f_bsize; +abi_uint f_blocks; +abi_uint f_bfree; +abi_uint f_bavail; +abi_uint f_files; +abi_uint f_ffree; target_fsid_t f_fsid; -uint32_t f_namelen; -uint32_t f_frsize; -uint32_t f_flags; -uint32_t f_spare[4]; +abi_uint f_namelen; +abi_uint f_frsize; +abi_uint f_flags; +abi_uint f_spare[4]; }; struct target_statfs64 { -uint32_t f_type; -uint32_t f_bsize; +abi_uint f_type; +abi_uint f_bsize; uint64_t f_blocks; uint64_t f_bfree; uint64_t f_bavail; uint64_t f_files; uint64_t f_ffree; target_fsid_t f_fsid; -uint32_t f_namelen; -uint32_t f_frsize; -uint32_t f_flags; -uint32_t f_spare[4]; +abi_uint f_namelen; +abi_uint f_frsize; +abi_uint f_flags; +abi_uint f_spare[4]; }; #endif @@ -2713,9 +2713,9 @@ struct target_epoll_event { #endif struct target_ucred { -uint32_t pid; -uint32_t uid; -uint32_t gid; +abi_uint pid; +abi_uint uid; +abi_uint gid; }; typedef int32_t target_timer_t; @@ -2754,14 +2754,14 @@ struct target_sigevent { }; struct target_user_cap_header { -uint32_t version; +abi_uint version; int pid; }; struct target_user_cap_data { -uint32_t effective; -uint32_t permitted; -uint32_t 
inheritable; +abi_uint effective; +abi_uint permitted; +abi_uint inheritable; }; /* from kernel's include/linux/syslog.h */ @@ -2791,19 +2791,19 @@ struct target_user_cap_data { struct target_statx_timestamp { int64_t tv_sec; -uint32_t tv_nsec; +abi_uint tv_nsec; int32_t __reserved; }; struct target_statx { /* 0x00 */ -uint32_t stx_mask; /* What results were written [uncond] */ -uint32_t stx_blksize;/* Preferred general I/O size [uncond] */ +abi_uint stx_mask; /* What results were written [uncond] */ +abi_uint stx_blksize;/* Preferre
[PULL 29/47] linux-user: Split out target_to_host_prot
Split out from validate_prot_to_pageflags, as there is not one single host_prot for the entire range. We need to adjust prot for every host page that overlaps multiple guest pages. Reviewed-by: Alex Bennée Signed-off-by: Richard Henderson Message-Id: <20230707204054.8792-13-richard.hender...@linaro.org> --- linux-user/mmap.c | 78 ++- 1 file changed, 44 insertions(+), 34 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 9dc34fc29d..12b1308a83 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -69,24 +69,11 @@ void mmap_fork_end(int child) * Return 0 if the target prot bitmask is invalid, otherwise * the internal qemu page_flags (which will include PAGE_VALID). */ -static int validate_prot_to_pageflags(int *host_prot, int prot) +static int validate_prot_to_pageflags(int prot) { int valid = PROT_READ | PROT_WRITE | PROT_EXEC | TARGET_PROT_SEM; int page_flags = (prot & PAGE_BITS) | PAGE_VALID; -/* - * For the host, we need not pass anything except read/write/exec. - * While PROT_SEM is allowed by all hosts, it is also ignored, so - * don't bother transforming guest bit to host bit. Any other - * target-specific prot bits will not be understood by the host - * and will need to be encoded into page_flags for qemu emulation. - * - * Pages that are executable by the guest will never be executed - * by the host, but the host will need to be able to read them. - */ -*host_prot = (prot & (PROT_READ | PROT_WRITE)) - | (prot & PROT_EXEC ? PROT_READ : 0); - #ifdef TARGET_AARCH64 { ARMCPU *cpu = ARM_CPU(thread_cpu); @@ -114,18 +101,34 @@ static int validate_prot_to_pageflags(int *host_prot, int prot) return prot & ~valid ? 0 : page_flags; } +/* + * For the host, we need not pass anything except read/write/exec. + * While PROT_SEM is allowed by all hosts, it is also ignored, so + * don't bother transforming guest bit to host bit. 
Any other + * target-specific prot bits will not be understood by the host + * and will need to be encoded into page_flags for qemu emulation. + * + * Pages that are executable by the guest will never be executed + * by the host, but the host will need to be able to read them. + */ +static int target_to_host_prot(int prot) +{ +return (prot & (PROT_READ | PROT_WRITE)) | + (prot & PROT_EXEC ? PROT_READ : 0); +} + /* NOTE: all the constants are the HOST ones, but addresses are target. */ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot) { abi_ulong end, host_start, host_end, addr; -int prot1, ret, page_flags, host_prot; +int prot1, ret, page_flags; trace_target_mprotect(start, len, target_prot); if ((start & ~TARGET_PAGE_MASK) != 0) { return -TARGET_EINVAL; } -page_flags = validate_prot_to_pageflags(&host_prot, target_prot); +page_flags = validate_prot_to_pageflags(target_prot); if (!page_flags) { return -TARGET_EINVAL; } @@ -143,7 +146,7 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot) host_end = HOST_PAGE_ALIGN(end); if (start > host_start) { /* handle host page containing start */ -prot1 = host_prot; +prot1 = target_prot; for (addr = host_start; addr < start; addr += TARGET_PAGE_SIZE) { prot1 |= page_get_flags(addr); } @@ -154,19 +157,19 @@ int target_mprotect(abi_ulong start, abi_ulong len, int target_prot) end = host_end; } ret = mprotect(g2h_untagged(host_start), qemu_host_page_size, - prot1 & PAGE_BITS); + target_to_host_prot(prot1)); if (ret != 0) { goto error; } host_start += qemu_host_page_size; } if (end < host_end) { -prot1 = host_prot; +prot1 = target_prot; for (addr = end; addr < host_end; addr += TARGET_PAGE_SIZE) { prot1 |= page_get_flags(addr); } ret = mprotect(g2h_untagged(host_end - qemu_host_page_size), - qemu_host_page_size, prot1 & PAGE_BITS); + qemu_host_page_size, target_to_host_prot(prot1)); if (ret != 0) { goto error; } @@ -175,8 +178,8 @@ int target_mprotect(abi_ulong start, abi_ulong len, int 
target_prot) /* handle the pages in the middle */ if (host_start < host_end) { -ret = mprotect(g2h_untagged(host_start), - host_end - host_start, host_prot); +ret = mprotect(g2h_untagged(host_start), host_end - host_start, + target_to_host_prot(target_prot)); if (ret != 0) { goto error; } @@ -212,7 +215,8 @@ static int mmap_frag(abi_ulong real_start, if (prot1 == 0) { /* no page was there, so we allocate one */ -void *p = mmap(host_start,
[PULL 07/47] linux-user: Use abi_uint not unsigned int in syscall_defs.h
Signed-off-by: Richard Henderson --- linux-user/syscall_defs.h | 290 +++--- 1 file changed, 145 insertions(+), 145 deletions(-) diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h index 2846a8cfa5..20986bd1d3 100644 --- a/linux-user/syscall_defs.h +++ b/linux-user/syscall_defs.h @@ -366,7 +366,7 @@ struct target_msghdr { abi_long msg_iovlen; /* Number of blocks*/ abi_long msg_control;/* Per protocol magic (eg BSD file descriptor passing) */ abi_long msg_controllen; /* Length of cmsg list */ -unsigned int msg_flags; +abi_uint msg_flags; }; struct target_cmsghdr { @@ -403,7 +403,7 @@ __target_cmsg_nxthdr(struct target_msghdr *__mhdr, struct target_mmsghdr { struct target_msghdr msg_hdr; /* Message header */ -unsigned int msg_len; /* Number of bytes transmitted */ +abi_uint msg_len; /* Number of bytes transmitted */ }; struct target_rusage { @@ -595,8 +595,8 @@ typedef struct target_siginfo { /* POSIX.1b timers */ struct { -unsigned int _timer1; -unsigned int _timer2; +abi_uint _timer1; +abi_uint _timer2; } _timer; /* POSIX.1b signals */ @@ -857,10 +857,10 @@ struct target_rtc_pll_info { #define TARGET_TUNSETOWNERTARGET_IOW('T', 204, int) #define TARGET_TUNSETLINK TARGET_IOW('T', 205, int) #define TARGET_TUNSETGROUPTARGET_IOW('T', 206, int) -#define TARGET_TUNGETFEATURES TARGET_IOR('T', 207, unsigned int) -#define TARGET_TUNSETOFFLOAD TARGET_IOW('T', 208, unsigned int) -#define TARGET_TUNSETTXFILTER TARGET_IOW('T', 209, unsigned int) -#define TARGET_TUNGETIFF TARGET_IOR('T', 210, unsigned int) +#define TARGET_TUNGETFEATURES TARGET_IOR('T', 207, abi_uint) +#define TARGET_TUNSETOFFLOAD TARGET_IOW('T', 208, abi_uint) +#define TARGET_TUNSETTXFILTER TARGET_IOW('T', 209, abi_uint) +#define TARGET_TUNGETIFF TARGET_IOR('T', 210, abi_uint) #define TARGET_TUNGETSNDBUF TARGET_IOR('T', 211, int) #define TARGET_TUNSETSNDBUF TARGET_IOW('T', 212, int) /* @@ -870,7 +870,7 @@ struct target_rtc_pll_info { #define TARGET_TUNGETVNETHDRSZTARGET_IOR('T', 215, int) #define 
TARGET_TUNSETVNETHDRSZTARGET_IOW('T', 216, int) #define TARGET_TUNSETQUEUETARGET_IOW('T', 217, int) -#define TARGET_TUNSETIFINDEX TARGET_IOW('T', 218, unsigned int) +#define TARGET_TUNSETIFINDEX TARGET_IOW('T', 218, abi_uint) /* TUNGETFILTER is not supported: see TUNATTACHFILTER. */ #define TARGET_TUNSETVNETLE TARGET_IOW('T', 220, int) #define TARGET_TUNGETVNETLE TARGET_IOR('T', 221, int) @@ -1361,8 +1361,8 @@ struct target_stat64 { #define TARGET_STAT64_HAS_BROKEN_ST_INO 1 abi_ulong __st_ino; -unsigned intst_mode; -unsigned intst_nlink; +abi_uintst_mode; +abi_uintst_nlink; abi_ulong st_uid; abi_ulong st_gid; @@ -1392,20 +1392,20 @@ struct target_stat64 { #define TARGET_HAS_STRUCT_STAT64 struct target_eabi_stat64 { unsigned long long st_dev; -unsigned int__pad1; +abi_uint __pad1; abi_ulong__st_ino; -unsigned intst_mode; -unsigned intst_nlink; +abi_uint st_mode; +abi_uint st_nlink; abi_ulongst_uid; abi_ulongst_gid; unsigned long long st_rdev; -unsigned int__pad2[2]; +abi_uint __pad2[2]; long long st_size; abi_ulongst_blksize; -unsigned int__pad3; +abi_uint __pad3; unsigned long long st_blocks; abi_ulongtarget_st_atime; @@ -1423,13 +1423,13 @@ struct target_eabi_stat64 { #elif defined(TARGET_SPARC64) && !defined(TARGET_ABI32) struct target_stat { -unsigned intst_dev; +abi_uintst_dev; abi_ulong st_ino; -unsigned intst_mode; -unsigned intst_nlink; -unsigned intst_uid; -unsigned intst_gid; -unsigned intst_rdev; +abi_uintst_mode; +abi_uintst_nlink; +abi_uintst_uid; +abi_uintst_gid; +abi_uintst_rdev; abi_longst_size; abi_longtarget_st_atime; abi_longtarget_st_mtime; @@ -1447,10 +1447,10 @@ struct target_stat64 { abi_ullong st_ino; abi_ullong st_nlink; -unsigned intst_mode; +abi_uintst_mode; -unsigned intst_uid; -unsigned intst_gid; +abi_uintst_uid; +abi_uintst_gid; unsigned char __pad2[6]; unsigned short st_rdev; @@ -1459,7 +1459,7 @@ struct target_stat64 { abi_llong st_blksize; unsigned char __pad4[4]; -unsigned intst_blocks; +abi_uintst_blocks; abi_ulong 
target_st_atime; abi_ulong target_st_atime
[PULL 36/47] linux-user: Use 'last' instead of 'end' in target_mmap
Complete the transition within the mmap functions to a formulation that does not overflow at the end of the address space. Signed-off-by: Richard Henderson Reviewed-by: Philippe Mathieu-Daudé Message-Id: <20230707204054.8792-20-richard.hender...@linaro.org> --- linux-user/mmap.c | 45 +++-- 1 file changed, 23 insertions(+), 22 deletions(-) diff --git a/linux-user/mmap.c b/linux-user/mmap.c index 738b9b797d..bb9cbe52cd 100644 --- a/linux-user/mmap.c +++ b/linux-user/mmap.c @@ -456,8 +456,8 @@ abi_ulong mmap_find_vma(abi_ulong start, abi_ulong size, abi_ulong align) abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, int flags, int fd, off_t offset) { -abi_ulong ret, end, real_start, real_end, retaddr, host_len, - passthrough_start = -1, passthrough_end = -1; +abi_ulong ret, last, real_start, real_last, retaddr, host_len; +abi_ulong passthrough_start = -1, passthrough_last = 0; int page_flags; off_t host_offset; @@ -581,29 +581,30 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, host_start += offset - host_offset; } start = h2g(host_start); +last = start + len - 1; passthrough_start = start; -passthrough_end = start + len; +passthrough_last = last; } else { if (start & ~TARGET_PAGE_MASK) { errno = EINVAL; goto fail; } -end = start + len; -real_end = HOST_PAGE_ALIGN(end); +last = start + len - 1; +real_last = HOST_PAGE_ALIGN(last) - 1; /* * Test if requested memory area fits target address space * It can fail only on 64-bit host with 32-bit target. * On any other target/host host mmap() handles this error correctly. */ -if (end < start || !guest_range_valid_untagged(start, len)) { +if (last < start || !guest_range_valid_untagged(start, len)) { errno = ENOMEM; goto fail; } /* Validate that the chosen range is empty. 
*/ if ((flags & MAP_FIXED_NOREPLACE) -&& !page_check_range_empty(start, end - 1)) { +&& !page_check_range_empty(start, last)) { errno = EEXIST; goto fail; } @@ -642,9 +643,9 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, /* handle the start of the mapping */ if (start > real_start) { -if (real_end == real_start + qemu_host_page_size) { +if (real_last == real_start + qemu_host_page_size - 1) { /* one single host page */ -if (!mmap_frag(real_start, start, end - 1, +if (!mmap_frag(real_start, start, last, target_prot, flags, fd, offset)) { goto fail; } @@ -658,18 +659,18 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, real_start += qemu_host_page_size; } /* handle the end of the mapping */ -if (end < real_end) { -if (!mmap_frag(real_end - qemu_host_page_size, - real_end - qemu_host_page_size, end - 1, +if (last < real_last) { +abi_ulong real_page = real_last - qemu_host_page_size + 1; +if (!mmap_frag(real_page, real_page, last, target_prot, flags, fd, - offset + real_end - qemu_host_page_size - start)) { + offset + real_page - start)) { goto fail; } -real_end -= qemu_host_page_size; +real_last -= qemu_host_page_size; } /* map the middle (easier) */ -if (real_start < real_end) { +if (real_start < real_last) { void *p; off_t offset1; @@ -678,13 +679,13 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, } else { offset1 = offset + real_start - start; } -p = mmap(g2h_untagged(real_start), real_end - real_start, +p = mmap(g2h_untagged(real_start), real_last - real_start + 1, target_to_host_prot(target_prot), flags, fd, offset1); if (p == MAP_FAILED) { goto fail; } passthrough_start = real_start; -passthrough_end = real_end; +passthrough_last = real_last; } } the_end1: @@ -692,16 +693,16 @@ abi_long target_mmap(abi_ulong start, abi_ulong len, int target_prot, page_flags |= PAGE_ANON; } page_flags |= PAGE_RESET; -if (passthrough_start == passthrough_end) { -page_set_flags(start, start + len - 1, 
page_flags); +if (passthrough_start > passthrough_last) { +page_set_flags(start, last,
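As an aside on the commit message above, here is a minimal sketch of why an inclusive 'last' avoids the overflow that an exclusive 'end' hits at the top of the address space. It assumes a 32-bit guest address type; `guest_addr_t` and the three helper names are hypothetical and are not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit guest address type (QEMU's abi_ulong depends on the target). */
typedef uint32_t guest_addr_t;

/* Exclusive end: wraps to 0 when the range reaches the top of the address space. */
static guest_addr_t range_end(guest_addr_t start, guest_addr_t len)
{
    return start + len;
}

/* Inclusive last: address of the final byte in the range; requires len > 0. */
static guest_addr_t range_last(guest_addr_t start, guest_addr_t len)
{
    assert(len > 0);
    return start + len - 1;
}

/* True when the range does not wrap around the address space; requires len > 0. */
static bool range_fits(guest_addr_t start, guest_addr_t len)
{
    return range_last(start, len) >= start;
}
```

For the last page of a 32-bit space (start 0xfffff000, length 0x1000), `range_end` yields 0 and so compares below `start`, whereas `range_last` yields 0xffffffff and the range is still recognised as valid.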
[PATCH v5 1/5] i386/tcg: implement x2APIC registers MSR access
This commit refactors apic_mem_read/write to support both MMIO access in xAPIC and MSR access in x2APIC. Reviewed-by: Michael S. Tsirkin Signed-off-by: Bui Quang Minh --- hw/intc/apic.c | 79 ++-- hw/intc/trace-events | 4 +- include/hw/i386/apic.h | 3 ++ target/i386/cpu.h| 3 ++ target/i386/tcg/sysemu/misc_helper.c | 27 ++ 5 files changed, 86 insertions(+), 30 deletions(-) diff --git a/hw/intc/apic.c b/hw/intc/apic.c index ac3d47d231..cb8c20de93 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -288,6 +288,13 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode); } +bool is_x2apic_mode(DeviceState *dev) +{ +APICCommonState *s = APIC(dev); + +return s->apicbase & MSR_IA32_APICBASE_EXTD; +} + static void apic_set_base(APICCommonState *s, uint64_t val) { s->apicbase = (val & 0xf000) | @@ -636,16 +643,11 @@ static void apic_timer(void *opaque) apic_timer_update(s, s->next_time); } -static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) +uint64_t apic_register_read(int index) { DeviceState *dev; APICCommonState *s; -uint32_t val; -int index; - -if (size < 4) { -return 0; -} +uint64_t val; dev = cpu_get_current_apic(); if (!dev) { @@ -653,7 +655,6 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) } s = APIC(dev); -index = (addr >> 4) & 0xff; switch(index) { case 0x02: /* id */ val = s->id << 24; @@ -720,7 +721,23 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) val = 0; break; } -trace_apic_mem_readl(addr, val); + +trace_apic_register_read(index, val); +return val; +} + +static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +uint32_t val; +int index; + +if (size < 4) { +return 0; +} + +index = (addr >> 4) & 0xff; +val = (uint32_t)apic_register_read(index); + return val; } @@ -737,27 +754,10 @@ static void apic_send_msi(MSIMessage *msi) apic_deliver_irq(dest, dest_mode, delivery, 
vector, trigger_mode); } -static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, - unsigned size) +void apic_register_write(int index, uint64_t val) { DeviceState *dev; APICCommonState *s; -int index = (addr >> 4) & 0xff; - -if (size < 4) { -return; -} - -if (addr > 0xfff || !index) { -/* MSI and MMIO APIC are at the same memory location, - * but actually not on the global bus: MSI is on PCI bus - * APIC is connected directly to the CPU. - * Mapping them on the global bus happens to work because - * MSI registers are reserved in APIC MMIO and vice versa. */ -MSIMessage msi = { .address = addr, .data = val }; -apic_send_msi(&msi); -return; -} dev = cpu_get_current_apic(); if (!dev) { @@ -765,7 +765,7 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, } s = APIC(dev); -trace_apic_mem_writel(addr, val); +trace_apic_register_write(index, val); switch(index) { case 0x02: @@ -843,6 +843,29 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, } } +static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, + unsigned size) +{ +int index = (addr >> 4) & 0xff; + +if (size < 4) { +return; +} + +if (addr > 0xfff || !index) { +/* MSI and MMIO APIC are at the same memory location, + * but actually not on the global bus: MSI is on PCI bus + * APIC is connected directly to the CPU. + * Mapping them on the global bus happens to work because + * MSI registers are reserved in APIC MMIO and vice versa. 
*/ +MSIMessage msi = { .address = addr, .data = val }; +apic_send_msi(&msi); +return; +} + +apic_register_write(index, val); +} + static void apic_pre_save(APICCommonState *s) { apic_sync_vapic(s, SYNC_FROM_VAPIC); diff --git a/hw/intc/trace-events b/hw/intc/trace-events index 36ff71f947..1ef29d0256 100644 --- a/hw/intc/trace-events +++ b/hw/intc/trace-events @@ -14,8 +14,8 @@ cpu_get_apic_base(uint64_t val) "0x%016"PRIx64 # apic.c apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d" apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d" -apic_mem_readl(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x" -apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x" +apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x
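To illustrate the refactoring described above, here is a rough sketch of how both access paths can arrive at the same register index. The MMIO formula is the one visible in the patch; the MSR side assumes the architectural x2APIC MSR range starting at 0x800, since the misc_helper.c hunk is not shown here. Both function names and both macros are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed architectural x2APIC MSR range (not taken from the patch text). */
#define X2APIC_MSR_BASE 0x800u
#define X2APIC_MSR_LAST 0x8ffu

/* xAPIC: registers sit 16 bytes apart in the MMIO page, as in apic_mem_read(). */
static int apic_index_from_mmio(uint64_t addr)
{
    return (int)((addr >> 4) & 0xff);
}

/* x2APIC: one MSR per register, counted from the base of the MSR range. */
static int apic_index_from_msr(uint32_t msr)
{
    assert(msr >= X2APIC_MSR_BASE && msr <= X2APIC_MSR_LAST);
    return (int)(msr - X2APIC_MSR_BASE);
}
```

With a shared index, a single `apic_register_read(index)` style function can serve both the MMIO handler and the RDMSR helper, which is the shape the patch gives the code.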
[PATCH v5 3/5] apic, i386/tcg: add x2apic transitions
This commit adds support for x2APIC transitions when writing to MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to TCG_EXT_FEATURES. Reviewed-by: Michael S. Tsirkin Signed-off-by: Bui Quang Minh --- hw/intc/apic.c | 50 hw/intc/apic_common.c| 7 ++-- target/i386/cpu-sysemu.c | 10 ++ target/i386/cpu.c| 8 ++--- target/i386/cpu.h| 6 target/i386/tcg/sysemu/misc_helper.c | 4 +++ 6 files changed, 76 insertions(+), 9 deletions(-) diff --git a/hw/intc/apic.c b/hw/intc/apic.c index 9f741794a7..b8f56836a6 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -309,8 +309,41 @@ bool is_x2apic_mode(DeviceState *dev) return s->apicbase & MSR_IA32_APICBASE_EXTD; } +static void apic_set_base_check(APICCommonState *s, uint64_t val) +{ +/* Enable x2apic when x2apic is not supported by CPU */ +if (!cpu_has_x2apic_feature(&s->cpu->env) && +val & MSR_IA32_APICBASE_EXTD) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); + +/* + * Transition into invalid state + * (s->apicbase & MSR_IA32_APICBASE_ENABLE == 0) && + * (s->apicbase & MSR_IA32_APICBASE_EXTD) == 1 + */ +if (!(val & MSR_IA32_APICBASE_ENABLE) && +(val & MSR_IA32_APICBASE_EXTD)) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); + +/* Invalid transition from disabled mode to x2APIC */ +if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) && +!(s->apicbase & MSR_IA32_APICBASE_EXTD) && +(val & MSR_IA32_APICBASE_ENABLE) && +(val & MSR_IA32_APICBASE_EXTD)) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); + +/* Invalid transition from x2APIC to xAPIC */ +if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) && +(s->apicbase & MSR_IA32_APICBASE_EXTD) && +(val & MSR_IA32_APICBASE_ENABLE) && +!(val & MSR_IA32_APICBASE_EXTD)) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); +} + static void apic_set_base(APICCommonState *s, uint64_t val) { +apic_set_base_check(s, val); + s->apicbase = (val & 0xf000) | (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE)); /* if disabled, cannot be enabled again */ @@ -319,6 
+352,23 @@ static void apic_set_base(APICCommonState *s, uint64_t val) cpu_clear_apic_feature(&s->cpu->env); s->spurious_vec &= ~APIC_SV_ENABLE; } + +/* Transition from disabled mode to xAPIC */ +if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) && +(val & MSR_IA32_APICBASE_ENABLE)) { +s->apicbase |= MSR_IA32_APICBASE_ENABLE; +cpu_set_apic_feature(&s->cpu->env); +} + +/* Transition from xAPIC to x2APIC */ +if (cpu_has_x2apic_feature(&s->cpu->env) && +!(s->apicbase & MSR_IA32_APICBASE_EXTD) && +(val & MSR_IA32_APICBASE_EXTD)) { +s->apicbase |= MSR_IA32_APICBASE_EXTD; + +s->log_dest = ((s->initial_apic_id & 0x0) << 16) | + (1 << (s->initial_apic_id & 0xf)); +} } static void apic_set_tpr(APICCommonState *s, uint8_t val) diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index d95914066e..396f828be8 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -43,11 +43,8 @@ void cpu_set_apic_base(DeviceState *dev, uint64_t val) if (dev) { APICCommonState *s = APIC_COMMON(dev); APICCommonClass *info = APIC_COMMON_GET_CLASS(s); -/* switching to x2APIC, reset possibly modified xAPIC ID */ -if (!(s->apicbase & MSR_IA32_APICBASE_EXTD) && -(val & MSR_IA32_APICBASE_EXTD)) { -s->id = s->initial_apic_id; -} +/* Reset possibly modified xAPIC ID */ +s->id = s->initial_apic_id; info->set_base(s, val); } } diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c index a9ff10c517..f6bbe33372 100644 --- a/target/i386/cpu-sysemu.c +++ b/target/i386/cpu-sysemu.c @@ -235,6 +235,16 @@ void cpu_clear_apic_feature(CPUX86State *env) env->features[FEAT_1_EDX] &= ~CPUID_APIC; } +void cpu_set_apic_feature(CPUX86State *env) +{ +env->features[FEAT_1_EDX] |= CPUID_APIC; +} + +bool cpu_has_x2apic_feature(CPUX86State *env) +{ +return env->features[FEAT_1_ECX] & CPUID_EXT_X2APIC; +} + bool cpu_is_bsp(X86CPU *cpu) { return cpu_get_apic_base(cpu->apic_state) & MSR_IA32_APICBASE_BSP; diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 97ad229d8b..240a1f9737 100644 --- 
a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -630,8 +630,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, * in CPL=3; remove them if they are ever implemented for system emulation. */ #if defined CONFIG_USER_ONLY -#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CPUID_EXT_TSC_DEADLINE_TIMER | \ - CPUID_EXT_X2APIC) +#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CP
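The four checks in apic_set_base_check() above can be read as a small state machine over the ENABLE and EXTD bits. The following is a standalone sketch of that reading, not the patch's code: `apicbase_write_ok` is a hypothetical name, the bit positions (EXTD = bit 10, ENABLE = bit 11) are assumed from the architectural IA32_APIC_BASE layout, and a `false` return stands in for the #GP the patch raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APICBASE_EXTD   (1u << 10)  /* assumed: x2APIC mode enable bit */
#define APICBASE_ENABLE (1u << 11)  /* assumed: APIC global enable bit */

/* Returns true if the write would be accepted, false if it would fault. */
static bool apicbase_write_ok(uint64_t old_val, uint64_t new_val, bool has_x2apic)
{
    bool old_en = (old_val & APICBASE_ENABLE) != 0;
    bool old_x2 = (old_val & APICBASE_EXTD) != 0;
    bool new_en = (new_val & APICBASE_ENABLE) != 0;
    bool new_x2 = (new_val & APICBASE_EXTD) != 0;

    if (new_x2 && !has_x2apic) {
        return false;   /* enabling x2APIC on a CPU without the feature */
    }
    if (new_x2 && !new_en) {
        return false;   /* invalid state: EXTD set while ENABLE is clear */
    }
    if (!old_en && !old_x2 && new_en && new_x2) {
        return false;   /* disabled -> x2APIC directly */
    }
    if (old_en && old_x2 && new_en && !new_x2) {
        return false;   /* x2APIC -> xAPIC directly */
    }
    return true;
}
```

Under these rules a guest reaches x2APIC mode in two steps (disabled, then xAPIC, then x2APIC) and leaves it only by disabling the APIC first.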
[PATCH v5 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC
As userspace APIC now supports x2APIC, intel interrupt remapping
hardware can be set to EIM mode when userspace local APIC is used.

Reviewed-by: Michael S. Tsirkin
Signed-off-by: Bui Quang Minh
---
 hw/i386/intel_iommu.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dcc334060c..5e576f6059 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4043,17 +4043,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
                       && x86_iommu_ir_supported(x86_iommu) ?
                                               ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
-    if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
-        if (!kvm_irqchip_is_split()) {
-            error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
-            return false;
-        }
-        if (!kvm_enable_x2apic()) {
-            error_setg(errp, "eim=on requires support on the KVM side"
-                             "(X2APIC_API, first shipped in v4.7)");
-            return false;
-        }
-    }
 
     /* Currently only address widths supported are 39 and 48 bits */
     if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-- 
2.25.1
[PATCH v5 0/5] Support x2APIC mode with TCG accelerator
Hi everyone,

This series implements x2APIC mode in the userspace local APIC and the
RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. The Intel
and AMD IOMMUs are adjusted to support x2APIC interrupt remapping. With
this series, we can now boot the Linux kernel into x2APIC mode with the
TCG accelerator using either the Intel or the AMD IOMMU.

To test, I booted my own build of Linux 6.3.0-rc2. The kernel boots
successfully with x2APIC enabled and can enumerate a CPU with APIC ID 257.

Using Intel IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device intel-iommu,intremap=on,eim=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

Using AMD IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device amd-iommu,intremap=on,xtsup=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

To test the emulated userspace APIC with kvm-unit-tests, disable the test
device with this patch

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)
 
        if ((str = getenv("TEST_DEVICE")))
                no_test_device = !atol(str);
+       no_test_device = true;
 
        if ((str = getenv("MEMLIMIT")))
                fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
  ./run_tests.sh -v -g apic

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split
FAIL apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp 1 -cpu qemu64 -machine kernel_irqchip=split
PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline
FAIL x2apic (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off
FAIL xapic (43 tests, 6 unexpected failures, 2 skipped)

FAIL: apic_disable: *0xfee00030: 50014
FAIL: apic_disable: *0xfee00080: f0
FAIL: apic_disable: *0xfee00030: 50014
FAIL: apic_disable: *0xfee00080: f0
FAIL: apicbase: relocate apic

These errors occur because we don't disable the MMIO region when
switching to x2APIC and don't support relocating the MMIO region yet.
This is a problem because the MMIO region is the same for all CPUs; to
support these cases we need to figure out how to allocate and manage a
different MMIO region for each CPU. This can be a future improvement.

FAIL: nmi-after-sti
FAIL: multiple nmi

These errors come from the way we handle CPU_INTERRUPT_NMI in core TCG.

FAIL: TMCCT should stay at zero

This error is related to the APIC timer, which should be addressed in a
separate patch.

Version 5 changes,
- Patch 3:
  + Rebase to master and fix conflict
- Patch 5:
  + Create a helper function to get amdvi extended feature register
    instead of storing it in AMDVIState

Version 4 changes,
- Patch 5:
  + Instead of replacing IVHD type 0x10 with type 0x11, export both
    types for backward compatibility with old guest operating system
  + Flip the xtsup feature check condition in amdvi_int_remap_ga for
    readability

Version 3 changes,
- Patch 2:
  + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
  + Make physical destination mode IPI which has destination id 0x a
    broadcast to xAPIC CPUs
  + Make cluster address 0xf in cluster model of xAPIC logical
    destination mode a broadcast to all clusters
  + Create new extended_log_dest to store APIC_LDR information in x2APIC
    instead of extending log_dest for backward compatibility in vmstate

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC support
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
  i386/tcg: implement x2APIC registers MSR access
  apic: add support for x2APIC mode
  apic, i386/tcg: add x2apic transitions
  intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  amd_iommu: report x2APIC support to the operating system

 hw/i386/acpi-build.c | 127 +
 hw/i386/amd_iommu.c | 30 +-
 hw/i386/amd_iommu.h | 16 +-
 hw/i386/intel_iommu.c | 11 -
 hw/i386/x86.c | 8 +-
 hw/intc/apic.c | 395 +++
[PATCH v5 5/5] amd_iommu: report x2APIC support to the operating system
This commit adds XTSup configuration to let user choose to whether enable this feature or not. When XTSup is enabled, additional bytes in IRTE with enabled guest virtual VAPIC are used to support 32-bit destination id. Additionally, this commit exports IVHD type 0x11 besides the old IVHD type 0x10 in ACPI table. IVHD type 0x10 does not report full set of IOMMU features only the legacy ones, so operating system (e.g. Linux) may only detects x2APIC support if IVHD type 0x11 is available. The IVHD type 0x10 is kept so that old operating system that only parses type 0x10 can detect the IOMMU device. Reviewed-by: Michael S. Tsirkin Signed-off-by: Bui Quang Minh --- hw/i386/acpi-build.c | 127 ++- hw/i386/amd_iommu.c | 30 +- hw/i386/amd_iommu.h | 16 -- 3 files changed, 117 insertions(+), 56 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 9c74fa17ad..aeb41d917f 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2336,30 +2336,23 @@ static void build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id, const char *oem_table_id) { -int ivhd_table_len = 24; AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default()); GArray *ivhd_blob = g_array_new(false, true, 1); AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id, .oem_table_id = oem_table_id }; +uint64_t feature_report; acpi_table_begin(&table, table_data); /* IVinfo - IO virtualization information common to all * IOMMU units in a system */ -build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4); +build_append_int_noprefix(table_data, + (1UL << 0) | /* EFRSup */ + (40UL << 8), /* PASize */ + 4); /* reserved */ build_append_int_noprefix(table_data, 0, 8); -/* IVHD definition - type 10h */ -build_append_int_noprefix(table_data, 0x10, 1); -/* virtualization flags */ -build_append_int_noprefix(table_data, - (1UL << 0) | /* HtTunEn */ - (1UL << 4) | /* iotblSup */ - (1UL << 6) | /* PrefSup */ - (1UL << 7), /* PPRSup */ - 1); - /* * A PCI bus walk, for 
each PCI host bridge, is necessary to create a * complete set of IVHD entries. Do this into a separate blob so that we @@ -2379,56 +2372,92 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id, build_append_int_noprefix(ivhd_blob, 0x001, 4); } -ivhd_table_len += ivhd_blob->len; - /* * When interrupt remapping is supported, we add a special IVHD device - * for type IO-APIC. - */ -if (x86_iommu_ir_supported(x86_iommu_get_default())) { -ivhd_table_len += 8; -} - -/* IVHD length */ -build_append_int_noprefix(table_data, ivhd_table_len, 2); -/* DeviceID */ -build_append_int_noprefix(table_data, - object_property_get_int(OBJECT(&s->pci), "addr", - &error_abort), 2); -/* Capability offset */ -build_append_int_noprefix(table_data, s->pci.capab_offset, 2); -/* IOMMU base address */ -build_append_int_noprefix(table_data, s->mmio.addr, 8); -/* PCI Segment Group */ -build_append_int_noprefix(table_data, 0, 2); -/* IOMMU info */ -build_append_int_noprefix(table_data, 0, 2); -/* IOMMU Feature Reporting */ -build_append_int_noprefix(table_data, - (48UL << 30) | /* HATS */ - (48UL << 28) | /* GATS */ - (1UL << 2) | /* GTSup */ - (1UL << 6),/* GASup */ - 4); - -/* IVHD entries as found above */ -g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len); -g_array_free(ivhd_blob, TRUE); - -/* - * Add a special IVHD device type. + * for type IO-APIC * Refer to spec - Table 95: IVHD device entry type codes * * Linux IOMMU driver checks for the special IVHD device (type IO-APIC). * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059' */ if (x86_iommu_ir_supported(x86_iommu_get_default())) { -build_append_int_noprefix(table_data, +build_append_int_noprefix(ivhd_blob, (0x1ull << 56) | /* type IOAPIC */ (IOAPIC_SB_DEVID << 40) | /* IOAPIC devid */ 0x48, /* special device */ 8); } + +/* IVHD definition - type 10h */ +build_append_int_noprefix(table_data, 0x10, 1); +/* virtualization
[PATCH v5 2/5] apic: add support for x2APIC mode
This commit extends the APIC ID to 32-bit long and remove the 255 max APIC ID limit in userspace APIC. The array that manages local APICs is now dynamically allocated based on the max APIC ID of created x86 machine. Also, new x2APIC IPI destination determination scheme, self IPI and x2APIC mode register access are supported. Reviewed-by: Michael S. Tsirkin Signed-off-by: Bui Quang Minh --- hw/i386/x86.c | 8 +- hw/intc/apic.c | 266 hw/intc/apic_common.c | 9 ++ include/hw/i386/apic.h | 3 +- include/hw/i386/apic_internal.h | 7 +- target/i386/cpu-sysemu.c| 8 +- 6 files changed, 231 insertions(+), 70 deletions(-) diff --git a/hw/i386/x86.c b/hw/i386/x86.c index a88a126123..8b70f0a6ea 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version) * Can we support APIC ID 255 or higher? * * Under Xen: yes. - * With userspace emulated lapic: no + * With userspace emulated lapic: checked later in apic_common_set_id. * With KVM's in-kernel lapic: only if X2APIC API is enabled. 
*/ if (x86ms->apic_id_limit > 255 && !xen_enabled() && -(!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) { +kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) { error_report("current -smp configuration requires kernel " "irqchip and X2APIC API support."); exit(EXIT_FAILURE); @@ -146,6 +146,10 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version) kvm_set_max_apic_id(x86ms->apic_id_limit); } +if (!kvm_irqchip_in_kernel()) { +apic_set_max_apic_id(x86ms->apic_id_limit); +} + possible_cpus = mc->possible_cpu_arch_ids(ms); for (i = 0; i < ms->smp.cpus; i++) { x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal); diff --git a/hw/intc/apic.c b/hw/intc/apic.c index cb8c20de93..9f741794a7 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -31,15 +31,15 @@ #include "hw/i386/apic-msidef.h" #include "qapi/error.h" #include "qom/object.h" - -#define MAX_APICS 255 -#define MAX_APIC_WORDS 8 +#include "tcg/helper-tcg.h" #define SYNC_FROM_VAPIC 0x1 #define SYNC_TO_VAPIC 0x2 #define SYNC_ISR_IRR_TO_VAPIC 0x4 -static APICCommonState *local_apics[MAX_APICS + 1]; +static APICCommonState **local_apics; +static uint32_t max_apics; +static uint32_t max_apic_words; #define TYPE_APIC "apic" /*This is reusing the APICCommonState typedef from APIC_COMMON */ @@ -49,7 +49,19 @@ DECLARE_INSTANCE_CHECKER(APICCommonState, APIC, static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode); static void apic_update_irq(APICCommonState *s); static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask, - uint8_t dest, uint8_t dest_mode); + uint32_t dest, uint8_t dest_mode); + +void apic_set_max_apic_id(uint32_t max_apic_id) +{ +int word_size = 32; + +/* round up the max apic id to next multiple of words */ +max_apics = (max_apic_id + word_size - 1) & ~(word_size - 1); + +local_apics = g_malloc0(sizeof(*local_apics) * max_apics); +max_apic_words = max_apics >> 5; +} + /* Find first bit starting from msb */ static int apic_fls_bit(uint32_t value) @@ 
-199,7 +211,7 @@ static void apic_external_nmi(APICCommonState *s) #define foreach_apic(apic, deliver_bitmask, code) \ {\ int __i, __j;\ -for(__i = 0; __i < MAX_APIC_WORDS; __i++) {\ +for(__i = 0; __i < max_apic_words; __i++) {\ uint32_t __mask = deliver_bitmask[__i];\ if (__mask) {\ for(__j = 0; __j < 32; __j++) {\ @@ -226,7 +238,7 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, { int i, d; d = -1; -for(i = 0; i < MAX_APIC_WORDS; i++) { +for(i = 0; i < max_apic_words; i++) { if (deliver_bitmask[i]) { d = i * 32 + apic_ffs_bit(deliver_bitmask[i]); break; @@ -276,16 +288,18 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, apic_set_irq(apic_iter, vector_num, trigger_mode) ); } -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, - uint8_t vector_num, uint8_t trigger_mode) +static void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, + uint8_t delivery_mode, uint8_t vector_num, + uint8_t trigger_mode) { -uint32_t deliver_bitmask[MAX_APIC_WORDS]; +uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t)); trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num, trigger_mode);
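The sizing logic in apic_set_max_apic_id() above rounds the APIC ID limit up to a whole number of 32-bit bitmask words. A small sketch of that arithmetic follows; the function and macro names are hypothetical, and only the rounding expression and the shift by 5 are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_WORD 32u

/* Round the APIC ID limit up to the next multiple of 32, as the patch does. */
static uint32_t round_up_to_word(uint32_t max_apic_id)
{
    return (max_apic_id + BITS_PER_WORD - 1) & ~(BITS_PER_WORD - 1);
}

/* Number of 32-bit words a delivery bitmask needs for that limit. */
static uint32_t bitmask_words(uint32_t max_apic_id)
{
    return round_up_to_word(max_apic_id) >> 5;
}

/* Word of the delivery bitmask that holds the bit for a given APIC ID. */
static uint32_t bitmask_word_of(uint32_t apic_id, uint32_t max_apic_id)
{
    assert(apic_id < round_up_to_word(max_apic_id));
    return apic_id >> 5;
}
```

As an illustration with a made-up limit of 258, the array is sized for 288 entries and the bitmask takes 9 words, so APIC ID 257 lands in word 8, beyond what the old fixed MAX_APIC_WORDS of 8 could index.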
Re: [PATCH v5 0/5] Support x2APIC mode with TCG accelerator
On 7/15/23 21:28, Bui Quang Minh wrote:
> Hi everyone,
>
> This series implements x2APIC mode in userspace local APIC and the
> RDMSR/WRMSR helper to access x2APIC registers in x2APIC mode. Intel iommu
> and AMD iommu are adjusted to support x2APIC interrupt remapping.
> [...]
[PATCH v6 0/5] Support x2APIC mode with TCG accelerator
Hi everyone,

This series implements x2APIC mode in the userspace local APIC and the
RDMSR/WRMSR helpers to access x2APIC registers in x2APIC mode. The Intel
IOMMU and AMD IOMMU are adjusted to support x2APIC interrupt remapping.
With this series, we can now boot the Linux kernel into x2APIC mode with
the TCG accelerator using either the Intel or the AMD IOMMU.

Testing by booting my own build of Linux 6.3.0-rc2, the kernel boots
successfully with x2APIC enabled and can enumerate a CPU with APIC ID 257.

Using Intel IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device intel-iommu,intremap=on,eim=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

Using AMD IOMMU

qemu/build/qemu-system-x86_64 \
  -smp 2,maxcpus=260 \
  -cpu qemu64,x2apic=on \
  -machine q35 \
  -device amd-iommu,intremap=on,xtsup=on \
  -device qemu64-x86_64-cpu,x2apic=on,core-id=257,socket-id=0,thread-id=0 \
  -m 2G \
  -kernel $KERNEL_DIR \
  -append "nokaslr console=ttyS0 root=/dev/sda earlyprintk=serial net.ifnames=0" \
  -drive file=$IMAGE_DIR,format=raw \
  -nographic \
  -s

Testing the emulated userspace APIC with kvm-unit-tests; disable the test
device with this patch

diff --git a/lib/x86/fwcfg.c b/lib/x86/fwcfg.c
index 1734afb..f56fe1c 100644
--- a/lib/x86/fwcfg.c
+++ b/lib/x86/fwcfg.c
@@ -27,6 +27,7 @@ static void read_cfg_override(void)
 	if ((str = getenv("TEST_DEVICE")))
 		no_test_device = !atol(str);
+	no_test_device = true;
 	if ((str = getenv("MEMLIMIT")))
 		fw_override[FW_CFG_MAX_RAM] = atol(str) * 1024 * 1024;

~ env QEMU=/home/minh/Desktop/oss/qemu/build/qemu-system-x86_64 ACCEL=tcg \
  ./run_tests.sh -v -g apic

TESTNAME=apic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline -machine kernel_irqchip=split
FAIL apic-split (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=ioapic-split TIMEOUT=90s ACCEL=tcg ./x86/run x86/ioapic.flat -smp 1 -cpu qemu64 -machine kernel_irqchip=split
PASS ioapic-split (19 tests)
TESTNAME=x2apic TIMEOUT=30 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,+x2apic,+tsc-deadline
FAIL x2apic (54 tests, 8 unexpected failures, 1 skipped)
TESTNAME=xapic TIMEOUT=60 ACCEL=tcg ./x86/run x86/apic.flat -smp 2 -cpu qemu64,-x2apic,+tsc-deadline -machine pit=off
FAIL xapic (43 tests, 6 unexpected failures, 2 skipped)

FAIL: apic_disable: *0xfee00030: 50014
FAIL: apic_disable: *0xfee00080: f0
FAIL: apic_disable: *0xfee00030: 50014
FAIL: apic_disable: *0xfee00080: f0
FAIL: apicbase: relocate apic

These errors occur because we don't disable the MMIO region when switching
to x2APIC and don't support relocating the MMIO region yet. This is a
problem because the MMIO region is the same for all CPUs; to support these,
we need to figure out how to allocate and manage a different MMIO region
for each CPU. This can be an improvement in the future.

FAIL: nmi-after-sti
FAIL: multiple nmi

These errors are caused by the way we handle CPU_INTERRUPT_NMI in core TCG.

FAIL: TMCCT should stay at zero

This error is related to the APIC timer, which should be addressed in a
separate patch.

Version 6 changes,
- Patch 5:
  + Make all places use the amdvi_extended_feature_register to get extended
    feature register

Version 5 changes,
- Patch 3:
  + Rebase to master and fix conflict
- Patch 5:
  + Create a helper function to get amdvi extended feature register instead
    of storing it in AMDVIState

Version 4 changes,
- Patch 5:
  + Instead of replacing IVHD type 0x10 with type 0x11, export both types
    for backward compatibility with old guest operating system
  + Flip the xtsup feature check condition in amdvi_int_remap_ga for
    readability

Version 3 changes,
- Patch 2:
  + Allow APIC ID > 255 only when x2APIC feature is supported on CPU
  + Make physical destination mode IPI which has destination id 0x a
    broadcast to xAPIC CPUs
  + Make cluster address 0xf in cluster model of xAPIC logical destination
    mode a broadcast to all clusters
  + Create new extended_log_dest to store APIC_LDR information in x2APIC
    instead of extending log_dest for backward compatibility in vmstate

Version 2 changes,
- Add support for APIC ID larger than 255
- Adjust AMD iommu for x2APIC support
- Reorganize and split patch 1,2 into patch 1,2,3 in version 2

Thanks,
Quang Minh.

Bui Quang Minh (5):
  i386/tcg: implement x2APIC registers MSR access
  apic: add support for x2APIC mode
  apic, i386/tcg: add x2apic transitions
  intel_iommu: allow Extended Interrupt Mode when using userspace APIC
  amd_iommu: report x2APIC support to the operating system

 hw/i386/acpi-build.c | 129 +
 hw/i386/amd_iommu.c | 29 +-
 hw/i386/amd_iommu.h | 16 +-
 hw/i386/intel_iommu.c
[PATCH v6 3/5] apic, i386/tcg: add x2apic transitions
This commit adds support for x2APIC transitions when writing to MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to TCG_EXT_FEATURES. Reviewed-by: Michael S. Tsirkin Signed-off-by: Bui Quang Minh --- hw/intc/apic.c | 50 hw/intc/apic_common.c| 7 ++-- target/i386/cpu-sysemu.c | 10 ++ target/i386/cpu.c| 8 ++--- target/i386/cpu.h| 6 target/i386/tcg/sysemu/misc_helper.c | 4 +++ 6 files changed, 76 insertions(+), 9 deletions(-) diff --git a/hw/intc/apic.c b/hw/intc/apic.c index 9f741794a7..b8f56836a6 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -309,8 +309,41 @@ bool is_x2apic_mode(DeviceState *dev) return s->apicbase & MSR_IA32_APICBASE_EXTD; } +static void apic_set_base_check(APICCommonState *s, uint64_t val) +{ +/* Enable x2apic when x2apic is not supported by CPU */ +if (!cpu_has_x2apic_feature(&s->cpu->env) && +val & MSR_IA32_APICBASE_EXTD) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); + +/* + * Transition into invalid state + * (s->apicbase & MSR_IA32_APICBASE_ENABLE == 0) && + * (s->apicbase & MSR_IA32_APICBASE_EXTD) == 1 + */ +if (!(val & MSR_IA32_APICBASE_ENABLE) && +(val & MSR_IA32_APICBASE_EXTD)) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); + +/* Invalid transition from disabled mode to x2APIC */ +if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) && +!(s->apicbase & MSR_IA32_APICBASE_EXTD) && +(val & MSR_IA32_APICBASE_ENABLE) && +(val & MSR_IA32_APICBASE_EXTD)) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); + +/* Invalid transition from x2APIC to xAPIC */ +if ((s->apicbase & MSR_IA32_APICBASE_ENABLE) && +(s->apicbase & MSR_IA32_APICBASE_EXTD) && +(val & MSR_IA32_APICBASE_ENABLE) && +!(val & MSR_IA32_APICBASE_EXTD)) +raise_exception_ra(&s->cpu->env, EXCP0D_GPF, GETPC()); +} + static void apic_set_base(APICCommonState *s, uint64_t val) { +apic_set_base_check(s, val); + s->apicbase = (val & 0xf000) | (s->apicbase & (MSR_IA32_APICBASE_BSP | MSR_IA32_APICBASE_ENABLE)); /* if disabled, cannot be enabled again */ @@ -319,6 
+352,23 @@ static void apic_set_base(APICCommonState *s, uint64_t val) cpu_clear_apic_feature(&s->cpu->env); s->spurious_vec &= ~APIC_SV_ENABLE; } + +/* Transition from disabled mode to xAPIC */ +if (!(s->apicbase & MSR_IA32_APICBASE_ENABLE) && +(val & MSR_IA32_APICBASE_ENABLE)) { +s->apicbase |= MSR_IA32_APICBASE_ENABLE; +cpu_set_apic_feature(&s->cpu->env); +} + +/* Transition from xAPIC to x2APIC */ +if (cpu_has_x2apic_feature(&s->cpu->env) && +!(s->apicbase & MSR_IA32_APICBASE_EXTD) && +(val & MSR_IA32_APICBASE_EXTD)) { +s->apicbase |= MSR_IA32_APICBASE_EXTD; + +s->log_dest = ((s->initial_apic_id & 0x0) << 16) | + (1 << (s->initial_apic_id & 0xf)); +} } static void apic_set_tpr(APICCommonState *s, uint8_t val) diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index d95914066e..396f828be8 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -43,11 +43,8 @@ void cpu_set_apic_base(DeviceState *dev, uint64_t val) if (dev) { APICCommonState *s = APIC_COMMON(dev); APICCommonClass *info = APIC_COMMON_GET_CLASS(s); -/* switching to x2APIC, reset possibly modified xAPIC ID */ -if (!(s->apicbase & MSR_IA32_APICBASE_EXTD) && -(val & MSR_IA32_APICBASE_EXTD)) { -s->id = s->initial_apic_id; -} +/* Reset possibly modified xAPIC ID */ +s->id = s->initial_apic_id; info->set_base(s, val); } } diff --git a/target/i386/cpu-sysemu.c b/target/i386/cpu-sysemu.c index a9ff10c517..f6bbe33372 100644 --- a/target/i386/cpu-sysemu.c +++ b/target/i386/cpu-sysemu.c @@ -235,6 +235,16 @@ void cpu_clear_apic_feature(CPUX86State *env) env->features[FEAT_1_EDX] &= ~CPUID_APIC; } +void cpu_set_apic_feature(CPUX86State *env) +{ +env->features[FEAT_1_EDX] |= CPUID_APIC; +} + +bool cpu_has_x2apic_feature(CPUX86State *env) +{ +return env->features[FEAT_1_ECX] & CPUID_EXT_X2APIC; +} + bool cpu_is_bsp(X86CPU *cpu) { return cpu_get_apic_base(cpu->apic_state) & MSR_IA32_APICBASE_BSP; diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 97ad229d8b..240a1f9737 100644 --- 
a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -630,8 +630,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1, * in CPL=3; remove them if they are ever implemented for system emulation. */ #if defined CONFIG_USER_ONLY -#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CPUID_EXT_TSC_DEADLINE_TIMER | \ - CPUID_EXT_X2APIC) +#define CPUID_EXT_KERNEL_FEATURES (CPUID_EXT_PCID | CP
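For review purposes, the four #GP conditions in apic_set_base_check() can be restated as one side-effect-free predicate. The following is only a sketch under stated assumptions, not code from the patch: the names apicbase_transition_ok() and x2apic_logical_id() are hypothetical, the bit positions (EXTD = bit 10, EN = bit 11) follow the IA32_APIC_BASE layout in the Intel SDM, and the logical-ID helper uses the SDM formula (bits 19:4 of the x2APIC ID select the cluster, bits 3:0 select the bit within it) rather than the exact expression in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_APIC_BASE bits per the Intel SDM (assumed to match QEMU's macros). */
#define APICBASE_EXTD   (1ull << 10)  /* x2APIC mode enable */
#define APICBASE_ENABLE (1ull << 11)  /* APIC global enable */

/*
 * Hypothetical helper: returns true if writing new_val over old_val is a
 * legal mode transition, false in the cases where the patch raises #GP.
 */
static bool apicbase_transition_ok(uint64_t old_val, uint64_t new_val,
                                   bool cpu_has_x2apic)
{
    bool old_en = (old_val & APICBASE_ENABLE) != 0;
    bool old_x2 = (old_val & APICBASE_EXTD) != 0;
    bool new_en = (new_val & APICBASE_ENABLE) != 0;
    bool new_x2 = (new_val & APICBASE_EXTD) != 0;

    /* Enabling x2APIC on a CPU without the x2APIC feature. */
    if (new_x2 && !cpu_has_x2apic) {
        return false;
    }
    /* Invalid state: EXTD set while the APIC is globally disabled. */
    if (!new_en && new_x2) {
        return false;
    }
    /* Disabled -> x2APIC directly (must pass through xAPIC first). */
    if (!old_en && !old_x2 && new_en && new_x2) {
        return false;
    }
    /* x2APIC -> xAPIC directly (must pass through disabled first). */
    if (old_en && old_x2 && new_en && !new_x2) {
        return false;
    }
    return true;
}

/* Logical x2APIC ID derived from the APIC ID, following the SDM formula. */
static uint32_t x2apic_logical_id(uint32_t apic_id)
{
    return (((apic_id >> 4) & 0xffff) << 16) | (1u << (apic_id & 0xf));
}
```

Keeping the check as a pure function of (old, new, feature) makes the legal state machine (disabled -> xAPIC -> x2APIC -> disabled) easy to test in isolation from raise_exception_ra().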
[PATCH v6 5/5] amd_iommu: report x2APIC support to the operating system
This commit adds an XTSup configuration option that lets the user choose
whether to enable this feature. When XTSup is enabled, additional bytes in
the IRTE with guest virtual VAPIC enabled are used to support a 32-bit
destination ID.

Additionally, this commit exports IVHD type 0x11 alongside the old IVHD
type 0x10 in the ACPI table. IVHD type 0x10 does not report the full set of
IOMMU features, only the legacy ones, so an operating system (e.g. Linux)
may only detect x2APIC support if IVHD type 0x11 is available. IVHD type
0x10 is kept so that old operating systems that only parse type 0x10 can
still detect the IOMMU device.

Reviewed-by: Michael S. Tsirkin
Signed-off-by: Bui Quang Minh
---
 hw/i386/acpi-build.c | 129 +++
 hw/i386/amd_iommu.c | 29 +-
 hw/i386/amd_iommu.h | 16 --
 3 files changed, 117 insertions(+), 57 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9c74fa17ad..4231b80f25 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2336,30 +2336,23 @@ static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id,
                 const char *oem_table_id)
 {
-    int ivhd_table_len = 24;
     AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
     GArray *ivhd_blob = g_array_new(false, true, 1);
     AcpiTable table = { .sig = "IVRS", .rev = 1, .oem_id = oem_id,
                         .oem_table_id = oem_table_id };
+    uint64_t feature_report;
     acpi_table_begin(&table, table_data);
     /* IVinfo - IO virtualization information common to all
      * IOMMU units in a system
      */
-    build_append_int_noprefix(table_data, 40UL << 8/* PASize */, 4);
+    build_append_int_noprefix(table_data,
+                             (1UL << 0) | /* EFRSup */
+                             (40UL << 8), /* PASize */
+                             4);
     /* reserved */
     build_append_int_noprefix(table_data, 0, 8);
-    /* IVHD definition - type 10h */
-    build_append_int_noprefix(table_data, 0x10, 1);
-    /* virtualization flags */
-    build_append_int_noprefix(table_data,
-                             (1UL << 0) | /* HtTunEn */
-                             (1UL << 4) | /* iotblSup */
-                             (1UL << 6) | /* PrefSup */
-                             (1UL << 7), /* PPRSup */
-                             1);
-
     /*
      * A PCI bus walk, for
each PCI host bridge, is necessary to create a * complete set of IVHD entries. Do this into a separate blob so that we @@ -2379,56 +2372,94 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker, const char *oem_id, build_append_int_noprefix(ivhd_blob, 0x001, 4); } -ivhd_table_len += ivhd_blob->len; - /* * When interrupt remapping is supported, we add a special IVHD device - * for type IO-APIC. - */ -if (x86_iommu_ir_supported(x86_iommu_get_default())) { -ivhd_table_len += 8; -} - -/* IVHD length */ -build_append_int_noprefix(table_data, ivhd_table_len, 2); -/* DeviceID */ -build_append_int_noprefix(table_data, - object_property_get_int(OBJECT(&s->pci), "addr", - &error_abort), 2); -/* Capability offset */ -build_append_int_noprefix(table_data, s->pci.capab_offset, 2); -/* IOMMU base address */ -build_append_int_noprefix(table_data, s->mmio.addr, 8); -/* PCI Segment Group */ -build_append_int_noprefix(table_data, 0, 2); -/* IOMMU info */ -build_append_int_noprefix(table_data, 0, 2); -/* IOMMU Feature Reporting */ -build_append_int_noprefix(table_data, - (48UL << 30) | /* HATS */ - (48UL << 28) | /* GATS */ - (1UL << 2) | /* GTSup */ - (1UL << 6),/* GASup */ - 4); - -/* IVHD entries as found above */ -g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len); -g_array_free(ivhd_blob, TRUE); - -/* - * Add a special IVHD device type. + * for type IO-APIC * Refer to spec - Table 95: IVHD device entry type codes * * Linux IOMMU driver checks for the special IVHD device (type IO-APIC). * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059' */ if (x86_iommu_ir_supported(x86_iommu_get_default())) { -build_append_int_noprefix(table_data, +build_append_int_noprefix(ivhd_blob, (0x1ull << 56) | /* type IOAPIC */ (IOAPIC_SB_DEVID << 40) | /* IOAPIC devid */ 0x48, /* special device */ 8); } + +/* IVHD definition - type 10h */ +build_append_int_noprefix(table_data, 0x10, 1); +/* virtualization
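The bit packing in the hunks above (the IVinfo field with EFRSup and PASize, and the 8-byte special IVHD device entry for the IOAPIC) can be checked in isolation. The following is a minimal sketch, not code from the patch: put_le(), ivinfo_value() and ivhd_special_ioapic() are hypothetical names, put_le() only imitates the little-endian byte order that build_append_int_noprefix() produces, and the device ID 0xa0 used in the usage note is an assumed example value, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative stand-in for build_append_int_noprefix(): store the low
 * `size` bytes of `value` at buf in little-endian order.
 */
static void put_le(uint8_t *buf, uint64_t value, size_t size)
{
    assert(size <= sizeof(value));
    for (size_t i = 0; i < size; i++) {
        buf[i] = (uint8_t)(value & 0xff);
        value >>= 8;
    }
}

/* IVinfo as built in the patch: EFRSup in bit 0, PASize starting at bit 8. */
static uint32_t ivinfo_value(uint32_t pa_size_bits)
{
    return (1u << 0) | (pa_size_bits << 8);
}

/*
 * 8-byte special IVHD device entry as built in the patch:
 * entry type 0x48 in byte 0, source device ID from bit 40,
 * variety 1 (IOAPIC) in the top byte.
 */
static uint64_t ivhd_special_ioapic(uint64_t devid)
{
    return (0x1ull << 56) | (devid << 40) | 0x48;
}
```

With a physical address size of 40 bits, ivinfo_value(40) yields 0x2801. Serializing ivhd_special_ioapic(0xa0) with put_le() places 0x48 in byte 0, 0xa0 in byte 5 and 0x01 in byte 7, which is the layout a guest parser walks byte by byte.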
[PATCH v6 2/5] apic: add support for x2APIC mode
This commit extends the APIC ID to 32 bits and removes the 255 max APIC ID
limit in the userspace APIC. The array that manages local APICs is now
dynamically allocated based on the max APIC ID of the created x86 machine.
Also, the new x2APIC IPI destination determination scheme, self IPI, and
x2APIC mode register access are supported.

Reviewed-by: Michael S. Tsirkin
Signed-off-by: Bui Quang Minh
---
 hw/i386/x86.c | 8 +-
 hw/intc/apic.c | 266
 hw/intc/apic_common.c | 9 ++
 include/hw/i386/apic.h | 3 +-
 include/hw/i386/apic_internal.h | 7 +-
 target/i386/cpu-sysemu.c| 8 +-
 6 files changed, 231 insertions(+), 70 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index a88a126123..8b70f0a6ea 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -132,11 +132,11 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
      * Can we support APIC ID 255 or higher?
      *
      * Under Xen: yes.
-     * With userspace emulated lapic: no
+     * With userspace emulated lapic: checked later in apic_common_set_id.
      * With KVM's in-kernel lapic: only if X2APIC API is enabled.
*/ if (x86ms->apic_id_limit > 255 && !xen_enabled() && -(!kvm_irqchip_in_kernel() || !kvm_enable_x2apic())) { +kvm_irqchip_in_kernel() && !kvm_enable_x2apic()) { error_report("current -smp configuration requires kernel " "irqchip and X2APIC API support."); exit(EXIT_FAILURE); @@ -146,6 +146,10 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version) kvm_set_max_apic_id(x86ms->apic_id_limit); } +if (!kvm_irqchip_in_kernel()) { +apic_set_max_apic_id(x86ms->apic_id_limit); +} + possible_cpus = mc->possible_cpu_arch_ids(ms); for (i = 0; i < ms->smp.cpus; i++) { x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal); diff --git a/hw/intc/apic.c b/hw/intc/apic.c index cb8c20de93..9f741794a7 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -31,15 +31,15 @@ #include "hw/i386/apic-msidef.h" #include "qapi/error.h" #include "qom/object.h" - -#define MAX_APICS 255 -#define MAX_APIC_WORDS 8 +#include "tcg/helper-tcg.h" #define SYNC_FROM_VAPIC 0x1 #define SYNC_TO_VAPIC 0x2 #define SYNC_ISR_IRR_TO_VAPIC 0x4 -static APICCommonState *local_apics[MAX_APICS + 1]; +static APICCommonState **local_apics; +static uint32_t max_apics; +static uint32_t max_apic_words; #define TYPE_APIC "apic" /*This is reusing the APICCommonState typedef from APIC_COMMON */ @@ -49,7 +49,19 @@ DECLARE_INSTANCE_CHECKER(APICCommonState, APIC, static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode); static void apic_update_irq(APICCommonState *s); static void apic_get_delivery_bitmask(uint32_t *deliver_bitmask, - uint8_t dest, uint8_t dest_mode); + uint32_t dest, uint8_t dest_mode); + +void apic_set_max_apic_id(uint32_t max_apic_id) +{ +int word_size = 32; + +/* round up the max apic id to next multiple of words */ +max_apics = (max_apic_id + word_size - 1) & ~(word_size - 1); + +local_apics = g_malloc0(sizeof(*local_apics) * max_apics); +max_apic_words = max_apics >> 5; +} + /* Find first bit starting from msb */ static int apic_fls_bit(uint32_t value) @@ 
-199,7 +211,7 @@ static void apic_external_nmi(APICCommonState *s) #define foreach_apic(apic, deliver_bitmask, code) \ {\ int __i, __j;\ -for(__i = 0; __i < MAX_APIC_WORDS; __i++) {\ +for(__i = 0; __i < max_apic_words; __i++) {\ uint32_t __mask = deliver_bitmask[__i];\ if (__mask) {\ for(__j = 0; __j < 32; __j++) {\ @@ -226,7 +238,7 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, { int i, d; d = -1; -for(i = 0; i < MAX_APIC_WORDS; i++) { +for(i = 0; i < max_apic_words; i++) { if (deliver_bitmask[i]) { d = i * 32 + apic_ffs_bit(deliver_bitmask[i]); break; @@ -276,16 +288,18 @@ static void apic_bus_deliver(const uint32_t *deliver_bitmask, apic_set_irq(apic_iter, vector_num, trigger_mode) ); } -void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, - uint8_t vector_num, uint8_t trigger_mode) +static void apic_deliver_irq(uint32_t dest, uint8_t dest_mode, + uint8_t delivery_mode, uint8_t vector_num, + uint8_t trigger_mode) { -uint32_t deliver_bitmask[MAX_APIC_WORDS]; +uint32_t *deliver_bitmask = g_malloc(max_apic_words * sizeof(uint32_t)); trace_apic_deliver_irq(dest, dest_mode, delivery_mode, vector_num, trigger_mode);
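The sizing arithmetic in apic_set_max_apic_id() above (round the APIC ID limit up to a multiple of 32, then derive the number of 32-bit words in the delivery bitmask) can be sketched on its own. This is an illustration under assumptions, not the patch's code: all helper names below are hypothetical, and the set/test helpers show only the general word/bit indexing that a per-APIC-ID bitmask of this shape implies.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_WORD_BITS 32u

/* Round the APIC ID limit up to the next multiple of 32. */
static uint32_t apic_round_up(uint32_t max_apic_id)
{
    /* Guard against wrap-around in the addition below. */
    assert(max_apic_id <= UINT32_MAX - (APIC_WORD_BITS - 1));
    return (max_apic_id + APIC_WORD_BITS - 1) & ~(APIC_WORD_BITS - 1);
}

/* Number of 32-bit words needed for a bitmask covering max_apics IDs. */
static uint32_t apic_bitmask_words(uint32_t max_apics)
{
    return max_apics >> 5;
}

/* Mark apic_id as a delivery target: word = id / 32, bit = id % 32. */
static void apic_bitmask_set(uint32_t *mask, uint32_t apic_id)
{
    mask[apic_id >> 5] |= 1u << (apic_id & 31);
}

/* Return 1 if apic_id is marked in the bitmask, 0 otherwise. */
static int apic_bitmask_test(const uint32_t *mask, uint32_t apic_id)
{
    return (mask[apic_id >> 5] >> (apic_id & 31)) & 1;
}
```

For a limit of 260, the rounded size is 288 entries and the bitmask needs 9 words; APIC ID 257 then lands in word 8, bit 1, which the old fixed 8-word bitmask could not represent.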
[PATCH v6 1/5] i386/tcg: implement x2APIC registers MSR access
This commit refactors apic_mem_read/write to support both MMIO access in xAPIC and MSR access in x2APIC. Reviewed-by: Michael S. Tsirkin Signed-off-by: Bui Quang Minh --- hw/intc/apic.c | 79 ++-- hw/intc/trace-events | 4 +- include/hw/i386/apic.h | 3 ++ target/i386/cpu.h| 3 ++ target/i386/tcg/sysemu/misc_helper.c | 27 ++ 5 files changed, 86 insertions(+), 30 deletions(-) diff --git a/hw/intc/apic.c b/hw/intc/apic.c index ac3d47d231..cb8c20de93 100644 --- a/hw/intc/apic.c +++ b/hw/intc/apic.c @@ -288,6 +288,13 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, apic_bus_deliver(deliver_bitmask, delivery_mode, vector_num, trigger_mode); } +bool is_x2apic_mode(DeviceState *dev) +{ +APICCommonState *s = APIC(dev); + +return s->apicbase & MSR_IA32_APICBASE_EXTD; +} + static void apic_set_base(APICCommonState *s, uint64_t val) { s->apicbase = (val & 0xf000) | @@ -636,16 +643,11 @@ static void apic_timer(void *opaque) apic_timer_update(s, s->next_time); } -static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) +uint64_t apic_register_read(int index) { DeviceState *dev; APICCommonState *s; -uint32_t val; -int index; - -if (size < 4) { -return 0; -} +uint64_t val; dev = cpu_get_current_apic(); if (!dev) { @@ -653,7 +655,6 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) } s = APIC(dev); -index = (addr >> 4) & 0xff; switch(index) { case 0x02: /* id */ val = s->id << 24; @@ -720,7 +721,23 @@ static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) val = 0; break; } -trace_apic_mem_readl(addr, val); + +trace_apic_register_read(index, val); +return val; +} + +static uint64_t apic_mem_read(void *opaque, hwaddr addr, unsigned size) +{ +uint32_t val; +int index; + +if (size < 4) { +return 0; +} + +index = (addr >> 4) & 0xff; +val = (uint32_t)apic_register_read(index); + return val; } @@ -737,27 +754,10 @@ static void apic_send_msi(MSIMessage *msi) apic_deliver_irq(dest, dest_mode, delivery, 
vector, trigger_mode); } -static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, - unsigned size) +void apic_register_write(int index, uint64_t val) { DeviceState *dev; APICCommonState *s; -int index = (addr >> 4) & 0xff; - -if (size < 4) { -return; -} - -if (addr > 0xfff || !index) { -/* MSI and MMIO APIC are at the same memory location, - * but actually not on the global bus: MSI is on PCI bus - * APIC is connected directly to the CPU. - * Mapping them on the global bus happens to work because - * MSI registers are reserved in APIC MMIO and vice versa. */ -MSIMessage msi = { .address = addr, .data = val }; -apic_send_msi(&msi); -return; -} dev = cpu_get_current_apic(); if (!dev) { @@ -765,7 +765,7 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, } s = APIC(dev); -trace_apic_mem_writel(addr, val); +trace_apic_register_write(index, val); switch(index) { case 0x02: @@ -843,6 +843,29 @@ static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, } } +static void apic_mem_write(void *opaque, hwaddr addr, uint64_t val, + unsigned size) +{ +int index = (addr >> 4) & 0xff; + +if (size < 4) { +return; +} + +if (addr > 0xfff || !index) { +/* MSI and MMIO APIC are at the same memory location, + * but actually not on the global bus: MSI is on PCI bus + * APIC is connected directly to the CPU. + * Mapping them on the global bus happens to work because + * MSI registers are reserved in APIC MMIO and vice versa. 
*/ +MSIMessage msi = { .address = addr, .data = val }; +apic_send_msi(&msi); +return; +} + +apic_register_write(index, val); +} + static void apic_pre_save(APICCommonState *s) { apic_sync_vapic(s, SYNC_FROM_VAPIC); diff --git a/hw/intc/trace-events b/hw/intc/trace-events index 36ff71f947..1ef29d0256 100644 --- a/hw/intc/trace-events +++ b/hw/intc/trace-events @@ -14,8 +14,8 @@ cpu_get_apic_base(uint64_t val) "0x%016"PRIx64 # apic.c apic_local_deliver(int vector, uint32_t lvt) "vector %d delivery mode %d" apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t trigger_mode) "dest %d dest_mode %d delivery_mode %d vector %d trigger_mode %d" -apic_mem_readl(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x" -apic_mem_writel(uint64_t addr, uint32_t val) "0x%"PRIx64" = 0x%08x" +apic_register_read(uint8_t reg, uint64_t val) "register 0x%02x = 0x
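The refactoring above works because both access paths reduce to the same register index: the MMIO path computes (addr >> 4) & 0xff, and in x2APIC mode the Intel SDM places the same registers in the MSR range starting at 0x800, one MSR per 16-byte MMIO slot. The sketch below illustrates that mapping only; the function names and the X2APIC_MSR_BASE/X2APIC_MSR_END constants are hypothetical stand-ins, and the truncated misc_helper.c hunk is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* x2APIC MSR range per the Intel SDM (assumed names, not QEMU's macros). */
#define X2APIC_MSR_BASE 0x800u
#define X2APIC_MSR_END  0x8ffu

/* xAPIC: registers are 16 bytes apart in the 4 KiB MMIO page. */
static int apic_index_from_mmio(uint64_t addr)
{
    return (int)((addr >> 4) & 0xff);
}

/* x2APIC: the register index is the offset into the MSR range. */
static int apic_index_from_msr(uint32_t msr)
{
    assert(msr >= X2APIC_MSR_BASE && msr <= X2APIC_MSR_END);
    return (int)(msr - X2APIC_MSR_BASE);
}
```

For example, the task priority register at MMIO offset 0x080 and MSR 0x808 both map to index 0x08, so one apic_register_read()/apic_register_write() pair keyed by index can serve both callers.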
[PATCH v6 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC
As the userspace APIC now supports x2APIC, the Intel interrupt remapping
hardware can be set to EIM mode when the userspace local APIC is used.

Reviewed-by: Michael S. Tsirkin
Signed-off-by: Bui Quang Minh
---
 hw/i386/intel_iommu.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index dcc334060c..5e576f6059 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4043,17 +4043,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
                       && x86_iommu_ir_supported(x86_iommu) ?
                                               ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
     }
-    if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
-        if (!kvm_irqchip_is_split()) {
-            error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
-            return false;
-        }
-        if (!kvm_enable_x2apic()) {
-            error_setg(errp, "eim=on requires support on the KVM side"
-                             "(X2APIC_API, first shipped in v4.7)");
-            return false;
-        }
-    }
 
     /* Currently only address widths supported are 39 and 48 bits */
     if ((s->aw_bits != VTD_HOST_AW_39BIT) &&
-- 
2.25.1
Re: [PATCH v1 6/9] gfxstream + rutabaga: add initial support for gfxstream
On 11 July 2023 02:56:46 UTC, Gurchetan Singh wrote:
>This adds initial support for gfxstream and cross-domain. Both
>features rely on virtio-gpu blob resources and context types, which
>are also implemented in this patch.
>
>gfxstream has a long and illustrious history in Android graphics
>paravirtualization. It has been powering graphics in the Android
>Studio Emulator for more than a decade, which is the main developer
>platform.
>
>Originally conceived by Jesse Hall, it was first known as "EmuGL" [a].
>The key design characteristic was a 1:1 threading model and
>auto-generation, which fit nicely with the OpenGLES spec. It also
>allowed easy layering with ANGLE on the host, which provides the GLES
>implementations on Windows or MacOS environments.
>
>gfxstream has traditionally been maintained by a single engineer, and
>between 2015 to 2021, the goldfish throne passed to Frank Yang.
>Historians often remark this glorious reign ("pax gfxstreama" is the
>academic term) was comparable to that of Augustus and the both Queen
>Elizabeths. Just to name a few accomplishments in a resplendent
>panoply: higher versions of GLES, address space graphics, snapshot
>support and CTS compliant Vulkan [b].
>
>One major drawback was the use of out-of-tree goldfish drivers.
>Android engineers didn't know much about DRM/KMS and especially TTM so
>a simple guest to host pipe was conceived.
>
>Luckily, virtio-gpu 3D started to emerge in 2016 due to the work of
>the Mesa/virglrenderer communities. In 2018, the initial virtio-gpu
>port of gfxstream was done by Cuttlefish enthusiast Alistair Delva.
>It was a symbol compatible replacement of virglrenderer [c] and named
>"AVDVirglrenderer". This implementation forms the basis of the
>current gfxstream host implementation still in use today.
>
>cross-domain support follows a similar arc. Originally conceived by
>Wayland aficionado David Reveman and crosvm enjoyer Zach Reizner in
>2018, it initially relied on the downstream "virtio-wl" device.
>
>In 2020 and 2021, virtio-gpu was extended to include blob resources
>and multiple timelines by yours truly, features gfxstream/cross-domain
>both require to function correctly.
>
>Right now, we stand at the precipice of a truly fantastic possibility:
>the Android Emulator powered by upstream QEMU and upstream Linux
>kernel. gfxstream will then be packaged properly, and app
>developers can even fix gfxstream bugs on their own if they encounter
>them.
>
>It's been quite the ride, my friends. Where will gfxstream head next,
>nobody really knows. I wouldn't be surprised if it's around for
>another decade, maintained by a new generation of Android graphics
>enthusiasts.

AFAIU gfxstream is a substitute for virglrenderer and relies on an
auto-generated interface based on OpenGL/Vulkan between host and guest.
I would like to use it in QEMU (Windows host, Linux guest), so I tried to
test your series under Linux (for now). So far, I couldn't get past aborts
with generic error messages, or blank screens with no error messages at
all, though my Linux host might not provide a recent enough environment.
Read on for some technical review comments below.

>
>Technical details:
> - Very simple initial display integration: just used Pixman
> - Largely, 1:1 mapping of virtio-gpu hypercalls to rutabaga function
>calls
>
>[a] https://android-review.googlesource.com/c/platform/development/+/34470
>[b]
>https://android-review.googlesource.com/q/topic:%22vulkan-hostconnection-start%22
>[c]
>https://android-review.googlesource.com/c/device/generic/goldfish-opengl/+/761927
>
>Signed-off-by: Gurchetan Singh
>---
>v2: Incorporated various suggestions by Akihiko Odaki and Bernard Berschow
>- Removed GET_VIRTIO_GPU_GL / GET_RUTABAGA macros
>- Used error_report(..)
>- Used g_autofree to fix leaks on error paths
>- Removed unnecessary casts
>- added virtio-gpu-pci-rutabaga.c + virtio-vga-rutabaga.c files
>
> hw/display/virtio-gpu-pci-rutabaga.c | 48 ++
> hw/display/virtio-gpu-rutabaga.c | 1088 ++
> hw/display/virtio-vga-rutabaga.c | 52 ++
> 3 files changed, 1188 insertions(+)
> create mode 100644 hw/display/virtio-gpu-pci-rutabaga.c
> create mode 100644 hw/display/virtio-gpu-rutabaga.c
> create mode 100644 hw/display/virtio-vga-rutabaga.c
>
>diff --git a/hw/display/virtio-gpu-pci-rutabaga.c
>b/hw/display/virtio-gpu-pci-rutabaga.c
>new file mode 100644
>index 00..5765bef266
>--- /dev/null
>+++ b/hw/display/virtio-gpu-pci-rutabaga.c
>@@ -0,0 +1,48 @@
>+// SPDX-License-Identifier: GPL-2.0
>+
>+#include "qemu/osdep.h"
>+#include "qapi/error.h"
>+#include "qemu/module.h"
>+#include "hw/pci/pci.h"
>+#include "hw/qdev-properties.h"
>+#include "hw/virtio/virtio.h"
>+#include "hw/virtio/virtio-bus.h"
>+#include "hw/virtio/virtio-gpu-pci.h"
>+#include "qom/object.h"
>+
>+#define TYPE_VIRTIO_GPU_RUTABAGA_PCI "virtio-gpu-rutabaga-pci"
>+typedef struct VirtIOGPURUTABAGAPCI VirtIOGPURUTABA