Re: [Xen-devel] [PATCH v7 01/28] linkage: new macros for assembler symbols

2019-01-31 Thread Borislav Petkov
On Wed, Jan 30, 2019 at 01:46:44PM +0100, Jiri Slaby wrote:
> Introduce new C macros for annotations of functions and data in
> assembly. There is a long-standing mess in macros like ENTRY, END,
> ENDPROC and similar. They are used in different manners and sometimes
> incorrectly.
> 
> So introduce macros with clear use to annotate assembly as follows:

...

> Signed-off-by: Jiri Slaby 
> Cc: Andrew Morton 
> Cc: Boris Ostrovsky 
> Cc: h...@zytor.com
> Cc: Ingo Molnar 
> Cc: jpoim...@redhat.com
> Cc: Juergen Gross 
> Cc: Len Brown 
> Cc: Linus Torvalds 
> Cc: linux-ker...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: mi...@redhat.com
> Cc: Pavel Machek 
> Cc: Peter Zijlstra 
> Cc: "Rafael J. Wysocki" 
> Cc: Thomas Gleixner 
> Cc: xen-devel@lists.xenproject.org
> Cc: x...@kernel.org
> ---
>  Documentation/asm-annotations.rst | 217 ++

I guess you wanna integrate that into the doc hierarchy. Hunk ontop:

---
diff --git a/Documentation/index.rst b/Documentation/index.rst
index c858c2e66e36..754055d9565c 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -91,6 +91,14 @@ needed).
vm/index
bpf/index
 
+Architecture-agnostic documentation
+-----------------------------------
+
+.. toctree::
+   :maxdepth: 2
+
+   asm-annotations
+
 Architecture-specific documentation
 -----------------------------------
 

>  arch/x86/include/asm/linkage.h|  10 +-
>  include/linux/linkage.h   | 245 +-
>  3 files changed, 461 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/asm-annotations.rst
> 
> diff --git a/Documentation/asm-annotations.rst b/Documentation/asm-annotations.rst
> new file mode 100644
> index ..265d64a1fc0b
> --- /dev/null
> +++ b/Documentation/asm-annotations.rst
> @@ -0,0 +1,217 @@
> +Assembler Annotations
> +=====================
> +
> +Copyright (c) 2017 Jiri Slaby
> +
> +This document describes the new macros for annotation of data and code in
> +assembler. In particular, it contains information about ``SYM_FUNC_START``,

s/assembler/assembly/

> +``SYM_FUNC_END``, ``SYM_CODE_START``, and similar.
> +
> +Rationale
> +---------
> +Some code like entries, trampolines, or boot code needs to be written in
> +assembly. The same as in C, we group such code into functions and accompany
> +them with data. Standard assemblers do not force users into precisely marking
> +these pieces as code, data, or even specifying their length. Nevertheless,
> +assemblers provide developers with such marks to aid debuggers throughout
> +assembly. On the top of that, developers also want to stamp some functions as
> +*global* to be visible outside of their translation units.
> +
> +Over the time, the Linux kernel took over macros from various projects (like

s/the //

> +``binutils``) to ease these markings. So for historic reasons, we have been
> +using ``ENTRY``, ``END``, ``ENDPROC``, and other annotations in assembly. Due
> +to the lack of their documentation, the macros are used in rather wrong
> +contexts at some locations. Clearly, ``ENTRY`` was intended for starts of
> +global symbols (be it data or code). ``END`` used to be the end of data or end
> +of special functions with *non-standard* calling convention. In contrast,
> +``ENDPROC`` should annotate only ends of *standard* functions.
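
As an aside, for readers who have never seen the old annotations, a minimal, made-up x86-64 sketch of the old pairing next to the one proposed here (the symbol names are invented, not from the patch):

	/* old style: ENTRY() only says "global symbol", ENDPROC() claims a C-callable function */
ENTRY(illustrative_ret_arg_old)
	movq	%rdi, %rax
	ret
ENDPROC(illustrative_ret_arg_old)

	/* new style: start and end are explicitly paired and typed */
SYM_FUNC_START(illustrative_ret_arg_new)
	movq	%rdi, %rax
	ret
SYM_FUNC_END(illustrative_ret_arg_new)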

...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH v8 01/28] linkage: new macros for assembler symbols

2019-08-12 Thread Borislav Petkov
Hi,

this time a more detailed look. :)

> Subject: Re: [PATCH v8 01/28] linkage: new macros for assembler symbols

Patch subject needs a verb, like, for example:

"linkage: Introduce new macros for assembler symbols"

On Thu, Aug 08, 2019 at 12:38:27PM +0200, Jiri Slaby wrote:
> Introduce new C macros for annotations of functions and data in
> assembly. There is a long-standing mess in macros like ENTRY, END,
> ENDPROC and similar. They are used in different manners and sometimes
> incorrectly.
> 
> So introduce macros with clear use to annotate assembly as follows:
> 
> a) Support macros for the ones below
>SYM_T_FUNC -- type used by assembler to mark functions
>SYM_T_OBJECT -- type used by assembler to mark data
>SYM_T_NONE -- type used by assembler to mark entries of unknown type
> 
>They are defined as STT_FUNC, STT_OBJECT, and STT_NOTYPE
>respectively. According to the gas manual, this is the most portable
>way. I am not sure about other assemblers, so we can switch this back
>to %function and %object if this turns into a problem. Architectures
>can also override them by something like ", @function" if they need.
> 
>SYM_A_ALIGN, SYM_A_NONE -- align the symbol?
>SYM_L_GLOBAL, SYM_L_WEAK, SYM_L_LOCAL -- linkage of symbols
> 
> b) Mostly internal annotations, used by the ones below
>SYM_ENTRY -- use only if you have to (for non-paired symbols)
>SYM_START -- use only if you have to (for paired symbols)
>SYM_END -- use only if you have to (for paired symbols)
> 
> c) Annotations for code
>SYM_INNER_LABEL_ALIGN -- only for labels in the middle of code
>SYM_INNER_LABEL -- only for labels in the middle of code
> 
>SYM_FUNC_START_LOCAL_ALIAS -- use where there are two local names for
>   one function
>SYM_FUNC_START_ALIAS -- use where there are two global names for one
>   function
>SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function
> 
>SYM_FUNC_START -- use for global functions
>SYM_FUNC_START_NOALIGN -- use for global functions, w/o alignment
>SYM_FUNC_START_LOCAL -- use for local functions
>SYM_FUNC_START_LOCAL_NOALIGN -- use for local functions, w/o
>   alignment
>SYM_FUNC_START_WEAK -- use for weak functions
>SYM_FUNC_START_WEAK_NOALIGN -- use for weak functions, w/o alignment
>SYM_FUNC_END -- the end of SYM_FUNC_START_LOCAL, SYM_FUNC_START,
>   SYM_FUNC_START_WEAK, ...
> 
>For functions with special (non-C) calling conventions:
>SYM_CODE_START -- use for non-C (special) functions
>SYM_CODE_START_NOALIGN -- use for non-C (special) functions, w/o
>   alignment
>SYM_CODE_START_LOCAL -- use for local non-C (special) functions
>SYM_CODE_START_LOCAL_NOALIGN -- use for local non-C (special)
>   functions, w/o alignment
>SYM_CODE_END -- the end of SYM_CODE_START_LOCAL or SYM_CODE_START
> 
> d) For data
>SYM_DATA_START -- global data symbol
>SYM_DATA_START_LOCAL -- local data symbol
>SYM_DATA_END -- the end of the SYM_DATA_START symbol
>SYM_DATA_END_LABEL -- the labeled end of SYM_DATA_START symbol
>SYM_DATA -- start+end wrapper around simple global data
>SYM_DATA_LOCAL -- start+end wrapper around simple local data
> 
> ==
> 
> The macros allow to pair starts and ends of functions and mark functions
> correctly in the output ELF objects.
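
Purely as an illustration of the data and non-C annotations listed above (symbol names invented, not part of this patch):

SYM_DATA_START(illustrative_table)		/* global data with an explicit end */
	.quad	0
	.quad	1
SYM_DATA_END(illustrative_table)

SYM_DATA(illustrative_flag, .byte 0)		/* start+end wrapper for simple data */

SYM_CODE_START_LOCAL(illustrative_stub)		/* non-C calling convention */
SYM_INNER_LABEL(illustrative_resume, SYM_L_GLOBAL)
	ret
SYM_CODE_END(illustrative_stub)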
> 
> All users of the old macros in x86 are converted to use these in further
> patches.
> 
> [v2]
> * use SYM_ prefix and sane names
> * add SYM_START and SYM_END and parametrize all the macros
> 
> [v3]
> * add SYM_DATA, SYM_DATA_LOCAL, and SYM_DATA_END_LABEL
> 
> [v4]
> * add _NOALIGN versions of some macros
> * add _CODE_ derivates of _FUNC_ macros
> 
> [v5]
> * drop "SIMPLE" from data annotations
> * switch NOALIGN and ALIGN variants of inner labels
> * s/visibility/linkage/; s@SYM_V_@SYM_L_@
> * add Documentation
> 
> [v6]
> * fixed typos found by Randy Dunlap
> * remove doubled INNER_LABEL macros, one pair was unused
> 
> [v8]
> * use lkml.kernel.org for links
> * link the docs from index.rst (by bpetkov)
> * fixed typos on the docs

Patch version history which doesn't belong in the commit message goes...

> Signed-off-by: Jiri Slaby 
> Cc: Andrew Morton 
> Cc: Boris Ostrovsky 
> Cc: h...@zytor.com
> Cc: Ingo Molnar 
> Cc: jpoim...@redhat.com
> Cc: Juergen Gross 
> Cc: Len Brown 
> Cc: Linus Torvalds 
> Cc: linux-ker...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: mi...@redhat.com
> Cc: Pavel Machek 
> Cc: Peter Zijlstra 
> Cc: "Rafael J. Wysocki" 
> Cc: Thomas Gleixner 
> Cc: xen-devel@lists.xenproject.org
> Cc: x...@kernel.org
> ---

... here, under the "---" so that git can ignore it when applying.
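
I.e., roughly this layout (purely illustrative, angle brackets are placeholders):

  linkage: Introduce new macros for assembler symbols

  <commit message proper>

  Signed-off-by: Jiri Slaby <...>
  ---
  [v8]
  * use lkml.kernel.org for links
  * link the docs from index.rst (by bpetkov)
  * fixed typos on the docs

   Documentation/asm-annotations.rst | 217 ++
   ...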

>  Documentation/asm-annotations.rst | 217 ++
>  Documentation/index.rst   |   8 +
>  arch/x86/include/asm/linkage.h|  10 +-
>  include/linux/linkage.h   | 245 +-
>  4 files changed, 469 insertions(+),

Re: [Xen-devel] [PATCH v9 24/28] x86_64/asm: Change all ENTRY+ENDPROC to SYM_FUNC_*

2019-10-16 Thread Borislav Petkov
Hi,

On Fri, Oct 11, 2019 at 01:51:04PM +0200, Jiri Slaby wrote:
> These are all functions which are invoked from elsewhere, so annotate
> them as global using the new SYM_FUNC_START. And their ENDPROC's by
> SYM_FUNC_END.
> 
> And make sure ENTRY/ENDPROC is not defined on X86_64, given these were
> the last users.
> 
> Signed-off-by: Jiri Slaby 
> Reviewed-by: Rafael J. Wysocki  [hibernate]
> Reviewed-by: Boris Ostrovsky  [xen bits]
> Cc: "H. Peter Anvin" 
> Cc: Borislav Petkov 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: x...@kernel.org
> Cc: Herbert Xu 
> Cc: "David S. Miller" 
> Cc: "Rafael J. Wysocki" 
> Cc: Len Brown 
> Cc: Pavel Machek 
> Cc: Matt Fleming 
> Cc: Ard Biesheuvel 
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: linux-cry...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> ---
>  arch/x86/boot/compressed/efi_thunk_64.S  |  4 +-
>  arch/x86/boot/compressed/head_64.S   | 16 +++---
>  arch/x86/boot/compressed/mem_encrypt.S   |  8 +--
>  arch/x86/crypto/aegis128-aesni-asm.S | 28 -
>  arch/x86/crypto/aes_ctrby8_avx-x86_64.S  | 12 ++--
>  arch/x86/crypto/aesni-intel_asm.S| 60 ++--
>  arch/x86/crypto/aesni-intel_avx-x86_64.S | 32 +--
>  arch/x86/crypto/blowfish-x86_64-asm_64.S | 16 +++---
>  arch/x86/crypto/camellia-aesni-avx-asm_64.S  | 24 
>  arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 24 
>  arch/x86/crypto/camellia-x86_64-asm_64.S | 16 +++---
>  arch/x86/crypto/cast5-avx-x86_64-asm_64.S| 16 +++---
>  arch/x86/crypto/cast6-avx-x86_64-asm_64.S| 24 
>  arch/x86/crypto/chacha-avx2-x86_64.S | 12 ++--
>  arch/x86/crypto/chacha-avx512vl-x86_64.S | 12 ++--
>  arch/x86/crypto/chacha-ssse3-x86_64.S| 12 ++--
>  arch/x86/crypto/crc32-pclmul_asm.S   |  4 +-
>  arch/x86/crypto/crc32c-pcl-intel-asm_64.S|  4 +-
>  arch/x86/crypto/crct10dif-pcl-asm_64.S   |  4 +-
>  arch/x86/crypto/des3_ede-asm_64.S|  8 +--
>  arch/x86/crypto/ghash-clmulni-intel_asm.S|  8 +--
>  arch/x86/crypto/nh-avx2-x86_64.S |  4 +-
>  arch/x86/crypto/nh-sse2-x86_64.S |  4 +-
>  arch/x86/crypto/poly1305-avx2-x86_64.S   |  4 +-
>  arch/x86/crypto/poly1305-sse2-x86_64.S   |  8 +--
>  arch/x86/crypto/serpent-avx-x86_64-asm_64.S  | 24 
>  arch/x86/crypto/serpent-avx2-asm_64.S| 24 
>  arch/x86/crypto/serpent-sse2-x86_64-asm_64.S |  8 +--
>  arch/x86/crypto/sha1_avx2_x86_64_asm.S   |  4 +-
>  arch/x86/crypto/sha1_ni_asm.S|  4 +-
>  arch/x86/crypto/sha1_ssse3_asm.S |  4 +-
>  arch/x86/crypto/sha256-avx-asm.S |  4 +-
>  arch/x86/crypto/sha256-avx2-asm.S|  4 +-
>  arch/x86/crypto/sha256-ssse3-asm.S   |  4 +-
>  arch/x86/crypto/sha256_ni_asm.S  |  4 +-
>  arch/x86/crypto/sha512-avx-asm.S |  4 +-
>  arch/x86/crypto/sha512-avx2-asm.S|  4 +-
>  arch/x86/crypto/sha512-ssse3-asm.S   |  4 +-
>  arch/x86/crypto/twofish-avx-x86_64-asm_64.S  | 24 
>  arch/x86/crypto/twofish-x86_64-asm_64-3way.S |  8 +--
>  arch/x86/crypto/twofish-x86_64-asm_64.S  |  8 +--

I could use an ACK for the crypto bits...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [Xen-devel] [PATCH v5 0/3] x86/boot: Introduce the kernel_info et consortes

2019-11-06 Thread Borislav Petkov
On Mon, Nov 04, 2019 at 04:13:51PM +0100, Daniel Kiper wrote:
> Hi,
> 
> Due to very limited space in the setup_header this patch series introduces new
> kernel_info struct which will be used to convey information from the kernel to
> the bootloader. This way the boot protocol can be extended regardless of the
> setup_header limitations. Additionally, the patch series introduces some
> convenience features like the setup_indirect struct and the
> kernel_info.setup_type_max field.

That's all fine and dandy but I'm missing an example about what that'll
be used for, in practice.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [Xen-devel] [PATCH v5 0/3] x86/boot: Introduce the kernel_info et consortes

2019-11-06 Thread Borislav Petkov
On Wed, Nov 06, 2019 at 09:56:48AM -0800, h...@zytor.com wrote:
> For one thing, we already have people asking for more than 4 GiB
> worth of initramfs, and especially with initramfs that huge it would
> make a *lot* of sense to allow loading it in chunks without having to
> concatenate them.

Yeah, tglx gave me his use case on IRC where they have the rootfs in the
initrd and how they would hit the limit when the rootfs has a bunch of
debug libs etc tools, which would blow up its size.

> I have been asking for a long time for initramfs creators to split the
> kernel-dependent and kernel independent parts into separate initramfs
> modules.

Right.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [Xen-devel] [PATCH v5 2/3] x86/boot: Introduce the kernel_info.setup_type_max

2019-11-08 Thread Borislav Petkov
On Mon, Nov 04, 2019 at 04:13:53PM +0100, Daniel Kiper wrote:
> This field contains maximal allowed type for setup_data.
> 
> This patch does not bump setup_header version in arch/x86/boot/header.S
> because it will be followed by additional changes coming into the
> Linux/x86 boot protocol.
> 
> Suggested-by: H. Peter Anvin (Intel) 
> Signed-off-by: Daniel Kiper 
> Reviewed-by: Konrad Rzeszutek Wilk 
> Reviewed-by: Ross Philipson 
> Reviewed-by: H. Peter Anvin (Intel) 
> ---
> v5 - suggestions/fixes:
>- move incorrect references to the setup_indirect to the
>  patch introducing it,
>- do not bump setup_header version in arch/x86/boot/header.S
>  (suggested by H. Peter Anvin).
> ---
>  Documentation/x86/boot.rst | 9 -
>  arch/x86/boot/compressed/kernel_info.S | 5 +
>  arch/x86/include/uapi/asm/bootparam.h  | 3 +++
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
> index c60fafda9427..1dad6eee8a5c 100644
> --- a/Documentation/x86/boot.rst
> +++ b/Documentation/x86/boot.rst
> @@ -73,7 +73,7 @@ Protocol 2.14:  BURNT BY INCORRECT COMMIT ae7e1238e68f2a472a125673ab506d49158c188
>   (x86/boot: Add ACPI RSDP address to setup_header)
>   DO NOT USE!!! ASSUME SAME AS 2.13.
>  
> -Protocol 2.15:   (Kernel 5.5) Added the kernel_info.
> +Protocol 2.15:   (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max.
>  =
> 
>  
>  .. note::
> @@ -981,6 +981,13 @@ Offset/size: 0x0008/4
>   This field contains the size of the kernel_info including kernel_info.header
>and kernel_info.kernel_info_var_len_data.
>  
> + ==
> +Field name:  setup_type_max
> +Offset/size: 0x0008/4

You already have

Field name: size_total
Offset/size:0x0008/4

at that offset.

I guess you mean setup_type_max's offset to be 0x000c and it would be
that member:

.long   0x01234567  /* Some fixed size data for the bootloaders. */

?
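
If so, I'd guess the hunk wants to read something like this (guessing the intended layout only):

Field name:	setup_type_max
Offset/size:	0x000c/4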

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [Xen-devel] [PATCH v5 2/3] x86/boot: Introduce the kernel_info.setup_type_max

2019-11-08 Thread Borislav Petkov
On Fri, Nov 08, 2019 at 11:47:02AM +0100, Daniel Kiper wrote:
> Yeah, you are right. Would you like me to repost whole patch series or
> could you fix it before committing?

Lemme finish looking at patch 3 first.

If you have to resend, please remove "This patch" and "We" in your text.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [Xen-devel] [PATCH v5 3/3] x86/boot: Introduce the setup_indirect

2019-11-08 Thread Borislav Petkov
On Mon, Nov 04, 2019 at 04:13:54PM +0100, Daniel Kiper wrote:
> diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
> index edaa30b20841..701a98300f86 100644
> --- a/arch/x86/kernel/kdebugfs.c
> +++ b/arch/x86/kernel/kdebugfs.c
> @@ -44,7 +44,11 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,
>   if (count > node->len - pos)
>   count = node->len - pos;
>  
> - pa = node->paddr + sizeof(struct setup_data) + pos;
> + pa = node->paddr + pos;
> +
> + if (!(node->type & SETUP_INDIRECT) || node->type == SETUP_INDIRECT)

This check looks strange at a first glance and could use a comment.

> + pa += sizeof(struct setup_data);
> +
>   p = memremap(pa, count, MEMREMAP_WB);
>   if (!p)
>   return -ENOMEM;
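
Something along these lines maybe, just as a suggestion for the wording:

	/*
	 * When the entry is plain setup_data (or a SETUP_INDIRECT entry which
	 * only wraps another SETUP_INDIRECT and is therefore treated as plain
	 * setup_data), ->paddr points at the setup_data header and the payload
	 * starts right behind it. For a real SETUP_INDIRECT payload, ->paddr
	 * already points at the data itself.
	 */
	if (!(node->type & SETUP_INDIRECT) || node->type == SETUP_INDIRECT)
		pa += sizeof(struct setup_data);
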
> @@ -108,9 +112,17 @@ static int __init create_setup_data_nodes(struct dentry *parent)
>   goto err_dir;
>   }
>  
> - node->paddr = pa_data;
> - node->type = data->type;
> - node->len = data->len;
> + if (data->type == SETUP_INDIRECT &&
> + ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) {
> + node->paddr = ((struct setup_indirect *)data->data)->addr;
> + node->type = ((struct setup_indirect *)data->data)->type;
> + node->len = ((struct setup_indirect *)data->data)->len;

Align them vertically on the "=" sign even if they stick out over the
80-cols rule.
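
I.e., something like:

	node->paddr = ((struct setup_indirect *)data->data)->addr;
	node->type  = ((struct setup_indirect *)data->data)->type;
	node->len   = ((struct setup_indirect *)data->data)->len;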

> + } else {
> + node->paddr = pa_data;
> + node->type = data->type;
> + node->len = data->len;
> + }
> +
>   create_setup_data_node(d, no, node);
>   pa_data = data->next;
>  
> diff --git a/arch/x86/kernel/ksysfs.c b/arch/x86/kernel/ksysfs.c
> index 7969da939213..14ef8121aa53 100644
> --- a/arch/x86/kernel/ksysfs.c
> +++ b/arch/x86/kernel/ksysfs.c
> @@ -100,7 +100,11 @@ static int __init get_setup_data_size(int nr, size_t *size)
>   if (!data)
>   return -ENOMEM;
>   if (nr == i) {
> - *size = data->len;
> + if (data->type == SETUP_INDIRECT &&
> + ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT)
> + *size = ((struct setup_indirect *)data->data)->len;
> + else
> + *size = data->len;

< newline here.

>   memunmap(data);
>   return 0;
>   }
> @@ -130,7 +134,10 @@ static ssize_t type_show(struct kobject *kobj,
>   if (!data)
>   return -ENOMEM;
>  
> - ret = sprintf(buf, "0x%x\n", data->type);
> + if (data->type == SETUP_INDIRECT)
> + ret = sprintf(buf, "0x%x\n", ((struct setup_indirect *)data->data)->type);
> + else
> + ret = sprintf(buf, "0x%x\n", data->type);
>   memunmap(data);
>   return ret;
>  }
> @@ -142,7 +149,7 @@ static ssize_t setup_data_data_read(struct file *fp,
>   loff_t off, size_t count)
>  {
>   int nr, ret = 0;
> - u64 paddr;
> + u64 paddr, len;
>   struct setup_data *data;
>   void *p;
>  
> @@ -157,19 +164,28 @@ static ssize_t setup_data_data_read(struct file *fp,
>   if (!data)
>   return -ENOMEM;
>  
> - if (off > data->len) {
> + if (data->type == SETUP_INDIRECT &&
> + ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT) {
> + paddr = ((struct setup_indirect *)data->data)->addr;
> + len = ((struct setup_indirect *)data->data)->len;
> + } else {
> + paddr += sizeof(*data);
> + len = data->len;
> + }
> +
> + if (off > len) {
>   ret = -EINVAL;
>   goto out;
>   }
>  
> - if (count > data->len - off)
> - count = data->len - off;
> + if (count > len - off)
> + count = len - off;
>  
>   if (!count)
>   goto out;
>  
>   ret = count;
> - p = memremap(paddr + sizeof(*data), data->len, MEMREMAP_WB);
> + p = memremap(paddr, len, MEMREMAP_WB);
>   if (!p) {
>   ret = -ENOMEM;
>   goto out;
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 77ea96b794bd..4603702dbfc1 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -438,6 +438,10 @@ static void __init memblock_x86_reserve_range_setup_data(void)
>   while (pa_data) {
>   data = early_memremap(pa_data, sizeof(*data));
>   memblock_reserve(pa_data, sizeof(*data) + data->len);

< newline here.

> + if (data->type == SETUP_INDIRECT &&
> + ((struct setup_indirect *)data->data)->type != SETUP_INDIRECT)
> +   

Re: [Xen-devel] [PATCH v5 2/3] x86/boot: Introduce the kernel_info.setup_type_max

2019-11-08 Thread Borislav Petkov
On Fri, Nov 08, 2019 at 01:52:48PM +0100, Daniel Kiper wrote:
> OK, got your comments. I will repost the patch series probably on Tuesday.
> I hope that it will land in 5.5 then.

I don't see why not if you base it ontop of tip:x86/boot and test it
properly before sending.

Out of curiosity, is there any particular reason this should be in 5.5?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: [Xen-devel] AMD EPYC Topology problems

2018-12-03 Thread Borislav Petkov
On Sun, Dec 02, 2018 at 08:23:05PM +0000, Andrew Cooper wrote:
> Hello,
> 
> I have dual socket server with the following processor:
> 
> [root@xrtmia-09-01 ~]# head /proc/cpuinfo 
> processor : 0
> vendor_id : AuthenticAMD
> cpu family: 23
> model : 1
> model name: AMD EPYC 7281 16-Core Processor
> stepping  : 2
> 
> Which has highlighted an issue in the topology derivation logic. 
> (Actually, it was discovered with Xen, but we share the same topology
> infrastructure and the issue is also present with Linux).
> 
> There are a total of 64 threads in the system, made of two 32-thread
> sockets.  The APIC IDs for this system are sparse - they are 0x0-0x3,
> 0x8-0xb, 0x10-0x13 etc, all the way up to 0x7b.
> 
> This is because the socket is made of 4 nodes with 4 cores each, but
> space has been left in the layout for the maximum possible number of
> APIC IDs.
> 
> In particular, CPUID 0x80000008:ecx reports 0x601f.  That is, an
> APIC ID shift of 6 (reporting a maximum of 64 threads per socket), and
> NC as 31 (reporting 32 threads per socket in the current configuration).
> 
> c->x86_max_cores is derived from NC and shifted once to exclude threads,
> giving it a final value of 16 cores per socket.

So far so good.
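
As an aside, a tiny standalone sketch (userspace C, not the kernel's actual code path) of how the CPUID 0x80000008 ECX fields quoted above decode into those numbers:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t ecx       = 0x601f;            /* value reported on this box */
	unsigned int nc    = ecx & 0xff;        /* NC: threads per socket - 1 */
	unsigned int shift = (ecx >> 12) & 0xf; /* ApicIdCoreIdSize           */

	printf("threads per socket:       %u\n", nc + 1);       /* 32 */
	printf("APIC ID space per socket: %u\n", 1u << shift);  /* 64 */
	printf("cores per socket:         %u\n", (nc + 1) / 2); /* 16, after halving for SMT */
	return 0;
}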

> Given the sparseness of the APIC IDs, it is unsafe to allocate an array

Do we do this somewhere or is this a hypothetical thing?

> of c->x86_max_cores entries, then index it with c->cpu_core_id, as half
> the cores in the system have a cpu_core_id greater than x86_max_cores. 

You lost me here. ->cpu_core_id comes from CPUID_Fn8000001E_EBX[7:0].
Are you saying, those core IDs on your box are sparse like the APIC IDs
you mention above?

> There is no logical core ID derived during boot which might be a safe to
> use as an index.
> 
> Furthermore, the documentation indicates that these values are expected
> to be per-package, while they are all actually per-socket (with up to 4
> nodes per socket) in the EPYC case.

From Documentation/x86/topology.txt:
"
  - cpuinfo_x86.x86_max_cores:

The number of cores in a package. This information is retrieved via CPUID."

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH v8 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-06 Thread Borislav Petkov
On Thu, Dec 06, 2018 at 10:21:12PM +0100, Paolo Bonzini wrote:
> Thanks!  I should be able to post a Tested-by next Monday.  Boris, are
> you going to pick it up for 4.21?

Boris me or Boris O.?

:-)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH v8 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-07 Thread Borislav Petkov
On Thu, Dec 06, 2018 at 11:14:34PM +0100, Paolo Bonzini wrote:
> > There are some minor changes in non-xen x86 code so it would be good to
> > get x86 maintainers' ack.
> 
> It's not really code, only Kconfig (and I remarked on it just now), but
> it doesn't hurt of course.

Yeah, I don't see anything objectionable.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH v8 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-07 Thread Borislav Petkov
On Fri, Dec 07, 2018 at 11:07:54AM -0500, Boris Ostrovsky wrote:
> Can this be considered as an ACK from you?

I'll look at v9 next week and add tags, assuming v9 is going to be the
final one, of course.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)


Re: [Xen-devel] AMD EPYC Topology problems

2018-12-09 Thread Borislav Petkov
On Mon, Dec 03, 2018 at 11:23:49AM +0000, Andrew Cooper wrote:
> Right, but the documentation also states that where it says package, it
> means "Node" in AMD's terminology, and the information in CPUID is per
> socket, not per node.
> 
> My point is that the numbers ending up in cpuinfo_x86 don't match the
> semantics described by the documentation.

Ok, I think I know where the issue stems from:

definition of "package" in the AMD docs != definition of "package" in 
Documentation/x86/topology.txt

AMD's is "Processor: A package containing one or more Nodes." whereas
ours is:

  "Packages contain a number of cores plus shared resources, e.g. DRAM
   controller, shared caches etc."

and physical sockets we don't care about because they're not relevant to
sw.

Yeah, lemme discuss this with tglx to refresh what we were thinking then. :)

Stay tuned.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-11 Thread Borislav Petkov
On Mon, Dec 10, 2018 at 11:05:34AM -0800, Maran Wilson wrote:
> For certain applications it is desirable to rapidly boot a KVM virtual
> machine. In cases where legacy hardware and software support within the
> guest is not needed, Qemu should be able to boot directly into the
> uncompressed Linux kernel binary without the need to run firmware.
> 
> There already exists an ABI to allow this for Xen PVH guests and the ABI
> is supported by Linux and FreeBSD:
> 
>https://xenbits.xen.org/docs/unstable/misc/pvh.html
> 
> This patch series would enable Qemu to use that same entry point for
> booting KVM guests.

How would I do that, practically?

Looking at those here:

>  * Qemu and qboot RFC patches have been posted to show one example of how
>this functionality can be used. Some preliminary numbers are available
>in those cover letters showing the KVM guest boot time improvement.
>   Qemu:
>   http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00957.html
>   qboot:
>   http://lists.nongnu.org/archive/html/qemu-devel/2018-12/msg00953.html

I might still need to do some dancing to get stuff going.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH v9 1/7] xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH

2018-12-11 Thread Borislav Petkov
On Mon, Dec 10, 2018 at 11:07:28AM -0800, Maran Wilson wrote:
> In order to pave the way for hypervisors other than Xen to use the PVH
> entry point for VMs, we need to factor the PVH entry code into Xen specific
> and hypervisor agnostic components. The first step in doing that, is to
> create a new config option for PVH entry that can be enabled
> independently from CONFIG_XEN.
> 
> Signed-off-by: Maran Wilson 
> Reviewed-by: Juergen Gross 
> ---
>  arch/x86/Kconfig  | 6 ++
>  arch/x86/kernel/head_64.S | 2 +-
>  arch/x86/xen/Kconfig  | 3 ++-
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 8689e794a43c..c2a22a74abee 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -796,6 +796,12 @@ config KVM_GUEST
> underlying device model, the host provides the guest with
> timing infrastructure such as time of day, and system time
>  
> +config PVH
> + bool "Support for running PVH guests"
> + ---help---
> +   This option enables the PVH entry point for guest virtual machines
> +   as specified in the x86/HVM direct boot ABI.
> +
>  config KVM_DEBUG_FS
>   bool "Enable debug information for KVM Guests in debugfs"
>   depends on KVM_GUEST && DEBUG_FS
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 747c758f67b7..d1dbe8e4eb82 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -386,7 +386,7 @@ NEXT_PAGE(early_dynamic_pgts)
>  
>   .data
>  
> -#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH)
> +#if defined(CONFIG_XEN_PV) || defined(CONFIG_PVH)
>  NEXT_PGD_PAGE(init_top_pgt)
>   .quad   level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
>   .orginit_top_pgt + L4_PAGE_OFFSET*8, 0
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index 1ef391aa184d..e07abefd3d26 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -74,6 +74,7 @@ config XEN_DEBUG_FS
> Enabling this option may incur a significant performance overhead.
>  
>  config XEN_PVH
> - bool "Support for running as a PVH guest"
> + bool "Support for running as a Xen PVH guest"
>   depends on XEN && XEN_PVHVM && ACPI
> + select PVH
>   def_bool n
> -- 

LGTM:

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-12 Thread Borislav Petkov
On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote:
> Is your question about what options you need to provide to Qemu? Or is your
> question about the SW implementation choices?
> 
> Assuming the former...

Yeah, that's what I wanted to know. But looking at it, I'm booting
bzImage here just as quickly and as flexible so I don't see the
advantage of this new method for my use case here of booting kernels
in qemu.

But maybe there's a good use case where firmware is slow and one doesn't
really wanna noodle through it or when one does start a gazillion VMs
per second or whatever...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH 2/2] x86/microcode: Do not upload microcode if CPUs are offline

2018-04-13 Thread Borislav Petkov
On Fri, Apr 13, 2018 at 10:57:46AM -0700, Raj, Ashok wrote:
> > I'm afraid I don't fully agree - not applying an ucode update to the online
> > CPUs because some are offline isn't any better. Plus (while updating)
> > there's always going to be some discrepancy between ucode versions.
> > As long as we apply updates while bringing a CPU online, I think we're fine.

When a core is soft-offlined as we do it, it doesn't mean that it
doesn't execute instructions...

> This is the safest option. Microcode is considered part of the cpu. We don't
> allow cpus with different capabilities in the same system.. yes they might
> work, but not something we allow.

... so yes, we better be safe than having to debug some insane lockups.

> In general we recommend early update. Earliest the best. 
> 
> - BIOS update (difficult to deploy, but some microcodes have to be done
>   this way.)
> - early update from initrd.. almost same as #1, since we apply at earliest
>   chance that's the closest and most recommended method.
> - late update. Before this procedure of stopping all cpus, we did have a 
>   time when some are updated and some werent uptodate yet. This synchronized
>   update is precicely to get as close as possible to updating all of them.

Yap.

> > Also please consider valid cases you make not work anymore, like someone
> > having brought offline all sibling hyperthreads, with each core still having
> > one thread active. In that case an ucode update will implicitly update all
> > offline threads as well.

That's some strange use case which I can't imagine worked, like ever.
Because the microcode engine is shared between the HT threads so no
matter on which of the hyperthreads you update, the same, one and only
engine gets updated.

> Of cource we can tweak this to be much better, there are other ideas, but
> this is an effort to keep this simple, and also address microcode requirements
> post spectre for some processors.  In the grand scheme of things although its
> interesting to allow such updates we think it may not be best practice.
> 
> We want to get this working right first before getting fancy.

Ack.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH 00/11] Add support for Hygon's Dhyana Family 18h processor

2018-06-13 Thread Borislav Petkov
On Sat, Jun 09, 2018 at 09:20:10PM +0800, Pu Wen wrote:
> As a new x86 CPU Vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon)
> is a Joint Venture between AMD and Haiguang Information Technology Co.,
> Ltd., and aims at providing high performance x86 processor for China
> server market.
> 
> The first generation Hygon's processor(Dhyana) originates from AMD
> technology and shares most of the architecture with AMD's family 17h,
> but with different CPU Vendor ID("HygonGenuine")/PCIE Device Vendor ID
> (0x1D94)/Family series number(Family 18h).
> 
> To enable the support of Linux kernel to Hygon's CPU, we added a new
> vendor type (X86_VENDOR_HYGON, with value of 9) in arch/x86/include/
> asm/processor.h, and shared most of kernel support codes with AMD
> family 17h.
> 
> These patches have been applied and tested successfully in Hygon's
> Dhyana SoC silicon. Also tested on AMD's EPYC (Family 17h) processor
> works fine and makes no harm to existing codes.

Well, I don't like this diffstat:

 37 files changed, 183 insertions(+), 56 deletions(-)

which adds a lot of code checking this new vendor. But then it adds in
the AMD paths and I don't see it being any different from an AMD CPU. So
it does the same a Zen does but then it is Hygon.

So I'd prefer to *not* sprinkle those X86_VENDOR_HYGON checks everywhere
but simply have the vendor be X86_VENDOR_AMD and only the user-visible
reporting to show that it is Hygon. Because to the kernel it is an AMD
CPU - only the superficial attributes are something else. Oh well, and
PCI device IDs but that's like another CPU revision.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH] x86: remove redundant 'default n' from Kconfig-s

2018-10-17 Thread Borislav Petkov
Hi Bart,

On Tue, Oct 16, 2018 at 03:42:16PM +0200, Bartlomiej Zolnierkiewicz wrote:
> 'default n' is the default value for any bool or tristate Kconfig
> setting so there is no need to write it explicitly.
> 
> Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
> is not set' for visible symbols") the Kconfig behavior is the same
> regardless of 'default n' being present or not:
> 
> ...
> One side effect of (and the main motivation for) this change is making
> the following two definitions behave exactly the same:
> 
> config FOO
> bool
> 
> config FOO
> bool
> default n
> 
> With this change, neither of these will generate a
> '# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
> That might make it clearer to people that a bare 'default n' is
> redundant.
> ...
> 
> Signed-off-by: Bartlomiej Zolnierkiewicz 
> ---
>  arch/x86/Kconfig   |7 ---
>  arch/x86/Kconfig.debug |1 -
>  arch/x86/xen/Kconfig   |1 -
>  3 files changed, 9 deletions(-)

looks good, no difference of allmodconfigs before and after.

But that close before the merge window and it not being urgent, I'll
queue it after the merge window.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov
On Thu, Nov 15, 2018 at 02:19:23PM +0800, Dave Young wrote:
> It would be good to copy some background info from cover letter to the
> patch description so that we can get better understanding why this is
> needed now.
> 
> BTW, Lianbo is working on a documentation of the vmcoreinfo exported
> fields. Ccing him so that he is aware of this.
> 
> Also cc Boris,  although I do not want the doc changes blocks this
> he might have different opinion :)

Yeah, my initial reaction is that exporting an mm-internal flag to
userspace is a no-no.

What would be better, IMHO, is having a general method of telling the
kdump kernel - maybe ranges of physical addresses - which to skip.

Because the moment there's a set of pages which do not have PG_offline
set but kdump would still like to skip, this breaks.

But that's mm guys' call.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov
On Thu, Nov 15, 2018 at 12:20:40PM +0100, David Hildenbrand wrote:
> Sorry to say, but that is the current practice without which
> makedumpfile would not be able to work at all. (exclude user pages,
> exclude page cache, exclude buddy pages). Let's not reinvent the wheel
> here. This is how dumping works forever.

Sorry, but "we've always done this in the past" doesn't make it better.

> I don't see how there should be "set of pages which do not have
> PG_offline".

It doesn't have to be a set of pages. Think a (mmconfig perhaps) region
which the kdump kernel should completely skip because poking in it in
the kdump kernel, causes all kinds of havoc like machine checks. etc.
We've had and still have one issue like that.

But let me clarify my note: I don't want to be discussing with you the
design of makedumpfile and how it should or should not work - that ship
has already sailed. Apparently there are valid reasons to do it this
way.

I was *simply* stating that it feels wrong to export mm flags like that.

But as I said already, that is mm guys' call and looking at how we're
already exporting a bunch of stuff in the vmcoreinfo - including other
mm flags - I guess one more flag doesn't matter anymore.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov
On Thu, Nov 15, 2018 at 01:11:02PM +0100, Michal Hocko wrote:
> I am not familiar with kexec to judge this particular patch but we
> cannot simply define any range for these pages (same as for hwpoison
> ones) because they can be almost anywhere in the available memory range.
> Then there can be countless of them. There is no other way to rule them
> out but to check the page state.

I guess, especially if it is a monster box with a lot of memory in it.

> I am not really sure what is the concern here exactly. Kdump is so
> closly tight to the specific kernel version that the api exported
> specifically for its purpose cannot be seriously considered an ABI.
> Kdump has to adopt all the time.

Right...

Except, when people start ogling vmcoreinfo for other things and start
exporting all kinds of kernel internals in there, my alarm bells start
ringing.

But ok, kdump *is* special and I guess that's fine.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [Xen-devel] [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov
On Thu, Nov 15, 2018 at 01:01:17PM +0100, David Hildenbrand wrote:
> Just saying that "I'm not the first to do it, don't hit me with a stick" :)

:-)

> Indeed. And we still have without makedumpfile. I think you are aware of
> this, but I'll explain it just for consistency: PG_hwpoison

No, I appreciate an explanation very much! So thanks for that. :)

> At some point we detect a HW error and mask a page as PG_hwpoison.
> 
> makedumpfile knows how to treat that flag and can exclude it from the
> dump (== not access it). No crash.
> 
> kdump itself has no clue about old "struct pages". Especially:
> a) Where they are located in memory (e.g. SPARSE)
> b) What their format is ("where are the flags")
> c) What the meaning of flags is ("what does bit X mean")
> 
> In order to know such information, we would have to do parsing of quite
> some information inside the kernel in kdump. Basically what makedumpfile
> does just now. Is this feasible? I don't think so.
> 
> So we would need another approach to communicate such information as you
> said. I can't think of any, but if anybody reading this has an idea,
> please speak up. I am interested.

Yeah but that ship has sailed. And even if we had a great idea, we'd
have to support kdump before and after the idea. And that would be a
serious mess.

And if you have a huge box with gazillion piles of memory and an alpha
particle passes through a bunch of them on its way down to the earth's
core, and while doing so, flips a bunch of bits, you need to go and
collect all those regions and update some list which you then need to
shove into the second kernel.

And you probably need to do all that through perhaps a piece of memory
which is used for communication between first and second kernel and that
list better fit in there, or you need to realloc. And that piece of
memory's layout needs to be properly defined so that the second kernel
can parse it correctly.

And so on...

> The *only* way right now we would have to handle such scenarios:
> 
> 1. While dumping memory and we get a machine check, fake reading a zero
> page instead of crashing.
> 2. While dumping memory and we get a fault, fake reading a zero page
> instead of crashing.

Yap.

> Indeed, and the basic design is to export these flags. (let's say
> "unfortunately", being able to handle such stuff in kdump directly would
> be the dream).

Well, AFAICT, the minimum work you need to always do before starting the
dumping is somehow generate that list of pages or ranges to not dump.
And that work needs to be done by the first or the second kernel, I'd
say.

If the first kernel would do it, then you'd have to probably have
callbacks to certain operations which go and add ranges or pages to
exclude, to a list which is then readily accessible to the second
kernel. Which means, when you reserve memory for the second kernel,
you'd have to reserve memory also for such a list.

But then what do you do when that memory gets filled up...?

So I guess exporting those things in vmcoreinfo is probably the only
thing we *can* do in the end.

Oh well, enough rambling... :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH] xen x86: fix early boot crash with gcc-10

2020-04-13 Thread Borislav Petkov
On Mon, Apr 13, 2020 at 02:35:35PM +0200, Frédéric Pierret (fepitre) wrote:
> The change fixes boot failure on VM where kernel (at least v5.4 and v5.6)
> is built with gcc-10 and STACKPROTECTOR_STRONG enabled:
> 
> ```
> Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: cpu_bringup_and_idle+0x93/0xa0
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.31-1.qubes.x86_64 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> Call Trace:
>   dump_stack+0x64/0x88
>panic+0x10b/0x2ed
>? cpu_bringup_and_idle+0x93/0xa0
>__stack_chk_fail+0x15/0x20
>cpu_bringup_and_idle+0x93/0xa
> ```
> The change makes the VM boot successfully. The VM is hosted by the
> KVM hypervisor and runs Xen nested inside.
> 
> Based on work done by Sergei Trofimovich: https://lkml.org/lkml/2020/3/26/1133

I was waiting for the merge window to finish to queue his patch. That is
done now, you can rebase yours ontop.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2] xen x86: fix early boot crash with gcc-10

2020-04-13 Thread Borislav Petkov
On Mon, Apr 13, 2020 at 05:03:14PM +0200, Frédéric Pierret (fepitre) wrote:
> The change fixes boot failure on VM where kernel (at least v5.4 and v5.6)
> is built with gcc-10 and STACKPROTECTOR_STRONG enabled:
> 
> ```
> Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: cpu_bringup_and_idle+0x93/0xa0
> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.31-1.qubes.x86_64 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-1 04/01/2014
> Call Trace:
>   dump_stack+0x64/0x88
>panic+0x10b/0x2ed
>? cpu_bringup_and_idle+0x93/0xa0
>__stack_chk_fail+0x15/0x20
>cpu_bringup_and_idle+0x93/0xa
> ```
> The change makes the VM boot successfully. The VM is hosted by the
> KVM hypervisor and runs Xen nested inside.
> 
> Based on work done by Sergei Trofimovich: https://lkml.org/lkml/2020/3/26/1133
> 
> Signed-off-by: Frédéric Pierret (fepitre) 
> ---
>  arch/x86/xen/smp_pv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> index 8fb8a50a28b4..5c8ee4a5bb0c 100644
> --- a/arch/x86/xen/smp_pv.c
> +++ b/arch/x86/xen/smp_pv.c
> @@ -88,7 +88,7 @@ static void cpu_bringup(void)
>   local_irq_enable();
>  }
>  
> -asmlinkage __visible void cpu_bringup_and_idle(void)
> +asmlinkage __visible void __no_stack_protector cpu_bringup_and_idle(void)
>  {
>   cpu_bringup();
>   boot_init_stack_canary();
> -- 

Boris O, Jürgen,

you guys might wanna wait a bit with this one:

https://lkml.kernel.org/r/20200413163540.gd3...@zn.tnic

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 07/15] x86/alternative: support "not feature" and ALTERNATIVE_TERNARY

2021-01-19 Thread Borislav Petkov
On Tue, Jan 19, 2021 at 12:35:42PM +0100, Jürgen Groß wrote:
> In fact this should rather be named "X86_FEATURE_TRUE", as this is its
> semantics.
>
> And I think I can define it to the value 0x instead of using a
> "real" bit for it.

A real bit is cheap - a special value to pay attention to in the future
not so much. Also we do have X86_FEATURE_ALWAYS already which has a
similar purpose...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v4 07/15] x86/paravirt: switch time pvops functions to use static_call()

2021-02-01 Thread Borislav Petkov
On Wed, Jan 20, 2021 at 02:55:47PM +0100, Juergen Gross wrote:
> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Signed-off-by: Juergen Gross 
> ---
> V4:
> - drop paravirt_time.h again
> - don't move Hyper-V code (Michael Kelley)
> ---
>  arch/x86/Kconfig  |  1 +
>  arch/x86/include/asm/mshyperv.h   |  2 +-
>  arch/x86/include/asm/paravirt.h   | 17 ++---
>  arch/x86/include/asm/paravirt_types.h |  6 --
>  arch/x86/kernel/cpu/vmware.c  |  5 +++--
>  arch/x86/kernel/kvm.c |  2 +-
>  arch/x86/kernel/kvmclock.c|  2 +-
>  arch/x86/kernel/paravirt.c| 16 
>  arch/x86/kernel/tsc.c |  2 +-
>  arch/x86/xen/time.c   | 11 ---
>  drivers/clocksource/hyperv_timer.c|  5 +++--
>  drivers/xen/time.c|  2 +-
>  12 files changed, 42 insertions(+), 29 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 21f851179ff0..7ccd4a80788c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -771,6 +771,7 @@ if HYPERVISOR_GUEST
>  
>  config PARAVIRT
>   bool "Enable paravirtualization code"
> + depends on HAVE_STATIC_CALL
>   help
> This changes the kernel so it can modify itself when it is run
> under a hypervisor, potentially improving performance significantly
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 30f76b966857..b4ee331d29a7 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -63,7 +63,7 @@ typedef int (*hyperv_fill_flush_list_func)(
>  static __always_inline void hv_setup_sched_clock(void *sched_clock)
>  {
>  #ifdef CONFIG_PARAVIRT
> - pv_ops.time.sched_clock = sched_clock;
> + paravirt_set_sched_clock(sched_clock);
>  #endif
>  }
>  
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 4abf110e2243..1e45b46fae84 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -15,11 +15,22 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
> -static inline unsigned long long paravirt_sched_clock(void)
> +u64 dummy_steal_clock(int cpu);
> +u64 dummy_sched_clock(void);
> +
> +DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
> +DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);

Did you build this before sending?

I'm test-applying this on rc6 + tip/master so I probably am using a
different tree so it looks like something has changed in the meantime.
-rc6 has a couple of Xen changes which made applying those to need some
wiggling in...

Maybe you should redo them ontop of tip/master. That is, *if* they're
going to eventually go through tip. The diffstat has Xen stuff too so we
might need some synchronization here what goes where how...

./arch/x86/include/asm/paravirt.h:24:1: warning: data definition has no type or storage class
   24 | DECLARE_STATIC_CALL(pv_steal_clock, dummy_steal_clock);
  | ^~~
./arch/x86/include/asm/paravirt.h:24:1: error: type defaults to ‘int’ in declaration of ‘DECLARE_STATIC_CALL’ [-Werror=implicit-int]
./arch/x86/include/asm/paravirt.h:24:1: warning: parameter names (without types) in function declaration
./arch/x86/include/asm/paravirt.h:25:1: warning: data definition has no type or storage class
   25 | DECLARE_STATIC_CALL(pv_sched_clock, dummy_sched_clock);
  | ^~~
./arch/x86/include/asm/paravirt.h:25:1: error: type defaults to ‘int’ in declaration of ‘DECLARE_STATIC_CALL’ [-Werror=implicit-int]
./arch/x86/include/asm/paravirt.h:25:1: warning: parameter names (without types) in function declaration
./arch/x86/include/asm/paravirt.h: In function ‘paravirt_sched_clock’:
./arch/x86/include/asm/paravirt.h:33:9: error: implicit declaration of function ‘static_call’ [-Werror=implicit-function-declaration]
   33 |  return static_call(pv_sched_clock)();
  | ^~~
./arch/x86/include/asm/paravirt.h:33:21: error: ‘pv_sched_clock’ undeclared (first use in this function); did you mean ‘dummy_sched_clock’?
   33 |  return static_call(pv_sched_clock)();
  | ^~
  | dummy_sched_clock
./arch/x86/include/asm/paravirt.h:33:21: note: each undeclared identifier is reported only once for each function it appears in
./arch/x86/include/asm/paravirt.h: In function ‘paravirt_steal_clock’:
./arch/x86/include/asm/paravirt.h:47:21: error: ‘pv_steal_clock’ undeclared (first use in this function); did you mean ‘dummy_steal_clock’?
   47 |  return static_call(pv_steal_clock)(cpu);
  | ^~
  | dummy_steal_clock
cc1: some 

Re: [PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen

2022-04-25 Thread Borislav Petkov
On Mon, Apr 25, 2022 at 11:38:36PM +0300, Oleksandr wrote:
> diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
> index efd8205..d06bc7a 100644
> --- a/include/linux/cc_platform.h
> +++ b/include/linux/cc_platform.h
> @@ -72,6 +72,19 @@ enum cc_attr {
>  * Examples include TDX guest & SEV.
>  */
>     CC_ATTR_GUEST_UNROLL_STRING_IO,
> +
> +   /**
> +    * @CC_ATTR_GUEST_MEM_ACCESS_RESTRICTED: Restricted memory access to
> +    *   Guest memory is active
> +    *
> +    * The platform/OS is running as a guest/virtual machine and uses
>     * the restricted access to its memory. This attribute is set if either
>     * Guest memory encryption or restricted memory access using Xen grant
>     * mappings is active.
> +    *
> +    * Examples include Xen guest and SEV.

Wait, whaaat?

The cc_platform* stuff is for *confidential computing* guests to check
different platform aspects.

>From quickly skimming over this, this looks like a misuse to me.

Why can't you query this from the hypervisor just like you do your other
querying about what is supported, etc? Hypercalls, CPUID, whatever...
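
To make that concrete, a purely hypothetical sketch of the direction I mean (the helper name and the Kconfig symbol are invented for illustration, this is not an existing API):

#include <xen/xen.h>

/*
 * Instead of overloading cc_platform_has() with a Xen grant-mapping
 * property, the virtio core could ask a hypervisor-specific predicate,
 * the same way other Xen features are detected.
 */
static inline bool xen_virtio_restricted_mem_access(void)
{
	return IS_ENABLED(CONFIG_XEN_VIRTIO) && xen_domain();
}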

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V4 04/50] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

2021-11-02 Thread Borislav Petkov
On Tue, Oct 26, 2021 at 10:13:34PM +0800, Lai Jiangshan wrote:
> From: Lai Jiangshan 
> 
> While in the native case, PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is the
> trampoline stack.  But XEN pv doesn't use trampoline stack, so
> PER_CPU_VAR(cpu_tss_rw + TSS_sp0) is also the kernel stack.  Hence source
> and destination stacks are identical in that case, which means reusing
> swapgs_restore_regs_and_return_to_usermode() in XEN pv would cause %rsp
> to move up to the top of the kernel stack and leave the IRET frame below
> %rsp, which is dangerous to be corrupted if #NMI / #MC hit as either of
> these events occurring in the middle of the stack pushing would clobber
> data on the (original) stack.
> 
> And swapgs_restore_regs_and_return_to_usermode() pushing the IRET frame
> on to the original address is useless and error-prone when there is any
> future attempt to modify the code.
> 
> Fixes: 7f2590a110b8 ("x86/entry/64: Use a per-CPU trampoline stack for IDT 
> entries")
> Cc: Jan Beulich 
> Cc: Thomas Gleixner 
> Cc: Juergen Gross 
> Cc: Peter Anvin 
> Cc: xen-devel@lists.xenproject.org>
> Reviewed-by: Boris Ostrovsky 
> Signed-off-by: Lai Jiangshan 
> ---
>  arch/x86/entry/entry_64.S|  9 ++---
>  arch/x86/entry/entry_64_compat.S |  7 ---
>  arch/x86/xen/xen-asm.S   | 27 +++
>  3 files changed, 37 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 9d468c8877e2..0dde5a253dda 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -119,7 +119,7 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, SYM_L_GLOBAL)
>* In the Xen PV case we must use iret anyway.
>*/
>  
> - ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", \
> + ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \

Instead of sprinkling all those ALTERNATIVE calls everywhere,
why don't you simply jump to the xenpv-one at the
swapgs_restore_regs_and_return_to_usermode label itself and have a
single ALTERNATIVE there?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V4 04/50] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

2021-11-02 Thread Borislav Petkov
On Tue, Nov 02, 2021 at 05:19:46PM +0800, Lai Jiangshan wrote:
> It will add a 5-byte NOP at the beginning of the native
> swapgs_restore_regs_and_return_to_usermode.

So?

> I avoided adding unneeded code in the native code even if it is NOPs
> and avoided melting xenpv-one into the native one which will reduce
> the code readability.

How does this reduce code readability?!

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index e38a4cf795d9..bf1de54a1fca 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -567,6 +567,10 @@ __irqentry_text_end:
 
 SYM_CODE_START_LOCAL(common_interrupt_return)
 SYM_INNER_LABEL(swapgs_restore_regs_and_return_to_usermode, SYM_L_GLOBAL)
+
+   ALTERNATIVE "", "jmp xenpv_restore_regs_and_return_to_usermode", \
+X86_FEATURE_XENPV
+
 #ifdef CONFIG_DEBUG_ENTRY
/* Assert that pt_regs indicates user mode. */
testb   $3, CS(%rsp)

> I will follow your preference since a 5-byte NOP is so negligible in the slow
> path with an iret instruction.

Yes, we do already gazillion things on those entry and exit paths.

> Or other option that adds macros to wrap the ALTERNATIVE.
> RESTORE_REGS_AND_RETURN_TO_USERMODE and
> COND_RESTORE_REGS_AND_RETURN_TO_USERMODE (test %eax before jmp in native case)

No, the main goal is to keep the asm code as readable and as simple as
possible.

If macros or whatever need to be added, there better be a good reason
for them. Saving a NOP is not one of them.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V4 04/50] x86/xen: Add xenpv_restore_regs_and_return_to_usermode()

2021-11-02 Thread Borislav Petkov
On Tue, Nov 02, 2021 at 12:22:50PM +0100, H. Peter Anvin wrote:
> It would be interesting to have an "override function with jmp"
> alternatives macro. It doesn't require any changes to the alternatives
> mechanism proper (but possibly to objtool): it would just insert an
> alternatives entry without adding any code including nops to the main
> path. It would of course only be applicable to a jmp, so a syntax like
> OVERRIDE_JMP feature, target rather than open-coding the instruction
> would probably be a good idea.

I think you wanna say ALTERNATIVE_JMP here seeing how we have
ALTERNATIVE_CALL already :)

As to marking it properly, we can finally add that struct
alt_instr.flags thing we have been trying to add for years now.

/me adds it to his evergrowing todo.

If anyone beats /me to it, /me will gladly have a look at it.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



[PATCH v0 02/42] xen/x86: Check notifier registration return value

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

Avoid homegrown notifier registration checks.

No functional changes.

Signed-off-by: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
---
 arch/x86/xen/enlighten.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 95d970359e17..2264dd6e157f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -354,7 +354,9 @@ static struct notifier_block xen_panic_block = {
 
 int xen_panic_handler_init(void)
 {
-   atomic_notifier_chain_register(&panic_notifier_list, &xen_panic_block);
+   if (atomic_notifier_chain_register(&panic_notifier_list, 
&xen_panic_block))
+   pr_warn("Xen panic notifier already registered\n");
+
return 0;
 }
 
-- 
2.29.2




[PATCH v0 18/42] drivers/xen: Check notifier registration return value

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

Avoid homegrown notifier registration checks.

No functional changes.

Signed-off-by: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
---
 drivers/xen/manage.c  | 3 ++-
 drivers/xen/xenbus/xenbus_probe.c | 8 +---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 374d36de7f5a..f3c5cef0995f 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -48,7 +48,8 @@ static RAW_NOTIFIER_HEAD(xen_resume_notifier);
 
 void xen_resume_notifier_register(struct notifier_block *nb)
 {
-   raw_notifier_chain_register(&xen_resume_notifier, nb);
+   if (raw_notifier_chain_register(&xen_resume_notifier, nb))
+   pr_warn("Xen resume notifier already registered\n");
 }
 EXPORT_SYMBOL_GPL(xen_resume_notifier_register);
 
diff --git a/drivers/xen/xenbus/xenbus_probe.c 
b/drivers/xen/xenbus/xenbus_probe.c
index bd003ca8acbe..4e83ce95acd1 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -731,10 +731,12 @@ int register_xenstore_notifier(struct notifier_block *nb)
 {
int ret = 0;
 
-   if (xenstored_ready > 0)
+   if (xenstored_ready > 0) {
ret = nb->notifier_call(nb, 0, NULL);
-   else
-   blocking_notifier_chain_register(&xenstore_chain, nb);
+   } else {
+   if (blocking_notifier_chain_register(&xenstore_chain, nb))
+   pr_warn("Xenstore notifier already registered\n");
+   }
 
return ret;
 }
-- 
2.29.2




[PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

The notifier registration routine doesn't return a proper error value
when a callback has already been registered, leading people to track
whether that registration has happened at the call site:

  https://lore.kernel.org/amd-gfx/20210512013058.6827-1-mukul.jo...@amd.com/

Which is unnecessary.

Return -EEXIST to signal that case so that callers can act accordingly.
Enforce callers to check the return value, leading to loud screaming
during build:

  arch/x86/kernel/cpu/mce/core.c: In function ‘mce_register_decode_chain’:
  arch/x86/kernel/cpu/mce/core.c:167:2: error: ignoring return value of \
   ‘blocking_notifier_chain_register’, declared with attribute 
warn_unused_result [-Werror=unused-result]
blocking_notifier_chain_register(&x86_mce_decoder_chain, nb);
  ^~~~

Drop the WARN too, while at it.

Suggested-by: Thomas Gleixner 
Signed-off-by: Borislav Petkov 
Cc: Arnd Bergmann 
Cc: Ayush Sawal 
Cc: Greg Kroah-Hartman 
Cc: Rohit Maheshwari 
Cc: Steven Rostedt 
Cc: Vinay Kumar Yadav 
Cc: alsa-de...@alsa-project.org
Cc: bcm-kernel-feedback-l...@broadcom.com
Cc: intel-...@lists.freedesktop.org
Cc: intel-gvt-...@lists.freedesktop.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-e...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-l...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-remotep...@vger.kernel.org
Cc: linux-renesas-...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-stag...@lists.linux.dev
Cc: linux-te...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: net...@vger.kernel.org
Cc: openipmi-develo...@lists.sourceforge.net
Cc: r...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org
Cc: xen-devel@lists.xenproject.org
---
 include/linux/notifier.h |  8 
 kernel/notifier.c| 36 +++-
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 87069b8459af..45cc5a8d0fd8 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -141,13 +141,13 @@ extern void srcu_init_notifier_head(struct 
srcu_notifier_head *nh);
 
 #ifdef __KERNEL__
 
-extern int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
+extern int __must_check atomic_notifier_chain_register(struct 
atomic_notifier_head *nh,
struct notifier_block *nb);
-extern int blocking_notifier_chain_register(struct blocking_notifier_head *nh,
+extern int __must_check blocking_notifier_chain_register(struct 
blocking_notifier_head *nh,
struct notifier_block *nb);
-extern int raw_notifier_chain_register(struct raw_notifier_head *nh,
+extern int __must_check raw_notifier_chain_register(struct raw_notifier_head 
*nh,
struct notifier_block *nb);
-extern int srcu_notifier_chain_register(struct srcu_notifier_head *nh,
+extern int __must_check srcu_notifier_chain_register(struct srcu_notifier_head 
*nh,
struct notifier_block *nb);
 
 extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b8251dc0bc0f..451ef3f73ad2 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -20,13 +20,11 @@ BLOCKING_NOTIFIER_HEAD(reboot_notifier_list);
  */
 
 static int notifier_chain_register(struct notifier_block **nl,
-   struct notifier_block *n)
+  struct notifier_block *n)
 {
while ((*nl) != NULL) {
-   if (unlikely((*nl) == n)) {
-   WARN(1, "double register detected");
-   return 0;
-   }
+   if (unlikely((*nl) == n))
+   return -EEXIST;
if (n->priority > (*nl)->priority)
break;
nl = &((*nl)->next);
@@ -134,10 +132,11 @@ static int notifier_call_chain_robust(struct 
notifier_block **nl,
  *
  * Adds a notifier to an atomic notifier chain.
  *
- * Currently always returns zero.
+ * Returns 0 on success, %-EEXIST on error.
  */
-int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
-   struct notifier_block *n)
+int __must_check
+atomic_notifier_chain_register(struct atomic_notifier_head *nh,
+  struct notifier_block *n)
 {
unsigned long flags;
int ret;
@@ -216,10 +215,11 @@ NOKPROBE_SYMBOL(atomic_notifier_call_chain);
  * Adds a no

[PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
From: Borislav Petkov 

Hi all,

this is a huge patchset for something which is really trivial - it
changes the notifier registration routines to return an error value
if a notifier callback is already present on the respective list of
callbacks. For more details scroll to the last patch.

Everything before it is converting the callers to check the return value
of the registration routines and issue a warning, instead of the WARN()
notifier_chain_register() does now.

Before the last patch has been applied, though, that checking is a
NOP which would make the application of those patches trivial - every
maintainer can pick a patch at her/his discretion - only the last one
enables the build warnings and that one will be queued only after the
preceding patches have all been merged so that there are no build
warnings.

Due to the sheer volume of the patches, I have addressed the respective
patch and the last one, which enables the warning, with addressees for
each maintained area so as not to spam people unnecessarily.

If people prefer I carry some through tip, instead, I'll gladly do so -
your call.

And, if you think the warning messages need to be more precise, feel
free to adjust them before committing.

Thanks!

Cc: Arnd Bergmann 
Cc: Ayush Sawal 
Cc: Greg Kroah-Hartman 
Cc: Rohit Maheshwari 
Cc: Steven Rostedt 
Cc: Vinay Kumar Yadav  
Cc: alsa-de...@alsa-project.org
Cc: bcm-kernel-feedback-l...@broadcom.com
Cc: intel-...@lists.freedesktop.org
Cc: intel-gvt-...@lists.freedesktop.org
Cc: linux-al...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-e...@vger.kernel.org
Cc: linux-fb...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-l...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-remotep...@vger.kernel.org
Cc: linux-renesas-...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-stag...@lists.linux.dev
Cc: linux-te...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: net...@vger.kernel.org
Cc: openipmi-develo...@lists.sourceforge.net
Cc: r...@vger.kernel.org
Cc: sparcli...@vger.kernel.org
Cc: x...@kernel.org
Cc: xen-devel@lists.xenproject.org

Borislav Petkov (42):
  x86: Check notifier registration return value
  xen/x86: Check notifier registration return value
  ipmi: Check notifier registration return value
  clk: renesas: Check notifier registration return value
  dca: Check notifier registration return value
  firmware: Check notifier registration return value
  drm/i915: Check notifier registration return value
  Drivers: hv: vmbus: Check notifier registration return value
  iio: proximity: cros_ec: Check notifier registration return value
  leds: trigger: Check notifier registration return value
  misc: Check notifier registration return value
  ethernet: chelsio: Check notifier registration return value
  power: reset: Check notifier registration return value
  remoteproc: Check notifier registration return value
  scsi: target: Check notifier registration return value
  USB: Check notifier registration return value
  drivers: video: Check notifier registration return value
  drivers/xen: Check notifier registration return value
  kernel/hung_task: Check notifier registration return value
  rcu: Check notifier registration return value
  tracing: Check notifier registration return value
  net: fib_notifier: Check notifier registration return value
  ASoC: soc-jack: Check notifier registration return value
  staging: olpc_dcon: Check notifier registration return value
  arch/um: Check notifier registration return value
  alpha: Check notifier registration return value
  bus: brcmstb_gisb: Check notifier registration return value
  soc: bcm: brcmstb: pm: pm-arm: Check notifier registration return
value
  arm64: Check notifier registration return value
  soc/tegra: Check notifier registration return value
  parisc: Check notifier registration return value
  macintosh/adb: Check notifier registration return value
  mips: Check notifier registration return value
  powerpc: Check notifier registration return value
  sh: Check notifier registration return value
  s390: Check notifier registration return value
  sparc: Check notifier registration return value
  xtensa: Check notifier registration return value
  crypto: ccree - check notifier registration return value
  EDAC/altera: Check notifier registration return value
  power: supply: ab8500: Check notifier registration return value
  notifier: Return an error when callback is already registered

 arch/alpha/kernel/setup.c |  5 +--
 arch/arm64/kernel/setup.c |  6 ++--
 arch/mips/kernel/reloc

Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 03:07:03PM +0100, Geert Uytterhoeven wrote:
> I think the addition of __must_check is overkill, leading to the
> addition of useless error checks and message printing.

See the WARN in notifier_chain_register() - it will already do "message
printing".

> Many callers call this where it cannot fail, and where nothing can
> be done in the very unlikely event that the call would ever start to
> fail.

This is an attempt to remove this WARN() hack in
notifier_chain_register() and have the function return a proper error
value instead of this "Currently always returns zero." which is bad
design.

Some of the registration functions around the tree check that retval and
some don't. So if "it cannot fail" those registration either should not
return a value or callers should check that return value - what we have
now doesn't make a whole lot of sense.

Oh, and then fixing this should avoid stuff like:

+   if (notifier_registered == false) {
+   mce_register_decode_chain(&amdgpu_bad_page_nb);
+   notifier_registered = true;
+   }

from propagating in the code.

The other idea I have is to add another indirection in
notifier_chain_register() which will check the *proper* return value of
a lower level __notifier_chain_register() and then issue that warning.
That would definitely avoid touching so many call sites.

Bottom line is: what we have now needs cleaning up.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 09:17:03AM -0500, Alan Stern wrote:
> What reason is there for moving the check into the callers?  It seems 
> like pointless churn.  Why not add the error return code, change the 
> WARN to pr_warn, and leave the callers as they are?  Wouldn't that end 
> up having exactly the same effect?
> 
> For that matter, what sort of remedial action can a caller take if the 
> return code is -EEXIST?  Is there any point in forcing callers to check 
> the return code if they can't do anything about it?

See my reply to Geert from just now:

https://lore.kernel.org/r/yykyueqcsowqm...@zn.tnic

I guess I can add another indirection to notifier_chain_register() and
avoid touching all the call sites.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 03:24:39PM +0100, Borislav Petkov wrote:
> I guess I can add another indirection to notifier_chain_register() and
> avoid touching all the call sites.

IOW, something like this below.

This way I won't have to touch all the callsites and the registration
routines would still return a proper value instead of returning 0
unconditionally.

---
diff --git a/kernel/notifier.c b/kernel/notifier.c
index b8251dc0bc0f..04f08b2ef17f 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -19,14 +19,12 @@ BLOCKING_NOTIFIER_HEAD(reboot_notifier_list);
  * are layered on top of these, with appropriate locking added.
  */
 
-static int notifier_chain_register(struct notifier_block **nl,
-   struct notifier_block *n)
+static int __notifier_chain_register(struct notifier_block **nl,
+struct notifier_block *n)
 {
while ((*nl) != NULL) {
-   if (unlikely((*nl) == n)) {
-   WARN(1, "double register detected");
-   return 0;
-   }
+   if (unlikely((*nl) == n))
+   return -EEXIST;
if (n->priority > (*nl)->priority)
break;
nl = &((*nl)->next);
@@ -36,6 +34,18 @@ static int notifier_chain_register(struct notifier_block 
**nl,
return 0;
 }
 
+static int notifier_chain_register(struct notifier_block **nl,
+  struct notifier_block *n)
+{
+   int ret = __notifier_chain_register(nl, n);
+
+   if (ret == -EEXIST)
+   WARN(1, "double register of notifier callback %ps detected",
+   n->notifier_call);
+
+   return ret;
+}
+
 static int notifier_chain_unregister(struct notifier_block **nl,
struct notifier_block *n)
 {


-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 04:25:47PM +0100, Geert Uytterhoeven wrote:
> I'm not against returning proper errors codes.  I'm against forcing
> callers to check things that cannot fail and to add individual error
> printing to each and every caller.

If you're against checking things at the callers, then the registration
function should be void. IOW, those APIs are not optimally designed atm.

> Note that in other areas, we are moving in the other direction,
> to a centralized printing of error messages, cfr. e.g. commit
> 7723f4c5ecdb8d83 ("driver core: platform: Add an error message to
> platform_get_irq*()").

Yes, thus my other idea to add a lower level __notifier_chain_register()
to do the checking.

I'll see if I can convert those notifier registration functions to
return void, in the process. But let's see what the others think first.

Thanks for taking the time.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 05:12:16PM +0100, Geert Uytterhoeven wrote:
> Returning void is the other extreme ;-)
> 
> There are 3 levels (ignoring BUG_ON()/panic () inside the callee):
>   1. Return void: no one can check success or failure,
>   2. Return an error code: up to the caller to decide,
>   3. Return a __must_check error code: every caller must check.
> 
> I'm in favor of 2, as there are several places where it cannot fail.

Makes sense to me. I'll do that in the next iteration.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 00/42] notifiers: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 11:23:13AM -0500, Steven Rostedt wrote:
> Question, how often does this warning trigger? Is it common to see in
> development?

Yeah, haven't seen it myself yet.

But we hashed it out over IRC. :-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v0 42/42] notifier: Return an error when callback is already registered

2021-11-08 Thread Borislav Petkov
On Mon, Nov 08, 2021 at 03:59:26PM -0500, Alan Stern wrote:
> Is there really any reason for returning an error code?  For example, is 
> it anticipated that at some point in the future these registration calls 
> might fail?
> 
> Currently, the only reason for failing...

Right, I believe with not making it return void we're leaving the door
open for some, *hypothetical* future return values if we decide we need
to return them too, at some point.

Yes, I can't think of another fact to state besides that the callback
was already registered or return success but who knows what we wanna do
in the future...

And so if we change them all to void now, I think it'll be a lot more
churn to switch back to returning a non-void value and having the
callers who choose to handle that value, do so again.

So, long story short, keeping the retval - albeit not very useful right
now - is probably easier.

I hope I'm making some sense here.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-11-16 Thread Borislav Petkov
On Tue, Nov 16, 2021 at 10:39:21AM -0500, Tianyu Lan wrote:
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 35487305d8af..65bc385ae07a 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "mm_internal.h"
>  
> @@ -203,7 +204,8 @@ void __init sev_setup_arch(void)
>   phys_addr_t total_mem = memblock_phys_mem_size();
>   unsigned long size;
>  
> - if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
> + if (!cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)
> + && !hv_is_isolation_supported())

Are we gonna start sprinkling this hv_is_isolation_supported() check
everywhere now?

Are those isolation VMs SEV-like guests? Is CC_ATTR_GUEST_MEM_ENCRYPT
set on them?

What you should do, instead, is add an isol. VM specific
hv_cc_platform_has() just like amd_cc_platform_has() and handle
the cc_attrs there for your platform, like return false for
CC_ATTR_GUEST_MEM_ENCRYPT and then you won't need to add that hv_* thing
everywhere.

And then fix it up in __set_memory_enc_dec() too.
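
IOW, something along these lines - a sketch only, where the function name
and the per-attribute answers are illustrative assumptions, not the final
implementation:

/*
 * Sketch: an isolation-VM specific cc_platform_has() backend, modeled
 * on amd_cc_platform_has(). The per-attribute decisions below are
 * placeholders - the point is that they live in one hv-specific spot
 * instead of hv_is_isolation_supported() checks sprinkled around.
 */
static bool hv_cc_platform_has(enum cc_attr attr)
{
        switch (attr) {
        case CC_ATTR_GUEST_MEM_ENCRYPT:
                /* Whatever isolation VMs should report for SEV-style encryption. */
                return false;
        default:
                return false;
        }
}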

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: PING [PATCH 3/3] x86: decouple pat and mtrr handling

2022-08-13 Thread Borislav Petkov
On Sat, Aug 13, 2022 at 12:56:44PM -0400, Chuck Zmudzinski wrote:
> Why has Juergen not at least responded in some way to the
> comments that Boris has made here? Why has Boris not
> pinged Juergen by now,

How about: it is summer here and people usually take their vacations
during that time and everything takes a bit longer than usual?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: PING [PATCH 3/3] x86: decouple pat and mtrr handling

2022-08-13 Thread Borislav Petkov
On Sat, Aug 13, 2022 at 05:40:34PM -0400, Chuck Zmudzinski wrote:
> I did a search for Juergen Gross on lkml and he is active submitting and
> reviewing patches during the past few weeks. However, he is ignoring
> comments on his patch to fix this regression.

Please stop this non-sense and be patient. We will fix this soon. For
the time being you can use Jan's patch locally.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 01/10] x86/mtrr: fix MTRR fixup on APs

2022-08-21 Thread Borislav Petkov
On Sat, Aug 20, 2022 at 11:25:24AM +0200, Juergen Gross wrote:
> When booting or resuming the system MTRR state is saved on the boot
> processor and then this state is loaded into MTRRs of all other cpus.

s/cpu/CPU/g

Pls check all commit messages.

> During update of the MTRRs the MTRR mechanism needs to be disabled by
> writing the related MSR. The old contents of this MSR are saved in a
> set of static variables and later those static variables are used to
> restore the MSR.
> 
> In case the MSR contents need to be modified on a cpu due to the MSR
> not having been initialized properly by the BIOS, the related update
> function is modifying the static variables accordingly.
> 
> Unfortunately the MTRR state update is usually running on all cpus
> at the same time, so using just one set of static variables for all
> cpus is racy in case the MSR contents differ across cpus.
> 
> Fix that by using percpu variables for saving the MSR contents.
> 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Juergen Gross 
> ---
> I thought adding a "Fixes:" tag for the kernel's initial git commit
> would maybe be entertaining, but without being really helpful.
> The percpu variables were preferred over on-stack ones in order to
> avoid more code churn in followup patches decoupling PAT from MTRR
> support.

So if that thing has been broken for so long and no one noticed, we
could just as well not backport to stable at all...

> V2:
> - new patch
> ---
>  arch/x86/kernel/cpu/mtrr/generic.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c 
> b/arch/x86/kernel/cpu/mtrr/generic.c
> index 558108296f3c..3d185fcf08ca 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -679,7 +679,8 @@ static bool set_mtrr_var_ranges(unsigned int index, 
> struct mtrr_var_range *vr)
>   return changed;
>  }
>  
> -static u32 deftype_lo, deftype_hi;
> +static DEFINE_PER_CPU(u32, deftype_lo);
> +static DEFINE_PER_CPU(u32, deftype_hi);

My APM says that the high 32 bits of the MTRRdefType MSR are reserved
and MBZ. So you can drop the _hi thing and use 0 everywhere. Or rather a
dummy = 0 var because of that damn rdmsr() macro.

Or simply use a

u64 deftype;

use the rdmsrl/wrmsrl() variants and split it when passing to
mtrr_wrmsr() because that thing wants 2 32s.
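
Something like this, roughly - a sketch with made-up names, only to show
the shape:

/* Sketch: keep the saved MTRRdefType as one 64-bit per-CPU value. */
static DEFINE_PER_CPU(u64, mtrr_deftype);

static void mtrr_save_and_disable(void)
{
        u64 *def = this_cpu_ptr(&mtrr_deftype);

        /* Save MTRR state */
        rdmsrl(MSR_MTRRdefType, *def);

        /* Disable MTRRs, and set the default type to uncached */
        mtrr_wrmsr(MSR_MTRRdefType, lower_32_bits(*def) & ~0xcff,
                   upper_32_bits(*def));
}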

>  /**
>   * set_mtrr_state - Set the MTRR state for this CPU.
> @@ -691,6 +692,7 @@ static unsigned long set_mtrr_state(void)
>  {
>   unsigned long change_mask = 0;
>   unsigned int i;
> + u32 *lo = this_cpu_ptr(&deftype_lo);

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

struct long_struct_name *descriptive_name;
unsigned long foo, bar;
unsigned int tmp;
int ret;

The above is faster to parse than the reverse ordering::

int ret;
unsigned int tmp;
unsigned long foo, bar;
struct long_struct_name *descriptive_name;

And even more so than random ordering::

unsigned long foo, bar;
int ret;
struct long_struct_name *descriptive_name;
unsigned int tmp;

Please check all your patches.

>   for (i = 0; i < num_var_ranges; i++) {
>   if (set_mtrr_var_ranges(i, &mtrr_state.var_ranges[i]))
> @@ -704,10 +706,10 @@ static unsigned long set_mtrr_state(void)
>* Set_mtrr_restore restores the old value of MTRRdefType,
>* so to set it we fiddle with the saved value:
>*/
> - if ((deftype_lo & 0xff) != mtrr_state.def_type
> - || ((deftype_lo & 0xc00) >> 10) != mtrr_state.enabled) {
> + if ((*lo & 0xff) != mtrr_state.def_type
> + || ((*lo & 0xc00) >> 10) != mtrr_state.enabled) {
>  
> - deftype_lo = (deftype_lo & ~0xcff) | mtrr_state.def_type |
> + *lo = (*lo & ~0xcff) | mtrr_state.def_type |
>(mtrr_state.enabled << 10);
>   change_mask |= MTRR_CHANGE_MASK_DEFTYPE;
>   }
> @@ -729,6 +731,8 @@ static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
>  static void prepare_set(void) __acquires(set_atomicity_lock)
>  {
>   unsigned long cr0;
> + u32 *lo = this_cpu_ptr(&deftype_lo);
> + u32 *hi = this_cpu_ptr(&deftype_hi);

You don't need the pointers here - this_cpu_read() should be enough.

>   /*
>* Note that this is not ideal
> @@ -763,10 +767,10 @@ static void prepare_set(void) 
> __acquires(set_atomicity_lock)
>   flush_tlb_local();
>  
>   /* Save MTRR state */
> - rdmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
> + rdmsr(MSR_MTRRdefType, *lo, *hi);
>  
>   /* Disable MTRRs, and set the default type to uncached */
> - mtrr_wrmsr(MSR_MTRRdefType, deftype_lo & ~0xcff, deftype_hi);
> + mtrr_wrmsr(MSR_MTRRdefType, *lo & ~0xcff, *hi);
>  
>   /* Again, only flush caches if we have to. */
>   if (!static_cpu_has(X86_FEATURE_

Re: [PATCH v2 01/10] x86/mtrr: fix MTRR fixup on APs

2022-08-21 Thread Borislav Petkov
On Sun, Aug 21, 2022 at 02:25:59PM +0200, Borislav Petkov wrote:
> > Fix that by using percpu variables for saving the MSR contents.
> > 
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Juergen Gross 
> > ---
> > I thought adding a "Fixes:" tag for the kernel's initial git commit
> > would maybe be entertaining, but without being really helpful.
> > The percpu variables were preferred over on-stack ones in order to
> > avoid more code churn in followup patches decoupling PAT from MTRR
> > support.
> 
> So if that thing has been broken for so long and no one noticed, we
> could just as well not backport to stable at all...

Yeah, you can't do that.

The whole day today I kept thinking that something's wrong with this
here. As in, why hasn't it been reported until now.

You say above:

"... for all cpus is racy in case the MSR contents differ across cpus."

But they don't differ:

"7.7.5 MTRRs in Multi-Processing Environments

In multi-processing environments, the MTRRs located in all processors
must characterize memory in the same way. Generally, this means that
identical values are written to the MTRRs used by the processors. This
also means that values CR0.CD and the PAT must be consistent across
processors. Failure to do so may result in coherency violations or loss
of atomicity. Processor implementations do not check the MTRR settings
in other processors to ensure consistency. It is the responsibility of
system software to initialize and maintain MTRR consistency across all
processors."

And you can't have different fixed MTRR type on each CPU - that would
lead to all kinds of nasty bugs.

And here's from a big fat box:

$ rdmsr -a 0x2ff | uniq -c
256 c00

All 256 CPUs have the def type set to the same thing.

Now, if all CPUs go write that same deftype_lo variable in the
rendezvous handler, the only issue that could happen is if a read
sees a partial write. BUT, AFAIK, x86 doesn't tear 32-bit writes so I
*think* all CPUs see the same value being corrected by using mtrr_state
previously saved on the BSP.

As I said, we should've seen this exploding left and right otherwise...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 01/10] x86/mtrr: fix MTRR fixup on APs

2022-08-22 Thread Borislav Petkov
On Mon, Aug 22, 2022 at 07:17:40AM +0200, Juergen Gross wrote:
> And then there is mtrr_state_warn() in arch/x86/kernel/cpu/mtrr/generic.c
> which has a comment saying:
> 
> /* Some BIOS's are messed up and don't set all MTRRs the same! */

That thing also says:

pr_info("mtrr: probably your BIOS does not setup all CPUs.\n");
pr_info("mtrr: corrected configuration.\n");

because it'll go and force on all CPUs the MTRR state it read from the
BSP in mtrr_bp_init->get_mtrr_state.

> Yes, the chances are slim to hit such a box,

Well, my workstation says:

$ dmesg | grep -i mtrr
[0.391514] mtrr: your CPUs had inconsistent variable MTRR settings
[0.395199] mtrr: probably your BIOS does not setup all CPUs.
[0.399199] mtrr: corrected configuration.

but that's the variable MTRRs.

> but your reasoning suggests I should remove the related code?

My reasoning says you should not do anything at all here - works as
advertized. :-)

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 02/10] x86/mtrr: remove unused cyrix_set_all() function

2022-08-25 Thread Borislav Petkov
On Thu, Aug 25, 2022 at 12:41:05PM +0200, Juergen Gross wrote:
> Maybe the alternative reasoning is much faster to understand: if the
> Cyrix set_all() could be called, the AMD and Centaur ones would be callable,
> too.

Right.

> Those being called would result in a NULL deref, so why should we keep
> the Cyrix one?

I know you're eager to remove dead code - I'd love that too. But before
we do that, we need to find out whether some Cyrix hw out there would
not need this.

I know, I know, they should've complained by now ... maybe they have but
we haven't heard about it.

What it most likely looks like is that those machines - a commit from
before git

commit 8fbdcb188e31ac901e216b466b97e90e8b057daa
Author: Dave Jones 
Date:   Wed Aug 14 21:14:22 2002 -0700

[PATCH] Modular x86 MTRR driver.

talks about

+/*
+ * On Cyrix 6x86(MX) and M II the ARR3 is special: it has connection
+ * with the SMM (System Management Mode) mode. So we need the following:
+ * Check whether SMI_LOCK (CCR3 bit 0) is set
+ *   if it is set, write a warning message: ARR3 cannot be changed!
+ * (it cannot be changed until the next processor reset)

which sounds like old rust. And which no one uses or such machines are
long dead already.

Wikipedia says:

https://en.wikipedia.org/wiki/Cyrix_6x86

"The Cyrix 6x86 is a line of sixth-generation, 32-bit x86
microprocessors designed and released by Cyrix in 1995..."

So I'm thinking removing it would be ok...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 03/10] x86/mtrr: replace use_intel() with a local flag

2022-09-19 Thread Borislav Petkov
On Mon, Sep 12, 2022 at 11:10:29AM +0200, Juergen Gross wrote:
> In the end this variable doesn't specify which caching types are available,
> but the ways to select/control the caching types.
> 
> So what about "memory_caching_select" or "memory_caching_control" instead?

_control sounds like the right thing. As in, which of the memory caching
things the kernel controls. Along with a comment above it what this
exactly is. Yap.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 05/10] x86/mtrr: split generic_set_all()

2022-09-19 Thread Borislav Petkov
On Thu, Sep 08, 2022 at 10:49:09AM +0200, Juergen Gross wrote:
> Split generic_set_all() into multiple parts, while moving the main
> function body into cacheinfo.c.
> 
> This prepares the support of PAT without needing MTRR support by

"Prepare PAT support without ... "

> moving the main function body of generic_set_all() into cacheinfo.c
> while renaming it to cache_cpu_init(). The MTRR specific parts are
> moved into a dedicated small function called by cache_cpu_init().
> The PAT and MTRR specific functions are called conditionally based
> on the cache_generic bit settings.
> 
> The setting of smp_changes_mask is merged into the (new) function

... and so on. So the commit message should not say what you're doing -
that should be visible from the diff itself. It should talk more about
the *why* you're doing it.

...

> diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
> index 47e2c72fa8a4..36378604ec61 100644
> --- a/arch/x86/kernel/cpu/cacheinfo.c
> +++ b/arch/x86/kernel/cpu/cacheinfo.c
> @@ -1120,3 +1120,22 @@ void cache_enable(void) __releases(cache_disable_lock)
>  
>   raw_spin_unlock(&cache_disable_lock);
>  }
> +
> +void cache_cpu_init(void)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + cache_disable();
> +
> + /* Set MTRR state. */

Yeah, and when you name the functions properly as you've done, you don't
really need those comments anymore either.

> + if (cache_generic & CACHE_GENERIC_MTRR)
> + mtrr_generic_set_state();
> +
> + /* Set PAT. */

And this one.

> + if (cache_generic & CACHE_GENERIC_PAT)
> + pat_init();
> +
> + cache_enable();
> + local_irq_restore(flags);
> +}
> diff --git a/arch/x86/kernel/cpu/mtrr/generic.c 
> b/arch/x86/kernel/cpu/mtrr/generic.c
> index 5ed397f03a87..fc7b2d952737 100644
> --- a/arch/x86/kernel/cpu/mtrr/generic.c
> +++ b/arch/x86/kernel/cpu/mtrr/generic.c
> @@ -731,30 +731,19 @@ void mtrr_enable(void)
>   mtrr_wrmsr(MSR_MTRRdefType, deftype_lo, deftype_hi);
>  }
>  
> -static void generic_set_all(void)
> +void mtrr_generic_set_state(void)
>  {
>   unsigned long mask, count;
> - unsigned long flags;
> -
> - local_irq_save(flags);
> - cache_disable();
>  
>   /* Actually set the state */
>   mask = set_mtrr_state();
>  
> - /* also set PAT */
> - pat_init();
> -
> - cache_enable();
> - local_irq_restore(flags);
> -
>   /* Use the atomic bitops to update the global mask */
>   for (count = 0; count < sizeof(mask) * 8; ++count) {
>   if (mask & 0x01)
>   set_bit(count, &smp_changes_mask);
>   mask >>= 1;
>   }
> -
>  }
>  
>  /**
> @@ -854,7 +843,7 @@ int positive_have_wrcomb(void)
>   * Generic structure...
>   */
>  const struct mtrr_ops generic_mtrr_ops = {
> - .set_all= generic_set_all,
> + .set_all= cache_cpu_init,

I was gonna say that this looks weird - a set_all function pointer
assigned to an init function - but that changes in the next patch.

But yeah, I like where this is going.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 2/7] x86/boot: Delay sev_verify_cbit() a bit

2023-01-19 Thread Borislav Petkov
On Mon, Jan 16, 2023 at 03:25:35PM +0100, Peter Zijlstra wrote:
> Per the comment it is important to call sev_verify_cbit() before the
> first RET instruction, this means we can delay calling this until more

Make that "... this means that this can be delayed until... "

And I believe this is not about the first RET insn but about the *next* RET
which will pop poisoned crap from the unencrypted stack and do shits with it.

Also, there's this over sev_verify_cbit():

 * sev_verify_cbit() is called before switching to a new long-mode page-table
 * at boot.

so you can't move it under the

movq%rax, %cr3

Looking at this more, there's a sme_enable() call on the BSP which is already in
C.

So, can we do that C-bit verification once on the BSP, *in C* which would be a
lot easier, and be done with it?

Once it is verified there, the bit is the same on all APs so all good.

Right?

joro?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 3/7] x86/power: De-paravirt restore_processor_state()

2023-01-20 Thread Borislav Petkov
On Mon, Jan 16, 2023 at 03:25:36PM +0100, Peter Zijlstra wrote:
> Since Xen PV doesn't use restore_processor_state(), and we're going to
> have to avoid CALL/RET until at least GS is restored,

Drop the "we":

"..., and CALL/RET cannot happen before GS has been restored, ..."

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 4/7] x86/power: Inline write_cr[04]()

2023-01-20 Thread Borislav Petkov
On Mon, Jan 16, 2023 at 03:25:37PM +0100, Peter Zijlstra wrote:
> Since we can't do CALL/RET until GS is restored and CR[04] pinning is
^^

Ditto like for the previous one.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v7 13/41] mm: Make pte_mkwrite() take a VMA

2023-03-02 Thread Borislav Petkov
On Mon, Feb 27, 2023 at 02:29:29PM -0800, Rick Edgecombe wrote:
> [0] 
> https://lore.kernel.org/lkml/0e29a2d0-08d8-bcd6-ff26-4bea0e403...@redhat.com/#t

I guess that sub-thread about how you arrived at this "pass a VMA"
decision should be in the Link tag. But that's for the committer, I'd
say.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

2022-07-05 Thread Borislav Petkov
On Tue, Jul 05, 2022 at 05:56:36PM +0200, Jan Beulich wrote:
> Re-using pat_disabled like you do in your suggestion below won't
> work, because mtrr_bp_init() calls pat_disable() when MTRRs
> appear to be disabled (from the kernel's view). The goal is to
> honor "nopat" without honoring any other calls to pat_disable().

Actually, the current goal is to adjust Xen dom0 because:

1. it uses the PAT code

2. but then it does something special and hides the MTRRs

which is not something real hardware does.

So this one-off thing should be prominent, visible and not get in the
way.

As to mtrr_bp_init(), can you use X86_FEATURE_XENPV there to detect this
special case when the kernel is running as dom0 and set stuff there
accordingly so that it doesn't disable PAT?

Then you don't have to touch pat_disabled() either but integrate the
Xen variant properly...
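
Roughly this shape, as an illustration only - the helper name and the
message string are made up, and placement inside mtrr_bp_init() is
assumed:

/*
 * Sketch: when MTRRs appear disabled, keep PAT alive under Xen PV,
 * which hides MTRRs but hands the guest a usable PAT setting.
 */
static void __init mtrr_maybe_disable_pat(void)
{
        if (mtrr_enabled())
                return;

        if (boot_cpu_has(X86_FEATURE_XENPV))
                return;

        pat_disable("MTRRs disabled, skipping PAT initialization too.");
}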

> I can probably fiddle with pat_enabled() instead of with
> init_cache_modes(), but when making the change I had the feeling
> this might be less liked (as looking more hacky, at least to me).

Why would that be more hacky?

I'd much rather check upfront what the kernel is running on and act
accordingly instead of hooking into random functions and then years
later wonder why it was done in the first place.

> But besides the "where" the other question is: Do you really want
> me to limit this to Xen/PV, rather than - as I have it now -
> extending it to any hypervisor, which may behave in similar ways?

Well, do you know of some other HV which hides MTRRs from the guest?

I haven't heard of any...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH] x86/PAT: have pat_enabled() properly reflect state when running on e.g. Xen

2022-07-11 Thread Borislav Petkov
On Thu, Jul 07, 2022 at 08:38:44AM +0200, Jan Beulich wrote:
> Well, right now the pvops hook for Xen swallows #GP anyway (wrongly
> so imo, but any of my earlier pointing out of that has been left
> unheard, despite even the code comments there saying "It may be worth
> changing that").

Oh great. ;-\

> The point is therefore that after writing PAT, it would need reading
> back. In which case it feels (slightly) more clean to me to avoid the
> write attempt in the first place, when we know it's not going to work.

X86_FEATURE_XENPV check then.
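
I.e., something like this at the point where the PAT MSR would be
written - a sketch only, the helper name is made up:

/* Skip the PAT MSR write under Xen PV, where it wouldn't stick anyway. */
static void pat_msr_write(u64 pat)
{
        if (boot_cpu_has(X86_FEATURE_XENPV))
                return;

        wrmsrl(MSR_IA32_CR_PAT, pat);
}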

> If I may ask - doesn't this mean this patch, in its current shape, is
> already a (small) step in that direction? In any event what you say
> doesn't sound to me like a viable (backportable) route to addressing
> the regression at hand.

Backportable to where? To whatever tree has

bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()")

? If it is that, then 5.17 and newer.

Anyway, I don't mind it as long as you put the proper splitting out
on top and it all goes as a single patchset, with the minimal fix
CC:stable and queued first.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 0/3] x86: make pat and mtrr independent from each other

2022-07-16 Thread Borislav Petkov
On Sat, Jul 16, 2022 at 07:32:46AM -0400, Chuck Zmudzinski wrote:
> Can you confirm that with this patch series you are trying
> to fix that regression?

Yes, this patchset is aimed to fix the whole situation but please don't
do anything yet - I need to find time and look at the whole approach
before you can test it. Just be patient and we'll ping you when the time
comes.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 1/3] x86: move some code out of arch/x86/kernel/cpu/mtrr

2022-07-18 Thread Borislav Petkov
On Fri, Jul 15, 2022 at 04:25:47PM +0200, Juergen Gross wrote:
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 736262a76a12..e43322f8a4ef 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c

I guess the move's ok but not into cpu/common.c pls. That thing is
huuuge and is a dumping ground for everything.

arch/x86/kernel/cpu/cacheinfo.c looks like a more viable candidate for
all things cache.

Rest looks trivial.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 3/3] x86: decouple pat and mtrr handling

2022-07-19 Thread Borislav Petkov
On Fri, Jul 15, 2022 at 04:25:49PM +0200, Juergen Gross wrote:
> Today PAT is usable only with MTRR being active, with some nasty tweaks
> to make PAT usable when running as Xen PV guest, which doesn't support
> MTRR.
> 
> The reason for this coupling is that both PAT MSR changes and MTRR
> changes require a similar sequence, and so full PAT support was added
> using the already available MTRR handling.
> 
> Xen PV PAT handling can work without MTRR, as it just needs to consume
> the PAT MSR setting done by the hypervisor without the ability and need
> to change it. This in turn has resulted in a convoluted initialization
> sequence and wrong decisions regarding cache mode availability due to
> misguiding PAT availability flags.
> 
> Fix all of that by allowing to use PAT without MTRR and by adding an
> environment dependent PAT init function.

Aha, there's the explanation I was looking for.

> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 0a1bd14f7966..3edfb779dab5 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2408,8 +2408,8 @@ void __init cache_bp_init(void)
>  {
>   if (IS_ENABLED(CONFIG_MTRR))
>   mtrr_bp_init();
> - else
> - pat_disable("PAT support disabled because CONFIG_MTRR is 
> disabled in the kernel.");
> +
> + pat_cpu_init();
>  }
>  
>  void cache_ap_init(void)
> @@ -2417,7 +2417,8 @@ void cache_ap_init(void)
>   if (cache_aps_delayed_init)
>   return;
>  
> - mtrr_ap_init();
> + if (!mtrr_ap_init())
> + pat_ap_init_nomtrr();
>  }

So I'm reading this as: if it couldn't init AP's MTRRs, init its PAT.

But currently, the code sets the MTRRs for the delayed case or when the
CPU is not online by doing ->set_all, and in there it sets MTRRs first
and then PAT.

I think the code above should simply try the two things, one after the
other, independently from one another.

And I see you've added another stomp machine call for PAT only.

Now, what I think the design of all this should be, is:

you have a bunch of things you need to do at each point:

* cache_ap_init

* cache_aps_init

* ...

Now, in each of those, you look at whether PAT or MTRR is supported and you
do only those which are supported.

Also, the rendezvous handler should do:

if MTRR:
do MTRR specific stuff

if PAT:
do PAT specific stuff

This way you have clean definitions of what needs to happen when and you
also do *only* the things that the platform supports, by keeping the
proper order of operations - I believe MTRRs first and then PAT.
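
To make that concrete, the rendezvous handler could look roughly like
this - a sketch using the flag/helper names from the patches under
discussion, not a final implementation:

static void cache_cpu_rendezvous(void)
{
        unsigned long flags;

        local_irq_save(flags);
        cache_disable();

        if (cache_generic & CACHE_GENERIC_MTRR)         /* MTRRs first ... */
                mtrr_generic_set_state();

        if (cache_generic & CACHE_GENERIC_PAT)          /* ... then PAT */
                pat_init();

        cache_enable();
        local_irq_restore(flags);
}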

This way we'll get rid of that crazy maze of who calls what and when.

But first we need to define those points where stuff needs to happen and
then for each point define what stuff needs to happen.

How does that sound?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2] x86/paravirt: merge activate_mm and dup_mmap callbacks

2023-03-06 Thread Borislav Petkov
On Thu, Feb 23, 2023 at 04:05:51PM +0100, Juergen Gross wrote:
> x86 maintainers, I think this patch should be carried via the tip tree.

You missed a spot. I'll whack it.

diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index a8b323266179..c3ad8a526378 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -16,13 +16,6 @@
 
 extern atomic64_t last_mm_ctx_id;
 
-#ifndef CONFIG_PARAVIRT_XXL
-static inline void paravirt_activate_mm(struct mm_struct *prev,
-   struct mm_struct *next)
-{
-}
-#endif /* !CONFIG_PARAVIRT_XXL */
-
 #ifdef CONFIG_PERF_EVENTS
 DECLARE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DECLARE_STATIC_KEY_FALSE(rdpmc_always_available_key);

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v4 05/12] x86/xen: set MTRR state when running as Xen PV initial domain

2023-03-23 Thread Borislav Petkov
On Mon, Mar 06, 2023 at 05:34:18PM +0100, Juergen Gross wrote:
> + for (reg = 0; reg < MTRR_MAX_VAR_RANGES; reg++) {
> + op.u.read_memtype.reg = reg;
> + if (HYPERVISOR_platform_op(&op))
> + break;
> +
> + /*
> +  * Only called in dom0, which has all RAM PFNs mapped at
> +  * RAM MFNs, and all PCI space etc. is identity mapped.
> +  * This means we can treat MFN == PFN regarding MTTR settings.


"MTRR"

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3] x86/boot: skip realmode init code when running as Xen PV guest

2022-11-24 Thread Borislav Petkov
On Wed, Nov 23, 2022 at 12:45:23PM +0100, Juergen Gross wrote:
> When running as a Xen PV guest there is no need for setting up the
> realmode trampoline, as realmode isn't supported in this environment.
> 
> Trying to setup the trampoline has been proven to be problematic in
> some cases, especially when trying to debug early boot problems with
> Xen requiring to keep the EFI boot-services memory mapped (some
> firmware variants seem to claim basically all memory below 1M for boot
> services).
> 
> Introduce new x86_platform_ops operations for that purpose, which can
> be set to a nop by the Xen PV specific kernel boot code.
> 
> Fixes: 084ee1c641a0 ("x86, realmode: Relocator for realmode code")

This text and Fixes: tag sounds like this needs to go to Linus and
stable now?

> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 41d7669a97ad..247aca9f8ed1 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -200,14 +200,18 @@ static void __init set_real_mode_permissions(void)
>   set_memory_x((unsigned long) text_start, text_size >> PAGE_SHIFT);
>  }
>  
> -static int __init init_real_mode(void)
> +void __init init_real_mode(void)
>  {
>   if (!real_mode_header)
>   panic("Real mode trampoline was not allocated");
>  
>   setup_real_mode();
>   set_real_mode_permissions();
> +}
>  
> +static int __init call_init_real_mode(void)
> +{
> + x86_platform.realmode_init();
>   return 0;
>  }
> -early_initcall(init_real_mode);
> +early_initcall(call_init_real_mode);

I'll name that one "do_init_real_mode" as "call init" sounds weird.

Otherwise, it is as straightforward as it gets.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3] x86/boot: skip realmode init code when running as Xen PV guest

2022-11-24 Thread Borislav Petkov
On Thu, Nov 24, 2022 at 02:30:39PM +0100, Juergen Gross wrote:
> Looking at the date when 084ee1c641a0 went in I don't think it _needs_
> to go in now, but I wouldn't complain ...

So if you don't have a particular and specific reason, I won't queue it
for stable at all.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 1/2] x86: Check return values from early_memremap calls

2023-01-04 Thread Borislav Petkov
On Thu, Nov 10, 2022 at 08:07:53AM -0800, Dave Hansen wrote:
> On 11/10/22 07:45, Ross Philipson wrote:
> > dt = early_memremap(initial_dtb, map_len);
> > +   if (!dt) {
> > +   pr_warn("failed to memremap initial dtb\n");
> > +   return;
> > +   }
> 
> Are all of these new pr_warn/err()'s really adding much value?  They all
> look pretty generic.  It makes me wonder if we should just spit out a
> generic message in early_memremap() and save all the callers the trouble.

Well, let's see.

early_memremap() calls __early_ioremap() and that one already warns before each
NULL return. So I guess we don't need the error messages as we will know where
it starts failing.

I guess we still need the error handling though.

I.e., this above should be:

dt = early_memremap(initial_dtb, map_len);
+   if (!dt)
+   return;

so that we don't go off into the weeds with a NULL ptr.

Or?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [RFC PATCH V3 03/11] x86/Hyper-V: Add new hvcall guest address host visibility support

2021-05-30 Thread Borislav Petkov
On Sun, May 30, 2021 at 11:06:20AM -0400, Tianyu Lan wrote:
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 156cd235659f..a82975600107 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -29,6 +29,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  #include "../mm_internal.h"
>  
> @@ -1986,8 +1988,14 @@ static int __set_memory_enc_dec(unsigned long addr, 
> int numpages, bool enc)
>   int ret;
>  
>   /* Nothing to do if memory encryption is not active */
> - if (!mem_encrypt_active())
> + if (hv_is_isolation_supported()) {
> + return hv_set_mem_host_visibility((void *)addr,
> + numpages * HV_HYP_PAGE_SIZE,
> + enc ? VMBUS_PAGE_NOT_VISIBLE
> + : VMBUS_PAGE_VISIBLE_READ_WRITE);

Put all this gunk in a hv-specific function somewhere in hv-land which
you only call from here. This way you probably won't even need to export
hv_set_mem_host_visibility() and so on...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 4/5] x86/kernel: Move page table macros to new header

2024-05-23 Thread Borislav Petkov
On Thu, May 23, 2024 at 03:59:43PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> > ---
> >  arch/x86/kernel/head_64.S| 22 ++
> >  arch/x86/kernel/pgtable_64_helpers.h | 28 
> 
> That's the wrong place as you want to include it from arch/x86/platform.
> 
> arch/x86/include/asm/

... and there already is a header waiting:

arch/x86/include/asm/pgtable_64.h

so no need for a new one.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-05-31 Thread Borislav Petkov
On Wed, May 31, 2023 at 11:31:37AM +0200, Juergen Gross wrote:
> What it did would have been printed if pr_debug() would have been
> active. :-(

Lemme turn those into pr_info(). pr_debug() is nuts.

> Did you check whether CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT was the same in 
> both
> kernels you've tested?

Yes, it is enabled.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov
On Thu, Jun 01, 2023 at 08:39:17AM +0200, Juergen Gross wrote:
> Does this translate to: "we should remove that cleanup crap"? I'd be
> positive to that. :-)

Why, what's wrong with that thing?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov
On Thu, Jun 01, 2023 at 10:19:01AM +0200, Juergen Gross wrote:
> Patch 2 wants this diff on top:

Obviously. :-)

That fixes it, thx.

Now lemme restart testing.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v6 00/16] x86/mtrr: fix handling with PAT but without MTRR

2023-06-01 Thread Borislav Petkov
On Thu, Jun 01, 2023 at 03:22:33PM +0200, Borislav Petkov wrote:
> Now lemme restart testing.

This is from another box, with the latest changes incorporated:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=rc1-mtrr

--- proc-mtrr.before2011-03-04 01:03:35.243994733 +0100
+++ proc-mtrr.after 2023-06-01 16:28:54.95456 +0200
@@ -1,3 +1,3 @@
 reg00: base=0x0 (0MB), size= 2048MB, count=1: write-back
 reg01: base=0x08000 ( 2048MB), size= 1024MB, count=1: write-back
-reg02: base=0x0c000 ( 3072MB), size=  256MB, count=1: write-back
+reg02: base=0x0c000 ( 3072MB), size=  128MB, count=1: write-back

Want mtrr=debug output again?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 1/5] x86/paravirt: move some functions and defines to alternative

2023-10-25 Thread Borislav Petkov
On Thu, Oct 19, 2023 at 11:15:16AM +0200, Juergen Gross wrote:
> +/* Low-level backend functions usable from alternative code replacements. */
> +DEFINE_ASM_FUNC(x86_nop, "", .entry.text);
> +EXPORT_SYMBOL_GPL(x86_nop);

This is all x86 code so you don't really need the "x86_" prefix - "nop"
is perfectly fine.

> +noinstr void x86_BUG(void)
> +{
> + BUG();
> +}
> +EXPORT_SYMBOL_GPL(x86_BUG);

That export is needed for?

Paravirt stuff in modules?

It builds here without it - I guess I need to do an allmodconfig.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 1/5] x86/paravirt: move some functions and defines to alternative

2023-10-25 Thread Borislav Petkov
On Wed, Oct 25, 2023 at 03:31:07PM +0200, Juergen Gross wrote:
> There is
> 
> #define nop() asm volatile ("nop")
> 
> in arch/x86/include/asm/special_insns.h already.

Then call it "nop_func" or so.

> It might not be needed now, but are you sure we won't need it in future?

No, I'm not.

What I'm sure of is: stuff should be added to the kernel only when
really needed. Not in the expectation that it might potentially be
needed at some point.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS

2023-11-08 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:22PM -0700, Xin Li wrote:
> Subject: Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for 
> WRMSRNS
  

For all your text:

s/cpu/CPU/g

> WRMSRNS is an instruction that behaves exactly like WRMSR, with
> the only difference being that it is not a serializing instruction
> by default. Under certain conditions, WRMSRNS may replace WRMSR to
> improve performance.
> 
> Add the CPU feature bit for WRMSRNS.
> 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/cpufeatures.h   | 1 +
>  tools/arch/x86/include/asm/cpufeatures.h | 1 +
>  2 files changed, 2 insertions(+)

It looks to me like you can merge the first three patches into one as
all they do is add that insn support.

Then, further down in the patchset, it says:

+   if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+   /* WRMSRNS is a baseline feature for FRED. */

but WRMSRNS is not mentioned in the FRED spec "Document Number:
346446-005US, Revision: 5.0" which, according to

https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html

is the latest.

Am I looking at the wrong one?

> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 58cb9495e40f..330876d34b68 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -322,6 +322,7 @@
>  #define X86_FEATURE_FSRS (12*32+11) /* "" Fast short REP STOSB */
>  #define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP 
> {CMPSB,SCASB} */
>  #define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" 
> (userspace) GS */
> +#define X86_FEATURE_WRMSRNS  (12*32+19) /* "" Non-Serializing Write 
> to Model Specific Register instruction */

  /* "" Non-serializing WRMSR */

is more than enough.

And now I'm wondering: when you're adding a separate CPUID bit, then the
above should be

+   if (cpu_feature_enabled(X86_FEATURE_WRMSRNS)) {
+   /* WRMSRNS is a baseline feature for FRED. */

I see that you're adding a dependency:

+   { X86_FEATURE_FRED, X86_FEATURE_WRMSRNS   },

which then means you don't need the X86_FEATURE_WRMSRNS definition at
all and can use X86_FEATURE_FRED only.

So, what's up?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 19/37] x86/fred: Update MSR_IA32_FRED_RSP0 during task switch

2023-11-13 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:40PM -0700, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> MSR_IA32_FRED_RSP0 is used during ring 3 event delivery, and needs to
> be updated to point to the top of next task stack during task switch.
> 
> Signed-off-by: H. Peter Anvin (Intel) 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/switch_to.h | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/switch_to.h 
> b/arch/x86/include/asm/switch_to.h
> index f42dbf17f52b..c3bd0c0758c9 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -70,9 +70,13 @@ static inline void update_task_stack(struct task_struct 
> *task)
>  #ifdef CONFIG_X86_32
>   this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
>  #else
> - /* Xen PV enters the kernel on the thread stack. */
> - if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> + /* WRMSRNS is a baseline feature for FRED. */
> + wrmsrns(MSR_IA32_FRED_RSP0, (unsigned 
> long)task_stack_page(task) + THREAD_SIZE);

If this non-serializing write happens now and, AFAICT, the CR3 write
during the task switch has already happened in switch_mm* earlier, what
is the serialization point that's going to make sure that write is
committed before the new task starts executing?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 19/37] x86/fred: Update MSR_IA32_FRED_RSP0 during task switch

2023-11-13 Thread Borislav Petkov
On Mon, Nov 13, 2023 at 12:36:04PM -0500, H. Peter Anvin wrote:
> A resource cannot be consumed after the value has been written; this
> is the only necessary level of serialization, equivalent to, say, RAX.

Lemme see if I understand this correctly using this context as an
example: after this MSR_IA32_FRED_RSP0 write, any FRED events determined
to be delivered to level 0 will use this new task stack ptr?

And since the new task is not running yet and the old one isn't running
either, we're fine here. So the "serialization point" I was talking
about above is bollocks.

Close? :)

> A serializing instruction stops the entire pipeline until everything
> has retired and any stores have become globally visible.

Right, we don't need that here.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS

2023-11-13 Thread Borislav Petkov
On Tue, Nov 14, 2023 at 12:43:38AM +, Li, Xin3 wrote:
> No.  tglx asked for it:
> https://lkml.kernel.org/kvm/87y1h81ht4.ffs@tglx/

Aha

"According to the CPU folks FRED systems are guaranteed to have WRMSRNS -
I asked for that :). It's just not yet documented."

so I'm going to expect that to appear in the next FRED spec revision...

> Because we are doing 
>   wrmsrns(MSR_IA32_FRED_RSP0, ...)
> here, and X86_FEATURE_WRMSRNS doesn't guarantee MSR_IA32_FRED_RSP0 exists.
> 
> Or I missed something?

Well, according to what I'm hearing and reading so far:

FRED means WRMSRNS
FRED means MSR_IA32_FRED_RSP0

and if you had to be precise, the code should do:

	if (cpu_feature_enabled(X86_FEATURE_FRED)) {
		if (cpu_feature_enabled(X86_FEATURE_WRMSRNS))
			wrmsrns(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);
		else
			wrmsr(MSR_IA32_FRED_RSP0, (unsigned long)task_stack_page(task) + THREAD_SIZE);
	}

but apparently FRED implies WRMSRNS - not documented anywhere currently
- so you can save yourself one check.
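
IOW, the task switch path could then simply do (a sketch only, relying
on that FRED-implies-WRMSRNS guarantee):

	/* FRED guarantees WRMSRNS and MSR_IA32_FRED_RSP0, so one check is enough. */
	if (cpu_feature_enabled(X86_FEATURE_FRED))
		wrmsrns(MSR_IA32_FRED_RSP0,
			(unsigned long)task_stack_page(task) + THREAD_SIZE);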

But your version checks FRED to decide whether it can do WRMSRNS, even
though there's a separate WRMSRNS flag, and that made me wonder...

> Another patch set should replace WRMSR with WRMSRNS, with SERIALIZE added
> when needed.

I sense someone wants to optimize MSR writes ... :-)

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH V1 3/6] xen/virtio: Add option to restrict memory access under Xen

2022-04-26 Thread Borislav Petkov
On Tue, Apr 26, 2022 at 11:36:40AM +0200, Juergen Gross wrote:
> As the suggestion was to add another flag this wouldn't be a problem IMO.

We already had a problem where adding one flag would break the same
flag on the other guest type. That's why we added cc_vendor too. So it
can be tricky.

> platform_has() doesn't seem too bad IMO.
> 
> I will write a patch for starting the discussion.

Yeah, I guess such a proposal would need a wider audience - maybe CC
linux-arch...

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 15/37] x86/ptrace: Cleanup the definition of the pt_regs structure

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:36PM -0700, Xin Li wrote:
> -/* Return frame for iretq */
> +
> + /* The IRETQ return frame starts here */
>   unsigned long ip;
> - unsigned long cs;
> +
> + union {
> + u64 csx;// The full 64-bit data slot containing CS
> + u16 cs; // CS selector

I know people want to start using those // one-line comments now but
this struct already uses the /* multiline ones. Can we stick to one type
pls, otherwise it'll turn into a mess.

And pls put those comments ontop, not on the side.
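
I.e., something like this (only a sketch of the comment style I mean,
not the final hunk):

	union {
		/* The full 64-bit data slot containing CS */
		u64		csx;
		/* CS selector */
		u16		cs;
	};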

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 16/37] x86/ptrace: Add FRED additional information to the pt_regs structure

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:37PM -0700, Xin Li wrote:
> FRED defines additional information in the upper 48 bits of cs/ss
> fields. Therefore add the information definitions into the pt_regs
> structure.
> 
> Specially introduce a new structure fred_ss to denote the FRED flags
> above SS selector, which avoids FRED_SSX_ macros and makes the code
> simpler and easier to read.
> 
> Signed-off-by: H. Peter Anvin (Intel) 

You and hpa need to go through all the patches and figure out who's the
author that's going to land in git.

Because this and others have hpa's SOB first, suggesting he's the
author. However, the mail doesn't start with

From: H. Peter Anvin (Intel) 

and then git will make *you* the author.

> Tested-by: Shan Kang 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Xin Li 

...

>   union {
> - u64 ssx;// The full 64-bit data slot containing SS
> - u16 ss; // SS selector
> + /* SS selector */
> + u16 ss;
> + /* The extended 64-bit data slot containing SS */
> + u64 ssx;
> + /* The FRED SS extension */
> + struct fred_ss  fred_ss;

Aha, sanity about the right comments has come to your mind in this next
patch. :-P

Just do them right in the previous one.

>   /*
> -  * Top of stack on IDT systems.
> +  * Top of stack on IDT systems, while FRED systems have extra fields
> +  * defined above for storing exception related information, e.g. CR2 or
> +  * DR6.

Btw, I really appreciate the good commenting - thanks for that!

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 20/37] x86/fred: Disallow the swapgs instruction when FRED is enabled

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:41PM -0700, Xin Li wrote:
> +  * Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE
> +  * MSR instead of the GS segment’s descriptor cache. As such, the

:verify_diff: WARNING: Unicode char [’] (0x8217 in line: + * MSR instead of 
the GS segment’s descriptor cache. As such, the

Just do a normal ' - char number 0x27.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 23/37] x86/fred: Make exc_page_fault() work for FRED

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:44PM -0700, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> On a FRED system, the faulting address (CR2) is passed on the stack,
> to avoid the problem of transient state. Thus we get the page fault
^^

Please use passive voice in your commit message: no "we" or "I", etc,
and describe your changes in imperative mood.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 24/37] x86/idtentry: Incorporate definitions/declarations of the FRED entries

2023-11-28 Thread Borislav Petkov
On Mon, Oct 02, 2023 at 11:24:45PM -0700, Xin Li wrote:
> FRED and IDT can share most of the definitions and declarations so
> that in the majority of cases the actual handler implementation is the
> same.
> 
> The differences are the exceptions where FRED stores exception related
> information on the stack and the sysvec implementations as FRED can
> handle irqentry/exit() in the dispatcher instead of having it in each
> handler.
> 
> Also add stub defines for vectors which are not used due to Kconfig
> decisions to spare the ifdeffery in the actual FRED dispatch code.
> 
> Tested-by: Shan Kang 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Xin Li 

This makes me wonder too who the author is. The commit message text
sounds like tglx. :)

> @@ -137,6 +141,17 @@ static __always_inline void __##func(struct pt_regs 
> *regs,   \
>  #define DEFINE_IDTENTRY_RAW(func)\
>  __visible noinstr void func(struct pt_regs *regs)
>  
> +/**
> + * DEFINE_FREDENTRY_RAW - Emit code for raw FRED entry points

LOL, "FREDENTRY"

...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v12 24/37] x86/idtentry: Incorporate definitions/declarations of the FRED entries

2023-11-28 Thread Borislav Petkov
On Tue, Nov 28, 2023 at 10:58:50AM -0800, H. Peter Anvin wrote:
> >You have a very good sense 😊

I've been reading code of a couple of people for a couple of years. :-)

> Remember that Signed-off-by: relates to the *patch flow*.

Yes, you should take the time to read Documentation/process/ and
especially submitting-patches.rst.

We have it all written down and I point new(-er) people at that
directory at least once a week. :-\

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [linus:master] [x86/entry] be5341eb0d: WARNING:CPU:#PID:#at_int80_emulation

2023-12-19 Thread Borislav Petkov
On Tue, Dec 19, 2023 at 04:49:14PM +0800, kernel test robot wrote:
> [ 13.481107][ T48] WARNING: CPU: 0 PID: 48 at int80_emulation 
> (arch/x86/entry/common.c:164) 
> [   13.481454][   T48] Modules linked in:
> [   13.481655][   T48] CPU: 0 PID: 48 Comm: init Tainted: G N 
> 6.7.0-rc4-2-gbe5341eb0d43 #1
> [ 13.482162][ T48] RIP: 0010:int80_emulation (arch/x86/entry/common.c:164) 

Looking at the dmesg, I think you missed the most important part - the
preceding line:

[   13.480504][   T48] CFI failure at int80_emulation+0x67/0xb0 (target: 
sys_ni_posix_timers+0x0/0x70; expected type: 0xb02b34d9)
^^^

[   13.481107][   T48] WARNING: CPU: 0 PID: 48 at int80_emulation+0x67/0xb0
[   13.481454][   T48] Modules linked in:
[   13.481655][   T48] CPU: 0 PID: 48 Comm: init Tainted: G N 
6.7.0-rc4-2-gbe5341eb0d43 #1

The CFI bla is also in the stack trace.

Now, decode_cfi_insn() has a comment there which shows what the
compiler generates for indirect call checks:

 *   movl    -<id>, %r10d       ; 6 bytes
 *   addl    -4(%reg), %r10d    ; 4 bytes
 *   je      .Ltmp1             ; 2 bytes
 *   ud2                        ; <- regs->ip
 *   .Ltmp1:


and the opcodes you decoded...

> [ 13.482437][ T48] Code: 01 00 00 77 43 89 c1 48 81 f9 c9 01 00 00 48 19 c9 
> 21 c1 48 89 df 4c 8b 1c cd 90 12 20 9a 41 ba 27 cb d4 4f 45 03 53 fc 74 02 
> <0f> 0b 41 ff d3 48 89 c1 48 89 4b 50 90 48 89 df 5b 41 5e 31 c0 31
> All code
> 
>0: 01 00   add%eax,(%rax)
>2: 00 77 43add%dh,0x43(%rdi)
>5: 89 c1   mov%eax,%ecx
>7: 48 81 f9 c9 01 00 00cmp$0x1c9,%rcx
>e: 48 19 c9sbb%rcx,%rcx
>   11: 21 c1   and%eax,%ecx
>   13: 48 89 dfmov%rbx,%rdi
>   16: 4c 8b 1c cd 90 12 20mov-0x65dfed70(,%rcx,8),%r11
>   1d: 9a 
>   1e: 41 ba 27 cb d4 4f   mov$0x4fd4cb27,%r10d
>   24: 45 03 53 fc add-0x4(%r11),%r10d
>   28: 74 02   je 0x2c
>   2a:*0f 0b   ud2 <-- trapping instruction

... these guys here look exactly like what the compiler issued.

This is the first time I'm looking at this CFI bla, but it sounds like
it is trying to compare the syscall target - sys_ni_posix_timers - with
something it is expecting to call, and the comparison doesn't work out
(%r10 is not 0).

There's that special symbol __cfi_sys_ni_posix_timers which also gets
generated...
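
For my own understanding, the check boils down to something like this
C-ish pseudocode (a sketch of the kCFI scheme, not actual kernel code):

	/* target is the indirect call destination, sys_ni_posix_timers here */
	u32 neg_hash = 0x4fd4cb27;			/* the movl immediate, == -0xb02b34d9 */
	u32 stored   = *(u32 *)((u8 *)target - 4);	/* type hash placed in front of the callee */

	if (neg_hash + stored != 0)			/* the addl must wrap to zero */
		__builtin_trap();			/* the ud2 that fired here */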

Someone would need to dig into that whole CFI gunk to figure out why
this is not happy.

Oh well.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support

2024-01-02 Thread Borislav Petkov
On Tue, Dec 05, 2023 at 02:49:50AM -0800, Xin Li wrote:

> Subject: Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS 
> instruction support

Or simply "x86/fred: Add ... "

Other than that,

Acked-by: Borislav Petkov (AMD) 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support

2024-01-03 Thread Borislav Petkov
On Tue, Jan 02, 2024 at 10:06:27PM +, Li, Xin3 wrote:
> Do I need to send an updated patch?

> Or just leave it to the maintainer who is going to take care of it?

While waiting, please take a look at this:

https://kernel.org/doc/html/latest/process/submitting-patches.html#don-t-get-discouraged-or-impatient

Might want to read the whole doc too.

But to answer your question: you wait a few weeks and collect all
comments and review feedback that you've received and incorporate them
into the patchset.

Then, after the time passes you send a new revision and explain in the
0th message what has changed.

Ok?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 07/35] x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled

2024-01-22 Thread Borislav Petkov
On Tue, Dec 05, 2023 at 02:49:56AM -0800, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> Add CONFIG_X86_FRED to <asm/disabled-features.h> to make
> cpu_feature_enabled() work correctly with FRED.
> 
> Originally-by: Megha Dey 
> Signed-off-by: H. Peter Anvin (Intel) 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
> 
> Changes since v10:
> * FRED feature is defined in cpuid word 12, not 13 (Nikolay Borisov).
> ---
>  arch/x86/include/asm/disabled-features.h   | 8 +++-
>  tools/arch/x86/include/asm/disabled-features.h | 8 +++-
>  2 files changed, 14 insertions(+), 2 deletions(-)

Whoever applies this: this one and the previous one can be merged into
one patch.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v13 08/35] x86/fred: Disable FRED by default in its early stage

2024-01-22 Thread Borislav Petkov
On Tue, Dec 05, 2023 at 02:49:57AM -0800, Xin Li wrote:
>   Warning: use of this parameter will taint the kernel
>   and may cause unknown problems.
>  
> + fred[X86-64]
> + Enable flexible return and event delivery

Let's make it accept multiple options from the get-go:

fred=on,disable-when,foo,bar,bla...

in case we need to tweak its behavior.

If it is only "fred", it will propagate this way downstream and lead to
confusion later, when people have to update their scripts and config
files because "fred" alone no longer does what they expect.
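
Something along these lines, say (a rough sketch only - the option
names and the fred_enabled flag are made up):

	static int __init fred_setup(char *str)
	{
		char *s;

		while ((s = strsep(&str, ",")) != NULL) {
			if (!strcmp(s, "on"))
				fred_enabled = true;
			else if (!strcmp(s, "off"))
				fred_enabled = false;
			/* more tweaks can be bolted on here without breaking "fred=on" */
		}

		return 1;
	}
	__setup("fred=", fred_setup);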

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 03/12] x86/pv: switch SWAPGS to ALTERNATIVE

2020-11-27 Thread Borislav Petkov
On Fri, Nov 20, 2020 at 12:46:21PM +0100, Juergen Gross wrote:
> SWAPGS is used only for interrupts coming from user mode or for
> returning to user mode. So there is no reason to use the PARAVIRT
> framework, as it can easily be replaced by an ALTERNATIVE depending
> on X86_FEATURE_XENPV.
> 
> There are several instances using the PV-aware SWAPGS macro in paths
> which are never executed in a Xen PV guest. Replace those with the
> plain swapgs instruction. For SWAPGS_UNSAFE_STACK the same applies.
> 
> Signed-off-by: Juergen Gross 
> Acked-by: Andy Lutomirski 
> Acked-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/entry/entry_64.S | 10 +-
>  arch/x86/include/asm/irqflags.h   | 20 
>  arch/x86/include/asm/paravirt.h   | 20 
>  arch/x86/include/asm/paravirt_types.h |  2 --
>  arch/x86/kernel/asm-offsets_64.c  |  1 -
>  arch/x86/kernel/paravirt.c|  1 -
>  arch/x86/kernel/paravirt_patch.c  |  3 ---
>  arch/x86/xen/enlighten_pv.c   |  3 ---
>  8 files changed, 13 insertions(+), 47 deletions(-)

I love patches like this one! Give me more...

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call

2020-12-02 Thread Borislav Petkov
On Fri, Nov 20, 2020 at 12:46:22PM +0100, Juergen Gross wrote:
> @@ -123,12 +115,15 @@ SYM_INNER_LABEL(entry_SYSCALL_64_after_hwframe, 
> SYM_L_GLOBAL)
>* Try to use SYSRET instead of IRET if we're returning to
>* a completely clean 64-bit userspace context.  If we're not,
>* go to the slow exit path.
> +  * In the Xen PV case we must use iret anyway.
>*/
> - movqRCX(%rsp), %rcx
> - movqRIP(%rsp), %r11
>  
> - cmpq%rcx, %r11  /* SYSRET requires RCX == RIP */
> - jne swapgs_restore_regs_and_return_to_usermode
> + ALTERNATIVE __stringify( \
> + movqRCX(%rsp), %rcx; \
> + movqRIP(%rsp), %r11; \
> + cmpq%rcx, %r11; /* SYSRET requires RCX == RIP */ \
> + jne swapgs_restore_regs_and_return_to_usermode), \
> + "jmpswapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV

Why such a big ALTERNATIVE when you can simply do:

/*
 * Try to use SYSRET instead of IRET if we're returning to
 * a completely clean 64-bit userspace context.  If we're not,
 * go to the slow exit path.
 * In the Xen PV case we must use iret anyway.
 */
ALTERNATIVE "", "jmp swapgs_restore_regs_and_return_to_usermode", 
X86_FEATURE_XENPV

movqRCX(%rsp), %rcx;
movqRIP(%rsp), %r11;
cmpq%rcx, %r11; /* SYSRET requires RCX == RIP */ \
jne swapgs_restore_regs_and_return_to_usermode

?

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 04/12] x86/xen: drop USERGS_SYSRET64 paravirt call

2020-12-02 Thread Borislav Petkov
On Wed, Dec 02, 2020 at 03:48:21PM +0100, Jürgen Groß wrote:
> I wanted to avoid the additional NOPs for the bare metal case.

Yeah, in that case it gets optimized to a single NOP:

[0.176692] SMP alternatives: 81a00068: [0:5) optimized NOPs: 0f 1f 44 00 00

which is nopl 0x0(%rax,%rax,1) and I don't think that's noticeable on
modern CPUs where a NOP is basically a rIP increment only and that goes
down the pipe almost for free. :-)

> If you don't mind them I can do as you are suggesting.

Yes pls, I think asm readability is more important than a 5-byte NOP.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 07/12] x86: add new features for paravirt patching

2020-12-08 Thread Borislav Petkov
On Fri, Nov 20, 2020 at 12:46:25PM +0100, Juergen Gross wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index dad350d42ecf..ffa23c655412 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -237,6 +237,9 @@
>  #define X86_FEATURE_VMCALL   ( 8*32+18) /* "" Hypervisor supports 
> the VMCALL instruction */
>  #define X86_FEATURE_VMW_VMMCALL  ( 8*32+19) /* "" VMware prefers 
> VMMCALL hypercall instruction */
>  #define X86_FEATURE_SEV_ES   ( 8*32+20) /* AMD Secure Encrypted 
> Virtualization - Encrypted State */
> +#define X86_FEATURE_NOT_XENPV( 8*32+21) /* "" Inverse of 
> X86_FEATURE_XENPV */
> +#define X86_FEATURE_NO_PVUNLOCK  ( 8*32+22) /* "" No PV unlock 
> function */
> +#define X86_FEATURE_NO_VCPUPREEMPT   ( 8*32+23) /* "" No PV 
> vcpu_is_preempted function */

Ew, negative features. ;-\

/me goes forward and looks at usage sites:

+   ALTERNATIVE_2 "jmp *paravirt_iret(%rip);",  \
+ "jmp native_iret;", X86_FEATURE_NOT_XENPV,\
+ "jmp xen_iret;", X86_FEATURE_XENPV

Can we make that:

ALTERNATIVE_TERNARY "jmp *paravirt_iret(%rip);",
  "jmp xen_iret;", X86_FEATURE_XENPV,
  "jmp native_iret;", X86_FEATURE_XENPV,

where the last two lines are supposed to mean

X86_FEATURE_XENPV ? "jmp xen_iret;" : "jmp 
native_iret;"

Now, in order to convey that logic to apply_alternatives(), you can do:

struct alt_instr {
s32 instr_offset;   /* original instruction */
s32 repl_offset;/* offset to replacement instruction */
u16 cpuid;  /* cpuid bit set for replacement */
u8  instrlen;   /* length of original instruction */
u8  replacementlen; /* length of new instruction */
u8  padlen; /* length of build-time padding */
    u8  flags;          /* patching flags */      <--- THIS
} __packed;

and yes, we have had the flags thing in a lot of WIP diffs over the
years, but we've never actually ended up needing it.

Anyway, then, apply_alternatives() will do:

if (flags & ALT_NOT_FEATURE)

or something like that - I'm bad at naming stuff - then it should patch
only when the feature is NOT set and vice versa.

There in that

if (!boot_cpu_has(a->cpuid)) {

branch.

Hmm?
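
I.e., roughly this (a sketch - ALT_FLAG_NOT and the exact flags layout
are made-up names):

	#define ALT_FLAG_NOT	(1 << 0)

		/* in apply_alternatives(), per patch site: */
		bool feature = boot_cpu_has(a->cpuid);

		if (a->flags & ALT_FLAG_NOT)
			feature = !feature;

		if (!feature) {
			if (a->padlen > 1)
				optimize_nops(a, instr);

			continue;
		}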

>  /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
>  #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
> RDGSBASE, WRGSBASE instructions*/
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 2400ad62f330..f8f9700719cf 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -593,6 +593,18 @@ int alternatives_text_reserved(void *start, void *end)
>  #endif /* CONFIG_SMP */
>  
>  #ifdef CONFIG_PARAVIRT
> +static void __init paravirt_set_cap(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_XENPV))
> + setup_force_cpu_cap(X86_FEATURE_NOT_XENPV);
> +
> + if (pv_is_native_spin_unlock())
> + setup_force_cpu_cap(X86_FEATURE_NO_PVUNLOCK);
> +
> + if (pv_is_native_vcpu_is_preempted())
> + setup_force_cpu_cap(X86_FEATURE_NO_VCPUPREEMPT);
> +}
> +
>  void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
>struct paravirt_patch_site *end)
>  {
> @@ -616,6 +628,8 @@ void __init_or_module apply_paravirt(struct 
> paravirt_patch_site *start,
>  }
>  extern struct paravirt_patch_site __start_parainstructions[],
>   __stop_parainstructions[];
> +#else
> +static void __init paravirt_set_cap(void) { }
>  #endif   /* CONFIG_PARAVIRT */
>  
>  /*
> @@ -723,6 +737,18 @@ void __init alternative_instructions(void)
>* patching.
>*/
>  
> + paravirt_set_cap();

Can that be called from somewhere in the Xen init path and not from
here? Somewhere before check_bugs() gets called.

> + /*
> +  * First patch paravirt functions, such that we overwrite the indirect
> +  * call with the direct call.
> +  */
> + apply_paravirt(__parainstructions, __parainstructions_end);
> +
> + /*
> +  * Then patch alternatives, such that those paravirt calls that are in
> +  * alternatives can be overwritten by their immediate fragments.
> +  */
>   apply_alternatives(__alt_instructions, __alt_instructions_end);

Can you give an example here pls why the paravirt patching needs to run
first?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 07/12] x86: add new features for paravirt patching

2020-12-09 Thread Borislav Petkov
On Wed, Dec 09, 2020 at 08:30:53AM +0100, Jürgen Groß wrote:
> Hey, I already suggested to use ~FEATURE for that purpose (see
> https://lore.kernel.org/lkml/f105a63d-6b51-3afb-83e0-e899ea408...@suse.com/

Great minds think alike!

:-P

> I'd rather make the syntax:
> 
> ALTERNATIVE_TERNARY   
>  
> 
> as this ...

Sure, that is ok too.

> ... would match perfectly to this interpretation.

Yap.

> Hmm, using flags is an alternative (pun intended :-) ).

LOL!

Btw, pls do check how much the vmlinux size of an allyesconfig grows
with this as we will be adding a byte per patch site. Not that it would
matter too much - the flags are a long way a comin'. :-)

> No, this is needed for non-Xen cases, too (especially pv spinlocks).

I see.

> > Can you give an example here pls why the paravirt patching needs to run
> > first?
> 
> Okay.

I meant an example for me to have a look at. :)

If possible pls.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v2 07/12] x86: add new features for paravirt patching

2020-12-10 Thread Borislav Petkov
On Wed, Dec 09, 2020 at 01:22:24PM +0100, Jürgen Groß wrote:
> Lets take the spin_unlock() case. With patch 11 of the series this is
> 
> PVOP_ALT_VCALLEE1(lock.queued_spin_unlock, lock,
>   "movb $0, (%%" _ASM_ARG1 ");",
>   X86_FEATURE_NO_PVUNLOCK);
> 
> which boils down to ALTERNATIVE "call *lock.queued_spin_unlock"
> "movb $0,(%rdi)" X86_FEATURE_NO_PVUNLOCK
> 
> The initial (paravirt) code is an indirect call in order to allow
> spin_unlock() before paravirt/alternative patching takes place.
> 
> Paravirt patching will then replace the indirect call with a direct call
> to the correct unlock function. Then alternative patching might replace
> the direct call to the bare metal unlock with a plain "movb $0,(%rdi)"
> in case pvlocks are not enabled.

Aha, that zeros the locking var on unlock, I see.

> In case alternative patching would occur first, the indirect call might
> be replaced with the "movb ...", and then paravirt patching would
> clobber that with the direct call, resulting in the bare metal
> optimization being removed again.

Yeah, that explains the whole situation much better - thanks - and
considering how complex the whole patching is, I wouldn't mind the gist
of it as text in alternative_instructions() or in a comment above it so
that we don't have to swap everything back in, months and years from
now, when we optimize it yet again. :-}
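
Something like this ontop of the call sites, say (wording is only a
first stab, feel free to reword):

	/*
	 * Paravirt patching must run first: it turns the initial indirect
	 * pvops call into a direct call (e.g. to the native unlock
	 * function). Only then may alternatives patching replace that
	 * direct call with an immediate fragment like "movb $0,(%rdi)".
	 * The other way around, the paravirt pass would clobber the
	 * already optimized site again.
	 */
	apply_paravirt(__parainstructions, __parainstructions_end);
	apply_alternatives(__alt_instructions, __alt_instructions_end);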

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()

2021-01-06 Thread Borislav Petkov
On Thu, Dec 17, 2020 at 10:31:24AM +0100, Juergen Gross wrote:
> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Due to include hell this requires to split out the time interfaces
> into a new header file.

I guess you can add Peter's patch to your set, no?

https://lkml.kernel.org/r/20201110005609.40989-3-frede...@kernel.org

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()

2021-01-06 Thread Borislav Petkov
On Thu, Dec 17, 2020 at 05:31:50PM +, Michael Kelley wrote:
> These Hyper-V changes are problematic as we want to keep hyperv_timer.c
> architecture independent.  While only the code for x86/x64 is currently
> accepted upstream, code for ARM64 support is in progress.   So we need
> to use hv_setup_sched_clock() in hyperv_timer.c, and have the per-arch
> implementation in mshyperv.h.

Why, because ARM doesn't have static_call yet? I hear someone is working
on that...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 07/15] x86/alternative: support "not feature" and ALTERNATIVE_TERNARY

2021-01-07 Thread Borislav Petkov
On Thu, Dec 17, 2020 at 10:31:25AM +0100, Juergen Gross wrote:
> Instead of only supporting to modify instructions when a specific
> feature is set, support doing so for the case a feature is not set.
> 
> As today a feature is specified using a 16 bit quantity and the highest
> feature number in use is around 600, using a negated feature number for
> specifying the inverted case seems to be appropriate.
> 
>   ALTERNATIVE "default_instr", "patched_instr", ~FEATURE_NR
> 
> will start with "default_instr" and patch that with "patched_instr" in
> case FEATURE_NR is not set.
> 
> Using that add ALTERNATIVE_TERNARY:
> 
>   ALTERNATIVE_TERNARY "default_instr", FEATURE_NR,
>   "feature_on_instr", "feature_off_instr"
> 
> which will start with "default_instr" and at patch time will, depending
> on FEATURE_NR being set or not, patch that with either
> "feature_on_instr" or "feature_off_instr".

How about an even simpler one (only build-tested):

---
diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index 464034db299f..d52b423d3cab 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -109,6 +109,9 @@
 	.popsection
 .endm
 
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2)	\
+	ALTERNATIVE_2 oldinstr, newinstr1, feature, newinstr2, X86_FEATURE_TERNARY
+
 #endif  /*  __ASSEMBLY__  */
 
 #endif /* _ASM_X86_ALTERNATIVE_ASM_H */
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 13adca37c99a..f170cbe89539 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -175,6 +175,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	ALTINSTR_REPLACEMENT(newinstr2, feature2, 2)	\
 	".popsection\n"
 
+#define ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2)	\
+	ALTERNATIVE_2(oldinstr, newinstr1, feature, newinstr2, X86_FEATURE_TERNARY)
+
 #define ALTERNATIVE_3(oldinsn, newinsn1, feat1, newinsn2, feat2, newinsn3, feat3) \
 	OLDINSTR_3(oldinsn, 1, 2, 3)	\
 	".pushsection .altinstructions,\"a\"\n"	\
@@ -206,6 +209,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
 #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
 	asm_inline volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
 
+#define alternative_ternary(oldinstr, feature, newinstr1, newinstr2)	\
+	asm_inline volatile(ALTERNATIVE_TERNARY(oldinstr, feature, newinstr1, newinstr2) ::: "memory")
+
 /*
  * Alternative inline assembly with input.
  *
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..cc634db0b91f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -108,7 +108,7 @@
 #define X86_FEATURE_EXTD_APICID		( 3*32+26) /* Extended APICID (8 bits) */
 #define X86_FEATURE_AMD_DCM		( 3*32+27) /* AMD multi-node processor */
 #define X86_FEATURE_APERFMPERF		( 3*32+28) /* P-State hardware coordination feedback capability (APERF/MPERF MSRs) */
-/* free					( 3*32+29) */
+#define X86_FEATURE_TERNARY		( 3*32+29) /* "" Synthetic bit for ALTERNATIVE_TERNARY() */
 #define X86_FEATURE_NONSTOP_TSC_S3	( 3*32+30) /* TSC doesn't stop in S3 state */
 #define X86_FEATURE_TSC_KNOWN_FREQ	( 3*32+31) /* TSC has known frequency */
 
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 8d778e46725d..2cb29d4d8dd9 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -393,7 +393,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
 		replacement = (u8 *)&a->repl_offset + a->repl_offset;
 		BUG_ON(a->instrlen > sizeof(insn_buff));
 		BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
-		if (!boot_cpu_has(a->cpuid)) {
+		if (!boot_cpu_has(a->cpuid) && (a->cpuid != X86_FEATURE_TERNARY)) {
 			if (a->padlen > 1)
 				optimize_nops(a, instr);
 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH v3 08/15] x86: add new features for paravirt patching

2021-01-14 Thread Borislav Petkov
On Thu, Dec 17, 2020 at 09:12:57PM +0800, kernel test robot wrote:
>ld: arch/x86/kernel/alternative.o: in function `paravirt_set_cap':
> >> arch/x86/kernel/alternative.c:605: undefined reference to 
> >> `pv_is_native_spin_unlock'
> >> ld: arch/x86/kernel/alternative.c:608: undefined reference to 
> >> `pv_is_native_vcpu_is_preempted'

Looks like alternative.c needs #include .

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


