Re: [PATCHv3 0/2]

2025-02-28 Thread Lei Yang
Hi Keith

V3 introduced a new bug, the following error messages from qemu output
after applying this patch to boot up a guest.
Error messages:
error: kvm run failed Invalid argument
error: kvm run failed Invalid argument
EAX= EBX= ECX= EDX=000806f4
ESI= EDI= EBP= ESP=
EIP=fff0 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000   9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200error: kvm run failed Invalid argument

TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 56 54 46 00 <0f>
20 c0 a8 01 74 05 e9 2c ff ff ff e9 11 ff 90 00 00 00 00 00 00 00 00
00 00 00 00 00 00
EAX= EBX= ECX= EDX=000806f4
ESI= EDI= EBP= ESP=
EIP=fff0 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000   9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 56 54 46 00 <0f>
20 c0 a8 01 74 05 e9 2c ff ff ff e9 11 ff 90 00 00 00 00 00 00 00 00
00 00 00 00 00 00
EAX= EBX= ECX= EDX=000806f4
ESI= EDI= EBP= ESP=
EIP=fff0 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000   9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2=
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=c5 5a 08 2d 00 00 00 00 00 00 00 00 00 00 00 00 56 54 46 00 <0f>
20 c0 a8 01 74 05 e9 2c ff ff ff e9 11 ff 90 00 00 00 00 00 00 00 00
00 00 00 00 00 00

Thanks
Lei

On Fri, Feb 28, 2025 at 7:06 AM Keith Busch  wrote:
>
> From: Keith Busch 
>
> changes from v2:
>
>   Fixed up the logical error in vhost on the new failure criteria
>
> Keith Busch (1):
>   vhost: return task creation error instead of NULL
>
> Sean Christopherson (1):
>   kvm: retry nx_huge_page_recovery_thread creation
>
>  arch/x86/kvm/mmu/mmu.c| 12 +---
>  drivers/vhost/vhost.c |  2 +-
>  include/linux/call_once.h | 16 +++-
>  kernel/vhost_task.c   |  4 ++--
>  4 files changed, 19 insertions(+), 15 deletions(-)
>
> --
> 2.43.5
>




Re: [PATCHv3 0/2]

2025-02-28 Thread Sean Christopherson
On Fri, Feb 28, 2025, Lei Yang wrote:
> Hi Keith
> 
> V3 introduced a new bug, the following error messages from qemu output
> after applying this patch to boot up a guest.

Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, 
I'll
post a v3 so I can add my SoB.

diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index ddcfd91493ea..b053f4701c94 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int 
(*cb)(struct once *))
return 0;
 
 guard(mutex)(&once->lock);
-WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
-if (atomic_read(&once->state) != ONCE_NOT_STARTED)
+if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
 return -EINVAL;
 
+if (atomic_read(&once->state) == ONCE_COMPLETED)
+return 0;
+
 atomic_set(&once->state, ONCE_RUNNING);
r = cb(once);
if (r)



Re: [PATCHv3 0/2]

2025-02-28 Thread Sean Christopherson
On Fri, Feb 28, 2025, Paolo Bonzini wrote:
> On 2/28/25 16:36, Keith Busch wrote:
> > On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
> > > On Fri, Feb 28, 2025, Keith Busch wrote:
> > > > On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > > > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, 
> > > > > > int (*cb)(struct once *))
> > > > > >  return 0;
> > > > > >   guard(mutex)(&once->lock);
> > > > > > -WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > > > > -if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > > > > +if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > > > > >   return -EINVAL;
> > > > > > +if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > > > > +return 0;
> > > > > > +
> > > > > >   atomic_set(&once->state, ONCE_RUNNING);
> > > > > >  r = cb(once);
> > > > > >  if (r)
> > > > 
> > > > Possible suggestion since it seems odd to do an atomic_read twice on the
> > > > same value.
> > > 
> > > Yeah, good call.  At the risk of getting too cute, how about this?
> > 
> > Sure, that also looks good to me.
> 
> Just to overthink it a bit more, I'm changing "if (r)" to "if (r < 0)". Not
> because it's particularly useful to return a meaningful nonzero value on the
> first initialization, but more because 0+ for success and -errno for failure
> is a more common.
> 
> Queued with this change, thanks.

If it's not too late, the first patch can/should use ERR_CAST() instead of a
PTR_ERR() => ERR_PTR():

tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
if (IS_ERR(tsk)) {
kfree(vtsk);
return ERR_CAST(tsk);
}

And I was going to get greedy and replace spaces with tabs in call_once.

The changelog for this patch is also misleading.  KVM_RUN doesn't currently 
return
-ERESTARTNOINTR, it only ever returns -ENOMEN.  copy_process() is what returns
-ERESTARTNOINTR.

I also think it's worth calling out that it's a non-fatal signal.

--
From: Sean Christopherson 
Date: Thu, 27 Feb 2025 15:06:31 -0800
Subject: [PATCH] KVM: x86/mmu: Allow retry of nx_huge_page_recovery_thread
 creation

A VMM may send a non-fatal signal to its threads, including vCPU tasks,
at any time, and thus may signal vCPU tasks during KVM_RUN.  If a vCPU
task receives the signal while its trying to spawn the huge page recovery
vhost task, then KVM_RUN will fail due to copy_process() returning
-ERESTARTNOINTR.

Rework call_once() to mark the call complete if and only if the called
function succeeds, and plumb the function's true error code back to the
call_once() invoker.  This provides userspace with the correct, non-fatal
error code so that the VMM doesn't terminate the VM on -ENOMEM, and allows
subsequent KVM_RUN a succeed by virtue of retrying creation of the NX huge
page task.

Opportunistically replace spaces with tabs in call_once.h.

Fixes: 931656b9e2ff ("kvm: defer huge page recovery vhost task to later")
Co-developed-by: Keith Busch 
Signed-off-by: Keith Busch 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c| 10 --
 include/linux/call_once.h | 36 +---
 2 files changed, 25 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 70af12b693a3..63bb77ee1bb1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7633,7 +7633,7 @@ static bool kvm_nx_huge_page_recovery_worker(void *data)
return true;
 }
 
-static void kvm_mmu_start_lpage_recovery(struct once *once)
+static int kvm_mmu_start_lpage_recovery(struct once *once)
 {
struct kvm_arch *ka = container_of(once, struct kvm_arch, nx_once);
struct kvm *kvm = container_of(ka, struct kvm, arch);
@@ -7645,12 +7645,13 @@ static void kvm_mmu_start_lpage_recovery(struct once 
*once)
  kvm, "kvm-nx-lpage-recovery");
 
if (IS_ERR(nx_thread))
-   return;
+   return PTR_ERR(nx_thread);
 
vhost_task_start(nx_thread);
 
/* Make the task visible only once it is fully started. */
WRITE_ONCE(kvm->arch.nx_huge_page_recovery_thread, nx_thread);
+   return 0;
 }
 
 int kvm_mmu_post_init_vm(struct kvm *kvm)
@@ -7658,10 +7659,7 @@ int kvm_mmu_post_init_vm(struct kvm *kvm)
if (nx_hugepage_mitigation_hard_disabled)
return 0;
 
-   call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
-   if (!kvm->arch.nx_huge_page_recovery_thread)
-   return -ENOMEM;
-   return 0;
+   return call_once(&kvm->arch.nx_once, kvm_mmu_start_lpage_recovery);
 }
 
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/call_once.h b/include/linux/call_once.h
index 6261aa0b3fb0..56cb9625b48b 100644
--- a/include/linux/call_once.h
+++ b/include/linux/call_once.h
@@ -9,15 +

Re: [PATCHv3 1/2] vhost: return task creation error instead of NULL

2025-02-28 Thread Mike Christie
On 2/27/25 5:06 PM, Keith Busch wrote:
> From: Keith Busch 
> 
> Lets callers distinguish why the vhost task creation failed. No one
> currently cares why it failed, so no real runtime change from this
> patch, but that will not be the case for long.
> 
> Signed-off-by: Keith Busch 
> ---
>  arch/x86/kvm/mmu/mmu.c | 2 +-
>  drivers/vhost/vhost.c  | 2 +-
>  kernel/vhost_task.c| 4 ++--
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index d4ac4a1f8b81b..18ca1ea6dc240 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -7471,7 +7471,7 @@ static void kvm_mmu_start_lpage_recovery(struct once 
> *once)
> kvm_nx_huge_page_recovery_worker_kill,
> kvm, "kvm-nx-lpage-recovery");
>  
> - if (!nx_thread)
> + if (IS_ERR(nx_thread))
>   return;
>  
>   vhost_task_start(nx_thread);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 9ac25d08f473e..63612faeab727 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -666,7 +666,7 @@ static struct vhost_worker *vhost_worker_create(struct 
> vhost_dev *dev)
>  
>   vtsk = vhost_task_create(vhost_run_work_list, vhost_worker_killed,
>worker, name);
> - if (!vtsk)
> + if (IS_ERR(vtsk))
>   goto free_worker;
>  
>   mutex_init(&worker->mutex);
> diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
> index 8800f5acc0071..2ef2e1b800916 100644
> --- a/kernel/vhost_task.c
> +++ b/kernel/vhost_task.c
> @@ -133,7 +133,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
>  
>   vtsk = kzalloc(sizeof(*vtsk), GFP_KERNEL);
>   if (!vtsk)
> - return NULL;
> + return ERR_PTR(-ENOMEM);
>   init_completion(&vtsk->exited);
>   mutex_init(&vtsk->exit_mutex);
>   vtsk->data = arg;
> @@ -145,7 +145,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
>   tsk = copy_process(NULL, 0, NUMA_NO_NODE, &args);
>   if (IS_ERR(tsk)) {
>   kfree(vtsk);
> - return NULL;
> + return ERR_PTR(PTR_ERR(tsk));
>   }
>  
>   vtsk->task = tsk;


The vhost task parts look ok to me.

Reviewed-by: Mike Christie 



Re: [PATCHv3 0/2]

2025-02-28 Thread Paolo Bonzini

On 2/28/25 16:36, Keith Busch wrote:

On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:

On Fri, Feb 28, 2025, Keith Busch wrote:

On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:

@@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int 
(*cb)(struct once *))
 return 0;
  
  guard(mutex)(&once->lock);

-WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
-if (atomic_read(&once->state) != ONCE_NOT_STARTED)
+if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
  return -EINVAL;
  
+if (atomic_read(&once->state) == ONCE_COMPLETED)

+return 0;
+
  atomic_set(&once->state, ONCE_RUNNING);
 r = cb(once);
 if (r)


Possible suggestion since it seems odd to do an atomic_read twice on the
same value.


Yeah, good call.  At the risk of getting too cute, how about this?


Sure, that also looks good to me.


Just to overthink it a bit more, I'm changing "if (r)" to "if (r < 0)". 
Not because it's particularly useful to return a meaningful nonzero 
value on the first initialization, but more because 0+ for success and 
-errno for failure is a more common.


Queued with this change, thanks.

(Keith, I haven't forgotten about AVX by the way).

Paolo




Re: [PATCHv3 0/2]

2025-02-28 Thread Keith Busch
On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Sean Christopherson wrote:
> > On Fri, Feb 28, 2025, Lei Yang wrote:
> > > Hi Keith
> > > 
> > > V3 introduced a new bug, the following error messages from qemu output
> > > after applying this patch to boot up a guest.
> > 
> > Doh, my bug.  Not yet tested, but this should fix things.  Assuming it 
> > does, I'll
> > post a v3 so I can add my SoB.
>  v4
> 
> Confirmed that it worked, but deleting the pre-mutex check for ONCE_COMPLETED.
> Will post v4 later today.
> 
> > diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> > index ddcfd91493ea..b053f4701c94 100644
> > --- a/include/linux/call_once.h
> > +++ b/include/linux/call_once.h
> > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int 
> > (*cb)(struct once *))
> > return 0;
> >  
> >  guard(mutex)(&once->lock);
> > -WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > -if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > +if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> >  return -EINVAL;
> >  
> > +if (atomic_read(&once->state) == ONCE_COMPLETED)
> > +return 0;
> > +
> >  atomic_set(&once->state, ONCE_RUNNING);
> > r = cb(once);
> > if (r)

Possible suggestion since it seems odd to do an atomic_read twice on the
same value. Maybe make this a switch:

switch (atomic_read(&once->state) {
case ONCE_NOT_STARTED:
atomic_set(&once->state, ONCE_RUNNING);
break;
case ONCE_COMPLETED:
return 0;
case ONCE_RUNNING:
default:
WARN_ON(1);
return -EINVAL;
} 



Re: [PATCHv3 0/2]

2025-02-28 Thread Sean Christopherson
On Fri, Feb 28, 2025, Keith Busch wrote:
> On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> > > index ddcfd91493ea..b053f4701c94 100644
> > > --- a/include/linux/call_once.h
> > > +++ b/include/linux/call_once.h
> > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int 
> > > (*cb)(struct once *))
> > > return 0;
> > >  
> > >  guard(mutex)(&once->lock);
> > > -WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > -if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > +if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > >  return -EINVAL;
> > >  
> > > +if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > +return 0;
> > > +
> > >  atomic_set(&once->state, ONCE_RUNNING);
> > > r = cb(once);
> > > if (r)
> 
> Possible suggestion since it seems odd to do an atomic_read twice on the
> same value.

Yeah, good call.  At the risk of getting too cute, how about this?

static inline int call_once(struct once *once, int (*cb)(struct once *))
{
int r, state;

/* Pairs with atomic_set_release() below.  */
if (atomic_read_acquire(&once->state) == ONCE_COMPLETED)
return 0;

guard(mutex)(&once->lock);
state = atomic_read(&once->state);
if (unlikely(state != ONCE_NOT_STARTED))
return WARN_ON_ONCE(state != ONCE_COMPLETED) ? -EINVAL : 0;

atomic_set(&once->state, ONCE_RUNNING);
r = cb(once);
if (r)
atomic_set(&once->state, ONCE_NOT_STARTED);
else
atomic_set_release(&once->state, ONCE_COMPLETED);
return r;
}



Re: [PATCHv3 0/2]

2025-02-28 Thread Keith Busch
On Fri, Feb 28, 2025 at 07:29:45AM -0800, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Keith Busch wrote:
> > On Fri, Feb 28, 2025 at 06:32:47AM -0800, Sean Christopherson wrote:
> > > > @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int 
> > > > (*cb)(struct once *))
> > > > return 0;
> > > >  
> > > >  guard(mutex)(&once->lock);
> > > > -WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> > > > -if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> > > > +if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
> > > >  return -EINVAL;
> > > >  
> > > > +if (atomic_read(&once->state) == ONCE_COMPLETED)
> > > > +return 0;
> > > > +
> > > >  atomic_set(&once->state, ONCE_RUNNING);
> > > > r = cb(once);
> > > > if (r)
> > 
> > Possible suggestion since it seems odd to do an atomic_read twice on the
> > same value.
> 
> Yeah, good call.  At the risk of getting too cute, how about this?

Sure, that also looks good to me.



Re: [PATCHv3 0/2]

2025-02-28 Thread Sean Christopherson
On Fri, Feb 28, 2025, Sean Christopherson wrote:
> On Fri, Feb 28, 2025, Lei Yang wrote:
> > Hi Keith
> > 
> > V3 introduced a new bug, the following error messages from qemu output
> > after applying this patch to boot up a guest.
> 
> Doh, my bug.  Not yet tested, but this should fix things.  Assuming it does, 
> I'll
> post a v3 so I can add my SoB.
 v4

Confirmed that it worked, but deleting the pre-mutex check for ONCE_COMPLETED.
Will post v4 later today.

> diff --git a/include/linux/call_once.h b/include/linux/call_once.h
> index ddcfd91493ea..b053f4701c94 100644
> --- a/include/linux/call_once.h
> +++ b/include/linux/call_once.h
> @@ -35,10 +35,12 @@ static inline int call_once(struct once *once, int 
> (*cb)(struct once *))
> return 0;
>  
>  guard(mutex)(&once->lock);
> -WARN_ON(atomic_read(&once->state) == ONCE_RUNNING);
> -if (atomic_read(&once->state) != ONCE_NOT_STARTED)
> +if (WARN_ON(atomic_read(&once->state) == ONCE_RUNNING))
>  return -EINVAL;
>  
> +if (atomic_read(&once->state) == ONCE_COMPLETED)
> +return 0;
> +
>  atomic_set(&once->state, ONCE_RUNNING);
> r = cb(once);
> if (r)