Re: [EXTERNAL] Re: [PATCH] hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC

2023-11-06 Thread Boqun Feng
On Mon, Nov 06, 2023 at 06:18:48PM +, Peter Martincic wrote:
> Sorry for the formatting/recipient/procedure mistakes. I've an updated commit 
> message based on feedback from Michael. I'll wait to hear back from you and 
> Boqun before I post V2/updates.
> 
> Thanks again,
> Peter
> 
> -Original Message-
> From: Wei Liu  
> Sent: Sunday, November 5, 2023 5:31 PM
> To: Peter Martincic 
> Cc: linux-hyperv@vger.kernel.org; Michael Kelley (LINUX) 
> ; Boqun Feng ; Wei Liu 
> ; Wei Liu 
> Subject: [EXTERNAL] Re: [PATCH] hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC
> 
> On Fri, Oct 27, 2023 at 08:42:33PM +, Peter Martincic wrote:
> > From 529fcea5d296c22b1dc6c23d55bd6417794b3cda Mon Sep 17 00:00:00 2001
> > From: Peter Martincic 
> > Date: Mon, 16 Oct 2023 16:41:10 -0700
> > Subject: [PATCH] hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC
> > 
> > Windows hosts can omit the _SYNC flag due to a bug on resume from 
> > modern suspend. If the guest is sufficiently behind, treat a _SAMPLE 
> > the same as if _SYNC was received.
> > 
> > This is hidden behind param hv_utils.timesync_implicit.
> > 
> > Signed-off-by: Peter Martincic 
> 
> Boqun, what do you think about this patch?
> 

The patch looks good to me. Peter, feel free to send the V2 and I will
give my Acked-by.

Regards,
Boqun




Re: [PATCH v2] hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC

2023-11-30 Thread Boqun Feng
On Mon, Nov 27, 2023 at 01:35:24PM -0800, pmartin...@linux.microsoft.com wrote:
> From: Peter Martincic 
> 
> Hyper-V hosts can omit the _SYNC flag due to a bug on resume from modern
> suspend. In such a case, the guest may fail to update its time-of-day to
> account for the period when it was suspended, and could proceed with a
> significantly wrong time-of-day. When the guest is significantly behind,
> fix it by treating a _SAMPLE the same as if _SYNC was received so that
> the guest time-of-day is updated.
> 
> This is hidden behind param hv_utils.timesync_implicit.
> 
> Signed-off-by: Peter Martincic 

Looks good to me.

Acked-by: Boqun Feng 

Regards,
Boqun

> ---
>  drivers/hv/hv_util.c | 31 ++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
> index 42aec2c5606a..9c97c4065fe7 100644
> --- a/drivers/hv/hv_util.c
> +++ b/drivers/hv/hv_util.c
> @@ -296,6 +296,11 @@ static struct {
>   spinlock_t  lock;
>  } host_ts;
>  
> +static bool timesync_implicit;
> +
> +module_param(timesync_implicit, bool, 0644);
> +MODULE_PARM_DESC(timesync_implicit, "If set treat SAMPLE as SYNC when clock is behind");
> +
>  static inline u64 reftime_to_ns(u64 reftime)
>  {
>   return (reftime - WLTIMEDELTA) * 100;
> @@ -344,6 +349,29 @@ static void hv_set_host_time(struct work_struct *work)
>   do_settimeofday64(&ts);
>  }
>  
> +/*
> + * Due to a bug on Hyper-V hosts, the sync flag may not always be sent on resume.
> + * Force a sync if the guest is behind.
> + */
> +static inline bool hv_implicit_sync(u64 host_time)
> +{
> + struct timespec64 new_ts;
> + struct timespec64 threshold_ts;
> +
> + new_ts = ns_to_timespec64(reftime_to_ns(host_time));
> + ktime_get_real_ts64(&threshold_ts);
> +
> + threshold_ts.tv_sec += 5;
> +
> + /*
> +  * If the guest is behind the host by 5 or more seconds.
> +  */
> + if (timespec64_compare(&new_ts, &threshold_ts) >= 0)
> + return true;
> +
> + return false;
> +}
> +
>  /*
>   * Synchronize time with host after reboot, restore, etc.
>   *
> @@ -384,7 +412,8 @@ static inline void adj_guesttime(u64 hosttime, u64 reftime, u8 adj_flags)
>   spin_unlock_irqrestore(&host_ts.lock, flags);
>  
>   /* Schedule work to do do_settimeofday64() */
> - if (adj_flags & ICTIMESYNCFLAG_SYNC)
> + if ((adj_flags & ICTIMESYNCFLAG_SYNC) ||
> + (timesync_implicit && hv_implicit_sync(host_ts.host_time)))
>   schedule_work(&adj_time_work);
>  }
>  
> -- 
> 2.34.1
> 
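
As a standalone illustration of the hv_implicit_sync() check above, here is a
userspace sketch (not the kernel code) of the same 5-second comparison; the
WLTIMEDELTA constant is the 1601-to-1970 epoch offset in 100 ns units, as used
by reftime_to_ns() in hv_util.c:

#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define WLTIMEDELTA 116444736000000000ULL	/* 100 ns ticks, 1601 to 1970 */

static bool implicit_sync_needed(uint64_t host_reftime)
{
	struct timespec guest;
	uint64_t host_ns, guest_ns;

	/* Host reference time arrives in 100 ns units since 1601-01-01. */
	host_ns = (host_reftime - WLTIMEDELTA) * 100;

	clock_gettime(CLOCK_REALTIME, &guest);
	guest_ns = (uint64_t)guest.tv_sec * 1000000000ULL + guest.tv_nsec;

	/* Treat a _SAMPLE as a _SYNC if the guest is >= 5 seconds behind. */
	return host_ns >= guest_ns + 5ULL * 1000000000ULL;
}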



Re: [PATCH 1/1] Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory

2024-02-20 Thread Boqun Feng
On Mon, Feb 19, 2024 at 10:30:07PM -0800, Saurabh Singh Sengar wrote:
> On Mon, Feb 12, 2024 at 10:19:59PM -0800, mhkelle...@gmail.com wrote:
> > From: Michael Kelley 
> > 
> > The VMBUS_RING_SIZE macro adds space for a ring buffer header to the
> > requested ring buffer size.  The header size is always 1 page, and so
> > its size varies based on the PAGE_SIZE for which the kernel is built.
> > If the requested ring buffer size is a large power-of-2 size and the header
> > size is small, the resulting size is inefficient in its use of memory.
> > For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in
> > a 516 Kbyte allocation, which is rounded up to 1 Mbyte by the memory
> > allocator, and wastes 508 Kbytes of memory.
> > 
> > In such situations, the exact size of the ring buffer isn't that important,
> > and it's OK to allocate the 4 Kbyte header at the beginning of the 512
> > Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory
> > allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted.
> > 
> > Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer
> > sizes.  "Large" is somewhat arbitrarily defined as 8 times the size of
> > the ring buffer header (which is of size PAGE_SIZE).  For example, for
> > 4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first
> > 4 Kbytes as the ring buffer header.  For 64 Kbyte PAGE_SIZE, ring buffers
> > of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer
> > header.  In both cases, smaller sizes add space for the header so
> > the ring size isn't reduced too much by using part of the space for
> > the header.  For example, with a 64 Kbyte page size, we don't want
> > a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half
> > of the space for the header.  In such a case, the memory allocation
> > is less efficient, but it's the best that can be done.
> > 
> > Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL")
> > Signed-off-by: Michael Kelley 
> > ---
> >  include/linux/hyperv.h | 22 +-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> > index 2b00faf98017..6ef0557b4bff 100644
> > --- a/include/linux/hyperv.h
> > +++ b/include/linux/hyperv.h
> > @@ -164,8 +164,28 @@ struct hv_ring_buffer {
> > u8 buffer[];
> >  } __packed;
> >  
> > +
> > +/*
> > + * If the requested ring buffer size is at least 8 times the size of the
> > + * header, steal space from the ring buffer for the header. Otherwise, add
> > + * space for the header so that it doesn't take too much of the ring buffer
> > + * space.
> > + *
> > + * The factor of 8 is somewhat arbitrary. The goal is to prevent adding a
> > + * relatively small header (4 Kbytes on x86) to a large-ish power-of-2 ring
> > + * buffer size (such as 128 Kbytes) and so end up making a nearly twice as
> > + * large allocation that will be almost half wasted. As a contrasting example,
> > + * on ARM64 with 64 Kbyte page size, we don't want to take 64 Kbytes for the
> > + * header from a 128 Kbyte allocation, leaving only 64 Kbytes for the ring.
> > + * In this latter case, we must add 64 Kbytes for the header and not worry
> > + * about what's wasted.
> > + */
> > +#define VMBUS_HEADER_ADJ(payload_sz) \
> > +   ((payload_sz) >=  8 * sizeof(struct hv_ring_buffer) ? \
> > +   0 : sizeof(struct hv_ring_buffer))
> > +
> >  /* Calculate the proper size of a ringbuffer, it must be page-aligned */
> > -#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(sizeof(struct hv_ring_buffer) + \
> > -					       (payload_sz))
> > +#define VMBUS_RING_SIZE(payload_sz) PAGE_ALIGN(VMBUS_HEADER_ADJ(payload_sz) + \
> > +					       (payload_sz))

I generally see the point of this patch; however, it changes the
semantics of VMBUS_RING_SIZE() (similar to what Saurabh mentioned below):
before the change, VMBUS_RING_SIZE() gave you a ring buffer with at
least "payload_sz" bytes, but after the change you may not get "enough"
bytes for the vmbus ring buffer.
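
To make the semantics change concrete, here is a quick userspace sketch of the
macro arithmetic (assuming a 4 Kbyte page and the 1-page header described in
the commit message; the names are ad hoc, not from hyperv.h):

#include <stdio.h>

#define PAGE_SIZE	4096UL
#define HDR_SIZE	PAGE_SIZE	/* header is one page per the commit message */
#define PAGE_ALIGN(x)	(((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

#define OLD_RING_SIZE(p)	PAGE_ALIGN(HDR_SIZE + (p))	/* before the patch */
#define HDR_ADJ(p)		((p) >= 8 * HDR_SIZE ? 0 : HDR_SIZE)
#define NEW_RING_SIZE(p)	PAGE_ALIGN(HDR_ADJ(p) + (p))	/* after the patch */

int main(void)
{
	unsigned long payload = 512 * 1024;

	/* Old: 516 Kbytes, which the page allocator rounds up to 1 Mbyte.
	 * New: exactly 512 Kbytes, but only 508 Kbytes remain for the ring
	 * itself -- less than the requested payload_sz. */
	printf("old: %lu, new: %lu\n",
	       OLD_RING_SIZE(payload), NEW_RING_SIZE(payload));
	return 0;
}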

One cause of the wasted memory is using alloc_pages() to get physically
contiguous pages; however, after a quick look into GPADL, it looks like it
also supports non-contiguous pages. Maybe that's the longer-term solution?
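
If GPADL can indeed take arbitrary page lists, a rough sketch of that
direction (hypothetical, assuming the ring-buffer setup is adapted to accept
a PFN array instead of a physically contiguous region; error handling
omitted):

	/* Virtually contiguous but physically scattered backing store. */
	void *ring = vmalloc(ring_bytes);
	unsigned long i, npages = ring_bytes >> PAGE_SHIFT;
	u64 *pfn_array = kcalloc(npages, sizeof(*pfn_array), GFP_KERNEL);

	/* Hand GPADL one PFN per page rather than a contiguous range. */
	for (i = 0; i < npages; i++)
		pfn_array[i] = vmalloc_to_pfn(ring + i * PAGE_SIZE);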

Regards,
Boqun

> >  
> >  struct hv_ring_buffer_info {
> 
> Thanks for the patch.
> It's worth noting that this will affect the ring buffer size calculation for
> some of the drivers: netvsc, storvsc_drv, hid-hyperv, and hyperv-keyboard.
> It would be nice to have this noted in the commit message for future reference.
> 
> Looks like a good improvement to me,
> Reviewed-by: Saurabh Sengar 
> 
> > -- 
> > 2.25.1
> > 



Re: [PATCH 1/1] Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory

2024-02-25 Thread Boqun Feng
On Tue, Feb 20, 2024 at 06:15:52PM +, Michael Kelley wrote:
> From: Boqun Feng  Sent: Tuesday, February 20, 2024 9:29 AM
> > 
> > On Mon, Feb 19, 2024 at 10:30:07PM -0800, Saurabh Singh Sengar wrote:
> > > On Mon, Feb 12, 2024 at 10:19:59PM -0800, mhkelle...@gmail.com wrote:
> > > > From: Michael Kelley 
> > > >
> > > > The VMBUS_RING_SIZE macro adds space for a ring buffer header to the
> > > > requested ring buffer size.  The header size is always 1 page, and so
> > > > its size varies based on the PAGE_SIZE for which the kernel is built.
> > > > If the requested ring buffer size is a large power-of-2 size and the header
> > > > size is small, the resulting size is inefficient in its use of memory.
> > > > For example, a 512 Kbyte ring buffer with a 4 Kbyte page size results in
> > > > a 516 Kbyte allocation, which is rounded up to 1 Mbyte by the memory
> > > > allocator, and wastes 508 Kbytes of memory.
> > > >
> > > > In such situations, the exact size of the ring buffer isn't that important,
> > > > and it's OK to allocate the 4 Kbyte header at the beginning of the 512
> > > > Kbytes, leaving the ring buffer itself with just 508 Kbytes. The memory
> > > > allocation can be 512 Kbytes instead of 1 Mbyte and nothing is wasted.
> > > >
> > > > Update VMBUS_RING_SIZE to implement this approach for "large" ring buffer
> > > > sizes.  "Large" is somewhat arbitrarily defined as 8 times the size of
> > > > the ring buffer header (which is of size PAGE_SIZE).  For example, for
> > > > 4 Kbyte PAGE_SIZE, ring buffers of 32 Kbytes and larger use the first
> > > > 4 Kbytes as the ring buffer header.  For 64 Kbyte PAGE_SIZE, ring buffers
> > > > of 512 Kbytes and larger use the first 64 Kbytes as the ring buffer
> > > > header.  In both cases, smaller sizes add space for the header so
> > > > the ring size isn't reduced too much by using part of the space for
> > > > the header.  For example, with a 64 Kbyte page size, we don't want
> > > > a 128 Kbyte ring buffer to be reduced to 64 Kbytes by allocating half
> > > > of the space for the header.  In such a case, the memory allocation
> > > > is less efficient, but it's the best that can be done.
> > > >
> > > > Fixes: c1135c7fd0e9 ("Drivers: hv: vmbus: Introduce types of GPADL")
> > > > Signed-off-by: Michael Kelley 
> > > > ---
> > > >  include/linux/hyperv.h | 22 +-
> > > >  1 file changed, 21 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> > > > index 2b00faf98017..6ef0557b4bff 100644
> > > > --- a/include/linux/hyperv.h
> > > > +++ b/include/linux/hyperv.h
> > > > @@ -164,8 +164,28 @@ struct hv_ring_buffer {
> > > > u8 buffer[];
> > > >  } __packed;
> > > >
> > > > +
> > > > +/*
> > > > + * If the requested ring buffer size is at least 8 times the size of the
> > > > + * header, steal space from the ring buffer for the header. Otherwise, add
> > > > + * space for the header so that it doesn't take too much of the ring buffer
> > > > + * space.
> > > > + *
> > > > + * The factor of 8 is somewhat arbitrary. The goal is to prevent adding a
> > > > + * relatively small header (4 Kbytes on x86) to a large-ish power-of-2 ring
> > > > + * buffer size (such as 128 Kbytes) and so end up making a nearly twice as
> > > > + * large allocation that will be almost half wasted. As a contrasting example,
> > > > + * on ARM64 with 64 Kbyte page size, we don't want to take 64 Kbytes for the
> > > > + * header from a 128 Kbyte allocation, leaving only 64 Kbytes for the ring.
> > > > + * In this latter case, we must add 64 Kbytes for the header and not worry
> > > > + * about what's wasted.
> > > > + */
> > > > +#define VMBUS_HEADER_ADJ(payload_sz) \
> > > > +   ((payload_sz) >=  8 * sizeof(struct hv_ring_buffer) ? \
> > > > +   0 : sizeof(struct hv_ring_buffer))

Re: [RFC 11/12] Drivers: hv: vmbus: Wait for MODIFYCHANNEL to finish when offlining CPUs

2024-06-24 Thread Boqun Feng
Hi Michael,

On Mon, Jun 03, 2024 at 10:09:39PM -0700, mhkelle...@gmail.com wrote:
[...]
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index bf35bb40c55e..571b2955b38e 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -264,6 +264,14 @@ struct vmbus_connection {
>   struct irq_domain *vmbus_irq_domain;
>   struct irq_chip vmbus_irq_chip;
>  
> + /*
> +  * VM-wide counts of MODIFYCHANNEL messages sent and completed.
> +  * Used when taking a CPU offline to make sure the relevant
> +  * MODIFYCHANNEL messages have been completed.
> +  */
> + u64 modchan_sent;
> + u64 modchan_completed;
> +

It looks to me like we can just use atomic64_t here: modifying channels is
far from a hotpath, so the cost of an atomic increment is not a big issue,
and we avoid possible data races now and in the future.
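
Roughly what I have in mind (a sketch only, reusing the field names from the
patch):

	atomic64_t modchan_sent;
	atomic64_t modchan_completed;

	/* send side */
	atomic64_inc(&vmbus_connection.modchan_sent);

	/* completion side */
	atomic64_inc(&vmbus_connection.modchan_completed);

	/* CPU-offline side: wait until everything sent has completed */
	while (atomic64_read(&vmbus_connection.modchan_completed) <
	       atomic64_read(&vmbus_connection.modchan_sent))
		msleep(1);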

Thoughts?

Regards,
Boqun

>   /*
>   * An offer message is handled first on the work_queue, and then
>   * is further handled on handle_primary_chan_wq or
> -- 
> 2.25.1
> 



Re: [PATCH] drivers/hv: add CPU offlining support

2025-01-10 Thread Boqun Feng
Hi Hamza,

Thanks for the patch, a few comments below:

On Fri, Jan 10, 2025 at 03:05:06PM -0500, Hamza Mahfooz wrote:
> Currently, it is effectively impossible to offline CPUs, since most
> CPUs will have vmbus channels attached to them. So, as mentioned in
> commit d570aec0f2154 ("Drivers: hv: vmbus: Synchronize
> init_vp_index() vs. CPU hotplug"), rebind channels associated with CPUs
> that a user is trying to offline to a new "randomly" selected CPU.
> 
> Cc: Boqun Feng 
> Cc: Wei Liu 
> Signed-off-by: Hamza Mahfooz 
> ---
>  drivers/hv/hv.c| 57 +++---
>  drivers/hv/vmbus_drv.c | 51 +
>  include/linux/hyperv.h |  1 +
>  3 files changed, 73 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 36d9ba097ff5..42270a7a7a19 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -433,13 +433,40 @@ static bool hv_synic_event_pending(void)
>   return pending;
>  }
>  
> +static int hv_pick_new_cpu(struct vmbus_channel *channel,
> +unsigned int current_cpu)
> +{
> + int ret = 0;
> + int cpu;
> +
> + lockdep_assert_held(&vmbus_connection.channel_mutex);
> +
> + /*
> +  * We can't assume that the relevant interrupts will be sent before
> +  * the CPU is offlined on older versions of Hyper-V.
> +  */
> + if (vmbus_proto_version < VERSION_WIN10_V5_3)
> + return -EBUSY;
> +
> + cpus_read_lock();

hv_pick_new_cpu() is only called inside hv_synic_cleanup(), which is
only called with cpus_read_lock() held (because it's registered via
cpuhp_setup_state_nocalls_cpuslocked()). So the cpus_read_lock() is not
necessary here if I'm not missing anything. Moreover, given
cpus_read_lock() is a non-recursive read lock, it's actually incorrect
to re-acquire it here, see:

https://docs.kernel.org/locking/lockdep-design.html#recursive-read-locks

for more information.
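
One way to address it would be to drop the re-acquisition and document the
caller's lock with an assertion instead (a sketch, untested, assuming the
cpuhp callback path always holds cpus_read_lock()):

static int hv_pick_new_cpu(struct vmbus_channel *channel,
			   unsigned int current_cpu)
{
	int cpu;

	lockdep_assert_held(&vmbus_connection.channel_mutex);
	/* The cpuhp callback already runs under cpus_read_lock(). */
	lockdep_assert_cpus_held();

	if (vmbus_proto_version < VERSION_WIN10_V5_3)
		return -EBUSY;

	cpu = cpumask_next(get_random_u32_below(nr_cpu_ids), cpu_online_mask);
	if (cpu >= nr_cpu_ids || cpu == current_cpu)
		cpu = VMBUS_CONNECT_CPU;

	return vmbus_channel_set_cpu(channel, cpu);
}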

> + cpu = cpumask_next(get_random_u32_below(nr_cpu_ids), cpu_online_mask);
> +
> + if (cpu >= nr_cpu_ids || cpu == current_cpu)
> + cpu = VMBUS_CONNECT_CPU;
> +
> + ret = vmbus_channel_set_cpu(channel, cpu);
> + cpus_read_unlock();
> +
> + return ret;
> +}
> +
>  /*
>   * hv_synic_cleanup - Cleanup routine for hv_synic_init().
>   */
>  int hv_synic_cleanup(unsigned int cpu)
>  {
>   struct vmbus_channel *channel, *sc;
> - bool channel_found = false;
> + int ret = 0;
>  
>   if (vmbus_connection.conn_state != CONNECTED)
>   goto always_cleanup;
> @@ -456,31 +483,31 @@ int hv_synic_cleanup(unsigned int cpu)
>  
>   /*
>   * Search for channels which are bound to the CPU we're about to
> -  * cleanup.  In case we find one and vmbus is still connected, we
> -  * fail; this will effectively prevent CPU offlining.
> -  *
> -  * TODO: Re-bind the channels to different CPUs.
> +  * cleanup.
>   */
>   mutex_lock(&vmbus_connection.channel_mutex);
>   list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
>   if (channel->target_cpu == cpu) {
> - channel_found = true;
> - break;
> + ret = hv_pick_new_cpu(channel, cpu);
> +
> + if (ret) {
> + mutex_unlock(&vmbus_connection.channel_mutex);
> + return ret;
> + }
>   }
>   list_for_each_entry(sc, &channel->sc_list, sc_list) {
>   if (sc->target_cpu == cpu) {
> - channel_found = true;
> - break;
> + ret = hv_pick_new_cpu(channel, cpu);
> +
> + if (ret) {
> + mutex_unlock(&vmbus_connection.channel_mutex);
> + return ret;
> + }
>   }
>   }
> - if (channel_found)
> - break;
>   }
>   mutex_unlock(&vmbus_connection.channel_mutex);
>  
> - if (channel_found)
> - return -EBUSY;
> -
>   /*
>   * channel_found == false means that any channels that were previously
>   * assigned to the CPU have been reassigned elsewhere with a call of
> @@ -497,5 +524,5 @@ int hv_synic_cleanup(unsigned int cpu)
>  
>   hv_synic_disable_regs(cpu);
>  
> - return 0;
> + return ret;