Re: (subset) [PATCH 0/5] hyper-v: Don't assume cpu_possible_mask is dense

2025-01-02 Thread Martin K. Petersen
On Wed, 02 Oct 2024 20:53:28 -0700, mhkelle...@gmail.com wrote:

> Code specific to Hyper-V guests currently assumes the cpu_possible_mask
> is "dense" -- i.e., all bit positions 0 thru (nr_cpu_ids - 1) are set,
> with no "holes". Therefore, num_possible_cpus() is assumed to be equal
> to nr_cpu_ids.
> 
> Per a separate discussion[1], this assumption is not valid in the
> general case. For example, the function setup_nr_cpu_ids() in
> kernel/smp.c is coded to assume cpu_possible_mask may be sparse,
> and other patches have been made in the past to correctly handle
> the sparseness. See bc75e99983df1efd ("rcu: Correctly handle sparse
> possible cpu") as noted by Mark Rutland.
> 
> [...]

Applied to 6.14/scsi-queue, thanks!

[4/5] scsi: storvsc: Don't assume cpu_possible_mask is dense
  https://git.kernel.org/mkp/scsi/c/6cb7063feb2e

-- 
Martin K. Petersen  Oracle Linux Engineering



[PATCH] hv_balloon: Fallback to generic_online_page() for non-HV hot added mem

2025-01-02 Thread Jacob Pan
When device memory blocks are hot-added via add_memory_driver_managed(),
the HV balloon driver intercepts them but fails to online these memory
blocks. This could result in device driver initialization failures.

To address this, fall back to the generic online callback for memory
blocks not added by the HV balloon driver. Similar cases are handled the
same way in virtio-mem driver.

Suggested-by: Vikram Sethi 
Tested-by: Michael Frohlich 
Signed-off-by: Jacob Pan 
---
 drivers/hv/hv_balloon.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index a99112e6f0b8..c999daf34108 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -766,16 +766,18 @@ static void hv_online_page(struct page *pg, unsigned int 
order)
struct hv_hotadd_state *has;
unsigned long pfn = page_to_pfn(pg);
 
-   guard(spinlock_irqsave)(&dm_device.ha_lock);
-   list_for_each_entry(has, &dm_device.ha_region_list, list) {
-   /* The page belongs to a different HAS. */
-   if (pfn < has->start_pfn ||
-   (pfn + (1UL << order) > has->end_pfn))
-   continue;
+   scoped_guard(spinlock_irqsave, &dm_device.ha_lock) {
+   list_for_each_entry(has, &dm_device.ha_region_list, list) {
+   /* The page belongs to a different HAS. */
+   if (pfn < has->start_pfn ||
+   (pfn + (1UL << order) > has->end_pfn))
+   continue;
 
-   hv_bring_pgs_online(has, pfn, 1UL << order);
-   break;
+   hv_bring_pgs_online(has, pfn, 1UL << order);
+   return;
+   }
}
+   generic_online_page(pg, order);
 }
 
 static int pfn_covered(unsigned long start_pfn, unsigned long pfn_cnt)
-- 
2.34.1




[PATCH v2] uio_hv_generic: Add a check for HV_NIC for send, receive buffers setup

2025-01-02 Thread Naman Jain
Receive and send buffer allocation was originally introduced to support
DPDK's networking use case. These buffer sizes were further increased to
meet DPDK performance requirements. However, these large buffers are
unnecessary for any other UIO use cases.
Restrict the allocation of receive and send buffers only for HV_NIC device
type, saving 47 MB of memory per device.

While at it, fix some of the syntax related issues in the touched code
which are reported by "--strict" option of checkpatch.

Signed-off-by: Naman Jain 
---
Changes since v1:
https://lore.kernel.org/all/20241125125015.1500-1-namj...@linux.microsoft.com/
Rephrased commit msg as per Saurabh's suggestion.
---
 drivers/uio/uio_hv_generic.c | 86 ++--
 1 file changed, 43 insertions(+), 43 deletions(-)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 3976360d0096..1b19b5647495 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -296,51 +296,51 @@ hv_uio_probe(struct hv_device *dev,
pdata->info.mem[MON_PAGE_MAP].size = PAGE_SIZE;
pdata->info.mem[MON_PAGE_MAP].memtype = UIO_MEM_LOGICAL;
 
-   pdata->recv_buf = vzalloc(RECV_BUFFER_SIZE);
-   if (pdata->recv_buf == NULL) {
-   ret = -ENOMEM;
-   goto fail_free_ring;
+   if (channel->device_id == HV_NIC) {
+   pdata->recv_buf = vzalloc(RECV_BUFFER_SIZE);
+   if (!pdata->recv_buf) {
+   ret = -ENOMEM;
+   goto fail_free_ring;
+   }
+
+   ret = vmbus_establish_gpadl(channel, pdata->recv_buf,
+   RECV_BUFFER_SIZE, 
&pdata->recv_gpadl);
+   if (ret) {
+   if (!pdata->recv_gpadl.decrypted)
+   vfree(pdata->recv_buf);
+   goto fail_close;
+   }
+
+   /* put Global Physical Address Label in name */
+   snprintf(pdata->recv_name, sizeof(pdata->recv_name),
+"recv:%u", pdata->recv_gpadl.gpadl_handle);
+   pdata->info.mem[RECV_BUF_MAP].name = pdata->recv_name;
+   pdata->info.mem[RECV_BUF_MAP].addr = (uintptr_t)pdata->recv_buf;
+   pdata->info.mem[RECV_BUF_MAP].size = RECV_BUFFER_SIZE;
+   pdata->info.mem[RECV_BUF_MAP].memtype = UIO_MEM_VIRTUAL;
+
+   pdata->send_buf = vzalloc(SEND_BUFFER_SIZE);
+   if (!pdata->send_buf) {
+   ret = -ENOMEM;
+   goto fail_close;
+   }
+
+   ret = vmbus_establish_gpadl(channel, pdata->send_buf,
+   SEND_BUFFER_SIZE, 
&pdata->send_gpadl);
+   if (ret) {
+   if (!pdata->send_gpadl.decrypted)
+   vfree(pdata->send_buf);
+   goto fail_close;
+   }
+
+   snprintf(pdata->send_name, sizeof(pdata->send_name),
+"send:%u", pdata->send_gpadl.gpadl_handle);
+   pdata->info.mem[SEND_BUF_MAP].name = pdata->send_name;
+   pdata->info.mem[SEND_BUF_MAP].addr = (uintptr_t)pdata->send_buf;
+   pdata->info.mem[SEND_BUF_MAP].size = SEND_BUFFER_SIZE;
+   pdata->info.mem[SEND_BUF_MAP].memtype = UIO_MEM_VIRTUAL;
}
 
-   ret = vmbus_establish_gpadl(channel, pdata->recv_buf,
-   RECV_BUFFER_SIZE, &pdata->recv_gpadl);
-   if (ret) {
-   if (!pdata->recv_gpadl.decrypted)
-   vfree(pdata->recv_buf);
-   goto fail_close;
-   }
-
-   /* put Global Physical Address Label in name */
-   snprintf(pdata->recv_name, sizeof(pdata->recv_name),
-"recv:%u", pdata->recv_gpadl.gpadl_handle);
-   pdata->info.mem[RECV_BUF_MAP].name = pdata->recv_name;
-   pdata->info.mem[RECV_BUF_MAP].addr
-   = (uintptr_t)pdata->recv_buf;
-   pdata->info.mem[RECV_BUF_MAP].size = RECV_BUFFER_SIZE;
-   pdata->info.mem[RECV_BUF_MAP].memtype = UIO_MEM_VIRTUAL;
-
-   pdata->send_buf = vzalloc(SEND_BUFFER_SIZE);
-   if (pdata->send_buf == NULL) {
-   ret = -ENOMEM;
-   goto fail_close;
-   }
-
-   ret = vmbus_establish_gpadl(channel, pdata->send_buf,
-   SEND_BUFFER_SIZE, &pdata->send_gpadl);
-   if (ret) {
-   if (!pdata->send_gpadl.decrypted)
-   vfree(pdata->send_buf);
-   goto fail_close;
-   }
-
-   snprintf(pdata->send_name, sizeof(pdata->send_name),
-"send:%u", pdata->send_gpadl.gpadl_handle);
-   pdata->info.mem[SEND_BUF_MAP].name = pdata->send_name;
-   pdata->info.mem[SEND_BUF_MAP].addr
-   = (uintptr_t)pdata->send_buf;
-   pdata->info.mem[SEND_BUF_MAP].size = SEND_BUFFER_SIZE;

[PATCH v5 2/2] Drivers: hv: vmbus: Log on missing offers if any

2025-01-02 Thread Naman Jain
From: John Starks 

When resuming from hibernation, log any channels that were present
before hibernation but now are gone.
In general, the boot-time devices configured for a resuming VM should be
the same as the devices in the VM at the time of hibernation. It's
uncommon for the configuration to have been changed such that offers
are missing. Changing the configuration violates the rules for
hibernation anyway.
The cleanup of missing channels is not straight-forward and dependent
on individual device driver functionality and implementation,
so it can be added in future with separate changes.

Signed-off-by: John Starks 
Co-developed-by: Naman Jain 
Signed-off-by: Naman Jain 
Reviewed-by: Easwar Hariharan 
Reviewed-by: Saurabh Sengar 
Reviewed-by: Michael Kelley 
---
Changes since v4:
Rebased on latest tip and added Michael's Reviewed-by tag.

Changes since v3:
Fixed minor checkpatch style warning coming with "--strict" option,
addressing Saurabh's comments.

Changes since v2:
* Addressed Michael's comments:
  * Changed commit msg as per suggestions
* Addressed Dexuan's comments, which came up in offline discussion:
  * Minor additions in commit subject.

Changes since v1:
* Added Easwar's Reviewed-By tag
* Addressed Saurabh's comments:
  * Added a note for missing channel cleanup in comments and commit msg
---
 drivers/hv/vmbus_drv.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index bf5608a74056..0f6cd44fff29 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2462,6 +2462,7 @@ static int vmbus_bus_suspend(struct device *dev)
 
 static int vmbus_bus_resume(struct device *dev)
 {
+   struct vmbus_channel *channel;
struct vmbus_channel_msginfo *msginfo;
size_t msgsize;
int ret;
@@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)
 
vmbus_request_offers();
 
+   mutex_lock(&vmbus_connection.channel_mutex);
+   list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+   if (channel->offermsg.child_relid != INVALID_RELID)
+   continue;
+
+   /* hvsock channels are not expected to be present. */
+   if (is_hvsock_channel(channel))
+   continue;
+
+   pr_err("channel %pUl/%pUl not present after resume.\n",
+  &channel->offermsg.offer.if_type,
+  &channel->offermsg.offer.if_instance);
+   /* ToDo: Cleanup these channels here */
+   }
+   mutex_unlock(&vmbus_connection.channel_mutex);
+
/* Reset the event for the next suspend. */
reinit_completion(&vmbus_connection.ready_for_suspend_event);
 
-- 
2.43.0




[PATCH v5 1/2] Drivers: hv: vmbus: Wait for boot-time offers during boot and resume

2025-01-02 Thread Naman Jain
Channel offers are requested during VMBus initialization and resume from
hibernation. Add support to wait for all boot-time channel offers to
be delivered and processed before returning from vmbus_request_offers.

This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.

Without this, user mode can race with VMBus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when VMBus has
finished processing boot-time offers.

With this added functionality, remove earlier logic which keeps track
of count of offered channels post resume from hibernation. Once all
offers delivered message is received, no further boot-time offers are
going to be received. Consequently, logic to prevent suspend from
happening after previous resume had missing offers, is also removed.

Co-developed-by: John Starks 
Signed-off-by: John Starks 
Signed-off-by: Naman Jain 
Reviewed-by: Easwar Hariharan 
Reviewed-by: Saurabh Sengar 
Reviewed-by: Michael Kelley 
---
Changes since v4:
Rebased and added Michael's Reviewed-by tag.

Changes since v3:
Fixed checkpatch style warnings coming with "--strict" option,
addressing Saurabh's comments.
FYI, I kept code style same as earlier for below, to keep consistency
with other similar fields in the code and because of lack of options
due to 100 char limit.
***
CHECK: Lines should not end with a '('
FILE: drivers/hv/connection.c:37:
+   .all_offers_delivered_event = COMPLETION_INITIALIZER(
***

Changes since v2:
* Incorporated Easwar's suggestion to use secs_to_jiffies() as his
  changes are now merged.
* Addressed Michael's comments:
  * Used boot-time offers/channels/devices to maintain consistency
  * Rephrased CHANNELMSG_ALLOFFERS_DELIVERED handler function comments
for better explanation. Thanks for sharing the write-up.
  * Changed commit msg and other things as per suggestions
* Addressed Dexuan's comments, which came up in offline discussion:
  * Changed timeout for waiting for all offers delivered msg to 60s instead of 
10s.
Reason being, the host can experience some servicing events or diagnostics 
events,
which may take a long time and hence may fail to offer all the devices 
within 10s.
  * Minor additions in commit subject.

Changes since v1:
* Added Easwar's Reviewed-By tag
* Addressed Michael's comments:
  * Added explanation of all offers delivered message in comments
  * Removed infinite wait for offers logic, and changed it wait once.
  * Removed sub channel workqueue flush logic
  * Added comments on why MLX device offer is not expected as part of
this essential boot offer list. I refrained from adding too many
details on it as it felt like it is beyond the scope of this patch
series and may not be relevant to this. However, please let me know if
something needs to be added.
* Addressed Saurabh's comments:
  * Changed timeout value to 1 ms instead of 10*1
  * Changed commit msg as per suggestions
  * Added a comment for warning case of wait_for_completion timeout
---
 drivers/hv/channel_mgmt.c | 61 +--
 drivers/hv/connection.c   |  4 +--
 drivers/hv/hyperv_vmbus.h | 14 ++---
 drivers/hv/vmbus_drv.c| 16 --
 4 files changed, 51 insertions(+), 44 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 3c6011a48dab..6e084c207414 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -944,16 +944,6 @@ void vmbus_initiate_unload(bool crash)
vmbus_wait_for_unload();
 }
 
-static void check_ready_for_resume_event(void)
-{
-   /*
-* If all the old primary channels have been fixed up, then it's safe
-* to resume.
-*/
-   if (atomic_dec_and_test(&vmbus_connection.nr_chan_fixup_on_resume))
-   complete(&vmbus_connection.ready_for_resume_event);
-}
-
 static void vmbus_setup_channel_state(struct vmbus_channel *channel,
  struct vmbus_channel_offer_channel *offer)
 {
@@ -1109,8 +1099,6 @@ static void vmbus_onoffer(struct 
vmbus_channel_message_header *hdr)
 
/* Add the channel back to the array of channels. */
vmbus_channel_map_relid(oldchannel);
-   check_ready_for_resume_event();
-
mutex_unlock(&vmbus_connection.channel_mutex);
return;
}
@@ -1296,13 +1284,28 @@ EXPORT_SYMBOL_GPL(vmbus_hvsock_device_unregister);
 
 /*
  * vmbus_onoffers_delivered -
- * This is invoked when all offers have been delivered.
+ * The CHANNELMSG_ALLOFFERS_DELIVERED message arrives after all
+ * boot-time offers are delivered. A boot-time offer is for the primary
+ * channel for any virtual hardware configured in the VM at the time it boots.
+ * Boot-time offers include offers for physical devices assigned to the VM
+ * via Hyper-V's Discrete Device Assignment (DDA) fun

[PATCH v5 0/2] Drivers: hv: vmbus: Wait for boot-time offers and log missing offers if any

2025-01-02 Thread Naman Jain
After VM requests for channel offers during boot or resume from
hibernation, host offers the devices that the VM is configured with and
then sends a separate message indicating that all the boot-time channel
offers are delivered. Wait for this message to make this boot-time offers
request and receipt process synchronous.

Without this, user mode can race with VMBus initialization and miss
channel offers. User mode has no way to work around this other than
sleeping for a while, since there is no way to know when VMBus has
finished processing boot-time offers.

This is in analogy to a PCI bus not returning from probe until it has
scanned all devices on the bus.

As part of this implementation, some code cleanup is also done for the
logic which becomes redundant due to this change.

Second patch prints the channels which are not offered when resume
happens from hibernation to supply more information to the end user.

Changes since v4:
https://lore.kernel.org/all/20241118070725.3221-1-namj...@linux.microsoft.com/
Rebased on latest tip and added Michael's Reviewed-by tag.

Changes since v3:
https://lore.kernel.org/all/20241113084700.2940-1-namj...@linux.microsoft.com/
Fixed checkpatch style warnings coming with "--strict" option,
addressing Saurabh's comments.
FYI, I kept code style same as earlier for below, to keep consistency
with other similar fields in the code and because of lack of options
due to 100 char limit.
***
CHECK: Lines should not end with a '('
FILE: drivers/hv/connection.c:37:
+   .all_offers_delivered_event = COMPLETION_INITIALIZER(
***

Changes since v2:
https://lore.kernel.org/all/20241029080147.52749-1-namj...@linux.microsoft.com/
* Incorporated Easwar's suggestion to use secs_to_jiffies() as his
  changes are now merged.
* Addressed Michael's comments:
  * Used boot-time offers/channels/devices to maintain consistency
  * Rephrased CHANNELMSG_ALLOFFERS_DELIVERED handler function comments
for better explanation. Thanks for sharing the write-up.
  * Changed commit msg and other things as per suggestions
* Addressed Dexuan's comments, which came up in offline discussion:
  * Changed timeout for waiting for all offers delivered msg to 60s instead of 
10s.
Reason being, the host can experience some servicing events or diagnostics 
events,
which may take a long time and hence may fail to offer all the devices 
within 10s.
  * Minor additions in commit subject of both patches
* Rebased on latest linux-next master tip

Changes since v1:
https://lore.kernel.org/all/20241018115811.5530-1-namj...@linux.microsoft.com/
* Added Easwar's Reviewed-By tag
* Addressed Michael's comments:
  * Added explanation of all offers delivered message in comments
  * Removed infinite wait for offers logic, and changed it wait once.
  * Removed sub channel workqueue flush logic
  * Added comments on why MLX device offer is not expected as part of
this essential boot offer list. I refrained from adding too many
details on it as it felt like it is beyond the scope of this patch
series and may not be relevant to this. However, please let me know if
something needs to be added.
* Addressed Saurabh's comments:
  * Changed timeout value to 1 ms instead of 10*1000
  * Changed commit msg as per suggestions
  * Added a comment for warning case of wait_for_completion timeout
  * Added a note for missing channel cleanup in comments and commit msg

John Starks (1):
  Drivers: hv: vmbus: Log on missing offers if any

Naman Jain (1):
  Drivers: hv: vmbus: Wait for boot-time offers during boot and resume

 drivers/hv/channel_mgmt.c | 61 +--
 drivers/hv/connection.c   |  4 +--
 drivers/hv/hyperv_vmbus.h | 14 ++---
 drivers/hv/vmbus_drv.c| 31 ++--
 4 files changed, 67 insertions(+), 43 deletions(-)


base-commit: 8155b4ef3466f0e289e8fcc9e6e62f3f4dceeac2
-- 
2.43.0