On 11/12/2024 8:43 AM, Michael Kelley wrote:
From: Naman Jain <namj...@linux.microsoft.com> Sent: Sunday, November 10, 2024 9:44 PM

On 11/7/2024 11:14 AM, Naman Jain wrote:

On 11/1/2024 12:44 AM, Michael Kelley wrote:
From: Naman Jain <namj...@linux.microsoft.com> Sent: Tuesday, October 29, 2024 1:02 AM


@@ -2494,6 +2495,22 @@ static int vmbus_bus_resume(struct device *dev)


+    mutex_lock(&vmbus_connection.channel_mutex);
+    list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+        if (channel->offermsg.child_relid != INVALID_RELID)
+            continue;
+        /* hvsock channels are not expected to be present. */
+        if (is_hvsock_channel(channel))
+            continue;
+        pr_err("channel %pUl/%pUl not present after resume.\n",
+            &channel->offermsg.offer.if_type,
+            &channel->offermsg.offer.if_instance);
+        /* ToDo: Cleanup these channels here */
+    }
+    mutex_unlock(&vmbus_connection.channel_mutex);
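A minimal userspace sketch of the check in the hunk above, for readers who want to experiment with it outside the kernel. This is not kernel code: `struct model_channel`, its fields, and `count_missing_offers()` are hypothetical stand-ins, the GUID pair is replaced by a plain name, locking is omitted, and the value used for `INVALID_RELID` is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: stands in for the kernel's "no relid assigned" sentinel. */
#define INVALID_RELID UINT32_MAX

/* Hypothetical, simplified stand-in for struct vmbus_channel. */
struct model_channel {
    const char *name;     /* label used in the message instead of the GUIDs */
    uint32_t child_relid; /* stays INVALID_RELID until an offer restores it */
    bool is_hvsock;       /* hvsock channels are not expected after resume */
};

/* Count the non-hvsock channels that still have no relid after resume. */
static size_t count_missing_offers(const struct model_channel *chans, size_t n)
{
    size_t missing = 0;

    for (size_t i = 0; i < n; i++) {
        if (chans[i].child_relid != INVALID_RELID)
            continue; /* offer arrived and the relid was restored */
        if (chans[i].is_hvsock)
            continue; /* hvsock channels are not expected to be present */
        fprintf(stderr, "channel %s not present after resume.\n",
                chans[i].name);
        missing++;
    }
    return missing;
}
```

In this model, a VF channel whose offer has not yet arrived is counted as missing, which is the situation the rest of the thread discusses.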

Dexuan and John have explained how in Azure VMs, there should not be
any VFs assigned to the VM at the time of hibernation. So the above
check for missing offers does not trigger an error message due to
VF offers coming after the all-offers-received message.

But what about the case of a VM running on a local Hyper-V? I'm not
completely clear, but in that case I don't think any VFs are removed
before the hibernation, especially for VM-initiated hibernation. It's

I am not sure about this behavior. I have requested Dexuan offline
for a comment.

a reasonable scenario to later resume that same VM, with the same
VF assigned to the VM. Because of the way current code counts
the offers, vmbus_bus_resume() waits for the VF to be offered again,
and all the channels get correct post-resume relids. But the changes
in this patch set break that scenario. Since vmbus_bus_resume() now
proceeds before the VF offer arrives, hv_pci_resume() calling
vmbus_open() could use the pre-hibernation relid for the VF and break
things. Certainly the "not present after resume" error message would
be spurious.

Maybe the focus here is Azure, and it's tolerable for the local Hyper-V
case with a VF to not work pending later fixes. But I thought I'd call
out the potential issue (assuming my thinking is correct).


IIUC, below scenarios can happen based on your comment-

Case 1:
VF channel offer is received in time before hv_pci_resume() and there
are no problems.

Case 2:
Resume proceeds just after getting ALLOFFERS_DELIVERED msg and a warning
is printed that this VF channel is not present after resume.
Then two scenarios can happen:
    Case 2.1:
    VF channel offer is received before hv_pci_resume() and things work
    fine. Warning printed is spurious.
    Case 2.2:
    VF channel offer is not received before hv_pci_resume() and relid is
    not yet restored by onoffer. This is a problem. Warning is printed in
    this case for missing offer.
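
The three cases above can be written down as a small decision function. This is only an illustrative userspace sketch: the timestamps, the enum, and the helper names are hypothetical, and it assumes that Case 1 means the VF offer arrives before ALLOFFERS_DELIVERED (so no warning is printed) and that ALLOFFERS_DELIVERED always precedes hv_pci_resume().

```c
#include <stdbool.h>

/* Hypothetical labels for the cases listed above. */
enum resume_case { CASE_1, CASE_2_1, CASE_2_2 };

/*
 * All arguments are hypothetical timestamps on one clock:
 *   vf_offer_t   - when the VF channel offer is processed
 *   alloffers_t  - when ALLOFFERS_DELIVERED lets resume proceed
 *   pci_resume_t - when hv_pci_resume() runs
 */
static enum resume_case classify_resume(unsigned vf_offer_t,
                                        unsigned alloffers_t,
                                        unsigned pci_resume_t)
{
    if (vf_offer_t < alloffers_t)
        return CASE_1;   /* offer seen before the missing-offer check */
    if (vf_offer_t < pci_resume_t)
        return CASE_2_1; /* late offer, but relid restored in time */
    return CASE_2_2;     /* relid still stale when the channel is opened */
}

/* The "not present after resume" message appears in both Case 2 variants. */
static bool warning_printed(enum resume_case c)
{
    return c != CASE_1;
}

/* Only Case 2.2 leaves the pre-hibernation relid in use. */
static bool relid_stale_at_open(enum resume_case c)
{
    return c == CASE_2_2;
}
```

Under these assumptions the warning is spurious exactly when `warning_printed()` is true and `relid_stale_at_open()` is false, that is, in Case 2.1.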

I think it all depends on whether or not VFs are removed in local
Hyper-V VMs. I'll try to get this information. Thanks for pointing this


Hi Michael,
I discussed with Dexuan and we tried these scenarios. Here are the

For the two ways of host initiated hibernation:

#1: Invoke-Hibernate $vm -Device (Uses the guest shutdown component)
#2: Invoke-Hibernate $vm -ComputerSystem (Uses the RequestStateChange

Question: What PowerShell module provides "Invoke-Hibernate"? It's not
present on my Windows 11 system that is running Hyper-V, and I can't
find any documentation about it on the web. Or maybe Invoke-Hibernate
is a PowerShell *script*?

Powertest is one of the internal packages, which has some commands to
trigger host-initiated hibernation. I should probably have mentioned
that in my original comment.

#1 does not remove the VF before sending the hibernate message to the VM
via hv_utils, but #2 does.

With both #1 and #2, during resume, the host offers the vPCI vmbus
device before the vmbus_onoffers_delivered() is called. Whether or not
VFs are removed doesn't matter here, because during resume the first
fresh kernel always requests the VF device, meaning it has become a
boot-time device when the 'old' kernel is resuming back. So the issue we
are discussing will not happen in practice and the patch won't break
things and won't print spurious warnings. If it's OK, please let me know,
I'll then proceed with v3.

Ah, this is interesting. I'm assuming these are the details:

1)  VM boots with the intent of resuming from hibernation (though
Hyper-V doesn't know about that intent)
2)  Original fresh kernel is loaded and begins initialization
3)  VMBus offers come in for boot-time devices, which excludes SR-IOV VFs.
4)  ALLOFFERS_DELIVERED message comes in
5)  The storvsc driver initializes for the virtual disks on the VM
6)  Kernel initialization code finds and reads the swap space to see if a
hibernation image is present. If so, it reads in the hibernation image.
7)  The suspend sequence is initiated (just like during hibernation)
to shut down the VMBus devices and terminate the VMBus connection.
8)  Control is transferred to the previously read-in hibernation image
9)  The hibernation image runs the resume sequence, which
initiates a new VMBus connection and requests offers
10) VMBus offers come in for whatever VMBus devices were present
when Step 7 initiated the suspend sequence. If a VF device was present
at that time, an offer for that VF device will come in and will match up
with the VF that was present in the VM at the time of hibernation.
11) ALLOFFERS_DELIVERED message comes in again for the
newly initiated VMBus connection.

Steps 3) and 4) work differently, IMO. There is no request_for_offers or
ALLOFFERS_DELIVERED for the fresh kernel. Otherwise, on adding prints in
the kernel, we should have seen these function calls *twice* in one
hibernation-resume cycle. But that is not the case.

When the older/original kernel boots up and requests offers, it gets
those VF offers again as part of the boot-time offers, and then the
ALLOFFERS_DELIVERED msg comes. I'm still trying to figure out how the
fresh kernel requests VF offers, or whether it gets those offers
automatically from the host. I will update my findings so that they can
be put in the documentation you mentioned.

The netvsc driver gets initialized *after* step 4, but we don't know
exactly *when* relative to the storvsc driver. The netvsc driver must
tell Hyper-V that it can handle an SR-IOV VF, and the VF offer is sent
sometime after that. While this netvsc/VF sequence is happening, the
storvsc driver is reading the hibernation image from swap (Step 6).

Maybe this is how the fresh kernel gets the offers for VF devices.

I think the sequence you describe works when reading the
hibernation image from swap takes 10's of seconds, or even several
minutes in an Azure VM with a remote disk. That gives plenty
of time for the VF to get initialized and be fully present when Step 7
starts. But there's no *guarantee* that the VF is initialized by then.
It's also not clear to me what action by the guest causes Hyper-V to
treat the VF as "added to the VM" so that in Step 10 the VF offer is

The sequence you describe also happens in an Azure VM, even if
the VF is removed before hibernation. When the VF offer arrives
during Step 10, it doesn't match with any VFs that were in the VM
at the time of hibernation. It's treated as a new device, just like it
would be if the offer arrived after ALLOFFERS_DELIVERED.

But it seems like there's still the risk of having a fast swap disk
and a small hibernation image that can be read in a shorter amount
of time than it takes to initialize the VF to the point that Hyper-V
treats it as added to the VM. Without knowing what that point is,
it's hard to assess the likelihood of that happening. Or maybe there's
an interlock I'm not aware of that ensures Step 7 can't proceed
while the netvsc/VF sequence is in progress.

So maybe it's best to proceed with this patch, and deal with the
risk later when/if it becomes reality. I'm OK if you want to do
that. This has been an interesting discussion that I'll try to capture
in some high-level documentation about how Linux guests on
Hyper-V do hibernation!


I have sent v3 with the changes we discussed.

