Thanks Paride. I still think we should understand exactly why the logs
differ, since in the BAD case we always see the "azure.py" messages and
never the other one. The difference may be related to the problem, or at
least give a clue about the root cause.
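
To make that comparison mechanical, a small helper could list which
cloud-init source files wrote to each run's /var/log/cloud-init.log, so
the GOOD and BAD logs can be diffed. This is a sketch that assumes
cloud-init's default log line format ("<date> <time> - <file>.py[<LEVEL>]:
<message>"); the function name log_sources is hypothetical.

```shell
# Hypothetical helper: print, one per line and deduplicated, the source
# files (e.g. azure.py, util.py) that wrote to the given cloud-init log.
# Assumes the default format "<date> <time> - <file>[<LEVEL>]: <message>";
# lines that don't match (tracebacks, continuations) are skipped.
log_sources() {
    sed -n 's/^[^ ]* [^ ]* - \([A-Za-z0-9_.]*\)\[[A-Z]*]: .*$/\1/p' "$1" |
        sort -u
}
```

In bash, `diff <(log_sources good/cloud-init.log) <(log_sources bad/cloud-init.log)`
would then show which files logged in only one of the two runs.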

Regarding the kernel side, I've built a 5.11 kernel with a debug patch
[0]; I'm attaching the patch here. It is very simple: just a parameter
to delay the carrier (link-up) notification. Unfortunately gjolly tried
it in a custom image and the issue didn't reproduce. My theory is that
just delaying the notification is not enough: given the complex
multi-interface SR-IOV setup in Hyper-V, there may be network
connectivity even before the carrier is fully UP. So the debug patch
could perhaps be extended to block packet transmission in mlx5 for N
seconds.
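
A cheaper experiment than another kernel build might be to approximate
"no transmission on the VF for N seconds" from userspace. This is not the
debug patch and not the proposed mlx5 change, only a sketch under
assumptions: it needs root and iproute2's tc with the netem qdisc, $dev
is assumed to be the mlx5 VF interface, and the names block_tx_for and
DRY_RUN are hypothetical. It only drops egress, and it would have to run
before cloud-init's network stage to matter, which is not shown here.

```shell
# Hypothetical helper (not part of the debug patch): drop all egress
# packets on interface $1 for $2 seconds using netem, then remove the
# qdisc again. With DRY_RUN set, the commands are printed, not executed.
block_tx_for() {
    local dev="$1" secs="$2" run=""
    [ -n "${DRY_RUN:-}" ] && run="echo"
    $run tc qdisc add dev "$dev" root netem loss 100% || return 1
    $run sleep "$secs"
    $run tc qdisc del dev "$dev" root
}
```

Whether dropping at the qdisc layer mimics a block inside mlx5 closely
enough to trigger the bug is untested.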

I suspect Groovy should reproduce this too, as discussed with gjolly:
our first reproducer was a Hirsute image running the Groovy 5.8 kernel,
and the cloud-init versions in Groovy and Hirsute are very similar. So
if it reproduces in Groovy, it definitely shouldn't be a release
blocker.

Thanks!


[0] https://launchpad.net/~gpiccoli/+archive/ubuntu/test1919177/

** Patch added: "DBG-mlx5-Add-delaylink-parameter-to-delay-Link-up-event-.patch"
   https://bugs.launchpad.net/cloud-init/+bug/1919177/+attachment/5488894/+files/DBG-mlx5-Add-delaylink-parameter-to-delay-Link-up-event-.patch

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1919177

Title:
  Azure: issues with accelerated networking on Hirsute

Status in cloud-init:
  Incomplete
Status in cloud-init package in Ubuntu:
  New
Status in linux-azure package in Ubuntu:
  New

Bug description:
  [General]

  On Azure, when provisioning a Hirsute VM with Accelerated Networking
  enabled, the SSH key is sometimes not set up properly and the user
  cannot log into the VM.

  [how to reproduce]

  Start a VM with AN enabled:

  ```
  az vm create --name "$VM_NAME" --resource-group "$GROUP" --location "UK South" \
    --image 'Canonical:0001-com-ubuntu-server-hirsute-daily:21_04-daily-gen2:latest' \
    --size Standard_F8s_v2 --admin-username ubuntu --ssh-key-value "$SSH_KEY" \
    --accelerated-networking
  ```

  After a moment, try to SSH in. If that succeeds, delete the VM and
  create a new one, since the issue is intermittent.
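
  A sketch of that create/SSH/delete loop, assuming the Azure CLI is
  logged in and $GROUP, $SSH_KEY and $IMAGE are set (IMAGE can be the
  Hirsute URN above or another image to compare). try_once and
  classify_ssh_error are hypothetical names; a "Permission denied" reply
  is treated as a reproduction, and about three minutes or more of
  retries is assumed to be enough for a healthy VM.

  ```shell
  # Hypothetical: classify an ssh failure from its stderr. "auth" means
  # the key was rejected (the symptom in this bug); anything else is
  # treated as "not reachable yet".
  classify_ssh_error() {
      case "$1" in
          *"Permission denied"*) echo auth ;;
          *) echo unreachable ;;
      esac
  }

  # Hypothetical: create one AN-enabled VM and retry SSH for a while.
  # Returns 0 if login works (VM deleted), 1 if the key is rejected
  # (VM kept for debugging), 2 on any other failure.
  try_once() {
      local name addr err
      name="an-repro-$(date +%s)"
      addr=$(az vm create --name "$name" --resource-group "$GROUP" \
          --location "UK South" --image "$IMAGE" --size Standard_F8s_v2 \
          --admin-username ubuntu --ssh-key-value "$SSH_KEY" \
          --accelerated-networking --query publicIpAddress -o tsv) || return 2
      for _ in $(seq 18); do
          if err=$(ssh -o BatchMode=yes -o ConnectTimeout=10 \
                  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
                  "ubuntu@$addr" true 2>&1); then
              az vm delete --yes --resource-group "$GROUP" --name "$name"
              return 0
          fi
          if [ "$(classify_ssh_error "$err")" = auth ]; then
              echo "reproduced on $name ($addr): $err"
              return 1
          fi
          sleep 10
      done
      echo "$name never became reachable"
      return 2
  }
  ```

  Running `while try_once; do :; done` recreates VMs until one shows the
  problem or fails for another reason. Note that `az vm delete` may leave
  the NIC, disk and public IP behind in the resource group.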

  [troubleshooting]

  To connect to the VM for debugging, run:

  ```
  az vm run-command invoke -g "$GROUP" -n "$VM_NAME" --command-id RunShellScript \
    --scripts "sudo -u ubuntu ssh-import-id $LP_USERNAME"
  ```

  In "/run/cloud-init/instance-data.json", I can see:
  ```
       "publicKeys": [
        {
         "keyData": "<my-pub-key>",
         "path": "/home/ubuntu/.ssh/authorized_keys"
        }
       ],
  ```

  as expected.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1919177/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
