This is a very well-written SRU. Thank you! I have three review points:

1. Since this change affects a specific Internet service, please document sign-off from the operators of api.snapcraft.io confirming they are happy with it. I suspect it may triple service load, and specifically at the times when the service is already struggling under load. Plucky isn't released yet, so they're likely only going to notice a significant difference in production (if there is any) once this SRU lands :)
2. I think there's an additional regression risk here: users for whom the install already _always_ fails will now take even longer to fail. This might affect automated air-gapped deployments, for example, where api.snapcraft.io currently times out once, but will now have to time out three times plus six seconds of back-off. Such an environment *should* reject the connection explicitly and immediately, but in practice firewalls are often not configured to do that. Could this tip an automated deployment over the edge if it has its own overall completion timeout? How long is the connection timeout? If it's short, this probably isn't significant; if it's long (eg. minutes), it could be. It isn't a big deal for someone affected to fix this by extending their own timeout or (better) not triggering lxd when it is bound to fail anyway, but it might be infuriating to deal with in an area that is already frustrating to some set of our users, where the expectation is that an SRU won't regress things further.

3. Test Plan: could you perhaps simulate this problem with api.snapcraft.io? For example, redirect it in /etc/hosts and then use `nc -l -p 443 </dev/null` or similar. It's not exactly the same, but if it's easy and good enough, it would be better than a test that isn't certain to actually exercise the retry path, should the real service happen to be working better at the time of the test.

Of these, point 2 gives me reason for hesitation, since this is the kind of change that has affected me in the real world before. I'm on the fence as to whether it needs mitigating or not. Please could you consider this scenario, maybe try to measure the impact, and report your thoughts? Points 1 and 3 are OK to be resolved before release to -updates rather than blocking now.

Apart from that, I've reviewed the current uploads in Noble and Oracular, everything else looks fine, and once the above is resolved I'd be happy to accept from the queues without re-review.
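To illustrate points 2 and 3 together, here is a rough in-process sketch of the simulation: a server that accepts TCP connections but never replies (standing in for a hung api.snapcraft.io, with no /etc/hosts or root needed), and a client that retries with a back-off. The retry count, per-attempt timeout, and back-off values below are illustrative assumptions, not the shim's actual numbers; the point is to show how the worst-case wall-clock delay compounds.

```python
# Assumption: 3 attempts, 1s per-attempt timeout, 2s back-off between
# attempts. These are placeholder values for illustration only.
import socket
import threading
import time

def hung_server(sock):
    # Accept connections and hold them open without ever sending a byte,
    # mimicking a service that is up but unresponsive.
    conns = []
    while True:
        try:
            conn, _ = sock.accept()
            conns.append(conn)  # keep the connection alive; never reply
        except OSError:
            return  # listening socket closed; shut down

def try_fetch(port, timeout=1.0, retries=3, backoff=2.0):
    """Return (success, attempts_made, elapsed_seconds)."""
    start = time.monotonic()
    for attempt in range(1, retries + 1):
        try:
            with socket.create_connection(("127.0.0.1", port), timeout) as c:
                c.settimeout(timeout)
                c.recv(1)  # blocks until timeout: the server never answers
                return True, attempt, time.monotonic() - start
        except socket.timeout:
            if attempt < retries:
                time.sleep(backoff)  # back-off before the next attempt
    return False, retries, time.monotonic() - start

srv = socket.socket()
srv.bind(("127.0.0.1", 0))  # any free port
srv.listen(5)
port = srv.getsockname()[1]
threading.Thread(target=hung_server, args=(srv,), daemon=True).start()

ok, attempts, elapsed = try_fetch(port)
srv.close()
print(f"success={ok} attempts={attempts} elapsed={elapsed:.1f}s")
# Worst case with these numbers: 3 x 1s timeouts + 2 x 2s back-off,
# so roughly 7s of added delay before the caller sees the failure.
```

With longer real-world connection timeouts (minutes rather than a second), the same arithmetic is what could push an automated deployment past its own completion deadline.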
I see bug tasks open for Focal and Jammy, but no uploads for them, so those are not reviewed.

** Changed in: lxd-installer (Ubuntu Noble)
       Status: New => Incomplete

** Changed in: lxd-installer (Ubuntu Oracular)
       Status: New => Incomplete

--
https://bugs.launchpad.net/bugs/2100564

Title:
  lxd-installer shim fails to install with snapstore error