Hi Guix! A few months ago, I published a paper, "Analyzing Infrastructure as Code to Prevent Intra-update Sniping Vulnerabilities" (https://lepiller.eu/pdf/hayha-extended.pdf), along with the accompanying tool (https://github.com/roptat/hayha).
Although the paper focuses on CloudFormation and AWS, I believe the same kind of issue can arise with any cloud or IaC tooling, including guix deploy. To give a concrete example, consider the following situation. It is not very realistic, but I hope it gets the point across.
You manage your local network with guix deploy, and it contains a router and a web server. You want to update the web server's configuration to add an SSH service that accepts root logins with no password. You know this is a security risk, but you trust your local network, so you also update the router's configuration to add a firewall rule blocking any SSH connection from the outside. Unfortunately, although each system is updated atomically (even if its services are not reloaded atomically), the infrastructure as a whole is not. If the server happens to be updated first, root login is exposed to the internet for as long as the router has not been updated, hence the name "sniping".
I think this is a serious threat, despite the silly example, because the attacker only needs to be there at the right time, with no specific knowledge or technique. In the example, any bot would soon discover the open root login and might take automated actions to retain access. However, this is also a security issue inherent to this type of tool (and you could just as well make the same mistake when updating manually), so it's not clear to me what to do.
Possible mitigations rely on the user being aware of the potential issue. In the previous example, we would need to update the router first, and update the server only once the router is updated. For a rollback (removing the firewall rule and the SSH access), the opposite order is required. Other IaC tools at least offer a way to describe dependencies between systems/services, and I think we should implement such a feature in Guix too.
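To make the window concrete, here is a minimal sketch in Python (not Guile, and not part of guix deploy) that models the example as two hypothetical boolean settings, server_root_ssh and router_blocks_ssh, and checks every intermediate state of a sequential update for exposed root login. The names OLD, NEW, MACHINE_KEYS and has_sniping_window are illustrative assumptions, and the model assumes each machine switches atomically from its old configuration to its new one.

```python
from itertools import permutations

# Hypothetical model of the router/server example; NOT the guix deploy API.
OLD = {"server_root_ssh": False, "router_blocks_ssh": False}
NEW = {"server_root_ssh": True, "router_blocks_ssh": True}

# Which configuration setting each machine owns.
MACHINE_KEYS = {"server": "server_root_ssh", "router": "router_blocks_ssh"}


def exposed(state):
    """Root SSH is reachable from outside when it is enabled and not blocked."""
    return state["server_root_ssh"] and not state["router_blocks_ssh"]


def intermediate_states(order, old, new):
    """Yield the infrastructure state before and after each machine is updated."""
    state = dict(old)
    yield dict(state)
    for machine in order:
        key = MACHINE_KEYS[machine]
        state[key] = new[key]  # each machine switches atomically
        yield dict(state)


def has_sniping_window(order, old=OLD, new=NEW):
    """True if some intermediate state of this update order is exposed."""
    return any(exposed(s) for s in intermediate_states(order, old, new))


for order in permutations(MACHINE_KEYS):
    print(order, "vulnerable" if has_sniping_window(order) else "safe")
```

Running it reports the server-first order as vulnerable and the router-first order as safe. Calling has_sniping_window with old=NEW and new=OLD models the rollback, where only the server-first order is safe.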
As a rule of thumb, when you update multiple systems and one system provides security for another, you should update the security system before the protected system if the update restricts access, and the other way around if it allows more access. Maybe we could add that to the manual, in addition to letting users configure the upgrade order?
In our paper, we could detect such ordering issues because CloudFormation has explicit "references" between resources. It is also more of an issue in CloudFormation, since you declare only small independent components rather than whole systems (security resources are always separate from the resources they protect). There might be a way to improve the Guix language to require references between systems, which would allow us to adopt a solution similar to the one we propose in the paper. Or maybe it's time to advocate for "immutable infrastructure" :) Wdyt?
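Combined with explicit dependencies, the rule of thumb amounts to a topological sort whose edge direction depends on whether a change restricts or relaxes access. Below is a minimal Python sketch under assumptions of my own: a hypothetical protects map from each security machine to the machines it guards, and a hypothetical restricts map saying whether that guard's change tightens access. It uses the standard library's graphlib (Python 3.9+) and is not an existing guix deploy feature.

```python
from graphlib import TopologicalSorter


def upgrade_order(protects, restricts):
    """Return machines in an update order following the rule of thumb.

    Hypothetical helper, not part of guix deploy.
    protects: maps each security machine (e.g. a router) to the machines it guards.
    restricts: maps each security machine to True if its change restricts access
    (e.g. adds a firewall rule), False if it allows more access.
    """
    sorter = TopologicalSorter()
    for guard, guarded in protects.items():
        sorter.add(guard)  # make sure guards with no edges still appear
        for machine in guarded:
            if restricts[guard]:
                # Restricting: the guard must be updated before what it protects.
                sorter.add(machine, guard)
            else:
                # Relaxing: update the protected machine first, the guard last.
                sorter.add(guard, machine)
    # static_order() raises graphlib.CycleError if the constraints form a cycle.
    return list(sorter.static_order())


# Router/server example from above.
protects = {"router": ["server"]}
print(upgrade_order(protects, {"router": True}))   # deploying the firewall rule
print(upgrade_order(protects, {"router": False}))  # rolling it back
```

For the router/server example, the restricting update yields ['router', 'server'] and the rollback yields ['server', 'router']. If the constraints form a cycle, graphlib raises CycleError, which would itself be a useful warning to surface to the user.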