Good evening, I'd like to point out a somewhat technical article that explains what it means to uniquely identify "software artifacts", especially in binary form, which has always been the particularly problematic form of software :-)

«Identifying software»
Ludovic Courtès, Maxim Cournoyer, Jan Nieuwenhuizen, Simon Tournier
March 4, 2024
https://guix.gnu.org/en/blog/2024/identifying-software/

--8<---------------cut here---------------start------------->8---
[...]

1 On Software Identification
════════════════════════════

The /Software Identification Ecosystem Option Analysis/ white paper released by CISA in October 2023 studies options towards the definition of /a software identification ecosystem that can be used across the complete, global software space for all key cybersecurity use cases/.

Our experience lies in the design and development of [GNU Guix], a package manager, software deployment tool, and GNU/Linux distribution, which emphasizes three key elements: *reproducibility, provenance tracking, and auditability*. We explain in the following sections our approach and how it relates to the goal stated in the aforementioned white paper.

Guix produces binary artifacts of varying complexity from source code: package binaries, application bundles (container images to be consumed by Docker and related tools), system installations, system bundles (container and virtual machine images).

All these artifacts qualify as “software” and so does source code. Some of this “software” comes from well-identified upstream packages, sometimes with modifications added downstream by packagers (patches); binary artifacts themselves are the byproduct of a build process where the package manager uses /other/ binary artifacts it previously built (compilers, libraries, etc.) along with more source code (the package definition) to build them. How can one identify “software” in that sense?

Software is dual: it exists in /source/ form and in /binary/, machine-executable form. The latter is the outcome of a complex computational process taking source code and intermediary binaries as input.

Our thesis can be summarized as follows:

  *We consider that the requirements for source code identifiers differ from the requirements to identify binary artifacts.*

Our view, embodied in GNU Guix, is that:

1. *Source code* can be identified in an unambiguous and distributed fashion through /inherent identifiers/ such as cryptographic hashes.

2. *Binary artifacts*, instead, need to be the byproduct of a /comprehensive and verifiable build process itself available as source code/.

In the next sections, to clarify the context of this statement, we show how Guix identifies source code, how it defines the /source-to-binary/ path and ensures its verifiability, and how it provides provenance tracking.

[GNU Guix] <https://guix.gnu.org>

[...]

As with Nix, build processes are identified by /derivations/, which are low-level, content-addressed build instructions; derivations may refer to other derivations and to source code. For instance, `/gnu/store/c9fqrmabz5nrm2arqqg4ha8jzmv0kc2f-gcc-11.3.0.drv' uniquely identifies the derivation to build a specific variant of version 11.3.0 of the GNU Compiler Collection (GCC). Changing the package definition—patches being applied, build flags, set of dependencies—, or similarly changing one of the packages it depends on, leads to a different derivation (more information can be found in [Eelco Dolstra's PhD thesis]).

Derivations form a graph that *captures the entirety of the build processes leading to a binary artifact*. In contrast, mere package name/version pairs such as `gcc 11.3.0' fail to capture the breadth and depth elements that lead to a binary artifact. This is a shortcoming of systems such as the *Common Platform Enumeration* (CPE) standard: it fails to express whether a vulnerability that applies to `gcc 11.3.0' applies to it regardless of how it was built, patched, and configured, or whether certain conditions are required.

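
(A side note for readers, not part of the quoted article.) The contrast between name/version pairs and content-addressed derivations can be illustrated with a minimal Python sketch. It assumes a toy recipe serialized as JSON and hashed with SHA-256; the helper names `content_id` and `derivation_id`, the glibc version, and all byte strings are hypothetical, and none of this reproduces Guix's actual derivation format or hashing scheme.

```python
import hashlib
import json


def content_id(data: bytes) -> str:
    """Inherent identifier: the SHA-256 digest of the bytes themselves."""
    return hashlib.sha256(data).hexdigest()


def derivation_id(name, version, source_id, patches=(), flags=(), inputs=()):
    """Hash a canonical description of a build (toy format, an assumption).

    The identifiers of input derivations are part of the recipe, so the
    resulting hash depends on the whole dependency graph.
    """
    recipe = {
        "name": name,
        "version": version,
        "source": source_id,
        "patches": list(patches),   # order kept: patches apply in sequence
        "flags": list(flags),
        "inputs": sorted(inputs),   # dependencies treated as a set
    }
    canonical = json.dumps(recipe, sort_keys=True).encode("utf-8")
    return content_id(canonical)


# Hypothetical placeholder contents standing in for real tarballs and patches.
source = content_id(b"contents of a hypothetical gcc source tarball")
libc = derivation_id("glibc", "2.35", content_id(b"hypothetical glibc source"))
patch = content_id(b"hypothetical downstream patch")

plain = derivation_id("gcc", "11.3.0", source, inputs=[libc])
rebuilt = derivation_id("gcc", "11.3.0", source, inputs=[libc])
patched = derivation_id("gcc", "11.3.0", source, patches=[patch], inputs=[libc])

print(plain == rebuilt)  # True: same recipe, same identifier
print(plain == patched)  # False: same name/version, different identifier
```

Both `plain` and `patched` would show up as `gcc 11.3.0' in a name/version list, yet they get distinct identifiers here; swapping the glibc input for another one would change the gcc identifier as well.
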
[reproducible builds] <https://reproducible-builds.org>
[Nix package manager] <https://nixos.org>
[Eelco Dolstra's PhD thesis] <https://edolstra.github.io/pubs/phd-thesis.pdf>

[...]

5 Provenance Tracking
═════════════════════

We define provenance tracking as the ability *to map a binary artifact back to its complete corresponding source*. Provenance tracking is necessary to allow the recipient of a binary artifact to access the corresponding source code and to verify the source/binary correspondence if they wish to do so.

[...]

In other words, because Guix itself defines how artifacts are built, *the revision of the Guix source coupled with the package name unambiguously identify the package's binary artifact*. As scientists, we build on this property to achieve reproducible research workflows, as explained in this [2022 article in /Nature Scientific Data/]; as engineers, we value this property to analyze the systems we are running and determine which known vulnerabilities and bugs apply.

Again, a software bill of materials (SBOM) written as a mere list of package name/version pairs would fail to capture as much information. The *Artifact Dependency Graph (ADG) of [OmniBOR]*, while less ambiguous, falls short in two ways: it is too fine-grained for typical cybersecurity applications (at the level of individual source files), and it only captures the alleged source/binary correspondence of individual files but not the process to go from source to binary.

[`guix pack'] <https://guix.gnu.org/manual/en/html_node/Invoking-guix-pack.html>
[the `time-machine' command] <https://guix.gnu.org/manual/en/html_node/Invoking-guix-time_002dmachine.html>
[2022 article in /Nature Scientific Data/] <https://doi.org/10.1038/s41597-022-01720-9>
[OmniBOR] <https://omnibor.io/>

[...]
--8<---------------cut here---------------end--------------->8---

--
380° (Giovanni Biscuolo public alter ego)

«We, incompetent as we are, have no standing to suggest anything»

Disinformation flourishes because many people care deeply about injustice but very few check the facts. Ask me about <https://stallmansupport.org>.
_______________________________________________
nexa mailing list
nexa@server-nexa.polito.it
https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa