Good evening,

I would like to point out a somewhat technical article that explains
what it means to uniquely identify "software artifacts", especially in
binary form, which has always been the particularly problematic form
of software :-)

«Identifying software»
Ludovic Courtès, Maxim Cournoyer,
Jan Nieuwenhuizen, Simon Tournier — March 4, 2024
https://guix.gnu.org/en/blog/2024/identifying-software/

--8<---------------cut here---------------start------------->8---

[...]

1 On Software Identification
════════════════════════════

  The /Software Identification Ecosystem Option Analysis/ white paper
  released by CISA in October 2023 studies options towards the
  definition of /a software identification ecosystem that can be used
  across the complete, global software space for all key cybersecurity
  use cases/.

  Our experience lies in the design and development of [GNU Guix], a
  package manager, software deployment tool, and GNU/Linux distribution,
  which emphasizes three key elements: *reproducibility, provenance
  tracking, and auditability*. We explain in the following sections our
  approach and how it relates to the goal stated in the aforementioned
  white paper.

  Guix produces binary artifacts of varying complexity from source code:
  package binaries, application bundles (container images to be consumed
  by Docker and related tools), system installations, system bundles
  (container and virtual machine images).

  All these artifacts qualify as “software” and so does source
  code. Some of this “software” comes from well-identified upstream
  packages, sometimes with modifications added downstream by packagers
  (patches); binary artifacts themselves are the byproduct of a build
  process where the package manager uses /other/ binary artifacts it
  previously built (compilers, libraries, etc.) along with more source
  code (the package definition) to build them. How can one identify
  “software” in that sense?

  Software is dual: it exists in /source/ form and in /binary/,
  machine-executable form. The latter is the outcome of a complex
  computational process taking source code and intermediary binaries as
  input.

  Our thesis can be summarized as follows:

        *We consider that the requirements for source code
         identifiers differ from the requirements to identify
         binary artifacts.*

        Our view, embodied in GNU Guix, is that:

        1. *Source code* can be identified in an unambiguous and
            distributed fashion through /inherent identifiers/
            such as cryptographic hashes.

        2. *Binary artifacts*, instead, need to be the byproduct
            of a /comprehensive and verifiable build process
            itself available as source code/.

  In the next sections, to clarify the context of this statement, we
  show how Guix identifies source code, how it defines the
  /source-to-binary/ path and ensures its verifiability, and how it
  provides provenance tracking.
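
  As an illustration that is not part of the quoted article, here is a
  minimal Python sketch of an inherent identifier for source code,
  assuming plain SHA-256 over the raw bytes. The function name
  `inherent_id' and the sample contents are hypothetical, and the way
  Guix itself encodes hashes is not reproduced here.

```python
import hashlib


def inherent_id(content: bytes) -> str:
    """Return an identifier derived only from the content itself."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Hypothetical source files, used only for illustration.
original = b"int main(void) { return 0; }\n"
patched = b"int main(void) { return 1; }\n"

# Anyone holding the same bytes computes the same identifier,
# without consulting a central registry.
assert inherent_id(original) == inherent_id(bytes(original))

# Any change to the content yields a different identifier.
assert inherent_id(original) != inherent_id(patched)
```

  The point of the sketch is that the identifier needs no naming
  authority: it can be recomputed and checked by whoever has the
  content.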

[GNU Guix] <https://guix.gnu.org>

[...]

  As with Nix, build processes are identified by /derivations/, which
  are low-level, content-addressed build instructions; derivations may
  refer to other derivations and to source code. For instance,
  `/gnu/store/c9fqrmabz5nrm2arqqg4ha8jzmv0kc2f-gcc-11.3.0.drv' uniquely
  identifies the derivation to build a specific variant of version
  11.3.0 of the GNU Compiler Collection (GCC). Changing the package
  definition—patches being applied, build flags, set of dependencies—,
  or similarly changing one of the packages it depends on, leads to a
  different derivation (more information can be found in [Eelco
  Dolstra's PhD thesis]).
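
  The behavior described above can be sketched with a toy model. This
  is not how Guix or Nix actually compute derivation file names: the
  function `derivation_id', the JSON serialization, the truncated
  SHA-256 digest, and the sample package data (glibc 2.35, the patch
  name, the build flag) are all assumptions made for illustration.

```python
import hashlib
import json


def derivation_id(name, version, flags=(), patches=(), inputs=()):
    """Toy content-addressed identifier for a build recipe.

    `inputs` holds the identifiers of the derivations this one depends
    on, so a change anywhere upstream changes this identifier too.
    """
    recipe = {
        "name": name,
        "version": version,
        "flags": list(flags),
        "patches": list(patches),
        "inputs": sorted(inputs),
    }
    serialized = json.dumps(recipe, sort_keys=True).encode()
    digest = hashlib.sha256(serialized).hexdigest()
    return f"{digest[:32]}-{name}-{version}.drv"


# Two hypothetical variants of a dependency.
libc_a = derivation_id("glibc", "2.35")
libc_b = derivation_id("glibc", "2.35", patches=["fix.patch"])

# Three variants of the same package name and version.
gcc_plain = derivation_id("gcc", "11.3.0", inputs=[libc_a])
gcc_flags = derivation_id("gcc", "11.3.0",
                          flags=["--disable-multilib"], inputs=[libc_a])
gcc_new_libc = derivation_id("gcc", "11.3.0", inputs=[libc_b])

# Same name and version, three distinct identifiers.
assert len({gcc_plain, gcc_flags, gcc_new_libc}) == 3
```

  In this model the third variant differs only because one of its
  dependencies was patched, which is the propagation the paragraph
  describes.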

  Derivations form a graph that *captures the entirety of the build
  processes leading to a binary artifact*. In contrast, mere package
  name/version pairs such as `gcc 11.3.0' fail to capture the breadth
  and depth of the elements that lead to a binary artifact. This is a
  shortcoming of systems such as the *Common Platform Enumeration* (CPE)
  standard: it fails to express whether a vulnerability that applies to
  `gcc 11.3.0' applies to it regardless of how it was built, patched,
  and configured, or whether certain conditions are required.
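
  To make the "breadth and depth" point concrete, here is a small
  sketch, again outside the quoted article, that walks a derivation
  graph and collects everything a binary artifact depends on. The
  graph contents and the function name `closure' are hypothetical;
  real derivation graphs are far larger.

```python
from typing import Dict, List, Set

# Hypothetical derivation graph: each node lists its direct inputs.
GRAPH: Dict[str, List[str]] = {
    "gcc.drv": ["glibc.drv", "binutils.drv", "gcc-source"],
    "glibc.drv": ["bootstrap-gcc.drv", "glibc-source"],
    "binutils.drv": ["bootstrap-gcc.drv", "binutils-source"],
    "bootstrap-gcc.drv": [],
    "gcc-source": [],
    "glibc-source": [],
    "binutils-source": [],
}


def closure(node: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return `node` plus everything reachable through its inputs."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


everything = closure("gcc.drv", GRAPH)
print(sorted(everything))
```

  A name/version pair names only the top node; the closure is what
  would have to be inspected to decide whether a given vulnerability
  applies to a particular build.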


[reproducible builds] <https://reproducible-builds.org>

[Nix package manager] <https://nixos.org>

[Eelco Dolstra's PhD thesis]
<https://edolstra.github.io/pubs/phd-thesis.pdf>

[...]

5 Provenance Tracking
═════════════════════

  We define provenance tracking as the ability *to map a binary artifact
  back to its complete corresponding source*. Provenance tracking is
  necessary to allow the recipient of a binary artifact to access the
  corresponding source code and to verify the source/binary
  correspondence if they wish to do so.
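
  A minimal sketch of what verifying the source/binary correspondence
  can look like, assuming the build is deterministic. It is not taken
  from the article: `toy_build' is a hypothetical stand-in for a real
  build process, and `corresponds' is an illustrative helper.

```python
import hashlib


def toy_build(source: bytes) -> bytes:
    """Stand-in for a deterministic build: same source, same output."""
    return b"BIN:" + source.upper()


def digest(data: bytes) -> str:
    """SHA-256 digest of the given bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def corresponds(binary: bytes, claimed_source: bytes) -> bool:
    """Rebuild from the claimed source and compare with the binary."""
    return digest(toy_build(claimed_source)) == digest(binary)


source = b"print hello"
received = toy_build(source)

# The received binary matches a rebuild from the claimed source.
assert corresponds(received, source)
# A tampered binary no longer matches.
assert not corresponds(received + b"\x00", source)
# Neither does a binary paired with the wrong source.
assert not corresponds(received, b"print goodbye")
```

  The check is only meaningful when rebuilding the same source yields
  the same bytes, which is why reproducibility and provenance tracking
  go together.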

[...]

  In other words, because Guix itself defines how artifacts are built,
  *the revision of the Guix source coupled with the package name
  unambiguously identify the package's binary artifact*. As scientists,
  we build on this property to achieve reproducible research workflows,
  as explained in this [2022 article in /Nature Scientific Data/]; as
  engineers, we value this property to analyze the systems we are
  running and determine which known vulnerabilities and bugs apply.

  Again, a software bill of materials (SBOM) written as a mere list of
  package name/version pairs would fail to capture as much
  information. The *Artifact Dependency Graph (ADG) of [OmniBOR]*, while
  less ambiguous, falls short in two ways: it is too fine-grained for
  typical cybersecurity applications (at the level of individual source
  files), and it only captures the alleged source/binary correspondence
  of individual files but not the process to go from source to binary.
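
  For readers unfamiliar with file-level identifiers, the sketch below
  computes a Git-style blob hash per file, on the assumption that
  OmniBOR artifact identifiers ("gitoids") follow that scheme. The
  function name `gitoid_sha1' and the sample files are hypothetical,
  and this is not OmniBOR's own tooling.

```python
import hashlib


def gitoid_sha1(content: bytes) -> str:
    """Git blob hash: SHA-1 over a 'blob <size>' header, a NUL byte,
    and then the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return "gitoid:blob:sha1:" + hashlib.sha1(header + content).hexdigest()


# Hypothetical files, used only for illustration.
files = {
    "main.c": b"int main(void) { return 0; }\n",
    "empty.h": b"",
}

for name, content in files.items():
    print(name, gitoid_sha1(content))
```

  Each identifier covers exactly one file, which illustrates the
  granularity the paragraph refers to: nothing in it records how the
  files were combined into a binary.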


[`guix pack']
<https://guix.gnu.org/manual/en/html_node/Invoking-guix-pack.html>

[the `time-machine' command]
<https://guix.gnu.org/manual/en/html_node/Invoking-guix-time_002dmachine.html>

[2022 article in /Nature Scientific Data/]
<https://doi.org/10.1038/s41597-022-01720-9>

[OmniBOR] <https://omnibor.io/>

[...]

--8<---------------cut here---------------end--------------->8---


-- 
380° (Giovanni Biscuolo public alter ego)

«We, incompetent as we are,
 have no standing to suggest anything»

Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about <https://stallmansupport.org>.

_______________________________________________
nexa mailing list
nexa@server-nexa.polito.it
https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa
