I am truly sorry for taking this long to reply.

Overall, this is amazing work.  Big +1 from me.  I have just a few
editorial suggestions.  I'm noting them here for completeness; I'll
apply them myself in a minute.


On Sat, 2022-05-28 at 19:17 +0000, Sheng Yu wrote:
> From ee52f60557d72d6274610d461eec1d28453a464f Mon Sep 17 00:00:00 2001
> From: Sheng Yu <syu...@protonmail.com>
> Date: Sat, 28 May 2022 15:06:46 -0400
> Subject: [PATCH] GLEP 78 draft update
> 
> Signed-off-by: Sheng Yu <syu...@protonmail.com>
> ---
>  glep-0078.rst | 114 ++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 96 insertions(+), 18 deletions(-)
> 
> diff --git a/glep-0078.rst b/glep-0078.rst
> index 1f7cd9b..82c74c8 100644
> --- a/glep-0078.rst
> +++ b/glep-0078.rst
> @@ -2,12 +2,13 @@
>  GLEP: 78
>  Title: Gentoo binary package container format
>  Author: Michał Górny <mgo...@gentoo.org>
> +        Sheng Yu <syu...@protonmail.com>
>  Type: Standards Track
>  Status: Draft
>  Version: 1
>  Created: 2018-11-15
> -Last-Modified: 2019-07-29
> -Post-History: 2018-11-17, 2019-07-08
> +Last-Modified: 2021-10-10
> +Post-History: 2018-11-17, 2019-07-08, 2021-09-13, 2021-09-22, 2022-05-28
>  Content-Type: text/x-rst
>  ---
>  
> @@ -154,10 +155,15 @@ The following obligatory goals have been set for a replacement format:
>     enough to let user inspect and manipulate it without special tooling
>     or detailed knowledge.
>  
> -3. **The file format must provide support for OpenPGP signatures.**
> +3. **The file format must be able to detect its own data corruption.**
> +   In particular, it needs to contain checksums of its own data so that
> +   the package manager can verify its integrity without relying
> +   on additional files.
> +
> +4. **The file format must provide support for OpenPGP signatures.**
>     Preferably, it should use standard OpenPGP message formats.
>  
> -4. **The file format must allow for efficient metadata updates.**
> +5. **The file format must allow for efficient metadata updates.**
>     In particular, it should be possible to update the metadata without
>     having to recompress package files.
>  
> @@ -186,35 +192,39 @@ The container format
>  The gpkg package container is an uncompressed .tar archive whose filename
>  should use ``.gpkg.tar`` suffix.
>  
> -The archive contains a number of files, stored in a single directory
> -whose name should match the basename of the package file.  However,
> -the implementation must be able to process an archive where
> -the directory name is mismatched.  There should be no explicit archive
> -member entry for the directory.
> +The archive contains a number of files.  All package-related files
> +should be stored in a single directory whose name matches the basename
> +of the package file.  However, the implementation must be able to
> +process an archive where the directory name is mismatched.  There should
> +be no explicit archive member entry for the directory.
>  
>  The package directory contains the following members, in order:
>  
>  1. The package format identifier file ``gpkg-1`` (required).
>  
> -2. A signature for the metadata archive: ``metadata.tar${comp}.sig``
> +2. The metadata archive ``metadata.tar${comp}``, optionally compressed
> +   (required).
> +
> +3. A signature for the metadata archive: ``metadata.tar${comp}.sig``
>     (optional).
>  
> -3. The metadata archive ``metadata.tar${comp}``, optionally compressed
> -   (required).
> +4. The filesystem image archive ``image.tar${comp}``, optionally
> +   compressed (required).
>  
> -4. A signature for the filesystem image archive:
> +5. A signature for the filesystem image archive:
>     ``image.tar${comp}.sig`` (optional).
>  
> -5. The filesystem image archive ``image.tar${comp}``, optionally
> -   compressed (required).
> +6. The package Manifest data file ``Manifest``, optionally clear-text
> +   signed (required)

Editorial: full stop is missing here.

>  
>  It is recommended that relative order of the archive members is
>  preserved.  However, implementations must support archives with members
>  out of order.
>  
>  The container may be extended with additional members in the future.
> -The implementations should ignore unrecognized members and preserve
> -them across package updates.
> +If the Manifest is present, all files contained in the archive must
> +be listed in it and verify successfully.  The package manager should
> +ignore unknown files but preserve them across package updates.
>  
>  
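
A minimal Python sketch of the member rules above, for illustration
only.  It assumes the member names exactly as listed in the draft and
follows the member list in treating the Manifest as required;
`check_layout` and `check_container` are hypothetical helpers, not
Portage API.

```python
import tarfile

def check_layout(names):
    """Return (missing, unknown) for a list of gpkg member paths.

    Only basenames are inspected, since the draft requires accepting
    a mismatched top-level directory name.  Member order is not checked,
    as out-of-order archives must also be supported.
    """
    required = {"gpkg-1": False, "metadata": False, "image": False,
                "Manifest": False}
    unknown = []
    for name in names:
        base = name.rsplit("/", 1)[-1]
        if base in ("gpkg-1", "Manifest"):
            required[base] = True
        elif base.startswith(("metadata.tar", "image.tar")) and base.endswith(".sig"):
            continue  # optional detached OpenPGP signature
        elif base.startswith("metadata.tar"):
            required["metadata"] = True
        elif base.startswith("image.tar"):
            required["image"] = True
        else:
            unknown.append(name)  # to be ignored, but preserved on update
    missing = [role for role, seen in required.items() if not seen]
    return missing, unknown

def check_container(path):
    # The outer .gpkg.tar container is an uncompressed tar archive.
    with tarfile.open(path, "r:") as tar:
        return check_layout(tar.getnames())
```
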
>  Permitted .tar format features
> @@ -301,10 +311,29 @@ suffixed using the standard suffix for the particular compressed file
>  type (e.g. ``.bz2`` for bzip2 format).
>  
>  
> +The package Manifest file
> +-------------------------
> +
> +The Manifest file must include digests of all files in the binary
> +package container, except for itself.  The purpose of this file is
> +to provide the package manager with an ability to detect corruption
> +or alteration of the binary package before attempting to read the
> +inner archive contents.  This file also provides protection against
> +signature reuse/replacement attacks if OpenPGP signatures are used.
> +
> +The Manifest file follows the format specified in GLEP 74
> +[#GLEP74]_ and uses the DATA tag for files within the container.
> +
> +The implementation should be able to detect checksum mismatches,
> +as well as missing, duplicate, or extraneous files within the

Editorial: don't leave 'the' at the end of the line.

> +container.  In the case of verification failure, no subsequent
> +operations on the archive should be performed.
> +
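
A minimal Python sketch of the verification described above.  It
assumes GLEP 74 lines of the form `DATA <path> <size> <hash-name>
<hash-value> ...` with paths relative to the package directory, and it
only knows the BLAKE2B and SHA512 hash names; `parse_manifest` and
`verify` are hypothetical names.  Duplicate Manifest entries are
rejected while parsing; duplicate container members cannot be
represented in the `members` dict and need a separate check on the
tar index.

```python
import hashlib

# Assumed subset of GLEP 74 hash names, mapped to hashlib constructors.
HASHES = {"BLAKE2B": hashlib.blake2b, "SHA512": hashlib.sha512}

def parse_manifest(text):
    """Parse DATA entries into {path: (size, {hash_name: hex_digest})}.

    Lines that are not DATA entries, such as OpenPGP clear-sign armor,
    are skipped in this sketch.
    """
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "DATA":
            continue
        path, size, rest = fields[1], int(fields[2]), fields[3:]
        if path in entries:
            raise ValueError(f"duplicate Manifest entry: {path}")
        entries[path] = (size, dict(zip(rest[0::2], rest[1::2])))
    return entries

def verify(entries, members):
    """Compare {path: bytes} container members (Manifest excluded)
    against parsed entries; return a list of error strings."""
    errors = [f"missing: {p}" for p in sorted(entries.keys() - members.keys())]
    errors += [f"extraneous: {p}" for p in sorted(members.keys() - entries.keys())]
    for path in sorted(entries.keys() & members.keys()):
        size, digests = entries[path]
        data = members[path]
        if len(data) != size:
            errors.append(f"size mismatch: {path}")
            continue
        known = [name for name in digests if name in HASHES]
        if not known:
            # Refuse entries we cannot actually check.
            errors.append(f"no supported hash: {path}")
        for name in known:
            if HASHES[name](data).hexdigest() != digests[name]:
                errors.append(f"{name} mismatch: {path}")
    return errors
```

Any non-empty error list would then stop further processing of the
archive, as the text requires.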
> +
>  OpenPGP member signatures
>  -------------------------
>  
> -The archive members support optional OpenPGP signatures.
> +The archive members and Manifest support optional OpenPGP signatures.
>  The implementations must allow the user to specify whether OpenPGP
>  signatures are to be expected in remotely fetched packages.
>  
> @@ -490,6 +519,38 @@ Debian has a similar guideline for the inner tar of their package
>  format  [#DEB-FORMAT]_.
>  
>  
> +.tar security issues
> +--------------------
> +
> +Some of the original features of .tar are obsolete in modern
> +usage.
> +
> +Firstly, .tar permits duplicate files to exist [#TARDUP]_.  The

Same.

> +later duplicate files overwrite the previously extracted files when
> +extracting all files in order.  This is useful for incremental
> +backups.  However, general-purpose archiving tools may choose
> +arbitrary files matching a path name, leading to checksum or
> +signature bypass.  To prevent this, duplicate files are
> +forbidden.
> +
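
As a sketch, a duplicate check could run over the tar index before any
member is read.  This uses Python's `tarfile`; normalizing names with
`posixpath.normpath` is an assumption about how equivalent paths should
be compared, and `find_duplicates` is a hypothetical helper.

```python
import posixpath
import tarfile

def find_duplicates(tar):
    """Return member paths occurring more than once in a tar archive.

    Names are normalized first, so that e.g. ./pkg/Manifest and
    pkg/Manifest count as the same path.
    """
    seen, duplicates = set(), []
    for name in tar.getnames():
        path = posixpath.normpath(name)
        if path in seen and path not in duplicates:
            duplicates.append(path)
        seen.add(path)
    return duplicates
```

A non-empty result would mean the package is rejected outright, rather
than picking one of the copies.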
> +Secondly, .tar lacks integrity checks, except for the header
> +self-check.  Data corruption can usually be detected through
> +integrity checks in the additional compression layer.  However,
> +this does not provide a way of verifying the integrity of the

Here too.

> +compressed data in advance.  For this reason, an additional
> +Manifest file is included that provides checksums for other
> +files in the archive.  A corrupted Manifest invalidates the whole
> +package.
> +
> +Thirdly, many .tar implementations have various security problems,
> +including the Python tarfile module [#ISSUE21109]_.  They expose
> +multiple attack vectors, e.g. allowing files to be overwritten outside the
> +destination directory using special filenames, symlinks, hard links or

Here 'the' and 'or'.

> +device files.  To mitigate this, only regular files are permitted inside
> +the container.  It is recommended to process the container data in place
> +rather than extracting it.
> +
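
A minimal Python sketch of processing the container in place: members
are read through `TarFile.extractfile()` instead of being extracted to
disk, and anything that is not a regular file is refused.
`read_regular_members` is a hypothetical helper.

```python
import tarfile

def read_regular_members(tar):
    """Read all members into memory, rejecting non-regular entries.

    Nothing is written to the filesystem, so crafted paths, symlinks,
    hard links and device nodes cannot affect files outside the archive.
    """
    contents = {}
    for member in tar.getmembers():
        if not member.isreg():
            raise ValueError(f"non-regular member rejected: {member.name}")
        if member.name in contents:
            raise ValueError(f"duplicate member rejected: {member.name}")
        with tar.extractfile(member) as fileobj:
            contents[member.name] = fileobj.read()
    return contents
```

Reading whole members into memory is only reasonable for small members
such as `gpkg-1` or the Manifest; a large `image.tar${comp}` would
rather be streamed in chunks.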
> +
>  Member ordering
>  ---------------
>  
> @@ -511,6 +572,14 @@ them.  Covering the compressed archives helps to prevent zipbomb
>  attacks.  Covering the individual members rather than the whole package
>  provides for verification of partially fetched binary packages.
>  
> +However, signing individual files does not guarantee that all members
> +originate from the same binary package.  This opens up the

Here too.

> +possibility of a replacement/reuse attack, e.g. combining the signed
> +metadata from foo-1.1 with the signed image from foo-1.0.  The resulting
> +package would still pass the signature check.  To prevent this type of attack,
> +the additional Manifest file and its signature are needed to verify the

...and here.

> +authenticity of the complete binary package.
> +
>  
>  Format versioning
>  -----------------
> @@ -564,10 +633,19 @@ References
>  .. [#TAR-PORTABILITY] Michał Górny, Portability of tar features
>     (https://dev.gentoo.org/~mgorny/articles/portability-of-tar-features.html)
>  
> +.. [#GLEP74] GLEP 74: Full-tree verification using Manifest files
> +   (https://www.gentoo.org/glep/glep-0074.html)
> +
>  .. [#XPAK2GPKG] xpak2gpkg: Proof-of-concept converter from tbz2/xpak
>     to gpkg binpkg format
>     (https://github.com/mgorny/xpak2gpkg)
>  
> +.. [#TARDUP] tar: Multiple Members with the Same Name
> +   (https://www.gnu.org/software/tar/manual/html_node/multiple.html)
> +
> +.. [#ISSUE21109] Python tarfile: Traversal attack vulnerability
> +   (https://bugs.python.org/issue21109)
> +
>  
>  Copyright
>  =========
> -- 
> 2.35.1

-- 
Best regards,
Michał Górny

