Guillem Jover:
Hi!
On Thu, 2024-02-22 at 12:10:06 +0100, Niels Thykier wrote:
Package: debian-policy
Severity: wishlist
While the parser is technically correct given the Policy description above,
I cannot see any case where this is a desirable outcome. Instead, I think we
should classify this as a syntax error such that users are instructed to
indent `${shlib:Depends}` token.
Currently both the dpkg C and Perl parsers are extremely lax (even more
than what the deb822(5) manual page or the Debian Policy permit! No matter
the outcome of this discussion, I'm going to make both these parsers at
least as strict as the current dpkg documentation.
[...]
Perhaps the allowed syntax should even be defined by context, so that
the initial underscores are only allowed for debconf template files.
(This would be made strict for the Perl parsers for the build tools,
strict for the C parser in charge of building packages or installing
them anew, and lax with warnings for existing installations to avoid
bricking potential current users with such field names.)
I would like to avoid "context" based rules here. Most "deb822" parsers
are not context aware other than "maybe" allowing GPG signatures. I
think only `dpkg` and `lintian` supports it. For lintian, I think it is
limited to `debconf` templates (which just have lintian accept
"space-only continuation lines" in addition to " ."). Many parsers are
just ad-hoc regex based parsers.
For me, adding context awareness would complicate rather than simplify.
But see below.
A simple fix would be to forbid `$` at the start of the field. Though, I
think at this point it would make sense to prune more special characters
from the list like `%&{}[]()/\\<>|$` from anywhere in the field. Note that
we definitely need to keep `_` as it is used in debconf template files for
translations. Both single and double underscore is used here, though they
always come first in the field name.
The reason why I want a tighter bound is that currently, the following
things are also considered valid field names:
[...]
In all cases, I see no value to allowing these absurd field names and only
potential downsides.
I was pondering about this and while I'm in general a strong proponent
of rather strict parsing, to avoid garbage in garbage out, that
external users would otherwise have to then protect against and deal with.
In this case I could see how making the parsing overly strict could stymie
potential innovation, like when debconf started using «_» to mark some
fields in a special way. I mean we can always go around and carve out new
exceptions, but that requires way more effort, and then one needs to
justify those new uses, and convince others, which might be hard (once
the allowed format is very strict)! We also do not have visibility
over privately used field names. And there's the problem that allowing
new syntax might require one release cycle to be able to reintroduce
it, if the previous dpkg/apt need to be able to support it.
In my view, any innovation at that level should involve moving away from
deb822 as the source format and therefore, this is a non-issue for me.
If needed be, it can always be translated into deb822 for `dpkg`, `dak`,
etc. This is just to make it explicit that I am not assigning this part
of your argument as much weight as you might have hoped.
This is also why I focus on what we are using today rather than what
someone might think of using in the future.
Additionally, even if we had restricted `deb822` from starting fields
with `_`, we would not have prevented `debconf` translation feature.
They would have had plenty of options like a different prefix.
Perhaps, as you mention, a more lenient parser that just restricts field
start (from at least anything that can easily appear in a dependency field,
for example) would accomplish most of the benefit, w/o blocking potential
future directions? Say something like the following untested regex:
^[_]*[A-Za-z0-9]+[!-9;-~]+$
(Or perhaps if we are allowing bracket characters, then require them to
be balanced, but that might be too much effort, if we are never going to
make use of them.)
I am against anything that can be interpreted as a path (a basename is
unavoidable) or when unescaped in shell context will trigger some kind
of shell specific thing. For me, that rules out:
"`ls`" -> Runs `ls`
"(ls)" -> Runs `ls` in a subshell
"foo;ls" -> Can cut a command in half and run ls.
"{foo,bar}" -> Can split an argument in half (command line argument
injection)
The project has repeatedly showed that we throw deb822 field names
around like they are not going to cause issues. There will be a next
time if we do not stop this at its root. On the other hand, the project
has not shown any need for any of these in any setting (innovation or
otherwise).
I would additionally also restrict the use of `0x7F` (DEL) and I assume
the regex is `ASCII` restricted (I don't want to deal with
LRM/Left-to-Right marks, though I think policy also ASCII restricts
field names, so that might have been implicit in your reply).
Alternatively we could have hard and soft restriction, hard with
something like the above regex producing errors, and soft like the
first regex, which would only produce warnings (although that might
still be enough of a deterrent that one would still need to update
parsers anyway for new syntax).
For me, that is unnecessary complexity. I want simplicity, not
flexibility. Both as a deb822 consumer and as a (generic) deb822 parser
writer.
Note, I am not against parsers supporting this an opt-in (I would not in
the parser I am working on). So `dpkg-deb --nocheck` could switch to
your "soft restriction" if you wanted to. But I am against it defining
it in the specification for what *all* parsers should accept/support (in
the "SHOULD"/"MUST" sense. I would not be happy but I could accept a MAY).
But in any case, this latter regex would not allow for the exact same
extensibility that debconf used at the time to prefix the field with a
special character (although it very much preserves syntax freedom at
the end), which perhaps is an indicator, that we should simply restrict
it all, and if a new syntax is required we can always introduce it later
on?
New syntax that comes to mind could be stuff like:
Field-Name(something): value
!Field: value
Or things along these lines.
Both examples could have side-effects when used unescaped shell, which
has happened numerous times in the past and will happen again if prior
experience is an indicator.
The closest thing I could see us doing in this vein would have been
`Description[de]` or so. But then the project seems to have favored
`Description-de` instead. Now `Description[de]` could have had some nice
semantic properties in some cases, but I feel the ship has sailed on
which convention the project uses.
But someone will eventually try to do this. We already received a bug report
against debhelper long ago about being able to do path traversal via
questionable package names. Largely, this was considered a non-issue because
dpkg in its default configuration would abort and debhelper has a "run
arbitrary code via the upstream build system" feature anyway.
Do you have a reference on hand to that report? Because if dpkg fails
I'd assume that might be a side effect from something else, and it
might make sense to make sure this is handled explicitly.
Thanks,
Guillem
My use of `abort` here was a short hand for `stops the build` rather
than `SIGABRT` (in the latter case, I would have informed you). I think
it was `dpkg-deb` failing because it wanted `--nocheck` and the use-case
for the "attack" (I think it was via `equivs`) did not enable the
attacker to pass that.
It was reported privately as a security bug, so there is no public
record. It was also an "execute shell code" rather than path traversal
as I remembered. Nevertheless, that attack relied on package name (field
value) rather than field name. I was using it as an example that we take
for granted that files are well behaved.
For field names as path traversal, then I was probably thinking of
CVE-2009-4013 in lintian
(https://salsa.debian.org/lintian/lintian/-/commit/952b56c2b670a42bdb35608ee02e74593aa0d3d6).
Best regards,
Niels
PS: For the curious, the reported attack had been plugged in sid by
`debhelper` no longer using shell as often for simple operations by the
time it was reported. I still provided the following patches
a43b521162aa9d653d7ea36516839ef4484522c2..6718818559818fbd139fe79d393d9bceedea9595
to close the gap in case other helpers might still use shell and think
certain field names or values would be safe to pass