Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-10 Thread Claudio Bantaloukas via Gcc


On 09/04/2024 18:57, Andreas Schwab wrote:
> On Apr 09 2024, anderson.jonath...@gmail.com wrote:
> 
>> - This xz backdoor injection unpacked attacker-controlled files and ran them 
>> during `configure`. Newer build systems implement a build abstraction (aka 
>> DSL) that acts similar to a sandbox and enforces rules (e.g. the only code 
>> run during `meson setup` is from `meson.build` files and CMake). Generally 
>> speaking the only way to disobey those rules is via an "escape" command 
>> (e.g. `run_command()`) of which there are few. This reduces the task of 
>> auditing the build scripts for sandbox-breaking malicious intent 
>> significantly: only the "escapes" need investigation, and they 
>> should(tm) be rare for well-behaved projects.
> 
> Just like you can put your backdoor in *.m4 files, you can put them in
> *.cmake files.
>

Hi Andreas,
Indeed you're right and seeing the hijacked CMakeLists.txt in the commit 
was eye opening.

There is a not so subtle difference though. The amount of nasty that the 
attacker thought they could get away with was different between the 
build-to-host.m4 and the CMakeLists.txt attack vectors.

For the CMakeLists.txt file, the wanted change was very small, adding a 
dot to a piece of C code so that the test that runs it goes into one of 
the perfectly acceptable states (cannot compile the C file), thus 
disabling a security feature.
This change was "hidden" in a patch containing a bunch of pointless 
renames and veiled in plausible deniability (oops, that dot went in 
there by mistake, silly me, here's a patch to fix the file). The attacker 
was lucky because no one really checked.

https://git.tukaani.org/?p=xz.git;a=commitdiff;h=328c52da8a2bbb81307644efdb58db2c422d9ba7;hp=eb8ad59e9bab32a8d655796afd39597ea6dcc64d

Compare that to what the m4 file did. Russ Cox has an interesting 
analysis https://research.swtch.com/xz-script

From which I'll pick a choice quote: "makes a handful of inscrutable 
changes that don’t look terribly out of place".

I figured out the problem with the CMakeLists.txt quickly. I'm not 100% 
sure if the configure.ac is ok (after looking at it for 10 minutes, it 
looks ok, but I'm not sure!). I would not be able to recognise good code 
from bad in the m4 file.

Admittedly, I'm biased in favour of cmake's DSL, have more experience 
with it despite using ./configure since the mid 90s and have a 
preference. But it would be hard to argue against the fact that benign 
m4, as practiced in the wild today, is hard to separate from malicious m4 
by a majority of developers, including experienced ones like Mr. Cox above.

Cheers,
Claudio Bantaloukas
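
To make the "only the escapes need investigation" idea from the quoted message concrete, here is a minimal sketch of such an audit. It assumes that `run_command()` (Meson) and `execute_process()` (CMake) are the escapes of interest; the table and the function name `find_escapes` are made up for this example, and a real audit would need a fuller list. Note that a scan like this only narrows the search: it would not have flagged the stray dot in the CMakeLists.txt check, which used no escape at all.

```python
import re

# Hypothetical table of "escape" commands per build file name.
# Assumption: these two are representative, not exhaustive.
ESCAPES = {
    "meson.build": ("run_command",),
    "CMakeLists.txt": ("execute_process",),
}

def find_escapes(filename, text):
    """Return (line_number, stripped_line) pairs that call an escape command."""
    names = ESCAPES.get(filename, ())
    if not names:
        return []
    # Match the command name followed by an opening parenthesis.
    # Limitation: this is case-sensitive, while CMake command names are not.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\s*\(")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]

sample = """project('demo', 'c')
cc = meson.get_compiler('c')
out = run_command('sh', 'gen.sh', check: true)
executable('demo', 'main.c')
"""

hits = find_escapes("meson.build", sample)
for number, line in hits:
    print(f"meson.build:{number}: {line}")
```

Each reported line is a place where the build description hands control to an external program, so those are the lines a reviewer would read first.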

Re: [RFC] Linux system call builtins

2024-04-10 Thread Szabolcs Nagy via Gcc
The 04/09/2024 23:59, Matheus Afonso Martins Moreira via Gcc wrote:
> > and using raw syscalls outside of the single runtime the
> > application is using is problematic (at least on linux).
> 
> Why do you say they are problematic on Linux though? Please elaborate.

because the portable c api layer and syscall abi layer
has a large enough gap that applications can break
libc internals by doing raw syscalls.

and it's not just the call convention that's target
specific (this makes the c syscall() function hard to
use on linux)

and linux evolves fast enough that raw syscalls have
to be adjusted over time (to support new features)
which is harder when they are all over the place
instead of in the libc only.

> 
> The ABI being stable should mean that I can for example
> strace a program, analyze the system calls and implement
> a new version of it that performs the same functions.

you could do that with syscall() but it is not very
useful as the state of the system is not the same
when you rerun a process so syscalls would likely
fail or do different things than in the first run.

> > clone cannot even be used from c code in general
> > as CLONE_VM is not compatible with c semantics
> > without a new stack (child clobbers the parent stack)
> > so the c builtin would not always work
> > it is also a syscall that only freestanding
> > application can use not something that calls
> > into the libc
> 
> There are major projects out there which do use it regardless.

that does not make it right.

> For example, systemd:
> 
> https://github.com/systemd/systemd/blob/main/src/basic/raw-clone.h
> https://github.com/systemd/systemd/blob/main/src/shared/async.h
> https://github.com/systemd/systemd/blob/main/src/shared/async.c
> https://github.com/systemd/systemd/blob/main/docs/CODING_STYLE.md
> 
> > even in a freestanding application it is tricky to use right
> 
> No argument from me there. It is tricky...
> The compiler should make it possible though.
> 
> > so i don't see why clone is the quintessential example.
> 
> I think it is the best example because attempting to use clone
> is not actually supported by glibc.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=10311
> 
> "If you use clone() you're on your own."

should be

"if you use clone() *or* raw clone syscall then
 you're on your own"

which is roughly what i said in that discussion.

so your proposal does not fix this particular issue,
just provide a simpler footgun.

> > i guess it's ok if it is by default an error
> > outside of -ffreestanding.
> 
> Hosted C programs could also make good use of them.

they should not.

> They could certainly start out exclusive to freestanding C
> and then made available to general code if there's demand.


Re: [RFC] Linux system call builtins

2024-04-10 Thread Paul Koning via Gcc



> On Apr 9, 2024, at 9:48 PM, Matheus Afonso Martins Moreira via Gcc wrote:
> 
> ...
> MIPS calling conventions work like this:
> 
>> mips/n32,64  a0 a1 a2 a3 a4 a5
>> mips/o32     a0 a1 a2 a3 ...
>> mips/o32     args5-8 are passed on the stack

Yes, for regular function calls, but at least in the case of NetBSD, not for 
syscalls.  They have a somewhat odd calling convention that doesn't match any 
of the normal function call ABIs, though it's similar.

paul



[RFC] Linux system call builtins

2024-04-10 Thread Matheus Afonso Martins Moreira via Gcc
> because the portable c api layer and syscall abi layer
> has a large enough gap that applications can break
> libc internals by doing raw syscalls.

I think that problem cannot really be fixed.
System call users just have to be aware of it.

It's true that using certain system calls can clobber libc state.
However, it should be up to the programmer to decide whether or not
that's acceptable. The compiler should empower them regardless.

> and it's not just the call convention that's target
> specific (this makes the c syscall() function hard to
> use on linux)

Yes. On Linux, the ABIs are stable and the set of system calls
is only ever added to. However, this property does not hold
across different architectures. Some targets have several numbered
variants of a system call while others only have the latest version.
It's true that this is a source of complexity for system call users.
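
As a small illustration of that last point, the sketch below records the numbers of `open` and `openat` on two Linux ABIs. The numbers are the ones listed in the kernel's syscall tables for x86_64 and aarch64; the table layout and the helper name `syscall_number` are assumptions made for this example only.

```python
# Syscall numbers for two related calls on two Linux ABIs.
# aarch64 provides only the newer openat; x86_64 keeps both.
SYSCALL_NR = {
    "x86_64": {"open": 2, "openat": 257},
    "aarch64": {"openat": 56},
}

def syscall_number(arch, name):
    """Return the number of `name` on `arch`, or None if that ABI lacks it."""
    return SYSCALL_NR[arch].get(name)

for arch in SYSCALL_NR:
    for name in ("open", "openat"):
        print(arch, name, syscall_number(arch, name))
```

Code that hard-codes `open` therefore has no direct equivalent on the newer target and has to be rewritten in terms of `openat`.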

> and linux evolves fast enough that raw syscalls have
> to be adjusted over time (to support new features)
> which is harder when they are all over the place
> instead of in the libc only.

When those adjustments are done, they avoid breaking existing programs.
New versions of the system calls are created, old ones are maintained.
Existing binaries continue to work. Even statically linked binaries.
It's true that it's harder to update binaries all at once
when they are statically linked but Linux still supports them.

> that does not make it right.

I don't agree with that. I don't think the libc should be required
or that the system calls should be a privilege of the libc.
The libc should be entirely optional.

In the issue I linked you also noticed that the Linux manuals
frequently mix up the kernel perspective with that of the libc,
and that of glibc in particular.

> unfortunately the linux manuals mix the system call
> (linux behaviour) and libc api (glibc behaviour)
> in the same man page in general, and mainly focus
> on the linux behaviour, not on the c api semantics.

That's also something I intend to work on.

> so your proposal does not fix this particular issue,
> just provide a simpler footgun.

This issue cannot really be fixed. I believe the only solution to that
would be to impose a libc on all of Linux user space. I think that goes
against the spirit of Linux as a kernel that's completely independent
of its userspace.

The footgun already exists in inline assembly form regardless.


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-10 Thread Frank Ch. Eigler via Gcc
Hi -

> In Autotools, `make dist` produces a tarball that contains many
> files not present in the source repository; it includes build system
> core files and this fact was used for the xz attack. In contrast,
> for newer build systems the "release tarball" is purely a snapshot
> of the source repository: there is no `cmake dist`, and `meson dist`
> is essentially `git archive` [...]

For what it's worth, not every auto* using project uses "make dist" to
build their release tarballs.  If they can get over the matter of
auto*-generated scripts being located in the source repo, then indeed
a "git archive" is sufficient.  Several of the projects I
work on do just this.  (As a bonus, that makes the git repos immediately
buildable by developers, without need to re-auto* anything.)

- FChE
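
One way to check how far a release tarball has drifted from the repository is to list the files it contains that are not tracked. The following is a rough sketch under stated assumptions: the tracked file list would come from something like `git ls-files`, the tarball is assumed to unpack into a single top-level directory, and the names `extra_files` and `make_tarball` are invented for this example.

```python
import io
import tarfile

def extra_files(tarball_bytes, tracked, prefix):
    """Regular files in the tarball that are not in the tracked list."""
    with tarfile.open(fileobj=io.BytesIO(tarball_bytes)) as tar:
        names = {m.name for m in tar.getmembers() if m.isfile()}
    # Names outside the expected top-level directory are kept as-is,
    # so they show up in the result instead of being dropped silently.
    stripped = {name.removeprefix(prefix) for name in names}
    return sorted(stripped - set(tracked))

def make_tarball(files):
    """Build a small in-memory .tar.gz so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

release = make_tarball({
    "demo-1.0/configure.ac": b"AC_INIT\n",
    "demo-1.0/configure": b"#!/bin/sh\n",
    "demo-1.0/m4/build-to-host.m4": b"dnl\n",
})
tracked = ["configure.ac"]  # stand-in for the output of `git ls-files`

print(extra_files(release, tracked, "demo-1.0/"))
```

For a project that commits its generated scripts and releases with "git archive", the list would be expected to come back empty.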


[RFC] Linux system call builtins

2024-04-10 Thread Matheus Afonso Martins Moreira via Gcc
> Yes, for regular function calls,
> but at least in the case of NetBSD,
> not for syscalls.

Those are the registers Linux uses for system calls on MIPS.
They are documented as such here:

https://www.man7.org/linux/man-pages/man2/syscall.2.html

> The second table shows the registers used
> to pass the system call arguments.
>
> ...
>
> mips/o32      a0  a1  a2  a3
> mips/n32,64   a0  a1  a2  a3  a4  a5
>
> ...

So they match the normal function calling convention? That's neat.
I don't have much experience with MIPS so I didn't recognize it.
I'm not sure how NetBSD does system calls but I know the ABI
is not considered stable.
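
The table quoted from syscall(2), together with its note that mips/o32 passes arguments 5 through 8 on the user stack, can be written down as a small lookup. This is only a sketch of the documented Linux convention; the names `ARG_REGISTERS`, `STACK_ARGS`, and `arg_location` are made up for this example.

```python
# Argument registers for Linux system calls on MIPS, per syscall(2).
ARG_REGISTERS = {
    "mips/o32": ("a0", "a1", "a2", "a3"),
    "mips/n32,64": ("a0", "a1", "a2", "a3", "a4", "a5"),
}

# On mips/o32, arguments 5 through 8 are passed on the user stack.
STACK_ARGS = {"mips/o32": range(5, 9)}

def arg_location(abi, position):
    """Where the 1-based syscall argument `position` is passed on `abi`."""
    registers = ARG_REGISTERS[abi]
    if 1 <= position <= len(registers):
        return registers[position - 1]
    if position in STACK_ARGS.get(abi, ()):
        return "stack"
    raise ValueError(f"{abi} has no location for argument {position}")

print(arg_location("mips/o32", 5))     # fifth argument on o32
print(arg_location("mips/n32,64", 5))  # fifth argument on n32/n64
```

A compiler builtin would need this kind of per-ABI knowledge for every target it supports, which is the complexity being discussed in this thread.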


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-10 Thread Alejandro Colomar via Gcc
Hi Joel,

On Wed, Apr 03, 2024 at 08:53:21AM -0500, Joel Sherrill wrote:
> On Wed, Apr 3, 2024, 3:09 AM Florian Weimer via Gdb wrote:
> 
> > * Guinevere Larsen via Overseers:
> >
> > > Beyond that, we (GDB) are already experimenting with approved-by, and
> > > I think glibc was doing the same.
> >
> > The glibc project uses Reviewed-by:, but it's completely unrelated to
> > this.  Everyone still pushes their own patches, and there are no
> > technical countermeasures in place to ensure that the pushed version is
> > the reviewed version.
> >
> 
> Or that there isn't "collusion" between a malicious author and reviewer.
> Just tagging it approved or reviewed by just gives you two people to blame.
> It is not a perfect solution either.

If those tags are given in a mailing list _and_ mails to the mailing
list are PGP-signed, then you can verify that the tags were valid, and
not just invented.

And with signed commits you have a guarantee that one doesn't overwrite
history attributing commits to other committers (it could happen with
authors, and indeed it seems to have happened in this case, but if
patches are sent via signed mail, then it's also verifiable).

On the email side, there are a few things to improve:

For sending signed emails, there's patatt(5) (used by b4(5)), but it
might not be comfortable to use for everyone.  For those preferring
normal MUAs, neomutt(1) is an alternative:



And I have a few patches for improving the security of protected
messages:




And the corresponding security-vulnerability reports:







(I find it funny that I didn't know about this xz issue until yesterday,
 so not when I reported those issues, but they are interestingly
 related.)

It would also be interesting to require showing range-diffs between
patch revisions.  They make it much more difficult to introduce a
vulnerability after a reviewer has turned their mind to approving the
patch.  Of course, the patch could go in if the submitter lies in the
range-diff and the vuln is undetected, but then it can be verified a
posteriori to prove that there was a lie.
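
To show what a reviewer gains from a range-diff, here is a minimal sketch that compares two revisions of the same patch and reports only what changed between them. It uses Python's difflib rather than `git range-diff` itself, the patch contents are hypothetical, and the name `interdiff` is made up for this example.

```python
import difflib
from itertools import islice

def interdiff(old_patch, new_patch):
    """Lines added or removed between two revisions of a patch."""
    diff = difflib.unified_diff(
        old_patch.splitlines(), new_patch.splitlines(),
        fromfile="v1", tofile="v2", lineterm="", n=0,
    )
    # Skip the two file header lines, then drop the "@@" hunk markers.
    return [line for line in islice(diff, 2, None) if not line.startswith("@@")]

# Hypothetical patch, revision 1.
v1 = """\
--- a/check.c
+++ b/check.c
@@ -1,2 +1,3 @@
 #include <sys/prctl.h>
+#include <linux/landlock.h>
 int main(void) { return 0; }
"""

# Revision 2 sneaks in one extra line after review.
v2 = v1.replace(
    "+#include <linux/landlock.h>\n",
    "+#include <linux/landlock.h>\n+.\n",
)

print(interdiff(v1, v2))
```

The reviewer who approved the first revision only has to read the reported lines, so a late one-character addition has nowhere to hide.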

I recently started applying all of these (signing all email, including
patches, signing all commits and tags, and providing range-diffs), and
it's not too uncomfortable; I'd say it's even more comfortable than not
doing it, as it allows me to easily roll back a patch to an older
revision if I find I introduced a mistake, and find where I introduced it.


Have a lovely day!
Alex



Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-10 Thread Frank Ch. Eigler via Gcc
Hi -

>This is very true, however a few words of caution: IME this is a
>maintainability nightmare. Fixing patches that forgot to regenerate,
>regenerating on rebase, confirming everything is up-to-date before
>merge, etc etc. It can be handled, I have, but it was painful and
>time-consuming. The hardest part was ensuring everyone was actually
>running the "right" version of Auto* [...]

One way to make the nightmare into a light hassle is to let developers
commit auto* hand-written inputs with or without completely and properly
refreshed generated bits, and let a maintainer or bot (but I repeat
myself) periodically regenerate the derived auto* content.

- FChE


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-10 Thread James K. Lowden
On Mon, 1 Apr 2024 17:06:17 +0200
Mark Wielaard wrote:

> We should discuss what we have been doing and should do more to
> mitigate and prevent the next xz-backdoor. 

Since we're working on a compiler, "On Trusting Trust" comes to mind.
Russ Cox posted some thoughts last year that might be applicable.  

https://research.swtch.com/nih

On a different tack, ISTM it might also be possible to use quantitative
methods.  AIUI the xz attack was discovered while investigating
exorbitant power consumption.  Could the compiler's power consumption
be measured over some baseline, perhaps on a line-by-line basis?  If
so, each new commit could be power-measured, and deemed acceptable if
it's within some threshold, perhaps 10%.  That's a guess; over time
we'd learn how much variation to expect.  
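
The acceptance rule described above could be sketched as follows. The 10% tolerance is the guess from the paragraph; the function name `within_threshold` and the measurements are hypothetical, and nothing here says how the power figures would actually be obtained.

```python
def within_threshold(baseline, measured, tolerance=0.10):
    """True if `measured` exceeds `baseline` by at most `tolerance`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (measured - baseline) / baseline <= tolerance

# Hypothetical energy readings for one test run, in joules.
baseline = 100.0
print(within_threshold(baseline, 109.0))  # 9% over the baseline
print(within_threshold(baseline, 111.0))  # 11% over the baseline
```

Only increases are penalized in this sketch; a commit that lowers consumption always passes.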

Since we are a public organization, any would-be attacker would obviously
know what we're doing, and would know to keep his attack under the
threshold. That makes his job harder, which would have the effect of
encouraging him to look elsewhere.

--jkl