Re: using explicit_bzero

Andres Freund Thu, 01 Aug 2019 10:07:02 -0700

Hi,

On 2019-08-01 20:08:15 +1200, Thomas Munro wrote:
> On Tue, Jul 30, 2019 at 5:58 PM Andres Freund <and...@anarazel.de> wrote:
> > > +#include "c.h"
> > > +static void (* volatile bzero_p)(void *, size_t) = bzero2;
> >
> > Hm, I'm not really sure that this does that much. Especially when the
> > call is via a function in the same translation unit.
>
> Yeah, I wondered the same (when reading the OpenSSH version).  You'd
> think you'd need a non-static global so it has to assume that it could
> change.


The implementations in other projects I saw did the above trick, but
also marked the symbol as weak. Telling the compiler it can't know what
version will be used at runtime. But that adds a bunch of compiler
dependencies too.


> > > +void
> > > +explicit_bzero(void *buf, size_t len)
> > > +{
> > > +     bzero_p(buf, len);
> >
> > I've not followed this discussion. But why isn't the obvious
> > implementation here memset(...); pg_compiler_barrier()?
> >
> > A quick web search indicates that that's what a bunch of projects in the
> > cryptography space also ended up with (well, __asm__ __volatile__("" :::
> > "memory"), which is what pg_compiler_barrier boils down to for
> > gcc/clang/compatibles).
>
> At a glance, I think 3.4.3 of this 2017 paper says that might not work
> on Clang and those other people might have a bug:
>
> https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-yang.pdf

https://bugs.llvm.org/show_bug.cgi?id=15495

We could just combine it with volatile out of paranoia anyway. But I'm
also more than bit doubtful about this bugreport. There's simply no
memory here. It's not that the memset is optimized away, it's that
there's no memory at all. It results in:

        .file   "test.c"
        .globl  foo                     # -- Begin function foo
        .p2align        4, 0x90
        .type   foo,@function
foo:                                    # @foo
        .cfi_startproc
# %bb.0:
        #APP
        #NO_APP
        retq

there's no secrets left over here.  If you actuall force the memory be
filled, even if it's afterwards dead, you do get the memory
cleaned. E.g.

#include <string.h>

static void mybzero(char *buf, int len) {
  memset(buf,0,len);
  asm("" : : : "memory");
}

extern void grab_password(char *buf, int len);

int main(int argc, char **argv)
{
  char buf[512];

  grab_password(buf, sizeof(buf));

  mybzero(buf, sizeof(buf));

  return 0;
}

results in

main:                                   # @main
        .cfi_startproc
# %bb.0:
        pushq   %rbx
        .cfi_def_cfa_offset 16
        subq    $512, %rsp              # imm = 0x200
        .cfi_def_cfa_offset 528
        .cfi_offset %rbx, -16
        movq    %rsp, %rbx
        movq    %rbx, %rdi
        movl    $512, %esi              # imm = 0x200
        callq   grab_password
        movl    $512, %edx              # imm = 0x200
        movq    %rbx, %rdi
        xorl    %esi, %esi
        callq   memset
        #APP
        #NO_APP
        xorl    %eax, %eax
        addq    $512, %rsp              # imm = 0x200
        .cfi_def_cfa_offset 16
        popq    %rbx
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end0:
        .size   main, .Lfunc_end0-main
        .cfi_endproc
                                        # -- End function

Although - and that is not surprising - if you lie and mark
grab_password as being pure (__attribute__((pure)), which signals the
function has no sideeffects except its return value), it'll optimize the
whole memory away again. But no secrets leaked again:

main:                                   # @main
        .cfi_startproc
# %bb.0:
        #APP
        #NO_APP
        xorl    %eax, %eax
        retq
.Lfunc_end0:
        .size   main, .Lfunc_end0-main
        .cfi_endproc


Out of paranoia we could go add add the additional step and have a
barrier variant that's variable specific, and make the __asm__
__volatile__ als take the input as "r"(buf), which'd prevent even this
issue (because now the memory is actually understood as being used).

Which turns out to be e.g. what google did for boringssl...

https://boringssl.googlesource.com/boringssl/+/ad1907fe73334d6c696c8539646c21b11178f20f%5E!/#F0

Greetings,

Andres Freund

Re: using explicit_bzero

Reply via email to