Hi, On 2019-08-01 20:08:15 +1200, Thomas Munro wrote: > On Tue, Jul 30, 2019 at 5:58 PM Andres Freund <and...@anarazel.de> wrote: > > > +#include "c.h" > > > +static void (* volatile bzero_p)(void *, size_t) = bzero2; > > > > Hm, I'm not really sure that this does that much. Especially when the > > call is via a function in the same translation unit. > > Yeah, I wondered the same (when reading the OpenSSH version). You'd > think you'd need a non-static global so it has to assume that it could > change.
The implementations in other projects I saw did the above trick, but also marked the symbol as weak. Telling the compiler it can't know what version will be used at runtime. But that adds a bunch of compiler dependencies too. > > > +void > > > +explicit_bzero(void *buf, size_t len) > > > +{ > > > + bzero_p(buf, len); > > > > I've not followed this discussion. But why isn't the obvious > > implementation here memset(...); pg_compiler_barrier()? > > > > A quick web search indicates that that's what a bunch of projects in the > > cryptography space also ended up with (well, __asm__ __volatile__("" ::: > > "memory"), which is what pg_compiler_barrier boils down to for > > gcc/clang/compatibles). > > At a glance, I think 3.4.3 of this 2017 paper says that might not work > on Clang and those other people might have a bug: > > https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-yang.pdf https://bugs.llvm.org/show_bug.cgi?id=15495 We could just combine it with volatile out of paranoia anyway. But I'm also more than bit doubtful about this bugreport. There's simply no memory here. It's not that the memset is optimized away, it's that there's no memory at all. It results in: .file "test.c" .globl foo # -- Begin function foo .p2align 4, 0x90 .type foo,@function foo: # @foo .cfi_startproc # %bb.0: #APP #NO_APP retq there's no secrets left over here. If you actuall force the memory be filled, even if it's afterwards dead, you do get the memory cleaned. E.g. #include <string.h> static void mybzero(char *buf, int len) { memset(buf,0,len); asm("" : : : "memory"); } extern void grab_password(char *buf, int len); int main(int argc, char **argv) { char buf[512]; grab_password(buf, sizeof(buf)); mybzero(buf, sizeof(buf)); return 0; } results in main: # @main .cfi_startproc # %bb.0: pushq %rbx .cfi_def_cfa_offset 16 subq $512, %rsp # imm = 0x200 .cfi_def_cfa_offset 528 .cfi_offset %rbx, -16 movq %rsp, %rbx movq %rbx, %rdi movl $512, %esi # imm = 0x200 callq grab_password movl $512, %edx # imm = 0x200 movq %rbx, %rdi xorl %esi, %esi callq memset #APP #NO_APP xorl %eax, %eax addq $512, %rsp # imm = 0x200 .cfi_def_cfa_offset 16 popq %rbx .cfi_def_cfa_offset 8 retq .Lfunc_end0: .size main, .Lfunc_end0-main .cfi_endproc # -- End function Although - and that is not surprising - if you lie and mark grab_password as being pure (__attribute__((pure)), which signals the function has no sideeffects except its return value), it'll optimize the whole memory away again. But no secrets leaked again: main: # @main .cfi_startproc # %bb.0: #APP #NO_APP xorl %eax, %eax retq .Lfunc_end0: .size main, .Lfunc_end0-main .cfi_endproc Out of paranoia we could go add add the additional step and have a barrier variant that's variable specific, and make the __asm__ __volatile__ als take the input as "r"(buf), which'd prevent even this issue (because now the memory is actually understood as being used). Which turns out to be e.g. what google did for boringssl... https://boringssl.googlesource.com/boringssl/+/ad1907fe73334d6c696c8539646c21b11178f20f%5E!/#F0 Greetings, Andres Freund