Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

Segher Boessenkool Fri, 07 Aug 2020 16:00:23 -0700

Hi!

On Fri, Aug 07, 2020 at 11:06:38AM -0500, Qing Zhao wrote:
> > It would be nice if this described anywhere what the benefit of this is,
> > including actual hard numbers.  I only see it is very costly, and I see
> > no benefit whatsoever.
> 
> I will add the motivation of this patch clearly in the next patch update. 
> Here, for your reference, As I mentioned in other emails you might miss,


Well, the GCC ML archive doesn't cross month boundaries, so things are
hard to look up if I have deleted my own copy already :-(

> From my understanding (I am not a security expert though), this patch should 
> serve two purpose:
> 
> 1. Erase the registers upon return to avoid information leak;

But only some of the registers.

> 2. ROP mitigation, for details on this, please refer to paper:
> 
> "Clean the Scratch Registers: A Way to Mitigate Return-Oriented Programming 
> Attacks"
> 
> https://ieeexplore.ieee.org/document/8445132

Do you have a link to this that people can actually read?

> From the above paper, The call-used registers are used by the ROP hackers as 
> following:
> 
> "Based on the practical experience of reading and writing ROP code. we find 
> the features of ROP attacks as follows.
> 
> First, the destination of using gadget chains in usual is performing system
> call or system function to perform malicious behaviour such as file access,
> network access and W ⊕ X disable. In most cases, the adversary would like to
> disable W ⊕ X.

That makes things easier, for sure, but is just a nicety really.

> Because once W ⊕ X has been disabled, shellcode can be executed directly
> instead of rewritting shellcode to ROP chains which may cause some troubles
> for the adversary. In upper example, the system call is number 59 which is
> “execve” system call.
> 
> Second, if the adversary performs ROP attacks using system call instruction,
> no matter on x86 or x64 architecture, the register would be used to pass
> parameter. Or if the adversary performs ROP attacks using system function
> such as “read” or “mprotect”, on x64 system, the register would still be
> used to pass parameters, as mentioned in subsection B and C.”
> 
> We can see that call-used registers might be used by the ROP hackers to pass 
> parameters to the system call.
> If compiler can clean these registers before routine “return", then ROP 
> attack will be invalid. 

So the idea is that clearing (or otherwise interfering with) the registers
used for parameter passing makes making useful ROP chains harder?

> Yes, there will be performance overhead from adding these register wiping 
> insns. However, it’s necessary to
> add overhead for security purpose.

The point is the balance between how expensive it is, vs. how much it
makes it harder to exploit the code.

But of course any user can make that judgment themselves.  For us it
mostly matters what the cost is to targets that use it, to targets that
do not use it, and to the generic code, vs. what value we give to our
users :-)

> > "call-used" is such a bad name.  "call-clobbered" is better already, but
> > "volatile" (over calls) is most obvious I think.
> 
> In our GCC compiler source code, we used the name “call-used” a lot, of 
> course, “call-clobbered” is
> also used frequently.  Do these names refer to the same set of registers, 
> i.e, the register set that  
> will be corrupted by function call?

Anything that isn't "call-saved" or "fixed" is called "call-used",
essentially.  (And the relation with "fixed" isn't always clear).
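
As a rough illustration of that naming rule (a sketch only: the struct and
function names here are hypothetical, not GCC's actual data structures, and
the real relation with "fixed" is murkier than this), the classification
could be written as:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register kinds, following the rule of thumb above. */
enum reg_kind { REG_FIXED, REG_CALL_SAVED, REG_CALL_USED };

/* Hypothetical per-register description; not a GCC structure. */
struct reg_info {
    bool fixed;       /* reserved for a fixed role, e.g. a stack pointer */
    bool call_saved;  /* the callee must preserve it across the call */
};

/* Anything that is neither "fixed" nor "call-saved" ends up "call-used". */
enum reg_kind classify_reg(struct reg_info r)
{
    if (r.fixed)
        return REG_FIXED;
    if (r.call_saved)
        return REG_CALL_SAVED;
    return REG_CALL_USED;
}
```

In this sketch "fixed" takes priority over "call-saved"; that ordering is an
assumption made for the example.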

> If so, I am okay with name “call-clobbered” if this name sounds better. 

It's more obvious, at least to me.

> > There are at least four different kinds of volatile registers:
> > 
> > 1) Argument registers are volatile, on most ABIs.
> These are the registers that  need to be cleaned up upon function return for 
> the ROP mitigation described in the paper
> mentioned above.
> 
> > 2) The *linker* (or dynamic linker!) may insert code that needs some
> >   registers for itself;
> > 3) Registers only used for scratch space;
> > 4) Registers used for returning the function value.
> 
> I think that the above 1,3,4 should be all covered by “call_used_regs”. 

1 and 4 are the *same* (or overlap) on most ABIs.  3 can be as well, it
depends what the compiler is allowed to do; normally, if the compiler
wants a register, the parameter passing regs are among the cheapest it
can use.

2 you cannot touch usefully at all, for your purposes.

> Not sure about 2, could you explain a little bit more on 2 (The linker may 
> insert code that needs some register for itself)? 

Sure.  The linker can decide it needs to insert some code to restore a
"global pointer" or similar in the function return path (or anything
else -- it just has to follow the ABI, which the generic compiler does
not know enough about at all).

> I have agreed that moving the zeroing regs part entirely to target. 
> Middle-end will only compute a hard regs set that need to be
> zeroed and pass it to target.

The registers you *want* to interfere with are the parameter passing
registers, minus the ones used for the return value of the current
function; not *all* call-clobbered registers.

The generic compiler does not have enough information at all to do this
as far as I can see, and it would fit much better to what the backend
does anyway?
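
To make the set I mean concrete, here is a minimal sketch using plain
bitmasks.  The register numbering and the names (regset, PARAM_REGS,
regs_to_clear) are made up for illustration, loosely modelled on the x86-64
SysV integer argument registers; a real backend would work from its own ABI
description.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative register numbering only; not any target's real numbering. */
enum { R_AX, R_DX, R_CX, R_SI, R_DI, R_R8, R_R9, R_COUNT };

typedef uint32_t regset;
#define REG_BIT(r) ((regset)1u << (r))

/* Registers used for passing integer arguments in this toy ABI. */
const regset PARAM_REGS = REG_BIT(R_DI) | REG_BIT(R_SI) | REG_BIT(R_DX)
                        | REG_BIT(R_CX) | REG_BIT(R_R8) | REG_BIT(R_R9);

/* Parameter passing registers, minus the ones that carry the return
   value of the current function (those must survive the return).  */
regset regs_to_clear(regset param_regs, regset return_regs)
{
    return param_regs & ~return_regs;
}
```

For a function that returns a value in both R_AX and R_DX, this leaves R_DX
alone and clears the other five argument registers.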

> >  It is a
> > huge security leak otherwise.  And, the generic code has nothing to do
> > with this, define hooks that ask the target to clear stuff, instead?
> 
> Yes, I think that these kind of details are not good to be exposed to 
> middle-end.

I think you should make a hook that just does the whole thing.  There is
nothing useful (or even correct) the generic code can do.  (The command
line flag to do this could be generic, and the hook to actually generate
the code for it as well of course, but other than that, there are so
many more differences between targets, subtargets, and OSes here, and
most of those not expressed anywhere else yet, that it doesn't seem
worth it to artificially make the generic code handle any of this.  For
comparison, pretty much all of the "normal" prologue/epilogue handling
is done in target code already).
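
The split I have in mind could look roughly like the sketch below.  The
option values come from the subject line; everything else (the struct, the
function names, the toy register masks) is hypothetical and only meant to
show the generic code handing the whole decision to the target.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t regset;

/* Values of the proposed -fzero-call-used-regs= option. */
enum zero_mode { ZERO_SKIP, ZERO_USED_GPR, ZERO_ALL_GPR, ZERO_USED, ZERO_ALL };

/* Hypothetical hook table: the target does the whole thing and reports
   which registers it chose to clear.  */
struct target_hooks {
    regset (*zero_regs_at_return)(enum zero_mode mode, regset used_not_live);
};

/* Toy target: 8 GPRs (bits 0-7), 8 FPRs (bits 8-15), and register 0 is
   off limits (say it carries the return value).  All made up.  */
#define TOY_GPRS     ((regset)0x00ffu)
#define TOY_ALL      ((regset)0xffffu)
#define TOY_RESERVED ((regset)0x0001u)

regset toy_zero_regs(enum zero_mode mode, regset used_not_live)
{
    switch (mode) {
    case ZERO_USED_GPR: return used_not_live & TOY_GPRS & ~TOY_RESERVED;
    case ZERO_ALL_GPR:  return TOY_GPRS & ~TOY_RESERVED;
    case ZERO_USED:     return used_not_live & TOY_ALL & ~TOY_RESERVED;
    case ZERO_ALL:      return TOY_ALL & ~TOY_RESERVED;
    case ZERO_SKIP:     break;
    }
    return 0;
}

/* Generic side: knows the flag, knows nothing about the ABI. */
regset generic_expand_return(const struct target_hooks *t,
                             enum zero_mode mode, regset used_not_live)
{
    if (mode == ZERO_SKIP)
        return 0;
    return t->zero_regs_at_return(mode, used_not_live);
}
```

The generic function never inspects the register set itself; it only
forwards the mode and the set it computed, which is about all it can
usefully do.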

> >> But why not simplify it all to a single hook
> >> 
> >>  targetm.calls.zero_regs (used-not-live-hardregset, gpr_only);
> >> 
> >> ?
> > 
> > Yeah.  With a much better name though (it should say what it is for, or
> > describe at a *high level* what it does).
> Okay.

So everything else I write here is just a very long-winded way of
saying "Yes.  This." to this :-)

> >  But the epilogue can use
> > some volatile registers as well, including to hold sensitive info.  And
> > of course everything is different if you use separate shrink-wrapping,
> > but that work is done already when you get here (so it is too late?)
> 
> Could you please explain this part a little bit more?

For example, on PowerPC, to restore the return address we first have to
load it into a general purpose register (and then move it to LR).
Usually r0 is used, and r0 is call-clobbered (but not used for parameter
passing or return value passing).

The return address of course is very sensitive information (exposing any
return address makes ASLR useless immediately).  But this isn't in the
scope of this protection, I see.

Thanks for the explanations, much appreciated,


Segher
