64: randomize kernel stack offset upon syscall

Reshetova, Elena Mon, 29 Apr 2019 00:47:00 -0700

> > On Apr 26, 2019, at 7:01 AM, Theodore Ts'o <ty...@mit.edu> wrote:
> >
> >> On Fri, Apr 26, 2019 at 11:33:09AM +0000, Reshetova, Elena wrote:
> >> Adding Eric and Herbert to continue discussion for the chacha part.
> >> So, as a short summary I am trying to find out a fast (fast enough to be 
> >> used per
> syscall
> >> invocation) source of random bits with good enough security properties.
> >> I started to look into chacha kernel implementation and while it seems 
> >> that it is
> designed to
> >> work with any number of rounds, it does not expose less than 12 rounds 
> >> primitive.
> >> I guess this is done for security sake, since 12 is probably the lowest 
> >> bound we
> want people
> >> to use for the purpose of encryption/decryption, but if we are to build an 
> >> efficient
> RNG,
> >> chacha8 probably is a good tradeoff between security and speed.
> >>
> >> What are people's opinions/perceptions on this? Has it been considered 
> >> before to
> create a
> >> kernel RNG based on chacha?
> >
> > Well, sure.  The get_random_bytes() kernel interface and the
> > getrandom(2) system call uses a CRNG based on chacha20.  See
> > extract_crng() and crng_reseed() in drivers/char/random.c.
> >
> > It *is* possible to use an arbitrary number of rounds if you use the
> > low level interface exposed as chacha_block(), which is an
> > EXPORT_SYMBOL interface so even modules can use it.  "Does not expose
> > less than 12 rounds" applies only if you are using the high-level
> > crypto interface.
> >
> > We have used cut down crypto algorithms for performance critical
> > applications before; at one point, we were using a cut down MD4(!) for
> > initial TCP sequence number generation.  But that was getting rekeyed
> > every five minutes, and the goal was to make it just hard enough that
> > there were other easier ways of DOS attacking a server.
> >
> > I'm not a cryptographer, so I'd really us to hear from multiple
> > experts about the security level of, say, ChaCha8 so we understand
> > exactly kind of security we'd offering.  And I'd want that interface
> > to be named so that it's clear it's only intended for a very specific
> > use case, since it will be tempting for other kernel developers to use
> > it in other contexts, with undue consideration.
> >
> >
> 
> I don’t understand why we’re even considering weaker primitives.


I guess one reasoning here was that cryptographic primitives are expensive 
performance-wise
and we are not really have a full crypto use case here with all standard 
requirements
for CRNG, such as reconstructing earlier inputs, etc. So, it was a natural wish 
to try to find smth
cheaper that does the job, but if we can make performance reasonable, I am all 
for the
proper primitives. 

>It seems to me
> that we should be using the “fast-erasure” construction for all 
> get_random_bytes()
> invocations. Specifically, we should have a per cpu buffer that stores some 
> random
> bytes and a count of how many random bytes there are. get_random_bytes() 
> should
> take bytes from that buffer and *immediately* zero those bytes in memory. When
> the buffer is empty, it gets refilled with the full strength CRNG.

Ideally it would be great to call smth fast and secure on each syscall without 
a per-cpu
buffer, so that's why I was asking on chacha8. As Eric pointed it should not be 
used for
cryptographic purpose, but I think it is reasonably secure for our purpose, 
especially if
the generator is sometimes reseeded with fresh entropy. 
However, it very well might be that is too slow anyway. 

So, I think then we can do the per-cpu approach as you suggesting. 
Have a per-cpu buffer big enough as you suggested (couple of pages) from where
we regularly read 8 bits at the time and zero them as we go. 

I am just not sure on the right refill strategy in this case?
Should we try to maintain this per-cpu buffer always with some random bytes by
having a work queued that would refill it (or part of it, i.e. a page from a 
set of 4 pages) 
regularly from CRNG source? 
I guess how often we need to refill will depend so much on the syscall rate
on that cpu, so it might be hard to find a reasonable period.
In any case we need to prepare to do a refill straight from a syscall,
if despite our best efforts to keep the buffer refilled we run out of bits.
Is it ok to get a visible performance hit at this point? In worse case we will 
need to
generate n pages worth of random numbers, which is going to take a
while. 

I will try doing this PoC and measure implications (without the worker
refill to start with). Let's see how bad (performance wise it looks).

Best Regards,
Elena.




 
> The obvious objection is “oh no, a side channel could leak the buffer,” to 
> which I say
> so what?  A side channel could just as easily leak the entire CRNG state.
> 
> For Elena’s specific use case, we would probably want a
> try_get_random_bytes_notrace() that *only* tries the percpu buffer, since 
> this code
> runs so early in the syscall path that we can’t run real C code.  Or it could 
> be moved a
> bit later, I suppose — the really early part is not really an interesting 
> attack surface.

RE: [PATCH] x86/entry/64: randomize kernel stack offset upon syscall

Reply via email to