> > On Apr 26, 2019, at 7:01 AM, Theodore Ts'o <ty...@mit.edu> wrote: > > > >> On Fri, Apr 26, 2019 at 11:33:09AM +0000, Reshetova, Elena wrote: > >> Adding Eric and Herbert to continue discussion for the chacha part. > >> So, as a short summary I am trying to find out a fast (fast enough to be > >> used per > syscall > >> invocation) source of random bits with good enough security properties. > >> I started to look into chacha kernel implementation and while it seems > >> that it is > designed to > >> work with any number of rounds, it does not expose less than 12 rounds > >> primitive. > >> I guess this is done for security sake, since 12 is probably the lowest > >> bound we > want people > >> to use for the purpose of encryption/decryption, but if we are to build an > >> efficient > RNG, > >> chacha8 probably is a good tradeoff between security and speed. > >> > >> What are people's opinions/perceptions on this? Has it been considered > >> before to > create a > >> kernel RNG based on chacha? > > > > Well, sure. The get_random_bytes() kernel interface and the > > getrandom(2) system call uses a CRNG based on chacha20. See > > extract_crng() and crng_reseed() in drivers/char/random.c. > > > > It *is* possible to use an arbitrary number of rounds if you use the > > low level interface exposed as chacha_block(), which is an > > EXPORT_SYMBOL interface so even modules can use it. "Does not expose > > less than 12 rounds" applies only if you are using the high-level > > crypto interface. > > > > We have used cut down crypto algorithms for performance critical > > applications before; at one point, we were using a cut down MD4(!) for > > initial TCP sequence number generation. But that was getting rekeyed > > every five minutes, and the goal was to make it just hard enough that > > there were other easier ways of DOS attacking a server. > > > > I'm not a cryptographer, so I'd really us to hear from multiple > > experts about the security level of, say, ChaCha8 so we understand > > exactly kind of security we'd offering. And I'd want that interface > > to be named so that it's clear it's only intended for a very specific > > use case, since it will be tempting for other kernel developers to use > > it in other contexts, with undue consideration. > > > > > > I don’t understand why we’re even considering weaker primitives.
I guess one reasoning here was that cryptographic primitives are expensive performance-wise and we are not really have a full crypto use case here with all standard requirements for CRNG, such as reconstructing earlier inputs, etc. So, it was a natural wish to try to find smth cheaper that does the job, but if we can make performance reasonable, I am all for the proper primitives. >It seems to me > that we should be using the “fast-erasure” construction for all > get_random_bytes() > invocations. Specifically, we should have a per cpu buffer that stores some > random > bytes and a count of how many random bytes there are. get_random_bytes() > should > take bytes from that buffer and *immediately* zero those bytes in memory. When > the buffer is empty, it gets refilled with the full strength CRNG. Ideally it would be great to call smth fast and secure on each syscall without a per-cpu buffer, so that's why I was asking on chacha8. As Eric pointed it should not be used for cryptographic purpose, but I think it is reasonably secure for our purpose, especially if the generator is sometimes reseeded with fresh entropy. However, it very well might be that is too slow anyway. So, I think then we can do the per-cpu approach as you suggesting. Have a per-cpu buffer big enough as you suggested (couple of pages) from where we regularly read 8 bits at the time and zero them as we go. I am just not sure on the right refill strategy in this case? Should we try to maintain this per-cpu buffer always with some random bytes by having a work queued that would refill it (or part of it, i.e. a page from a set of 4 pages) regularly from CRNG source? I guess how often we need to refill will depend so much on the syscall rate on that cpu, so it might be hard to find a reasonable period. In any case we need to prepare to do a refill straight from a syscall, if despite our best efforts to keep the buffer refilled we run out of bits. Is it ok to get a visible performance hit at this point? In worse case we will need to generate n pages worth of random numbers, which is going to take a while. I will try doing this PoC and measure implications (without the worker refill to start with). Let's see how bad (performance wise it looks). Best Regards, Elena. > The obvious objection is “oh no, a side channel could leak the buffer,” to > which I say > so what? A side channel could just as easily leak the entire CRNG state. > > For Elena’s specific use case, we would probably want a > try_get_random_bytes_notrace() that *only* tries the percpu buffer, since > this code > runs so early in the syscall path that we can’t run real C code. Or it could > be moved a > bit later, I suppose — the really early part is not really an interesting > attack surface.