For those who have read the Google+ thread[1] it is pretty clear that
there are varying opinions on the idea of removing the RDRAND bypass.
I have gathered some performance numbers to make the debate more
concrete: RDRAND is between 12 and 15 times faster than the current
random pool system (for large and small blocks, respectively). Both the
pool system and RDRAND scale perfectly with frequency, so the ratio is
independent of P-states.
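
(For anyone who wants to reproduce the comparison: the sketch below is
illustrative only, not the harness used for the numbers above. It
assumes gcc with -mrdrnd on an RDRAND-capable CPU, and crudely measures
bulk throughput of RDRAND against /dev/urandom.)

/*
 * Illustrative only: a crude userspace comparison of RDRAND vs.
 * /dev/urandom bulk throughput.  Build with: gcc -O2 -mrdrnd bench.c
 */
#include <immintrin.h>
#include <stdio.h>
#include <time.h>

#define TOTAL (64UL << 20)	/* 64 MiB per test */

static double now(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
	volatile unsigned long long sink = 0;	/* keep the loop live */
	unsigned long long v;
	char buf[4096];
	double t0;
	size_t i;
	FILE *f = fopen("/dev/urandom", "rb");

	if (!f)
		return 1;

	t0 = now();
	for (i = 0; i < TOTAL / 8; i++) {
		while (!_rdrand64_step(&v))
			;	/* CF=0: transient underflow, retry */
		sink ^= v;
	}
	printf("RDRAND:       %6.1f MB/s\n", TOTAL / 1e6 / (now() - t0));

	t0 = now();
	for (i = 0; i < TOTAL / sizeof(buf); i++)
		if (fread(buf, 1, sizeof(buf), f) != sizeof(buf))
			return 1;
	printf("/dev/urandom: %6.1f MB/s\n", TOTAL / 1e6 / (now() - t0));

	fclose(f);
	return 0;
}
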
Given the discrepancy in performance (and presumably in power as well),
I still very much believe it is a mistake to unconditionally deny users
the option of using RDRAND directly; but what I *do* believe we can all
agree on is that security is paramount. Dropping RDRAND is not just a
performance loss but likely a security loss as well, since substantially
less entropy will flow into the pool.
As a compromise I offer the following patch. In terms of performance it
is "the worst of both worlds", but it should provide the combined
security of both sources: even if RDRAND is completely compromised by
the NSA, Microsoft and the Illuminati all at once, it will do no worse
than the existing code, and (since RDRAND is so much faster than the
existing code) it carries only a modest performance cost. More
realistically, it will let many more users take advantage of a
high-entropy, quick-reseeding random number generator, ending up with a
major gain in security.
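
The combining step itself is tiny. Stripped to its essence, the heart
of the patch below is the following loop in extract_buf(), which XORs
hardware randomness into the folded SHA-1 output, so that predicting
the result requires breaking both sources:

	for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
		unsigned long v;
		if (!arch_get_random_long(&v))
			break;	/* no hardware RNG: pool output stands alone */
		hash.l[i] ^= v;
	}
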
It is worth noting that although RDRAND by itself is adequate for any
in-kernel users (and the 3.4-3.5 kernels use it as such unless you
specify "nordrand"), the same is not true for /dev/random, nor, due to
abuse, for /dev/urandom. The recently disclosed[2] RDSEED instruction,
on the other hand, is defined to be fully entropic and can be used for
any purpose; it will be introduced in a future processor.
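
(For reference, arch_get_random_long() on x86 is essentially a bounded
RDRAND retry loop. The following is a simplified sketch from memory,
not the exact kernel code; it assumes an assembler that knows the
rdrand mnemonic:)

#define RDRAND_RETRY_LOOPS 10

static inline int rdrand_long(unsigned long *v)
{
	unsigned char ok;
	int retries = RDRAND_RETRY_LOOPS;

	do {
		/* RDRAND sets CF=0 on transient underflow, so retry */
		asm volatile("rdrand %0; setc %1"
			     : "=r" (*v), "=qm" (ok));
	} while (!ok && --retries);

	return ok;
}
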
Note that the attached patch is way more conservative than it needs to
be: since I moved the mixing into extract_buf(), every byte is mixed
with RDRAND data twice on its way through (and an additional 1.2 bytes
of RDRAND output per output byte are discarded; see the accounting
sketch below). Even so the overhead is modest, and mixing in
extract_buf() makes the code quite a bit simpler.
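
(To make the "twice" and "1.2 bytes" figures concrete, here is a
standalone accounting sketch, assuming a 64-bit unsigned long as in the
configuration measured:)

#include <stdio.h>

#define EXTRACT_SIZE 10		/* bytes emitted per extract_buf() call */
#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))

int main(void)
{
	size_t fetched = LONGS(EXTRACT_SIZE) * sizeof(unsigned long);

	/* 2 longs = 16 bytes of RDRAND fetched per 10-byte extraction */
	printf("RDRAND bytes per pass: %zu\n", fetched);

	/* every output byte passes through extract_buf() twice, so
	 * 2 * (16 - 10) / 10 = 1.2 RDRAND bytes are discarded per
	 * byte of final output */
	printf("discarded per output byte: %.1f\n",
	       2.0 * (fetched - EXTRACT_SIZE) / EXTRACT_SIZE);
	return 0;
}
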
This patch is on top of random.git.
[1] https://plus.google.com/115124063126128475540/posts/KbAEJKMsAfq
[2] http://software.intel.com/file/45207
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
From b36c22b00c6bf8e91a758d3167e912b0ac4f0d0c Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <h...@linux.intel.com>
Date: Tue, 24 Jul 2012 14:48:56 -0700
Subject: [PATCH] random: mix in architectural randomness in extract_buf()
RDRAND is so much faster than the Linux pool system that we can
always just mix in architectural randomness.
Doing it in extract_buf() lets us handle everything in one
convenient place; unfortunately, the output size (10 bytes) is
maximally awkward. That, plus the fact that every output byte
will have passed through extract_buf() twice, means we are not
being very efficient with the RDRAND use.
Measurements show that RDRAND is 12-15 times faster than the Linux
pool system. Doing the math shows this corresponds to about an
11.5% slowdown, which is confirmed by measurements.
Users who are very performance- or power-sensitive could definitely
still benefit from being allowed to use RDRAND directly, but I
believe this version should satisfy even the most hyper-paranoid
crowd.
Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
Cc: DJ Johnson <dj.john...@intel.com>
---
drivers/char/random.c | 56 ++++++++++++++++++++++++++++---------------------
1 file changed, 32 insertions(+), 24 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 9793b40..a4a24e4 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -277,6 +277,8 @@
#define SEC_XFER_SIZE 512
#define EXTRACT_SIZE 10
+#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
+
/*
* The minimum number of bits of entropy before we wake up a read on
* /dev/random. Should be enough to do a significant reseed.
@@ -813,11 +815,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
*/
static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
{
- union {
- __u32 tmp[OUTPUT_POOL_WORDS];
- long hwrand[4];
- } u;
- int i;
+ __u32 tmp[OUTPUT_POOL_WORDS];
if (r->pull && r->entropy_count < nbytes * 8 &&
r->entropy_count < r->poolinfo->POOLBITS) {
@@ -828,23 +826,17 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
/* pull at least as many as BYTES as wakeup BITS */
bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
/* but never more than the buffer size */
- bytes = min_t(int, bytes, sizeof(u.tmp));
+ bytes = min_t(int, bytes, sizeof(tmp));
DEBUG_ENT("going to reseed %s with %d bits "
"(%d of %d requested)\n",
r->name, bytes * 8, nbytes * 8, r->entropy_count);
- bytes = extract_entropy(r->pull, u.tmp, bytes,
+ bytes = extract_entropy(r->pull, tmp, bytes,
random_read_wakeup_thresh / 8, rsvd);
- mix_pool_bytes(r, u.tmp, bytes, NULL);
+ mix_pool_bytes(r, tmp, bytes, NULL);
credit_entropy_bits(r, bytes*8);
}
- kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
- for (i = 0; i < 4; i++)
- if (arch_get_random_long(&u.hwrand[i]))
- break;
- if (i)
- mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
}
/*
@@ -901,15 +893,19 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
static void extract_buf(struct entropy_store *r, __u8 *out)
{
int i;
- __u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
+ union {
+ __u32 w[5];
+ unsigned long l[LONGS(EXTRACT_SIZE)];
+ } hash;
+ __u32 workspace[SHA_WORKSPACE_WORDS];
__u8 extract[64];
unsigned long flags;
/* Generate a hash across the pool, 16 words (512 bits) at a time */
- sha_init(hash);
+ sha_init(hash.w);
spin_lock_irqsave(&r->lock, flags);
for (i = 0; i < r->poolinfo->poolwords; i += 16)
- sha_transform(hash, (__u8 *)(r->pool + i), workspace);
+ sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);
/*
* We mix the hash back into the pool to prevent backtracking
@@ -920,14 +916,14 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* brute-forcing the feedback as hard as brute-forcing the
* hash.
*/
- __mix_pool_bytes(r, hash, sizeof(hash), extract);
+ __mix_pool_bytes(r, hash.w, sizeof(hash.w), extract);
spin_unlock_irqrestore(&r->lock, flags);
/*
* To avoid duplicates, we atomically extract a portion of the
* pool while mixing, and hash one final time.
*/
- sha_transform(hash, extract, workspace);
+ sha_transform(hash.w, extract, workspace);
memset(extract, 0, sizeof(extract));
memset(workspace, 0, sizeof(workspace));
@@ -936,11 +932,23 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* pattern, we fold it in half. Thus, we always feed back
* twice as much data as we output.
*/
- hash[0] ^= hash[3];
- hash[1] ^= hash[4];
- hash[2] ^= rol32(hash[2], 16);
- memcpy(out, hash, EXTRACT_SIZE);
- memset(hash, 0, sizeof(hash));
+ hash.w[0] ^= hash.w[3];
+ hash.w[1] ^= hash.w[4];
+ hash.w[2] ^= rol32(hash.w[2], 16);
+
+ /*
+ * If we have an architectural hardware random number
+ * generator, mix that in, too.
+ */
+ for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
+ unsigned long v;
+ if (!arch_get_random_long(&v))
+ break;
+ hash.l[i] ^= v;
+ }
+
+ memcpy(out, hash.w, EXTRACT_SIZE);
+ memset(&hash, 0, sizeof(hash));
}
static ssize_t extract_entropy(struct entropy_store *r, void *buf,
--
1.7.10.4