Re: [DRAFT PATCH] random32: make prandom_u32() output unpredictable

2020-08-11 Thread George Spelvin
On Mon, Aug 10, 2020 at 11:04:55PM +0200, Willy Tarreau wrote: > What could be improved is the way the input values are mixed (just > added hence commutative for now). I didn't want to call a siphash > round on the hot paths, but just shifting the previous noise value > before adding would work, su

Re: [DRAFT PATCH] random32: make prandom_u32() output unpredictable

2020-08-11 Thread George Spelvin
On Mon, Aug 10, 2020 at 01:47:00PM +0200, Willy Tarreau wrote: > except that I retrieve it only on 1/8 calls > and use the previous noise in this case. Er... that's quite different. I was saying you measure them all, and do: struct siprand_state { ... + uint32_t noise[i]; +

Re: [DRAFT PATCH] random32: make prandom_u32() output unpredictable

2020-08-09 Thread George Spelvin
On Sun, Aug 09, 2020 at 07:33:03PM +0200, Willy Tarreau wrote: > Not that low in fact because they don't know precisely when the call is > made. I mean, let's say we're in the worst case, with two VMs running on > two siblings of the same core, with the same TSC, on a 3 GHz machine. The > attacker

Re: [DRAFT PATCH] random32: make prandom_u32() output unpredictable

2020-08-09 Thread George Spelvin
On Sun, Aug 09, 2020 at 11:38:05AM +0200, Willy Tarreau wrote: > So I gave it a quick test under Qemu and it didn't show any obvious > performance difference compared to Tausworthe, which is a good thing, > even though there's a significant amount of measurement noise in each > case. Thank you ver

Re: [DRAFT PATCH] random32: make prandom_u32() output unpredictable

2020-08-09 Thread George Spelvin
On Sun, Aug 09, 2020 at 11:26:31AM +0300, Amit Klein wrote: > BITS_PER_LONG==23 ??? > Should be ==32, I guess. Of course. I've been testing on a 64-bit system so hadn't exercised that branch yet, and it's a case of "seeing what I expect to see".

[DRAFT PATCH] random32: make prandom_u32() output unpredictable

2020-08-09 Thread George Spelvin
alds Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity") Replaces: f227e3ec3b5c ("random32: update the net random state on interrupt and activity") Signed-off-by: George Spelvin --- lib/random32.c | 437 ++--

Re: Flaw in "random32: update the net random state on interrupt and activity"

2020-08-08 Thread George Spelvin
On Sat, Aug 08, 2020 at 01:52:37PM -0700, Linus Torvalds wrote: > On Sat, Aug 8, 2020 at 1:47 PM George Spelvin wrote: >> I *just* finished explaining, using dribs and drabs of entropy allows an >> *information theoretical attack* which *no* crypto can prevent. > > Th

Re: Flaw in "random32: update the net random state on interrupt and activity"

2020-08-08 Thread George Spelvin
On Sat, Aug 08, 2020 at 12:49:30PM -0700, Andy Lutomirski wrote: > I don't care about throwing this stuff away. My plan (not quite > implemented yet) is to have a percpu RNG stream and to never to anything > resembling mixing anything in. The stream is periodically discarded and > reinitialized

Re: Flaw in "random32: update the net random state on interrupt and activity"

2020-08-08 Thread George Spelvin
On Sat, Aug 08, 2020 at 09:18:27PM +0200, Florian Westphal wrote: > Can't we keep prandom_u32 as-is...? Most of the usage, esp. in the > packet schedulers, is fine. > > I'd much rather have a prandom_u32_hashed() or whatever for > those cases where some bits might leak to the outside and then con

Re: Flaw in "random32: update the net random state on interrupt and activity"

2020-08-08 Thread George Spelvin
On Sat, Aug 08, 2020 at 11:19:01AM -0700, Linus Torvalds wrote: > If siphash is a good enough hash to make the pseudo-random state hard > to guess, then it's also a good enough hash to hide the small part of > the fast-pool we mix in. No! No, no, a thousand times no. I *just* finished explaining

Re: Flaw in "random32: update the net random state on interrupt and activity"

2020-08-08 Thread George Spelvin
On Sat, Aug 08, 2020 at 07:44:51PM +0200, Willy Tarreau wrote: > On Sat, Aug 08, 2020 at 03:26:28PM +0000, George Spelvin wrote: >> So any discussion of reseeding begins by *assuming* an attacker has >> captured the PRNG state. If we weren't worried about this possibility, &g

Re: Flaw in "random32: update the net random state on interrupt and activity"

2020-08-08 Thread George Spelvin
On Sat, Aug 08, 2020 at 10:07:51AM -0700, Andy Lutomirski wrote: >>- Cryptographically strong ChaCha, batched >>- Cryptographically strong ChaCha, with anti-backtracking. > > I think we should just anti-backtrack everything. With the "fast key > erasure" construction, already implemented

Re: Flaw in "random32: update the net random state on interrupt and activity"

2020-08-08 Thread George Spelvin
I'm quite annoyed at the way that this discussion has drifted away from the main problem, which is that f227e3ec3b5c is broken and needs reverted. The original bug report is accurate, and it's a critical issue. My proposed fix is described (patch imminent) below. THE PROBLEM: The reseeding desi

Re: Revising prandom_32 generator

2019-03-27 Thread George Spelvin
On Wed, 27 Mar 2019 at 14:32:55 -0400, Hannes Frederic Sowa wrote: > On Tue, Mar 26, 2019, at 20:07, George Spelvin wrote: >> On Tue, 26 Mar 2019 at 14:03:55 -0400, Hannes Frederic Sowa wrote: >>> Are you trying to fix the modulo biases? I think that prandom_u32_max >>>

Re: Revising prandom_32 generator

2019-03-26 Thread George Spelvin
On Tue, 26 Mar 2019 at 14:03:55 -0400, Hannes Frederic Sowa wrote: > On Tue, Mar 26, 2019, at 12:10, George Spelvin wrote: > The conversation definitely makes sense. I also fixed the seeding of prandom; it now uses the random_ready_callback system for seeding. > Are you trying to fix t

Re: Revising prandom_32 generator

2019-03-26 Thread George Spelvin
> A little backstory. I started the prandom_32 stuff long ago when I wanted > a better (longer period) PRNG for use in netem. Took the existing code from > older version of GNU scientific library (pre GPLv3). If there is something > faster with better properties go for it. But the whole point of

Re: Revising prandom_32 generator

2019-03-26 Thread George Spelvin
Apologies... I severely mangled the headers in the previous message, feeding "Daniel Borkmann " to a parser which split on whitespace and thought that was three separate recipients. :-( To limit the damage, please either reply to *this* message or trim the headers manually. Sorry about that.

Revising prandom_32 generator

2019-03-26 Thread George Spelvin
I started on a project to correct all of the instances of "prandom_u32() % FOO" in the kernel (there are lots) to "prandom_u32_max(FOO)". This has snowballed into an ongoing audit of random number use in the kernel, including uses of get_random_bytes() when get_random_u32 or prandom_u32() would su

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-28 Thread George Spelvin
around. > On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote: >> For example, two mix-backs of 64 bits gives you 65 bit security, not 128. >> (Because each mixback can be guessed and verified separately.) > Exactly, but the full reseed after running out of entropy is strong &g

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote: > On 24.12.2016 00:39, George Spelvin wrote: >> We just finished discussing why 8 bytes isn't enough. If you only >> feed back 8 bytes, an attacker who can do 2^64 computation can find it >> (by guessing and computing forward to verify

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote: > In general this looks good, but bitbuffer needs to be protected from > concurrent access, thus needing at least some atomic instruction and > disabling of interrupts for the locking if done outside of > get_random_long. Thus I liked your previous approach more where yo

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
(Cc: list trimmed slightly as the topic is wandering a bit.) Hannes Frederic Sowa wrote: > On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote: >> Adding might_lock() annotations will improve coverage a lot. > > Might be hard to find the correct lock we take later down the code

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
Hannes Frederic Sowa wrote: > A lockdep test should still be done. ;) Adding might_lock() annotations will improve coverage a lot. > Yes, that does look nice indeed. Accounting for bits instead of bytes > shouldn't be a huge problem either. Maybe it gets a bit more verbose in > case you can't sat

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> I do tend to like Ted's version in which we use batched > get_random_bytes() output. If it's fast enough, it's simpler and lets > us get the full strength of a CSPRNG. With the ChaCha20 generator, that's fine, although note that this abandons anti-backtracking entirely. It also takes locks, so

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> Having slept on this, I like it less. The problem is that a > backtracking attacker doesn't just learn H(random seed || entropy_0 || > secret || ...) -- they learn the internal state of the hash function > that generates that value. This probably breaks any attempt to apply > security propertie

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> True, but it's called get_random_int(), and it seems like making it > stronger, especially if the performance cost is low to zero, is a good > thing. If it's cheap enough, I don't mind. But it's documented as being marginal-quality, used where speed is more important. In particular, it's *not*

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread George Spelvin
Andy Lutomirski wrote: > I don't even think it needs that. This is just adding a > non-destructive final operation, right? It is, but the problem is that SipHash is intended for *small* inputs, so the standard implementations aren't broken into init/update/final functions. There's just one big f

Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
> Plus the benchmark was bogus anyway, and when I built a more specific > harness -- actually comparing the TCP sequence number functions -- > SipHash was faster than MD5, even on register starved x86. So I think > we're fine and this chapter of the discussion can come to a close, in > order to mov

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
As a separate message, to disentangle the threads, I'd like to talk about get_random_long(). After some thinking, I still like the "state-preserving" construct that's equivalent to the current MD5 code. Yes, we could just do siphash(current_cpu || per_cpu_counter, global_key), but it's nice to pr

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Theodore Ts'o wrote: > On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote: >> SipHash annihilates the competition on 64-bit superscalar hardware. >> SipHash dominates the field on 64-bit in-order hardware. >> SipHash wins easily on 32-bit hardware *wit

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Eric Dumazet wrote: > Now I am quite confused. > > George said : >> Cycles per byte on 1024 bytes of data: >> Pentium Core 2 Ivy >> 4 Duo Bridge >> SipHash-2-4 38.9 8.3 5.8 >> HalfSipHash-2-4 12.7 4.5 3.2 >> MD5

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Linus wrote: >> How much does kernel_fpu_begin()/kernel_fpu_end() cost? > > It's now better than it used to be, but it's absolutely disastrous > still. We're talking easily many hundreds of cycles. Under some loads, > thousands. I think I've been thoroughly dissuaded, but just to clarify one thing

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Actually, DJB just made a very relevant suggestion. As I've mentioned, the 32-bit performance problems are an x86-specific problem. ARM does very well, and other processors aren't bad at all. SipHash fits very nicely (and runs very fast) in the MMX registers. They're 64 bits, and there are 8 of

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Eric Dumazet wrote: > On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote: >> Cycles per byte on 1024 bytes of data: >> Pentium Core 2 Ivy >> 4 Duo Bridge >> SipHash-2-4 38.9 8.3 5.8 >>

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
> I do not see why SipHash, if faster than MD5 and more secure, would be a > problem. Because on 32-bit x86, it's slower. Cycles per byte on 1024 bytes of data: Pentium Core 2 Ivy 4 Duo Bridge SipHash-2-4 38.9 8.3 5.8

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Theodore Ts'o wrote: > On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote: >> 1) Anything that requires actual long-term security will use >> SipHash2-4, with the 64-bit output and the 128-bit key. This includes >> things like TCP sequence numbers. This seems pretty uncontroversial

RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread George Spelvin
David Laight wrote: > From: George Spelvin ... >> uint32_t >> hsiphash24(char const *in, size_t len, uint32_t const key[2]) >> { >> uint32_t c = key[0]; >> uint32_t d = key[1]; >> uint32_t a = 0x6c796765 ^ 0x736f6d65; >> uint32_t

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
To follow up on my comments that your benchmark results were peculiar, here's my benchmark code. It just computes the hash of all n*(n+1)/2 possible non-empty substrings of a buffer of n (called "max" below) bytes. "cpb" is "cycles per byte". (The average length is (n+2)/3, c.f. https://oeis.org

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
BTW, here's some SipHash code I wrote for Linux a while ago. My target application was ext4 directory hashing, resulting in different implementation choices, although I still think that a rolled-up implementation like this is reasonable. Reducing I-cache impact speeds up the calling code. One th

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> I already did this. Check my branch. Do you think it should return "u32" (as you currently have it) or "unsigned long"? I thought the latter, since it doesn't cost any more and makes more > I wonder if this could also lead to a similar aliasing > with arch_get_random_int, since I'm pretty sur

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> 64-bit security for an RNG is not reasonable even with rekeying. No no > no. Considering we already have a massive speed-up here with the > secure version, there's zero reason to start weakening the security > because we're trigger happy with our benchmarks. No no no. Just to clarify, I was disc

Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
An idea I had which mght be useful: You could perhaps save two rounds in siphash_*u64. The final word with the length (called "b" in your implementation) only needs to be there if the input is variable-sized. If every use of a given key is of a fixed-size input, you don't need a length suffix.

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> What should we do with get_random_int() and get_random_long()? In > some cases it's being used in performance sensitive areas, and where > anti-DoS protection might be enough. In others, maybe not so much. This is tricky. The entire get_random_int() structure is an abuse of the hash function

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote: > I saw that jiffies addition in there and was wondering what it was all > about. It's currently added _after_ the siphash input, not before, to > keep with how the old algorithm worked. I'm not sure if this is > correct or if there's something wrong with that, as I haven'

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Tom Herbert wrote: > Tested this. Distribution and avalanche effect are still good. Speed > wise I see about a 33% improvement over siphash (20 nsecs/op versus 32 > nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer > to a more palatable replacement for jhash. Do we lose any sec

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and >> should be used always. Don't even compile the 32-bit code, to prevent >> anyone accidentally using it, and make hsiphash an alias for siphash. > Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I > l

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> It appears that hsiphash can produce either 32-bit output or 64-bit > output, with the output length parameter as part of the hash algorithm > in there. When I code this for my kernel patchset, I very likely will > only implement one output length size. Right now I'm leaning toward > 32-bit. A 1

RE: [PATCH v5 2/4] siphash: add Nu{32,64} helpers

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote: > Isn't that equivalent to: > v0 = key[0]; > v1 = key[1]; > v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL); > v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL); (Pre-XORing key[] with the first two constants which, if the

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
Jean-Philippe Aumasson wrote: > If a halved version of SipHash can bring significant performance boost > (with 32b words instead of 64b words) with an acceptable security level > (64-bit enough?) then we may design such a version. It would be fairly significant, a 2x speed benefit on a lot of 32-b

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> If a halved version of SipHash can bring significant performance boost > (with 32b words instead of 64b words) with an acceptable security level > (64-bit enough?) then we may design such a version. I was thinking if the key could be pushed to 80 bits, that would be nice, but honestly 64 bits is

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> While SipHash is extremely fast for a cryptographically secure function, > it is likely a tiny bit slower than the insecure jhash, and so replacements > will be evaluated on a case-by-case basis based on whether or not the > difference in speed is negligible and whether or not the current jhash u

Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-29 Thread George Spelvin
>> Also not mentioned in the documentation is that some algorithms *do* >> have different implementations depending on key size. SHA-2 is the >> classic example. > What do you mean by that? SHA has no keying at all. In this case, the analagous property is hash size. Sorry, I thought that was so

Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
> We have actually gained quite a bit of documentation recently. > Have you looked at Documentation/DocBook/crypto-API.tmpl? > > More is always welcome of course. It's improved since I last looked at it, but there are still many structures that aren't described: - struct crypto_instance - struct

Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
Herbert Xu wrote: > I'm currently working on cts and I'm removing the stack usage > altogether by having it operate on the src/dst SG lists only. Wow, I should see how you do that. I couldn't get it below 3 blocks of temporary, and the dst SG list only gives you one and a half. > BTW, the only c

Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
crypto_cipher_encrypt_one() or avoid stack buffers entirely. commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a Author: George Spelvin Date: Sat Oct 10 17:26:08 2015 -0400 crypto: cts - Reduce internal buffer usage It only takes a 3-block temporary buffer to handle all the tricky CTS cases

Re: [PATCH v4 net-next] net: Implement fast csum_partial for x86_64

2016-02-28 Thread George Spelvin
I was just noticing that these two: > +static inline unsigned long add64_with_carry(unsigned long a, unsigned long > b) > +{ > + asm("addq %2,%0\n\t" > + "adcq $0,%0" > + : "=r" (a) > + : "0" (a), "rm" (b)); > + return a; > +} > + > +static inline unsigne

RE: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-10 Thread George Spelvin
David Laight wrote: > Separate renaming allows: > 1) The value to tested without waiting for pending updates to complete. >Useful for IE and DIR. I don't quite follow. It allows the value to be tested without waiting for pending updates *of other bits* to complete. Obviusly, the update of th

RE: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-09 Thread George Spelvin
David Laight wrote: > Since adcx and adox must execute in parallel I clearly need to re-remember > how dependencies against the flags register work. I'm sure I remember > issues with 'false dependencies' against the flags. The issue is with flags register bits that are *not* modified by an instruc

Re: [PATCH v3 net-next] net: Implement fast csum_partial for x86_64

2016-02-08 Thread George Spelvin
David Laight wrote: > I'd need convincing that unrolling the loop like that gives any significant > gain. > You have a dependency chain on the carry flag so have delays between the > 'adcq' > instructions (these may be more significant than the memory reads from l1 > cache). If the carry chain