On Mon, Aug 10, 2020 at 11:04:55PM +0200, Willy Tarreau wrote:
> What could be improved is the way the input values are mixed (just
> added hence commutative for now). I didn't want to call a siphash
> round on the hot paths, but just shifting the previous noise value
> before adding would work, su
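For concreteness, the rotate-before-add mix described above would be something like this (untested sketch; rol32 as in <linux/bitops.h>, and the rotate count here is arbitrary):

static inline u32 mix_noise(u32 noise, u32 sample)
{
	/* Rotating before adding makes the mix order-dependent, so
	 * the same samples arriving in a different order produce a
	 * different result (plain addition is commutative). */
	return rol32(noise, 7) + sample;
}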
On Mon, Aug 10, 2020 at 01:47:00PM +0200, Willy Tarreau wrote:
> except that I retrieve it only on 1/8 calls
> and use the previous noise in this case.
Er... that's quite different. I was saying you measure them all, and do:
struct siprand_state {
	...
+	uint32_t noise[i];
+
On Sun, Aug 09, 2020 at 07:33:03PM +0200, Willy Tarreau wrote:
> Not that low in fact because they don't know precisely when the call is
> made. I mean, let's say we're in the worst case, with two VMs running on
> two siblings of the same core, with the same TSC, on a 3 GHz machine. The
> attacker
On Sun, Aug 09, 2020 at 11:38:05AM +0200, Willy Tarreau wrote:
> So I gave it a quick test under Qemu and it didn't show any obvious
> performance difference compared to Tausworthe, which is a good thing,
> even though there's a significant amount of measurement noise in each
> case.
Thank you ver
On Sun, Aug 09, 2020 at 11:26:31AM +0300, Amit Klein wrote:
> BITS_PER_LONG==23 ???
> Should be ==32, I guess.
Of course. I've been testing on a 64-bit system so hadn't
exercised that branch yet, and it's a case of "seeing what
I expect to see".
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Replaces: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin
---
lib/random32.c | 437 ++--
On Sat, Aug 08, 2020 at 01:52:37PM -0700, Linus Torvalds wrote:
> On Sat, Aug 8, 2020 at 1:47 PM George Spelvin wrote:
>> I *just* finished explaining, using dribs and drabs of entropy allows an
>> *information theoretical attack* which *no* crypto can prevent.
>
> Th
On Sat, Aug 08, 2020 at 12:49:30PM -0700, Andy Lutomirski wrote:
> I don't care about throwing this stuff away. My plan (not quite
> implemented yet) is to have a percpu RNG stream and to never do anything
> resembling mixing anything in. The stream is periodically discarded and
> reinitialized
On Sat, Aug 08, 2020 at 09:18:27PM +0200, Florian Westphal wrote:
> Can't we keep prandom_u32 as-is...? Most of the usage, esp. in the
> packet schedulers, is fine.
>
> I'd much rather have a prandom_u32_hashed() or whatever for
> those cases where some bits might leak to the outside and then con
On Sat, Aug 08, 2020 at 11:19:01AM -0700, Linus Torvalds wrote:
> If siphash is a good enough hash to make the pseudo-random state hard
> to guess, then it's also a good enough hash to hide the small part of
> the fast-pool we mix in.
No!
No, no, a thousand times no.
I *just* finished explaining
On Sat, Aug 08, 2020 at 07:44:51PM +0200, Willy Tarreau wrote:
> On Sat, Aug 08, 2020 at 03:26:28PM +0000, George Spelvin wrote:
>> So any discussion of reseeding begins by *assuming* an attacker has
>> captured the PRNG state. If we weren't worried about this possibility,
>
On Sat, Aug 08, 2020 at 10:07:51AM -0700, Andy Lutomirski wrote:
>>- Cryptographically strong ChaCha, batched
>>- Cryptographically strong ChaCha, with anti-backtracking.
>
> I think we should just anti-backtrack everything. With the "fast key
> erasure" construction, already implemented
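For reference, the "fast key erasure" shape is roughly this (a minimal standalone sketch, not the proposed kernel code; standard ChaCha20 block function, little-endian assumed in the memcpys):

#include <stdint.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

static void qround(uint32_t x[16], int a, int b, int c, int d)
{
	x[a] += x[b]; x[d] ^= x[a]; x[d] = ROTL32(x[d], 16);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = ROTL32(x[b], 12);
	x[a] += x[b]; x[d] ^= x[a]; x[d] = ROTL32(x[d], 8);
	x[c] += x[d]; x[b] ^= x[c]; x[b] = ROTL32(x[b], 7);
}

/* Standard ChaCha20 block function: 10 double rounds, feed-forward add. */
static void chacha20_block(uint32_t state[16], uint8_t out[64])
{
	uint32_t x[16];
	int i;

	memcpy(x, state, sizeof(x));
	for (i = 0; i < 10; i++) {
		qround(x, 0, 4,  8, 12); qround(x, 1, 5,  9, 13);
		qround(x, 2, 6, 10, 14); qround(x, 3, 7, 11, 15);
		qround(x, 0, 5, 10, 15); qround(x, 1, 6, 11, 12);
		qround(x, 2, 7,  8, 13); qround(x, 3, 4,  9, 14);
	}
	for (i = 0; i < 16; i++)
		x[i] += state[i];
	memcpy(out, x, sizeof(x));
	state[12]++;			/* advance the block counter */
}

/*
 * Fast key erasure: of each 64-byte block, the first 32 bytes
 * immediately *replace* the key (state words 4..11) and only the
 * remaining 32 are handed out, so an attacker who later captures
 * the state cannot run it backward to recover earlier outputs.
 */
static void fke_refill(uint32_t state[16], uint8_t batch[32])
{
	uint8_t block[64];

	chacha20_block(state, block);
	memcpy(&state[4], block, 32);	/* new key overwrites the old */
	memcpy(batch, block + 32, 32);	/* usable random bytes */
	memset(block, 0, sizeof(block));/* erase the stray key copy */
}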
I'm quite annoyed at the way this discussion has drifted away from
the main problem: f227e3ec3b5c is broken and needs to be reverted.
The original bug report is accurate, and this is a critical issue.
My proposed fix (patch imminent) is described below.
THE PROBLEM:
The reseeding desi
On Wed, 27 Mar 2019 at 14:32:55 -0400, Hannes Frederic Sowa wrote:
> On Tue, Mar 26, 2019, at 20:07, George Spelvin wrote:
>> On Tue, 26 Mar 2019 at 14:03:55 -0400, Hannes Frederic Sowa wrote:
>>> Are you trying to fix the modulo biases? I think that prandom_u32_max
>>>
On Tue, 26 Mar 2019 at 14:03:55 -0400, Hannes Frederic Sowa wrote:
> On Tue, Mar 26, 2019, at 12:10, George Spelvin wrote:
> The conversation definitely makes sense.
I also fixed the seeding of prandom; it now uses the
random_ready_callback system.
> Are you trying to fix t
> A little backstory. I started the prandom_32 stuff long ago when I wanted
> a better (longer period) PRNG for use in netem. Took the existing code from
> older version of GNU scientific library (pre GPLv3). If there is something
> faster with better properties go for it. But the whole point of
Apologies... I severely mangled the headers in the previous
message, feeding "Daniel Borkmann " to a
parser which split on whitespace and thought that was three separate
recipients. :-(
To limit the damage, please either reply to *this* message or trim the
headers manually. Sorry about that.
I started on a project to correct all of the instances of
"prandom_u32() % FOO" in the kernel (there are lots)
to "prandom_u32_max(FOO)".
This has snowballed into an ongoing audit of random number use in the
kernel, including uses of get_random_bytes() when get_random_u32 or
prandom_u32() would su
around.
> On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote:
>> For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
>> (Because each mixback can be guessed and verified separately.)
> Exactly, but the full reseed after running out of entropy is strong
>
Hannes Frederic Sowa wrote:
> On 24.12.2016 00:39, George Spelvin wrote:
>> We just finished discussing why 8 bytes isn't enough. If you only
>> feed back 8 bytes, an attacker who can do 2^64 computation can find it
>> (by guessing and computing forward to verify
Hannes Frederic Sowa wrote:
> In general this looks good, but bitbuffer needs to be protected from
> concurrent access, thus needing at least some atomic instruction and
> disabling of interrupts for the locking if done outside of
> get_random_long. Thus I liked your previous approach more where yo
(Cc: list trimmed slightly as the topic is wandering a bit.)
Hannes Frederic Sowa wrote:
> On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
>> Adding might_lock() annotations will improve coverage a lot.
>
> Might be hard to find the correct lock we take later down the code
Hannes Frederic Sowa wrote:
> A lockdep test should still be done. ;)
Adding might_lock() annotations will improve coverage a lot.
> Yes, that does look nice indeed. Accounting for bits instead of bytes
> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> case you can't sat
> I do tend to like Ted's version in which we use batched
> get_random_bytes() output. If it's fast enough, it's simpler and lets
> us get the full strength of a CSPRNG.
With the ChaCha20 generator, that's fine, although note that this abandons
anti-backtracking entirely.
It also takes locks, so
> Having slept on this, I like it less. The problem is that a
> backtracking attacker doesn't just learn H(random seed || entropy_0 ||
> secret || ...) -- they learn the internal state of the hash function
> that generates that value. This probably breaks any attempt to apply
> security propertie
> True, but it's called get_random_int(), and it seems like making it
> stronger, especially if the performance cost is low to zero, is a good
> thing.
If it's cheap enough, I don't mind. But it's documented as being
marginal-quality, used where speed is more important.
In particular, it's *not*
Andy Lutomirski wrote:
> I don't even think it needs that. This is just adding a
> non-destructive final operation, right?
It is, but the problem is that SipHash is intended for *small* inputs,
so the standard implementations aren't broken into init/update/final
functions.
There's just one big f
> Plus the benchmark was bogus anyway, and when I built a more specific
> harness -- actually comparing the TCP sequence number functions --
> SipHash was faster than MD5, even on register starved x86. So I think
> we're fine and this chapter of the discussion can come to a close, in
> order to mov
As a separate message, to disentangle the threads, I'd like to
talk about get_random_long().
After some thinking, I still like the "state-preserving" construct
that's equivalent to the current MD5 code. Yes, we could just do
siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
pr
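Spelled out, that stateless construction is roughly the following
(sketch only; siphash_1u64() as in the proposed siphash.h, the other
names illustrative):

#include <linux/siphash.h>
#include <linux/percpu.h>
#include <linux/smp.h>

static siphash_key_t net_rand_key __read_mostly;	/* seeded at boot */
static DEFINE_PER_CPU(u64, get_random_ctr);

unsigned long get_random_long(void)
{
	/* CPU number in the top bits keeps the per-CPU counter
	 * streams disjoint (assumes the counter stays below 2^48). */
	u64 in = ((u64)raw_smp_processor_id() << 48) |
		 this_cpu_inc_return(get_random_ctr);

	return siphash_1u64(in, &net_rand_key);
}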
Theodore Ts'o wrote:
> On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote:
>> SipHash annihilates the competition on 64-bit superscalar hardware.
>> SipHash dominates the field on 64-bit in-order hardware.
>> SipHash wins easily on 32-bit hardware *wit
Eric Dumazet wrote:
> Now I am quite confused.
>
> George said :
>> Cycles per byte on 1024 bytes of data:
>>                   Pentium 4   Core 2 Duo   Ivy Bridge
>> SipHash-2-4            38.9          8.3          5.8
>> HalfSipHash-2-4        12.7          4.5          3.2
>> MD5
Linus wrote:
>> How much does kernel_fpu_begin()/kernel_fpu_end() cost?
>
> It's now better than it used to be, but it's absolutely disastrous
> still. We're talking easily many hundreds of cycles. Under some loads,
> thousands.
I think I've been thoroughly dissuaded, but just to clarify one thing
Actually, DJB just made a very relevant suggestion.
As I've mentioned, the 32-bit performance problems are an x86-specific
problem. ARM does very well, and other processors aren't bad at all.
SipHash fits very nicely (and runs very fast) in the MMX registers.
They're 64 bits, and there are 8 of
Eric Dumazet wrote:
> On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>>                   Pentium 4   Core 2 Duo   Ivy Bridge
>> SipHash-2-4            38.9          8.3          5.8
>>
> I do not see why SipHash, if faster than MD5 and more secure, would be a
> problem.
Because on 32-bit x86, it's slower.
Cycles per byte on 1024 bytes of data:
                  Pentium 4   Core 2 Duo   Ivy Bridge
SipHash-2-4            38.9          8.3          5.8
Theodore Ts'o wrote:
> On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote:
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial
David Laight wrote:
> From: George Spelvin
...
>> uint32_t
>> hsiphash24(char const *in, size_t len, uint32_t const key[2])
>> {
>> 	uint32_t c = key[0];
>> 	uint32_t d = key[1];
>> 	uint32_t a = 0x6c796765 ^ 0x736f6d65;
>> 	uint32_t
To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.
It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes. "cpb" is "cycles per byte".
(The average length is (n+2)/3, c.f. https://oeis.org
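The heart of the harness is just the following (simplified sketch;
siphash24() stands in for whichever hash is under test, and the
cycle-counter reads are omitted):

#include <stdint.h>
#include <stddef.h>

/* Prototype of the hash under test (assumed). */
uint64_t siphash24(const char *in, size_t len, const uint64_t key[2]);

/* Hash all max*(max+1)/2 non-empty substrings of buf[0..max).
 * Total bytes hashed is max*(max+1)*(max+2)/6, which is where the
 * (n+2)/3 average length above comes from. */
static uint64_t bench_one(const char *buf, size_t max,
			  const uint64_t key[2])
{
	uint64_t sum = 0;
	size_t start, len;

	for (start = 0; start < max; start++)
		for (len = 1; len <= max - start; len++)
			sum += siphash24(buf + start, len, key);
	return sum;	/* consumed, so the hashing can't be optimized away */
}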
BTW, here's some SipHash code I wrote for Linux a while ago.
My target application was ext4 directory hashing, resulting in different
implementation choices, although I still think that a rolled-up
implementation like this is reasonable. Reducing I-cache impact speeds
up the calling code.
One th
> I already did this. Check my branch.
Do you think it should return "u32" (as you currently have it) or
"unsigned long"? I thought the latter, since it doesn't cost any
more and makes more
> I wonder if this could also lead to a similar aliasing
> with arch_get_random_int, since I'm pretty sur
> 64-bit security for an RNG is not reasonable even with rekeying. No no
> no. Considering we already have a massive speed-up here with the
> secure version, there's zero reason to start weakening the security
> because we're trigger happy with our benchmarks. No no no.
Just to clarify, I was disc
An idea I had which might be useful:
You could perhaps save two rounds in siphash_*u64.
The final word with the length (called "b" in your implementation)
only needs to be there if the input is variable-sized.
If every use of a given key is of a fixed-size input, you don't need
a length suffix.
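Concretely, for a fixed 2 x u64 input that would look like this
(sketch using the reference SipHash constants and SIPROUND; the
caveat is that the key must then never be used on any other input
size):

#include <stdint.h>

#define ROL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

#define SIPROUND do { \
	v0 += v1; v1 = ROL64(v1, 13); v1 ^= v0; v0 = ROL64(v0, 32); \
	v2 += v3; v3 = ROL64(v3, 16); v3 ^= v2; \
	v0 += v3; v3 = ROL64(v3, 21); v3 ^= v0; \
	v2 += v1; v1 = ROL64(v1, 17); v1 ^= v2; v2 = ROL64(v2, 32); \
} while (0)

/* SipHash-2-4 on exactly two words, length block omitted. */
static uint64_t siphash_2u64_fixed(uint64_t a, uint64_t b,
				   const uint64_t key[2])
{
	uint64_t v0 = key[0] ^ 0x736f6d6570736575ULL;
	uint64_t v1 = key[1] ^ 0x646f72616e646f6dULL;
	uint64_t v2 = key[0] ^ 0x6c7967656e657261ULL;
	uint64_t v3 = key[1] ^ 0x7465646279746573ULL;

	v3 ^= a; SIPROUND; SIPROUND; v0 ^= a;
	v3 ^= b; SIPROUND; SIPROUND; v0 ^= b;
	/* The "b = len << 56" block would go here; omitting it
	 * saves its two compression rounds. */
	v2 ^= 0xff;
	SIPROUND; SIPROUND; SIPROUND; SIPROUND;
	return v0 ^ v1 ^ v2 ^ v3;
}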
> What should we do with get_random_int() and get_random_long()? In
> some cases it's being used in performance sensitive areas, and where
> anti-DoS protection might be enough. In others, maybe not so much.
This is tricky. The entire get_random_int() structure is an abuse of
the hash function
Jason A. Donenfeld wrote:
> I saw that jiffies addition in there and was wondering what it was all
> about. It's currently added _after_ the siphash input, not before, to
> keep with how the old algorithm worked. I'm not sure if this is
> correct or if there's something wrong with that, as I haven'
Tom Herbert wrote:
> Tested this. Distribution and avalanche effect are still good. Speed
> wise I see about a 33% improvement over siphash (20 nsecs/op versus 32
> nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer
> to a more palatable replacement for jhash. Do we lose any sec
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and
>> should be used always. Don't even compile the 32-bit code, to prevent
>> anyone accidentally using it, and make hsiphash an alias for siphash.
> Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I
> l
> It appears that hsiphash can produce either 32-bit output or 64-bit
> output, with the output length parameter as part of the hash algorithm
> in there. When I code this for my kernel patchset, I very likely will
> only implement one output length size. Right now I'm leaning toward
> 32-bit.
A 1
Jason A. Donenfeld wrote:
> Isn't that equivalent to:
> v0 = key[0];
> v1 = key[1];
> v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL);
> v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL);
(Pre-XORing key[] with the first two constants which, if the
Jean-Philippe Aumasson wrote:
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.
It would be fairly significant, a 2x speed benefit on a lot of 32-b
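The halved round itself is a direct transcription of the 64-bit one
(sketch; the rotation counts shown here are the ones the eventual
HalfSipHash specification settled on):

#include <stdint.h>

#define ROL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Same structure as the 64-bit SipRound, with the words and the
 * rotation counts halved; operates on uint32_t v0..v3. */
#define HSIPROUND do { \
	v0 += v1; v1 = ROL32(v1,  5); v1 ^= v0; v0 = ROL32(v0, 16); \
	v2 += v3; v3 = ROL32(v3,  8); v3 ^= v2; \
	v0 += v3; v3 = ROL32(v3,  7); v3 ^= v0; \
	v2 += v1; v1 = ROL32(v1, 13); v1 ^= v2; v2 = ROL32(v2, 16); \
} while (0)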
> If a halved version of SipHash can bring significant performance boost
> (with 32b words instead of 64b words) with an acceptable security level
> (64-bit enough?) then we may design such a version.
I was thinking if the key could be pushed to 80 bits, that would be nice,
but honestly 64 bits is
> While SipHash is extremely fast for a cryptographically secure function,
> it is likely a tiny bit slower than the insecure jhash, and so replacements
> will be evaluated on a case-by-case basis based on whether or not the
> difference in speed is negligible and whether or not the current jhash u
>> Also not mentioned in the documentation is that some algorithms *do*
>> have different implementations depending on key size. SHA-2 is the
>> classic example.
> What do you mean by that? SHA has no keying at all.
In this case, the analogous property is hash size. Sorry, I thought
that was so
> We have actually gained quite a bit of documentation recently.
> Have you looked at Documentation/DocBook/crypto-API.tmpl?
>
> More is always welcome of course.
It's improved since I last looked at it, but there are still many structures
that aren't described:
- struct crypto_instance
- struct
Herbert Xu wrote:
> I'm currently working on cts and I'm removing the stack usage
> altogether by having it operate on the src/dst SG lists only.
Wow, I should see how you do that. I couldn't get it below 3
blocks of temporary, and the dst SG list only gives you
one and a half.
> BTW, the only c
crypto_cipher_encrypt_one() or avoid stack buffers entirely.
commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a
Author: George Spelvin
Date: Sat Oct 10 17:26:08 2015 -0400
crypto: cts - Reduce internal buffer usage
It only takes a 3-block temporary buffer to handle all the tricky
CTS cases
I was just noticing that these two:
> +static inline unsigned long add64_with_carry(unsigned long a, unsigned long b)
> +{
> +	asm("addq %2,%0\n\t"
> +	    "adcq $0,%0"
> +	    : "=r" (a)
> +	    : "0" (a), "rm" (b));
> +	return a;
> +}
> +
> +static inline unsigne
David Laight wrote:
> Separate renaming allows:
> 1) The value to tested without waiting for pending updates to complete.
>Useful for IE and DIR.
I don't quite follow. It allows the value to be tested without waiting
for pending updates *of other bits* to complete.
Obviously, the update of th
David Laight wrote:
> Since adcx and adox must execute in parallel I clearly need to re-remember
> how dependencies against the flags register work. I'm sure I remember
> issues with 'false dependencies' against the flags.
The issue is with flags register bits that are *not* modified by
an instruc
David Laight wrote:
> I'd need convincing that unrolling the loop like that gives any significant
> gain.
> You have a dependency chain on the carry flag so have delays between the
> 'adcq'
> instructions (these may be more significant than the memory reads from l1
> cache).
If the carry chain