[PATCH v3] ubsan: Avoid unnecessary 128-bit shifts

2019-04-04 Thread George Spelvin
discourages people from adding inefficient code. Note that the shifts in (unsigned, and by a compile-time constant amount) are simpler and generated inline. Signed-off-by: George Spelvin Acked-By: Andrey Ryabinin Feedback-from: Rasmus Villemoes Cc: linux-s...@vger.kernel.org Cc: Heiko Carstens --- include
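For context on why the patch matters: a variable-count 128-bit shift on a 64-bit target must branch on whether the count crosses the 64-bit boundary, and typically compiles to an out-of-line libgcc helper call such as __ashlti3. The hand-split version below is an illustrative sketch of that cost, not code from the patch:

```c
/* Illustrative sketch (not the patch's code): a variable-count
 * 128-bit left shift done with 64-bit operations.  The
 * count-dependent branch is exactly what the compiler must emit,
 * often as an out-of-line __ashlti3 call. */
struct u128 { unsigned long long lo, hi; };

static struct u128 shl128(struct u128 v, unsigned int n)
{
	struct u128 r;

	if (n == 0) {
		r = v;
	} else if (n < 64) {		/* shift stays within each half */
		r.hi = (v.hi << n) | (v.lo >> (64 - n));
		r.lo = v.lo << n;
	} else {			/* count crosses the 64-bit boundary */
		r.hi = v.lo << (n - 64);
		r.lo = 0;
	}
	return r;
}
```

By contrast, a shift that is unsigned and by a compile-time constant, as the commit message notes, reduces at compile time to one of these branches and is generated inline.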

[PATCH v2] ubsan: Avoid unnecessary 128-bit shifts

2019-04-02 Thread George Spelvin
absence discourages people from adding inefficient code. Note that the shifts in (unsigned, and by a compile-time constant amount) are simpler and generated inline. Signed-off-by: George Spelvin Acked-By: Andrey Ryabinin Cc: linux-s...@vger.kernel.org Cc: Heiko Carstens --- lib/ubsan

[PATCH] ubsan: Avoid unnecessary 128-bit shifts

2019-04-01 Thread George Spelvin
s in different subsystems, and that needs handling. Option 1: Submit this for 5.2 and turn on INT128 for s390 in 5.3. Option 2: Let the arches cherry-pick this patch pre-5.2. My preference is for option 2, but that requires permission from ubsan's owner. Andrey? Signed-off-by: George Spelvin

[PATCH 6/5] lib/list_sort: Fix GCC warning

2019-03-28 Thread George Spelvin
doesn't actually help GCC 8.3's code generation, so just omit it. Signed-off-by: George Spelvin Fixes: 820c81be5237 ("lib/list_sort: simplify and remove MAX_LIST_LENGTH_BITS") Cc: Andrew Morton Cc: Stephen Rothwell --- lib/list_sort.c | 14 +- 1 file changed, 9 insert

Re: [RESEND PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-28 Thread George Spelvin
Thank you all for the build warning report. The warning produced by gcc versions 4.9, 7.3, 8.1, whatever version Stephen Rothwell is running, is: lib/list_sort.c:17:36: warning: __pure__ attribute ignored [-Wattributes] The relevant code is: 10: /* 11: * By declaring the compare function with the

Re: [RFC PATCH v2] random: add get_random_max() function

2019-03-28 Thread George Spelvin
By the way, I just noticed that my fallback get_random_max64() algorithm (if there's no __int128 type) is completely broken and will need rewriting. It would work if I rejected and regenerated the high half if the low half were out of range, but that's not what it does. The worst case is a range

Re: [RFC PATCH] random: add get_random_max() function

2019-03-24 Thread George Spelvin
P.S. The cited paper calls your algorithm the "OpenBSD algorithm" and has a bunch of benchmarks comparing it to others in Fisher-Yates shuffles of sizes 1e3..1e9. Including all overhead (base PRNG, shuffle), it's 3x slower for 32-bit operations and 8x slower for 64-bit up to arrays of size 1e6, af

Re: [RFC PATCH] random: add get_random_max() function

2019-03-24 Thread George Spelvin
On Sun, 24 Mar 2019 at 21:47:50 +0100, Jason A. Donenfeld wrote:
> I generally use a slightly simpler algorithm in various different projects:
>
> //[0, bound)
> static unsigned long random_bounded(unsigned long bound)
> {
>	unsigned long ret;
>	const unsigned long max_mod_bound = (1
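The quoted snippet is cut off above. As a hedged sketch (not Jason's exact code), the general rejection-sampling construction being discussed looks like the following, with rand64() standing in for a uniform 64-bit source (a hypothetical xorshift here; real code would use something like get_random_u64()):

```c
#include <stdint.h>

/* xorshift64 stands in for a uniform random source; this is an
 * assumption for the sketch, not the kernel's generator. */
static uint64_t prng_state = 0x123456789abcdef1ULL;
static uint64_t rand64(void)
{
	prng_state ^= prng_state << 13;
	prng_state ^= prng_state >> 7;
	prng_state ^= prng_state << 17;
	return prng_state;
}

/* Rejection sampling: return a uniform value in [0, bound).
 * bound must be nonzero.  min is 2^64 mod bound, computed as
 * (0 - bound) % bound in 64-bit arithmetic; rejecting r < min
 * leaves an exact multiple of bound equally likely values, so
 * r % bound is unbiased. */
static uint64_t random_bounded(uint64_t bound)
{
	uint64_t min = (0 - bound) % bound;
	uint64_t r;

	do
		r = rand64();
	while (r < min);
	return r % bound;
}
```

The expected number of iterations is below 2 for any bound, since fewer than half of all 64-bit values can be rejected.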

[RESEND PATCH v2 3/5] lib/sort: Avoid indirect calls to built-in swap

2019-03-19 Thread George Spelvin
ction, there are a few extra predicted branches on the code path.) This actually *shrinks* the x86-64 code, because it inlines the various swap functions inside do_swap, eliding function prologues & epilogues. x86-64 code size 767 -> 703 bytes (-64) Signed-off-by: George Spelvin Acke

[RESEND PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller

2019-03-19 Thread George Spelvin
I'm running this code right now, with CONFIG_TEST_SORT and CONFIG_TEST_LIST_SORT, but I confess I haven't rebooted since the last round of minor edits to quell checkpatch. I figure there will be at least one round of comments and final testing. George Spelvin (5): lib/sort: Make swap

[RESEND PATCH v2 1/5] lib/sort: Make swap functions more generic

2019-03-19 Thread George Spelvin
the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Andy Shevchenko Feedback-from: Rasmus Villemoes Feedbac

[RESEND PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-19 Thread George Spelvin
feature. x86-64 code size 1086 -> 739 bytes (-347) (Yes, I see checkpatch complaining about no space after comma in "__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Rasmus Villemoes Feedback-from: Andy

[RESEND PATCH v2 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-19 Thread George Spelvin
6 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380 Queue-Mergesort Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 https://doi.org/10.1016/0020-0190(93)90088-q https://sci-hub.tw/10.1016/0020-0190(93)90088-Q Signed-off-by:

[RESEND PATCH v2 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-19 Thread George Spelvin
ode size 885 -> 767 bytes (-118) (I see the checkpatch complaint about "else if (n -= size)". The alternative is significantly uglier.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov --- lib/sort.c | 110 ++--- 1 file changed, 8

[PATCH v2 0/5] lib/sort & lib/list_sort: faster and smaller

2019-03-15 Thread George Spelvin
edits to quell checkpatch. I figure there will be at least one round of comments and final testing. George Spelvin (5): lib/sort: Make swap functions more generic lib/sort: Use more efficient bottom-up heapsort variant lib/sort: Avoid indirect calls to built-in swap lib/list_sort: Simplify

[PATCH v2 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
feature. x86-64 code size 1086 -> 739 bytes (-347) (Yes, I see checkpatch complaining about no space after comma in "__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Rasmus Villemoes Feedback-from: Andy

[PATCH v2 3/5] lib/sort: Avoid indirect calls to built-in swap

2019-03-15 Thread George Spelvin
ction, there are a few extra predicted branches on the code path.) This actually *shrinks* the x86-64 code, because it inlines the various swap functions inside do_swap, eliding function prologues & epilogues. x86-64 code size 767 -> 703 bytes (-64) Signed-off-by: George Spelvin Acke

[PATCH v2 1/5] lib/sort: Make swap functions more generic

2019-03-15 Thread George Spelvin
the great tradition of bikeshedding, the names were by far the most contentious issue during review of this patch series. x86-64 code size 872 -> 886 bytes (+14) Signed-off-by: George Spelvin Acked-by: Andrey Abramov Feedback-from: Andy Shevchenko Feedback-from: Rasmus Villemoes Feedbac

[PATCH v2 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-15 Thread George Spelvin
ode size 885 -> 767 bytes (-118) (I see the checkpatch complaint about "else if (n -= size)". The alternative is significantly uglier.) Signed-off-by: George Spelvin Acked-by: Andrey Abramov --- lib/sort.c | 110 ++--- 1 file changed, 8

[PATCH v2 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-15 Thread George Spelvin
6 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380 Queue-Mergesort Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 https://doi.org/10.1016/0020-0190(93)90088-q https://sci-hub.tw/10.1016/0020-0190(93)90088-Q Signed-off-by:

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
Indeed, thanks to everyone who commented. The extra conceptual complexity and reduced readability are Just Not Worth It. v2 (and final, as far as I'm concerned) follows.

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
On Fri, 15 Mar 2019 at 13:57:05 +0100, Geert Uytterhoeven wrote: > On Fri, Mar 15, 2019 at 11:23 AM George Spelvin wrote: >> On Fri, 15 Mar 2019 at 09:20:58 +0100, Geert Uytterhoeven wrote: >>> On Fri, Mar 15, 2019 at 5:33 AM George Spelvin wrote: >>>> One question

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-15 Thread George Spelvin
On Fri, 15 Mar 2019 at 09:20:58 +0100, Geert Uytterhoeven wrote: > On Fri, Mar 15, 2019 at 5:33 AM George Spelvin wrote: >> On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote: >>> On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote: >>>> +

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-14 Thread George Spelvin
On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote: > On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote:
>> +	for (bit = 1; count & bit; bit <<= 1) {
>> +		cur = merge(priv, (cmp_func)cmp, pending, cur);
>> +
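The quoted loop merges one pending run per trailing 1 bit of count, exactly like carry propagation in a binary increment, which keeps runs at sizes 1, 2, 4, ... and merges balanced. A self-contained sketch of that discipline follows (simplified: the actual patch defers some merges for better balance, and the names here are illustrative, not the kernel's):

```c
#include <stddef.h>

struct node { int val; struct node *next; };

/* Stable merge of two sorted singly linked lists. */
static struct node *merge(struct node *a, struct node *b)
{
	struct node head, *tail = &head;

	while (a && b) {
		if (a->val <= b->val) {
			tail->next = a;
			a = a->next;
		} else {
			tail->next = b;
			b = b->next;
		}
		tail = tail->next;
	}
	tail->next = a ? a : b;
	return head.next;
}

/* Sketch of the merge-on-carry scheme in the quoted loop. */
static struct node *list_sort_sketch(struct node *list)
{
	struct node *pending[32] = { NULL };	/* pending[i]: run of 2^i nodes */
	size_t count = 0;			/* nodes consumed so far */
	unsigned int i;

	while (list) {
		struct node *cur = list;
		unsigned int bit;

		list = list->next;
		cur->next = NULL;
		/* One merge per trailing 1 bit of count: binary carry. */
		for (bit = 0; count & ((size_t)1 << bit); bit++) {
			cur = merge(pending[bit], cur);
			pending[bit] = NULL;
		}
		pending[bit] = cur;
		count++;
	}
	/* Merge the leftover runs, smallest first. */
	list = NULL;
	for (i = 0; i < 32; i++)
		if (pending[i])
			list = merge(pending[i], list);
	return list;
}
```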

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-14 Thread George Spelvin
>> swap_bytes / swap_4byte_words / swap_8byte_words
>> swap_bytes / swap_ints / swap_longs
>> swap_1 / swap_4 / swap_8
>> Pistols at dawn?
On Thu, 14 Mar 2019 at 22:59:55 +0300, Andrey Abramov wrote: > Yes, in my opinion, swap_bytes / swap_ints / swap_longs are the > most readable because we have

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-14 Thread George Spelvin
On Thu, 14 Mar 2019 at 11:41:26 +0100, Geert Uytterhoeven wrote: > On Thu, Mar 14, 2019 at 11:10 AM George Spelvin wrote: >> On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote: >>>> How about one of: >>>> swap_bytes / swap_ints / swap_longs >>>

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-14 Thread George Spelvin
On Sat, 09 Mar 2019 at 23:19:49 +0300, Andrey Abramov wrote:
>> Although I'm thinking of:
>>
>> static bool __attribute_const__
>> is_aligned(const void *base, size_t size, unsigned char align)
>> {
>> 	unsigned char lsbits = (unsigned char)size;
>>
>> 	(void)base;
>> #ifndef CONFIG_HAVE_EF
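The quoted helper is truncated at the #ifndef. A sketch of the complete function, reconstructed to match the version ultimately merged into lib/sort.c (so treat the details as an approximation, not the exact text being quoted):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the full is_aligned() helper: size alone decides
 * alignment on arches with efficient unaligned access; otherwise
 * the base pointer's low bits are folded in too.  The CONFIG_
 * macro is the kernel's; building this in userspace leaves it
 * undefined, taking the strict branch. */
static bool is_aligned(const void *base, size_t size, unsigned char align)
{
	unsigned char lsbits = (unsigned char)size;

	(void)base;
#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	lsbits |= (unsigned char)(uintptr_t)base;
#endif
	return (lsbits & (align - 1)) == 0;
}
```

Only the low byte of size and base matter, since align is at most a small power of 2; that is why lsbits can be a single unsigned char.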

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-14 Thread George Spelvin
On Thu, 14 Mar 2019 at 11:10:41 +0200, Andy Shevchenko wrote: > On Tue, Mar 05, 2019 at 03:06:44AM +0000, George Spelvin wrote:
>> +	/* Do merges corresponding to set lsbits in count */
>
>> +	for (bit = 1; count & bit; bit <<= 1) {
>> +

Re: [PATCH 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-13 Thread George Spelvin
On Thu, 14 Mar 2019 at 00:28:16 +0100, Rasmus Villemoes wrote: > On 05/03/2019 06.58, George Spelvin wrote: >> This patch avoids badly unbalanced merges on unlucky input sizes. >> It slightly increases the code size, but saves an average of 0.2*n >> calls to cmp(). >>

Re: [PATCH 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-13 Thread George Spelvin
On Wed, 13 Mar 2019 at 23:29:40 +0100, Rasmus Villemoes wrote: > On 21/02/2019 09.21, George Spelvin wrote: >> +/** >> + * parent - given the offset of the child, find the offset of the parent. >> + * @i: the offset of the heap element whose parent is sought. Non-z
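The parent() helper whose kerneldoc is quoted above computes the byte offset of a heap element's parent without dividing by the element size. A hedged sketch matching the merged lib/sort.c:

```c
#include <stddef.h>

/* Sketch of the bottom-up heapsort parent() helper (matching the
 * merged lib/sort.c; treat as an approximation of the text under
 * review).  For the child at byte offset i in an array of size-byte
 * elements, the parent is element (i/size - 1) / 2, i.e. offset
 * ((i/size - 1) / 2) * size.  Instead of dividing by size, subtract
 * size once, then subtract it again only when the quotient is odd
 * (detected via lsbit, the lowest set bit of size), so halving the
 * offset lands exactly on an element boundary.  Works for any size,
 * not just powers of 2. */
static size_t parent(size_t i, unsigned int lsbit, size_t size)
{
	i -= size;
	i -= size & -(i & lsbit);
	return i / 2;
}
```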

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-13 Thread George Spelvin
Thank you for your thoughtful comments! On Wed, 13 Mar 2019 at 23:23:44 +0100, Rasmus Villemoes wrote: > On 21/02/2019 07.30, George Spelvin wrote: > + * @align: required aignment (typically 4 or 8) > > typo aLignment Thanks; fixed! >> + * Returns true if elements can be copie

Re: [PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-10 Thread George Spelvin
Rasmus Villemoes wrote: > On 05/03/2019 04.06, George Spelvin wrote: >> + * (Actually, it is always called with @a being the element which was >> + * originally first, so it is not necessary to distinguish the @a < @b >> + * and @a == @b cases; the return value may be a

Re: [PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-09 Thread George Spelvin
Andy Shevchenko wrote: > Shouldn't simple memcpy cover these case for both 32- and 64-bit > architectures? Speaking of replacing swap with copying via temporary buffers, one idea that did come to mind was avoiding swap for sufficiently small objects. Every sift-down is actually a circular shift.
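The "every sift-down is actually a circular shift" observation can be sketched concretely: rotating k elements through one temporary buffer costs k+1 copies, versus roughly 3k/2 or more moves for pairwise swaps. A minimal illustration for three elements (names and the fixed buffer size are hypothetical, not from the thread):

```c
#include <string.h>

/* Sketch: a 3-element circular shift via one temporary buffer.
 * After the call, a holds the old c, b the old a, c the old b,
 * using four memcpy()s instead of two full swaps (six copies). */
static void rotate3(void *a, void *b, void *c, size_t size)
{
	unsigned char tmp[64];		/* assume size <= 64 for this sketch */

	memcpy(tmp, c, size);		/* save the displaced element */
	memcpy(c, b, size);
	memcpy(b, a, size);
	memcpy(a, tmp, size);		/* complete the circular shift */
}
```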

[PATCH 1/5] lib/sort: Make swap functions more generic

2019-03-08 Thread George Spelvin
, very few users of sort() sort pointers (or pointer-sized objects); most sort structures containing at least two words. (E.g. drivers/acpi/fan.c:acpi_fan_get_fps() sorts an array of 40-byte struct acpi_fan_fps.) x86-64 code size 872 -> 885 bytes (+13) Signed-off-by: George Spelvin --- lib

[PATCH 4/5] lib/list_sort: Simplify and remove MAX_LIST_LENGTH_BITS

2019-03-08 Thread George Spelvin
n "__attribute__((nonnull(2,3,4,5)))". Checkpatch is wrong.) Signed-off-by: George Spelvin --- include/linux/list_sort.h | 1 + lib/list_sort.c | 152 -- 2 files changed, 96 insertions(+), 57 deletions(-) diff --git a/include/linux/l

[PATCH 5/5] lib/list_sort: Optimize number of calls to comparison function

2019-03-08 Thread George Spelvin
1998.0986 https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.5380 Queue-Mergesort Mordecai J. Golin, Robert Sedgewick Information Processing Letters, 48(5):253--259, 10 December 1993 https://doi.org/10.1016/0020-0190(93)90088-q https://sci-hub.tw/10.1016/0020-0190(93)90088-Q

[PATCH 2/5] lib/sort: Use more efficient bottom-up heapsort variant

2019-03-08 Thread George Spelvin
770 bytes (-115) (I see the checkpatch complaint about "else if (n -= size)". The alternative is significantly uglier.) Signed-off-by: George Spelvin --- lib/sort.c | 102 +++-- 1 file changed, 75 insertions(+), 27 deletions(-) diff

[PATCH 3/5] lib/sort: Avoid indirect calls to built-in swap

2019-03-08 Thread George Spelvin
ction, there are a few additional (highly predictable) conditional branches on the code path.) This actually *shrinks* the x86-64 code, because it inlines the various swap functions inside do_swap, eliding function prologues & epilogues. x86-64 code size 770 -> 709 bytes (-61) Signed-of

[PATCH 0/5] lib/sort & lib/list_sort: faster and smaller

2019-03-08 Thread George Spelvin
checkpatch. I figure there will be at least one round of comments and final testing. George Spelvin (5): lib/sort: Make swap functions more generic lib/sort: Use more efficient bottom-up heapsort variant lib/sort: Avoid indirect calls to built-in swap lib/list_sort: Simplify and rem

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-28 Thread George Spelvin
around. > On Fri, 2016-12-23 at 20:17 -0500, George Spelvin wrote: >> For example, two mix-backs of 64 bits gives you 65 bit security, not 128. >> (Because each mixback can be guessed and verified separately.) > Exactly, but the full reseed after running out of entropy is strong

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote: > On 24.12.2016 00:39, George Spelvin wrote: >> We just finished discussing why 8 bytes isn't enough. If you only >> feed back 8 bytes, an attacker who can do 2^64 computation can find it >> (by guessing and computing forward to verify

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
Hannes Frederic Sowa wrote: > In general this looks good, but bitbuffer needs to be protected from > concurrent access, thus needing at least some atomic instruction and > disabling of interrupts for the locking if done outside of > get_random_long. Thus I liked your previous approach more where yo

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-23 Thread George Spelvin
(Cc: list trimmed slightly as the topic is wandering a bit.) Hannes Frederic Sowa wrote: > On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote: >> Adding might_lock() annotations will improve coverage a lot. > > Might be hard to find the correct lock we take later down the code

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
Hannes Frederic Sowa wrote: > A lockdep test should still be done. ;) Adding might_lock() annotations will improve coverage a lot. > Yes, that does look nice indeed. Accounting for bits instead of bytes > shouldn't be a huge problem either. Maybe it gets a bit more verbose in > case you can't sat

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> I do tend to like Ted's version in which we use batched > get_random_bytes() output. If it's fast enough, it's simpler and lets > us get the full strength of a CSPRNG. With the ChaCha20 generator, that's fine, although note that this abandons anti-backtracking entirely. It also takes locks, so

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> Having slept on this, I like it less. The problem is that a > backtracking attacker doesn't just learn H(random seed || entropy_0 || > secret || ...) -- they learn the internal state of the hash function > that generates that value. This probably breaks any attempt to apply > security propertie

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-22 Thread George Spelvin
> True, but it's called get_random_int(), and it seems like making it > stronger, especially if the performance cost is low to zero, is a good > thing. If it's cheap enough, I don't mind. But it's documented as being marginal-quality, used where speed is more important. In particular, it's *not*

Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)

2016-12-21 Thread George Spelvin
Andy Lutomirski wrote: > I don't even think it needs that. This is just adding a > non-destructive final operation, right? It is, but the problem is that SipHash is intended for *small* inputs, so the standard implementations aren't broken into init/update/final functions. There's just one big f

Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
> Plus the benchmark was bogus anyway, and when I built a more specific > harness -- actually comparing the TCP sequence number functions -- > SipHash was faster than MD5, even on register starved x86. So I think > we're fine and this chapter of the discussion can come to a close, in > order to mov

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
As a separate message, to disentangle the threads, I'd like to talk about get_random_long(). After some thinking, I still like the "state-preserving" construct that's equivalent to the current MD5 code. Yes, we could just do siphash(current_cpu || per_cpu_counter, global_key), but it's nice to pr

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Theodore Ts'o wrote: > On Wed, Dec 21, 2016 at 01:37:51PM -0500, George Spelvin wrote: >> SipHash annihilates the competition on 64-bit superscalar hardware. >> SipHash dominates the field on 64-bit in-order hardware. >> SipHash wins easily on 32-bit hardware *wit

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Eric Dumazet wrote: > Now I am quite confused. > George said:
>> Cycles per byte on 1024 bytes of data:
>>                   Pentium 4   Core 2 Duo   Ivy Bridge
>> SipHash-2-4          38.9        8.3          5.8
>> HalfSipHash-2-4      12.7        4.5          3.2
>> MD5

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Linus wrote: >> How much does kernel_fpu_begin()/kernel_fpu_end() cost? > > It's now better than it used to be, but it's absolutely disastrous > still. We're talking easily many hundreds of cycles. Under some loads, > thousands. I think I've been thoroughly dissuaded, but just to clarify one thing

Re: HalfSipHash Acceptable Usage

2016-12-21 Thread George Spelvin
Actually, DJB just made a very relevant suggestion. As I've mentioned, the 32-bit performance problems are an x86-specific problem. ARM does very well, and other processors aren't bad at all. SipHash fits very nicely (and runs very fast) in the MMX registers. They're 64 bits, and there are 8 of

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Eric Dumazet wrote: > On Tue, 2016-12-20 at 22:28 -0500, George Spelvin wrote:
>> Cycles per byte on 1024 bytes of data:
>>                   Pentium 4   Core 2 Duo   Ivy Bridge
>> SipHash-2-4          38.9        8.3          5.8
>>

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
> I do not see why SipHash, if faster than MD5 and more secure, would be a > problem. Because on 32-bit x86, it's slower. Cycles per byte on 1024 bytes of data:
                  Pentium 4   Core 2 Duo   Ivy Bridge
SipHash-2-4          38.9        8.3          5.8

Re: HalfSipHash Acceptable Usage

2016-12-20 Thread George Spelvin
Theodore Ts'o wrote: > On Mon, Dec 19, 2016 at 06:32:44PM +0100, Jason A. Donenfeld wrote: >> 1) Anything that requires actual long-term security will use >> SipHash2-4, with the 64-bit output and the 128-bit key. This includes >> things like TCP sequence numbers. This seems pretty uncontroversial

RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-19 Thread George Spelvin
David Laight wrote: > From: George Spelvin ...
>> uint32_t hsiphash24(char const *in, size_t len, uint32_t const key[2])
>> {
>> 	uint32_t c = key[0];
>> 	uint32_t d = key[1];
>> 	uint32_t a = 0x6c796765 ^ 0x736f6d65;
>> 	uint32_t

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
To follow up on my comments that your benchmark results were peculiar, here's my benchmark code. It just computes the hash of all n*(n+1)/2 possible non-empty substrings of a buffer of n (called "max" below) bytes. "cpb" is "cycles per byte". (The average length is (n+2)/3, c.f. https://oeis.org

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-17 Thread George Spelvin
BTW, here's some SipHash code I wrote for Linux a while ago. My target application was ext4 directory hashing, resulting in different implementation choices, although I still think that a rolled-up implementation like this is reasonable. Reducing I-cache impact speeds up the calling code. One th

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> I already did this. Check my branch. Do you think it should return "u32" (as you currently have it) or "unsigned long"? I thought the latter, since it doesn't cost any more and makes more > I wonder if this could also lead to a similar aliasing > with arch_get_random_int, since I'm pretty sur

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> 64-bit security for an RNG is not reasonable even with rekeying. No no > no. Considering we already have a massive speed-up here with the > secure version, there's zero reason to start weakening the security > because we're trigger happy with our benchmarks. No no no. Just to clarify, I was disc

Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
An idea I had which mght be useful: You could perhaps save two rounds in siphash_*u64. The final word with the length (called "b" in your implementation) only needs to be there if the input is variable-sized. If every use of a given key is of a fixed-size input, you don't need a length suffix.

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> What should we do with get_random_int() and get_random_long()? In > some cases it's being used in performance sensitive areas, and where > anti-DoS protection might be enough. In others, maybe not so much. This is tricky. The entire get_random_int() structure is an abuse of the hash function

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote: > I saw that jiffies addition in there and was wondering what it was all > about. It's currently added _after_ the siphash input, not before, to > keep with how the old algorithm worked. I'm not sure if this is > correct or if there's something wrong with that, as I haven'

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
Tom Herbert wrote: > Tested this. Distribution and avalanche effect are still good. Speed > wise I see about a 33% improvement over siphash (20 nsecs/op versus 32 > nsecs). That's about 3x of jhash speed (7 nsecs). So that might closer > to a more palatable replacement for jhash. Do we lose any sec

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
>> On a 64-bit machine, 64-bit SipHash is *always* faster than 32-bit, and >> should be used always. Don't even compile the 32-bit code, to prevent >> anyone accidentally using it, and make hsiphash an alias for siphash. > Fascinating! Okay. So I'll alias hsiphash to siphash on 64-bit then. I > l

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-16 Thread George Spelvin
> It appears that hsiphash can produce either 32-bit output or 64-bit > output, with the output length parameter as part of the hash algorithm > in there. When I code this for my kernel patchset, I very likely will > only implement one output length size. Right now I'm leaning toward > 32-bit. A 1

RE: [PATCH v5 2/4] siphash: add Nu{32,64} helpers

2016-12-16 Thread George Spelvin
Jason A. Donenfeld wrote: > Isn't that equivalent to:
> v0 = key[0];
> v1 = key[1];
> v2 = key[0] ^ (0x736f6d6570736575ULL ^ 0x646f72616e646f6dULL);
> v3 = key[1] ^ (0x646f72616e646f6dULL ^ 0x7465646279746573ULL);
(Pre-XORing key[] with the first two constants which, if the
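For reference, the standard SipHash state setup that the pre-XOR discussion starts from (a sketch of the reference construction only; the thread's variant folds constants into a pre-XORed stored key, and the quoted constant pairing above differs from this plain form):

```c
#include <stdint.h>

/* Standard SipHash initialization: the four state words are the
 * 128-bit key XORed with the ASCII constants spelling
 * "somepseudorandomlygeneratedbytes". */
static void siphash_init(uint64_t v[4], const uint64_t key[2])
{
	v[0] = key[0] ^ 0x736f6d6570736575ULL;	/* "somepseu" */
	v[1] = key[1] ^ 0x646f72616e646f6dULL;	/* "dorandom" */
	v[2] = key[0] ^ 0x6c7967656e657261ULL;	/* "lygenera" */
	v[3] = key[1] ^ 0x7465646279746573ULL;	/* "tedbytes" */
}
```

Because each constant is XORed with a key word, storing key words already combined with constants trades four XORs at hash time for a transformed key at setup time, which is the equivalence being asked about.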

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
Jean-Philippe Aumasson wrote: > If a halved version of SipHash can bring significant performance boost > (with 32b words instead of 64b words) with an acceptable security level > (64-bit enough?) then we may design such a version. It would be fairly significant, a 2x speed benefit on a lot of 32-b

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> If a halved version of SipHash can bring significant performance boost > (with 32b words instead of 64b words) with an acceptable security level > (64-bit enough?) then we may design such a version. I was thinking if the key could be pushed to 80 bits, that would be nice, but honestly 64 bits is

Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF

2016-12-15 Thread George Spelvin
> While SipHash is extremely fast for a cryptographically secure function, > it is likely a tiny bit slower than the insecure jhash, and so replacements > will be evaluated on a case-by-case basis based on whether or not the > difference in speed is negligible and whether or not the current jhash u

Re: [PATCH] siphash: add cryptographically secure hashtable function

2016-12-10 Thread George Spelvin
> There's a 32-bit secret random salt (inet_ehash_secret) which means > that in practice, inet_ehashfn() will select 1 out of 2^32 different > hash functions at random each time you boot the kernel; without > knowing which one it selected, how can a local or remote attacker can > force IPv4 connect

Re: next build: 143 builds: 1 failed, 142 passed, 1 error, 22 warnings (next-20160801)

2016-08-04 Thread George Spelvin
> As these patches were meant for generic non-m68k code, I didn't plan > to take them through my tree. Well, that explains everything. Sorry for letting it fall through the cracks. I assumed that since you send pull requests to Linus regularly, you'd send it directly. That's why I only acked th

Re: next build: 143 builds: 1 failed, 142 passed, 1 error, 22 warnings (next-20160801)

2016-08-01 Thread George Spelvin
Arnd Bergmann wrote: >> Warnings: >> lib/test_hash.c:224:7: warning: "HAVE_ARCH__HASH_32" is not defined >> [-Wundef] >> lib/test_hash.c:229:7: warning: "HAVE_ARCH_HASH_32" is not defined >> [-Wundef] >> lib/test_hash.c:234:7: warning: "HAVE_ARCH_HASH_64" is not defined >> [-Wundef]

Re: [PATCH] firmware: declare __{start,end}_builtin_fw as pointers

2016-06-28 Thread George Spelvin
+#define external_array(type, name)			\
+	({						\
+		extern type name[];			\
+		type *name_ptr = name;			\
+		asm ("" : "+r" (name_ptr));		\
+		name_ptr;				\
+	})
I've had to pull similar tricks to persuade GCC to generate the code I wanted (in m

Re: [LTP] [patch V2 00/20] timer: Refactor the timer wheel

2016-06-23 Thread George Spelvin
Cyril Hrubis wrote: > Thomas Gleixner wrote: >> Err. You know that the timer expired because sigtimedwait() returns >> EAGAIN. And the only thing you can reliably check for is that the timer did >> not expired to early. Anything else is guesswork and voodoo programming. > But seriously is there a

Re: [PATCH v5 0/7] /dev/random - a new approach

2016-06-20 Thread George Spelvin
> With that being said, wouldn't it make sense to: > > - Get rid of the entropy heuristic entirely and just assume a fixed value of > entropy for a given event? What does that gain you? You can always impose an upper bound, but *some* evidence that it's not a metronome is nice to have. > - rem

Re: [patch V2 12/20] timer: Switch to a non cascading wheel

2016-06-18 Thread George Spelvin
I want to read this even more, but here's a dump of my comments so far... > 1) Cascading is avoided (except for extreme long time timers) > + * Note: This implementation might be suboptimal vs. timers enqueued in the > + *cascade level because we do not look at the timers to figure out when >

Re: [PATCH] tools/perf: fix the word selected in find_*_bit

2016-06-15 Thread George Spelvin
Madhavan Srinivasan wrote:
> +#if (__BYTE_ORDER == __BIG_ENDIAN) && (BITS_PER_LONG != 64)
> +	tmp = addr[(((nbits - 1)/BITS_PER_LONG) - (start / BITS_PER_LONG))]
> +		^ invert;
> +#else
> 	tmp = addr[start / BITS_PER_LONG] ^ invert

Re: [patch 13/20] timer: Switch to a non cascading wheel

2016-06-14 Thread George Spelvin
Thomas Gleixner wrote: > On Tue, 14 Jun 2016, George Spelvin wrote: >> If you need to shrink TIMER_ARRAYMASK to fit another flag bit, > > We can accomodate wheel with 512 buckets with the current ARRAYMASK and that > really should be enough. You're absolutely correct, but

Re: [patch 13/20] timer: Switch to a non cascading wheel

2016-06-14 Thread George Spelvin
Peter Zijlstra wrote: > That did occur to me as well; however I think it would be best to > eradicate all forms of cascading entirely -- if at all possible. Agreed. >> to be replaced with __builtin_clz or similar: > Problem is for the archs that don't have that, the 5 layer branch is > trivial f

Re: [patch 13/20] timer: Switch to a non cascading wheel

2016-06-14 Thread George Spelvin
On Tue, 14 Jun 2016, Thomas Gleixner wrote: > I thought about that and when looking at those long timeout thingies > I came to the conclusion that it's simply not worth the trouble. Okay. A comment might be nice, just to stop someone else wasting brain power on it. E.g. /* * If the timer happe

Re: [PATCH 2/4] mtd: nand: implement two pairing scheme

2016-06-14 Thread George Spelvin
Boris Brezillon wrote: > On 12 Jun 2016 16:24:53 George Spelvin wrote: >> Boris Brezillon wrote: >> My problem is that I don't really understand MLC programming. > I came to the same conclusion: we really have these 2 cases in the > wild, which makes it even more compli

Re: [patch 13/20] timer: Switch to a non cascading wheel

2016-06-14 Thread George Spelvin
Nice cleanup! I think I see a buglet in your level-5 cascading. Suppose a timer is requested far in the future for a time that is an exact multiple of 32768 jiffies. collect_expired_timers() scans level 5 after all the previous ones, and will cascade it to level 0, in a level-0 bucket which has

Re: [patch 11/20] hlist: Add hlist_is_last_node() helper

2016-06-13 Thread George Spelvin
Shouldn't this function be called hlist_is_only_node()? hlist_is_last_node would be simply !n->next, as "last" doesn't imply the lack of others. I understand how the name might make sense in the context of removing entries from the list, but in other contexts, it's confusing.
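The naming distinction argued for above can be made concrete with a minimal sketch (simplified hlist types, not the kernel headers):

```c
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* "last": nothing follows n -- other nodes may still precede it. */
static int hlist_is_last_node(const struct hlist_node *n)
{
	return n->next == NULL;
}

/* "only": n is the sole entry -- first in the list and with no successor. */
static int hlist_is_only_node(const struct hlist_node *n,
			      const struct hlist_head *h)
{
	return h->first == n && n->next == NULL;
}
```

In a two-node list, the tail satisfies the first predicate but not the second, which is the ambiguity the rename would remove.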

[PATCH v2] vfs: add simple direct-mapped dcache lookup front-end

2016-06-12 Thread George Spelvin
;, but it's quite noticeably slower than the new streamlined __d_lookup_rcu(). And as you can tell, that means that we must have a 90%+ hitrate in the new L1 dcache lookup, since we only see 10% as much time in the slow routine as in the L1 front-end. [George Spelvin speaking] I fixed th

Re: [PATCH 2/4] mtd: nand: implement two pairing scheme

2016-06-12 Thread George Spelvin
Boris Brezillon wrote: > On 12 Jun 2016 08:25:49 -0400 > "George Spelvin" wrote: >> (In fact, an interesting >> question is whether bad pages should be skipped or not!) > > There's no such thing. We have bad blocks, but when a block is bad all > the page

Re: [PATCH 2/4] mtd: nand: implement two pairing scheme

2016-06-12 Thread George Spelvin
>> (Another thing I thought of, but am less sure of, is packing the group >> and pair numbers into a register-passable int rather than a structure. >> Even 2 bits for the group is probably the most that will ever be needed, >> but it's easy to say the low 4 bits are the group and the high 28 are >>
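The packing idea floated here — low 4 bits for the group, high 28 for the pair index — can be sketched like this. The names and the exact bit split are the discussion's hypothetical, not an mtd API:

```c
#define PAIRING_GROUP_BITS 4
#define PAIRING_GROUP_MASK ((1 << PAIRING_GROUP_BITS) - 1)

/* Pack a (pair, group) tuple into one register-passable int. */
static inline int pairing_pack(int pair, int group)
{
	return (pair << PAIRING_GROUP_BITS) | (group & PAIRING_GROUP_MASK);
}

static inline int pairing_pair(int info)
{
	return info >> PAIRING_GROUP_BITS;
}

static inline int pairing_group(int info)
{
	return info & PAIRING_GROUP_MASK;
}
```

Decoding is a shift and a mask, so the tuple passes through function calls in a single register instead of a by-value struct.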

Re: [PATCH 2/4] mtd: nand: implement two pairing scheme

2016-06-12 Thread George Spelvin
>> It also applies an offset of +1, to avoid negative numbers and the >> problems of signed divides. > It seems to cover all cases. I wasn't sure why you used a signed int for the interface. (Another thing I thought of, but am less sure of, is packing the group and pair numbers into a register-p

[PATCH] vfs: add simple direct-mapped dcache lookup front-end

2016-06-11 Thread George Spelvin
;, but it's quite noticeably slower than the new streamlined __d_lookup_rcu(). And as you can tell, that means that we must have a 90%+ hitrate in the new L1 dcache lookup, since we only see 10% as much time in the slow routine as in the L1 front-end. [George Spelvin speaking] I fixed th

Re: [PATCH 2/4] mtd: nand: implement two pairing scheme

2016-06-11 Thread George Spelvin
Applying all of this, the corrected code looks like the following: (Note the addition of "const" and "__pure" annotations, which should get copied to the "struct mtd_pairing_scheme" declaration.) Signed-off-by: George Spelvin /* * In distance-3 pairing, odd pages i ar

Re: [PATCH v2 3/2] lib/uuid.c: Silence an unchecked return value warning

2016-06-07 Thread George Spelvin
Andy Shevchenko wrote: > Something wrong with mail configuration? Oops, sorry, I forgot to delete the header. > On Sun, 2016-06-05 at 15:25 -0400, George Spelvin wrote: >> It's also faster, as hex_to_bin() *is* inlined within hex2bin() >> (if you compile with -O).
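The inlining claim is easy to picture with a simplified model of the two functions — a sketch of the pattern, not the kernel's exact lib/hexdump.c code:

```c
#include <stddef.h>

/* Map one hex digit to 0..15, or -1 if invalid. */
static int hex_to_bin(char ch)
{
	if (ch >= '0' && ch <= '9')
		return ch - '0';
	ch |= 0x20;			/* fold to lowercase */
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	return -1;
}

/* Decode 2*count hex digits; with -O the calls above inline away. */
static int hex2bin(unsigned char *dst, const char *src, size_t count)
{
	while (count--) {
		int hi = hex_to_bin(*src++);
		int lo = hex_to_bin(*src++);

		if ((hi | lo) < 0)	/* either digit invalid? */
			return -1;
		*dst++ = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

Because both return values feed one combined `(hi | lo) < 0` check, the "unchecked return value" pattern disappears once the small helper is inlined into its only caller.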

Re: [PATCH v2 1/2] lib/vsprintf.c: Simplify uuid_string()

2016-06-05 Thread George Spelvin
r >From andriy.shevche...@linux.intel.com Sun Jun 05 14:21:40 2016 Subject: Re: [PATCH v2 1/2] lib/vsprintf.c: Simplify uuid_string() From: Andy Shevchenko To: Ge

Re: [PATCH v2 3/2] lib/uuid.c: Silence an unchecked return value warning

2016-06-05 Thread George Spelvin
>From andriy.shevche...@linux.intel.com Sun Jun 05 14:19:48 2016 Subject: Re: [PATCH v2 3/2] lib/uuid.c: Silence an unchecked return value warning From: Andy

Re: [PATCH v2 2/2] lib/uuid.c: eliminate uuid_[bl]e_index arrays

2016-06-04 Thread George Spelvin
Joe Perches wrote: > And Ingo commented last month: > https://lkml.org/lkml/2016/4/29/69 > > Maybe this __uuid_to_bin function should be made public and > the acpi version in drivers/acpi/acpica/utuuid.c should be > removed. I agree with the second part, but not the first; the uuid_le_to_bin() wra
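The table-free conversion behind the patch title can be sketched as follows: parse the UUID into big-endian (string) order once, then derive the little-endian layout by swapping the three leading fields in place, instead of keeping per-format `uuid_[bl]e_index` lookup tables. Function names here are illustrative, not the kernel's.

```c
#include <stdint.h>

static void swap_bytes(uint8_t *u, int i, int j)
{
	uint8_t t = u[i];

	u[i] = u[j];
	u[j] = t;
}

/*
 * Convert a 16-byte UUID from big-endian (RFC 4122 string order) to
 * the little-endian layout: the leading 4-2-2 fields are stored
 * byte-swapped; the final 8 bytes are identical in both forms.
 */
static void uuid_be_to_le(uint8_t u[16])
{
	swap_bytes(u, 0, 3);
	swap_bytes(u, 1, 2);
	swap_bytes(u, 4, 5);
	swap_bytes(u, 6, 7);
}
```

One in-place swap replaces two 16-entry index arrays and the data-dependent loads they imply.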

[PATCH v2 3/2] lib/uuid.c: Silence an unchecked return value warning

2016-06-04 Thread George Spelvin
-24.4%
arm     92  104  +12  13.0%  116  -12  -10.3%
thumb   50   62  +12  24.0%  100  -38  -38.0%
arm64  116  124   +8   6.9%  148  -24  -16.2%
Signed-off-by: George Spelvin
---
 lib/uuid.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

[PATCH v2 2/2] lib/uuid.c: eliminate uuid_[bl]e_index arrays

2016-06-03 Thread George Spelvin
: George Spelvin
---
 include/linux/uuid.h |  5 +++--
 lib/uuid.c           | 24 +---
 lib/vsprintf.c       | 26 ++
 3 files changed, 22 insertions(+), 33 deletions(-)
diff --git a/include/linux/uuid.h b/include/linux/uuid.h
index 2d095fc6..238f16f7 100644

[PATCH v2 1/2] lib/vsprintf.c: Simplify uuid_string()

2016-06-03 Thread George Spelvin
Percentage
x86-32  245  199  -46  -18.8%
x86-64  246  186  -60  -24.4%
arm     292  264  -28   -9.6%
thumb   220  160  -60  -27.3%
arm64   296  244  -52  -17.6%
Signed-off-by: George Spelvin
---
 lib/vsprintf.c | 16
 1 file changed, 8
