jump to the regions, but I
didn't notice any difference in testing a reasonable number of regions
(less than 100). We could set a max limit?
2) Additional checks in ptrace to single-step over critical sections.
We also prevent setting breakpoints, as these also seem to confuse
gdb some
Introduce the notion of 'restartable sequence'. This is a user-defined range
within which we guarantee user-execution will occur serially with respect
to scheduling events such as migration or competition with other threads.
Preemption, or other interruption within this region, results in control
Implements basic tests of RSEQ functionality.
"basic_percpu_ops_test" implements a few simple per-cpu operations and
testing their correctness.
---
tools/testing/selftests/rseq/Makefile | 14 +
.../testing/selftests/rseq/basic_percpu_ops_test.c | 331
Implements the x86 (i386 & x86-64) ABIs for interrupting and restarting
execution within restartable sequence sections.
Ptrace is modified to single-step over the entire critical region.
---
arch/x86/entry/common.c | 3 ++
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
On Thu, Oct 22, 2015 at 12:11:42PM -0700, Andy Lutomirski wrote:
> On Thu, Oct 22, 2015 at 11:06 AM, Dave Watson wrote:
> >
> >                    RSS    CPU IDLE    LATENCY (MS)
> > jemalloc 4.0.0     31G    33%         390
> > jemalloc + this
>
> Signed-off-by: Kees Cook
Thanks
Acked-by: Dave Watson
> ---
> net/tls/tls_sw.c | 10 +-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 4dc766b
will be used by both AVX and AVX2.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 951 ---
1 file changed, 318 insertions(+), 633 deletions(-)
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S
b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index
adds support for those
keysizes.
The final patch updates the C glue code, passing everything through
the crypt_by_sg() function instead of the previous memcpy-based
routines.
Dave Watson (12):
x86/crypto: aesni: Merge GCM_ENC_DEC
x86/crypto: aesni: Introduce gcm_context_data
x86/crypto: a
stores to the new struct are always done unaligned to
avoid compiler issues, see e5b954e8 "Use unaligned loads from
gcm_context_data"
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 378 +++
arch/x86/crypto/aesni-intel_glue.c | 58 ++-
Macro-ify function save and restore. These will be used in new functions
added for scatter/gather update operations.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 94 +---
1 file changed, 36 insertions(+), 58 deletions(-)
diff --git a/arch/x86
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 8
1 file changed, 8 insertions(+)
diff --git a/arch/x86/crypto
that this diff depends on using gcm_context_data - 256-bit keys
require 16 HashKeys + 15 expanded keys, which is larger than
struct crypto_aes_ctx, so they are stored in struct gcm_context_data.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 188 +--
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 51
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 228 ++-
arch/x86/crypto/aesni-intel_glue.c
Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.
Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.
The data offset %r11 is also updated after the partial block.
Signed-off-by: Dave Watson
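For illustration, a minimal C sketch of the bookkeeping this implies (names such as partial_block_len are placeholders, not the asm fields): a streaming update buffers any tail that is not a full 16-byte block and folds it into the next call.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct stream_ctx {
	uint8_t partial_block[16];
	size_t  partial_block_len;	/* bytes carried over from the previous call */
};

/* Placeholder for the real GHASH/CTR block processing. */
static void process_block(struct stream_ctx *ctx, const uint8_t block[16])
{
	(void)ctx;
	(void)block;
}

static void stream_update(struct stream_ctx *ctx, const uint8_t *in, size_t len)
{
	/* Top up a carried partial block first. */
	if (ctx->partial_block_len) {
		size_t need = 16 - ctx->partial_block_len;
		size_t take = len < need ? len : need;

		memcpy(ctx->partial_block + ctx->partial_block_len, in, take);
		ctx->partial_block_len += take;
		in += take;
		len -= take;
		if (ctx->partial_block_len == 16) {
			process_block(ctx, ctx->partial_block);
			ctx->partial_block_len = 0;
		}
	}
	/* Full blocks go straight through. */
	while (len >= 16) {
		process_block(ctx, in);
		in += 16;
		len -= 16;
	}
	/* Stash any tail for the next call. */
	if (len) {
		memcpy(ctx->partial_block, in, len);
		ctx->partial_block_len = len;
	}
}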
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.
Signed-off-by: Dave Watson
---
arch/x86
The precompute functions differ only by the sub-macros
they call; merge them into a single macro. Later diffs
add more code to fill in the gcm_context_data structure,
so this allows the changes to be made in a single place.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 76
calls.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64.S | 102 +--
1 file changed, 59 insertions(+), 43 deletions(-)
diff --git a/arch/x86/crypto/aesni-intel_avx-x86_64.S
b/arch/x86/crypto/aesni-intel_avx-x86_64.S
index 44a4a8b43ca4..ff00ad19064d 100644
necessary encryption/decryption routines.
GENX_OPTSIZE is still checked at the start of crypt_by_sg. The
total size of the data is checked, since the additional overhead
is in the init function, which calculates the additional HashKeys.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_avx-x86_64
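An illustrative C sketch of that size-based dispatch (the threshold values and helper names here are assumptions, not the patch's code): small requests stay on the SSE path because the AVX/AVX2 init step pays extra cost computing more HashKeys.

enum gcm_impl { GCM_SSE, GCM_AVX_GEN2, GCM_AVX_GEN4 };

static enum gcm_impl pick_gcm_impl(unsigned long total_len,
				   int have_avx, int have_avx2)
{
	const unsigned long AVX_GEN2_OPTSIZE = 640;	/* assumed cutover */
	const unsigned long AVX_GEN4_OPTSIZE = 4096;	/* assumed cutover */

	if (have_avx2 && total_len >= AVX_GEN4_OPTSIZE)
		return GCM_AVX_GEN4;
	if (have_avx && total_len >= AVX_GEN2_OPTSIZE)
		return GCM_AVX_GEN2;
	return GCM_SSE;		/* small buffers: init overhead dominates */
}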
On 10/27/15 04:56 PM, Paul Turner wrote:
> This series is a new approach which introduces an alternate ABI that does not
> depend on open-coded assembly nor a central 'repository' of rseq sequences.
> Sequences may now be inlined and the preparatory[*] work for the sequence can
> be written in a hi
https://lwn.net/Articles/657999/
3) NIC offload. To support running aesni routines on the NIC instead
of the processor, we would probably need enough of the framing
interface put into the kernel.
Dave Watson (2):
Crypto support aesni rfc5288
Crypto kernel tls socket
arch/x86/crypto/ae
diff --git a/crypto/algif_tls.c b/crypto/algif_tls.c
new file mode 100644
index 000..123ade3
--- /dev/null
+++ b/crypto/algif_tls.c
@@ -0,0 +1,1233 @@
+/*
+ * algif_tls: User-space interface for TLS
+ *
+ * Copyright (C) 2015, Dave Watson
+ *
+ * This file provides the user-space API for AEAD c
Support rfc5288 using Intel aesni routines. See also rfc5246.
AAD length is 13 bytes, padded out to 16. Padding bytes currently have to be
passed in via the scatterlist, which probably isn't quite the
right fix.
The assoclen checks were moved to the individual rfc stubs, and the
common routines suppor
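For reference, the 13-byte AAD is the rfc5246 additional_data: 8-byte sequence number, 1-byte record type, 2-byte version, 2-byte length. A small C sketch of building it, with the explicit zero padding out to 16 bytes the text describes:

#include <stdint.h>
#include <string.h>

static void build_tls12_aad(uint8_t aad[16], uint64_t seq, uint8_t type,
			    uint16_t version, uint16_t plaintext_len)
{
	int i;

	for (i = 0; i < 8; i++)
		aad[i] = (uint8_t)(seq >> (56 - 8 * i));	/* big-endian sequence number */
	aad[8]  = type;						/* e.g. 23 = application data */
	aad[9]  = (uint8_t)(version >> 8);			/* e.g. 0x03 0x03 for TLS 1.2 */
	aad[10] = (uint8_t)(version & 0xff);
	aad[11] = (uint8_t)(plaintext_len >> 8);
	aad[12] = (uint8_t)(plaintext_len & 0xff);
	memset(aad + 13, 0, 3);					/* pad 13 -> 16 bytes */
}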
On 11/23/15 02:27 PM, Sowmini Varadhan wrote:
> On (11/23/15 09:43), Dave Watson wrote:
> > Currently gcm(aes) represents ~80% of our SSL connections.
> >
> > Userspace interface:
> >
> > 1) A transform and op socket are created using the userspace crypto
In some of our hottest network services, fget_light + fput overhead
can represent 1-2% of the processes' total CPU usage. I'd like to
discuss ways to reduce this overhead.
One proposal we have been testing is removing the refcount increment
and decrement, and using some sort of safe memory reclamation
>> +static inline __attribute__((always_inline))
>> +bool rseq_finish(struct rseq_lock *rlock,
>> + intptr_t *p, intptr_t to_write,
>> + struct rseq_state start_value)
>> This ABI looks like it will work fine for our use case. I don't think it
>> has been mentioned yet, but we may still need multi
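For context, a hedged sketch of how that ABI gets used for a per-cpu operation. The helper names rseq_start()/struct fields and the retry shape are assumptions based on this thread, not the exact library: prepare speculatively, commit with rseq_finish(), and retry if the sequence was interrupted by preemption, migration, or a signal.

#include <stdbool.h>
#include <stdint.h>

/* Assumed shapes matching the signature quoted above; link against the
 * rseq library under discussion. */
struct rseq_lock;
struct rseq_state { int32_t cpu_id; uint32_t seq; };

extern struct rseq_state rseq_start(struct rseq_lock *rlock);
extern bool rseq_finish(struct rseq_lock *rlock,
			intptr_t *p, intptr_t to_write,
			struct rseq_state start_value);

/* Per-cpu counter increment, one counter slot per cpu. */
static void percpu_inc(struct rseq_lock *rlock, intptr_t *counters)
{
	struct rseq_state start;
	intptr_t *slot, old;

	do {
		start = rseq_start(rlock);
		slot = &counters[start.cpu_id];	/* cpu observed at start */
		old = *slot;			/* preparatory read; redone on restart */
	} while (!rseq_finish(rlock, slot, old + 1, start));
}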
On 08/19/16 02:24 PM, Josh Triplett wrote:
> On Fri, Aug 19, 2016 at 01:56:11PM -0700, Andi Kleen wrote:
> > > Nobody gets a cpu number just to get a cpu number - it's not a useful
> > > thing to benchmark. What does getcpu() so much that we care?
> >
> > malloc is the primary target I believe. Sa
14: merge enc/dec
also use new routine if cryptlen < AVX_GEN2_OPTSIZE
optimize case if assoc is already linear
Dave Watson (14):
x86/crypto: aesni: Merge INITIAL_BLOCKS_ENC/DEC
x86/crypto: aesni: Macro-ify func save/restore
x86/crypto: aesni: Add GCM_INIT macro
x86/
Use macro operations to merge implementations of INITIAL_BLOCKS,
since they differ by only a small handful of lines.
Use macro counter \@ to simplify implementation.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 298 ++
1 file changed, 48
Reduce code duplication by introducing a GCM_INIT macro. This macro
will also be exposed as a function for implementing scatter/gather
support, since INIT only needs to be called once for the full
operation.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 84
Macro-ify function save and restore. These will be used in new functions
added for scatter/gather update operations.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 53 ++-
1 file changed, 24 insertions(+), 29 deletions(-)
diff --git a
Make a macro for the main encode/decode routine. Only a small handful
of lines differ for enc and dec. This will also become the main
scatter/gather update routine.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 293 +++---
1 file changed
need to be calculated once per key and could
be moved to when set_key is called; however, the current glue code
falls back to generic aes code if the fpu is disabled.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 205 --
1 file changed, 106
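A kernel-style sketch of the constraint being described: set_key may run in a context where the FPU/SIMD registers are unusable, so AES-NI (and any HashKey precompute, which also needs SIMD) cannot be assumed there. The helpers shown are the standard x86/crypto ones; the exact glue code may differ.

#include <linux/linkage.h>
#include <linux/types.h>
#include <asm/fpu/api.h>	/* irq_fpu_usable, kernel_fpu_begin/end */
#include <crypto/aes.h>

/* asm stub from this driver */
asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx,
			     const u8 *in_key, unsigned int key_len);

static int example_set_key(struct crypto_aes_ctx *ctx,
			   const u8 *in_key, unsigned int key_len)
{
	int err;

	if (!irq_fpu_usable())
		/* generic C key expansion, no SIMD needed */
		return crypto_aes_expand_key(ctx, in_key, key_len);

	kernel_fpu_begin();
	err = aesni_set_key(ctx, in_key, key_len);	/* AES-NI expansion */
	kernel_fpu_end();
	return err;
}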
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 51
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 71 ---
1 file changed, 43
Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.
Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.
The data offset %r11 is also updated after the partial block.
Signed-off-by: Dave Watson
We can fast-path any < 16 byte read if the full message is > 16 bytes,
and shift over by the appropriate amount. Usually we are
reading > 16 bytes, so this should be faster than the READ_PARTIAL
macro introduced in b20209c91e2 for the average case.
Signed-off-by: Dave Watson
---
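The idea in plain C (a sketch only; the asm does this with a single unaligned XMM load plus a shuffle): when fewer than 16 bytes remain but the whole message is at least 16 bytes, it is safe to load the 16 bytes ending at the end of the data and shift away the bytes that were already processed.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read the final 'remaining' (< 16) bytes of a message that is at least
 * 16 bytes long, using one 16-byte load ending at the end of the buffer,
 * keeping only the unprocessed tail (zero-padded to 16 bytes). */
static void read_tail_fast(const uint8_t *msg, size_t total_len,
			   size_t remaining, uint8_t out[16])
{
	uint8_t tmp[16];

	/* total_len >= 16 guarantees msg + total_len - 16 is inside the buffer. */
	memcpy(tmp, msg + total_len - 16, 16);
	memset(out, 0, 16);
	memcpy(out, tmp + (16 - remaining), remaining);
}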
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.
Signed-off-by: Dave Watson
---
arch/x86
The asm macros are all set up now, introduce entry points.
GCM_INIT and GCM_COMPLETE have arguments supplied, so that
the new scatter/gather entry points don't have to take all the
arguments, and only the ones they need.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S
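A hedged C sketch of the resulting call flow in the glue code: one init, any number of updates over scatter/gather segments, one finalize. The stub names follow the entry points this series describes, but the argument lists shown here are an assumption, not copied from the patch.

typedef unsigned char u8;
struct gcm_context_data;	/* per-request state, see the context-data patch */

extern void aesni_gcm_init(void *aes_ctx, struct gcm_context_data *gdata,
			   u8 *iv, u8 *hash_subkey,
			   const u8 *aad, unsigned long aad_len);
extern void aesni_gcm_enc_update(void *aes_ctx, struct gcm_context_data *gdata,
				 u8 *out, const u8 *in, unsigned long len);
extern void aesni_gcm_finalize(void *aes_ctx, struct gcm_context_data *gdata,
			       u8 *auth_tag, unsigned long auth_tag_len);

static void encrypt_segments(void *aes_ctx, struct gcm_context_data *gdata,
			     u8 *iv, u8 *hash_subkey,
			     const u8 *aad, unsigned long aad_len,
			     u8 **dst, const u8 **src, unsigned long *seg_len,
			     int nsegs, u8 *tag, unsigned long tag_len)
{
	int i;

	aesni_gcm_init(aes_ctx, gdata, iv, hash_subkey, aad, aad_len);
	for (i = 0; i < nsegs; i++)	/* one update per scatter/gather segment */
		aesni_gcm_enc_update(aes_ctx, gdata, dst[i], src[i], seg_len[i]);
	aesni_gcm_finalize(aes_ctx, gdata, tag, tag_len);
}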
Only the SSE routines are updated so far, so leave the previous
gcmaes_en/decrypt routines, and branch to the sg ones if the
keysize is inappropriate for avx, or we are SSE only.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_glue.c | 133 +
1 file changed, 133 insertions(+)
diff --git a/arch/x86/crypto/ae
Introduce a gcm_context_data struct that will be used to pass
context data between scatter/gather update calls. It is passed
as the second argument (after crypto keys), other args are
renumbered.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 115
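A hedged sketch of the kind of state such a context has to carry between update calls; the field names and sizes are illustrative, drawn from the descriptions in this series (aadhash, aadlen, partial-block state, counter, HashKeys), not copied from the patch.

struct gcm_context_data_sketch {
	unsigned char      aad_hash[16];              /* running GHASH value */
	unsigned long long aad_length;                /* total AAD bytes processed */
	unsigned long long in_length;                 /* total plain/ciphertext bytes processed */
	unsigned char      partial_block_enc_key[16]; /* keystream for an unfinished block */
	unsigned char      orig_iv[16];               /* initial counter block, needed at finalize */
	unsigned char      current_counter[16];       /* CTR block to resume from */
	unsigned long long partial_block_len;         /* bytes already consumed of the partial block */
	unsigned char      hash_keys[16 * 16];        /* precomputed HashKey powers (see 256-bit key note) */
};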
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 172 ++
1 file changed, 63 insertions
Macro-ify function save and restore. These will be used in new functions
added for scatter/gather update operations.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 53 ++-
1 file changed, 24 insertions(+), 29 deletions(-)
diff --git a
Use macro operations to merge implementations of INITIAL_BLOCKS,
since they differ by only a small handful of lines.
Use macro counter \@ to simplify implementation.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 298 ++
1 file changed, 48
Reduce code duplication by introducing a GCM_INIT macro. This macro
will also be exposed as a function for implementing scatter/gather
support, since INIT only needs to be called once for the full
operation.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 84
Merge encode and decode tag calculations in GCM_COMPLETE macro.
Scatter/gather routines will call this once at the end of encryption
or decryption.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 172 ++
1 file changed, 63 insertions
AAD hash only needs to be calculated once for each scatter/gather operation.
Move it to its own macro, and call it from GCM_INIT instead of
INITIAL_BLOCKS.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 71 ---
1 file changed, 43
%aes_loop_initial_4974
1.27%  gcmaes_encrypt_sg.constprop.15
Dave Watson (14):
x86/crypto: aesni: Merge INITIAL_BLOCKS_ENC/DEC
x86/crypto: aesni: Macro-ify func save/restore
x86/crypto: aesni: Add GCM_INIT macro
x86/crypto: aesni: Add GCM_COMPLETE macro
x86/crypto: aesni: Merge encode and
Prepare to handle partial blocks between scatter/gather calls.
For the last partial block, we only want to calculate the aadhash
in GCM_COMPLETE, and a new partial block macro will handle both
aadhash update and encrypting partial blocks between calls.
Signed-off-by: Dave Watson
---
arch/x86
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values.
pblocklen, aadhash, and pblockenckey are also updated at the end
of each scatter/gather operation, to be carried over to the next
operation.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 51
Before this diff, multiple calls to GCM_ENC_DEC will
succeed, but only if all calls are a multiple of 16 bytes.
Handle partial blocks at the start of GCM_ENC_DEC, and update
aadhash as appropriate.
The data offset %r11 is also updated after the partial block.
Signed-off-by: Dave Watson
We can fast-path any < 16 byte read if the full message is > 16 bytes,
and shift over by the appropriate amount. Usually we are
reading > 16 bytes, so this should be faster than the READ_PARTIAL
macro introduced in b20209c91e2 for the average case.
Signed-off-by: Dave Watson
---
need to be calculated once per key and could
be moved to when set_key is called; however, the current glue code
falls back to generic aes code if the fpu is disabled.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 205 --
1 file changed, 106
Make a macro for the main encode/decode routine. Only a small handful
of lines differ for enc and dec. This will also become the main
scatter/gather update routine.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 293 +++---
1 file changed
them out
with scatterwalk_map_and_copy.
Only the SSE routines are updated so far, so leave the previous
gcmaes_en/decrypt routines, and branch to the sg ones if the
keysize is inappropriate for avx, or we are SSE only.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_glue.c | 166
Introduce a gcm_context_data struct that will be used to pass
context data between scatter/gather update calls. It is passed
as the second argument (after crypto keys), other args are
renumbered.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S | 115
The asm macros are all set up now, introduce entry points.
GCM_INIT and GCM_COMPLETE have arguments supplied, so that
the new scatter/gather entry points don't have to take all the
arguments, and only the ones they need.
Signed-off-by: Dave Watson
---
arch/x86/crypto/aesni-intel_asm.S
On 02/12/18 03:12 PM, Junaid Shahid wrote:
> Hi Dave,
>
>
> On 02/12/2018 11:51 AM, Dave Watson wrote:
>
> > +static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int
> > assoclen,
> > + u8 *hash_subkey, u8 *iv, voi
On 02/13/18 08:42 AM, Stephan Mueller wrote:
> > +static int gcmaes_encrypt_sg(struct aead_request *req, unsigned int
> > assoclen, + u8 *hash_subkey, u8 *iv, void *aes_ctx)
> > +{
> > + struct crypto_aead *tfm = crypto_aead_reqtfm(req);
> > + unsigned long auth_tag_len = crypto
Hi Paul,
Thanks for looking at this again!
On 07/27/17 11:12 AM, Paul E. McKenney wrote:
> Hello!
>
> But my main question is whether the throttling shown below is acceptable
> for your use cases, namely only one expedited sys_membarrier() permitted
> per scheduling-clock period (1 millisecond
(no blocking). It also works on NOHZ_FULL configurations.
I tested this with our hazard pointer use case on x86_64, and it seems
to work great. We don't currently have any uses needing SHARED.
Tested-by: Dave Watson
Thanks!
https://github.com/facebook/folly/blob/master/folly/experimental/haz
>>>> Would pairing one rseq_start with two rseq_finish do the trick
>>>> there ?
>>>
>>> Yes, two rseq_finish works, as long as the extra rseq management overhead
>>> is not substantial.
>>
>> I've added a commit implementing rseq_finish2() in my rseq volatile
>> dev branch. You can fetch it at:
>