Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-02 Thread Geert Uytterhoeven
On Mon, Feb 29, 2016 at 1:41 PM, Mathieu Desnoyers wrote: > - On Feb 29, 2016, at 5:39 AM, Arnd Bergmann a...@arndb.de wrote: > >> On Monday 29 February 2016 11:32:21 Peter Zijlstra wrote: >>> On Sun, Feb 28, 2016 at 12:39:54AM +, Mathieu Desnoyers wrote: >>> >>> > /* This structure needs

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-02 Thread Peter Zijlstra
On Tue, Mar 01, 2016 at 01:47:38PM -0800, H. Peter Anvin wrote: > On 03/01/16 13:32, Peter Zijlstra wrote: > > On Tue, Mar 01, 2016 at 08:23:12PM +, Mathieu Desnoyers wrote: > >> I think it's important that user-space fast-paths can quickly > >> detect whether the feature is enabled without hav

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-01 Thread H. Peter Anvin
On 03/01/16 13:32, Peter Zijlstra wrote: > On Tue, Mar 01, 2016 at 08:23:12PM +, Mathieu Desnoyers wrote: >> I think it's important that user-space fast-paths can quickly >> detect whether the feature is enabled without having to rely on >> always reading a separate cache-line. I've put togethe

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-01 Thread Peter Zijlstra
On Tue, Mar 01, 2016 at 10:32:02PM +0100, Peter Zijlstra wrote: > > /* > > * Thread-local ABI rseq_seqnum field. > > * Updated by the kernel, and read by user-space with > > * single-copy atomicity semantics. Aligned on 32-bit. > > * Values: > >

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-01 Thread Peter Zijlstra
On Tue, Mar 01, 2016 at 08:23:12PM +, Mathieu Desnoyers wrote: > I think it's important that user-space fast-paths can quickly > detect whether the feature is enabled without having to rely on > always reading a separate cache-line. I've put together an ABI > proposal that takes into account the

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-01 Thread Mathieu Desnoyers
- On Feb 29, 2016, at 5:35 AM, Peter Zijlstra pet...@infradead.org wrote: > On Sun, Feb 28, 2016 at 02:32:28PM +, Mathieu Desnoyers wrote: >> The part of ABI I'm trying to express here is for discoverability >> of available features by user-space. For instance, a kernel >> could be configu

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-01 Thread Mathieu Desnoyers
- On Mar 1, 2016, at 1:25 PM, H. Peter Anvin h...@zytor.com wrote: > On 02/27/16 16:39, Mathieu Desnoyers wrote: >> >> Very good points! Would the following interfaces be acceptable ? >> >> /* This structure needs to be aligned cache line size. */ >> struct thread_local_abi { >> int3

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-03-01 Thread H. Peter Anvin
On 02/27/16 16:39, Mathieu Desnoyers wrote: > > Very good points! Would the following interfaces be acceptable ? > > /* This structure needs to be aligned cache line size. */ > struct thread_local_abi { > int32_t cpu_id; /* Aligned on > 32-bit. */ > u

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-29 Thread H. Peter Anvin
On February 29, 2016 4:41:49 AM PST, Mathieu Desnoyers wrote: > >Agreed that many architectures issue slower instructions when reading >from packed structures, which is unwanted. > And detrimental to atomicity. >Could we require that each field be naturally aligned and require that >they are pl

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-29 Thread Arnd Bergmann
On Monday 29 February 2016 12:41:49 Mathieu Desnoyers wrote: > - On Feb 29, 2016, at 5:39 AM, Arnd Bergmann a...@arndb.de wrote: > > What's making things worse is that on some architectures, adding > > __packed will force access by bytes rather than just reading > > a 32-bit or 64-bit number

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-29 Thread Mathieu Desnoyers
- On Feb 29, 2016, at 5:39 AM, Arnd Bergmann a...@arndb.de wrote: > On Monday 29 February 2016 11:32:21 Peter Zijlstra wrote: >> On Sun, Feb 28, 2016 at 12:39:54AM +, Mathieu Desnoyers wrote: >> >> > /* This structure needs to be aligned cache line size. */ >> > struct thread_local_abi {

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-29 Thread Arnd Bergmann
On Monday 29 February 2016 11:32:21 Peter Zijlstra wrote: > On Sun, Feb 28, 2016 at 12:39:54AM +, Mathieu Desnoyers wrote: > > > /* This structure needs to be aligned cache line size. */ > > struct thread_local_abi { > > int32_t cpu_id; > > uint32_t rseq_seqnum; > > uint64_t rseq_post_c

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-29 Thread Peter Zijlstra
On Sun, Feb 28, 2016 at 02:32:28PM +, Mathieu Desnoyers wrote: > The part of ABI I'm trying to express here is for discoverability > of available features by user-space. For instance, a kernel > could be configured with "CONFIG_RSEQ=n", and userspace should > not rely on the rseq fields of the

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-29 Thread Peter Zijlstra
On Sun, Feb 28, 2016 at 12:39:54AM +, Mathieu Desnoyers wrote: > /* This structure needs to be aligned cache line size. */ > struct thread_local_abi { > int32_t cpu_id; > uint32_t rseq_seqnum; > uint64_t rseq_post_commit_ip; > /* Add new fields at the end. */ > } __attribute__((packe

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-29 Thread Peter Zijlstra
On Sat, Feb 27, 2016 at 10:35:28AM -0800, Linus Torvalds wrote: > On Sat, Feb 27, 2016 at 6:58 AM, Peter Zijlstra wrote: > > > > Paul's patches have the following structure: > > > > struct thread_local_abi { > > union { > > struct { > > u32 cpu_i

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-28 Thread Linus Torvalds
On Sun, Feb 28, 2016 at 5:07 AM, Geert Uytterhoeven wrote: > > __alignof__(u64) is not 8 on all architectures. Indeed, which is why I said "make sure it's 64-bit aligned". We do it manually for ABI structures (although we did have some discussion about adding an alignment directive, and then havin

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-28 Thread Mathieu Desnoyers
- On Feb 27, 2016, at 7:57 PM, Linus Torvalds torva...@linux-foundation.org wrote: > On Sat, Feb 27, 2016 at 4:39 PM, Mathieu Desnoyers > wrote: >> >> >> I'm particularly interested to know what are the best practices to >> deal with an extensible bitfield (the features mask). cpu_set_t >> a

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-28 Thread Geert Uytterhoeven
Hi Linus, On Sat, Feb 27, 2016 at 7:35 PM, Linus Torvalds wrote: > On Sat, Feb 27, 2016 at 6:58 AM, Peter Zijlstra wrote: >> >> Paul's patches have the following structure: >> >> struct thread_local_abi { >> union { >> struct { >> u32 cpu_id; >

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-27 Thread Linus Torvalds
On Sat, Feb 27, 2016 at 4:39 PM, Mathieu Desnoyers wrote: > > > I'm particularly interested to know what are the best practices to > deal with an extensible bitfield (the features mask). cpu_set_t > and sigmask each seem to do their own thing. Quite frankly, why would the kernel ever touch anythi

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-27 Thread Mathieu Desnoyers
- On Feb 27, 2016, at 1:35 PM, Linus Torvalds torva...@linux-foundation.org wrote: > On Sat, Feb 27, 2016 at 6:58 AM, Peter Zijlstra wrote: >> >> Paul's patches have the following structure: >> >> struct thread_local_abi { >> union { >> struct { >>

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-27 Thread H. Peter Anvin
On February 27, 2016 10:35:28 AM PST, Linus Torvalds wrote: >On Sat, Feb 27, 2016 at 6:58 AM, Peter Zijlstra >wrote: >> >> Paul's patches have the following structure: >> >> struct thread_local_abi { >> union { >> struct { >> u32 cpu_id; >>

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-27 Thread Linus Torvalds
On Sat, Feb 27, 2016 at 6:58 AM, Peter Zijlstra wrote: > > Paul's patches have the following structure: > > struct thread_local_abi { > union { > struct { > u32 cpu_id; > u32 seq; > }; >

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-27 Thread H. Peter Anvin
On February 27, 2016 6:15:01 AM PST, Mathieu Desnoyers wrote: >- On Feb 27, 2016, at 1:24 AM, H. Peter Anvin h...@zytor.com wrote: > >> On 02/26/16 16:40, Mathieu Desnoyers wrote: I think it would be a good idea to make this a general pointer for >the kernel to be able to write

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-27 Thread Peter Zijlstra
On Sat, Feb 27, 2016 at 02:15:01PM +, Mathieu Desnoyers wrote: > I'm concerned that this thread-local ABI structure may become messy. > Let's just imagine how we would first introduce a "cpu_id" field (int32_t), > and eventually add a "seqnum" field for rseq in the future (unsigned long). The

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-27 Thread Mathieu Desnoyers
- On Feb 27, 2016, at 1:24 AM, H. Peter Anvin h...@zytor.com wrote: > On 02/26/16 16:40, Mathieu Desnoyers wrote: >>> >>> I think it would be a good idea to make this a general pointer for the >>> kernel to >>> be able to write per thread state to user space, which obviously can't be >>> don

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread H. Peter Anvin
On 02/26/16 16:40, Mathieu Desnoyers wrote: I think it would be a good idea to make this a general pointer for the kernel to be able to write per thread state to user space, which obviously can't be done with the vDSO. This means the libc per thread startup should query the kernel for the size

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread Mathieu Desnoyers
- On Feb 26, 2016, at 6:04 PM, H. Peter Anvin h...@zytor.com wrote: > On February 26, 2016 12:24:15 PM PST, Mathieu Desnoyers > wrote: >>- On Feb 26, 2016, at 1:01 PM, Thomas Gleixner t...@linutronix.de >>wrote: >> >>> On Fri, 26 Feb 2016, Mathieu Desnoyers wrote: - On Feb 26, 20

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread H. Peter Anvin
On February 26, 2016 12:24:15 PM PST, Mathieu Desnoyers wrote: >- On Feb 26, 2016, at 1:01 PM, Thomas Gleixner t...@linutronix.de >wrote: > >> On Fri, 26 Feb 2016, Mathieu Desnoyers wrote: >>> - On Feb 26, 2016, at 11:29 AM, Thomas Gleixner >t...@linutronix.de wrote: >>> > Right. There is

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread Mathieu Desnoyers
- On Feb 26, 2016, at 1:01 PM, Thomas Gleixner t...@linutronix.de wrote: > On Fri, 26 Feb 2016, Mathieu Desnoyers wrote: >> - On Feb 26, 2016, at 11:29 AM, Thomas Gleixner t...@linutronix.de wrote: >> > Right. There is no point in having two calls and two update mechanisms for >> > a >> >

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread Thomas Gleixner
On Fri, 26 Feb 2016, Mathieu Desnoyers wrote: > - On Feb 26, 2016, at 11:29 AM, Thomas Gleixner t...@linutronix.de wrote: > > Right. There is no point in having two calls and two update mechanisms for a > > very similar purpose. > > > > So let userspace have one struct where cpu/seq and whatev

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread Mathieu Desnoyers
- On Feb 25, 2016, at 6:32 PM, Rasmus Villemoes li...@rasmusvillemoes.dk wrote: > On Wed, Feb 24 2016, Mathieu Desnoyers wrote: > >> >>Typically, a library or application will keep the cpu number >>cache in a thread-local storage variable, or other memory >>area

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread Mathieu Desnoyers
- On Feb 26, 2016, at 11:29 AM, Thomas Gleixner t...@linutronix.de wrote: > On Fri, 26 Feb 2016, Peter Zijlstra wrote: >> On Thu, Feb 25, 2016 at 05:17:51PM +, Mathieu Desnoyers wrote: >> > - On Feb 25, 2016, at 12:04 PM, Peter Zijlstra pet...@infradead.org >> > wrote: >> > >> > > On

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread Thomas Gleixner
On Fri, 26 Feb 2016, Peter Zijlstra wrote: > On Thu, Feb 25, 2016 at 05:17:51PM +, Mathieu Desnoyers wrote: > > - On Feb 25, 2016, at 12:04 PM, Peter Zijlstra pet...@infradead.org > > wrote: > > > > > On Thu, Feb 25, 2016 at 04:55:26PM +, Mathieu Desnoyers wrote: > > >> - On Feb 2

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-26 Thread Peter Zijlstra
On Thu, Feb 25, 2016 at 05:17:51PM +, Mathieu Desnoyers wrote: > - On Feb 25, 2016, at 12:04 PM, Peter Zijlstra pet...@infradead.org wrote: > > > On Thu, Feb 25, 2016 at 04:55:26PM +, Mathieu Desnoyers wrote: > >> - On Feb 25, 2016, at 4:56 AM, Peter Zijlstra pet...@infradead.org

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-25 Thread Rasmus Villemoes
On Wed, Feb 24 2016, Mathieu Desnoyers wrote: > >Typically, a library or application will keep the cpu number >cache in a thread-local storage variable, or other memory >areas belonging to each thread. It is recommended to perform >a volatile read of the cp

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-25 Thread Mathieu Desnoyers
- On Feb 25, 2016, at 12:04 PM, Peter Zijlstra pet...@infradead.org wrote: > On Thu, Feb 25, 2016 at 04:55:26PM +, Mathieu Desnoyers wrote: >> - On Feb 25, 2016, at 4:56 AM, Peter Zijlstra pet...@infradead.org wrote: >> The restartable sequences are intrinsically designed to work >> on

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-25 Thread Peter Zijlstra
On Thu, Feb 25, 2016 at 04:55:26PM +, Mathieu Desnoyers wrote: > - On Feb 25, 2016, at 4:56 AM, Peter Zijlstra pet...@infradead.org wrote: > The restartable sequences are intrinsically designed to work > on per-cpu data, so they need to fetch the current CPU number > within the rseq critica

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-25 Thread Mathieu Desnoyers
- On Feb 25, 2016, at 4:56 AM, Peter Zijlstra pet...@infradead.org wrote: > On Tue, Feb 23, 2016 at 06:28:36PM -0500, Mathieu Desnoyers wrote: >> This approach is inspired by Paul Turner and Andrew Hunter's work >> on percpu atomics, which lets the kernel handle restart of critical >> sections

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-25 Thread Peter Zijlstra
On Tue, Feb 23, 2016 at 06:28:36PM -0500, Mathieu Desnoyers wrote: > This approach is inspired by Paul Turner and Andrew Hunter's work > on percpu atomics, which lets the kernel handle restart of critical > sections. [1] [2] So I'd like a few extra words on the intersection with that work. Yes, t

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-24 Thread Mathieu Desnoyers
- On Feb 24, 2016, at 6:11 AM, Thomas Gleixner t...@linutronix.de wrote: > On Tue, 23 Feb 2016, Mathieu Desnoyers wrote: >> +/* >> + * If parent process has a thread-local ABI, the child inherits. Only >> applies >> + * when forking a process, not a thread. >> + */ >> +void getcpu_cache_fork(

Re: [PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-24 Thread Thomas Gleixner
On Tue, 23 Feb 2016, Mathieu Desnoyers wrote: > +/* > + * If parent process has a thread-local ABI, the child inherits. Only applies > + * when forking a process, not a thread. > + */ > +void getcpu_cache_fork(struct task_struct *t) > +{ > + t->cpu_cache = current->cpu_cache; > +} > + > +void g

[PATCH v4 1/5] getcpu_cache system call: cache CPU number of running thread

2016-02-23 Thread Mathieu Desnoyers
Expose a new system call that lets threads register one userspace memory area in which to store the number of the CPU on which the calling thread is running. Scheduler migration sets the TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space, a notify-resume handler updates the current CPU
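
The user-space side of the scheme described in this thread (a per-thread variable kept current by the kernel, read with a single volatile load) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the patch: the names cpu_cache_sketch, cpu_fallback_fn, example_fallback and read_current_cpu are hypothetical, the "negative value means not filled in" convention is an assumption, and the registration system call itself is not invoked here.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-thread memory area a thread would
 * register with the kernel. Assumption: a negative value means the
 * kernel has not (yet) stored a CPU number in it.
 */
static _Thread_local volatile int32_t cpu_cache_sketch = -1;

/* Hypothetical slow-path hook, e.g. a wrapper around sched_getcpu(). */
typedef int32_t (*cpu_fallback_fn)(void);

/* Placeholder slow path for illustration only: always reports CPU 0. */
static int32_t example_fallback(void)
{
    return 0;
}

/*
 * Fast path: one volatile read of the thread-local cache.
 * Slow path: ask the fallback when the cache is not populated.
 */
static int32_t read_current_cpu(cpu_fallback_fn fallback)
{
    int32_t cpu = cpu_cache_sketch; /* single volatile load */

    if (cpu >= 0)
        return cpu;
    return fallback();
}
```

A caller would use read_current_cpu(example_fallback) wherever it would otherwise make a system call to learn the CPU number. The value is only a hint, since the thread can migrate right after the load.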