* Peter Zijlstra (pet...@infradead.org) wrote: [...]
Hi Peter, Looking at this simplified version of perf's ring buffer synchronization, I get concerned about the following issue: > /* > * One important detail is that the kbuf part and the kbuf_writer() are > * strictly per cpu and we can thus rely on program order for those. > * > * Only the userspace consumer can possibly run on another cpu, and thus we > * need to ensure data consistency for those. > */ > > struct buffer { > u64 size; > u64 tail; > u64 head; > void *data; > }; > > struct buffer *kbuf, *ubuf; > > /* > * If there's space in the buffer; store the data @buf; otherwise > * discard it. > */ > void kbuf_write(int sz, void *buf) > { > u64 tail, head, offset; > > do { > tail = ACCESS_ONCE(ubuf->tail); > offset = head = kbuf->head; > if (CIRC_SPACE(head, tail, kbuf->size) < sz) { > /* discard @buf */ > return; > } > head += sz; > } while (local_cmpxchg(&kbuf->head, offset, head) != offset) > Let's suppose we have a thread executing kbuf_write(), interrupted by an IRQ or NMI right after a successful local_cmpxchg() (space reservation in the buffer). If the nested execution context also calls kbuf_write(), it will therefore update ubuf->head (below) with the second reserved space, and only after that will it return to the original thread context and continue executing kbuf_write(), thus overwriting ubuf->head with the prior-to-last reserved offset. All this probably works OK most of the times, when we have an event flow guaranteeing that a following event will fix things up, but there appears to be a risk of losing events near the end of the trace when those are in nested execution contexts. Thoughts ? Thanks, Mathieu > /* > * Ensure that if we see the userspace tail (ubuf->tail) such > * that there is space to write @buf without overwriting data > * userspace hasn't seen yet, we won't in fact store data before > * that read completes. > */ > > smp_mb(); /* A, matches with D */ > > memcpy(kbuf->data + offset, buf, sz); > > /* > * Ensure that we write all the @buf data before we update the > * userspace visible ubuf->head pointer. > */ > smp_wmb(); /* B, matches with C */ > > ubuf->head = kbuf->head; > } > > /* > * Consume the buffer data and update the tail pointer to indicate to > * kernel space there's 'free' space. > */ > void ubuf_read(void) > { > u64 head, tail; > > tail = ACCESS_ONCE(ubuf->tail); > head = ACCESS_ONCE(ubuf->head); > > /* > * Ensure we read the buffer boundaries before the actual buffer > * data... > */ > smp_rmb(); /* C, matches with B */ > > while (tail != head) { > obj = ubuf->data + tail; > /* process obj */ > tail += obj->size; > tail %= ubuf->size; > } > > /* > * Ensure all data reads are complete before we issue the > * ubuf->tail update; once that update hits, kbuf_write() can > * observe and overwrite data. > */ > smp_mb(); /* D, matches with A */ > > ubuf->tail = tail; > } -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev