On Tue, Mar 29, 2016 at 05:29:12PM +0200, Olivier MATZ wrote: > Hi, > > > On 03/29/2016 10:54 AM, Bruce Richardson wrote: > >On Mon, Mar 28, 2016 at 06:48:07PM +0300, Lazaros Koromilas wrote: > >>Hi Olivier, > >> > >>We could have two threads (running on different cores in the general > >>case) that both succeed the cmpset operation. In the dequeue path, > >>when n == 0, then cons_next == cons_head, and cmpset will always > >>succeed. Now, if they both see an old r->cons.tail value from a > >>previous dequeue, they can get stuck in the while > > > >Hi, > > > >I don't see how threads reading an "old r->cons.tail" value is even possible. > >The head and tail pointers on the ring are marked in the code as volatile, so > >all reads and writes to those values are always done from memory and not > >cached > >in registers. No deadlock should be possible on that while loop, unless a > >process crashes in the middle of a ring operation. Each thread which updates > >the head pointer from x to y, is responsible for updating the tail pointer in > >a similar manner. The loop ensures the tail updates are in the same order as > >the > >head updates. > > > >If you believe deadlock is possible, can you outline the sequence of > >operations > >which would lead to such a state, because I cannot see how it could occur > >without > >a crash inside one of the threads. > > I think the deadlock Lazaros describes could occur in the following > condition: > > current ring state > r->prod.head = 0 > r->prod.tail = 0 > > core 0 core 1 > ==================================================================== > enqueue 0 object > cmpset(&r->prod.head, 0, 0) > core 0 is interrupted here > enqueue 1 object > cmpset(&r->prod.head, 0, 1) > copy the objects in box 0 > while (r->prod.tail != prod_head)) > r->prod.tail = prod_next > copy 0 object (-> nothing to do) > while (r->prod.tail != prod_head)) > <loop forever> > > > I think this issue is indeed fixed by Lazaros' patch (I missed it > in previous review). However, I don't think this deadlock could > happen once we avoided the (n == 0) case. > Yes, I agree with your scenario. However, I thought the issue was occuring with non-zero updates, but I may well be mistaken. If it's fixed now, all good... :-)
/Bruce