On Tue 2019-08-27 00:36:18, John Ogness wrote: > Hi Petr, > > AndreaP responded with some explanation (and great links!) on the topic > of READ_ONCE. But I feel like your comments about the WRITE_ONCE were > not addressed. I address that (and your other comments) below... > > On 2019-08-23, Petr Mladek <pmla...@suse.com> wrote: > >> --- /dev/null > >> +++ b/kernel/printk/numlist.c > >> +/** > >> + * numlist_push() - Add a node to the list and assign it a sequence > >> number. > >> + * > >> + * @nl: The numbered list to push to. > >> + * > >> + * @n: A node to push to the numbered list. > >> + * The node must not already be part of a list. > >> + * > >> + * @id: The ID of the node. > >> + * > >> + * A node is added in two steps: The first step is to make this node the > >> + * head, which causes a following push to add to this node. The second > >> step is > >> + * to update @next_id of the former head node to point to this one, which > >> + * makes this node visible to any task that sees the former head node. > >> + */ > >> +void numlist_push(struct numlist *nl, struct nl_node *n, unsigned long id) > >> +{ [...] > >> + > >> + /* bB: #1 */ > >> + head_id = atomic_long_read(&nl->head_id); > >> + > >> + for (;;) { > >> + /* bC: */ > >> + while (!numlist_read(nl, head_id, &seq, NULL)) { > >> + /* > >> + * @head_id is invalid. Try again with an > >> + * updated value. > >> + */ > >> + > >> + cpu_relax(); > > > > I have got very confused by this. cpu_relax() suggests that this > > cycle is busy waiting until a particular node becomes valid. > > My first though was that it must cause deadlock in NMI when > > the interrupted code is supposed to make the node valid. > > > > But it is the other way. The head is always valid when it is > > added to the list. It might become invalid when another CPU > > moves the head and the old one gets reused. > > > > Anyway, I do not see any reason for cpu_relax() here. > > You are correct. The cpu_relax() should not be there. But there is still > an issue that this could spin hard if the head was recycled and this CPU > does not yet see the new head value.
I do not understand this. The head could get reused only after head_id was replaced with the following valid node. The next cycle is done with a new id that should be valid. Of course, the new ID might get reused as well. But then we just repeat the cycle. We have to be able to find a valid head after few cycles. The last valid ID could not get reused because nodes can be removed only if was not the last valid node. > To handle that, and in preparation for my next version, I'm now using a > read_acquire() to load the ID in the node() callback (matching the > set_release() in assign_desc()). This ensures that if numlist_read() > fails, the new head will be visible. I do not understand this either. The above paragraph seems to describe a race. I do not see how it could cause an infinite loop. Best Regards, Petr