On 2/12/25 17:42, Alexander Atanasov wrote:
On 12.02.25 11:02, Pavel Tikhomirov wrote:


On 2/12/25 14:38, Alexander Atanasov wrote:
On 12.02.25 7:54, Pavel Tikhomirov wrote:


On 2/11/25 22:25, Alexander Atanasov wrote:
With the locking reduced it opened windows for races between
running updates, pending and new updates. Current logic to
deal with them is not correct.

Examples of exact race situation in code would be appreciated, as now it's impossible to unserstand which exact race we are trying to fix.

E.g. preferably in this format:

Thread 1:
A1
B1
                 Thread2:
                 A2
                 B2
C1
                 C2



Too many to describe them that way - we have three actors.
Locking and logic is wrong.

Sorry, but it is not obvious what was wrong with it in the first place. To review and verify the fix, I need to have full information about what the problem was (it's unclear from JIRA too, unless I reinvestigate all the crashes there).

Should i go over the code and explain it?

Can you please explain general lock logic a bit and maybe with couple of references to code. Because for me "races between running updates, pending and new updates" brings more questions when answers: What is "running" updates? What is "pending" and "new" updates? What is "updates"?


Ok let me try - an md page can be in three states / which comments are updated in .h file/
  - MD_DIRTY - page awaits writeback - needs to be updated on disk
 - MD_WRITEBACK - page update is being  processed - write out started and is in progress

there is only one piwb - in md_page struct.
Let's walk over ploop_locate_new_cluster_and_attach_pio as it is the base case, others are derivatives.


First under lock spin_lock_irqsave(&ploop->bat_lock, flags);
The there is :
if (piwb && (piwb->type != type || test_bit(MD_WRITEBACK, &md->status)))

It uses piwb to check for dirty - if we have piwb we delay.
But the problem is in the case MD_WRITEBACK , i.e. it is not started
page is waiting for it to start.

Next check the type ALLOC vs DISCARD - we have only one piwb.
If it is the same it does not delay. And the currently waiting piwb
can be executed before it is completely updated.
For example in cow case a pio is added to cow_llist  -> the update can go and complete before it is added, so it won't be executed.

RACE - bat_write_complete clears  MD_WRITEBACK and sets piwb to NULL
early before the update is completed. Delayes pios can cause new updates
while the initial one is not completed.


Next it marks page dirty - but along that it is adding it to the writeback list - RACE - thread can see it before update is ready.
Next lock is unlocked.

To summarize - the race is between update preparation and thread
executing the updates - in several forms.

Request thread                            Execution Thread
Start to prepare update
take lock
mark for update and put into the list
release lock
                                           Executes unfinished update
                                          clears flags and NULLs piwb - > crash somewhere along the path

do more work to finish preparation

Yes, this explanation is very helpful, now with new MD_UPDATING bit we are protected against concurrent "updates" which previously could've missed e.g. the pio ploop_submit_cow_index_wb adds to the cow_list.

And that means that we need to add memory barriers between clear_bit(MD_UPDATING) and previous code we want to protect (e.g. adding pio to cow_llist in ploop_process_one_delta_cow) else we would definitely still have races and crashes due to compiler/cpu reordering. Probably easiest way to do so is to put bat_lock around all clear_bit(MD_UPDATING).

--
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

Reply via email to