Re: [Devel] [PATCH vz9] dm-ploop: fix and rework md updates

Alexander Atanasov Wed, 12 Feb 2025 03:19:40 -0800

On 12.02.25 13:12, Pavel Tikhomirov wrote:

On 2/12/25 17:42, Alexander Atanasov wrote:
On 12.02.25 11:02, Pavel Tikhomirov wrote:
On 2/12/25 14:38, Alexander Atanasov wrote:
On 12.02.25 7:54, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
With the locking reduced it opened windows for races between
running updates, pending and new updates. Current logic to
deal with them is not correct.
Examples of exact race situation in code would be appreciated, asnow it's impossible to unserstand which exact race we are trying tofix.
E.g. preferably in this format:

Thread 1:
A1
B1
                 Thread2:
                 A2
                 B2
C1
                 C2
Too many to describe them that way - we have three actors.
Locking and logic is wrong.
Sorry, but it is not obvious what was wrong with it in the firstplace. To review and verify the fix, I need to have full informationabout what the problem was (it's unclear from JIRA too, unless Ireinvestigate all the crashes there).
Should i go over the code and explain it?
Can you please explain general lock logic a bit and maybe with coupleof references to code. Because for me "races between running updates,pending and new updates" brings more questions when answers: What is"running" updates? What is "pending" and "new" updates? What is"updates"?
Ok let me try - an md page can be in three states / which comments areupdated in .h file/
  - MD_DIRTY - page awaits writeback - needs to be updated on disk
- MD_WRITEBACK - page update is being processed - write out startedand is in progress
there is only one piwb - in md_page struct.
Let's walk over ploop_locate_new_cluster_and_attach_pio as it is thebase case, others are derivatives.
First under lock spin_lock_irqsave(&ploop->bat_lock, flags);
The there is :
if (piwb && (piwb->type != type || test_bit(MD_WRITEBACK, &md->status)))

It uses piwb to check for dirty - if we have piwb we delay.
But the problem is in the case MD_WRITEBACK , i.e. it is not started
page is waiting for it to start.

Next check the type ALLOC vs DISCARD - we have only one piwb.
If it is the same it does not delay. And the currently waiting piwb
can be executed before it is completely updated.
For example in cow case a pio is added to cow_llist -> the update cango and complete before it is added, so it won't be executed.
RACE - bat_write_complete clears  MD_WRITEBACK and sets piwb to NULL
early before the update is completed. Delayes pios can cause new updates
while the initial one is not completed.
Next it marks page dirty - but along that it is adding it to thewriteback list - RACE - thread can see it before update is ready.
Next lock is unlocked.

To summarize - the race is between update preparation and thread
executing the updates - in several forms.

Request thread                            Execution Thread
Start to prepare update
take lock
mark for update and put into the list
release lock
                                           Executes unfinished update
clears flags and NULLs piwb- > crash somewhere along the path
do more work to finish preparation
Yes, this explanation is very helpful, now with new MD_UPDATING bit weare protected against concurrent "updates" which previously could'vemissed e.g. the pio ploop_submit_cow_index_wb adds to the cow_list.
And that means that we need to add memory barriers betweenclear_bit(MD_UPDATING) and previous code we want to protect (e.g. addingpio to cow_llist in ploop_process_one_delta_cow) else we woulddefinitely still have races and crashes due to compiler/cpu reordering.Probably easiest way to do so is to put bat_lock around allclear_bit(MD_UPDATING).



only the set part is important, even if clear is seen later it is okay.
it will be processed on the next cycle


--
Regards,
Alexander Atanasov

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel

Re: [Devel] [PATCH vz9] dm-ploop: fix and rework md updates

Reply via email to