On 12.02.25 13:12, Pavel Tikhomirov wrote:
On 2/12/25 17:42, Alexander Atanasov wrote:
On 12.02.25 11:02, Pavel Tikhomirov wrote:
On 2/12/25 14:38, Alexander Atanasov wrote:
On 12.02.25 7:54, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
With the locking reduced it opened windows for races between
running updates, pending and new updates. Current logic to
deal with them is not correct.
Examples of exact race situation in code would be appreciated, as
now it's impossible to unserstand which exact race we are trying to
fix.
E.g. preferably in this format:
Thread 1:
A1
B1
Thread2:
A2
B2
C1
C2
Too many to describe them that way - we have three actors.
Locking and logic is wrong.
Sorry, but it is not obvious what was wrong with it in the first
place. To review and verify the fix, I need to have full information
about what the problem was (it's unclear from JIRA too, unless I
reinvestigate all the crashes there).
Should i go over the code and explain it?
Can you please explain general lock logic a bit and maybe with couple
of references to code. Because for me "races between running updates,
pending and new updates" brings more questions when answers: What is
"running" updates? What is "pending" and "new" updates? What is
"updates"?
Ok let me try - an md page can be in three states / which comments are
updated in .h file/
- MD_DIRTY - page awaits writeback - needs to be updated on disk
- MD_WRITEBACK - page update is being processed - write out started
and is in progress
there is only one piwb - in md_page struct.
Let's walk over ploop_locate_new_cluster_and_attach_pio as it is the
base case, others are derivatives.
First under lock spin_lock_irqsave(&ploop->bat_lock, flags);
The there is :
if (piwb && (piwb->type != type || test_bit(MD_WRITEBACK, &md->status)))
It uses piwb to check for dirty - if we have piwb we delay.
But the problem is in the case MD_WRITEBACK , i.e. it is not started
page is waiting for it to start.
Next check the type ALLOC vs DISCARD - we have only one piwb.
If it is the same it does not delay. And the currently waiting piwb
can be executed before it is completely updated.
For example in cow case a pio is added to cow_llist -> the update can
go and complete before it is added, so it won't be executed.
RACE - bat_write_complete clears MD_WRITEBACK and sets piwb to NULL
early before the update is completed. Delayes pios can cause new updates
while the initial one is not completed.
Next it marks page dirty - but along that it is adding it to the
writeback list - RACE - thread can see it before update is ready.
Next lock is unlocked.
To summarize - the race is between update preparation and thread
executing the updates - in several forms.
Request thread Execution Thread
Start to prepare update
take lock
mark for update and put into the list
release lock
Executes unfinished update
clears flags and NULLs piwb
- > crash somewhere along the path
do more work to finish preparation
Yes, this explanation is very helpful, now with new MD_UPDATING bit we
are protected against concurrent "updates" which previously could've
missed e.g. the pio ploop_submit_cow_index_wb adds to the cow_list.
And that means that we need to add memory barriers between
clear_bit(MD_UPDATING) and previous code we want to protect (e.g. adding
pio to cow_llist in ploop_process_one_delta_cow) else we would
definitely still have races and crashes due to compiler/cpu reordering.
Probably easiest way to do so is to put bat_lock around all
clear_bit(MD_UPDATING).
only the set part is important, even if clear is seen later it is okay.
it will be processed on the next cycle
--
Regards,
Alexander Atanasov
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel