From: Alexander Atanasov
Convert the bool high prio flag to a status bit: this reduces the size of the structure and makes updates to the flag atomic.
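The sketch below is a minimal userspace model of the idea. C11 atomics stand in for the kernel's set_bit()/clear_bit()/test_bit(), and the struct and bit names (md_page_model, MD_HIGHPRIO_BIT) are hypothetical, not the driver's real definitions. Folding the flag into the existing status word removes a separate bool field, and every update becomes one atomic read-modify-write on that word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real driver defines its own MD_* bits. */
enum { MD_DIRTY_BIT = 0, MD_WRITEBACK_BIT = 1, MD_HIGHPRIO_BIT = 2 };

/* Hypothetical model of an md page: one status word instead of extra bools. */
struct md_page_model {
	atomic_ulong status;
};

/* Userspace stand-ins for the kernel's atomic set_bit/clear_bit/test_bit. */
static void model_set_bit(int nr, atomic_ulong *word)
{
	atomic_fetch_or(word, 1UL << nr);
}

static void model_clear_bit(int nr, atomic_ulong *word)
{
	atomic_fetch_and(word, ~(1UL << nr));
}

static bool model_test_bit(int nr, atomic_ulong *word)
{
	return (atomic_load(word) >> nr) & 1UL;
}
```

Clearing the high-priority bit leaves the other bits in the same word untouched, which a plain non-atomic read-modify-write on a shared word would not guarantee under concurrency.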
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Alexander Atanasov
==
Patchset description:
ploop: optimisations and scaling
Ploop processes requests in different threads in
From: Alexander Atanasov
ploop_advance_local_after_bat_wb handles only one
md page, so it is possible to simplify the discard completion
done in ploop_piwb_discard_completed: the
initialization overlaps and there is an extra md page
lookup which can be avoided.
Remove the function and integrate it i
Prepare pios earlier so that we can try to execute them earlier.
Convert more places to use lockless lists.
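Lockless lists let producers push without taking a lock, while a consumer detaches the whole list in one atomic exchange. This is a minimal userspace model of that push/take-all pattern, using C11 atomics in place of the kernel's llist_add()/llist_del_all(); the type and function names are hypothetical. As with the kernel llist, the detached chain comes back in reverse (LIFO) order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical model of a lockless singly linked list node and head. */
struct ll_node {
	struct ll_node *next;
};

struct ll_head {
	_Atomic(struct ll_node *) first;
};

/* Push one node; safe against concurrent pushers (CAS retry loop). */
static void ll_add(struct ll_node *node, struct ll_head *head)
{
	struct ll_node *first = atomic_load(&head->first);

	do {
		/* On CAS failure, 'first' is reloaded with the current head. */
		node->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, node));
}

/* Detach the whole list at once; nodes come back newest first. */
static struct ll_node *ll_del_all(struct ll_head *head)
{
	return atomic_exchange(&head->first, NULL);
}
```

If submission order matters, the consumer has to reverse the detached chain before processing it.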
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
GFP_NOIO allocations can sleep, so we cannot use GFP_NOIO when we
are called from the block layer and device mapper in
interrupt context. Sleeping is also invalid inside an RCU
read-side critical section (rcu_nesting), which happens too.
(Traces are in the issue.)
https://virtuozzo.atlassian.net/browse/VSTOR-98291
Signed-off-by: Alexander
From: Alexander Atanasov
Currently threads do:
    old_flags = current->flags;
    current->flags |= ploop flags;
    ...
    current->flags = old_flags;
This can clobber process flags that were changed in between.
To fix this, use the current_restore_flags(..) helper, which restores
only our flags to their saved state and preserves all other flags.
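The difference can be shown with a userspace model. restore_ours() below mirrors what the kernel's current_restore_flags(orig_flags, flags) does: clear only the bits in `flags`, then put back their saved values from `orig_flags`. The task struct and the flag values are hypothetical stand-ins for current->flags and the PF_* bits ploop sets.

```c
#include <assert.h>

/* Hypothetical flag values; the kernel uses PF_* constants. */
#define MODEL_PLOOP_FLAGS 0x0100u
#define MODEL_OTHER_FLAG  0x0004u

struct task_model {
	unsigned int flags;
};

/* Buggy pattern: restoring the whole word drops bits set in between. */
static void restore_all(struct task_model *t, unsigned int old_flags)
{
	t->flags = old_flags;
}

/*
 * Fixed pattern: only the bits in @ours return to their saved state
 * (so they stay set if they were already set before); all other bits
 * are left as they are now.
 */
static void restore_ours(struct task_model *t, unsigned int orig_flags,
			 unsigned int ours)
{
	t->flags &= ~ours;
	t->flags |= orig_flags & ours;
}
```

If something else sets MODEL_OTHER_FLAG while ploop's flags are raised, restore_ours() keeps it, while restore_all() silently loses it.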
https://virtuoz
From: Andrey Zhadchenko
Revert the llist conversion for metadata writeback.
Create a new list for priority metadata updates, which are triggered
by FUA requests.
Write metadata for all other requests in a batch after some delay.
Add a new parameter to specify the delay time.
Always submit COW and discard IO to
From: Andrey Zhadchenko
Currently we have a single bat_rwlock for the whole ploop. However,
runtime locking granularity can be reduced to a single metadata page.
In this patch, add an rwlock to the metadata structure and use it when
accessing md->levels and md->page at the same time, to protect
readers against
From: Alexander Atanasov
Improve how and when we wait for pios to complete
before proceeding with the next pios.
Force writing of pending md pages if we have sync pios.
Remove now-unused functions.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Do not advance a pio past the bio end. This should not happen,
but try to catch it and log it as an error.
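A sketch of such a guard, assuming a pio tracks its position as a simple byte count. `pio_model`, `pio_advance_checked()` and the choice of -EINVAL are hypothetical; the real code works on the pio's bio iterator. The point is to refuse the advance and report it rather than walk past the end.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified view of a pio's position within its bio. */
struct pio_model {
	unsigned int done;   /* bytes already consumed, always <= size */
	unsigned int size;   /* total bytes covered by the bio */
};

/* Advance by @bytes unless that would move past the end of the bio. */
static int pio_advance_checked(struct pio_model *pio, unsigned int bytes)
{
	unsigned int left = pio->size - pio->done;

	if (bytes > left) {
		/* Should never happen: log it and leave the pio untouched. */
		fprintf(stderr, "pio advance past end: %u > %u\n", bytes, left);
		return -EINVAL;
	}
	pio->done += bytes;
	return 0;
}
```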
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Andrey Zhadchenko
We are planning to delay metadata writeback, so we want to
apply metadata changes to the BAT page immediately.
Make all requests that trigger PIWB_TYPE_ALLOC apply metadata
changes immediately.
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
Use llist and remove the deferred_lock around pio dispatching.
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Reduce locking for data-ready pios chained to BAT updates.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Unify how the MD_WRITEBACK bit is set across the code:
one of the files uses a wrapper while the rest of the code
sets the bit directly. To avoid confusion, remove the wrapper.
Remove the lock and rely on atomic bitops to make further
lock reorganization easier.
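With atomic bitops, "is writeback already running?" and "mark it running" collapse into one operation, so no lock is needed around them. A userspace sketch with C11 atomics: `model_test_and_set_bit()` mimics the kernel's test_and_set_bit(), and the bit number and `md_try_start_writeback()` helper are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MD_WRITEBACK 1  /* hypothetical bit number */

/* Atomically set bit @nr and return its previous value. */
static bool model_test_and_set_bit(int nr, atomic_ulong *word)
{
	unsigned long mask = 1UL << nr;

	return atomic_fetch_or(word, mask) & mask;
}

/*
 * Start writeback of an md page only if nobody else has; returns true
 * if the caller won the race and must submit the write.
 */
static bool md_try_start_writeback(atomic_ulong *status)
{
	return !model_test_and_set_bit(MODEL_MD_WRITEBACK, status);
}
```

Only one of several concurrent callers sees the bit transition from 0 to 1, so exactly one of them submits the write.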
https://virt
From: Andrey Zhadchenko
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
From: Alexander Atanasov
These locks are used in both interrupt and user context, so we must
save and restore the interrupt state.
https://virtuozzo.atlassian.net/browse/VSTOR-98471
Signed-off-by: Alexander Atanasov
On 12.02.25 11:02, Pavel Tikhomirov wrote:
On 2/12/25 14:38, Alexander Atanasov wrote:
On 12.02.25 7:54, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
With the locking reduced it opened windows for races between
running updates, pending and new updates. Current logic
On 12.02.25 13:12, Pavel Tikhomirov wrote:
On 2/12/25 17:42, Alexander Atanasov wrote:
On 12.02.25 11:02, Pavel Tikhomirov wrote:
On 2/12/25 14:38, Alexander Atanasov wrote:
On 12.02.25 7:54, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
With the locking reduced i
From: Alexander Atanasov
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
md_page is always present in memory, so
md_page->page could always be mapped and we would not need to perform
kmap_atomic/kunmap_atomic during each lookup.
Convert BAT updates to use kmap_local_page/kunmap_local.
https://virtuozzo.atlassian.net/browse/VSTOR-
From: Alexander Atanasov
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Prepare to reduce locking by using atomic 32-bit access to the fields.
To ensure this, we need to use the _ONCE macros.
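For reference, this is roughly what the _ONCE accessors reduce to for a naturally aligned 32-bit field: a volatile access that the compiler may not tear, merge or re-read. It is a simplified sketch using the GNU C `__typeof__` extension, not the kernel's full definition, and it is not a memory barrier: it constrains only the compiler, not CPU ordering. The `bat_model` struct and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified _ONCE accessors (GNU C); the kernel versions do more checks. */
#define MODEL_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define MODEL_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical field read without a lock while a resize may update it. */
struct bat_model {
	uint32_t nr_bat_entries;
};

static void bat_resize(struct bat_model *b, uint32_t n)
{
	MODEL_WRITE_ONCE(b->nr_bat_entries, n);  /* single 32-bit store */
}

static int bat_index_valid(struct bat_model *b, uint32_t idx)
{
	/* Load once, then use the local copy for the whole check. */
	uint32_t n = MODEL_READ_ONCE(b->nr_bat_entries);

	return idx < n;
}
```

Reading into a local copy matters: without it, the compiler is free to reload the field and a concurrent resize could make two uses of "the same" value disagree.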
https://virtuozzo.atlassian.net/browse/VSTOR-91659
Signed-off-by: Alexander Atanasov
md_page.wb_link is never a list head, it is a list node, so no list
initialization is needed: list_add() will correctly initialize it.
Signed-off-by: Konstantin Khorenko
---
drivers/md/dm-ploop-bat.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/md/dm-ploop-bat.c b/drivers/md/dm-
From: Alexander Atanasov
Currently there are two workers: one to handle pios and
one to handle flushes (via vfs_fsync). These workers are
created unbound, which means they can run on any CPU that is
free. When ploop sends pios (via ploop_dispatch_pios), it checks
if there are data and if there are fl
Ploop processes requests in different threads in parallel
where possible, which results in a significant improvement in
performance and makes further optimisations possible.
v1:
- addressed feedback; I've left out a few requests to merge changes
into bigger patches, so as to keep changes in smal
From: Alexander Atanasov
Fix direct bitops to use set_bit/clear_bit, which
are atomic. This is required since there are
some places in the code that do not use locking when
operating on these bits. This is also preparation
for relaxing the locking.
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-
We are going to move most pio lists to llists.
While there are situations where a pio can be added either to a
list or to an llist, let's have a union of both "list" and "llist".
Once we remove all pio.list users, we will remove the "list" field.
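A sketch of that layout. It uses userspace stand-ins with the same member layout as the kernel's struct list_head and struct llist_node; `pio_model` is hypothetical. Because `next` is the first member of both types, the same storage can be linked through either kind of list, and the static asserts make that layout assumption explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins with the same member layout as the kernel types. */
struct list_head {
	struct list_head *next, *prev;
};

struct llist_node {
	struct llist_node *next;
};

/* Hypothetical pio: one storage slot, usable by either list type. */
struct pio_model {
	union {
		struct list_head list;
		struct llist_node llist;
	};
	int id;
};

/* The layout assumptions the mixed list/llist handling depends on. */
_Static_assert(offsetof(struct list_head, next) == 0, "list_head.next first");
_Static_assert(offsetof(struct llist_node, next) == 0, "llist_node.next first");
_Static_assert(offsetof(struct pio_model, list) ==
	       offsetof(struct pio_model, llist), "union members overlap");
```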
Signed-off-by: Konstantin Khorenko
---
dr
From: Andrey Zhadchenko
On every flush request we should submit all accumulated metadata
changes, wait for their completion and only then do the flush.
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
Process pios in runner threads while preserving order.
Metadata writeback requires all prior pios to be processed,
since they can generate updates, so we have to wait before
processing writeback. Fsync is also still sequential.
Both can be improved in a later iteration.
https://virtuozzo.atlassian.n
From: Alexander Atanasov
Encode the device minor number into the thread name.
The name gets truncated if we use dm-ploop as the prefix,
so shorten it to ploop.
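A task's comm name holds at most 15 characters plus the terminating NUL (TASK_COMM_LEN is 16), so a long prefix eats exactly the part that identifies the device and thread. The sketch below shows the effect; the format string and the `-r` runner suffix are assumptions, not the driver's actual naming scheme.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_TASK_COMM_LEN 16  /* same size as the kernel's TASK_COMM_LEN */

/* Build a runner thread name; snprintf truncates like the task comm does. */
static void runner_name(char *buf, const char *prefix, unsigned int minor,
			unsigned int idx)
{
	snprintf(buf, MODEL_TASK_COMM_LEN, "%s%u-r%u", prefix, minor, idx);
}
```

With the "dm-ploop" prefix, minor 12345 and runner 17 become "dm-ploop12345-r" and the runner index is lost; with "ploop" the full name "ploop12345-r17" fits.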
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
The difference between hlist_unhashed_lockless and hlist_unhashed
is that the _lockless version uses READ_ONCE to do the check.
Since it is used without locks, we must switch to the _lockless variant.
Also make locking clusters and adding to inflight_pios return a result
so we
From: Alexander Atanasov
For some reason this fixed XFS issues.
I do not know why yet.
To be investigated, and probably reverted.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
nr_bat_entries is updated while resizing, and some places
read it without holding the bat_lock. To ensure a consistent
value is read, use READ_ONCE, and use WRITE_ONCE when updating.
During grow/resize and shrink, pios are suspended and the
code waits for all active inflight pios to complete
From: Alexander Atanasov
Convert to lockless lists, intermixed with regular lists. This works
because the next pointer is the first field in both list_head and
llist_node, and prev is not used. Do this so we can move forward in
baby steps.
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexande
From: Alexander Atanasov
Add specific list ids for writeback and flush pios, and process them
inside the runners.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
Split ploop_process_delta_cow in two, a list iterator and a processing
function, and unify its handling with the other pio types.
We need this to be able to process pios one by one from threads.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Andrey Zhadchenko
as all its users are gone or reworked
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
From: Andrey Zhadchenko
When mapping a request, save all flags rather than only the operation
type. Use them later to check whether the request is FUA.
Unfortunately there is no REQ_FUA equivalent at the IOCB layer, but
IOCB_DSYNC is usually translated to it, as can be seen in
__iomap_dio_rw().
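A sketch of why the full flag word has to be kept: if only the operation type is saved, the FUA and PREFLUSH modifiers are already gone by the time the request is processed. The bit values and names are hypothetical placeholders for the kernel's REQ_OP_* / REQ_FUA / REQ_PREFLUSH, which are defined differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: low byte is the op, higher bits are modifiers. */
#define MODEL_OP_MASK      0xffu
#define MODEL_OP_WRITE     0x01u
#define MODEL_REQ_FUA      (1u << 8)
#define MODEL_REQ_PREFLUSH (1u << 9)

struct pio_req_model {
	unsigned int opf;  /* full flags, not just (opf & MODEL_OP_MASK) */
};

static bool pio_is_fua(const struct pio_req_model *p)
{
	return p->opf & MODEL_REQ_FUA;
}

static bool pio_needs_preflush(const struct pio_req_model *p)
{
	return p->opf & MODEL_REQ_PREFLUSH;
}
```

A request stored as `opf & MODEL_OP_MASK` would report itself as a plain write, so a later "is this FUA?" check could never succeed.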
https://v
From: Andrey Zhadchenko
Do not bother with locked clusters. For llseek we can ignore all
concurrent operations.
Do not call ploop_bat_entries() on every cluster as this is slow.
Just remember the md and read it directly.
Return EINVAL if it fails to find the md page.
https://virtuozzo.atlassian.net/bro
From: Andrey Zhadchenko
as we already got all the info from ploop_bat_entries() earlier.
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
From: Alexander Atanasov
Lock the bitmap when updating the header during grow, and disable delayed writeback.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Prepare for threads: convert rwlocks to spinlocks.
This patch converts only the existing rwlocks.
The next one takes the lock where required.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
We need to wait for writeback to complete before issuing pending flushes.
To know whether we have to wait, return the number of pios submitted.
Check whether we have pending pios blocked by an md update.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
There are some requirements listed in the comment inside
ploop_bat_write_complete:
* Success: now update local BAT copy. We could do this
* from our delayed work, but we want to publish new
* mapping in the fastest way. This must be done before
From: Alexander Atanasov
Add conditions that prevent writeback delay:
- if we have pending pios
- if we are doing utility operations, we do not want to delay writeout either
Add an interface to disable the delay.
Hook it into suspending/resuming the sending of pios, which is done before
the operations that require it.
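The conditions above amount to a predicate checked before a metadata write is deferred. This is a sketch with hypothetical field names (`pending_pios`, `in_utility_op`, `wb_delay_disabled`); the real driver keeps this state in its own structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state that decides writeback delay. */
struct wb_state_model {
	unsigned int pending_pios;  /* pios waiting on metadata */
	bool in_utility_op;         /* e.g. grow/resize in progress */
	bool wb_delay_disabled;     /* set while sending pios is suspended */
};

/* Return true if metadata writeback may be deferred. */
static bool wb_may_delay(const struct wb_state_model *s)
{
	if (s->pending_pios)
		return false;       /* someone is waiting: write now */
	if (s->in_utility_op)
		return false;       /* utility ops want metadata on disk */
	return !s->wb_delay_disabled;
}
```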
https://v
From: Alexander Atanasov
If we see a bio with REQ_FUA set we need to request a sync.
https://virtuozzo.atlassian.net/browse/VSTOR-91816
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
- there is no point in syncing before the write
- there is no point in reversing the list order, since it is a resubmit and
the order is already lost
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Move file space allocation into a separate thread and try to
preallocate space in advance.
On each cluster allocation, check whether the next allocation may
require file space allocation and, if so, trigger the allocator thread to
perform it in the background.
In case we try to alloca
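A sketch of the check done on each cluster allocation, under assumed names: if the cluster after the one just handed out would extend beyond the space already allocated in the file, wake the background allocator now rather than when that allocation actually happens. The `prealloc_model` struct, the one-cluster lookahead and the `allocator_kicked` flag (standing in for waking the thread) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the image file's allocation state. */
struct prealloc_model {
	uint64_t cluster_size;     /* bytes per cluster */
	uint64_t next_free_off;    /* file offset of the next cluster to hand out */
	uint64_t allocated_end;    /* end of space already allocated in the file */
	bool allocator_kicked;     /* stands in for waking the allocator thread */
};

/* Hand out one cluster; returns its file offset. */
static uint64_t alloc_cluster(struct prealloc_model *p)
{
	uint64_t off = p->next_free_off;

	p->next_free_off += p->cluster_size;
	/* Would the *next* allocation need more file space? Ask in advance. */
	if (p->next_free_off + p->cluster_size > p->allocated_end)
		p->allocator_kicked = true;
	return off;
}
```

With 1024-byte clusters and 4096 bytes allocated, the fourth allocation (offset 3072) still fits, but it triggers the allocator because a fifth one would not.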
From: Alexander Atanasov
When running out of space, pios are delayed and retried
from a timer. This timer is the only thing that runs in
interrupt context, and it brings the requirement to use _irqsave
variants, since complete request processing can be started
from the timer. To avoid this, set a flag fr
From: Andrey Zhadchenko
Drop ploop_cluster_is_in_top_delta(), because
- it internally searches for the md page on every call
- it takes the read lock every time
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
From: Andrey Zhadchenko
Drop the extra ploop_cluster_is_in_top_delta() call, as we are going to
access the BAT anyway.
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
After a pio is split and prepared, try to execute it immediately
without going to the worker thread.
https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Remove the wb_batch_list_prio, use only wb_batch_list, and
use the md high_prio flag for immediate submission.
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Alexander Atanasov
Currently data pios and flushes are separated into different lists before
being handed to the workqueue. This can lead to flushes executing before the relevant
data pios, and the worker has no way to recover that dependency.
So put both data and flush pios into the prepare list. This way the worker can
get sing
From: Alexander Atanasov
Send all pios attached to a sync operation to the runners
to call endio.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Delayed metadata writeback results in a hang; disable
it until it is fixed. Pios end up in the md's waiting list
and get stuck there, never to complete.
Signed-off-by: Alexander Atanasov
A pio may complete after the md page update. In that case we
must not complete the update, but wait for the last data pio
and only then complete them all.
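One way to express "complete only after the last data pio" is a reference count shared by the md update and the data pios chained to it: each holder drops a reference when it finishes, and whichever drop reaches zero runs the completion. This is a minimal sketch with C11 atomics and hypothetical names, not necessarily how the patch implements it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical completion tracker for one md page update. */
struct md_update_model {
	atomic_int refs;   /* 1 for the update itself + 1 per data pio */
	int completed;     /* how many times completion ran (must be 1) */
};

static void md_update_init(struct md_update_model *u, int nr_data_pios)
{
	atomic_init(&u->refs, 1 + nr_data_pios);
	u->completed = 0;
}

/* Drop one reference; the last one completes the update and all pios. */
static bool md_update_put(struct md_update_model *u)
{
	if (atomic_fetch_sub(&u->refs, 1) == 1) {
		u->completed++;
		return true;
	}
	return false;
}
```

The order of the drops does not matter: whether the md write or a data pio finishes last, completion runs exactly once.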
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Andrey Zhadchenko
Create a new flush pio when we see the REQ_PREFLUSH flag. Call the original
pio after the flush is done.
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
On 2/12/25 14:38, Alexander Atanasov wrote:
On 12.02.25 7:54, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
With the locking reduced it opened windows for races between
running updates, pending and new updates. Current logic to
deal with them is not correct.
Examples
From: Alexander Atanasov
Prepare locking for threads. Some operations require locking the BAT data
(see next patch), and in some cases we need to lock the md page while holding
the bat spin lock, but we cannot take a sleeping lock while holding a spin lock.
To address this, use a spin lock for md pages instead
From: Andrey Zhadchenko
Flushes should be explicitly requested by the writer if it really
wants them.
Drop ploop_md_write_endio() and ploop_md_fsync_endio().
https://virtuozzo.atlassian.net/browse/VSTOR-91817
Signed-off-by: Andrey Zhadchenko
From: Alexander Atanasov
Do not bulk-write all dirtied BAT pages after a common timeout.
Instead, record the dirty time of each page and delay each page
individually by the timeout. If a page is redirtied, update its
dirty time, so that hot pages accumulate more changes before
writeout.
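A sketch of the per-page timing rule, assuming a monotonic clock in milliseconds and hypothetical names: each page remembers when it was last dirtied, redirtying moves that time forward, and a page becomes due for writeout only once it has been quiet for the whole delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-page dirty bookkeeping. */
struct md_dirty_model {
	bool dirty;
	uint64_t dirty_time_ms;  /* time of the most recent dirtying */
};

/* Mark (or re-mark) a page dirty; a hot page keeps moving its deadline. */
static void md_mark_dirty(struct md_dirty_model *md, uint64_t now_ms)
{
	md->dirty = true;
	md->dirty_time_ms = now_ms;
}

/* Is this page due for writeout under a @delay_ms delay? */
static bool md_writeout_due(const struct md_dirty_model *md, uint64_t now_ms,
			    uint64_t delay_ms)
{
	return md->dirty && now_ms - md->dirty_time_ms >= delay_ms;
}
```

With this rule alone, a page redirtied more often than the delay would never become due, so forced writeout (for example on FUA or flush) still has to bypass it; the sketch does not model that.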
https://vir
From: Alexander Atanasov
Reduce locking for COW pios chained to BAT updates.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
On suspended or stopped sync delta.
Enable use of the per-CPU pool cache for bios allocated from kaio.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
A direct IO write result of ENOTBLK, or 0 (in the ext4 case), means
"retry the IO in buffered mode". We wrongly assumed that it was
a short write and handled it incorrectly.
We cannot retry in buffered mode since the code is not ready
for it, so take a different route. This error happens if
pag
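A sketch of how the direct IO result can be classified so that the "fall back to buffered IO" signal is not mistaken for a short write. Following the description above, -ENOTBLK and 0 are treated as that signal; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of a direct IO write result. */
enum dio_result {
	DIO_COMPLETE,      /* everything written */
	DIO_SHORT,         /* partial write: resubmit the remainder */
	DIO_NEED_FALLBACK, /* fs asks for buffered IO: not a short write */
	DIO_ERROR,         /* other negative errno */
};

static enum dio_result classify_dio_write(ssize_t ret, size_t len)
{
	if (ret == -ENOTBLK || ret == 0)
		return DIO_NEED_FALLBACK;
	if (ret < 0)
		return DIO_ERROR;
	if ((size_t)ret < len)
		return DIO_SHORT;
	return DIO_COMPLETE;
}
```

The order of the checks is the fix: the fallback cases must be recognised before the generic "fewer bytes than requested" branch, where 0 would otherwise land.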
From: Alexander Atanasov
Merging required to back this change, so do it again.
Signed-off-by: Alexander Atanasov
Reducing the locking opened windows for races between
running, pending and new updates. The current logic to
deal with them is not correct.
The current flags are:
MD_DIRTY - the md page is dirty and is waiting for writeback
MD_WRITEBACK - writeback is in progress
But we drop the lock aft
On 12.02.25 11:52, Alexander Atanasov wrote:
On 12.02.25 11:50, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
@@ -2067,21 +2109,46 @@ static inline int
ploop_submit_metadata_writeback(struct ploop *ploop, int force
*/
llist_for_each_safe(pos, t, ll_wb_batch
The fast path results in RCU lockups and hangs.
The reason is that we are called from the fs and
then try to re-enter the fs, which it is not ready for.
An idea to investigate: skip the dispatcher thread
and execute directly in a runner thread.
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Prepare for threads. When preparing BAT updates there are two important
things to protect: the MD_DIRTY bit in md->status and the holes bitmap.
Use bat_lock to protect them.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
From: Alexander Atanasov
Create threads to execute pios in parallel and call them pio runners.
Use the number of CPUs to determine the number of threads started.
From the worker, each pio is sent to a thread in round-robin fashion
through work_llist. Maintain the number of pios sent so we can wait
for them to
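A sketch of the dispatch rule: pios go to runner threads in round-robin order, and a counter of pios handed out (decremented as they finish) tells the dispatcher when everything it sent has completed. Thread creation and the per-runner work_llist are left out; the names, the fixed upper bound on runners and the use of C11 atomics are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_MAX_RUNNERS 64  /* hypothetical upper bound */

/* Hypothetical dispatcher state; the patch starts one runner per CPU. */
struct dispatch_model {
	unsigned int nr_runners;
	unsigned int next;                       /* next runner to use */
	unsigned int queued[MODEL_MAX_RUNNERS];  /* pios sent per runner */
	atomic_int inflight;                     /* sent but not yet finished */
};

/* Pick the next runner in round-robin order and account the pio. */
static unsigned int dispatch_pio(struct dispatch_model *d)
{
	unsigned int r = d->next;

	d->next = (d->next + 1) % d->nr_runners;
	d->queued[r]++;
	atomic_fetch_add(&d->inflight, 1);
	return r;
}

/* Called by a runner when a pio completes. */
static void pio_done(struct dispatch_model *d)
{
	atomic_fetch_sub(&d->inflight, 1);
}

/* The dispatcher may proceed once everything it sent has finished. */
static int all_pios_done(struct dispatch_model *d)
{
	return atomic_load(&d->inflight) == 0;
}
```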
On 12.02.25 11:32, Konstantin Khorenko wrote:
Ploop processes requsts in a different threads in parallel
where possible which results in significant improvement in
performance and makes further optimistations possible.
v1:
- addressed feedback, i've left a few requests to merge changes
i
From: Alexander Atanasov
Move to multithreaded model and remove work queue.
https://virtuozzo.atlassian.net/browse/VSTOR-91821
Signed-off-by: Alexander Atanasov
On 2/11/25 22:25, Alexander Atanasov wrote:
@@ -2067,21 +2109,46 @@ static inline int
ploop_submit_metadata_writeback(struct ploop *ploop, int force
*/
llist_for_each_safe(pos, t, ll_wb_batch) {
md = list_entry((struct list_head *)pos, typeof(*md), wb_link);
+
On 12.02.25 11:50, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
@@ -2067,21 +2109,46 @@ static inline int
ploop_submit_metadata_writeback(struct ploop *ploop, int force
*/
llist_for_each_safe(pos, t, ll_wb_batch) {
md = list_entry((struct list_hea
On 2/12/25 17:42, Alexander Atanasov wrote:
If you really want to read md->piwb to piwb variable BEFORE
ploop_write_cluster_sync, you should use READ_ONCE, or other
appropriately used memory barrier.
READ_ONCE is not a memory barrier - it is a COMPILER BARRIER.
https://docs.kernel.org/co
On 2/12/25 17:42, Alexander Atanasov wrote:
On 12.02.25 11:02, Pavel Tikhomirov wrote:
On 2/12/25 14:38, Alexander Atanasov wrote:
On 12.02.25 7:54, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
With the locking reduced it opened windows for races between
running u
On 13.02.25 7:40, Pavel Tikhomirov wrote:
On 2/12/25 17:46, Alexander Atanasov wrote:
On 12.02.25 11:32, Konstantin Khorenko wrote:
Ploop processes requsts in a different threads in parallel
where possible which results in significant improvement in
performance and makes further optimistation
On 2/12/25 17:33, Konstantin Khorenko wrote:
@@ -325,17 +322,24 @@ static int ploop_split_pio_to_list(struct ploop *ploop,
struct pio *pio,
if (!split)
goto err;
- list_add_tail(&split->list, &list);
+ llist_add(&split->llist, &llist)
On 2/12/25 17:33, Konstantin Khorenko wrote:
@@ -1879,20 +1906,15 @@ static void ploop_submit_embedded_pio(struct ploop
*ploop, struct pio *pio)
worker = &ploop->fsync_worker;
}
- spin_lock_irqsave(&ploop->deferred_lock, flags);
if (unlikely(ploop->stop_sub
Looks good except one missed hunk.
We can handle excess pio->llist.next = NULL and INIT_LIST_HEADs, and
list_for_each_ENTRY_safe separately later, as those are not directly
connected to this rework.
note: It looks like you've accidentally overwritten the original patch authors.
Reviewed-by: Pavel Ti
On 2/12/25 17:46, Alexander Atanasov wrote:
On 12.02.25 11:32, Konstantin Khorenko wrote:
Ploop processes requsts in a different threads in parallel
where possible which results in significant improvement in
performance and makes further optimistations possible.
v1:
- addressed feedback, i
On 2/11/25 22:25, Alexander Atanasov wrote:
@@ -2607,6 +2674,7 @@ int ploop_prepare_reloc_index_wb(struct ploop *ploop,
type = PIWB_TYPE_RELOC;
err = -EIO;
+ spin_lock_irq(&ploop->bat_lock);
if (test_bit(MD_DIRTY, &md->status) || test_bit(MD_WRITEBACK,
&md
On 12.02.25 10:08, Pavel Tikhomirov wrote:
On 2/11/25 22:25, Alexander Atanasov wrote:
@@ -2607,6 +2674,7 @@ int ploop_prepare_reloc_index_wb(struct ploop
*ploop,
type = PIWB_TYPE_RELOC;
err = -EIO;
+ spin_lock_irq(&ploop->bat_lock);
if (test_bit(MD_DIRTY, &md->status