The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz" and will appear at g...@bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.6
------>
commit f07789ed3693d6dc0afd23b5bf14939323f6aff4
Author: Alexander Atanasov <alexander.atana...@virtuozzo.com>
Date:   Fri Jan 24 17:36:10 2025 +0200
    dm-ploop: make ploop_bat_write_complete ready for parallel pio completion

    The comment inside ploop_bat_write_complete lists some requirements:

    	* Success: now update local BAT copy. We could do this
    	* from our delayed work, but we want to publish new
    	* mapping in the fastest way. This must be done before
    	* data bios completion, since right after we complete
    	* a bio, subsequent read wants to see written data
    	* (ploop_map() wants to see not zero bat_entries[.])

    The current code assumes that pios complete sequentially. With threads,
    completions can arrive in any order. To handle that, call
    ploop_advance_local_after_bat_wb only if this is the last call to
    ploop_bat_write_complete.

    When completing the ready data and cow pios, protect the lists with
    piwb->lock and deferred_lock, as their other users do. The old code
    assumed that nobody could touch the lists once the piwb was marked
    completed. That no longer holds when pios are processed in parallel.

    https://virtuozzo.atlassian.net/browse/VSTOR-91821

    Signed-off-by: Alexander Atanasov <alexander.atana...@virtuozzo.com>

    ======
    Patchset description:
    ploop: optimizations and scaling

    Ploop processes requests in parallel in different threads where
    possible. This significantly improves performance and makes further
    optimizations possible.
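The completion scheme in the commit message can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the ploop code:

* `struct fake_piwb`, `struct fake_pio` and the `fake_*` functions are all hypothetical.
* A pthread mutex stands in for piwb->lock.
* Only one list is shown, standing in for ready_data_pios. The patch handles cow_list the same way under ploop->deferred_lock.
* The sketch picks the last completer with a fetch-and-decrement. The patch instead reads piwb->count with atomic_read(), and the decrement is not part of this diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a data pio waiting for the BAT write. */
struct fake_pio {
	struct fake_pio *next;
	int status;	/* error status copied from the BAT write, if any */
	bool ended;	/* set once this pio has been "ended" */
};

/* Hypothetical stand-in for the piwb of one BAT page write. */
struct fake_piwb {
	pthread_mutex_t lock;	/* plays the role of piwb->lock */
	atomic_int count;	/* completions still outstanding */
	struct fake_pio *ready;	/* plays the role of ready_data_pios */
	bool completed;
	int bat_advanced;	/* times the local "BAT" was advanced */
};

static void fake_piwb_init(struct fake_piwb *w, int nr_parts)
{
	pthread_mutex_init(&w->lock, NULL);
	atomic_init(&w->count, nr_parts);
	w->ready = NULL;
	w->completed = false;
	w->bat_advanced = 0;
}

/* Queue a data pio on the shared list; always done under the lock. */
static void fake_piwb_add_ready(struct fake_piwb *w, struct fake_pio *p)
{
	pthread_mutex_lock(&w->lock);
	p->next = w->ready;
	w->ready = p;
	pthread_mutex_unlock(&w->lock);
}

/*
 * Called once per completed part, possibly from several threads at once.
 * Returns how many data pios this call ended.
 */
static int fake_bat_write_complete(struct fake_piwb *w, int status)
{
	/* atomic_fetch_sub() returns the old value: 1 means we are last. */
	int old = atomic_fetch_sub(&w->count, 1);
	bool last = (old == 1);
	struct fake_pio *local;
	int ended = 0;

	assert(old > 0);

	/* Only the last completer publishes the new mapping, on success. */
	if (last && status == 0)
		w->bat_advanced++;

	/* Detach ("splice") the whole shared list while holding the lock... */
	pthread_mutex_lock(&w->lock);
	if (last)
		w->completed = true;
	local = w->ready;
	w->ready = NULL;
	pthread_mutex_unlock(&w->lock);

	/* ...then end the detached pios without holding the lock. */
	while (local != NULL) {
		struct fake_pio *p = local;

		local = p->next;
		if (status)
			p->status = status;
		p->ended = true;
		ended++;
	}
	return ended;
}
```

Detaching the list under the lock keeps the critical section short. Each completer ends only the pios that were queued when it took the lock, and it does so without holding the lock. The patch gets the same effect with list_splice_init() into on-stack LIST_HEADs.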
    Known bugs:
      - delayed metadata writeback is not working and is missing error handling
         - patch to disable it until fixed
      - fast path is not working - causes rcu lockups
         - patch to disable it

    Further improvements:
      - optimize md pages lookups

    Alexander Atanasov (50):
          dm-ploop: md_pages map all pages at creation time
          dm-ploop: Use READ_ONCE/WRITE_ONCE to access md page data
          dm-ploop: fsync after all pios are sent
          dm-ploop: move md status to use proper bitops
          dm-ploop: convert wait_list and wb_batch_llist to use lockless lists
          dm-ploop: convert enospc handling to use lockless lists
          dm-ploop: convert suspended_pios list to use lockless list
          dm-ploop: convert the rest of the lists to use llist variant
          dm-ploop: combine processing of pios thru prepare list and remove fsync worker
          dm-ploop: move from wq to kthread
          dm-ploop: move preparations of pios into the caller from worker
          dm-ploop: fast path execution for reads
          dm-ploop: do not use a wrapper for set_bit to make a page writeback
          dm-ploop: BAT use only one list for writeback
          dm-ploop: make md writeback timeout to be per page
          dm-ploop: add interface to disable bat writeback delay
          dm-ploop: convert wb_batch_list to lockless variant
          dm-ploop: convert high_prio to status
          dm-ploop: split cow processing into two functions
          dm-ploop: convert md page rw lock to spin lock
          dm-ploop: convert bat_rwlock to bat_lock spinlock
          dm-ploop: prepare bat updates under bat_lock
          dm-ploop: make ploop_bat_write_complete ready for parallel pio completion
          dm-ploop: make ploop_submit_metadata_writeback return number of requests sent
          dm-ploop: introduce pio runner threads
          dm-ploop: add pio list ids to be used when passing pios to runners
          dm-ploop: process pios via runners
          dm-ploop: disable metadata writeback delay
          dm-ploop: disable fast path
          dm-ploop: use lockless lists for chained cow updates list
          dm-ploop: use lockless lists for data ready pios
          dm-ploop: give runner threads better name
          dm-ploop: resize operation - add holes bitmap locking
          dm-ploop: remove unnecessary operations
          dm-ploop: use filp per thread
          dm-ploop: catch if we try to advance pio past bio end
          dm-ploop: support REQ_FUA for data pios
          dm-ploop: proplerly access nr_bat_entries
          dm-ploop: fix locking and improve error handling when submitting pios
          dm-ploop: fix how ENOTBLK is handled
          dm-ploop: sync when suspended or stopping
          dm-ploop: rework bat completion logic
          dm-ploop: rework logic in pio processing
          dm-ploop: end fsync pios in parallel
          dm-ploop: make filespace preallocations async
          dm-ploop: resubmit enospc pios from dispatcher thread
          dm-ploop: dm-ploop: simplify discard completion
          dm-ploop: use GFP_ATOMIC instead of GFP_NOIO
          dm-ploop: fix locks used in mixed context
          dm-ploop: fix how current flags are managed inside threads

    Andrey Zhadchenko (13):
          dm-ploop: do not flush after metadata writes
          dm-ploop: set IOCB_DSYNC on all FUA requests
          dm-ploop: remove extra ploop_cluster_is_in_top_delta()
          dm-ploop: introduce per-md page locking
          dm-ploop: reduce BAT accesses on discard completion
          dm-ploop: simplify llseek
          dm-ploop: speed up ploop_prepare_bat_update()
          dm-ploop: make new allocations immediately visible in BAT
          dm-ploop: drop ploop_cluster_is_in_top_delta()
          dm-ploop: do not wait for BAT update for non-FUA requests
          dm-ploop: add delay for metadata writeback
          dm-ploop: submit all postponed metadata on REQ_OP_FLUSH
          dm-ploop: handle REQ_PREFLUSH

    Feature: dm-ploop: ploop target driver
---
 drivers/md/dm-ploop-map.c | 47 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index 129447510033..86c7b7b946b2 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -895,35 +895,47 @@ static void ploop_bat_write_complete(struct pio *pio, void *piwb_ptr,
 	struct ploop_cow *cow;
 	struct pio *data_pio;
 	unsigned long flags;
-
-	if (!bi_status) {
-		/*
-		 * Success: now update local BAT copy. We could do this
-		 * from our delayed work, but we want to publish new
-		 * mapping in the fastest way. This must be done before
-		 * data bios completion, since right after we complete
-		 * a bio, subsequent read wants to see written data
-		 * (ploop_map() wants to see not zero bat_entries[.]).
-		 */
-		ploop_advance_local_after_bat_wb(ploop, piwb, true);
+	LIST_HEAD(lready_pios);
+	LIST_HEAD(lcow_pios);
+	int completed = atomic_read(&piwb->count) == 1;
+
+	if (completed) {
+		/* We are the last count so it is safe to advance bat */
+		if (!bi_status) {
+			/*
+			 * Success: now update local BAT copy. We could do this
+			 * from our delayed work, but we want to publish new
+			 * mapping in the fastest way. This must be done before
+			 * data bios completion, since right after we complete
+			 * a bio, subsequent read wants to see written data
+			 * (ploop_map() wants to see not zero bat_entries[.]).
+			 */
+			ploop_advance_local_after_bat_wb(ploop, piwb, true);
+		}
 	}
 
 	spin_lock_irqsave(&piwb->lock, flags);
-	piwb->completed = true;
+	if (completed)
+		piwb->completed = completed;
 	piwb->bi_status = bi_status;
+	list_splice_init(&piwb->ready_data_pios, &lready_pios);
 	spin_unlock_irqrestore(&piwb->lock, flags);
 
+	spin_lock_irqsave(&ploop->deferred_lock, flags);
+	list_splice_init(&piwb->cow_list, &lcow_pios);
+	spin_unlock_irqrestore(&ploop->deferred_lock, flags);
+
 	/*
-	 * End pending data bios. Unlocked, as nobody can
-	 * add a new element after piwc->completed is true.
+	 * End pending data bios.
 	 */
-	while ((data_pio = ploop_pio_list_pop(&piwb->ready_data_pios)) != NULL) {
+
+	while ((data_pio = ploop_pio_list_pop(&lready_pios)) != NULL) {
 		if (bi_status)
 			data_pio->bi_status = bi_status;
 		ploop_pio_endio(data_pio);
 	}
 
-	while ((aux_pio = ploop_pio_list_pop(&piwb->cow_list))) {
+	while ((aux_pio = ploop_pio_list_pop(&lcow_pios))) {
 		cow = aux_pio->endio_cb_data;
 		ploop_complete_cow(cow, bi_status);
 	}
@@ -1901,7 +1913,6 @@ static void ploop_submit_metadata_writeback(struct ploop *ploop)
 	 */
 	llist_for_each_safe(pos, t, ll_wb_batch) {
 		md = list_entry((struct list_head *)pos, typeof(*md), wb_link);
-		INIT_LIST_HEAD(&md->wb_link);
 		if (!llist_empty(&md->wait_llist) ||
 		    test_bit(MD_HIGHPRIO, &md->status) ||
 		    time_before(md->dirty_timeout, timeout) ||
@@ -1913,7 +1924,9 @@ static void ploop_submit_metadata_writeback(struct ploop *ploop)
 			clear_bit(MD_DIRTY, &md->status);
 			clear_bit(MD_HIGHPRIO, &md->status);
 			ploop_index_wb_submit(ploop, md->piwb);
+			ret++;
 		} else {
+			INIT_LIST_HEAD(&md->wb_link);
 			llist_add((struct llist_node *)&md->wb_link, &ploop->wb_batch_llist);
 		}
 	}

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel