The commit is pushed to "branch-rh9-5.14.0-427.44.1.vz9.80.x-ovz" and will appear at g...@bitbucket.org:openvz/vzkernel.git
after rh9-5.14.0-427.44.1.vz9.80.6
------>
commit 760c978a364af97438758553edd41397fdc15cfc
Author: Alexander Atanasov <alexander.atana...@virtuozzo.com>
Date:   Fri Jan 24 17:36:22 2025 +0200

    dm-ploop: use filp per thread

    For some reason this fixed xfs issues. I do not know why yet.
    To be investigated. And probably reverted.

    https://virtuozzo.atlassian.net/browse/VSTOR-91821
    Signed-off-by: Alexander Atanasov <alexander.atana...@virtuozzo.com>

    ======
    Patchset description:
    ploop: optimizations and scaling

    Ploop processes requests in different threads in parallel
    where possible, which results in a significant performance
    improvement and makes further optimizations possible.

    Known bugs:
      - delayed metadata writeback is not working and is missing
        error handling; there is a patch to disable it until fixed
      - fast path is not working and causes RCU lockups; there is
        a patch to disable it

    Further improvements:
      - optimize md pages lookups

    Alexander Atanasov (50):
      dm-ploop: md_pages map all pages at creation time
      dm-ploop: Use READ_ONCE/WRITE_ONCE to access md page data
      dm-ploop: fsync after all pios are sent
      dm-ploop: move md status to use proper bitops
      dm-ploop: convert wait_list and wb_batch_llist to use lockless lists
      dm-ploop: convert enospc handling to use lockless lists
      dm-ploop: convert suspended_pios list to use lockless list
      dm-ploop: convert the rest of the lists to use llist variant
      dm-ploop: combine processing of pios thru prepare list and remove fsync worker
      dm-ploop: move from wq to kthread
      dm-ploop: move preparations of pios into the caller from worker
      dm-ploop: fast path execution for reads
      dm-ploop: do not use a wrapper for set_bit to make a page writeback
      dm-ploop: BAT use only one list for writeback
      dm-ploop: make md writeback timeout to be per page
      dm-ploop: add interface to disable bat writeback delay
      dm-ploop: convert wb_batch_list to lockless variant
      dm-ploop: convert high_prio to status
      dm-ploop: split cow processing into two functions
      dm-ploop: convert md page rw lock to spin lock
      dm-ploop: convert bat_rwlock to bat_lock spinlock
      dm-ploop: prepare bat updates under bat_lock
      dm-ploop: make ploop_bat_write_complete ready for parallel pio completion
      dm-ploop: make ploop_submit_metadata_writeback return number of requests sent
      dm-ploop: introduce pio runner threads
      dm-ploop: add pio list ids to be used when passing pios to runners
      dm-ploop: process pios via runners
      dm-ploop: disable metadata writeback delay
      dm-ploop: disable fast path
      dm-ploop: use lockless lists for chained cow updates list
      dm-ploop: use lockless lists for data ready pios
      dm-ploop: give runner threads better name
      dm-ploop: resize operation - add holes bitmap locking
      dm-ploop: remove unnecessary operations
      dm-ploop: use filp per thread
      dm-ploop: catch if we try to advance pio past bio end
      dm-ploop: support REQ_FUA for data pios
      dm-ploop: proplerly access nr_bat_entries
      dm-ploop: fix locking and improve error handling when submitting pios
      dm-ploop: fix how ENOTBLK is handled
      dm-ploop: sync when suspended or stopping
      dm-ploop: rework bat completion logic
      dm-ploop: rework logic in pio processing
      dm-ploop: end fsync pios in parallel
      dm-ploop: make filespace preallocations async
      dm-ploop: resubmit enospc pios from dispatcher thread
      dm-ploop: dm-ploop: simplify discard completion
      dm-ploop: use GFP_ATOMIC instead of GFP_NOIO
      dm-ploop: fix locks used in mixed context
      dm-ploop: fix how current flags are managed inside threads

    Andrey Zhadchenko (13):
      dm-ploop: do not flush after metadata writes
      dm-ploop: set IOCB_DSYNC on all FUA requests
      dm-ploop: remove extra ploop_cluster_is_in_top_delta()
      dm-ploop: introduce per-md page locking
      dm-ploop: reduce BAT accesses on discard completion
      dm-ploop: simplify llseek
      dm-ploop: speed up ploop_prepare_bat_update()
      dm-ploop: make new allocations immediately visible in BAT
      dm-ploop: drop ploop_cluster_is_in_top_delta()
      dm-ploop: do not wait for BAT update for non-FUA requests
      dm-ploop: add delay for metadata writeback
      dm-ploop: submit all postponed metadata on REQ_OP_FLUSH
      dm-ploop: handle REQ_PREFLUSH

    Feature: dm-ploop: ploop target driver
---
 drivers/md/dm-ploop-bat.c    | 15 ++++++++++++++-
 drivers/md/dm-ploop-cmd.c    |  6 ++++--
 drivers/md/dm-ploop-map.c    | 41 +++++++++++++++++++++++++----------------
 drivers/md/dm-ploop-target.c |  8 ++++++++
 drivers/md/dm-ploop.h        |  7 ++++++-
 5 files changed, 57 insertions(+), 20 deletions(-)

diff --git a/drivers/md/dm-ploop-bat.c b/drivers/md/dm-ploop-bat.c
index afbd43d74c6a..88b1d02b47d5 100644
--- a/drivers/md/dm-ploop-bat.c
+++ b/drivers/md/dm-ploop-bat.c
@@ -482,7 +482,7 @@ int ploop_add_delta(struct ploop *ploop, u32 level, struct file *file, bool is_r
 	struct rb_root md_root = RB_ROOT;
 	loff_t file_size;
 	u32 nr_be;
-	int ret;
+	int ret, i;
 
 	ret = ploop_check_delta_length(ploop, file, &file_size);
 	if (ret)
@@ -503,6 +503,19 @@ int ploop_add_delta(struct ploop *ploop, u32 level, struct file *file, bool is_r
 	ploop_apply_delta_mappings(ploop, deltas, level, &md_root, nr_be);
 
 	deltas[level].file = file;
+	deltas[level].mtfile = kcalloc(ploop->nkt_runners, sizeof(*file),
+					GFP_KERNEL);
+	if (!deltas[level].mtfile) {
+		ret = ENOMEM;
+		goto out;
+	}
+	for (i = 0; i < ploop->nkt_runners; i++) {
+		deltas[level].mtfile[i] = file_clone_open(file);
+		if (!deltas[level].mtfile[i]) {
+			ret = ENOMEM;
+			goto out;
+		}
+	}
 	deltas[level].file_size = file_size;
 	deltas[level].file_preallocated_area_start = file_size;
 	deltas[level].nr_be = nr_be;
diff --git a/drivers/md/dm-ploop-cmd.c b/drivers/md/dm-ploop-cmd.c
index e080d6afaa7f..50a23212270d 100644
--- a/drivers/md/dm-ploop-cmd.c
+++ b/drivers/md/dm-ploop-cmd.c
@@ -316,7 +316,8 @@ static int ploop_grow_relocate_cluster(struct ploop *ploop,
 	}
 
 	spin_lock_irq(&ploop->bat_lock);
-	ret = ploop_prepare_reloc_index_wb(ploop, &md, clu, &new_dst);
+	ret = ploop_prepare_reloc_index_wb(ploop, &md, clu, &new_dst,
+					   ploop_top_delta(ploop)->file);
 	spin_unlock_irq(&ploop->bat_lock);
 	if (ret < 0) {
 		PL_ERR("reloc: can't prepare it: %d", ret);
@@ -380,7 +381,8 @@ static int ploop_grow_update_header(struct ploop *ploop,
 	int ret;
 
 	/* hdr is in the same page as bat_entries[0] index */
-	ret = ploop_prepare_reloc_index_wb(ploop, &md, 0, NULL);
+	ret = ploop_prepare_reloc_index_wb(ploop, &md, 0, NULL,
+					   ploop_top_delta(ploop)->file);
 	if (ret)
 		return ret;
 	piwb = md->piwb;
diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index 4b4facc79aba..7d531564b24c 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -75,6 +75,7 @@ void ploop_index_wb_init(struct ploop_index_wb *piwb, struct ploop *ploop)
 
 void ploop_init_pio(struct ploop *ploop, unsigned int bi_op, struct pio *pio)
 {
+	pio->runner_id = 0;
 	pio->ploop = ploop;
 	pio->css = NULL;
 	pio->bi_op = bi_op;
@@ -694,7 +695,7 @@ static int ploop_handle_discard_pio(struct ploop *ploop, struct pio *pio,
 punch_hole:
 	ploop_remap_to_cluster(ploop, pio, dst_clu);
 	pos = to_bytes(pio->bi_iter.bi_sector);
-	ret = ploop_punch_hole(ploop_top_delta(ploop)->file, pos,
+	ret = ploop_punch_hole(ploop_top_delta(ploop)->mtfile[pio->runner_id], pos,
 			       pio->bi_iter.bi_size);
 	if (ret || ploop->nr_deltas != 1) {
 		if (ret)
@@ -1080,9 +1081,9 @@ ALLOW_ERROR_INJECTION(ploop_find_dst_clu_bit, ERRNO);
 
 static int ploop_truncate_prealloc_safe(struct ploop *ploop,
 					struct ploop_delta *delta,
-					loff_t len, const char *func)
+					loff_t len, struct file *file,
+					const char *func)
 {
-	struct file *file = delta->file;
 	loff_t old_len = delta->file_size;
 	loff_t new_len = len;
 	int ret;
@@ -1106,12 +1107,11 @@ static int ploop_truncate_prealloc_safe(struct ploop *ploop,
 }
 ALLOW_ERROR_INJECTION(ploop_truncate_prealloc_safe, ERRNO);
 
-static int ploop_allocate_cluster(struct ploop *ploop, u32 *dst_clu)
+static int ploop_allocate_cluster(struct ploop *ploop, u32 *dst_clu, struct file *file)
 {
 	struct ploop_delta *top = ploop_top_delta(ploop);
 	u32 clu_size = CLU_SIZE(ploop);
 	loff_t off, pos, end, old_size;
-	struct file *file = top->file;
 	unsigned long flags;
 	int ret;
 
@@ -1156,7 +1156,7 @@ static int ploop_allocate_cluster(struct ploop *ploop, u32 *dst_clu)
 	}
 
 	if (end > old_size) {
-		ret = ploop_truncate_prealloc_safe(ploop, top, end, __func__);
+		ret = ploop_truncate_prealloc_safe(ploop, top, end, file, __func__);
 		if (ret) {
 			ploop_hole_set_bit(*dst_clu, ploop);
 			return ret;
@@ -1175,7 +1175,7 @@ ALLOW_ERROR_INJECTION(ploop_allocate_cluster, ERRNO);
  * in ploop->holes_bitmap and bat_page.
  */
 static int ploop_alloc_cluster(struct ploop *ploop, struct ploop_index_wb *piwb,
-			       u32 clu, u32 *dst_clu)
+			       u32 clu, u32 *dst_clu, struct file *file)
 {
 	bool already_alloced = false;
 	map_index_t *to;
@@ -1198,8 +1198,8 @@ static int ploop_alloc_cluster(struct ploop *ploop, struct ploop_index_wb *piwb,
 	if (already_alloced)
 		goto out;
 
-	ret = ploop_allocate_cluster(ploop, dst_clu);
-	if (ret < 0)
+	ret = ploop_allocate_cluster(ploop, dst_clu, file);
+	if (unlikely(ret < 0))
 		goto out;
 
 	to = kmap_local_page(piwb->bat_page);
@@ -1332,7 +1332,7 @@ static void ploop_submit_rw_mapped(struct ploop *ploop, struct pio *pio)
 
 	pos = to_bytes(pio->bi_iter.bi_sector);
 
-	file = ploop->deltas[pio->level].file;
+	file = ploop->deltas[pio->level].mtfile[pio->runner_id];
 
 	/* Don't touch @pio after that */
 	if (pio->css && !ploop->nokblkcg) {
@@ -1436,14 +1436,14 @@ static int ploop_initiate_cluster_cow(struct ploop *ploop, unsigned int level,
 }
 ALLOW_ERROR_INJECTION(ploop_submit_cluster_cow, ERRNO);
 
-static void ploop_submit_cluster_write(struct ploop_cow *cow)
+static void ploop_submit_cluster_write(struct ploop_cow *cow, struct file *file)
 {
 	struct pio *aux_pio = cow->aux_pio;
 	struct ploop *ploop = cow->ploop;
 	u32 dst_clu;
 	int ret;
 
-	ret = ploop_allocate_cluster(ploop, &dst_clu);
+	ret = ploop_allocate_cluster(ploop, &dst_clu, file);
 	if (unlikely(ret < 0))
 		goto error;
 	cow->dst_clu = dst_clu;
@@ -1524,6 +1524,7 @@ static void ploop_submit_cow_index_wb(struct ploop_cow *cow)
 static void ploop_process_one_delta_cow(struct ploop *ploop, struct pio *aux_pio)
 {
 	struct ploop_cow *cow;
+	struct file *file;
 
 	cow = aux_pio->endio_cb_data;
 	if (unlikely(aux_pio->bi_status != BLK_STS_OK)) {
@@ -1532,12 +1533,15 @@ static void ploop_process_one_delta_cow(struct ploop *ploop, struct pio *aux_pio
 	}
 	/* until type is changed */
 	INIT_LIST_HEAD(&aux_pio->list);
+
+	file = ploop_top_delta(ploop)->mtfile[aux_pio->runner_id];
+
 	if (cow->dst_clu == BAT_ENTRY_NONE) {
 		/*
 		 * Stage #1: assign dst_clu and write data
 		 * to top delta.
 		 */
-		ploop_submit_cluster_write(cow);
+		ploop_submit_cluster_write(cow, file);
 	} else {
 		/*
 		 * Stage #2: data is written to top delta.
@@ -1572,6 +1576,7 @@ static bool ploop_locate_new_cluster_and_attach_pio(struct ploop *ploop,
 	u32 page_id;
 	int err;
 	unsigned long flags;
+	struct file *file;
 
 	WARN_ON_ONCE(pio->queue_list_id != PLOOP_LIST_DEFERRED);
 	spin_lock_irqsave(&ploop->bat_lock, flags);
@@ -1596,7 +1601,9 @@ static bool ploop_locate_new_cluster_and_attach_pio(struct ploop *ploop,
 
 	piwb = md->piwb;
 
-	err = ploop_alloc_cluster(ploop, piwb, clu, dst_clu);
+	file = ploop_top_delta(ploop)->mtfile[pio->runner_id];
+
+	err = ploop_alloc_cluster(ploop, piwb, clu, dst_clu, file);
 	if (err) {
 		pio->bi_status = errno_to_blk_status(err);
 		clear_bit(MD_DIRTY, &md->status);
@@ -2076,6 +2083,7 @@ int ploop_pio_runner(void *data)
 		llist_for_each_safe(pos, t, llwork) {
 			pio = list_entry((struct list_head *)pos, typeof(*pio), list);
 			INIT_LIST_HEAD(&pio->list);
+			pio->runner_id = worker->runner_id;
 			switch (pio->queue_list_id) {
 			case PLOOP_LIST_FLUSH:
 				WARN_ON_ONCE(1); /* We must not see flushes here */
@@ -2350,7 +2358,8 @@ static void ploop_handle_cleanup(struct ploop *ploop, struct pio *pio)
  */
 int ploop_prepare_reloc_index_wb(struct ploop *ploop,
 				 struct md_page **ret_md,
-				 u32 clu, u32 *dst_clu)
+				 u32 clu, u32 *dst_clu,
+				 struct file *file)
 {
 	enum piwb_type type = PIWB_TYPE_ALLOC;
 	u32 page_id = ploop_bat_clu_to_page_nr(clu);
@@ -2380,7 +2389,7 @@ int ploop_prepare_reloc_index_wb(struct ploop *ploop,
 		 * holes_bitmap.
 		 */
 		ploop_bat_page_zero_cluster(ploop, piwb, clu);
-		err = ploop_alloc_cluster(ploop, piwb, clu, dst_clu);
+		err = ploop_alloc_cluster(ploop, piwb, clu, dst_clu, file);
 		if (err)
 			goto out_reset;
 	}
diff --git a/drivers/md/dm-ploop-target.c b/drivers/md/dm-ploop-target.c
index 56539406ce10..e8beacc5841b 100644
--- a/drivers/md/dm-ploop-target.c
+++ b/drivers/md/dm-ploop-target.c
@@ -201,6 +201,10 @@ static void ploop_destroy(struct ploop *ploop)
 	while (ploop->nr_deltas-- > 0) {
 		if (ploop->deltas[ploop->nr_deltas].file)
 			fput(ploop->deltas[ploop->nr_deltas].file);
+		for (i = 0; i < ploop->nkt_runners; i++) {
+			if (ploop->deltas[ploop->nr_deltas].mtfile[i])
+				fput(ploop->deltas[ploop->nr_deltas].mtfile[i]);
+		}
 	}
 	WARN_ON(ploop_has_pending_activity(ploop));
 	WARN_ON(!ploop_empty_htable(ploop->exclusive_pios));
@@ -379,6 +383,10 @@ static struct ploop_worker *ploop_worker_create(struct ploop *ploop,
 	if (IS_ERR(task))
 		goto out_err;
 	worker->task = task;
+	if (*pref == 'r')
+		worker->runner_id = id - 1;
+	else
+		worker->runner_id = 0;
 	init_llist_head(&worker->work_llist);
 
 	wake_up_process(task);
diff --git a/drivers/md/dm-ploop.h b/drivers/md/dm-ploop.h
index ad1033c0898a..9f193afab618 100644
--- a/drivers/md/dm-ploop.h
+++ b/drivers/md/dm-ploop.h
@@ -42,6 +42,7 @@ struct ploop_pvd_header {
 
 struct ploop_delta {
 	struct file *file;
+	struct file **mtfile;
 	loff_t file_size;
 	loff_t file_preallocated_area_start;
 	u32 nr_be; /* nr BAT entries (or file length in clus if RAW) */
@@ -150,6 +151,7 @@ struct ploop_worker {
 	struct ploop *ploop;
 	struct task_struct *task;
 	struct llist_head work_llist;
+	unsigned int runner_id;
 	atomic_t inflight_pios;
 	struct ploop_worker *next;
 };
@@ -310,6 +312,8 @@ struct pio {
 	void (*complete)(struct pio *me);
 	void *data;
 
+	unsigned int runner_id;
+
 	atomic_t md_inflight;
 };
 
@@ -596,7 +600,8 @@ extern int ploop_rw_page_sync(unsigned rw, struct file *file,
 extern void ploop_map_and_submit_rw(struct ploop *ploop, u32 dst_clu,
 				    struct pio *pio, u8 level);
 extern int ploop_prepare_reloc_index_wb(struct ploop *ploop,
-				struct md_page **ret_md, u32 clu, u32 *dst_clu);
+				struct md_page **ret_md, u32 clu, u32 *dst_clu,
+				struct file *file);
 extern void ploop_break_bat_update(struct ploop *ploop, struct md_page *md);
 extern void ploop_index_wb_submit(struct ploop *, struct ploop_index_wb *);
 extern int ploop_message(struct dm_target *ti, unsigned int argc, char **argv,
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
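
Illustration only, not part of the commit: the patch gives every runner thread its own struct file for each delta (the mtfile[] array indexed by runner_id). A rough userspace analogy, assuming POSIX threads, is to let each thread open() the same path so that it owns a private open file description instead of sharing one. Every name below (per_thread_fd_demo, NR_RUNNERS, struct runner) is hypothetical and chosen for this sketch; it does not reproduce dm-ploop code and says nothing about why the change affected xfs.

```c
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define NR_RUNNERS 4	/* hypothetical stand-in for ploop->nkt_runners */
#define CHUNK 16	/* bytes read by each runner */

struct runner {
	int id;			/* plays the role of runner_id */
	int fd;			/* this runner's private descriptor */
	char buf[CHUNK];
	ssize_t got;
};

/* Each runner reads its own chunk through its own descriptor. */
static void *runner_fn(void *arg)
{
	struct runner *r = arg;

	r->got = pread(r->fd, r->buf, CHUNK, (off_t)r->id * CHUNK);
	return NULL;
}

/* Returns 0 if every runner read the expected data, -1 otherwise. */
int per_thread_fd_demo(const char *path)
{
	struct runner runners[NR_RUNNERS];
	pthread_t tids[NR_RUNNERS];
	char expect[CHUNK];
	int i, fd, opened = 0, started = 0, ret = -1;

	/* Build a small backing file: chunk i is filled with 'A' + i. */
	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	for (i = 0; i < NR_RUNNERS; i++) {
		memset(expect, 'A' + i, CHUNK);
		if (write(fd, expect, CHUNK) != CHUNK) {
			close(fd);
			goto out;
		}
	}
	close(fd);

	/* One open() per runner: separate open file descriptions. */
	for (i = 0; i < NR_RUNNERS; i++) {
		runners[i].id = i;
		runners[i].got = -1;
		runners[i].fd = open(path, O_RDONLY);
		if (runners[i].fd < 0)
			goto out;
		opened++;
	}

	for (i = 0; i < NR_RUNNERS; i++) {
		if (pthread_create(&tids[i], NULL, runner_fn, &runners[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	if (started != NR_RUNNERS)
		goto out;

	/* Verify that each runner saw exactly its own chunk. */
	ret = 0;
	for (i = 0; i < NR_RUNNERS; i++) {
		memset(expect, 'A' + i, CHUNK);
		if (runners[i].got != CHUNK ||
		    memcmp(runners[i].buf, expect, CHUNK))
			ret = -1;
	}
out:
	for (i = 0; i < opened; i++)
		close(runners[i].fd);
	unlink(path);
	return ret;
}
```

Calling per_thread_fd_demo("/tmp/per_thread_fd_demo.bin") from a small main() and checking for a zero return is enough to exercise it. Unlike the kernel code above, which clones an already open struct file with file_clone_open(), this sketch simply reopens the path by name.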