Prepare pios earlier so that they can be executed earlier.
Convert more places to use lockless lists.

https://virtuozzo.atlassian.net/browse/VSTOR-91820
Signed-off-by: Alexander Atanasov <alexander.atana...@virtuozzo.com>

======
Patchset description:
ploop: optimizations and scaling

Ploop processes requests in different threads in parallel
where possible, which results in a significant improvement in
performance and makes further optimizations possible.

Known bugs:
  - delayed metadata writeback is not working and is missing
    error handling - patch to disable it until fixed
  - fast path is not working - causes rcu lockups - patch to
    disable it

Further improvements:
  - optimize md pages lookups

Alexander Atanasov (50):
  dm-ploop: md_pages map all pages at creation time
  dm-ploop: Use READ_ONCE/WRITE_ONCE to access md page data
  dm-ploop: fsync after all pios are sent
  dm-ploop: move md status to use proper bitops
  dm-ploop: convert wait_list and wb_batch_llist to use lockless lists
  dm-ploop: convert enospc handling to use lockless lists
  dm-ploop: convert suspended_pios list to use lockless list
  dm-ploop: convert the rest of the lists to use llist variant
  dm-ploop: combine processing of pios thru prepare list and remove
    fsync worker
  dm-ploop: move from wq to kthread
  dm-ploop: move preparations of pios into the caller from worker
  dm-ploop: fast path execution for reads
  dm-ploop: do not use a wrapper for set_bit to make a page writeback
  dm-ploop: BAT use only one list for writeback
  dm-ploop: make md writeback timeout to be per page
  dm-ploop: add interface to disable bat writeback delay
  dm-ploop: convert wb_batch_list to lockless variant
  dm-ploop: convert high_prio to status
  dm-ploop: split cow processing into two functions
  dm-ploop: convert md page rw lock to spin lock
  dm-ploop: convert bat_rwlock to bat_lock spinlock
  dm-ploop: prepare bat updates under bat_lock
  dm-ploop: make ploop_bat_write_complete ready for parallel pio
    completion
  dm-ploop: make ploop_submit_metadata_writeback return number of
    requests sent
  dm-ploop: introduce pio runner threads
  dm-ploop: add pio list ids to be used when passing pios to runners
  dm-ploop: process pios via runners
  dm-ploop: disable metadata writeback delay
  dm-ploop: disable fast path
  dm-ploop: use lockless lists for chained cow updates list
  dm-ploop: use lockless lists for data ready pios
  dm-ploop: give runner threads better name
  dm-ploop: resize operation - add holes bitmap locking
  dm-ploop: remove unnecessary operations
  dm-ploop: use filp per thread
  dm-ploop: catch if we try to advance pio past bio end
  dm-ploop: support REQ_FUA for data pios
  dm-ploop: proplerly access nr_bat_entries
  dm-ploop: fix locking and improve error handling when submitting pios
  dm-ploop: fix how ENOTBLK is handled
  dm-ploop: sync when suspended or stopping
  dm-ploop: rework bat completion logic
  dm-ploop: rework logic in pio processing
  dm-ploop: end fsync pios in parallel
  dm-ploop: make filespace preallocations async
  dm-ploop: resubmit enospc pios from dispatcher thread
  dm-ploop: dm-ploop: simplify discard completion
  dm-ploop: use GFP_ATOMIC instead of GFP_NOIO
  dm-ploop: fix locks used in mixed context
  dm-ploop: fix how current flags are managed inside threads

Andrey Zhadchenko (13):
  dm-ploop: do not flush after metadata writes
  dm-ploop: set IOCB_DSYNC on all FUA requests
  dm-ploop: remove extra ploop_cluster_is_in_top_delta()
  dm-ploop: introduce per-md page locking
  dm-ploop: reduce BAT accesses on discard completion
  dm-ploop: simplify llseek
  dm-ploop: speed up ploop_prepare_bat_update()
  dm-ploop: make new allocations immediately visible in BAT
  dm-ploop: drop ploop_cluster_is_in_top_delta()
  dm-ploop: do not wait for BAT update for non-FUA requests
  dm-ploop: add delay for metadata writeback
  dm-ploop: submit all postponed metadata on REQ_OP_FLUSH
  dm-ploop: handle REQ_PREFLUSH

Feature: dm-ploop: ploop target driver
---
 drivers/md/dm-ploop-cmd.c | 13 +-----
 drivers/md/dm-ploop-map.c | 98 +++++++++++++++++++++++----------------
 drivers/md/dm-ploop.h     |  2 +-
 3 files changed, 60 insertions(+), 53 deletions(-)

diff --git a/drivers/md/dm-ploop-cmd.c b/drivers/md/dm-ploop-cmd.c
index 2b85be2171cf..1cd215800a16 100644
--- a/drivers/md/dm-ploop-cmd.c
+++ b/drivers/md/dm-ploop-cmd.c
@@ -123,7 +123,6 @@ ALLOW_ERROR_INJECTION(ploop_inflight_bios_ref_switch, ERRNO);
 
 static void ploop_resume_submitting_pios(struct ploop *ploop)
 {
-	LIST_HEAD(list);
 	struct llist_node *suspended_pending;
 
 	spin_lock_irq(&ploop->deferred_lock);
@@ -132,16 +131,8 @@ static void ploop_resume_submitting_pios(struct ploop *ploop)
 	spin_unlock_irq(&ploop->deferred_lock);
 
 	suspended_pending = llist_del_all(&ploop->llsuspended_pios);
-	if (suspended_pending) {
-		struct llist_node *pos, *t;
-
-		llist_for_each_safe(pos, t, suspended_pending) {
-			struct pio *pio = llist_entry(pos, typeof(*pio), llist);
-
-			list_add(&pio->list, &list);
-		}
-		ploop_submit_embedded_pios(ploop, &list);
-	}
+	if (suspended_pending)
+		ploop_submit_embedded_pios(ploop, suspended_pending);
 }
 
 static int ploop_suspend_submitting_pios(struct ploop *ploop)
diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index d11428c67f9b..cb24738efc7c 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -143,18 +143,11 @@ static void ploop_init_prq_and_embedded_pio(struct ploop *ploop,
 void ploop_enospc_timer(struct timer_list *timer)
 {
 	struct ploop *ploop = from_timer(ploop, timer, enospc_timer);
-	LIST_HEAD(list);
-	struct llist_node *pos, *t;
 	struct llist_node *enospc_pending = llist_del_all(&ploop->enospc_pios);
 
 	if (enospc_pending) {
 		enospc_pending = llist_reverse_order(enospc_pending);
-		llist_for_each_safe(pos, t, enospc_pending) {
-			struct pio *pio = llist_entry(pos, typeof(*pio), llist);
-
-			list_add(&pio->list, &list);
-		}
-		ploop_submit_embedded_pios(ploop, &list);
+		ploop_submit_embedded_pios(ploop, enospc_pending);
 	}
 }
 
@@ -303,11 +296,15 @@ static struct pio *ploop_split_and_chain_pio(struct ploop *ploop,
 ALLOW_ERROR_INJECTION(ploop_split_and_chain_pio, NULL);
 
 static int ploop_split_pio_to_list(struct ploop *ploop, struct pio *pio,
-				   struct list_head *ret_list)
+				   struct llist_head *ret_llist)
 {
 	u32 clu_size = CLU_SIZE(ploop);
+	struct llist_node *pos, *t;
 	struct pio *split;
-	LIST_HEAD(list);
+	LLIST_HEAD(llist);
+	struct llist_node *lltmp;
+
+	WARN_ON(!pio->bi_iter.bi_size);
 
 	while (1) {
 		loff_t start = to_bytes(pio->bi_iter.bi_sector);
@@ -325,17 +322,24 @@ static int ploop_split_pio_to_list(struct ploop *ploop, struct pio *pio,
 		if (!split)
 			goto err;
 
-		list_add_tail(&split->list, &list);
+		llist_add(&split->llist, &llist);
 	}
 
-	list_splice_tail(&list, ret_list);
-	list_add_tail(&pio->list, ret_list);
+	pio->llist.next = NULL;
+	llist_add(&pio->llist, &llist);
+	lltmp = llist_reverse_order(llist_del_all(&llist));
+	pio->llist.next = NULL;
+	llist_add_batch(lltmp, &pio->llist, ret_llist);
+
 	return 0;
 err:
-	while ((pio = ploop_pio_list_pop(&list)) != NULL) {
+	llist_for_each_safe(pos, t, llist.first) {
+		pio = llist_entry(pos, typeof(*pio), llist);
 		pio->bi_status = BLK_STS_RESOURCE;
+		pio->llist.next = NULL;
 		ploop_pio_endio(pio);
 	}
+
 	return -ENOMEM;
 }
 ALLOW_ERROR_INJECTION(ploop_split_pio_to_list, ERRNO);
@@ -1632,12 +1636,11 @@ ALLOW_ERROR_INJECTION(ploop_create_bvec_from_rq, NULL);
 
 static void ploop_prepare_one_embedded_pio(struct ploop *ploop,
 					   struct pio *pio,
-					   struct list_head *deferred_pios)
+					   struct llist_head *lldeferred_pios)
 {
 	struct ploop_rq *prq = pio->endio_cb_data;
 	struct request *rq = prq->rq;
 	struct bio_vec *bvec = NULL;
-	LIST_HEAD(list);
 	int ret;
 
 	if (rq->bio != rq->biotail) {
@@ -1656,16 +1659,18 @@ static void ploop_prepare_one_embedded_pio(struct ploop *ploop,
 		pio->bi_iter.bi_idx = 0;
 		pio->bi_iter.bi_bvec_done = 0;
 	} else {
-		/* Single bio already provides bvec array */
+		/* Single bio already provides bvec array
+		 * bvec is updated to the correct on submit
+		 * it is different after partial IO
+		 */
 		bvec = rq->bio->bi_io_vec;
-
 		pio->bi_iter = rq->bio->bi_iter;
 	}
 	pio->bi_iter.bi_sector = ploop_rq_pos(ploop, rq);
 	pio->bi_io_vec = bvec;
 
 	pio->queue_list_id = PLOOP_LIST_DEFERRED;
-	ret = ploop_split_pio_to_list(ploop, pio, deferred_pios);
+	ret = ploop_split_pio_to_list(ploop, pio, lldeferred_pios);
 	if (ret)
 		goto err_nomem;
 
@@ -1677,7 +1682,7 @@ static void ploop_prepare_one_embedded_pio(struct ploop *ploop,
 
 static void ploop_prepare_embedded_pios(struct ploop *ploop,
 					struct llist_node *pios,
-					struct list_head *deferred_pios)
+					struct llist_head *deferred_pios)
 {
 	struct pio *pio;
 	struct llist_node *pos, *t;
@@ -1693,12 +1698,16 @@ static void ploop_prepare_embedded_pios(struct ploop *ploop,
 }
 
 static void ploop_process_deferred_pios(struct ploop *ploop,
-					struct list_head *pios)
+					struct llist_head *pios)
 {
 	struct pio *pio;
+	struct llist_node *pos, *t;
 
-	while ((pio = ploop_pio_list_pop(pios)) != NULL)
+	llist_for_each_safe(pos, t, pios->first) {
+		pio = llist_entry(pos, typeof(*pio), llist);
+		INIT_LIST_HEAD(&pio->list); /* until type is changed */
 		ploop_process_one_deferred_bio(ploop, pio);
+	}
 }
 
 static void ploop_process_one_discard_pio(struct ploop *ploop, struct pio *pio)
@@ -1797,19 +1806,13 @@ static void ploop_submit_metadata_writeback(struct ploop *ploop)
 	}
 }
 
-static void process_ploop_fsync_work(struct ploop *ploop)
+static void process_ploop_fsync_work(struct ploop *ploop, struct llist_node *llflush_pios)
 {
 	struct file *file;
 	struct pio *pio;
 	int ret;
-	struct llist_node *llflush_pios;
 	struct llist_node *pos, *t;
 
-	llflush_pios = llist_del_all(&ploop->pios[PLOOP_LIST_FLUSH]);
-
-	if (!llflush_pios)
-		return;
-
-
 	file = ploop_top_delta(ploop)->file;
 	/* All flushes are done as one */
 	ret = vfs_fsync(file, 0);
@@ -1828,12 +1831,13 @@ static void process_ploop_fsync_work(struct ploop *ploop)
 
 void do_ploop_run_work(struct ploop *ploop)
 {
-	LIST_HEAD(deferred_pios);
+	LLIST_HEAD(deferred_pios);
 	struct llist_node *llembedded_pios;
 	struct llist_node *lldeferred_pios;
 	struct llist_node *lldiscard_pios;
 	struct llist_node *llcow_pios;
 	struct llist_node *llresubmit;
+	struct llist_node *llflush_pios;
 	unsigned int old_flags = current->flags;
 
 	current->flags |= PF_IO_THREAD|PF_LOCAL_THROTTLE|PF_MEMALLOC_NOIO;
@@ -1841,24 +1845,23 @@ void do_ploop_run_work(struct ploop *ploop)
 
 	llembedded_pios = llist_del_all(&ploop->pios[PLOOP_LIST_PREPARE]);
 	lldeferred_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DEFERRED]);
+	llresubmit = llist_del_all(&ploop->llresubmit_pios);
 	lldiscard_pios = llist_del_all(&ploop->pios[PLOOP_LIST_DISCARD]);
 	llcow_pios = llist_del_all(&ploop->pios[PLOOP_LIST_COW]);
-	llresubmit = llist_del_all(&ploop->llresubmit_pios);
 
 	/* add old deferred back to the list */
 	if (lldeferred_pios) {
 		struct llist_node *pos, *t;
-		struct pio *pio;
-
+		/* Add one by one we need last for batch add */
 		llist_for_each_safe(pos, t, lldeferred_pios) {
-			pio = llist_entry(pos, typeof(*pio), llist);
-			INIT_LIST_HEAD(&pio->list);
-			list_add(&pio->list, &deferred_pios);
+			llist_add(pos, &deferred_pios);
 		}
 	}
 
 	ploop_prepare_embedded_pios(ploop, llembedded_pios, &deferred_pios);
 
+	llflush_pios = llist_del_all(&ploop->pios[PLOOP_LIST_FLUSH]);
+
 	if (llresubmit)
 		ploop_process_resubmit_pios(ploop, llist_reverse_order(llresubmit));
 
@@ -1872,8 +1875,9 @@ void do_ploop_run_work(struct ploop *ploop)
 
 	ploop_submit_metadata_writeback(ploop);
 
-	if (!llist_empty(&ploop->pios[PLOOP_LIST_FLUSH]))
-		process_ploop_fsync_work(ploop);
+	if (llflush_pios)
+		process_ploop_fsync_work(ploop, llist_reverse_order(llflush_pios));
+
 	current->flags = old_flags;
 }
 
@@ -1916,6 +1920,9 @@ static void ploop_submit_embedded_pio(struct ploop *ploop, struct pio *pio)
 {
 	struct ploop_rq *prq = pio->endio_cb_data;
 	struct request *rq = prq->rq;
+	LLIST_HEAD(deferred_pios);
+	struct pio *spio;
+	struct llist_node *pos, *t;
 
 	if (blk_rq_bytes(rq)) {
 		pio->queue_list_id = PLOOP_LIST_PREPARE;
@@ -1930,17 +1937,26 @@ static void ploop_submit_embedded_pio(struct ploop *ploop, struct pio *pio)
 	}
 
 	ploop_inc_nr_inflight(ploop, pio);
-	llist_add(&pio->llist, &ploop->pios[PLOOP_LIST_PREPARE]);
+	ploop_prepare_one_embedded_pio(ploop, pio, &deferred_pios);
+
+	llist_for_each_safe(pos, t, deferred_pios.first) {
+		spio = llist_entry(pos, typeof(*pio), llist);
+		llist_add(&spio->llist, &ploop->pios[PLOOP_LIST_DEFERRED]);
+	}
 
 	ploop_schedule_work(ploop);
 }
 
-void ploop_submit_embedded_pios(struct ploop *ploop, struct list_head *list)
+void ploop_submit_embedded_pios(struct ploop *ploop, struct llist_node *llist)
 {
 	struct pio *pio;
+	struct llist_node *pos, *t;
 
-	while ((pio = ploop_pio_list_pop(list)) != NULL)
+	llist_for_each_safe(pos, t, llist) {
+		pio = llist_entry(pos, typeof(*pio), llist);
+		INIT_LIST_HEAD(&pio->list);
 		ploop_submit_embedded_pio(ploop, pio);
+	}
 }
 
 int ploop_clone_and_map(struct dm_target *ti, struct request *rq,
diff --git a/drivers/md/dm-ploop.h b/drivers/md/dm-ploop.h
index c0cb08cff16a..77013cff9cec 100644
--- a/drivers/md/dm-ploop.h
+++ b/drivers/md/dm-ploop.h
@@ -572,7 +572,7 @@ extern int ploop_add_delta(struct ploop *ploop, u32 level,
 extern int ploop_check_delta_length(struct ploop *ploop, struct file *file,
 				    loff_t *file_size);
 extern void ploop_submit_embedded_pios(struct ploop *ploop,
-				       struct list_head *list);
+				       struct llist_node *llist);
 extern void ploop_dispatch_pios(struct ploop *ploop, struct pio *pio,
 				struct list_head *pio_list);
 extern void do_ploop_work(struct work_struct *ws);
-- 
2.43.5

_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel