On 10/8/24 13:46, Alexander Atanasov wrote:
On 8.10.24 14:35, Andrey Zhadchenko wrote:
On 10/4/24 09:58, Alexander Atanasov wrote:
Currently there are two workers: one to handle pios and one to handle
flushes (via vfs_fsync). These workers are created unbound, which means
they run whenever there is a free CPU. When ploop sends pios (via
ploop_dispatch_pios) it checks whether there are data pios and whether
there are flush pios. If both are present, both workers are scheduled
to run. This results in a lot of writes and syncs running in parallel,
which is both slow and incorrect. Slow, because the underlying fs tries
to write and sync at the same time instead of caching the writes and
then syncing them. Incorrect, because the REQ_FLUSH pios that ploop
handles must complete only after everything before them has been sent
to disk, which is not guaranteed when the two workers run in parallel.
I think we are not expected to meet such strict REQ_FLUSH
requirements. First of all, flushes can be reordered with other
in-flight requests at the block layer level, so the proper way for the
user is to issue a flush after completion of the requests it needs
flushed. And I think we can even get already reordered flushes/writes
from the layers above.
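To illustrate the point, a minimal sketch of the "flush after
completion" contract from the submitter's side - illustrative only,
not ploop code; example_write_then_flush() is a made-up helper and the
single-argument blkdev_issue_flush() of current kernels is assumed:

#include <linux/bio.h>
#include <linux/blkdev.h>

/*
 * Wait for the data write to complete first, and only then ask the
 * device to flush its cache. The block layer itself gives no ordering
 * guarantee between a flush and other in-flight requests.
 */
static int example_write_then_flush(struct block_device *bdev,
				    struct bio *data_bio)
{
	int ret;

	ret = submit_bio_wait(data_bio);	/* data reached the device */
	if (ret)
		return ret;

	return blkdev_issue_flush(bdev);	/* only now flush the cache */
}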
If we have a list of pios containing both data and flushes (your
question below), we queue work to both workers, which can result in
reordering. By requirements to hit the disk I mean the
REQ_PREFLUSH/REQ_FUA conversion to REQ_FLUSH that we get. We can get
already reordered requests, but let's not introduce more reordering
ourselves.
To address this, first process and submit all pending data pios, and
only then process any pending flushes.
Could you enlighten me whether the KIOCB API really does follow
submission order? Is it possible that some submitted writes complete
after a later submitted flush?
Also, our metadata writeback can trigger a COW write, which will only
be submitted on the next thread iteration.
https://virtuozzo.atlassian.net/browse/VSTOR-93454
Signed-off-by: Alexander Atanasov <alexander.atana...@virtuozzo.com>
---
drivers/md/dm-ploop-map.c | 51 ++++++++++++++++++++++++---------------
1 file changed, 31 insertions(+), 20 deletions(-)
diff --git a/drivers/md/dm-ploop-map.c b/drivers/md/dm-ploop-map.c
index ca2659ba30f5..481ca6556d69 100644
--- a/drivers/md/dm-ploop-map.c
+++ b/drivers/md/dm-ploop-map.c
@@ -369,7 +369,7 @@ void ploop_dispatch_pios(struct ploop *ploop, struct pio *pio,
if (is_data)
queue_work(ploop->wq, &ploop->worker);
- if (is_flush)
+ else if (is_flush)
What does this change?
Do we expect is_flush and is_data at the same time?
I don't see anything preventing that from happening.
Code from ploop_dispatch_pio():
if (pio->queue_list_id == PLOOP_LIST_FLUSH)
*is_flush = true;
else
*is_data = true;
So I think these flags, in the context of ploop_dispatch_pios(), are
mutually exclusive.
Let's drop this hunk.
queue_work(ploop->wq, &ploop->fsync_worker);
}
@@ -1780,6 +1780,29 @@ static void ploop_submit_metadata_writeback(struct ploop *ploop)
}
}
+static void process_ploop_fsync_work(struct ploop *ploop)
+{
+ LIST_HEAD(flush_pios);
+ struct file *file;
+ struct pio *pio;
+ int ret;
+ spin_lock_irq(&ploop->deferred_lock);
+ list_splice_init(&ploop->pios[PLOOP_LIST_FLUSH], &flush_pios);
+ spin_unlock_irq(&ploop->deferred_lock);
+
+ file = ploop_top_delta(ploop)->file;
+ ret = vfs_fsync(file, 0);
+
+ while ((pio = ploop_pio_list_pop(&flush_pios)) != NULL) {
+ if (unlikely(ret)) {
+ pio->bi_status = errno_to_blk_status(ret);
+ if (static_branch_unlikely(&ploop_standby_check))
+ ploop_check_standby_mode(ploop, ret);
+ }
+ ploop_pio_endio(pio);
+ }
+}
+
void do_ploop_work(struct work_struct *ws)
{
struct ploop *ploop = container_of(ws, struct ploop, worker);
@@ -1788,6 +1811,7 @@ void do_ploop_work(struct work_struct *ws)
LIST_HEAD(discard_pios);
LIST_HEAD(cow_pios);
LIST_HEAD(resubmit_pios);
+ bool do_fsync = false;
unsigned int old_flags = current->flags;
current->flags |= PF_IO_THREAD|PF_LOCAL_THROTTLE|PF_MEMALLOC_NOIO;
@@ -1798,6 +1822,8 @@ void do_ploop_work(struct work_struct *ws)
list_splice_init(&ploop->pios[PLOOP_LIST_DISCARD], &discard_pios);
list_splice_init(&ploop->pios[PLOOP_LIST_COW], &cow_pios);
list_splice_init(&ploop->resubmit_pios, &resubmit_pios);
+ if (!list_empty(&ploop->pios[PLOOP_LIST_FLUSH]))
+ do_fsync = true;
spin_unlock_irq(&ploop->deferred_lock);
ploop_prepare_embedded_pios(ploop, &embedded_pios,
&deferred_pios);
@@ -1810,31 +1836,16 @@ void do_ploop_work(struct work_struct *ws)
ploop_submit_metadata_writeback(ploop);
current->flags = old_flags;
+
+ if (do_fsync)
+ process_ploop_fsync_work(ploop);
With this, do we even need ploop->fsync_worker?
Also, if we queue something on fsync_worker, won't it just run
do_ploop_fsync_work() right away on some event thread on another
processor? And by then the flush list will likely already be empty
when it is called.
My goal is to remove it later - there are some notes about
resubmitting partial io and a potential issue when using only one
workqueue, which I don't quite get yet.
Then we should postpone this patch for later.
Currently we have two workqueues, and with this patch we can do syncs
from both of them. I do not see how that is better.
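As a rough illustration of that concern (not code from the patch; the
body of the existing fsync worker is assumed to perform the same steps
as the new helper, i.e. splice PLOOP_LIST_FLUSH, vfs_fsync(), endio):

/*
 * The fsync worker can still be queued by ploop_dispatch_pios() for a
 * flush-only batch...
 */
void do_ploop_fsync_work(struct work_struct *ws)
{
	struct ploop *ploop = container_of(ws, struct ploop, fsync_worker);

	process_ploop_fsync_work(ploop);	/* vfs_fsync() call number one */
}

/*
 * ...while do_ploop_work() on another CPU may also see a non-empty
 * PLOOP_LIST_FLUSH and call process_ploop_fsync_work() itself, so two
 * syncs can still run in parallel.
 */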
_______________________________________________
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel