On Fri, 2 Aug 2024, Kent Overstreet wrote:
On Tue, Jul 23, 2024 at 09:08:00PM GMT, Camila Alvarez wrote:
bch2_extent_fallocate should not return -BCH_ERR_transaction_restart_nested
when there are no buckets available.
Returning -BCH_ERR_transaction_restart_nested causes the sector allocation to
retry even if there's no space available.
It seems like you're saying this fixes a livelock, but I don't see that:
if we return operation_blocked, the closure_sync we do before we return
will still wait.
More explanation?
Yes, sorry for the delay. The closure_sync will still wait, but even after
waiting there's no space left.
I think the issue is that we're trying to allocate a bucket when none are
available:
closure_wait(&c->freelist_wait, cl);
journal_write_done is called at some point:
bch2_do_discard(c)
closure_wake_up(&c->freelist_wait)
...
That apparently doesn't free any buckets (should we add some sort of
assertion to notify us when this happens?), so there is still no space
left, so bch2_bucket_alloc_trans keeps returning -BCH_ERR_freelist_empty.
With this bch2_alloc_sectors_start_trans return
-BCH_err_bucket_alloc_blocked, which bch2_extent_fallocate turns into
-BCH_ERR_transaction_restart_nested.
Finally __bchfs_fallocate sees the error as BCH_ERR_transaction_restart,
so it tries again.
The issue is that the journal never discards any buckets so it just keeps
retrying indefinitely.
I've been thinking, and maybe we don't want to just keep __bchfs_fallocate
from retrying, but would prefer to control how many times it does.
Does all of this make any sense to you?
Reported-by: [email protected]
Signed-off-by: Camila Alvarez <[email protected]>
---
fs/bcachefs/io_misc.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/bcachefs/io_misc.c b/fs/bcachefs/io_misc.c
index 82f9170dab3f..b5b92ef030df 100644
--- a/fs/bcachefs/io_misc.c
+++ b/fs/bcachefs/io_misc.c
@@ -90,8 +90,6 @@ int bch2_extent_fallocate(struct btree_trans *trans,
opts.data_replicas,
opts.data_replicas,
BCH_WATERMARK_normal, 0, &cl, &wp);
- if (bch2_err_matches(ret, BCH_ERR_operation_blocked))
- ret = -BCH_ERR_transaction_restart_nested;
if (ret)
goto err;
--
2.34.1