On 12/04/2021 16:21, Johannes Thumshirn wrote: >> If it affects only the zoned case, I also don't see why doing it when >> not in zoned mode (and -o discard=sync is set). >> Unless of course the default discard mechanism is broken somehow, in >> which case we need to find out why and fix it. > > I'm at it. >
OK I've found out what's wrong. In btrfs_relocate_block_group() we're calling
btrfs_inc_block_group_ro(). btrfs_update_block_group() will call
btrfs_mark_bg_unused() to add the block group to the 'fs_info->unused_bgs' list.
But in btrfs_delete_unused_bgs() we have this check:
if (block_group->reserved || block_group->pinned ||
block_group->used || block_group->ro ||
list_is_singular(&block_group->list)) {
/*
* We want to bail if we made new allocations or have
* outstanding allocations in this block group. We do
* the ro check in case balance is currently acting on
* this block group.
*/
trace_btrfs_skip_unused_block_group(block_group);
spin_unlock(&block_group->lock);
up_write(&space_info->groups_sem);
goto next;
}
So we're skipping over the removal of the block group. I've instrumented the
kernel and also tested with non-zoned devices, this skip is always present
with block_group->ro = 1.
Also the auto-relocation patch has a problem, not decrementing block_group->ro
and so it's left at ro = 2 after auto relocation got triggered.
So the question is, why are we leaving block_group->ro at 1 after relocation
finished? Probably that the allocator doesn't pick the block group for the next
allocation.
What am I missing?
Thanks,
Johannes
