On Mon, Oct 07, 2019 at 06:38:10PM -0400, Dennis Zhou wrote:
> On Mon, Oct 07, 2019 at 04:37:28PM -0400, Josef Bacik wrote:
> > On Mon, Oct 07, 2019 at 04:17:34PM -0400, Dennis Zhou wrote:
> > > Async discard will use the free space cache as backing knowledge for
> > > which extents to discard. This patch plumbs knowledge about which
> > > extents need to be discarded into the free space cache from
> > > unpin_extent_range().
> > >
> > > An untrimmed extent can merge with everything as this is a new region.
> > > Absorbing trimmed extents is a tradeoff for greater coalescing which
> > > makes life better for find_free_extent(). Additionally, it seems the
> > > size of a trim isn't as problematic as the trim io itself.
> > >
> > > When reading in the free space cache from disk, if sync is set, mark all
> > > extents as trimmed. The current code ensures at transaction commit that
> > > all free space is trimmed when sync is set, so this reflects that.
> > >
> > > Signed-off-by: Dennis Zhou <[email protected]>
> > > ---
> > >  fs/btrfs/extent-tree.c      | 15 ++++++++++-----
> > >  fs/btrfs/free-space-cache.c | 38 ++++++++++++++++++++++++++++++-------
> > >  fs/btrfs/free-space-cache.h | 10 +++++++++-
> > >  fs/btrfs/inode-map.c        | 13 +++++++------
> > >  4 files changed, 57 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 77a5904756c5..b9e3bedad878 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -2782,7 +2782,7 @@ fetch_cluster_info(struct btrfs_fs_info *fs_info,
> > >  }
> > >
> > >  static int unpin_extent_range(struct btrfs_fs_info *fs_info,
> > > -			      u64 start, u64 end,
> > > +			      u64 start, u64 end, u32 fsc_flags,
> > >  			      const bool return_free_space)
> > >  {
> > >  	struct btrfs_block_group_cache *cache = NULL;
> > > @@ -2816,7 +2816,9 @@ static int unpin_extent_range(struct btrfs_fs_info *fs_info,
> > >  		if (start < cache->last_byte_to_unpin) {
> > >  			len = min(len, cache->last_byte_to_unpin - start);
> > >  			if (return_free_space)
> > > -				btrfs_add_free_space(cache, start, len);
> > > +				__btrfs_add_free_space(fs_info,
> > > +						       cache->free_space_ctl,
> > > +						       start, len, fsc_flags);
> > >  		}
> > >
> > >  		start += len;
> > > @@ -2894,6 +2896,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
> > >
> > >  	while (!trans->aborted) {
> > >  		struct extent_state *cached_state = NULL;
> > > +		u32 fsc_flags = 0;
> > >
> > >  		mutex_lock(&fs_info->unused_bg_unpin_mutex);
> > >  		ret = find_first_extent_bit(unpin, 0, &start, &end,
> > > @@ -2903,12 +2906,14 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans)
> > >  			break;
> > >  		}
> > >
> > > -		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
> > > +		if (btrfs_test_opt(fs_info, DISCARD_SYNC)) {
> > >  			ret = btrfs_discard_extent(fs_info, start,
> > >  						   end + 1 - start, NULL);
> > > +			fsc_flags |= BTRFS_FSC_TRIMMED;
> > > +		}
> > >
> > >  		clear_extent_dirty(unpin, start, end, &cached_state);
> > > -		unpin_extent_range(fs_info, start, end, true);
> > > +		unpin_extent_range(fs_info, start, end, fsc_flags, true);
> > >  		mutex_unlock(&fs_info->unused_bg_unpin_mutex);
> > >  		free_extent_state(cached_state);
> > >  		cond_resched();
> > > @@ -5512,7 +5517,7 @@ u64 btrfs_account_ro_block_groups_free_space(struct btrfs_space_info *sinfo)
> > >  int btrfs_error_unpin_extent_range(struct btrfs_fs_info *fs_info,
> > >  				   u64 start, u64 end)
> > >  {
> > > -	return unpin_extent_range(fs_info, start, end, false);
> > > +	return unpin_extent_range(fs_info, start, end, 0, false);
> > >  }
> > >
> > >  /*
> > > diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
> > > index d54dcd0ab230..f119895292b8 100644
> > > --- a/fs/btrfs/free-space-cache.c
> > > +++ b/fs/btrfs/free-space-cache.c
> > > @@ -747,6 +747,14 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
> > >  			goto free_cache;
> > >  		}
> > >
> > > +		/*
> > > +		 * Sync discard ensures that the free space cache is always
> > > +		 * trimmed. So when reading this in, the state should reflect
> > > +		 * that.
> > > +		 */
> > > +		if (btrfs_test_opt(fs_info, DISCARD_SYNC))
> > > +			e->flags |= BTRFS_FSC_TRIMMED;
> > > +
> > >  		if (!e->bytes) {
> > >  			kmem_cache_free(btrfs_free_space_cachep, e);
> > >  			goto free_cache;
> > > @@ -2165,6 +2173,7 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
> > >  	bool merged = false;
> > >  	u64 offset = info->offset;
> > >  	u64 bytes = info->bytes;
> > > +	bool is_trimmed = btrfs_free_space_trimmed(info);
> > >
> > >  	/*
> > >  	 * first we want to see if there is free space adjacent to the range we
> > > @@ -2178,7 +2187,8 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
> > >  	else
> > >  		left_info = tree_search_offset(ctl, offset - 1, 0, 0);
> > >
> > > -	if (right_info && !right_info->bitmap) {
> > > +	if (right_info && !right_info->bitmap &&
> > > +	    (!is_trimmed || btrfs_free_space_trimmed(right_info))) {
> > >  		if (update_stat)
> > >  			unlink_free_space(ctl, right_info);
> > >  		else
> > > @@ -2189,7 +2199,8 @@ static bool try_merge_free_space(struct btrfs_free_space_ctl *ctl,
> > >  	if (left_info && !left_info->bitmap &&
> > > -	    left_info->offset + left_info->bytes == offset) {
> > > +	    left_info->offset + left_info->bytes == offset &&
> > > +	    (!is_trimmed || btrfs_free_space_trimmed(left_info))) {
> >
> > So we allow merging if we haven't trimmed this entry, or if the adjacent
> > entry is already trimmed? This means we'll merge if we trimmed the new
> > entry regardless of the adjacent entry's status, or if the new entry is
> > dirty. Why is that? Thanks,
> >
>
> This is the tradeoff I called out above here:
>
> > > Absorbing trimmed extents is a tradeoff for greater coalescing which
> > > makes life better for find_free_extent(). Additionally, it seems the
> > > size of a trim isn't as problematic as the trim io itself.
>
> A problematic example case:
>
> |----trimmed----|/////X/////|-----trimmed-----|
>
> If region X gets freed and returned to the free space cache, we end up
> with the following:
>
> |----trimmed----|-untrimmed-|-----trimmed-----|
>
> This isn't great because now we need to teach find_free_extent() to span
> multiple btrfs_free_space entries, something I didn't want to do. So the
> other option is to overtrim, trading for a simpler find_free_extent().
> Then the above becomes:
>
> |-------------------trimmed-------------------|
>
> It makes the assumption that if we're inserting, it's generally free
> space being returned rather than us needing to slice out from the middle
> of a block. It does still have degenerate cases, but it's better than
> the above. The merging also allows for stuff to come out of bitmaps more
> proactively too.
>
> Also, from what it seems, a discard operation is quite costly relative
> to the amount you're discarding (1 larger discard is better than several
> smaller discards) as it will clog up the device too.
OOOOOh I fucking get it now. That's going to need a comment, because it's
not obvious at all.

However I still wonder if this is right. Your above examples are
legitimate, but say you have

| 512mib adding back that isn't trimmed |------- 512mib trimmed ------|

we'll merge these two, but really we should probably trim that 512mib
chunk we're adding, right? Thanks,

Josef
