On 7/14/25 11:49, Gao Xiang wrote: > Hi Chao, > > On 2025/7/14 11:34, Chao Yu wrote: >> This patch supports to readahead more blocks in erofs_readdir(), it can >> enhance readdir performance in large direcotry. >> >> readdir test in a large directory which contains 12000 sub-files. >> >> files_per_second >> Before: 926385.54 >> After: 2380435.562 >> >> Meanwhile, let's introduces a new sysfs entry to control page count >> of readahead to provide more flexible policy for readahead of readdir(). >> - location: /sys/fs/erofs/<disk>/dir_ra_pages >> - default value: 4 >> - range: [0, 128] >> - disable readahead: set the value to 0 >> >> Signed-off-by: Chao Yu <c...@kernel.org> >> --- >> v2: >> - introduce sysfs node to control page count of readahead during >> readdir(). >> Documentation/ABI/testing/sysfs-fs-erofs | 7 +++++++ >> fs/erofs/dir.c | 9 +++++++++ >> fs/erofs/internal.h | 5 +++++ >> fs/erofs/super.c | 2 ++ >> fs/erofs/sysfs.c | 5 +++++ >> 5 files changed, 28 insertions(+) >> >> diff --git a/Documentation/ABI/testing/sysfs-fs-erofs >> b/Documentation/ABI/testing/sysfs-fs-erofs >> index bf3b6299c15e..500c93484e4c 100644 >> --- a/Documentation/ABI/testing/sysfs-fs-erofs >> +++ b/Documentation/ABI/testing/sysfs-fs-erofs >> @@ -35,3 +35,10 @@ Description: Used to set or show hardware accelerators >> in effect >> and multiple accelerators are separated by '\n'. >> Supported accelerator(s): qat_deflate. >> Disable all accelerators with an empty string (echo > accel). >> + >> +What: /sys/fs/erofs/<disk>/dir_ra_pages >> +Date: July 2025 >> +Contact: "Chao Yu" <c...@kernel.org> >> +Description: Used to set or show page count of readahead during >> readdir(), >> + the range of value is [0, 128], by default it is 4, set it to >> + 0 to disable readahead. >> diff --git a/fs/erofs/dir.c b/fs/erofs/dir.c >> index 3e4b38bec0aa..40f828d5b670 100644 >> --- a/fs/erofs/dir.c >> +++ b/fs/erofs/dir.c >> @@ -47,8 +47,10 @@ static int erofs_readdir(struct file *f, struct >> dir_context *ctx) >> struct inode *dir = file_inode(f); >> struct erofs_buf buf = __EROFS_BUF_INITIALIZER; >> struct super_block *sb = dir->i_sb; >> + struct file_ra_state *ra = &f->f_ra; >> unsigned long bsz = sb->s_blocksize; >> unsigned int ofs = erofs_blkoff(sb, ctx->pos); >> + unsigned long nr_pages = DIV_ROUND_UP_POW2(dir->i_size, PAGE_SIZE); > > why using DIV_ROUND_UP_POW2 rather than DIV_ROUND_UP here?
Seems DIV_ROUND_UP_POW2() doesn't use bit shift as expected, let me use DIV_ROUND_UP() instead. > >> int err = 0; >> bool initial = true; >> @@ -63,6 +65,13 @@ static int erofs_readdir(struct file *f, struct >> dir_context *ctx) >> break; >> } >> + /* readahead blocks to enhance performance in large directory */ >> + if (EROFS_I_SB(dir)->dir_ra_pages && nr_pages - dbstart > 1 && > > dbstart is a byte-oriented value, so I'm not sure if it > works as you expect.. Oh, I checked patch in 6.6, found that I missed to handle it correctly when porting code to upstream, let me fix this. > >> + !ra_has_index(ra, dbstart)) >> + page_cache_sync_readahead(dir->i_mapping, ra, f, >> + dbstart, min(nr_pages - dbstart, > > same here. > >> + (pgoff_t)EROFS_I_SB(dir)->dir_ra_pages)); >> + >> de = erofs_bread(&buf, dbstart, true); >> if (IS_ERR(de)) { >> erofs_err(sb, "failed to readdir of logical block %llu of nid >> %llu", >> diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h >> index 0d19bde8c094..f0e5b4273aa8 100644 >> --- a/fs/erofs/internal.h >> +++ b/fs/erofs/internal.h >> @@ -157,6 +157,7 @@ struct erofs_sb_info { >> /* sysfs support */ >> struct kobject s_kobj; /* /sys/fs/erofs/<devname> */ >> struct completion s_kobj_unregister; >> + unsigned int dir_ra_pages; >> /* fscache support */ >> struct fscache_volume *volume; >> @@ -238,6 +239,10 @@ EROFS_FEATURE_FUNCS(xattr_filter, compat, >> COMPAT_XATTR_FILTER) >> #define EROFS_I_BL_XATTR_BIT (BITS_PER_LONG - 1) >> #define EROFS_I_BL_Z_BIT (BITS_PER_LONG - 2) >> +/* default/maximum readahead pages of directory */ >> +#define DEFAULT_DIR_RA_PAGES 4 >> +#define MAX_DIR_RA_PAGES 128 > > better to add EROFS_ prefix for them. > > Also could we setup those blocks or bytes instead > of pages? > > If users would like to setup values, they may don't > care more the page size since they only care about > images. Let's use bytes, and then roundup to blksize? > > Also why do we limit maximum number even if users > would like to readahead more? (such as fadvise > allows -1 too) No test, but I suspect there is border effect even we set it to i_size. Anyway, let me remove the upper boundary limitation, unless there is further requirement of limitation. Thanks, > > Thanks, > Gao Xiang >