On Thu, Apr 4, 2024 at 7:00 AM Gao Xiang <hsiang...@linux.alibaba.com> wrote: > > Hi Sandeep, > > On 2024/4/4 07:57, Sandeep Dhavale wrote: > > Add optimization to treat data blocks filled with 0s as a hole. > > Even though diskspace savings are comparable to chunk based or dedupe, > > having no block assigned saves us redundant disk IOs during read. > > > > This patch detects if the block is filled with zeros and marks > > chunk as erofs_holechunk so there is no physical block assigned. > > > > Signed-off-by: Sandeep Dhavale <dhav...@google.com> > > --- > > lib/blobchunk.c | 25 ++++++++++++++++++++++++- > > 1 file changed, 24 insertions(+), 1 deletion(-) > > > > diff --git a/lib/blobchunk.c b/lib/blobchunk.c > > index 641e3d4..8535058 100644 > > --- a/lib/blobchunk.c > > +++ b/lib/blobchunk.c > > @@ -232,6 +232,21 @@ static void erofs_update_minextblks(struct > > erofs_sb_info *sbi, > > *minextblks = lb; > > } > > > > +static bool erofs_is_buf_zeros(void *buf, unsigned long len) > > +{ > > + int i, words; > > + const unsigned long *words_buf = buf; > > + words = len / sizeof(unsigned long); > > + > > + DBG_BUGON(len % sizeof(unsigned long)); > > + > > + for (i = 0; i < words; i++) { > > + if (words_buf[i]) > > + return false; > > + } > > + return true; > > +} > > + > > int erofs_blob_write_chunked_file(struct erofs_inode *inode, int fd, > > erofs_off_t startoff) > > { > > @@ -323,7 +338,15 @@ int erofs_blob_write_chunked_file(struct erofs_inode > > *inode, int fd, > > ret = -EIO; > > goto err; > > } > > - > > + if (len == chunksize && erofs_is_buf_zeros(chunkdata, len)) { > > + /* if data is all zeros, treat this block as hole */ > > + *(void **)idx++ = &erofs_holechunk; > > + erofs_update_minextblks(sbi, interval_start, pos, > > + &minextblks); > > + interval_start = pos + len; > > + lastch = NULL; > > + continue; > > + } > > chunk = erofs_blob_getchunk(sbi, chunkdata, len); > > Yes, it's a valid opt. Yet, I guess we could calculate the unique hash value > of all zeroes (with chunksize) in advance (e.g. when initialization). > > And compare the buf hash and the all-zeroed hash in erofs_blob_getchunk(), if > they are the same, let's just compare all data then (or not compare since > it's little chance to have such collision. In that case, erofs_blob_getchunk > returns erofs_holechunk. > > Does it sound a good idea to you? > Hi Gao, Sure, I will re-work this in v2.
Thanks, Sandeep. > Thanks, > Gao Xiang > > > if (IS_ERR(chunk)) { > > ret = PTR_ERR(chunk);