On Jan 20, 2018, at 5:06 PM, Mark H Weaver <m...@netris.org> wrote:
>
> Andreas Dilger <adil...@dilger.ca> writes:
>
>> On Jan 10, 2018, at 4:50 AM, Pavel Raiskup <prais...@redhat.com> wrote:
>>>
>>> On Wednesday, January 10, 2018 3:42:52 AM CET Mark H Weaver wrote:
>>>> From da922703282b0d3b8837a99a9c7fdd32f1d20d49 Mon Sep 17 00:00:00 2001
>>>> From: Mark H Weaver <m...@netris.org>
>>>> Date: Tue, 9 Jan 2018 20:16:14 -0500
>>>> Subject: [PATCH] Remove nonportable check for files containing only zeroes.
>>>>
>>>> This check benefitted only one unlikely case (large files containing
>>>> only zeroes, on systems that do not support SEEK_HOLE)
>>>
>>> It drops the optimization even for situations when SEEK_HOLE is not
>>> available, which is not 100% necessary.  I'm not proposing doing otherwise
>>> (I actually proposed this in [1]), but I'm rather CCing Andreas once more,
>>> as that's the original requester; the use cases with Lustre were
>>> definitely not unlikely, and the question is whether SEEK_HOLE covers
>>> them nowadays.
>>>
>>> [1] https://lists.gnu.org/archive/html/bug-tar/2016-07/msg00017.html
>>
>> Sorry for the late reply on this thread.
>>
>> It should be noted that the real problem was not related to backing
>> up files at the Lustre level, but rather backing up files directly from
>> the backing ext4 filesystem of the metadata target, for example when
>> migrating to new hardware, or for backup/restore of only that target.
>> The MDT stored the actual file size in the metadata inode (which could
>> be GB or TB in size), but the file data was stored on data servers on
>> other nodes.
>>
>> This meant that using the old tar versions to do the MDT backup might
>> take days at 100% CPU just to write sparse files to the tarball.
>> If tar is now using SEEK_HOLE to determine sparseness, then the
>> ext4-level backups with newer systems should work OK (SEEK_HOLE was
>> added to ext4 in the 3.0 kernel, and was improved in 3.7, though a data
>> consistency bug with unwritten data was just fixed in 4.12).
>>
>> That means SEEK_HOLE is NOT available in RHEL 6.x kernels, which are
>> still in fairly widespread (though declining) use.
>
> Ah, I see, thank you for this explanation.  So this corner case is not
> as unlikely as I supposed.
>
>> I'd prefer that the heuristic for sparse files without SEEK_HOLE not
>> be removed completely, but I do think that it needs to be fixed for
>> the small inline file and file-in-cache cases.
>
> I appreciate your concerns.  Unfortunately, as I explain below, this
> heuristic is based on a false assumption, even for very large files, and
> leads to data loss on Btrfs.
>
> If we are to continue using this heuristic at all, we will need a way to
> decide when it can be used safely.  I don't know of a good way to do
> this, but I'm open to suggestions.
>
> It makes me wonder how many RHEL 6.x users will use GNU tar 1.31 on
> their otherwise very old enterprise systems to back up their disks.
> I would expect those users to use their enterprise-grade old 'tar'.
>
>>>> and was based on an assumption about file system behavior that is not
>>>> mandated by POSIX and no longer holds in practice, namely that for
>>>> sufficiently large files, (st_blocks == 0) implies that the file
>>>> contains only zeroes.  Examples of file systems that violate this
>>>> assumption include Linux's /proc file system and Btrfs.
>>
>> Is that comment correct, namely that Btrfs has "large" files that
>> report st_blocks == 0 even though they contain data?
>
> Yes, on Btrfs I reliably see (st_blocks == 0) on a recently written,
> mostly sparse file with size > 8G, using linux-libre-4.14.14.
> More specifically, the "storing sparse files > 8G" test in tar's test
> suite reliably fails on my system:
>
>   140: storing sparse files > 8G                   FAILED (sparse03.at:29)
I'd consider this a bug in Btrfs.  As mentioned previously, we had the same
problem with ext4 (twice) and Lustre, and in both cases fixed this by adding
in the (dirty page cache pages / 512) if the current block count is zero:

int ext4_file_getattr(const struct path *path, struct kstat *stat,
		      u32 request_mask, unsigned int query_flags)
{
	struct inode *inode = d_inode(path->dentry);
	u64 delalloc_blocks;

	ext4_getattr(path, stat, request_mask, query_flags);

	/*
	 * If there is inline data in the inode, the inode will normally not
	 * have data blocks allocated (it may have an external xattr block).
	 * Report at least one sector for such files, so tools like tar, rsync,
	 * others don't incorrectly think the file is completely sparse.
	 */
	if (unlikely(ext4_has_inline_data(inode)))
		stat->blocks += (stat->size + 511) >> 9;

	/*
	 * We can't update i_blocks if the block allocation is delayed
	 * otherwise in the case of system crash before the real block
	 * allocation is done, we will have i_blocks inconsistent with
	 * on-disk file blocks.
	 * We always keep i_blocks updated together with real
	 * allocation.  But to not confuse with user, stat
	 * will return the blocks that include the delayed allocation
	 * blocks for this file.
	 */
	delalloc_blocks = EXT4_C2B(EXT4_SB(inode->i_sb),
				   EXT4_I(inode)->i_reserved_data_blocks);
	stat->blocks += delalloc_blocks << (inode->i_sb->s_blocksize_bits - 9);
	return 0;
}

static int vvp_object_glimpse(const struct lu_env *env,
			      const struct cl_object *obj, struct ost_lvb *lvb)
{
	struct inode *inode = vvp_object_inode(obj);

	ENTRY;
	lvb->lvb_mtime = LTIME_S(inode->i_mtime);
	lvb->lvb_atime = LTIME_S(inode->i_atime);
	lvb->lvb_ctime = LTIME_S(inode->i_ctime);

	/*
	 * LU-417: Add dirty pages block count lest i_blocks reports 0, some
	 * "cp" or "tar" on remote node may think it's a completely sparse file
	 * and skip it.
	 */
	if (lvb->lvb_size > 0 && lvb->lvb_blocks == 0)
		lvb->lvb_blocks = dirty_cnt(inode);

	RETURN(0);
}

Cheers, Andreas