Gentle ping.

On Fri, Mar 19, 2021 at 12:22 PM Ondrej Dubaj <odu...@redhat.com> wrote:
> Hello,
> any update here?
>
> Thanks.
> Ondrej
>
> On Mon, Mar 1, 2021 at 11:05 AM Ondrej Dubaj <odu...@redhat.com> wrote:
>
>> Ping, any updates here?
>>
>> Thanks.
>>
>> On Mon, Feb 15, 2021 at 5:07 PM Ondrej Dubaj <odu...@redhat.com> wrote:
>>
>>> Gentle ping
>>>
>>> On Mon, Jan 18, 2021 at 12:02 PM Ondrej Dubaj <odu...@redhat.com> wrote:
>>>
>>>> One of our customers hit I/O errors while archiving a huge (11 TB) file.
>>>> Using strace, they observed that after tar hit a read I/O error on the
>>>> XFS filesystem, it continued writing zeros to the archive. However, tar
>>>> gave no indication that it was writing zeros when the error occurred.
>>>>
>>>> Later it was found that writing zeros is expected behavior: the file
>>>> header has already been written, so the member has to be padded with zeros.
>>>>
>>>> Using the reproducer steps provided by the customer, we can see this behavior.
>>>>
>>>> Padding with zeros is expected, but tar does it silently (for "Read
>>>> error at byte..."). It should say that it is padding with zeros, similar
>>>> to how it reports "File shrank by ... bytes; padding with zeros".
>>>>
>>>> With the reproducer we also see that tar sometimes reports read I/O
>>>> errors as "File shrank by ... bytes; padding with zeros"; see item 2) in
>>>> the reproducer comments.
>>>>
>>>> Reproducer available here:
>>>>
>>>> #!/bin/bash
>>>> # Reproducer "tardust"
>>>> #
>>>> # When "tar create" reads a file, it has several shortcomings when it hits a read error
>>>> #
>>>> # 1) When read() fails with a read error (EIO), then this happens:
>>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # read(4, 0x563adef7b000, 3584) = -1 EIO (Input/output error)
>>>> # write(2, "tar: ", 5tar: ) = 5
>>>> # write(2, "/mntx/testfile: Read error at by"..., 70/mntx/testfile: Read error at byte 260653056, while reading 3584 bytes) = 70
>>>> # write(2, ": Input/output error", 20: Input/output error) = 20
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # Actual behaviour: it prints a "Read error" message, but it conceals that it will pad the output with zeros
>>>> # Expected behaviour: it should also print "padding with zeros"
>>>> # 2) A second shortcoming is that tar does not differentiate between "read error" and "file shrinkage".
>>>> # That means when it sees a short read due to a read error, it does not report a read error.
>>>> # It looks like this:
>>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 2560 <<< HERE
>>>> # write(2, "tar: ", 5tar: ) = 5
>>>> # write(2, "/mntx/testfile: File shrank by 5"..., 65/mntx/testfile: File shrank by 53927936 bytes; padding with zeros) = 65
>>>> # write(2, "\n", 1
>>>> # ) = 1
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # Summary: A read error is not reported here. At least it now says "padding with zeros"
>>>> # Expected behaviour: it should report a read error, so the user knows what is going on.
>>>> #
>>>> # 3) Side-Note:
>>>> # The blocking factor is applied to the output. When reading a file, all reads are misaligned by 512 bytes.
>>>> # This is because it writes a 512-byte header for every archived file.
>>>> # That means the first read from the file is 512 bytes too short:
>>>> # Running with tar-blocking-factor=7
>>>> # fstat(1, {st_mode=S_IFREG|0644, st_size=17827, ...}) = 0
>>>> # write(1, "/mntx/testfile\n", 15/mntx/testfile
>>>> # ) = 15
>>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3072) = 3072 #1st read 512 bytes too short
>>>> # write(3, "mntx/testfile\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> # write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3584) = 3584
>>>> #
>>>> # 4) Reproducer overview:
>>>> # - Create a 500MB test image, then create a test file in the image
>>>> # - Use losetup/dmsetup to inject I/O errors at a specified sector
>>>> #   (the table below maps one sector to the device-mapper "error" target)
>>>> # - You must hit a 4K boundary to see EIO, so use tar-blocking-factor=7 and
>>>> #   vary the bad block number to find case (1)
>>>> echo Step 1 Create disk image
>>>> dd if=/dev/zero of=/tmp/testimage bs=1M count=500 || exit
>>>> echo Step 2 Create XFS in image
>>>> mkfs.xfs /tmp/testimage || exit
>>>> echo Step 3 Use losetup so the file can be used as a block device
>>>> losetup /dev/loop1 /tmp/testimage || exit
>>>> losetup
>>>> echo Step 4 Now create the testfile, this will have a read error injected later
>>>> mkdir -p /mntx
>>>> mount /dev/loop1 /mntx || exit
>>>> dd if=/dev/zero of=/mntx/testfile bs=1M count=300 || exit
>>>> umount /mntx
>>>> echo Step 5 Now iterating through bad blocks
>>>> echo As a result, there are strace output files /tmp/a1000 ... /tmp/a1040
>>>> for i in $(seq 1000 1040)
>>>> do
>>>> echo
>>>> echo Badblock $i
>>>> # Sector counts: the 500MB image has 1024000 sectors of 512 bytes
>>>> let ERR=i
>>>> let ERR1=i+1
>>>> let NUMSECTOR2=1024000-ERR1
>>>> #echo ERR1 is $ERR1
>>>> #echo NUMSECTOR2 is $NUMSECTOR2
>>>> dmsetup create tardust <<EOF
>>>> 0 $ERR linear /dev/loop1 0
>>>> $ERR 1 error
>>>> $ERR1 $NUMSECTOR2 linear /dev/loop1 $ERR1
>>>> EOF
>>>> #dmsetup ls
>>>> #dmsetup status
>>>> #dmsetup table
>>>> mount /dev/mapper/tardust /mntx || exit
>>>> strace tar cvbf 7 /tmp/tardust.tar /mntx/testfile >&/tmp/a$i
>>>> umount /mntx
>>>> dmsetup remove tardust
>>>> grep -e error -e shrank /tmp/a$i
>>>> done
>>>> echo "Done: inspect the strace output files for error behaviour (grep error; look at the last read() call)"
>>>> losetup -d /dev/loop1
>>>>
>>>> =================
>>>>
>>>> Actual results:
>>>> - When tar hits a disk read error while reading a file and creating
>>>>   an archive, it prints "file shrank"
>>>> - It then writes zeros (padding) up to the initial file size (but
>>>>   does not print that message)
>>>> - This happens in most cases (due to the interaction of tar block size,
>>>>   disk block size, and the 512-byte read shift)
>>>> - I provided a reproducer which shows under which circumstances it
>>>>   correctly prints "Read error at byte…"
>>>>
>>>> Expected results:
>>>> - When there is a read error, THEN tar shall report a read error
>>>> - When there is a read error, THEN tar shall NOT report "file shrank"
>>>> - In addition, it SHALL print "Padding with zeros". This is currently
>>>>   missing.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Ondrej Dubaj
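
To make the expected results concrete, here is a minimal sketch of the reporting logic in Python. It is not GNU tar's code: the function copy_member, its parameters, and the exact message wording are hypothetical and only for illustration. It also assumes that a short read caused by a bad block is followed by a read that fails with EIO, which is what the strace output in case 1) suggests but is not guaranteed on every filesystem.

```python
def copy_member(read_chunk, out, expected_size, name="member",
                chunk_size=3584, report=print):
    """Copy expected_size bytes from read_chunk(n) into out.

    Hypothetical helper, not tar's implementation. read_chunk(n) returns up
    to n bytes, returns b"" at end of file, and raises OSError on an I/O error.
    """
    copied = 0
    while copied < expected_size:
        want = min(chunk_size, expected_size - copied)
        try:
            data = read_chunk(want)
        except OSError as exc:
            # A failed read is reported as a read error, never as shrinkage.
            report(f"{name}: Read error at byte {copied}, "
                   f"while reading {want} bytes: {exc.strerror}")
            break
        if not data:
            # Only a genuine end of file counts as "file shrank".
            report(f"{name}: File shrank by {expected_size - copied} bytes")
            break
        # A short but non-empty read is not conclusive: keep reading so the
        # next call reveals whether it was EOF or an I/O error.
        out.write(data)
        copied += len(data)

    missing = expected_size - copied
    if missing:
        # In both failure cases the padding is announced explicitly.
        report(f"{name}: Padding with zeros ({missing} bytes)")
    while missing:
        n = min(chunk_size, missing)
        out.write(bytes(n))  # bytes(n) is n zero bytes
        missing -= n
    return copied
```

The design point is that a short read alone does not decide which message is printed; only the following read (EOF or error) does, and the padding message is emitted in both cases.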