I sort of followed most of the technical part of that but I still don’t understand why it’s not a bug to show different information about two identical files.
Which may indicate that I didn’t understand the technical part very well. As an end user, it’s hard to understand how that inconsistency isn’t both undesirable and a bug. I could maybe see if they were two files with the same byte-count but different composition that made the calculations off by 1, but this is an identical file and it’s showing up with two different sizes, in a tool meant to report sizes. That just seems “obviously” wrong even if it’s somehow technically explainable.

TjL

On Sun, Dec 15, 2019 at 4:19 PM Bernhard Voelker <m...@bernhard-voelker.de> wrote:

> tag 38621 notabug
> close 38621
> stop
>
> On 2019-12-15 06:15, TJ Luoma wrote:
> > I ended up with two versions of the same file
> > 'StreamDeck-4.4.2.12189.pkg' and 'Stream_Deck_4.4.2.12189.pkg' and
> > wanted to check to see if they were the same file.
> >
> > I checked the size with `gdu` like so:
> >
> > % /usr/local/bin/gdu --si -s *pkg
> > 101M StreamDeck-4.4.2.12189.pkg
> > 102M Stream_Deck_4.4.2.12189.pkg
> >
> > Which led me to think they were different files / sizes. But when I
> > used `ls -l` I was surprised to see this:
> >
> > % command ls -l *pkg
> > -rw-r--r-- 1 tjluoma staff 88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> > -rw-r--r--@ 1 tjluoma staff 88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg
> >
> > So they _are_ the same size. Are they the same file? I used `md5` to check
> >
> > % command md5 -r *pkg
> > 98ac563a36386ca3aa87f62893302b4f StreamDeck-4.4.2.12189.pkg
> > 98ac563a36386ca3aa87f62893302b4f Stream_Deck_4.4.2.12189.pkg
> >
> > OK, so these are exactly the same file. So… why did `gdu` tell me they
> > are different sizes?
> >
> > % gdu --version
> > du (GNU coreutils) 8.31
> > Copyright (C) 2019 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> >
> > Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
> > and Jim Meyering.
> >
> > I'm using Mac OS X 10.14.6 (18G2022) with `coreutils` installed via `brew`.
> >
> > Any help would be appreciated.
>
> This is a "sparse" file, i.e., a file with longer sequences of Zeroes
> somewhere in between which can be stored more efficiently on the disk.
> Any application reading the data will get the correct number of Zeroes,
> while some disk space is saved.
>
> E.g. the following creates a 300M file, with the first 100M and the last 100M
> with random data, and the 100M between is a "hole":
>
> # Write the 1st 100M (as usual).
> $ dd bs=1M count=100 if=/dev/urandom of=f
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.466356 s, 225 MB/s
>
> # Write another 100M, but starting at a position of 200M,
> # thus leaving Zeroes in between.
> $ dd bs=1M seek=200 count=100 if=/dev/urandom of=f
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB, 100 MiB) copied, 0.462072 s, 227 MB/s
>
> $ ls -logh f
> -rw-r--r-- 1 300M Dec 15 18:17 f
>
> $ du -h f # shows the space occupied on disk.
> 200M f
>
> $ du --apparent-size -h f # shows the size applications would read.
> 300M f
>
> See the documentation of 'cp' and 'du':
> https://www.gnu.org/software/coreutils/cp (the --sparse option)
> https://www.gnu.org/software/coreutils/du (the --apparent-size option)
>
> As this is not a bug in du(1), I'm marking this as such, and closing the
> ticket in our bug tracker. The discussion can continue, of course.
>
> Have a nice day,
> Berny
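
The situation in the original report (byte-identical files, different `du` output) can be reproduced with a minimal sketch like the one below. It assumes a file system that supports sparse files; the names `sparse.bin` and `dense.bin` are made up for illustration, and on some file systems both copies may end up with the same disk usage.

```shell
#!/bin/sh
# Sketch: two byte-identical files can occupy different amounts of disk space.
# sparse.bin and dense.bin are hypothetical names used only for this example.
set -eu
dir=$(mktemp -d)
cd "$dir"

# Write 1 MiB of data at an offset of 9 MiB. The skipped range becomes a
# "hole" on file systems that support sparse files.
dd if=/dev/urandom of=sparse.bin bs=1048576 seek=9 count=1 2>/dev/null

# Copy with plain dd: the hole is read back as zero bytes and those zeroes
# are written out explicitly, so the copy is normally not sparse.
dd if=sparse.bin of=dense.bin bs=1048576 2>/dev/null

# Same content ...
cmp sparse.bin dense.bin && echo "contents identical"

# ... and the same apparent size (byte count) ...
wc -c sparse.bin dense.bin

# ... but possibly different disk usage (reported in KiB).
du -k sparse.bin dense.bin

# The scratch directory "$dir" is left in place; remove it when done.
```

With GNU `du` (`gdu` in the report above), adding `--apparent-size` makes it report the byte count rather than the space occupied on disk, so both files would then show the same figure.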