tag 38621 notabug
close 38621
stop

On 2019-12-15 06:15, TJ Luoma wrote:
> I ended up with two version of the same file
> 'StreamDeck-4.4.2.12189.pkg' and 'Stream_Deck_4.4.2.12189.pkg' and
> wanted to check to see if they were the same file.
>
> I checked the size with `gdu` like so:
>
> % /usr/local/bin/gdu --si -s *pkg
> 101M StreamDeck-4.4.2.12189.pkg
> 102M Stream_Deck_4.4.2.12189.pkg
>
> Which led me to think they were different files / sizes. But when I
> used `ls -l` I was surprised to see this:
>
> % command ls -l *pkg
> -rw-r--r-- 1 tjluoma staff 88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma staff 88885047 Dec 15 00:02
> Stream_Deck_4.4.2.12189.pkg
>
> So they _are_ the same size. Are they the same file? I used `md5` to check
>
> % command md5 -r *pkg
> 98ac563a36386ca3aa87f62893302b4f StreamDeck-4.4.2.12189.pkg
> 98ac563a36386ca3aa87f62893302b4f Stream_Deck_4.4.2.12189.pkg
>
> OK, so these are exactly the same file. So… why did `gdu` tell me they
> are different sizes?
>
> % gdu --version
> du (GNU coreutils) 8.31
> Copyright (C) 2019 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
> and Jim Meyering.
>
> I'm using Mac OS X 10.14.6 (18G2022) with `coreutils` installed via `brew`.
>
> Any help would be appreciated.

This is a "sparse" file, i.e., a file containing long runs of zeroes
that the file system can store more efficiently on disk. Any application
reading the data still gets the correct number of zeroes, while some
disk space is saved.

E.g., the following creates a 300M file in which the first 100M and the
last 100M hold random data and the 100M in between is a "hole":

  # Write the 1st 100M (as usual).
  $ dd bs=1M count=100 if=/dev/urandom of=f
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB, 100 MiB) copied, 0.466356 s, 225 MB/s

  # Write another 100M, but starting at a position of 200M,
  # thus leaving zeroes in between.
  $ dd bs=1M seek=200 count=100 if=/dev/urandom of=f
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB, 100 MiB) copied, 0.462072 s, 227 MB/s

  $ ls -logh f
  -rw-r--r-- 1 300M Dec 15 18:17 f

  $ du -h f  # shows the space occupied on disk.
  200M f

  $ du --apparent-size -h f  # shows the size applications would read.
  300M f

See the documentation of 'cp' and 'du':

  https://www.gnu.org/software/coreutils/cp
  (the --sparse option)

  https://www.gnu.org/software/coreutils/du
  (the --apparent-size option)

As this is not a bug in du(1), I'm marking it as such and closing the
ticket in our bug tracker. The discussion can continue, of course.

Have a nice day,
Berny
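
P.S. To see both numbers side by side in bytes for any file, something
like the following might help. It is only a sketch: it assumes GNU
coreutils 'du' and 'dd' (installed via Homebrew on macOS, 'du' is
called 'gdu'), and 'show_sizes' and 'sparse.bin' are names made up for
the illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils 'du' and 'dd'.
# show_sizes and sparse.bin are illustrative names, not part of coreutils.
set -eu

# Print the apparent size and the allocated size of one file, in bytes.
show_sizes() {
    apparent=$(du --apparent-size -B1 "$1" | cut -f1)
    on_disk=$(du -B1 "$1" | cut -f1)
    echo "$1: apparent=$apparent on_disk=$on_disk"
}

tmp=$(mktemp -d)
f="$tmp/sparse.bin"

# 1 MiB of data, then a 1 MiB hole, then another 1 MiB of data.
dd bs=1M count=1 if=/dev/urandom of="$f" 2>/dev/null
dd bs=1M seek=2 count=1 if=/dev/urandom of="$f" 2>/dev/null

show_sizes "$f"
rm -rf "$tmp"
```

The apparent size of the scratch file is 3145728 bytes (3 MiB). The
on-disk figure depends on the file system and is typically smaller
where holes are supported. Calling show_sizes on each of the two .pkg
files would show whether they differ only in allocated space.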