AHA! Ok, now I understand a little better. I have seen the difference between "size" and "size on disk" and did not realize that applied here.

I'm still not 100% clear on _why_ two "identical" files would have
different results for "size on disk" (it _seems_ like those should be
identical) but I suspect that the answer is probably of a technical nature
that would be "over my head" so to speak, and truthfully, all I really need
to know is "sometimes that happens" rather than understanding the technical
details of why.

I appreciate you taking the time to educate me further about this.

Cheers

Tj

On Mon, Dec 16, 2019 at 2:47 AM Bernhard Voelker <m...@bernhard-voelker.de> wrote:
>
> On 2019-12-16 07:25, TJ Luoma wrote:
> > I sort of followed most of the technical part of that but I still don’t
> > understand why it’s not a bug to show different information about two
> > identical files.
> >
> > Which may indicate that I didn’t understand the technical part very well.
> >
> > As an end user, it’s hard to understand how that inconsistency isn’t both
> > undesirable and a bug.
> >
> > I could maybe see if they were two files with the same byte-count but
> > different composition that made the calculations off by 1, but this is an
> > identical file and it’s showing up with two different sizes, in a tool
> > meant to report sizes.
> >
> > That just seems “obviously” wrong even if it’s somehow technically
> > explainable.
>
> Thanks for following up on this for further clarifications.
>
> I think the problem is the word "size":
> while 'ls' and 'du --apparent-size' show the length of the content of
> a file, 'du' (without '--apparent-size') reports the space the file
> needs on disk.
>
>   $ du --help | sed 3q
>   Usage: du [OPTION]... [FILE]...
>     or:  du [OPTION]... --files0-from=F
>   Summarize disk usage of the set of FILEs, recursively for directories.
> ____________^^^^^^^^^^
>
> One reason for those sizes to differ is "holes". As an extreme case,
> one can create a 4 Terabyte file (just NULs) on a filesystem which is
> much smaller than that:
>
>   # Filesystem size.
>   $ df -h --out=size,target .
>    Size Mounted on
>    591G /mnt
>
>   # Create a NUL-only file of size 4 Terabyte.
>   $ truncate -s4T f2
>
>   # 'ls' shows the 4T of file size.
>   $ ls -logh f2
>   -rw-r--r-- 1 4.0T Dec 16 08:36 f2
>
>   # 'du' shows that the file does not even require any disk usage.
>   $ du -h f2
>   0       f2
>
>   # ... but with '--apparent-size' reports the real (content) size.
>   $ du -h --apparent-size f2
>   4.0T    f2
>
>   # Any program will see the 4T content transparently.
>   $ wc -c < f2
>   4398046511104
>
> In your case, the file was a mixture of regular data and holes,
> and 'cp' (without --sparse=always) tried to automatically determine
> if the target file should have holes or not (see 'man cp').
> Therefore, your 2 files had a different disk usage, but the net length
> of the content is identical, of course.
>
> Have a nice day,
> Berny
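
For anyone who wants to see the "identical content, different disk usage" effect first-hand, here is a minimal sketch. It assumes GNU coreutils and a filesystem that supports holes; the file names (orig, sparse_copy, dense_copy) and the sizes are made up for illustration, and the disk usage reported will vary by filesystem.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils (truncate, cp --sparse, du --apparent-size)
# and a filesystem that supports sparse files. All file names are hypothetical.
set -eu

dir=$(mktemp -d)
cd "$dir"

# Build a file that mixes regular data and a hole:
# 1 MiB of random bytes, then extend it to 64 MiB without writing anything.
head -c 1048576 /dev/urandom > orig
truncate -s 64M orig

# Make two copies of the same content. One asks cp to create holes
# wherever it sees runs of NULs, the other asks it to write every byte.
cp --sparse=always orig sparse_copy
cp --sparse=never orig dense_copy

# The content is byte-for-byte identical (cmp is silent on success).
cmp orig sparse_copy
cmp orig dense_copy
echo "content identical"

# Length of the content: the same for all three files.
du -h --apparent-size orig sparse_copy dense_copy

# Space needed on disk: typically much larger for dense_copy.
du -h orig sparse_copy dense_copy

# The scratch directory can be removed afterwards with: rm -rf "$dir"
```

On most Linux filesystems the last command should show dense_copy using roughly the full 64M while the other two use about 1M, even though cmp finds no difference between them.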