TJ Luoma wrote:
> AHA! Ok, now I understand a little better. I have seen the difference
> between "size" and "size on disk" and did not realize that applied
> here.
>
> I'm still not 100% clear on _why_ two "identical" files would have
> different results for "size on disk" (it _seems_ like those should be
> identical) but I suspect that the answer is probably of a technical
> nature that would be "over my head" so to speak, and truthfully, all I
> really need to know is "sometimes that happens" rather than
> understanding the technical details of why.

I think the confusion began right at the start, because the commands
are named for the different things they were intended to show.

  'du' is named for showing disk usage
  'ls' is named for listing files

And those are rather different things! Let's dig into the details.
The ls documentation for the long format says:

    ‘-l’
    ‘--format=long’
    ‘--format=verbose’
         In addition to the name of each file, print the file type, file
         mode bits, number of hard links, owner name, group name, size,
         and timestamp (*note Formatting file timestamps::), normally
         the modification timestamp (the mtime, *note File
         timestamps::).  Print question marks for information that
         cannot be determined.

So we know that ls lists the size of the file. But let me specifically
say that this is tagged to the *file*. It's file centric. There is
also the -s option.

    ‘-s’
    ‘--size’
         Print the disk allocation of each file to the left of the file
         name.  This is the amount of disk space used by the file, which
         is usually a bit more than the file’s size, but it can be less
         if the file has holes.

This displays how much disk space the file consumes instead of the
size of the file. The two are different things. And then the 'du'
documentation says:

    ‘du’ reports the amount of disk space used by the set of specified
    files

And so du reports the disk space used by the file. But as we know, the
amount of disk used depends upon the file system holding the file.
Different file systems have different storage methods, and the amount
of disk space consumed by a file will be different from, and somewhat
unrelated to, the size of the file. Disk space consumed to hold the
file could be larger or smaller than the file size.

In particular, if the file is sparse then there are "holes" in the
middle that are all zero data and do not need to be stored. That saves
the space, in which case the disk usage will be smaller than the file
size.
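
To see the two numbers side by side, here is a minimal sketch in
Python. It assumes a POSIX system where os.stat() reports st_blocks in
512-byte units. The helper names sizes() and make_demo_files() are
made up for illustration. The first value corresponds to what
'ls -l' prints and the second to what 'ls -s' and 'du' count. The
exact allocated numbers depend on the file system, so none are shown
here.

```python
import os
import tempfile


def sizes(path):
    """Return (content size, allocated bytes) for path."""
    st = os.stat(path)
    # st_size is the file length, the number 'ls -l' prints.
    # st_blocks counts 512-byte units actually allocated on disk
    # (POSIX only), which is what 'ls -s' and 'du' are based on.
    return st.st_size, st.st_blocks * 512


def make_demo_files(directory):
    """Create one file with a large hole and one tiny file (hypothetical names)."""
    sparse = os.path.join(directory, "sparse.bin")
    small = os.path.join(directory, "small.txt")
    with open(sparse, "wb") as f:
        # Extending an empty file leaves a 10 MiB hole; file systems
        # that support sparse files need not allocate blocks for it.
        f.truncate(10 * 1024 * 1024)
    with open(small, "wb") as f:
        # Ten bytes of content still occupy whatever the file system's
        # minimum allocation is, often one whole block.
        f.write(b"x" * 10)
    return sparse, small


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for path in make_demo_files(d):
            size, allocated = sizes(path)
            print(f"{os.path.basename(path)}: size={size} allocated={allocated}")
```

On a file system with sparse file support the first file will normally
show an allocation far below its size, and the second an allocation
above its size, but that is up to the file system.
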
On the other hand, since files are stored in blocks, the final block
will have some fragment of space at the end that is past the end of
the file but too small to be used for other files. In that case the
disk usage will be larger than the file size.

Therefore it is not surprising that the numbers displayed for disk
usage are not the same as the file content size. They would really
only line up exactly if the file content size is a multiple of the
file system storage block size and every block is fully represented on
disk. Otherwise they will always be at least somewhat different.

As long as I am here I should mention 'df', which shows disk free space
information. One sometimes thinks that adding up the file content
sizes should add up to the du disk usage size, but it doesn't. And one
sometimes thinks that adding up all of the du disk usage sizes should
add up to the df disk free sizes, but it doesn't. That is due to a
similar reason.

File systems reserve a min-free amount of space for superuser level
processes to ensure continued operation even if the disk is filling up
from non-privileged processes. Also, file system efficiency and
performance drop dramatically as the file system fills up. Therefore
the file system reports space with the min-free reserved space in
mind. And once again this is different on different file systems.

But let me return to your first bit of information, the ls long
listing of the files. Your version of ls gave an indication that
something was different about the second file.

> % command ls -l *pkg
> -rw-r--r-- 1 tjluoma staff 88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma staff 88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg

See that '@' in that position? The GNU ls coreutils 8.30 documentation
I am looking at says:

    Following the file mode bits is a single character that specifies
    whether an alternate access method such as an access control list
    applies to the file.
    When the character following the file mode bits is a space, there
    is no alternate access method.  When it is a printing character,
    then there is such a method.

    GNU ‘ls’ uses a ‘.’ character to indicate a file with a security
    context, but no other alternate access method.

    A file with any other combination of alternate access methods is
    marked with a ‘+’ character.

I did not see anywhere that documented what an '@' means. Therefore it
is likely something applied in a downstream patch, probably a
modification specific to a software distribution. But I don't really
know. I live under a rock and don't get out much. It likely means
that the second file, the one listed with the '@' after the file mode
bits, is not stored on disk in a typical way. That's probably the
first clue that it is different. But I do not actually know, as I do
not see files listed that way here.

Bob