Hi Sylvestre,

Sylvestre Ledru <sylves...@debian.org> writes:

> I am wondering if this is an undocumented behavior or a bug in du:
>
> $ mkdir dir
>
> $ /usr/bin/du dir dir  dir dir
> 0    dir
>
> $ /usr/bin/du -l dir dir  dir dir
> 0    dir
> 0    dir
> 0    dir
> 0    dir
>
> And -l means "count sizes many times if hard linked".
>
> Is that on purpose or a bug?

Sorry for the months-late response. I saw this while cleaning out old
mail.

It is difficult for me to know the original intention since the 'git
log' does not contain much about this option. This is because it was
written long before Fileutils and friends were merged into Coreutils
[1].

But the cause is that du hashes the st_ino and st_dev of the files it
processes to identify duplicates. Here are some files I created to
demonstrate:

    $ touch file1
    $ ln file1 file2
    $ ln file1 file3
    $ stat --format=%n:%d:%i:%h file1 file2 file3
    file1:46:101655983:3
    file2:46:101655983:3
    file3:46:101655983:3

This explains why we do not see repeated entries when using du without
-l, both when we name the same file multiple times and when we name
its hard links:

    $ du file1 file1 file1
    0   file1
    $ du file1 file2 file3
    0   file1

This was most likely the inspiration for 'du -l', so that each of the
specified hard links is printed:

    $ du -l file1 file2 file3
    0   file1
    0   file2
    0   file3

Since we identify duplicates using st_ino/st_dev rather than the file
name, and since there is no way to determine whether a given name is a
hard link (nothing similar to S_ISLNK for symbolic links), -l prints
anything with st_nlink greater than 1.
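
As far as I can tell, -l skips that hash check entirely, so repeating
the same name behaves just like listing its other hard links, which
matches the output in your original report. With the files from above:

    $ du -l file1 file1
    0   file1
    0   file1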

I suppose we could use a hash set of the file names to filter out
duplicates before they are processed if -l/--count-links is used. But I
am not sure if it is worth changing the behavior at this point.
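
In the meantime, a rough user-level approximation of that idea is to
deduplicate the argument list before invoking du (this sketch assumes
the file names contain no whitespace):

    $ du -l $(printf '%s\n' file1 file1 file2 | awk '!seen[$0]++')
    0   file1
    0   file2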

Collin

[1] https://sources.debian.org/src/fileutils/3.16-5.3/

