On 2/13/25 15:33, bug.repor...@mail.sheugh.com wrote:
I am running some code to identify duplicate
files within a user filesystem.

On my Linux distribution, there are at least the following
tools to find and eliminate dupes:

duperemove
fdupes
freedup
jdupes

This proved to be a somewhat challenging task as
it was not immediately obvious how the use of the
du command would meet my needs.

I eventually found that this worked:

du -ab ~/<directory>

where <directory> contained possibly duplicated files.

I'm a bit puzzled that you use du(1) for finding duplicate
files, because that tool only gives you the size and the
file name.
To find dupes, you eventually also need the owner/group and
permission bits - so stat(1) would be better here.
Finally, you obviously also need to consider the content of
the files.
All that given: of course you can use du(1) to get some of
the information, but collecting the other necessary bits in
separate steps sounds inherently racy.
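
To illustrate the content part, here is a minimal sketch that groups
files by checksum instead of by size. It assumes GNU coreutils
(sha256sum, and uniq with -w and --all-repeated); the function name
find_dupes is made up for this example, and it deliberately ignores
owner/group and permission bits.

```shell
# find_dupes DIR: print groups of files below DIR with identical content.
# Hypothetical helper; relies on GNU uniq options (-w, --all-repeated).
find_dupes() {
    # sha256sum prints "<64 hex digits>  <file name>" per file.
    find "$1" -type f -exec sha256sum {} + |
        LC_ALL=C sort |
        # Compare only the first 64 characters (the hash) and print
        # every line of each repeated group, groups separated by a
        # blank line.
        uniq -w64 --all-repeated=separate
}
```

Calling `find_dupes ~/<directory>` would print one block per set of
identical files. File names containing a backslash or newline are
escaped by sha256sum and would need extra care, and a final cmp(1) on
each candidate pair is still a good idea before deleting anything.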

To compare the content of more than one directory I
needed the complete filepath to uniquely identify
files.

Here's my question: can I rely on this feature to
continue into the indefinite future?

As you did not show what exactly you want to rely on, it's a bit
hard to guess.
What you definitely cannot rely on is the order: `du -ab ~/<directory>`
may output the entries in a different order.
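
If the order matters, e.g. for comparing two runs, sorting the output
by path name would make it stable. A minimal sketch, assuming GNU du
(the -b option is not in POSIX) and the tab separator it currently
prints; du_sorted is a hypothetical name:

```shell
# du_sorted DIR: list every entry below DIR as "<bytes>\t<path>",
# ordered by path so that the result does not depend on traversal order.
du_sorted() {
    # -a: report files as well as directories
    # -b: apparent size in bytes (GNU extension)
    # sort: use a tab as field separator and sort from field 2 (the
    # path) onward, bytewise because of the C locale.
    du -ab -- "$1" | LC_ALL=C sort -t "$(printf '\t')" -k2
}
```

This only helps as long as file names do not contain newlines, since
sort(1) works line by line.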

Regarding the per-line output, GNU coreutils' du(1) currently [1] uses

  "%d\t%s\n", <size>, <filename>

while POSIX [2] requires a blank instead of a tab:

  STDOUT

    The output from du shall consist of the amount of space allocated
    to a file and the name of the file, in the following format:

    "%d %s\n", <size>, <pathname>

Other du(1) implementations such as Solaris 11, macOS 12, AIX, FreeBSD
and OpenBSD also use '\t' before the file name, so it seems more
likely that a future POSIX specification will adapt to reality than
that all implementations will change their behavior in this regard.
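
A script that wants to be safe against either separator could avoid
splitting on one specific character. The following is a sketch only:
du_split is a hypothetical helper, and it assumes each line is a run
of digits, exactly one separator character (tab or blank), and then
the path.

```shell
# du_split LINE: print the size on the first output line and the path
# on the second, whether the separator is a tab or a blank.
du_split() {
    line=$1
    size=${line%%[!0-9]*}    # leading run of digits
    rest=${line#"$size"}     # separator followed by the path
    path=${rest#?}           # drop the single separator character
    printf '%s\n%s\n' "$size" "$path"
}
```

It could be fed line by line, e.g.
`du -ab ~/<directory> | while IFS= read -r l; do du_split "$l"; done`.
Blanks inside the path survive because only the first separator is
removed; newlines in file names would still break the line-based loop.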

The future is always hard to predict, but "indefinite future" is
quite a long time to ask for ...

[1] https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/du.c?id=f2e3234301#n388
[2] https://pubs.opengroup.org/onlinepubs/9799919799/utilities/du.html

Have a nice day,
Berny
