On 2/13/25 15:33, bug.repor...@mail.sheugh.com wrote:
I am running some code to identify duplicate files within a user filesystem.
On my Linux distribution, there are at least the following tools to find and eliminate dupes: duperemove fdupes freedup jdupes
This proved to be a somewhat challenging task, as it was not immediately obvious how the du command would meet my needs. I eventually found that this worked: du -ab ~/<directory> where <directory> contained possibly duplicated files.
I'm a bit puzzled that you use du(1) to find duplicate files, because that tool only gives you the size and the file name. To find dupes, you eventually also need the owner/group and permission bits, so stat(1) would be better here. Finally, you obviously also need to consider the content of the files. Given all that: of course you can use du(1) to get some of the information, but collecting the other necessary bits in a separate step sounds inherently racy.
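To illustrate the point about size alone not being enough, here is a minimal sketch (not what any of the tools named above actually do) that first groups regular files by size taken from lstat(2) and then confirms candidates by content digest. The names find_duplicates and file_digest are hypothetical, and SHA-256 is merely one reasonable choice of digest:

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose size and content are identical."""
    # Pass 1: bucket regular files by size (cheap, metadata only).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                by_size[st.st_size].append(path)

    # Pass 2: hash only files that share a size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_digest = defaultdict(list)
        for path in paths:
            try:
                by_digest[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

If owner, group and permission bits matter as well, st.st_uid, st.st_gid and st.st_mode from the same lstat result could be added to the grouping key. Note that hard links to the same inode are reported as duplicates by this sketch, and that files may still change between the two passes.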
To compare the content of more than one directory I needed the complete file path to uniquely identify files. Here's my question: can I rely on this feature to continue into the indefinite future?
As you did not show exactly what you want to rely on, it's a bit hard to guess. What is certain is that the order in which `du -ab ~/<directory>` outputs the entries is not guaranteed and may change.

Regarding the per-line output, GNU coreutils' du(1) currently [1] uses

  "%d\t%s\n", <size>, <filename>

while POSIX [2] requires a blank instead of a tab:

  STDOUT
  The output from du shall consist of the amount of space allocated to a file and the name of the file, in the following format:

    "%d %s\n", <size>, <pathname>

However, other du(1) implementations such as those on Solaris 11, MacOSX 12, AIX, FreeBSD and OpenBSD also use '\t' before the file name, so it seems more likely that a future POSIX specification will adapt to reality than that all implementations will change their behavior in this regard.

The future is always hard to predict, and "indefinite future" is quite a long time ...

[1] https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/du.c?id=f2e3234301#n388
[2] https://pubs.opengroup.org/onlinepubs/9799919799/utilities/du.html

Have a nice day,
Berny