Hi list,

I already emailed this to Ben but forgot to copy the list. Below is a
small script for a *nix system that finds and pretty-prints a list of likely
duplicate files (same base name and same size; it does not compare contents).
Maybe it will help someone trying to do the same thing.

--Josh

---cut--- COMMAND LINE ---cut---

find ./perl-5.6.1 -type f -ls | perl dup_find.pl

---cut--- dup_find.pl ---cut---

#!/usr/bin/perl -w
use strict;

my %files;
#
# expects input from a 'find . -type f -ls' command like so:
#
# 2326616    4 -r--r--r--   1 josh     users        3619 Jan  9 11:46 ../perl-5.6.1/lib/XSLoader.pm
#

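while(<>) {
  chomp;
  my $line = $_;
  # fields: inode, blocks, perms, links, user, group, size, month, day,
  # time-or-year, path (the limit of 11 keeps spaces in the path intact)
  my @f = split(' ', $line, 11);
  next unless @f == 11;            # skip anything that is not 'find -ls' output
  my ($file_name) = $f[10] =~ m/\/([^\/]+)$/;
  next unless defined $file_name;  # no '/' in the path, nothing to compare
  # signature is base name plus size; the '/' separator keeps "a1" + "23"
  # from colliding with "a" + "123"
  my $sig = "$file_name/$f[6]";
  if(exists $files{$sig}) {
    push @{$files{$sig}{'dups'}}, $line;
  } else {
    $files{$sig}{'orig'} = $line;
    $files{$sig}{'dups'} = [];
  }
}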

foreach my $sig (sort keys %files) {
  my $orig = $files{$sig}{'orig'};
  my @dups = @{$files{$sig}{'dups'}};
  foreach ($orig, @dups) {
    s/^\s+//;
    s/\s+$//;
  }
  if($#dups != -1) {
    print "File:      $orig\n";
    print "Duplicate: ";
    print join("\nDuplicate: ", @dups);
    print "\n\n";
  }
}

---cut--- OUTPUT ---cut---

File:      1310726   56 -r--r--r--   1 josh     users       49651 Apr  6 2001 ./perl-5.6.1/ext/B/B/C.pm
Duplicate: 3653636   56 -r--r--r--   1 josh     users       49651 Apr  6 2001 ./perl-5.6.1/lib/B/C.pm

File:      1310727   60 -r--r--r--   1 josh     users       56243 Mar 19 2001 ./perl-5.6.1/ext/B/B/CC.pm
Duplicate: 3653641   60 -r--r--r--   1 josh     users       56243 Mar 19 2001 ./perl-5.6.1/lib/B/CC.pm

File:      1310728   28 -r--r--r--   1 josh     users       25562 Apr  8 2001 ./perl-5.6.1/ext/B/B/Concise.pm
Duplicate: 3653638   28 -r--r--r--   1 josh     users       25562 Apr  8 2001 ./perl-5.6.1/lib/B/Concise.pm

....
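
The script above only matches on base name and size, so two different files
that happen to share both will be reported as duplicates. If that matters,
something along these lines might work instead. It is only a rough sketch,
not a tested replacement: it walks the tree with File::Find (so no external
find command is needed, and it also produces the full recursive file list),
groups files by size, and then compares MD5 digests of the contents with
Digest::MD5. The names find_duplicates and print_duplicates are made up for
this example, and matching digests are very strong evidence of identical
contents rather than a byte-for-byte proof.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Walk the given directories and return a hash ref mapping an MD5 digest
# to the list of regular files whose contents have that digest. Only
# digests shared by two or more files are kept.
sub find_duplicates {
  my @dirs = @_;
  my %by_size;

  # no_chdir => 1 makes $_ the full path of each entry
  find({
    no_chdir => 1,
    wanted   => sub {
      return if -l $_;          # ignore symlinks
      return unless -f $_;      # regular files only
      push @{ $by_size{ -s _ } }, $_;
    },
  }, @dirs);

  my %by_digest;
  for my $size (keys %by_size) {
    my @same_size = @{ $by_size{$size} };
    next if @same_size < 2;     # a unique size cannot have a duplicate
    for my $path (@same_size) {
      open(my $fh, '<', $path)
        or do { warn "skipping $path: $!\n"; next };
      binmode($fh);
      my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
      close($fh);
      push @{ $by_digest{$digest} }, $path;
    }
  }

  # drop digests that belong to a single file
  delete @by_digest{ grep { @{ $by_digest{$_} } < 2 } keys %by_digest };
  return \%by_digest;
}

# Print each group in the same File:/Duplicate: layout as dup_find.pl,
# but with paths only.
sub print_duplicates {
  my ($groups) = @_;
  for my $digest (sort keys %$groups) {
    my ($orig, @dups) = sort @{ $groups->{$digest} };
    print "File:      $orig\n";
    print "Duplicate: $_\n" for @dups;
    print "\n";
  }
}

# Directories to scan are taken from the command line.
print_duplicates(find_duplicates(@ARGV)) if @ARGV;
```

Saved under a name of your choosing (say dup_find_md5.pl, also made up), it
would be run as: perl dup_find_md5.pl ./perl-5.6.1
Grouping by size first means only files that could possibly match are read.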

----- Original Message -----
From: "Ben Crane" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, January 10, 2002 8:39 AM
Subject: traversing a file tree...one step further


> Hi list,
>
> I have got a program running that opens a text file,
> finds file names within the txt file and then runs a
> File::Find to determine the location of these files
> (within the same directory). My next question is: If I
> want to write a list of all the files within more than
> one directory, how do I do it?
>
> I have the initial start directory, it locates all the
> files within it and prints them out. but within the
> directory are more subdirectories...what I want is to
> produce a list of every file within the main directory
> and the sub directories.
>
> why? the files in these directories change on a
> constant basis and I want a txt file (updated every
> day) to determine what's there and what isn't...its
> part of our corporate website and simple file
> management is becoming very hard.
>
> I was thinking of using file::depth but am not
> entirely sure if it's the right solution. My next idea
> was to put a list of sub directories in an array and
> then loop through the array opening each respective
> sub direc and printing the files within...at the
> moment my text file returns a set of files within the
> directory and a list of sub directories...if this
> method is simple, how do I dump info into an array for
> later use (the array will have data added to it whilst
> inside a foreach(..) loop...
>
> Any/all help would be appreciated.
>
>
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


