Hi,
one approach is to sort the files first and work with the sorted files - then
you only need to read each of them once.
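For example (untested sketch), if both files have already been sorted in plain
byte order - say with "sort file1 > file1.sorted" and "sort file2 >
file2.sorted" under LC_ALL=C (the .sorted names are just for illustration) -
you can walk the two sorted files in parallel and print the lines that appear
in only one of them:

#!/usr/bin/perl -w
use strict;

# Assumes both inputs are sorted in byte order and contain
# no duplicate lines within themselves.
open(F1, "file1.sorted") or die "Can't open file1.sorted: $!";
open(F2, "file2.sorted") or die "Can't open file2.sorted: $!";

my $line1 = <F1>;
my $line2 = <F2>;
while (defined $line1 and defined $line2) {
    chomp $line1;
    chomp $line2;
    if    ($line1 lt $line2) { print "$line1\n"; $line1 = <F1>; }  # only in file 1
    elsif ($line1 gt $line2) { print "$line2\n"; $line2 = <F2>; }  # only in file 2
    else                     { $line1 = <F1>; $line2 = <F2>; }     # in both - skip
}
# Whatever is left over in either file is unique to it.
while (defined $line1) { chomp $line1; print "$line1\n"; $line1 = <F1>; }
while (defined $line2) { chomp $line2; print "$line2\n"; $line2 = <F2>; }
close F1;
close F2;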
The second approach is to load the smaller file into memory, creating a hash
with something like
while (<FILE1>) { chomp; $found1{$_}++; }
and then read the other file and compare it against the hash:
while (<FILE2>) {
    chomp;
    if ($found1{$_}) {
        $found_both{$_}++;        # seen in both files
    } else {
        print "$_\n";             # only in the second file
    }
}
foreach (keys %found1) {
    print "$_\n" unless $found_both{$_};   # only in the first file
}
but this will consume a lot of memory.
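Put together it might look like this (untested; I'm assuming the files are
literally named file1 and file2, and since file2 is the smaller one in your
case, that is the one that goes into the hash, so memory use is proportional
to the 100,000-row file rather than the 2-million-row one):

#!/usr/bin/perl -w
use strict;

my (%found2, %found_both);

# Load the smaller file (file2) into a hash.
open(FILE2, "file2") or die "Can't open file2: $!";
while (<FILE2>) { chomp; $found2{$_}++; }
close FILE2;

# Stream the big file; print lines that are not in file2.
open(FILE1, "file1") or die "Can't open file1: $!";
while (<FILE1>) {
    chomp;
    if ($found2{$_}) {
        $found_both{$_}++;
    } else {
        print "$_\n";              # only in file1
    }
}
close FILE1;

# Lines that were only in file2.
foreach (keys %found2) {
    print "$_\n" unless $found_both{$_};
}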
On Wednesday 06 June 2001 17:46, Steve Whittle wrote:
> Hi,
>
> I'm trying to write a script that removes duplicates between two files and
> writes the unique values to a new file. For example, say I have one file,
> file 1, with the following:
>
> red
> green
> blue
> black
> grey
>
> and another file 2:
>
> black
> red
>
> and I want to create a new file that contains:
>
> green
> blue
> grey
>
> I have written a script that takes each entry in file 1 and then reads
> through file 2 to see if it exists there; if not, it writes it to a new
> file. If there is a duplicate, nothing is written to the new file. The real
> file 1 I'm dealing with has more than 2 million rows and the real file 2
> has more than 100,000 rows so I don't think my method is very efficient.
> I've looked through the web and perl references and can't find an easier
> way. Am I missing something? Any ideas?
>
> Thanks,
>
> Steve Whittle
--
Ondrej Par
Internet Securities
Software Engineer
e-mail: [EMAIL PROTECTED]
Phone: +420 2 222 543 45 ext. 112