On Mon, 10 Feb 2003 17:53:10 -0800, Madhu Reddy wrote:

>    I want to find duplicate records in a large file....
> it contains around 22 million records.....

Like Wiggins, I also feel that a database is the right way to solve this.

On the other hand, 22 million records isn't that big on modern computers.
The only problem is that, with Perl's per-entry memory overhead, a hash
holding all the unique keys would become too big to handle in RAM.

One way to solve it is to use the CPAN module
DB_File
(in fact, it's just a hidden database - Berkeley DB), which keeps the
hash on disk instead of in memory.
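
For example, a minimal sketch, assuming each record is one line and the
whole line is the key; 'seen.db' is just a made-up scratch file name:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use DB_File;
  use Fcntl;

  # Tie the hash to a Berkeley DB file, so it lives on disk, not in RAM.
  my %seen;
  tie %seen, 'DB_File', 'seen.db', O_RDWR|O_CREAT, 0666, $DB_HASH
      or die "Cannot tie seen.db: $!";

  while (my $line = <>) {
      chomp $line;
      # Report the record on its second and later appearances.
      print "duplicate: $line\n" if $seen{$line}++;
  }

  untie %seen;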

Another way is to use a module that implements a less memory-hungry hash
structure, e.g.
Tie::GHash
Tie::SubstrHash
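
Tie::SubstrHash packs everything into one big string, at the price of
fixed key/value lengths and a fixed table size. A sketch, assuming every
record starts with a fixed 10-byte ID (the lengths and the 22-million
cap are made-up parameters you'd tune to your data):

  use strict;
  use warnings;
  use Tie::SubstrHash;

  # Key length, value length and maximum entry count are fixed up front.
  my %seen;
  tie %seen, 'Tie::SubstrHash', 10, 1, 22_000_000;

  while (my $line = <>) {
      chomp $line;
      my $key = substr($line, 0, 10);   # hypothetical fixed-width key
      if (exists $seen{$key}) {
          print "duplicate: $line\n";
      } else {
          $seen{$key} = '1';            # value must be exactly 1 byte long
      }
  }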

Either way, the deduplication logic itself is the same as described in
perldoc -q duplicate.


Best Wishes,
Janek
