Hi -

> -----Original Message-----
> From: Madhu Reddy [mailto:[EMAIL PROTECTED]
> Sent: Saturday, February 22, 2003 11:12 AM
> To: [EMAIL PROTECTED]
> Subject: Out of memory while finding duplicate rows
>
> Hi,
>    I have a script that finds duplicate rows in a file...
> the file has 13 million records....
> out of those, not more than 5% are duplicates....
>
> for finding duplicates i am using the following function...
>
> while (<FH>) {
>     if (find_duplicates()) {
>         $dup++;
>     }
> }
>
> # returns 1 if the record is a duplicate
> # returns 0 if the record is not a duplicate
> sub find_duplicates {
>     $key = substr($_, 10, 10);
>     if ( exists $keys{$key} ) {
>         $keys{$key}++;
>         return 1;    # duplicate row
>     } else {
>         $keys{$key}++;
>         return 0;    # not a duplicate
>     }
> }
> ---------------------------------------------
> here i am storing 13 million keys in a hash...
> I think that is why i am getting out of memory.....
>
> how to avoid this ?
>
> Thanx
> -Madhu
Yeah, Madhu, you are treading right at the edge of memory capabilities... You may need to use a database (MySQL comes to mind): a simple key-value table can do the same job your hash does now, and it lets you handle as many records as your disk space allows.

Do you currently have a database installed? If you are running on Windows, even Access would work. Have you used the Perl DBI (CPAN) interface?

Just some thoughts... a rough sketch of the DBI approach follows below.

Aloha => Beau;
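Here is a minimal sketch of what that could look like with DBI and MySQL. The DSN, credentials, database/table names, and input file name are placeholders to adapt for your setup; the substr offsets are taken from Madhu's script. Instead of a Perl hash, a one-column table with a PRIMARY KEY does the "have I seen this key before?" test on disk:

---------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection details -- adjust the DSN, user, and password for your setup.
my $dbh = DBI->connect("dbi:mysql:database=dupcheck", "user", "password",
                       { RaiseError => 1, PrintError => 0 })
    or die "Cannot connect: $DBI::errstr";

# One column with a PRIMARY KEY: the database enforces uniqueness on disk,
# so 13 million keys never have to sit in RAM at once.
$dbh->do("CREATE TABLE IF NOT EXISTS seen_keys (k CHAR(10) PRIMARY KEY)");

my $insert = $dbh->prepare("INSERT INTO seen_keys (k) VALUES (?)");

my $dup = 0;
open my $fh, '<', 'records.txt' or die "Cannot open records.txt: $!";
while (my $line = <$fh>) {
    my $key = substr($line, 10, 10);   # same key field as in the original script
    # A repeated key violates the PRIMARY KEY constraint; eval traps that error.
    eval { $insert->execute($key) };
    $dup++ if $@;                      # insert failed => key already seen => duplicate row
}
close $fh;

print "Found $dup duplicate rows\n";
$dbh->disconnect;
---------------------------------------------

Inserting row by row will be slow over 13 million records; wrapping batches of inserts in a transaction, or bulk-loading the keys and counting duplicates with a GROUP BY, would speed things up considerably, but the idea is the same.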