> From: Dave Sherohman [mailto:d...@sherohman.org]
> Sent: Tuesday, February 03, 2009 11:25 AM
> Subject: Re: Slow Script
>
> On Tue, Feb 03, 2009 at 06:14:48PM +0100, Gorka wrote:
> > Hi! I've got a perl script with this for:
> >
> > for (my $j=0;$j<=$#fichero1;$j++)
> > {
> >   if (@fichero1[$j] eq $valor1)
> >   {
> >     $token = 1;
> >   }
> > }
> >
> > The problem is that fichero1 has 32 millions of records and moreover I've
> > got to repeat this for several millions times, so this way it would take
> > years to finish.
> > Does anybody know a way to optimize this script? Is there any other linux
> > programing language I could make this more quickly whith?
> > Thank you!
>
> Although the Perl could definitely be optimized (and you've already been
> shown one way to do so), your core issue is that you're doing several
> million passes over 32 million records. That's not going to be fast in
> any language. (Even if you can check a million records per second,
> that's 32 seconds per pass, or about 6 hours for 1,000 passes, or just
> over a year for a million passes.)

[snip]

I was just thinking that as well. Does the OP have multiple boxes he can
run this on? This could easily be broken down into a parallel process,
either by manual or programmatic assignment. Splitting up a parallel task
is pretty easy; there is even a shell script hosted on Google Code for easy
parallel processing [1]. Of course, there are a fair number of ifs in this
(if there are resources, if the data can be split or shared easily, etc.).
If not, Dave's idea of using a database is a good one too.

~Stack~

[1] http://code.google.com/p/ppss/
Note: you will probably need to do a fair bit of tweaking for this, but
the ideas are what will be most useful to you anyway.
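
P.S. For what it's worth, here is a rough sketch of what I mean by
"programmatic assignment" on a single multi-core box. It is only an
illustration, not a drop-in fix: the names split_into_chunks,
count_matches, parallel_count and @valores are made up for this example,
the toy data stands in for the real 32 million records, and it assumes the
comparison is an exact string match so the records can sit in a hash (which
may or may not be the optimization already suggested in this thread, and
which needs enough RAM to hold them). Each forked worker checks its own
slice of the values and reports a count back over a pipe.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Split a list into at most $n roughly equal contiguous chunks.
sub split_into_chunks {
    my ($items, $n) = @_;
    my @chunks;
    my $size = int((@$items + $n - 1) / $n) || 1;   # ceiling division
    my @copy = @$items;
    push @chunks, [ splice(@copy, 0, $size) ] while @copy;
    return @chunks;
}

# Count how many values of one chunk occur in the record set.
sub count_matches {
    my ($seen, $values) = @_;
    return scalar(grep { exists $seen->{$_} } @$values);
}

# Fork one worker per chunk and sum the counts they send back.
sub parallel_count {
    my ($seen, $values, $workers) = @_;
    my @readers;
    for my $chunk (split_into_chunks($values, $workers)) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                     # child: do one slice
            close $reader;
            print {$writer} count_matches($seen, $chunk), "\n";
            close $writer;                   # flushes the result
            POSIX::_exit(0);                 # leave without END blocks
        }
        close $writer;                       # parent keeps the read end
        push @readers, [$pid, $reader];
    }
    my $total = 0;
    for my $r (@readers) {
        my ($pid, $reader) = @$r;
        my $line = <$reader>;
        close $reader;
        waitpid($pid, 0);
        chomp $line;
        $total += $line;
    }
    return $total;
}

# Toy data standing in for the real records and the values to look up.
my @fichero1 = map { "rec$_" } 1 .. 1000;
my @valores  = map { "rec$_" } 991 .. 1010;  # 10 present, 10 absent

my %seen;
@seen{@fichero1} = ();                       # build the lookup once

my $found = parallel_count(\%seen, \@valores, 4);
print "found $found of ", scalar(@valores), " values\n";
```

The hash is built once before forking, so every worker inherits it and each
lookup no longer walks the whole array. Spreading the work over several
boxes follows the same idea, except that you would split the list of values
into files and hand one file to each machine.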